<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" > <channel> <title>data storage Archives - CDInsights</title> <atom:link href="https://www.clouddatainsights.com/tag/data-storage/feed/" rel="self" type="application/rss+xml" /> <link>https://www.clouddatainsights.com/tag/data-storage/</link> <description>Transform Your Business in a Cloud Data World</description> <lastBuildDate>Fri, 25 Oct 2024 17:19:38 +0000</lastBuildDate> <language>en-US</language> <sy:updatePeriod> hourly </sy:updatePeriod> <sy:updateFrequency> 1 </sy:updateFrequency> <generator>https://wordpress.org/?v=6.6.1</generator> <image> <url>https://www.clouddatainsights.com/wp-content/uploads/2022/05/CDI-Favicon-2-45x45.jpg</url> <title>data storage Archives - CDInsights</title> <link>https://www.clouddatainsights.com/tag/data-storage/</link> <width>32</width> <height>32</height> </image> <site xmlns="com-wordpress:feed-additions:1">207802051</site> <item> <title>Four Surprising Facts About Data Storage</title> <link>https://www.clouddatainsights.com/four-surprising-facts-about-data-storage/</link> <comments>https://www.clouddatainsights.com/four-surprising-facts-about-data-storage/#respond</comments> <dc:creator><![CDATA[Chris Opat]]></dc:creator> <pubDate>Fri, 25 Oct 2024 17:19:31 +0000</pubDate> <category><![CDATA[Cloud Data Platforms]]></category> <category><![CDATA[data storage]]></category> <guid isPermaLink="false">https://www.clouddatainsights.com/?p=5538</guid> <description><![CDATA[Discover some surprising facts about data storage and what might be possible for future data storage solutions.]]></description> <content:encoded><![CDATA[ <figure class="wp-block-image size-full"><img fetchpriority="high" decoding="async" width="1000" height="660" src="https://www.clouddatainsights.com/wp-content/uploads/2024/10/Depositphotos_231065252_S.jpg" alt="" class="wp-image-5539" srcset="https://www.clouddatainsights.com/wp-content/uploads/2024/10/Depositphotos_231065252_S.jpg 1000w, https://www.clouddatainsights.com/wp-content/uploads/2024/10/Depositphotos_231065252_S-300x198.jpg 300w, https://www.clouddatainsights.com/wp-content/uploads/2024/10/Depositphotos_231065252_S-768x507.jpg 768w" sizes="(max-width: 1000px) 100vw, 1000px" /></figure> <p>For those outside the storage industry, it’s easy to dismiss the subject as dry and uninteresting. But as everyday functions continue to be digitized, everyday objects get “smarter” (that is, stuffed full of sensors), and the AI ecosystem continues to grow like wildfire throughout the business world, it’s becoming increasingly clear that quality, consistent data storage will be the muscle that keeps the modern world running. As data storage has evolved and new technologies have emerged, many fascinating facts about data storage have presented themselves.</p> <p>See also: Scaling Up: <a href="https://www.clouddatainsights.com/scaling-up-how-multi-tech-data-platforms-enhance-data-management/">How Multi-Tech Data Platforms Enhance Data Management</a></p> <h3 class="wp-block-heading">Hard Drives Technically Weigh More When Full</h3> <p>Anyone can tell that a full file cabinet weighs more than a drive storing all the same files in digital versions. But that does raise the question of whether the digital storage of those files weighs anything at all. As Einstein famously theorized, E = mc<sup>2</sup>. That formula says that energy and mass are equivalent, so we can infer that a change in stored energy carries a change in weight, even if a negligible one. </p> <p>Now, hard drives record data by magnetizing a thin film of ferromagnetic material and forcing the atoms in a magnetic field to align in a different direction. Since magnetic fields have differing amounts of energy depending on whether they’re aligned or anti-aligned, technically the weight does change. According to<a href="https://www.ellipsix.net/blog/2009/04/how-much-does-data-weigh.html"> the calculations of David Zaslavsky</a>, it’d be approximately 10<sup>-14</sup> g for a 1TB hard drive. Luckily, such an amount is essentially unmeasurable. There’s no need to worry about adjusting for weight in a data center when the drives are full.</p>
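<p>To make the scale of that figure concrete, here is a minimal back-of-the-envelope sketch in Python. The roughly 1 joule of total magnetic-field energy difference assumed for a full 1TB drive is a hypothetical value chosen only so the result lines up with the 10<sup>-14</sup> g estimate cited above; it is not a measured number.</p> <pre class="wp-block-code"><code># Back-of-the-envelope: how much "heavier" is a full 1TB drive?
# ASSUMPTION (illustrative only): filling the drive shifts its total
# magnetic-field energy by roughly 1 joule.
energy_difference_j = 1.0        # assumed energy difference, in joules
c = 299_792_458                  # speed of light, in m/s

mass_kg = energy_difference_j / c**2  # E = mc^2  =>  m = E / c^2
print(f"{mass_kg:.1e} kg (about {mass_kg * 1000:.0e} g)")
# -> 1.1e-17 kg, on the order of 10^-14 g -- far below anything a scale can detect.
</code></pre>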
<h3 class="wp-block-heading">The Cloud Can Get Really Loud</h3> <p>One thing that people don’t often realize is that the physical data centers can run at high volumes. This is thanks to a combination of factors, largely cooling systems. Backblaze previously measured its own data centers at approximately 78 dB. Other<a href="https://www.sensear.com/blog/data-centers-arent-loud-right%23:~:text=Based%2520on%2520research%2520found%2520by,to%252096%2520dB(A)."> data centers can reach up to 96 dB</a>, roughly the equivalent volume of a nearby motorcycle, newspaper press, or power mower, with likely hearing damage after<a href="https://www.iacacoustics.com/blog-full/comparative-examples-of-noise-levels"> 8 hours of exposure</a>.</p> <p>It’s worth investing in ways to reduce the noise—if not for worker safety, then to reduce the environmental impact of data centers, including noise pollution. There is a wealth of studies out there connecting noise pollution to cardiovascular disease, hypertension, high stress levels, sleep disturbance, and good ol’ hearing loss in humans. In our animal friends, noise pollution can disrupt predator/prey detection and avoidance, interfere with echolocation, and hinder reproduction and navigation. Luckily, there are technologies to keep data centers (relatively) quiet, such as acoustic enclosures around loud items like diesel generators.</p> <h3 class="wp-block-heading">Data Storage Doesn’t Last Forever, But That Could Soon Change</h3> <p>While files themselves don’t physically expire, the storage media saving them degrade over time.</p> <p>Backblaze has<a href="https://www.backblaze.com/blog/backblaze-drive-stats-for-2023/"> extensive research available on how long drives last before failing</a>, and the findings show it can take several years before that expiration happens. While every model of drive is unique, there are some basic time frames involved: 4–7 years for hard disk drives (HDDs), 5–10 years for solid-state drives (SSDs), and roughly 10 years for flash drives.</p> <p>However, with new technologies—and their consumer applications—emerging, we might see these timeframes get left in the dust. 
The Institute of Physics reports that data written to a glass memory crystal could remain intact for a million years, a product they’ve dubbed the<a href="https://physicsworld.com/a/5d-superman-memory-crystal-heralds-unlimited-lifetime-data-storage/"> “Superman crystal.”</a> So, look out for lasers altering the optical properties of quartz at the nanoscale—we certainly will be checking them out if a customer ever asks to store their files for a million years.</p> <h3 class="wp-block-heading">Data Centers Benefit From Expensive Real Estate</h3> <p>Optimizing your connectivity (getting data from point A to point B) to the strongest networks is no simple feat. And, it’s important to remember that there’s a hardware element to those networks. So, where there are more people, there’s more networking infrastructure. From an operational standpoint, you’d likely assume it’s a bad choice to have your data center in the middle of the most expensive real estate and power infrastructures in the world. However, there are tangible benefits to joining up all those networks at a central hub and to putting them in or near population centers. </p> <p>We call those spaces carrier hotels—facilities where metro fiber carriers meet long-haul carriers for dozens of network providers. As a result, those carrier hotels sit on some of the most expensive real estate in the world. Citing<a href="https://dgtlinfra.com/carrier-hotels-data-center/%23:~:text=Street,%2520Phoenix,%2520Arizona-,What%2520is%2520a%2520Carrier%2520Hotel?,platforms%2520within%2520the%2520carrier%2520hotel."> DGTL Infra</a>, the biggest carrier hotels are located in the downtowns of Los Angeles, Chicago, Dallas, Miami, New York City, and Seattle. Did you know that 80% of the traffic on the internet passes through the Dallas Infomart? It’s no wonder they have over 70 carriers to connect with in that property!</p> <h3 class="wp-block-heading">Data Storage Is Only Going to Get More Interesting</h3> <p>Today it’s estimated that there are over 8,000 data centers (DCs) in the world, built on a variety of storage media, connected to various networks, consuming vast amounts of power, and taking up valuable real estate. As the need for storage grows and new technologies reach the market, I’m excited to see how the industry evolves and what quirks and counterintuitive concepts emerge next.</p> <div class="saboxplugin-wrap" itemtype="http://schema.org/Person" itemscope itemprop="author"><div class="saboxplugin-tab"><div class="saboxplugin-gravatar"><img decoding="async" src="https://www.clouddatainsights.com/wp-content/uploads/2024/10/Chris-Opat.jpg" width="100" height="100" alt="" itemprop="image"></div><div class="saboxplugin-authorname"><a href="https://www.clouddatainsights.com/author/chris-opat/" class="vcard author" rel="author"><span class="fn">Chris Opat</span></a></div><div class="saboxplugin-desc"><div itemprop="description"><p>Chris Opat joined <strong><a href="https://www.backblaze.com/">Backblaze</a></strong> as our senior vice president of cloud operations in 2023. Before joining Backblaze, he served as senior vice president of platform engineering and operations at StackPath, a specialized provider in edge technology and content delivery. He brings a passion for building teams of experienced technologists who push the envelope to create a best-in-class experience for Backblaze customers. Chris has over 25 years of experience in building teams and technology at startup and scale-up companies. 
He also held leadership roles at CyrusOne, CompuCom, Cloudreach, and Bear Stearns/JPMorgan. Chris earned his B.S. in Television & Digital Media Production at Ithaca College.</p> </div></div><div class="clearfix"></div></div></div>]]></content:encoded> <wfw:commentRss>https://www.clouddatainsights.com/four-surprising-facts-about-data-storage/feed/</wfw:commentRss> <slash:comments>0</slash:comments> <post-id xmlns="com-wordpress:feed-additions:1">5538</post-id> </item> <item> <title>The Long Due Renewal of Data Storage for Accelerated Data Applications</title> <link>https://www.clouddatainsights.com/the-long-due-renewal-of-data-storage-for-accelerated-data-applications/</link> <comments>https://www.clouddatainsights.com/the-long-due-renewal-of-data-storage-for-accelerated-data-applications/#respond</comments> <dc:creator><![CDATA[Animesh Kumar and Travis Thompson]]></dc:creator> <pubDate>Sat, 08 Jul 2023 02:13:02 +0000</pubDate> <category><![CDATA[Data Architecture]]></category> <category><![CDATA[data architecture]]></category> <category><![CDATA[data storage]]></category> <guid isPermaLink="false">https://www.clouddatainsights.com/?p=3474</guid> <description><![CDATA[A unified storage solution holds the key to a transformed data management experience. Discover why data storage is a mess and how to fix it.]]></description> <content:encoded><![CDATA[<div class="wp-block-image"> <figure class="aligncenter size-full is-resized"><img decoding="async" src="https://www.clouddatainsights.com/wp-content/uploads/2023/06/Depositphotos_8338420_S.jpg" alt="" class="wp-image-3493" width="750" height="500" srcset="https://www.clouddatainsights.com/wp-content/uploads/2023/06/Depositphotos_8338420_S.jpg 1000w, https://www.clouddatainsights.com/wp-content/uploads/2023/06/Depositphotos_8338420_S-300x200.jpg 300w, https://www.clouddatainsights.com/wp-content/uploads/2023/06/Depositphotos_8338420_S-768x512.jpg 768w, https://www.clouddatainsights.com/wp-content/uploads/2023/06/Depositphotos_8338420_S-930x620.jpg 930w" sizes="(max-width: 750px) 100vw, 750px" /><figcaption class="wp-element-caption"><em>A unified storage solution holds the key to a transformed data management experience. Discover why data storage is a mess and how to fix it.</em></figcaption></figure></div> <p>We need to end the disaster of disparate storage in data engineering. We’ve been so used to the complexities of multiple data sources and stores as isolated units, we consider it part of the process. Ask a Data Engineer what their usual day looks like, and you’ll get an elaborate spiel on overwhelming pipelines with a significant chunk dedicated to integrating and maintaining multiple data sources.</p> <p>Having a single isolated lake is far from the desired solution. It eventually results in the same jumbled pipelines and swampy data- the two distinct nightmares of a data engineer. There is a dire need to rethink storage as a <strong>unified solution</strong>. 
One that is declaratively interoperable, encompasses disparate sources within a single unit with embedded governance, and remains independent to the furthest degree.</p> <h3 class="wp-block-heading">The Expensive Challenges of Data Storage</h3> <p>Let’s agree, based on historical evidence, that disruptive transformations end up costing more time and resources than the theoretical plan suggests. And given that the data domain is mined with rapidly evolving innovations, yet another disruption with distant promises is not practically favourable.</p> <p>So instead, let’s approach the problem with a product mindset to optimise storage evolution: what are the most expensive challenges of storage today, and how can they be pushed back non-disruptively?</p> <ul class="wp-block-list"> <li><strong>ELT Bottlenecks:</strong> Access to data is very restrictive in prevalent data stacks, and a central engineering team has to step in as a mediator, even for minor tasks. The transformation necessary to make your data useful creates multiple dependencies on central engineering teams, who become high-friction mediators between data producers and consumers. Engineers buried under open tickets, indefinitely stuck patching up countless faulty pipelines, are not uncommon.</li> </ul> <ul class="nv-cv-m wp-block-list"> <li><strong>Isolation of storage as a standalone unit:</strong> Storage so far has only been treated in silos. Data is optimised specifically for applications or engines, which severely limits its usefulness. Analytical engines that it doesn’t accommodate suffer from gruelling speed and expensive queries. Isolated sources also mean writing complex pipelines and workflows to integrate several endpoints. Maintaining this mass sucks up most of the engineer’s time, leaving next to no time for innovating profitable projects and solutions.</li> </ul> <ul class="nv-cv-m wp-block-list"> <li><strong>Cognitive Overload and Poor Developer Experience:</strong> The consequence of the above two challenges is an extensive cognitive load, which translates to sleepless nights for data engineers. Data engineers are not just restricted to plumbing jobs that deal with integrating countless point tools around data storage; they are also required to repeatedly maintain and relearn the dynamic philosophies and patterns of multiple integrations.</li> </ul> <ul class="nv-cv-d nv-cv-m wp-block-list"> <li><strong>Dark Data:</strong> The high-friction processes around data storage make accessing data at scale extremely challenging. Therefore, even if an organisation owns extremely rich data from valuable sources, it is unable to use that data to power its business. This is called dark data: data with plausible potential but no utilisation.</li> </ul> <h3 class="wp-block-heading">The Necessary Paradigm: Unification of Data Storage</h3> <h4 class="wp-block-heading">Storage as a Unified Resource</h4> <p>If storage continues to be dealt with as scattered points in the data stack, then given the rising complexity of pipelines and the growth spurt of data, the situation will escalate into broken, heavy, expensive, and inaccessible storage.</p> <p>The most logical next step to resolve this is a <strong>Unified Storage</strong> paradigm. 
This means a single storage port that easily interoperates with other data management tools and capability layers.</p> <p><img loading="lazy" decoding="async" src="https://lh3.googleusercontent.com/9bcaD4CGRJ-MF__pZ94ks_hLrUFsY-I6-yLdQD69uU0mxr5ArNrvGZN25L9qikXu3crGNp11B5IzfhjPgkCU24uZ6ydwMDi41MMsX4g4B7Ee4ozzFc2pQLH-Wdq3flybb0cWNdetizpE0_0kTkiUCs8" width="624" height="827"></p> <h5 class="wp-block-heading"><strong>Unified Access</strong></h5> <p>Instead of disparate data sources to maintain and process, imagine the simplicity if there was just one point of management. No complication while querying, transforming, or analyzing because there’s only one standardized connection that needs to be established instead of multiple integrations, each with a different access pattern, philosophy, or optimization requirement.</p> <ul class="nv-cv-d nv-cv-m wp-block-list"> <li><strong>Homogenous Access Plane</strong>: In prevalent storage systems, we deal with heterogeneous access ports with overwhelming pipelines, integration and access patterns, and optimization requirements. Data is almost always moved (expensive) to common native storage to enable fundamental capabilities such as discoverability, governance, quality assessment, and analytics. And the movement is not a one-time task, given the rate of change of data across the original sources.<br><br>In contrast, Unified Storage brings in a homogenous access plane. Both native and external sources appear as part of one common storage – a single access point to querying, analyzing, or transforming jobs. So even while the data sits at disparate sources, the user accesses it as if from one source with no data movement or duplication.<br></li> <li><strong>Storage-Agnostic Analytics</strong>: In prevalent systems, analytical engines are storage-dependent and data needs to be optimized specifically for the engines or applications. Without optimization, queries slow down to a grueling pace, directly impacting analytical jobs and business decisions. Often, data is optimized for a single application or engine, which makes it challenging for other applications to consume that data.<br><br>In contrast, with the unification of storage as a resource, data is optimized for consumption across engines (SQL, Spark, etc.), enabling analytical engines to become storage-agnostic. For example, DataOS, which uses Unified Storage as an architectural building block, has two query engines – Minerva (static query engine) and Themis (elastic query engine), designed for two distinct purposes, and multiple data processing engines, including Spark. However, all the engines are able to interact with the Unified Storage optimally without any need for specific optimization or special drivers and engines.<br></li> <li><strong>Universal Discoverability</strong>: In prevalent data stacks, data is only partially discoverable within the bounds of the data source. Users have an obligation to navigate to specific sources or move data to a central location for universal discovery.<br><br>On the contrary, Unified Storage is able to present virtualized views of data across heterogeneous sources. Due to metadata extraction from each integrated source, native or external, users are able to design very specific queries and get results from across their data ecosystem. 
They can look for data products, tables, topics, views, and much more without moving or duplicating data.</li> </ul> <h5 class="wp-block-heading"><strong>Unification through Modularization</strong></h5> <p>Most organizations, especially those at considerable scale, have the need to create multiple storage units, even natively, for separation of concerns, such as for different projects, locations, etc. This means building and maintaining storage from scratch for every new requirement. Over time, these separate units evolve into divergent patterns and philosophies.</p> <p>Instead, with Unified Storage, users create modular instances of storage with separately provisioned infrastructure, policies, and quality checks- all under the hood of one common storage with a single point of management. Even large organizations would have one storage resource to master and maintain, severely cutting down the cognitive load. Each instance of storage does not require bottom-up build or maintenance, just a single spec input to spin it up over a pre-existing foundation.</p> <figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="1022" src="https://www.clouddatainsights.com/wp-content/uploads/2023/06/Modern-image-2-1024x1022.png" alt="Visual representation of provisioning instances of unified stream and lakehouse within one common storage. Fastbase and Icebase are implementations of the unified stream and unified lakehouse, respectively." class="wp-image-3475" srcset="https://www.clouddatainsights.com/wp-content/uploads/2023/06/Modern-image-2-1024x1022.png 1024w, https://www.clouddatainsights.com/wp-content/uploads/2023/06/Modern-image-2-300x300.png 300w, https://www.clouddatainsights.com/wp-content/uploads/2023/06/Modern-image-2-150x150.png 150w, https://www.clouddatainsights.com/wp-content/uploads/2023/06/Modern-image-2-768x767.png 768w, https://www.clouddatainsights.com/wp-content/uploads/2023/06/Modern-image-2-1536x1533.png 1536w, https://www.clouddatainsights.com/wp-content/uploads/2023/06/Modern-image-2-45x45.png 45w, https://www.clouddatainsights.com/wp-content/uploads/2023/06/Modern-image-2.png 1600w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">Visual representation of provisioning instances of unified stream and lakehouse within one common storage. Fastbase and Icebase are implementations of the unified stream and unified lakehouse, respectively.</figcaption></figure> <h5 class="wp-block-heading"><strong>Unified Streaming</strong></h5> <p>In existing storage constructs, streaming is often considered a separate channel and comes with its own infrastructure and tooling. There is no leeway to consume stream data through a single unified channel. The user is required to integrate disparate streams and optimize them for consumption separately. Each production and consumption point must adhere to the specific tooling requirements.</p> <p>Unified streaming allows direct connection and automated collection from any stream, simplifies transformation logic through standardized formats, and easily integrates historical or batch data with streams for analysis. An implementation of unified storage, such as Fastbase, supports Apache Pulsar format for streaming workloads. 
Pulsar offers a “unified messaging model” that combines the best features of traditional messaging systems like RabbitMQ and pub-sub (publish-subscribe) event streaming platforms like Apache Kafka.</p> <h5 class="wp-block-heading"><strong>Unified Data Governance and Observability</strong></h5> <p>We often see in existing data stacks how complex it is to govern data and data citizens on top of data management. This is because the current data stack is a giant web of countless tools and processes, and there is no way to infuse propagation of declarative governance and quality through such siloed systems. Let’s take a look at how a Unified Architecture approaches this.</p> <figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1012" height="1024" src="https://www.clouddatainsights.com/wp-content/uploads/2023/06/Modern-image-3-1012x1024.png" alt="Gateway communicates with the governance engine to get user tag information, the observability engine to get quality information, the infrastructure orchestrator to get cluster information, and Catalog for metadata." class="wp-image-3477" srcset="https://www.clouddatainsights.com/wp-content/uploads/2023/06/Modern-image-3-1012x1024.png 1012w, https://www.clouddatainsights.com/wp-content/uploads/2023/06/Modern-image-3-297x300.png 297w, https://www.clouddatainsights.com/wp-content/uploads/2023/06/Modern-image-3-768x777.png 768w, https://www.clouddatainsights.com/wp-content/uploads/2023/06/Modern-image-3-1519x1536.png 1519w, https://www.clouddatainsights.com/wp-content/uploads/2023/06/Modern-image-3-45x45.png 45w, https://www.clouddatainsights.com/wp-content/uploads/2023/06/Modern-image-3.png 1582w" sizes="(max-width: 1012px) 100vw, 1012px" /><figcaption class="wp-element-caption">Gateway communicates with the governance engine to get user tag information, the observability engine to get quality information, the infrastructure orchestrator to get cluster information, and Catalog for metadata.</figcaption></figure> <ul class="nv-cv-d nv-cv-m wp-block-list"> <li><strong>Governance</strong>: In existing storage paradigms, Governance is undoubtedly an afterthought. It is rolled out per source and is restricted to native points. Additionally, there is an overload of credentials to manage for disparate data sources. Siloed storage is also not programmed to work with standard semantic tags for governance. Both masking and access policies for users or machines must be applied in silos.<br><br>In contrast, Unified Storage comes with inherent governance, which is automatically embedded for data as soon as it’s composed into the unified storage resource, irrespective of native or external locations. Abstracted credential management allows users to “access once, access forever” until access is revoked. Universal semantic tags enforce policies across all data sources, native or external. Implementation of tags at one point (say, the catalog), propagates the tag across the entire data stack.<br></li> </ul> <ul class="nv-cv-d nv-cv-m wp-block-list"> <li><strong>Observability & Quality</strong>: Much like Governance, Observability has also remained an afterthought, rolled out at the tail end of data management processes, most often as an optional capability. Moreover, observability too exists in silos, separately enabled for every source. Prevalent storage constructs have no embedded quality or hygiene checks which is a major challenge when data is managed at scale. 
If something fails, it eats up resources over an unnecessarily complex route to root cause analysis (RCA).</li> </ul> <p>Unified storage has the ability to embed observability as an inherent feature due to high interoperability and metadata exposure from the storage resources and downstream access points such as analytical engines. Storage is no longer a siloed organ, but right in the middle of the core spine, aware of transmissions across the data stack. This awareness is key to speedy RCA, and, therefore, a better experience for both data producers and consumers.</p> <h3 class="wp-block-heading">Enabling the Unified Storage Paradigm</h3> <p>Within the scope of this section, we’ll cover how we’ve established the unification of storage-as-a-resource through some of our native development initiatives.</p> <h4 class="wp-block-heading"><strong>Capabilities on top of Standard Lakehouse Format</strong></h4> <p>Icebase is a simple and robust cloud-agnostic lakehouse, built to empower modern data development. It is manifested using the Iceberg table format atop Parquet files inside an object store like Azure Data Lake, Google Cloud Storage, or Amazon S3. It provides the necessary tooling and integrations to manage data and metadata simply and efficiently, as well as inherent interoperability with capabilities spread across layers inside the unified architecture of the data operating system.</p> <p>While Icebase manages OLAP data, Fastbase supports Apache Pulsar format for streaming workloads. Pulsar offers a “unified messaging model” that combines the best features of traditional messaging systems like RabbitMQ and pub-sub (publish-subscribe) event streaming platforms like Apache Kafka.</p> <p>Users can provision multiple instances of Icebase and Fastbase for their OLAP, streaming, and real-time data workloads. Let’s look at how these storage modules solve the pressing problems of data development that we discussed before.<br></p> <h4 class="wp-block-heading"><strong>Unified Access and Analytics</strong></h4> <p>A developer needs to avoid the hassle of pushing credentials into source systems or raising tickets for access every time before starting to engineer a new asset. At the same time, having robust and enforceable policies in place is necessary to prevent breaches in the security of both raw data and data products. Two resources within a data operating system – Depot and Policy – come together with Icebase to manage a healthy tradeoff between access and security.</p> <p>To add to this, Minerva – the querying engine of the DataOS – brings together unified access and on-demand computing to enable both technical & non-technical users to achieve outcomes without concern about the heterogeneity of their current data landscape, thus significantly reducing iterations with IT.</p> <h4 class="wp-block-heading"><strong>Access and Addressability</strong></h4> <p>The unified storage architecture allows you to connect and access data from managed and unmanaged object storage by abstracting out various protocols and complexities of the source systems into a common taxonomy and route. This abstraction is achieved with the help of “Depots”, which can be understood as registrations of data locations to make them systematically available to the wider data stack.</p> <p>A depot assigns a unique address to every source system in a uniform format. This is known as the Universal Data Link (UDL), and it allows direct access to datasets without having to specify credentials again.<br><br>dataos://[depot]:[collection]/[dataset]</p> <p>In Pulsar, topics are the named channels for transmitting messages from producers to consumers. Taking this into account, any Fastbase UDL (Universal Data Link) within the DataOS is referenced as dataos://fastbase:<schema_name>/<dataset_name>, where <em>dataset</em> is a Pulsar topic. Similarly, an example Icebase UDL will look like dataos://icebase:retail/city</p>
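<p>As a rough illustration of that addressing scheme, the hypothetical Python helper below (not part of DataOS or any client library) simply assembles a UDL from its three parts. The depot, collection, and dataset names are made up for the example, and the optional ?acl hint is the one described in the “Pluggability” bullet further down.</p> <pre class="wp-block-code"><code># Hypothetical helper -- NOT a DataOS API -- showing how a Universal Data Link
# (UDL) is composed from a depot, a collection, and a dataset name.
def make_udl(depot, collection, dataset, acl=None):
    udl = f"dataos://{depot}:{collection}/{dataset}"
    return f"{udl}?acl={acl}" if acl else udl  # acl="r" or "rw", explained below

print(make_udl("icebase", "retail", "city"))           # dataos://icebase:retail/city
print(make_udl("fastbase", "orders", "order_events"))  # a Pulsar topic behind a Fastbase depot
print(make_udl("icebase", "retail", "city", acl="r"))  # same dataset, read-only hint
</code></pre>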
<p>Depots have default access policies built in to ensure secure access for everyone in the organization. It is also possible to create custom access policies to access the depot and data policies for specific datasets within it. This is only for accessing the data, not moving or copying it.</p> <p>Being completely built on open standards, extending interoperability to non-native components outside the operating system is possible through standard APIs.</p> <p>Connection to a depot opens up access to all the capability layers inside a unified data architecture, including governance, observability, and discoverability. The depot is managed by a Depot Service, which provides:</p> <ul class="nv-cv-d nv-cv-m wp-block-list"> <li>A single-point connection with analytics, observability, and discoverability layers.<br></li> <li>Access to several native stacks, engines, and data apps. These can be considered consumers of a depot.<br></li> <li>Pluggability with secret management systems, whether DataOS-native or external. This helps you store credentials in the DataOS address itself. One can ask the stacks to use secrets with specific permissions, e.g.:<br><br>dataos://[depot]:[collection]/[dataset]?acl=r | for read access<br>or<br>dataos://[depot]:[collection]/[dataset]?acl=rw | for write access<br></li> <li>Data Definition Language (DDL) and Data Manipulation Language (DML) interfaces that power creating and managing (add/remove columns, partitions) capabilities on the data and metadata that are connected through the depot.<br></li> </ul> <h4 class="wp-block-heading"><strong>Analytics</strong></h4> <p>Minerva is an interactive query engine based on Trino. It allows users to query data through a single platform across heterogeneous data sources without needing to know each database’s specific configuration, query language, or data format.</p> <p>The unified data architecture enables users to create and tune multiple Minerva clusters as per the query workload, which helps in using compute resources efficiently and scaling up or down flexibly with demand. Minerva provides the ability to create and tune the necessary amount of compute.</p> <p>Say we want to find the customer(s) who made the highest order quantity purchases for any given product and the discount given to them, for which the data has to be fetched from two different data sources – an Azure MS-SQL and a PostgreSQL database. First, we’ll create depots to access the two data sources and then a Minerva cluster. 
Once this is in place, we can seamlessly run queries across the databases, and tune our Minerva cluster according to the workload if necessary.</p> <p><strong>Creating Depots</strong><strong><br></strong>Write the following YAML files to connect two different data sources to the Unified Storage resource.</p> <p><strong>Microsoft Azure SQL Database Depot</strong></p> <p><img loading="lazy" decoding="async" width="624" height="440" src="https://lh3.googleusercontent.com/atEim5h3nSjM-aBcA_wTvgYoMgPfmwa3UYKkncH0iTC-BPVNpHFmVJu_dvItObNuzXsn5JhWEc_CVeMdOBiq7zpT2PKCIzk2n0jUpOxo-d59YKU9-QR8IY5I6P3xKR_kQH9laOSn097BM7yE6MXdCwk"></p> <p><strong>PostgreSQL Database</strong></p> <figure class="wp-block-image"><img decoding="async" src="https://lh3.googleusercontent.com/_pDirxq1ZKX42fEGJf05wQ0LsDw1fa1e4sBF_zYnov-f6BN3j2L-VNt6jMk8G1Vgs7YVnG0PqQYkB8DVN9HOBMZoDBzlMDZVC5OjYu3PTDoUwwu4zXNiM5cb2hZLNJpnjzXocdd3kuOBC_83pEFZebA" alt=""/></figure> <p>Use the apply command in the CLI to create the Depot resource</p> <figure class="wp-block-image"><img decoding="async" src="https://lh5.googleusercontent.com/6_zMRlYm3q18GvmPZmEsacKLTR0PoUC78-Yz-yicoZrfzI4R97vYaBMhgY1WTshLrZdPhbkt4t4rvTjn6Y0B0fMjkeqbbaeYKor3Zic8P3EuMCKzUQ-TBwXP3tYFpjJHpga9315L9OFI-5URjAMid3o" alt=""/></figure> <figure class="wp-block-image"><img decoding="async" src="https://lh6.googleusercontent.com/divnOATGXlcuO77H1ptgM7Aa4IjWO_uUjoZvPWBpGBcfFs1gnHs8wehSgAsWYiuy63wfRsNmWqSvJuUl6q4bTE3akG9iuCAvXzmQcrnE5P50j_n6T11pIkYcq8SYysg1EB9rDLYF4WHzIyNxHPF6uBE" alt=""/></figure> <p><strong>Creating the Query Cluster</strong><strong><br></strong>To enhance the performance of your analytics workloads, choose appropriately sized clusters, or create a new dedicated cluster for your requirements. In this example, a new cluster is created.</p> <figure class="wp-block-image"><img decoding="async" src="https://lh3.googleusercontent.com/ElcSCWC2Q3gPf4oqAerOpZeGnX3qERjLsbWqSYl1Gwou-pD_PeOc4THPE2p6Y9D-uq6JmHu5oeVjhgjT2NDeMdNIga0IilWSwnPUrfwTKGKj6WLW1Ngp7Os2dbZVHKoTZ7KeUShKT3i5hkZ_RUZiph4" alt=""/></figure> <p>Use the apply command to create the Cluster</p> <figure class="wp-block-image"><img decoding="async" src="https://lh5.googleusercontent.com/ot9Fr9lopCR-rt92Ou7lFjUXmqGvpBYiR7Hhbiqz5a923v6X2Bh3uChqqP_NFhNCaEQWLv0_lNaMJtu23860spYWLKb91umjWtKE-ZLOI8JotX299hXJRzjgBPHreMkB92B5Cp4F79df86eSxj9EMq4" alt=""/></figure> <p>Now, we can run the below query on two different data sources as if we are querying from a homogenous plane that hosts all tables from across these two sources. The results are shown here in Workbench, a native web-based analytics tool in DataOS.</p> <figure class="wp-block-image"><img decoding="async" src="https://lh5.googleusercontent.com/Ep1nyZdgZ9xuww5pYj4i6M3JgxOX82xQ_0ciyBeCa70Prhqghlf2DWSe4WzNcPd6tdQk8M-2f5_aWEhBcsScJb52WgAuZVbjhi-XKk_E1QWEUCCqn8vd4E2a57Qbe2VYpuywTcCjoOsIBvIUSBrWcCE" alt=""/></figure> <figure class="wp-block-image"><img decoding="async" src="https://lh6.googleusercontent.com/EtfOOOMs8eVmz2f4PjgWt5hYDcCvnIaIRV-z19KdLHqTcvrxKx4dcem8lcKoni8UV65nADMNo6HT8Cn3V78M5FAwbOkaPNwnTOMi9nN3iQEhqPvQXaNSBGbjQt0AFMEH577EyJhWgLylaA3ylmjWd2A" alt=""/></figure> <p>Minerva (query engine) can perform query pushdowns for entire or parts of queries. 
In other words, it can push processing closer to the data sources for improved query performance.</p> <h4 class="wp-block-heading"><strong>Unified Governance and Management</strong></h4> <p>‘Policy’ is an independent resource in a unified data architecture that defines rules of relationship between subjects (entities that wish to perform actions on a target), predicates (the action), and objects (a target). Its ABAC (attribute-based access control) implementation uses tags and conditions to define such rules, enabling coarse-grained role-based access to datasets and fine-grained access to individual items within the dataset.</p> <p>Tags, which can be defined for datasets and users, are used as attributes. As soon as a new source or asset is created, policies pertaining to its tags are automatically applied to them. Changing the scope of the policy is as simple as adding a new attribute or removing an existing one from the policy. Special attributes can be written into the policy and assigned to a specific user if the need for exceptions arises. The attribute-based definition makes it possible to create fewer policies to manage the span of data within an organisation, while also needing minimal changes during policy updates.</p> <p>For the purpose of enforcing policies across all data sources while accessing them, a Gateway Service sits like an abstraction above Minerva (query engine) clusters. Whenever a query is fired from any source (query tool, app, etc.), Gateway parses and formats the query before forwarding it to the Minerva cluster. After receiving the query, Minerva analyzes it and sends a decision request to Gateway for the governance to be applied. Gateway reverts with a decision based on user tags (received from the governance engine) and data-policy definition (which it stores in its own database – Gateway DB).</p> <p>Based on the decision, Minerva applies appropriate governance policy changes like filtering and/or masking (depending on the dataset) and sends the final result set to Gateway. The final output is then passed to the source where the query was initially requested.</p> <h4 class="wp-block-heading"><strong>Ease of Operability and Data Management</strong></h4> <p>A unified storage architecture that provides advanced capabilities needs equally advanced maintenance. Thus, built-in capabilities to declaratively manage and maintain data and metadata files are a necessity.</p> <p>In Icebase or the lakehouse construct, these operations can be performed using a “Workflow” – a declarative stack for defining and executing DAGs. The maintenance service in Workflow is offered through “<strong>actions</strong>“, which is defined in its own section within a YAML file. Below are examples of a few <strong>actions</strong> for seamless data management.</p> <p><strong>Rewrite Dataset<br></strong>Having too many data files leads to a large amount of metadata in manifest files, while small data files result in less efficient queries and higher file open costs. This issue can be resolved through <strong>compaction</strong>. Utilizing the rewrite_dataset <strong>action</strong>, the data files can be compacted in parallel within Icebase. This process combines small files into larger ones, reducing metadata overhead and file open costs during runtime. 
Below is the definition for the rewrite_dataset action.</p> <figure class="wp-block-image"><img decoding="async" src="https://lh3.googleusercontent.com/aHtntsBYCBmdK5t5rEoD1CdvL55vUgA42ya493w8bV_KbE7wqEeKUtk95UDOVgK8mad-4aBmqQ4VzbahlgHpNdqq-KwMKeHEkgPOfy50phyjJAfHTaHRcNSYHkjCL4GxaLYCwgv0meJHAJaMkKU5qj0" alt=""/></figure> <p><strong>Expire Snapshots</strong><strong><br></strong>Writing to an Iceberg table in an Icebase depot creates a new snapshot of the table, which can be used for time-travel queries. The table can also be rolled back to any valid snapshot. Snapshots accumulate until the expire_snapshots action expires them. It is recommended to regularly expire snapshots to delete unnecessary data files and keep the size of the table metadata small.</p> <figure class="wp-block-image"><img decoding="async" src="https://lh4.googleusercontent.com/7sYoHrXUvvu8r9f6aHIoBwH1sv1-_YOBx-YhcVZGLuptqQRSfp_osG4GdnOvkQeAFRan7XfNgbJk3R4hSuR-RX76Q4Qzee7NCx8zoP15pbav2OkH_LJWQxRdvirJOI1SHYhrQgBIdt-Md9CauIpUAsg" alt=""/></figure> <p><strong>Garbage Management</strong><strong><br></strong>While executing Workflows upon Icebase depots, job failures can leave files that are not referenced by table metadata, and in some cases, normal snapshot expiration may not be able to determine if a file is no longer needed and delete it. To clean up these ‘orphan’ files under a table location older than a specified timestamp, we can use the remove_orphans action.</p> <figure class="wp-block-image"><img decoding="async" src="https://lh3.googleusercontent.com/P5lVDCtTs--k5PSU9-ibwSRu5STdrfrg2-e4Oa-_P1ogiGP7Zg0ixJSiCwS0Wk8ChionJH91ZiVY46ZzzRrywT4paTTsh92Wa0U7sXz9lnh02Q52eeVz68h1DQ6AMGSNpCa7jAad5LR6hQv_5DQeaEc" alt=""/></figure> <p><strong>Delete from Dataset</strong><strong><br></strong>The delete_from_dataset action removes data from tables. The action accepts a filter provided in the deleteWhere property to match rows to delete. If the delete filter matches entire partitions of the table, Iceberg format within the Icebase depot will perform a metadata-only delete. If the filter matches individual rows of a table, then only the affected data files will be rewritten. The syntax of the delete_from_dataset action is provided below:</p> <figure class="wp-block-image"><img decoding="async" src="https://lh6.googleusercontent.com/uUis_aYb_6lw6WdsxjJzO7usgXnVk8i0RTloZ6flQR26FcIAFvBEO_g7g4iJBn4X2jtXuYZVR7vOtY1QlhczISmbCvz7LZMieO6WsNicfkk5e-KfScAazjSgFqtKN0PMrsyBE8iQmdPV6F3Sd1lnzmQ" alt=""/></figure> <h4 class="wp-block-heading"><strong>Time Travel</strong></h4> <p>Iceberg generates Snapshots every time a table is created or modified. To time travel, one needs to be able to list and operate over these snapshots. The lakehouse should provide simple tooling for the same. The Data Toolbox provides the functionality for metadata updating through the set_version action, using which one can update the metadata version to the latest or any specific version. This is done in two steps –</p> <ol class="nv-cv-d nv-cv-m wp-block-list"> <li>First, a YAML workflow is created, containing a Toolbox Job. A sample Toolbox Workflow is given below:</li> <li>The above workflow is then applied with the CLI</li> </ol> <p><strong>Using the CLI</strong><strong><br></strong>The CLI on top of a unified data architecture can be used to work directly with Snapshots and metadata. 
For example, a list of snapshots can be obtained with the following command –</p> <figure class="wp-block-image"><img decoding="async" src="https://lh3.googleusercontent.com/yhxVLSXJKjukBgwwpoz4ay5injOptVztYs2Oq5jHDsuS9_FlN1yfksa5bSG1SicXu-SxqoBDp269CE85XWHZMTQU5_53oDXZp0EULuiYYJ21sj3N_3Aj91UxToFoFbTfQ2nwoRjmDbb5aVsPLw7ooks" alt=""/></figure> <p>To travel back, we can set a snapshot at the desired timestamp.</p> <figure class="wp-block-image"><img decoding="async" src="https://lh5.googleusercontent.com/8yuv-lgNGduRtXcT9lftoJOtnToDhRTODMPahcbxAhmkDTaLMQgWS864Ux54ygCj-MIRS5_IDpFqdPZJvfP-KEtelIXASmupEQTEo7Z3K8K3BBjxm9VHtLTSdD6tN5bczsKk_pwlGhdC7wkvuXZS8P0" alt=""/></figure> <p>One can also list all metadata files, and metadata can also be set to the latest or some specific version using single-line commands.</p> <h4 class="wp-block-heading"><strong>Partitioning and Partition Evolution</strong></h4> <p>Partitioning is a way to make queries faster by grouping similar rows together when writing. We can use both the Workflow stack through a YAML file and the CLI to work with Partitions in Icebase. The following types of Partitions are supported: timestamp (year, month, day and hour) and identity (string and integer values).</p> <p>You can partition by timestamp, identity, or even create nested partitions. Below is an example of partitioning by identity.</p> <p>If the partition field type is identity type, the property ‘name’ is not needed.</p> <p>Workflow:</p> <figure class="wp-block-image"><img decoding="async" src="https://lh3.googleusercontent.com/s2i18BAn_syHjor9J7iDmo32adAWZXGspRCPDqPonERRpLLTxu_Occqe2_XsiASChkCZJLTUiLsJ7Ia257qprKo6_xhKPSUinaIGH2KnEMKYrNTrMOa9gwLWsqrwRvKFIOBMSNZqlutQQQfbdVWYjGA" alt=""/></figure> <p>CLI:</p> <figure class="wp-block-image"><img decoding="async" src="https://lh4.googleusercontent.com/C-wproS7o0umahXibmSHRDKR5QTVa14hOro6v4MWbymgGBoh6mOcvrbKmpzmo2lH1ztTfwg5Gha-vhq5O1NbYfLkLkOCSpJDxQqLuXUXJJ07izZEs2ZKcaSU36v6CEQFmHri9uoJAMpwA2Xs1OgPljk" alt=""/></figure> <p>Thanks to Iceberg, when a partition spec is evolved, the old data written with an earlier partition key remains unchanged, and its metadata remains unaffected. New data is written using the new partition key in a new layout. Metadata for each of the partition versions is kept separately. When you query, each partition layout’s respective metadata is used to identify the files it needs to access; this is called split planning.</p> <p>Say, we are working on a table “NY Taxi Data”. The NY Taxi data has been ingested and is partitioned by year. When the new data is appended, the table is updated to partition the data by day. Both partitioning layouts can co-exist in the same table. The query need not contain modifications for the different layouts.</p> <h3 class="wp-block-heading">Summary</h3> <p>In essence, let’s summarize with the understanding that siloed storage is a deal breaker when it comes to data management practices. In the article, we saw how data stored, accessed, and processed in siloes creates a huge cognitive overload for data developers as well as eats up massive resources. On the contrary, a unified storage paradigm bridges the gap between different capabilities essential to the data stack and helps users produce and consume data as if data were one big connective tissue. 
Users can access data through standardized access points, analyze it optimally without fretting about specific engines and drivers, and most essentially, rely on that data that arrives on their plate with embedded governance and quality essentials.</p> <p></p> <div class="saboxplugin-wrap" itemtype="http://schema.org/Person" itemscope itemprop="author"><div class="saboxplugin-tab"><div class="saboxplugin-gravatar"><img alt='Animesh Kumar and Travis Thompson' src='https://secure.gravatar.com/avatar/7e882ca7d7f2257a9cef794392f87a0a?s=100&d=mm&r=g' srcset='https://secure.gravatar.com/avatar/7e882ca7d7f2257a9cef794392f87a0a?s=200&d=mm&r=g 2x' class='avatar avatar-100 photo' height='100' width='100' itemprop="image"/></div><div class="saboxplugin-authorname"><a href="https://www.clouddatainsights.com/author/animesh-kumar-and-travis-thompson/" class="vcard author" rel="author"><span class="fn">Animesh Kumar and Travis Thompson</span></a></div><div class="saboxplugin-desc"><div itemprop="description"><p><strong><a href="https://www.linkedin.com/in/anismiles/">Animesh Kumar</a></strong> is the CTO & Co-Founder <a href="https://themoderndatacompany.com/">The Modern Data Company</a>, and a founding contributor to the<a href="http://datadeveloperplatform.org/"> Data Developer Platform</a> Infrastructure Specification that enables flexible implementation of disparate data design architectures such as data products, meshes, or fabrics. He has architected engineering solutions for several A-Players, including the likes of NFL, GAP, Verizon, Rediff, Reliance, SGWS, Gensler, TOI, and more. Animesh shares his thoughts on innovating a holistic data experience on<a href="https://moderndata101.substack.com/"> ModernData101</a>. <strong><a href="https://www.linkedin.com/in/travis-w-thompson/">Travis Thompson</a> </strong>is the Chief Architect of <a href="http://data-operating-system.com/">DataOS</a> and a key contributor to the <a href="http://datadeveloperplatform.org/">Data Developer Platform</a> Infrastructure Specification that enables flexible implementation of disparate data design architectures such as data products, meshes, or fabrics. Over the course of 30 years in all things data and engineering, he has designed state-of-the-art architectures and solutions for top organizations, the likes of GAP, Iterative, MuleSoft, HP, and more.</p> </div></div><div class="clearfix"></div></div></div>]]></content:encoded> <wfw:commentRss>https://www.clouddatainsights.com/the-long-due-renewal-of-data-storage-for-accelerated-data-applications/feed/</wfw:commentRss> <slash:comments>0</slash:comments> <post-id xmlns="com-wordpress:feed-additions:1">3474</post-id> </item> <item> <title>Cloud Basics: NAS Versus the Cloud</title> <link>https://www.clouddatainsights.com/cloud-basics-nas-versus-the-cloud/</link> <comments>https://www.clouddatainsights.com/cloud-basics-nas-versus-the-cloud/#respond</comments> <dc:creator><![CDATA[Elizabeth Wallace]]></dc:creator> <pubDate>Wed, 14 Dec 2022 23:28:28 +0000</pubDate> <category><![CDATA[Cloud Strategy]]></category> <category><![CDATA[cloud NAS]]></category> <category><![CDATA[data storage]]></category> <category><![CDATA[NAS]]></category> <guid isPermaLink="false">https://www.clouddatainsights.com/?p=2014</guid> <description><![CDATA[Companies are looking at cloud based storage choices. 
Here's what everyone needs to know about onsite NAS versus cloud based storage.]]></description> <content:encoded><![CDATA[ <div class="wp-block-uagb-image uagb-block-50f58962 wp-block-uagb-image--layout-default wp-block-uagb-image--effect-static wp-block-uagb-image--align-none"><figure class="wp-block-uagb-image__figure"><img decoding="async" srcset="https://www.clouddatainsights.com/wp-content/uploads/2022/11/Depositphotos_8145342_S.jpg " src="https://www.clouddatainsights.com/wp-content/uploads/2022/11/Depositphotos_8145342_S.jpg" alt="" class="uag-image-2015" title="" loading="lazy"/><figcaption class="uagb-image-caption"><em>Companies are looking at cloud based storage choices. Here’s what everyone needs to know about onsite NAS versus cloud based storage.</em></figcaption></figure></div> <p>Even the largest enterprises are moving operations to a remote-friendly, hybrid workforce model—at least in part. Thanks to global disruption becoming the norm, companies have accelerated their plans to build composable infrastructure that can work any time from anywhere. These systems rely on stakeholders having access to the tools and data they need to work even if they can’t get into the office. So when it comes to storage for that data, should you choose NAS or the cloud?</p> <h3 class="wp-block-heading">What is network attached storage (NAS)?</h3> <p>NAS is a file storage system that doesn’t need to be connected directly to a computer the way an external hard drive does. Authorized network users can retrieve and store data and can access it from anywhere that connects to the network. </p> <p>NAS centrally manages data, making it faster and more efficient than traditional on-premises storage. A NAS setup can involve a single device or a cluster, giving enterprises more freedom to house the data they need without wasting space.</p> <p>It shines with unstructured data, providing companies with the space to manage complex data while maintaining the speed and efficiency of data retrieval. </p> <h3 class="wp-block-heading">Moving NAS to the cloud</h3> <p>Cloud-based NAS provides the same basic features as a private NAS, but shifts the responsibility of hardware and maintenance to the service provider. The enterprise relies on the service provider’s NAS devices instead of its own, reducing the initial cost of equipment. </p> <p>This off-site file storage also expands the usage area for those working remotely or in distributed enterprises. The service provider’s network can extend the company’s reach without needing to invest heavily in a large private network.</p> <h3 class="wp-block-heading">Everyone is moving to the cloud, but storage may not need to</h3> <p><a href="https://www.gartner.com/en/newsroom/press-releases/2022-02-09-gartner-says-more-than-half-of-enterprise-it-spending">Gartner predicts</a> that at least two-thirds of enterprise IT spending will be directed towards the cloud by 2025. However, shifting to Cloud NAS requires some consideration for what kind of data a company stores and how teams will use that data.</p> <h3 class="wp-block-heading">Cost is a factor for both choices</h3> <p>An onsite NAS system will cost companies more upfront, sometimes requiring heavy technology investments to build its infrastructure. Cloud-based NAS only requires that the company migrate its data; devices and hardware are the responsibility of the service provider.</p> <p>Continuing costs include maintenance and overhead. 
In addition, completely restructuring existing NAS systems could include additional costs because these systems aren’t as agile as what’s available in the cloud.</p> <p>Once everything is set up, costs begin to shift. Cloud uses different metrics to calculate costs. Storage itself may cost a single fee per month, but the act of accessing that data may drive up monthly costs later. However, the company won’t spend money on maintenance or overhead associated with updating servers.</p> <h3 class="wp-block-heading">Security and reliability are another factor</h3> <p>One myth about cloud operations is that on-premises systems are automatically more secure. That isn’t always the case. Cloud service providers make it their priority to stay up to date with the latest security threats and can take a more proactive, nimble approach to security than some in-house teams can.</p> <p>However, companies under heavy regulatory pressure may need NAS systems on-premises to ensure that they remain in compliance. Certain cloud services may not meet some aspects of data management protocols required of companies with highly sensitive data or those who contract with government departments.</p> <p>In terms of reliability, cloud NAS is certainly easier to set up and to manage on an ongoing basis. However, onsite NAS can provide decision makers with an ultra-reliable connection built on an internal network that isn’t reliant on someone else’s equipment. </p> <h3 class="wp-block-heading">Choosing cloud versus onsite NAS</h3> <p>The decision is unique to each company. Here are some things to consider.</p> <p><strong>NAS Pros:</strong></p> <ul class="nv-cv-d nv-cv-m wp-block-list"> <li>NAS is often faster and more efficient than cloud thanks to a local network.</li> <li>Storage involves a single upfront cost rather than relying on continual monthly fees.</li> <li>Companies with heavy regulatory responsibilities can better comply with those regulations.</li> <li>Security and configurations are entirely within the company’s control.</li> </ul> <p><strong>NAS Cons: </strong></p> <ul class="nv-cv-d nv-cv-m wp-block-list"> <li>The NAS system often pushes available bandwidth to its limits, which can slow other actions throughout the network.</li> <li>Scalability isn’t as straightforward because companies must invest in the hardware upfront.</li> <li>It’s specifically designed for file storage. Companies will need other tools to build an infrastructure.</li> </ul> <p><strong>Cloud pros:</strong></p> <ul class="nv-cv-d nv-cv-m wp-block-list"> <li>Cloud scales up or down with the company’s changing needs, requires little effort on the company’s part, and reduces upfront costs.</li> <li>Storing offsite in cloud servers provides a failsafe should the company’s network go down. It often includes redundancy that further protects data assets.</li> <li>Cloud provides flexibility for access without the hassle of downloads or installations.</li> <li>Service providers offer their expertise in cybersecurity, deployment, maintenance, and upgrades.</li> </ul> <p><strong>Cloud cons:</strong></p> <ul class="nv-cv-d nv-cv-m wp-block-list"> <li>While companies may save money in installation and upfront investment, usage costs may vary depending on how the company processes and retrieves data.</li> <li>Choosing a service provider can lead to vendor lock-in.</li> <li>If there’s an outage, companies can’t take any steps to retrieve data or fix the problem themselves.</li> </ul>
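<p>One way to ground the cost point above is a toy comparison of an upfront NAS purchase against pay-as-you-go cloud storage. Every figure and the usage profile in the sketch below is a placeholder assumption for illustration only, not a quote from any vendor.</p> <pre class="wp-block-code"><code># Toy cost comparison: upfront NAS vs. pay-as-you-go cloud storage.
# ALL figures are made-up placeholders -- substitute real quotes before deciding.
YEARS = 5
DATA_TB = 50

nas_upfront = 30_000              # hardware + installation (assumed)
nas_yearly_maintenance = 3_000    # power, support, spares (assumed)

cloud_per_tb_month = 20           # storage fee per TB per month (assumed)
cloud_egress_per_tb = 80          # retrieval/egress fee per TB (assumed)
egress_tb_per_month = 12          # how much data gets pulled back out (assumed)

nas_total = nas_upfront + nas_yearly_maintenance * YEARS
cloud_total = (DATA_TB * cloud_per_tb_month
               + egress_tb_per_month * cloud_egress_per_tb) * 12 * YEARS

print(f"NAS over {YEARS} years:   ${nas_total:,}")
print(f"Cloud over {YEARS} years: ${cloud_total:,}")
# With these placeholder numbers, recurring access charges roughly rival the
# storage fee itself -- which is why usage patterns matter as much as capacity.
</code></pre>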
<h3 class="wp-block-heading">Cloud NAS versus onsite NAS depends on a company’s core priorities</h3> <p>Choosing traditional NAS systems or the newer cloud options is a matter of priorities. The decision will depend a lot on how a company plans to execute its data storage within the larger operational framework. For some companies, the option to scale in the future could far outweigh potential cost increases. Others may prefer to have total control over their data storage and access.</p> <p>Organizations may want to match their storage choice to the location of their connected tools—for example, Cloud NAS partnered with an app available only in the cloud—but this isn’t a given. It will be up to each company to fit NAS storage into its overall data ecosystem in the way that makes the most sense for operations.</p> <div class="saboxplugin-wrap" itemtype="http://schema.org/Person" itemscope itemprop="author"><div class="saboxplugin-tab"><div class="saboxplugin-gravatar"><img loading="lazy" decoding="async" src="https://www.clouddatainsights.com/wp-content/uploads/2022/05/Elizabeth-Wallace-RTInsights-141x150-1.jpg" width="100" height="100" alt="" itemprop="image"></div><div class="saboxplugin-authorname"><a href="https://www.clouddatainsights.com/author/elizabeth-wallace/" class="vcard author" rel="author"><span class="fn">Elizabeth Wallace</span></a></div><div class="saboxplugin-desc"><div itemprop="description"><p>Elizabeth Wallace is a Nashville-based freelance writer with a soft spot for data science and AI and a background in linguistics. She spent 13 years teaching language in higher ed and now helps startups and other organizations explain – clearly – what it is they do.</p> </div></div><div class="clearfix"></div></div></div>]]></content:encoded> <wfw:commentRss>https://www.clouddatainsights.com/cloud-basics-nas-versus-the-cloud/feed/</wfw:commentRss> <slash:comments>0</slash:comments> <post-id xmlns="com-wordpress:feed-additions:1">2014</post-id> </item> </channel> </rss>