If you ask most IT pros how to modernize data storage, there’s a good chance they’ll mention the cloud. The cloud has been a key component of IT strategies for years, and it’s poised to grow only more important going forward. Gartner predicts that 75 percent of workloads will run in the cloud by 2028, for example, while IDC projects a compound annual growth rate in the cloud market of nearly 20 percent over the next three years.
From the perspective of unstructured data storage, migrating to the cloud has the potential to deliver major benefits. Cloud storage offers virtually unlimited scale, for example. It can also increase data availability and allow organizations to take advantage of a wide range of cloud-native services, including analytics and AI.
Yet, despite the surging popularity of the cloud, simply taking all your on-prem file and object data and migrating it to standard cloud storage is not ideal. With so many storage tiers now available, it’s vital to understand the differences between unstructured data migration and data tiering and to consider a mixed approach that is driven by analytics.
Cloud data migration vs. cloud data tiering
First, let’s review the differences between cloud data migration and cloud data tiering of files. In a later article, we’ll tackle migrating on-prem objects to the cloud.
Cloud data migration means taking data that is currently stored on-prem and moving it to a cloud storage service (like Amazon EFS or Azure Files) that makes the data instantly accessible from the cloud. Cloud data migrations may occur when it’s time to refresh storage and as part of an overall move-to-the-cloud strategy. Migrating data to the cloud has at least two purposes. One is to leverage cloud file systems and run applications in the cloud. This delivers the same basic levels of data performance and availability as on-prem but with the added benefit of more scalability than on-prem storage typically offers. In addition, businesses that use cloud storage pay only for what they consume, so if they scale back later, they’re not stuck with the storage infrastructure they purchased but no longer need. The other purpose is to use the cloud as an offline archive, using low-cost object storage like Amazon’s S3 Glacier and Glacier Instant Retrieval.
In contrast, cloud data tiering is the process of continuously offloading older, cold data that has not been accessed in months to cloud storage services. Tiering creates an “online archive” in the cloud in which files still appear to be on-prem and can be accessed by simply double-clicking on them. Archival storage like Amazon’s Glacier Instant Retrieval costs much less than standard S3 storage. Because tiering continuously moves older data to the cloud, it reduces both the amount of expensive, high-performance storage you need on-prem and the amount of backup storage required, which can cut storage costs by as much as 70 percent.
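To make the idea of cold data concrete, here is a minimal Python sketch of how candidates for tiering might be identified. It is an illustration under stated assumptions, not how any particular product works: the function name `find_cold_files` and the 180-day default threshold are hypothetical, and the later of the access and modification times is used because many file systems do not update access times reliably.

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def find_cold_files(root, min_idle_days=180, now=None):
    """Return (path, size_bytes) for files untouched for min_idle_days.

    The threshold is an assumed example value, not a recommendation.
    """
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * SECONDS_PER_DAY
    cold = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                # Skip files that vanish or cannot be read during the scan.
                continue
            # Take the later of atime and mtime, since access times are
            # often disabled or coarse on production file systems.
            last_used = max(info.st_atime, info.st_mtime)
            if last_used < cutoff:
                cold.append((path, info.st_size))
    return cold
```

Summing the sizes returned for a share gives a rough estimate of how much capacity tiering could free up on-prem.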
Next, let’s review how to get the most from whichever strategy you adopt.
Charting a cloud data migration strategy
Here are the key considerations:
- Usage: Migration is appropriate when moving on-prem file servers to cloud file servers and when archiving files offline to cloud object storage. The latter requires a file-to-object migration solution, which we’ll cover in our next article.
- Pre-assessment of data: It’s important to use an analytics-first approach to identify what should be moved to the cloud and what should be deleted or archived. This will reduce cloud costs and migration times and ensure that you are choosing the right strategy for the right data sets at the right time.
- Pre-assessment of environment and network: Too often, migration performance is extremely poor due to bottlenecks in the on-premises infrastructure and associated network settings. Some migration solutions provide a tool that runs standard tests to identify bottlenecks within your environment. This can fundamentally improve the success of your migration project.
- Performance: Migrating large volumes of data, especially lots of small files, to the cloud can be painfully slow over high-latency WANs, particularly if the migration depends on chatty network protocols like SMB to transfer data. Look for solutions designed to work over WANs and improve file transfer times. Network bandwidth limitations and outages can also impede migration performance, and some file attributes or metadata can be lost when data moves from on-prem to the cloud. Look for solutions that retry automatically after network issues and that perform a checksum test to ensure that all the bits of each file have been properly transferred.
- Security: If you migrate data over the network, you’ll want to ensure the data is encrypted in transit to prevent eavesdropping. In addition, it’s important to configure proper access controls once your data is in the cloud to prevent data leakage or exfiltration.
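The retry and checksum behavior described under Performance can be sketched in a few lines of Python. This is a simplified illustration, not a real migration tool: the function names are hypothetical, a local copy stands in for a transfer over the network, and the `copy` parameter exists only so a different transfer function can be swapped in.

```python
import hashlib
import shutil
import time
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_verification(src, dst, max_attempts=3, backoff_seconds=1.0,
                           copy=shutil.copy2):
    """Copy src to dst, retrying on errors, and verify the result by checksum.

    Returns the SHA-256 hex digest of the verified copy.
    """
    src, dst = Path(src), Path(dst)
    expected = sha256_of(src)
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            dst.parent.mkdir(parents=True, exist_ok=True)
            copy(src, dst)
            if sha256_of(dst) == expected:
                return expected
            last_error = OSError("checksum mismatch after copy")
        except OSError as exc:
            last_error = exc
        if attempt < max_attempts:
            # Exponential backoff before the next attempt.
            time.sleep(backoff_seconds * 2 ** (attempt - 1))
    raise RuntimeError(
        f"failed to copy {src} after {max_attempts} attempts"
    ) from last_error
```

The default `shutil.copy2` preserves timestamps and permission bits where it can, but not ownership or ACLs, which is one example of the metadata loss mentioned above.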
Cloud tiering considerations
Below are key aspects of tiering that can make or break the cost savings you achieve:
- Block vs. file-level tiering: Conventionally, storage vendors provide block-level tiering. This is ideal for system data such as snapshots but has drawbacks when used for regular user and application data. Because files are stored as proprietary blocks, they cannot be accessed natively from the cloud; special software sold by the vendor is required. Also, when you need to replace the on-prem file system, all the data tied to it will have to be rehydrated. You’ll need to purchase enough capacity on the existing file server to hold the rehydrated data, migrate that data to the new file server, and then tier the cold data back to the cloud. This can be daunting if you have tiered petabytes of data, and it will be expensive due to egress fees and cloud API costs. File-level tiering, in contrast, tiers the entire file, which can be accessed natively from the cloud for use in AI and other cloud applications. File-level tiering, available in some unstructured data management solutions, also makes the tiered files accessible from the new file server without having to rehydrate all the tiered data. This is a huge advantage that should not be overlooked.
- Transparency: Tiering should be transparent: users access their data by simply double-clicking on what appears to be the file on the on-prem file server, and the request is redirected to the location to which the file was tiered. Transparency allows IT administrators to tier cold data automatically and continuously without disrupting users or making them hunt for data that has been moved. The ability to still search for and access files from the original file server is why transparent tiering is said to create an “online archive.”
- Bulk recall: When needed, tiering solutions should allow you to recall data en masse. If you need to revise a project whose data has been tiered, you should be able to recall all of its files ahead of time for the best performance, rather than restoring files one by one as they are requested.
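As a rough illustration of transparent file-level tiering and bulk recall, the Python sketch below moves a whole file to an archive location and leaves a symbolic link in its place, so the original path keeps working. It rests on several assumptions: a local directory stands in for cloud storage, symbolic links stand in for whatever link mechanism a real solution uses, and the function names `tier_file` and `recall_all` are hypothetical.

```python
import os
import shutil
from pathlib import Path


def tier_file(path, archive_root, source_root):
    """Move one whole file to the archive and leave a symlink behind.

    The archived copy keeps its relative path and native format, so it can
    be read directly from the archive as well as through the link.
    """
    path = Path(path)
    target = Path(archive_root).resolve() / path.relative_to(source_root)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(path), str(target))
    path.symlink_to(target)
    return target


def recall_all(source_root, archive_root):
    """Bulk recall: replace every tiering link under source_root with its file."""
    archive = Path(archive_root).resolve()
    recalled = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            link = Path(dirpath) / name
            if not link.is_symlink():
                continue
            target = link.resolve()
            # Only touch links that point into the archive.
            if archive not in target.parents:
                continue
            link.unlink()
            shutil.move(str(target), str(link))
            recalled.append(link)
    return recalled
```

Calling `recall_all` on a project directory before work begins mirrors the bulk recall described above: every tiered file is restored up front instead of on first access.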
Conclusion
Cloud data migration is great if your goal is to reduce on-premises storage capacity, adopt new storage technologies, and increase investments in the more flexible, on-demand nature of cloud storage. Data tiering is better in cases where you want to lower storage costs and capacity for data that you access infrequently but may still need to recall on-premises in the future.
Kumar Goswami is the CEO of Komprise.