The innovations made possible by Generative AI are incredibly promising, which is why the market now stands at nearly $9 trillion. Since the technology hit the mainstream, organizations have been clamoring to build and implement AI tools that drive new levels of efficiency and innovation. To do so, they require massive amounts of data. They also require compute power made possible only by graphics processing units (GPUs), processors initially designed for graphics (hence the name) but since adapted to support the floating-point calculations needed to train and host models.
Public cloud providers saw this need for GPUs, and their limited supply, and bought all the GPUs they could from the likes of Nvidia, AMD, and Intel. As a result, most organizations turned to these public cloud providers to train and host their AI models and the applications that depend on them. This turned the investment needed to create an AI data infrastructure into an operational expenditure, rather than the capital expenditure it would be if that same infrastructure were built on-premise.
Operational expenditures, in the context of an AI data infrastructure, are optimal when you want to get started quickly and do not need much compute or storage. However, as your business grows you will need more resources, and basic economics tells us that at some point operational expenditures exceed the cost of purchasing the resources being used. Let’s call this resource limit the “operational breaking point.”
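To make the “operational breaking point” concrete, here is a back-of-the-envelope sketch in Python. The figures and the function are purely illustrative assumptions, not vendor pricing; the point is simply that a recurring rental bill eventually crosses the cost of a one-time purchase plus its running costs.

```python
# Illustrative break-even sketch: cumulative public-cloud spend vs. a one-time
# hardware purchase (capex) plus its ongoing operating costs. All numbers are
# hypothetical assumptions, not real pricing.

def months_to_break_even(cloud_monthly: float, capex: float, private_monthly: float) -> float:
    """Month at which cumulative cloud rental exceeds capex plus private opex."""
    if cloud_monthly <= private_monthly:
        return float("inf")  # renting never crosses owning under these inputs
    return capex / (cloud_monthly - private_monthly)

# Example: $60k/month of rented GPU capacity vs. a $900k purchase that costs
# $15k/month to power, house, and staff -> break-even at 20 months.
print(months_to_break_even(60_000, 900_000, 15_000))
```

Past that point, every additional month of rental is money that could have gone toward owned infrastructure.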
Today, GPU availability is improving. Control is becoming paramount as data privacy and AI safety grow in importance. And many organizations have reached their operational breaking point. The result: a resurgence of the private cloud.
The Cost of Public vs Private Cloud AI
The fact is that the success of the public cloud is built on availability, convenience, and elasticity. It was not and is not built on economics. The cost of the public cloud is now far more than what companies bargained for when they invested in it years ago. Back in 2021, S&P research found that once you hit a certain level of scale, the private cloud becomes cheaper than the public cloud thanks to a combination of labor efficiency and utilization. At the time, it did not seem likely that many large-scale migrations would occur. That was until AI hit the industry. Now companies are rapidly scaling workloads to meet the demand for AI, and the private cloud is becoming the new operating model of choice.
The public cloud has established the cloud as an operating model rather than a destination. The cloud operating model, or cloud-native approach, means you build for portability and prevent your services from getting locked into one cloud vendor. This can be hard to do: every public cloud offers hundreds of custom services designed to lock customers to its platform. With the cloud operating model, forward-thinking organizations realize that what starts in the public cloud can be moved to the private cloud, a much more cost-effective solution.
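As a rough illustration of building for portability, the sketch below assumes an S3-compatible object store and treats the endpoint and credentials as configuration. The environment variable names and bucket are placeholders; the idea is only that the same application code can point at a public-cloud bucket today and a private deployment tomorrow.

```python
import os
import boto3

# Portability sketch: the application code stays the same whether the object
# store is a public-cloud service or a private, S3-compatible deployment.
# Only configuration changes. Variable and bucket names are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url=os.environ.get("OBJECT_STORE_ENDPOINT"),   # None -> public cloud default
    aws_access_key_id=os.environ["OBJECT_STORE_ACCESS_KEY"],
    aws_secret_access_key=os.environ["OBJECT_STORE_SECRET_KEY"],
)

# Uploading a training artifact looks identical on any S3-compatible backend.
s3.upload_file("model.safetensors", "training-artifacts", "models/model.safetensors")
```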
The Cloud Operating Model for AI is Private
With GenAI adoption surging 17% in 2024, companies are now looking for new solutions to the data management challenges that have accompanied that sharp increase. More and more CEOs and boards are coming to the same conclusion: the importance of AI to their organizations is existential, and therefore it is essential that they address these challenges. To do so they need control over their data, over the compute needed for model training, and over the models themselves. Large enterprises in regulated industries especially want to keep their data close to the chest. That very data, after all, is their secret sauce.
With data privacy and security concerns mounting around AI, more regulations are being designed to hold organizations accountable for the data they collect. This point of control is quickly becoming imperative.
By moving to a co-location facility or on-premise deployment, enterprises can own the full cloud stack and better control costs and data. The public cloud may be a good place to start, but it is not a place to stay long term.
The best of both worlds is a hybrid model. This model allows organizations to prepare data for processing within a private cloud, “burst” to the public cloud to run processing on rented GPUs, and then bring the data back to the private cloud once processing is complete. While these hybrid models can be effective, it’s becoming clear that the private cloud, whether in a colocation facility or an on-prem datacenter, is the path forward for AI that the CTO, CIO, and CFO can all agree on.
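A hybrid burst might look something like the following sketch, which assumes S3-compatible object stores on both sides and a placeholder job-submission step; every endpoint, bucket, and prefix here is invented for illustration.

```python
import boto3

# Hybrid "burst" sketch: prepare data privately, stage it to a rented
# public-cloud environment for GPU processing, then bring the results home.
# Endpoints, buckets, prefixes, and the job-submission step are placeholders.

private = boto3.client("s3", endpoint_url="https://objects.internal.example")
public = boto3.client("s3")  # public-cloud default endpoint

def copy_prefix(src, dst, src_bucket, dst_bucket, prefix):
    """Copy every object under a prefix from one store to the other (simplified, in-memory)."""
    paginator = src.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=src_bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            body = src.get_object(Bucket=src_bucket, Key=obj["Key"])["Body"].read()
            dst.put_object(Bucket=dst_bucket, Key=obj["Key"], Body=body)

# 1. Burst out: stage the prepared training set on the public cloud.
copy_prefix(private, public, "prepared-data", "burst-staging", "datasets/run-42/")

# 2. Process on rented GPUs (placeholder for whatever scheduler is in use).
# submit_training_job(input_prefix="datasets/run-42/", output_prefix="models/run-42/")

# 3. Return: pull the trained model back into the private cloud.
copy_prefix(public, private, "burst-staging", "prepared-data", "models/run-42/")
```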
Keith Pijanowski is MinIO’s subject matter expert for all things AI/ML, researching and writing about storage requirements for AI and ML workloads. Keith has extensive experience in the software space, most recently as an enterprise architect on BNY Mellon’s distribution analytics team, building data pipelines and analytics solutions. Prior to BNY Mellon, Keith spent more than a decade at Microsoft, where he served in a number of different developer evangelism and business roles. He was one of the first members of Microsoft’s evangelism team when the .NET framework was first released.