Scaling Up: How Multi-Tech Data Platforms Enhance Data Management

Most organizations today cannot lean on just one or two data management solutions. What’s needed is a multi-tech data platform that ensures performance, security, and more.

Sponsored by Instaclustr

Modern data volumes and velocities have outpaced the capabilities of traditional relational database and data management solutions. Making matters even more challenging, many organizations find they need multiple modern data solutions to support events, streaming data, and more.

RTInsights recently sat down with Andrew Mills, a Senior Solutions Architect at NetApp Instaclustr, to talk about these issues, what technologies are needed, and how multi-tech platforms can help.

Here is a lightly edited summary of our conversation.

RTInsights: What data management challenges are businesses dealing with today?

Mills: I like to break that down into a few groups. One of the most difficult ones is data volume and velocity. Data is flowing into organizations at a very significant clip, and they must make decisions around how to ingest, process, and store that data. Data comes from many different sources, which often presents data quality and consistency issues as well.

Then, you have governance and security. Who can see the data? How are we going to keep it secure? What is the lifecycle of your data? Do you have stuff that is automatically deleted after a day, a week, a month, a year, or seven years? What are the cost implications for maintaining data for those durations?

Those are the high points for the challenges. Addressing those challenges requires a combination of advanced technologies, personnel who are skilled with those technologies, and robust processes and policies to help navigate the waters and point people in the right direction for how to deal with it.
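
The lifecycle questions raised above (a day, a week, a month, a year, or seven years) can be made concrete with a small retention table. The following is a minimal sketch, not something described in the interview: the category names, the durations, and the `Record`, `is_expired`, and `sweep` helpers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical retention classes; names and durations are illustrative only.
RETENTION = {
    "debug_logs": timedelta(days=1),
    "clickstream": timedelta(days=30),
    "financial_records": timedelta(days=7 * 365),
}


@dataclass(frozen=True)
class Record:
    category: str
    created_at: datetime


def is_expired(record: Record, now: datetime) -> bool:
    """Return True when the record has outlived its category's retention window."""
    return now - record.created_at > RETENTION[record.category]


def sweep(records: list, now: datetime) -> list:
    """Keep only the records that are still inside their retention window."""
    return [r for r in records if not is_expired(r, now)]
```

In practice, many data stores can enforce this kind of policy themselves through time-to-live or retention settings, so a sweep like this mainly helps when reasoning about which data falls into which tier and what each tier costs to keep.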

RTInsights: Why are data management approaches that previously worked failing now in light of these challenges?

Mills: When I started my career 15+ years ago, most organizations could store their data in a relational database and build logical relationships. You could put indexes and views in place to fetch the data quickly and rely on the access controls built into the database for security. You could assign permissions around certain databases, tables and views. Some databases offered the ability to manage column-level permissions as well, which was nice. Even though it seemed complex at the time, that was pretty simple to deal with considering the landscape today.

Now, we’re dealing with structured, semi-structured, and unstructured data at high volume and velocity, and those relational systems are not able to keep up. They’re not distributed, and they don’t easily scale. High availability and disaster recovery, with reasonable RTO/RPO, become a concern as well. There are definitely still use cases for relational databases, but they’re much more specific.

RTInsights: In general, what’s needed?

Mills: Fundamentally, many organizations should treat data more like a product. Once the mindset changes on data, you start adding process and structure around the data. Instead of just DevOps and Developers, you have a Data Engineering role. This role becomes the expert around your data and can work with teams across the organization to understand their needs. Importantly, they can address things like schema evolution, security controls, governance, and access. They can be responsible for understanding costs around storage and retention and their value to the business. A role focused on this new data product allows you to look at it a little differently, which is critical.

Then, you start getting into the actual technologies and how you have to adjust to the types of data and the types of requests for the needed data.

If you think about it from a traditional product perspective, if I’m building an application that runs on a Windows or Mac computer, I’ve got lots of layers, right? I’ve got the user interface and how that’s coded for where it’s running. Then I’ve got APIs on the backend, and those might be coded in a different language. Then, you’ve got databases that support the API and UI. So, you have all of these various layers of technologies that would support a traditional app. When you begin to look at data as a product, you’re going to have that same type of layered effect with various technologies that solve distinct problems.

A common enterprise architecture that has become popular in recent years is Event-Driven Architecture (EDA), where data is a stream. There’s this ever-present inflow of data into your organization, and you adopt a technology that allows you to save it in a durable way and then distribute it across different platforms. You’re still going to need relational databases, NoSQL databases, data lakes/oceans, OLAP databases, and/or search technologies like OpenSearch. Obviously, the exact needs of each organization will be different, but this is the landscape today.

As an expert in the data, you understand what data is being written and how it’s being accessed. That’s important because the choice of where you store the data and how you choose to retrieve it needs to be considered carefully. Most organizations today cannot lean on just one or two solutions. From my experience, one of the biggest pitfalls is when a company tries to use a familiar technology to solve a problem that isn’t the core competency of that technology. You end up with poor performance and having to redesign a few years after you’ve gone into production, which can be costly.
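
The event-driven pattern described above, where incoming data is saved once and then distributed to several downstream platforms, can be sketched in a few lines. This is a toy, in-memory illustration under stated assumptions, not the API of Kafka or any other product: the `EventLog` class, the consumer names, and the two "stores" are hypothetical stand-ins.

```python
from collections import defaultdict
from typing import Callable


class EventLog:
    """In-memory stand-in for a durable, append-only event stream."""

    def __init__(self) -> None:
        self._events: list[dict] = []
        # Next unread position for each named consumer.
        self._offsets: dict[str, int] = defaultdict(int)

    def append(self, event: dict) -> int:
        """Store the event and return its offset in the log."""
        self._events.append(event)
        return len(self._events) - 1

    def poll(self, consumer: str, handler: Callable[[dict], None]) -> int:
        """Deliver every event this consumer has not yet seen; return the count."""
        start = self._offsets[consumer]
        pending = self._events[start:]
        for event in pending:
            handler(event)
        self._offsets[consumer] = start + len(pending)
        return len(pending)


log = EventLog()
search_index: list[dict] = []                         # stands in for a search engine
order_totals: dict[str, float] = defaultdict(float)   # stands in for an analytics store


def index_event(event: dict) -> None:
    search_index.append(event)


def add_to_total(event: dict) -> None:
    order_totals[event["customer"]] += event["amount"]


log.append({"customer": "acme", "amount": 120.0})
log.append({"customer": "acme", "amount": 30.0})
log.poll("search", index_event)        # search consumer catches up on two events
log.append({"customer": "globex", "amount": 75.5})
log.poll("search", index_event)        # picks up only the new event
log.poll("analytics", add_to_total)    # analytics reads all three from the start
```

The point of the design is that each consumer tracks its own position, so a new downstream system can be added later and read the same events from the beginning without the producers changing anything.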

See also: Future-proofing Your Data Strategy with a Multi-tech Platform

RTInsights: What does Instaclustr offer, and how does its multi-tech data platform help?

Mills: Instaclustr is NetApp Instaclustr, and I say that with intention. NetApp has been a leader in the storage space for many decades. When you talk about NetApp, it’s not just storage; it’s data protection, security, visibility, and performance optimization on-premises, in private clouds, and now in the public cloud. NetApp is the only storage company to have first-party storage in all the major clouds – AWS, Google Cloud, and Azure. You can go into AWS and provision FSxN, Cloud Volumes ONTAP in Google Cloud, or Azure NetApp Files in Azure. Each of those storage solutions comes with industry-leading capabilities, and that is just the NetApp piece.

When we discuss Instaclustr specifically, the team is all-in on open source, and we have taken a curated approach to the technologies we offer. We want to help you solve each problem a data platform could have, so we don’t offer solutions that solve the same problem. We can talk about the nuance another time, but we support:

Apache Kafka, which is great for events and real-time streaming.

Apache Cassandra, which is a distributed NoSQL DB that excels at high-volume writes and storing lots of data.

PostgreSQL, which is one of the most popular relational databases currently on the market.

OpenSearch, which was forked from Elasticsearch back in 2021, is a great resource for a number of search functions on very large amounts of data.

Valkey, which was forked from Redis in 2024, is a high-performance in-memory cache.

ClickHouse, which is a column-oriented database management system for analytical workloads.

Cadence, which is a workflow orchestration platform to help manage complex business processes. This tech is outside of our traditional scope. Under the covers, Cadence uses Kafka and Cassandra, and one of our biggest customers was using it in a big way. They asked us if we’d add it to our stack and host it for them, and we obliged.

Lastly, we have Spark, which is another one that’s outside of the traditional database world. We have a product called Ocean for Apache Spark. We leverage the open source Spark repository, but we have our own controller that allows it to run on a technology called Ocean, which is an advanced Kubernetes autoscaler.

The suite of products in our platform enables you to have one vendor who has depth and breadth across a stack of products that can, for the most part, get you what you need for your enterprise.

We deliver help for these services in three ways:

Managed Platform, where we can host the infrastructure for you (SaaS), manage it in your cloud account (BYOC), and/or run it on-prem. We see lots of customers with hybrid environments, on-prem and in the cloud, as well as a multi-cloud approach. Our single control plane helps make that really easy.

Support, where the expertise we have gained operating these technologies for our platform customers lets us help you operate them at a high level, too. One of the most challenging things about open source is that there is no bat phone to call when it’s 3 a.m. and you’re in the middle of a crisis. We can be that resource.

Consulting, where we have deep expertise across the board in these technologies. The team that provides support is focused on operational excellence, but some customers need hands-on-keyboard help, architecture reviews, or best practices for using these technologies from the client side. That’s where our Consulting team shines.

RTInsights: Can you give some examples?

Mills: We’ve got many customers who leverage multiple technologies. As I mentioned with Cadence, we have a customer who uses Kafka, Cassandra, and Postgres, and they came to us and said, “Hey, we’re using Cadence, and we’d like you to manage it for us.” So, we looked at the tech, built out a team, and started managing it for them. Now, it is one of our generally available offerings. That story is not uncommon.

We have customers who’ll join with one product, and over time, we’ll hear, “You’re doing such a good job with my Kafka. I’d like you to take over management of this other technology.” Or they’ll ask us, “Hey, do you support this one, too?” If the answer is no, the next thing is, “Well, what will it take to get you to support it?” We then look at the tech from a variety of different angles, like licensing, marketing, and our own availability to take on a new tech. We always want to make sure we do it right, not halfway.

Another one that really resonates happened within the last year and a half. An organization that services manufacturers has an ERP system and provides supply chain solutions. They came to us when they had a new initiative, building a new integration platform and planning to use Kafka, Cassandra, and OpenSearch.

They had an enterprise architecture and asked us to take a look. I sat down with them and three members of our consulting team, who are experts in those technologies. We looked at their diagram, talked for several hours, and essentially came back and said, “All right, this is good overall. You need to consider this, this, and this for Kafka, and the way your events are flowing will cause issues here and here.”

We dug into their schema and their data, including how they were going to write the data and lay out the tables in Cassandra, and we said, “You’re going to have these problems here. Consider refactoring to look like this.” We talked to them about OpenSearch and the best way to do sharding and sizing. We were able to talk through those technologies from our practice and experience and suggest a best path forward. After that exercise, they made a huge pivot.

With issues like that, we have so much experience that we can really accelerate time to market. And that’s what we did with this organization. They came to us after that consulting experience, and they said, “Okay, we’re ready to hop on your platform.” Today, they’re running Kafka, Cassandra, and OpenSearch on our platform. That allowed them to focus on building the integration platform, not trying to gain expertise around these specific technologies. They just lean on us for that, and they really couldn’t be happier with us.
