The modern data stack is quite complex. Developers spend a great deal of time trying to get multiple tools to work together when building data pipelines. The resulting processes are inefficient, do not scale well, and require massive compute power. Hence the need for a post-modern data stack.
RTInsights recently sat down with Tom Weeks, COO of Ascend.io, to talk about the issues developers have when building data pipelines using modern data stack solutions, what’s needed to overcome these issues, and how Ascend.io can help.
Here is a summary of our conversation.
RTInsights: Why are companies moving away from the modern data stack?
Weeks: There are a couple of ways to look at this. Let’s put it in the context of the pain points that data teams are feeling. Why are they frustrated with the modern data stack and looking for alternatives?
The first challenge is modularity. Each major capability required to build data pipelines demands a different tool, leading to an environment of separate interfaces and user experiences that all operate differently. For instance, organizations need individual tools for data ingestion, transformation, orchestration, reverse ETL, data quality checks, and other observability capabilities.
This results in an often-overwhelming environment for data and analytics engineering teams, with a multitude of tools to manage, slowing down the process of building new pipelines and increasing the complexity of troubleshooting when issues arise. This environment also forces specialization, because each tool works in its own unique way, which can further slow down operations when key team members are unavailable.
The second challenge stems from the lack of integration of these systems. The different tools used to build data pipelines barely communicate with each other, necessitating custom software and even more specialist skills to bridge these gaps. This slows everything down even further. We call this the ‘integration tax.’ It’s like a roadblock that adds more friction to the system, leading to greater inefficiency.
Finally, the third challenge causing dissatisfaction with the modern data stack is cost. We’re in a space where the players who have become the winners in each of these functional areas did so by entering a very fragmented space with either an open source or a very inexpensive product to ease friction. However, as these companies look to build a business, there have been significant price hikes. They are now trying to extract as much rent as they can for their part of the stack. This has led customers to question the value proposition, especially as their initial expenditure has now escalated to untenable levels.
RTInsights: What is the post-modern data stack?
Weeks: The post-modern data stack refers to a shift in how we approach data management, not dissimilar to transformations we’ve seen in other fields. For instance, consider aircraft control systems. Initially, separate tools managed each component, like flaps, engine thrust, and ailerons. However, over time, the system evolved into an integrated platform with a single user interface.
The same shift can be seen in infrastructure management and website development. In both, numerous tools initially managed different aspects, but eventually, the trend moved towards a single, unified platform. Kubernetes, for instance, provides a one-stop interface for infrastructure management.
In the context of data pipeline automation, the post-modern data stack is expected to follow this trend. The goal is to evolve from using numerous interfaces for different tasks to a single, consolidated platform. This means if a pipeline needs to be built or fixed, it can all be managed within one unified interface.
This approach aims to simplify the user’s interaction, much like a pilot’s interaction with a modern aircraft. Pilots no longer need to individually manage flaps or rudders; instead, they simply input their flight parameters, and the system takes care of the rest. Similarly, in the realm of data management, the post-modern data stack will allow users to focus on their desired outcomes rather than the intricate details of the individual tools and processes. This is the driving concept behind the post-modern data stack.
RTInsights: What are the key characteristics of the post-modern data stack? How does it differ from the modern data stack?
Weeks: Firstly, consolidation. We can expect to see an expansion of the services provided by individual players in the data stack market. This development is already in motion, driven by customer pressure.
Once you have consolidation, the second key characteristic becomes possible: a comprehensive automation layer.
For the post-modern data stack, this means the ability to generate substantial metadata detailing every step of the process. With this insight, we can develop software that automates a large portion of what a data engineer does – often the more routine, mundane tasks. This is akin to the automation transformation in car factories, where automation significantly streamlined the workforce, allowing workers to oversee operations.
Companies like Ascend, and there will be others without a doubt, are leading this change. As we consolidate our offerings, we can leverage the holistic view of the data process to enhance automation. For instance, when a new batch of data arrives, the system can automatically manage its propagation through the transformation process and any subsequent processes like reverse ETL.
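The propagation Weeks describes can be sketched in a few lines of Python: when new data lands at an ingestion stage, a control plane that knows the dependency graph can push the output through every downstream transform automatically. This is an illustrative toy under simple assumptions (a `ControlPlane` class, `register`, and `on_new_data` are hypothetical names), not Ascend's actual implementation.

```python
from collections import defaultdict, deque

class ControlPlane:
    """Toy control plane that knows the pipeline's dependency graph
    and propagates new data through downstream stages automatically."""

    def __init__(self):
        self.downstream = defaultdict(list)  # stage name -> dependent stage names
        self.handlers = {}                   # stage name -> transform function

    def register(self, stage, handler, upstream=None):
        """Add a stage; if upstream is given, wire it as a dependency."""
        self.handlers[stage] = handler
        if upstream:
            self.downstream[upstream].append(stage)

    def on_new_data(self, stage, data):
        """Run the stage on the new data, then every dependent stage
        on its parent's output, breadth-first through the graph."""
        queue = deque([(stage, data)])
        results = {}
        while queue:
            name, payload = queue.popleft()
            output = self.handlers[name](payload)
            results[name] = output
            for child in self.downstream[name]:
                queue.append((child, output))
        return results
```

The point of the sketch is that no stage has to poll for new data or be scheduled separately: because one component sees the whole graph, arrival at the top is enough to drive everything below it.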
I can do that for the developer because I can see it and manage it for them. You can't do that in the modern data stack today. Those fragmented tools work differently on different code bases. They don't share any of this metadata, and they know nothing about what's happening upstream or downstream of themselves.
So, the post-modern data stack is the next evolution.
RTInsights: What are the business benefits of the post-modern data stack?
Weeks: There are three different aspects to it.
Firstly, it can significantly reduce software costs. By consolidating numerous tools into a single platform, businesses can access richer functionality at lower costs. We conducted a study with an external firm and calculated that the total cost of implementing a single platform can be about 37% of the cost of maintaining individual tools.
Secondly, the post-modern data stack can greatly enhance efficiency by reducing the compute resources required. The integration of an intelligent control plane can potentially halve the amount of processing required by systems like Snowflake, Databricks, BigQuery, and others.
This reduction is possible because the control plane can comprehend the exact state of all interconnected pipelines, similar to a monitoring system in a car factory. For instance, if a change is introduced at step 32 of a 40-step pipeline, a control plane can start processing from step 32, knowing that steps 1 to 31 are already completed. This level of detailed understanding is absent in the modern data stack. This benefit extends to “restartability,” where if a pipeline breaks at a particular step, you can fix the issue and continue processing from that point rather than restarting the entire pipeline.
Lastly, the third benefit is the sheer productivity gains. The post-modern data stack streamlines the workload for developers, freeing up their time to focus on more complex tasks. By automating the more mundane aspects of data processing, developers can build more pipelines in a given timeframe. In this way, the post-modern data stack optimizes not only financial and compute resources but also human capital.
RTInsights: How does Ascend.io help in all of this?
Weeks: Ascend provides a single, unified platform that streamlines and simplifies processes. Instead of juggling five, six, or even seven different tools and interfaces, users can access all necessary capabilities through a single “pane of glass,” vastly improving efficiency and reducing complexity.
A key advantage of Ascend’s control plane-based system is its capacity to enhance developer productivity. On this platform, a developer can build seven to ten times more pipelines in a given period compared to using a fragmented set of multiple tools. We can build a pipeline in five to ten minutes that would take somebody two to four hours to do in the modern data stack.
In essence, Ascend helps by providing a powerful, integrated solution that vastly accelerates data pipeline building while simplifying the user experience.
Salvatore Salamone is a physicist by training who has been writing about science and information technology for more than 30 years. During that time, he has been a senior or executive editor at many industry-leading publications including High Technology, Network World, Byte Magazine, Data Communications, LAN Times, InternetWeek, Bio-IT World, and Lightwave, The Journal of Fiber Optics. He is also the author of three business technology books.