
The Technology Behind and Benefits of Data Pipeline Automation

A chat with Sean Knapp, founder and CEO of Ascend.io, about the challenges businesses face with data pipelines and how data pipeline automation can help.

In today’s data-driven world, businesses rely on efficient data pipelines to gain insights and create data products. Unfortunately, traditional manual techniques for building and maintaining these pipelines struggle to scale, leading to increased complexity and reduced productivity.

RTInsights recently had a chat with Sean Knapp, founder and CEO of Ascend.io, about the challenges businesses face in this area and how data pipeline automation can help.

Here is a summary of our conversation.

CDInsights: Why is there so much interest in data pipelines today?

Photo: Sean Knapp, founder and CEO of Ascend.io

Knapp: Data pipelines are garnering attention because we’re witnessing a significant industry shift. About 12 years ago, Marc Andreessen famously said, “Software is eating the world,” and that every company would ultimately become a software company.

Indeed, that has happened. So, what came next? Well, software excels in two areas: generating vast amounts of raw data and consuming smaller volumes of refined, intelligent data to provide better products and services.

Fast-forward to today, and data has evolved from being a mere byproduct of software to becoming the product itself. Data products power the next wave of innovation in both new and old companies, regardless of industry, and data pipelines are the backbone of those very data products. They enable businesses to move, transform, and add value to their data, ultimately driving better decision-making, innovation, and products and services for consumers and customers.

See also: Data Pipeline Pitfalls: Unraveling the Technical Debt Tangle

CDInsights: What are the challenges with the way organizations have traditionally built pipelines that are limiting businesses today?

Knapp: Traditional data pipelines struggle to scale, not just in terms of data volume, velocity, variety, or veracity, but in terms of complexity. These pipelines are often brittle, built with loosely connected systems that are generally wired up by humans. It’s like having old-school telephone operators connecting calls manually, which eventually becomes unsustainable.

This challenge parallels what happened in software development. As engineers built interdependent products, complexity skyrocketed, driving the need for more scalable and automatable solutions.

Similarly, for data pipelines, we now have access to advanced technology and skilled engineers, but the network effect of building interdependent pipelines without sufficient automation is stifling productivity. As more pipelines are added, team efficiency approaches zero, and that’s a significant concern for businesses today.

CDInsights: How does data pipeline automation help?

Knapp: Data pipeline automation identifies common patterns and integrates them into the underlying system, similar to how automation has evolved in other areas of technology. We no longer need to manually defrag hard drives, optimize code, or worry about application scalability in the cloud era.

Data pipeline automation allows users to work with powerful data platforms like Snowflake, Databricks, or BigQuery through an abstraction layer that applies best practices from across the industry to automate complex tasks. Meanwhile, data teams can elevate their work, concentrate on innovative data products, and drive better business outcomes.

CDInsights: What are the technical elements needed to automate data pipelines?

Knapp: When it comes to automating data pipelines, there are two key technical elements. First, you need an end-to-end approach that unifies data pipeline capabilities. Think of it as a single pane of glass for data ingestion, transformation, orchestration, and sharing. Second, you need access to abundant metadata that fuels your automation.
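To make that idea concrete, here is a minimal Python sketch of what a unified pipeline definition with first-class metadata might look like. This illustrates the concept rather than Ascend's actual API; the `Pipeline` and `Stage` classes and their behavior are assumptions invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Stage:
    name: str
    fn: Callable
    inputs: list

@dataclass
class Pipeline:
    stages: dict = field(default_factory=dict)
    metadata: list = field(default_factory=list)  # run metadata that fuels automation

    def stage(self, name, inputs=("source",)):
        """Register ingestion/transform steps on one pipeline object."""
        def register(fn):
            self.stages[name] = Stage(name, fn, list(inputs))
            return fn
        return register

    def run(self, source_data):
        # Naive orchestration: stages are declared in dependency order,
        # so a single in-order pass resolves every input.
        results = {"source": source_data}
        for stage in self.stages.values():
            args = [results[i] for i in stage.inputs]
            results[stage.name] = stage.fn(*args)
            self.metadata.append({"stage": stage.name,
                                  "rows_out": len(results[stage.name])})
        return results

pipe = Pipeline()

@pipe.stage("clean")
def clean(rows):
    return [r for r in rows if r is not None]

@pipe.stage("enrich", inputs=("clean",))
def enrich(rows):
    return [{"value": r, "doubled": r * 2} for r in rows]

out = pipe.run([1, None, 2])
print(out["enrich"])   # [{'value': 1, 'doubled': 2}, {'value': 2, 'doubled': 4}]
print(pipe.metadata)   # per-stage metadata recorded by the same control layer
```

The point of the sketch is that ingestion, transformation, orchestration, and metadata collection all live in one system, so the automation layer can see everything it needs.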

At Ascend, we combine these two elements to unlock the key to data pipeline automation: fingerprinting. We bind code to data and data to code, and we keep the two aligned with our continuously running control plane: a scheduler with data integrity checking that tracks the lineage of all of your data thousands of times a second.

With those secure hash-based fingerprints, Ascend instantly detects changes, providing the foundation for automation. By being aware of these changes, the system can leverage context to automate processes efficiently. As a developer, you can trust Ascend’s automation engine to handle the impact of changes, freeing you to focus on data and its applications. This is the beauty of automated systems – you can rely on them to produce the right results while concentrating on more strategic aspects of your work.
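Conceptually, hash-based fingerprinting can be sketched in a few lines of Python. This is a simplified illustration, not Ascend's implementation; the helper names (`stage_fingerprint`, `needs_rerun`) and the control-plane bookkeeping are assumptions for the example.

```python
import hashlib

def fingerprint(*parts: bytes) -> str:
    """Collapse any number of byte strings into one SHA-256 digest."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part)
    return h.hexdigest()

def stage_fingerprint(code: str, input_fingerprints: list) -> str:
    """Bind code to data: a stage's identity is its transform code plus
    the fingerprints of every upstream dataset it reads."""
    return fingerprint(code.encode(), *(fp.encode() for fp in input_fingerprints))

# What the control plane remembers from the last successful run of each stage.
last_run = {}

def needs_rerun(name: str, code: str, input_fps: list) -> bool:
    """True when either the code or any upstream data has changed."""
    return last_run.get(name) != stage_fingerprint(code, input_fps)

# First pass: everything is new, so the stage runs.
code_v1 = "SELECT id, amount FROM orders"
data_fp = fingerprint(b"orders snapshot 2024-01-01")
print(needs_rerun("orders_clean", code_v1, [data_fp]))      # True
last_run["orders_clean"] = stage_fingerprint(code_v1, [data_fp])

# Nothing changed: the scheduler can skip the stage entirely.
print(needs_rerun("orders_clean", code_v1, [data_fp]))      # False

# Upstream data changed: the fingerprint no longer matches, so rerun.
new_data_fp = fingerprint(b"orders snapshot 2024-01-02")
print(needs_rerun("orders_clean", code_v1, [new_data_fp]))  # True
```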

CDInsights: What are the benefits of using data pipeline automation?

Knapp: Data pipeline automation offers several key benefits. First and foremost, you’ll see significant productivity gains. You’ll build faster, spend less time maintaining pipelines, and create more data products. This means you can rapidly expand the scope and impact of your work.

Second, automation helps lower costs. Automated systems have evolved to the point where they outperform what most of us can do manually in terms of efficiency and cost reduction. Moreover, they can continuously adapt and optimize, which is a significant advantage over one-time performance tweaks. Automated systems like Ascend collect vast amounts of metadata and optimize processes beyond what developers can manage manually. For example, Ascend can fingerprint code and data, identify reusable blocks, and reduce storage and processing costs by reusing previously calculated data.
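As a rough sketch of that reuse, consider a cache keyed by the fingerprint of a block's code version and input data: identical work is looked up rather than recomputed. This is a hypothetical illustration; `run_block` and the in-memory cache are assumptions, and a real system would persist results and fingerprint the code itself rather than rely on a version label.

```python
import hashlib
import json

# Toy fingerprint-keyed result cache: work already computed for the same
# code version and input data is reused instead of recomputed.
_results = {}

def run_block(code_version: str, rows: list, transform):
    # code_version stands in for a fingerprint of the transform's code.
    key = hashlib.sha256(
        (code_version + json.dumps(rows, sort_keys=True)).encode()
    ).hexdigest()
    if key in _results:
        return _results[key]      # cache hit: no compute or storage spent
    output = transform(rows)      # cache miss: compute once, store the result
    _results[key] = output
    return output

def double_amounts(rows):
    return [{**r, "amount": r["amount"] * 2} for r in rows]

orders = [{"amount": 10}, {"amount": 25}]
run_block("v1", orders, double_amounts)   # computes and caches
run_block("v1", orders, double_amounts)   # reuses the cached output
run_block("v2", orders, double_amounts)   # new code version: recomputes
```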

Lastly, automation expands the pool of developers who can participate in the data ecosystem. It reminds me of the evolution of the software space over the last 15 years: backend developers, frontend developers, full-stack developers, and product developers all emerged as new roles that could participate in the software ecosystem.

In the data ecosystem, we’re now seeing analytics engineers, BI engineers, software engineers, and data engineers with diverse backgrounds and skill sets contributing to data pipeline development. By focusing on the data and its production, automation enables these professionals to collaborate more effectively, pushing data to the forefront and creating a more efficient, unified environment for data-driven decision-making.

CDInsights: Can you discuss some customer examples or general use cases?

Knapp: We’re seeing incredible customer success stories across a wide range of industries, such as healthcare, retail, finance, and media. Going back to one of my earlier comments, the common thread is that every company is now a data company, and they’re leveraging data in remarkable ways.

Take the New York Post, for example. They use Ascend to analyze user behavior on their website, such as ad clicks, package purchases, and content engagement. This information helps them recommend better content to their readers.

Another exciting example is Steady, a company that provides income stabilization for gig economy workers. By analyzing financial patterns from various data sources, Steady creates profiles that help workers achieve a more stable income.

Afresh is yet another innovative use case. They’re reducing food waste by creating predictive models to determine the right amount of food needed in stores at specific times, minimizing overhead and waste.

I can go on and on. But these examples truly showcase the power of automation in the world of data. And the amazing part about it is that each one of these companies is applying data automation to its unique world (and they are very different worlds). Yet, they find common ground in automation to accelerate their progress.
