The Unbundling vs Rebundling of a Data Stack Debate Missed The Point

Many tools in the contemporary data landscape have become increasingly polarized. On the one hand, there are products highly specialized in one specific area, such as data ingestion, transformation, scheduling, cataloging, experiment tracking, or alerting. On the opposite side of the spectrum, there are tools that attempt to rebundle all the pieces of the data stack into a single product.

Either-or decisions

Tools that fall into those extreme categories tend to see the world in black and white and force you to make either-or decisions. Either you choose a single product to manage the entire end-to-end lifecycle of your dataflow, or you lose data lineage and observability. Either you switch to a single product and vendor, or you end up with data stack fragmentation and chaos.

Coordination plane instead of a control plane

Full disclosure: I work at Prefect. Just yesterday, we launched Prefect 2.0, a product that evolved, among other things, as a result of acknowledging that problem.

Prefect 2.0 provides an alternative to either-or decisions: a coordination plane that can simultaneously be used to orchestrate your dataflow and observe the state of your data stack living outside of that orchestration. You don’t have to change how you work and adjust your data stack only to gain the benefits of orchestration, lineage, and observability. You can have the best of both worlds without compromises.

This article explains how that’s possible:

The above post is not just an announcement of Prefect 2.0 but also one of the most balanced perspectives on the future of the data stack I’ve seen so far. Here is how the author describes the problem:

“It’s unreasonable to presume that a single orchestration plane will ever be able to control dataflow across an entire stack. […] Not only would this be an extraordinary waste of time, but this forced re-bundling of every data application back into the orchestrator would be profoundly un-modern.”

The proposed alternative is centered around observing the state of your data stack (regardless of which tools you end up using!) and leveraging the state observed by your coordination plane to drive orchestration:

“It turns out that the solution to this problem isn’t to redefine and force all dataflow to pass through the orchestrator, but rather to enable a more passive mode of collecting information, one that lets the software observe dataflow as it moves through the stack without controlling it.”
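
To make the idea of passive observation more concrete, here is a minimal Python sketch. It is not Prefect's API and not how Prefect implements it; the CoordinationPlane class, its on and observe methods, and the system and state names are all hypothetical, chosen only for illustration. The point it tries to show is that external tools merely report their state, and the coordination plane reacts to what it observes without running those tools itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class CoordinationPlane:
    """Hypothetical in-memory coordination plane (illustration only)."""

    # Latest observed state per external system, e.g. {"ingestion": "Completed"}
    states: Dict[str, str] = field(default_factory=dict)
    # Actions to trigger when a given (system, state) pair is observed
    reactions: Dict[Tuple[str, str], List[Callable[[], None]]] = field(
        default_factory=dict
    )

    def on(self, system: str, state: str, action: Callable[[], None]) -> None:
        """Register an action to run when `system` is observed in `state`."""
        self.reactions.setdefault((system, state), []).append(action)

    def observe(self, system: str, state: str) -> None:
        """Record a state reported by an external tool and run matching actions.

        The plane does not execute or control the external tool; it only
        records what the tool reports.
        """
        self.states[system] = state
        for action in self.reactions.get((system, state), []):
            action()


# Usage: an ingestion tool living outside the orchestrator reports its state,
# and the plane uses that observation to kick off downstream work.
plane = CoordinationPlane()
triggered: List[str] = []

plane.on("ingestion", "Completed", lambda: triggered.append("transformation"))

plane.observe("ingestion", "Running")    # observed only, nothing is triggered
plane.observe("ingestion", "Completed")  # triggers the registered action
```

In this sketch the ingestion tool could be anything, modern or not, as long as it can report a state. That is the property the quote above describes: the observed state drives orchestration, rather than the orchestrator owning every step.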

With that approach, you can coordinate any (not only modern) data stack. You can scale and adapt your data platform to unknown future needs without switching costs and compromises. You can finally coordinate work between teams without having to agree on a single monolithic orchestration plane that would only lead to conflicts, friction, and unhealthy tradeoffs where everyone loses. And you don’t even have to think about unbundling or rebundling. You don’t have to choose.

Next steps

I highly recommend giving this article a read, regardless of whether you ever decide to use Prefect. Our Community Slack has a dedicated channel called #best-practices-coordination-plane if you want to exchange ideas about the topic.

Anna Geller

Lead Community Engineer at Prefect, Data Professional, Cloud & .py fan. www.annageller.com. Get my articles via email: https://annageller.medium.com/subscribe