Modular Data Stack — Build a Data Platform with Prefect, dbt and Snowflake

This is a conceptual post about building a data platform with Prefect, dbt, and Snowflake; hands-on demos will follow later.

Anna Geller
7 min read · Oct 11, 2022
Scheduling is just one aspect of the dataflow coordination spectrum

Historically, orchestration platforms allowed managing dependencies only within individual data pipelines. The typical result was a series of massive DAGs and brittle engineering processes. Those DAGs were difficult to change: to avoid breaking anything, you would think twice before touching the underlying logic.

Today, data practitioners, especially data platform engineers, are crossing the boundaries of teams, repositories, and data pipelines. Running things on a regular schedule alone doesn't cut it anymore: some dataflows are event-driven or triggered via ad-hoc API calls. To keep pace with a rapidly changing world, data practitioners need to react quickly, deploy frequently, and automate the development lifecycle with CI/CD.
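As a concrete illustration of such an ad-hoc trigger, here is a minimal sketch using Prefect 2's run_deployment helper. The deployment name etl/prod and the parameters are hypothetical placeholders; substitute your own "flow-name/deployment-name" pair.

```python
from prefect.deployments import run_deployment

# Trigger an ad-hoc run of an existing deployment via the Prefect API.
# "etl/prod" is a hypothetical deployment name used only for illustration.
flow_run = run_deployment(
    name="etl/prod",
    parameters={"since": "2022-10-01"},  # hypothetical flow parameters
    timeout=0,  # return immediately instead of waiting for the run to finish
)
print(flow_run.id)
```

The same trigger could just as well come from a webhook handler or a CI/CD pipeline step, which is what makes API-driven execution a complement to, not a replacement for, schedules.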

DAGs are optional

DAGs can no longer keep up with the dynamic world of data and modern APIs. Moving from static DAGs (including DAGs of datasets defined in static declarative code) to API-first building blocks makes your dataflows more adaptable to change and easier to deploy and manage. In Prefect…
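To make the contrast with static DAGs concrete, below is a minimal sketch of a Prefect 2 flow. The task and flow names are hypothetical, but @flow and @task are Prefect's actual building blocks: flows are ordinary Python functions, and the execution graph is discovered at runtime from regular function calls rather than declared up front.

```python
from prefect import flow, task

@task
def extract() -> list[int]:
    # Hypothetical extraction step; returns a few records.
    return [1, 2, 3]

@task
def transform(record: int) -> int:
    # Hypothetical transformation applied to a single record.
    return record * 10

@flow
def etl() -> list[int]:
    # The graph is discovered at runtime from ordinary Python calls,
    # so loops and conditionals work without a static DAG declaration.
    records = extract()
    return [transform(r) for r in records]

if __name__ == "__main__":
    etl()
```

Because the graph emerges at runtime, a flow can branch on the data it sees, loop over a dynamic number of inputs, or skip work entirely, which is what makes these building blocks adaptable to change.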
