Modular Data Stack — Build a Data Platform with Prefect, dbt and Snowflake

This is a conceptual post about building a data platform with Prefect, dbt, and Snowflake — hands-on demos follow later

7 min read · Oct 11, 2022

Scheduling is just one aspect of the dataflow coordination spectrum

Historically, orchestration platforms only let you manage dependencies within individual data pipelines. The typical result was a series of massive DAGs and brittle engineering processes. Those DAGs were difficult to change: to avoid breaking anything, you would think twice before touching the underlying logic.

Today, data practitioners, especially data platform engineers, are crossing the boundaries of teams, repositories, and data pipelines. Running things on a regular schedule alone doesn’t cut it anymore. Some dataflows are event-driven or triggered via ad-hoc API calls. To meet the demands of the rapidly changing world, data practitioners need to react quickly, deploy frequently, and have an automated development lifecycle with CI/CD.

DAGs are optional

Static DAGs can no longer keep up with the dynamic world of data and modern APIs. Moving from static DAGs (including DAGs of datasets defined in static declarative code) to API-first building blocks makes your dataflows more adaptable to change and easier to deploy and manage. In Prefect, you can still build DAGs, but they are optional: you can use a DAG structure only when you need it, and it can be constructed dynamically from business logic discovered at runtime.
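To make the idea of a runtime-constructed DAG concrete, here is a minimal sketch in plain Python. It is an illustration, not Prefect's API: `discover_tables`, `ingest`, `transform`, and `pipeline` are hypothetical names, and the hard-coded catalog stands in for whatever a real pipeline would query. In Prefect 2, functions like these would typically carry the `@task` and `@flow` decorators.

```python
from typing import Dict, List


def discover_tables(source: str) -> List[str]:
    """Hypothetical discovery step; real code might query an API or a catalog."""
    catalog: Dict[str, List[str]] = {
        "crm": ["customers", "orders"],
        "billing": ["invoices"],
    }
    return catalog.get(source, [])


def ingest(table: str) -> str:
    """Placeholder for loading one table; returns a label for the loaded data."""
    return f"raw_{table}"


def transform(raw_names: List[str]) -> List[str]:
    """Placeholder for a transformation that runs after all ingestion steps."""
    return [name.replace("raw_", "stg_") for name in raw_names]


def pipeline(source: str) -> List[str]:
    """The shape of the run depends on what is discovered at runtime."""
    tables = discover_tables(source)
    if not tables:
        # Nothing was discovered, so no downstream steps run at all.
        return []
    raw = [ingest(t) for t in tables]  # one ingestion step per discovered table
    return transform(raw)


if __name__ == "__main__":
    print(pipeline("crm"))
```

The dependency structure here (discover, then one ingestion per table, then transform) is never declared up front. It follows from ordinary control flow, which is the sense in which the DAG is optional.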

In this series of posts, we’ll build a data platform proof of concept using Prefect, dbt, and Snowflake. The “M” in the MDS discussed here stands for modularity and proven engineering concepts rather than modern hype. We’ll demonstrate how you can go from a simple, local, parametrized data pipeline running dbt CLI commands to multiple (but still simple) observable ingestion and transformation flows deployed with Docker.
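As a rough idea of what a "parametrized pipeline running dbt CLI commands" could look like at its simplest, the sketch below builds a dbt invocation from parameters and optionally runs it. The helper names `build_dbt_command` and `run_dbt`, the target names `dev` and `prod`, and the `staging` selector are assumptions made for illustration; only the `--target` and `--select` flags come from the dbt CLI itself.

```python
import subprocess
from typing import List, Optional


def build_dbt_command(
    command: str, target: str = "dev", select: Optional[str] = None
) -> List[str]:
    """Assemble the argument list for a dbt CLI call from pipeline parameters."""
    args = ["dbt", command, "--target", target]
    if select:
        # Restrict the run to a subset of models when a selector is given.
        args += ["--select", select]
    return args


def run_dbt(
    command: str,
    target: str = "dev",
    select: Optional[str] = None,
    dry_run: bool = True,
) -> List[str]:
    """Run the dbt command, or only return it when dry_run is True."""
    args = build_dbt_command(command, target, select)
    if not dry_run:
        # Requires dbt to be installed and a dbt project in the working directory.
        subprocess.run(args, check=True)
    return args


if __name__ == "__main__":
    # Dry run: prints the command that would be executed.
    print(" ".join(run_dbt("run", target="dev", select="staging")))
```

Keeping command construction separate from execution makes the parameters easy to test locally before the function is wrapped in an orchestrated flow.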

Once we’re done with the development stage demo, switching between development and production environments will be as simple as pointing to a different Prefect Cloud workspace from your terminal.

Problems data practitioners still struggle with despite the Modern…

Written by Anna Geller
