
How to Build Modular Dataflows with Tasks, Flows and Subflows in Prefect

And how to define state and data dependencies in your Prefect data pipelines

Anna Geller
8 min read · Sep 19, 2022

Prefect is a coordination plane for the constantly evolving world of data. Using Prefect, you can orchestrate any dataflow and observe everything happening around it. By tracking what's going on in your data stack at any given moment, Prefect gives you visibility into the current state of your dataflows and lets you react to that state dynamically through orchestration.

Prefect has a rich vocabulary for defining your orchestration logic, including tasks, flows, and subflows, as well as state and data dependencies. Your dataflow works consistently regardless of whether you run it in a local process, in an async event loop, or on a remote Dask or Ray cluster. This post dives deeper into these core building blocks and explains how they can help you build modular and extensible dataflows.
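As a sketch of how these pieces fit together (assuming a recent Prefect 2 release; all function and task names below are illustrative): tasks exchange data directly, a flow called from inside another flow runs as a subflow, and `wait_for` expresses a pure state dependency:

```python
from prefect import flow, task

@task
def extract() -> list:
    # Illustrative source: pretend these are raw records
    return [1, 2, 3]

@task
def load(data: list) -> None:
    print(f"loading {data}")

@task
def notify() -> None:
    print("pipeline finished")

@flow
def transform(data: list) -> list:
    # A @flow called from inside another flow runs as a subflow
    return [x * 10 for x in data]

@flow
def pipeline() -> None:
    raw = extract()              # task run
    clean = transform(raw)       # subflow run; data dependency on extract
    loaded = load.submit(clean)  # data dependency on transform; returns a future
    # notify consumes no data from load, so its ordering is a pure
    # state dependency expressed with wait_for
    notify.submit(wait_for=[loaded])

if __name__ == "__main__":
    pipeline()
```

Running `pipeline()` locally executes the same dependency graph that would run on Dask or Ray; swapping in a different task runner changes where tasks execute, not how the flow is written.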

⚠️ Note: I no longer work at Prefect. This post may be completely outdated. Refer to the Prefect docs and website to stay up to date.

Basic building blocks of any dataflow

In Prefect, Python is the API. We believe that your code is the best representation of your workflow. But to organize the steps, govern…
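To make "Python is the API" concrete, here is a minimal sketch (again assuming a recent Prefect 2 release; the retry settings and names are illustrative): an ordinary Python function becomes a task or a flow simply by adding a decorator:

```python
from prefect import flow, task

@task(retries=3, retry_delay_seconds=10)
def fetch_users() -> list:
    # A plain Python function; the decorator layers on retries,
    # logging, and observability without changing the business logic
    return [{"id": 1, "name": "Marvin"}]

@flow(name="user-pipeline")
def user_pipeline() -> None:
    users = fetch_users()
    print(f"fetched {len(users)} users")

if __name__ == "__main__":
    user_pipeline()
```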


Written by Anna Geller

Data Engineering, AWS Cloud, Serverless & .py. Get my articles via email: https://annageller.medium.com/subscribe · YouTube: https://www.youtube.com/@anna__geller
