How to Build Modular Dataflows with Tasks, Flows and Subflows in Prefect
And how to define state and data dependencies in your Prefect data pipelines
Prefect is a coordination plane for the constantly evolving world of data. With Prefect, you can orchestrate any dataflow and observe everything happening around it. This visibility into the current state of your data stack lets you react to changes dynamically through orchestration.
Prefect has a rich vocabulary for defining your orchestration logic, including tasks, flows, and subflows, as well as state and data dependencies. Your dataflow works consistently regardless of whether you run it in a local process, in an async event loop, or on a remote Dask or Ray cluster. This post dives deeper into the core building blocks and explains how they can help you build modular and extensible dataflows.
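As a rough sketch of what this vocabulary looks like in practice (the function and task names here are illustrative, not from the original post), a Prefect 2.x flow composes tasks by calling decorated Python functions, and calling one flow from another runs it as a subflow:

```python
from prefect import flow, task


@task
def extract() -> list[int]:
    # A task is any Python function decorated with @task
    return [1, 2, 3]


@task
def transform(data: list[int]) -> list[int]:
    # Consuming a task's return value creates a data dependency
    return [x * 10 for x in data]


@flow
def load_report(rows: list[int]) -> None:
    # A flow called from another flow runs as a subflow
    print(f"Loaded {len(rows)} rows")


@flow
def etl() -> None:
    raw = extract()
    clean = transform(raw)  # runs only after extract() succeeds
    load_report(clean)      # subflow call


if __name__ == "__main__":
    etl()
```

Because the flow is plain Python, the same `etl()` function can run in a local process or be submitted to a Dask or Ray task runner without changing the business logic.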
⚠️ Note: I no longer work at Prefect. This post may be completely outdated. Refer to the Prefect docs and website to stay up to date.
Basic building blocks of any dataflow
In Prefect, Python is the API. We believe that your code is the best representation of your workflow. But to organize the steps, govern…
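To illustrate the difference between state and data dependencies mentioned earlier (again a sketch with hypothetical task names), Prefect infers a data dependency whenever one task consumes another's return value, while the `wait_for` argument expresses a pure state dependency with no data exchanged:

```python
from prefect import flow, task


@task
def create_table() -> None:
    print("table created")


@task
def insert_rows() -> int:
    print("rows inserted")
    return 42


@task
def notify(count: int) -> None:
    print(f"inserted {count} rows")


@flow
def pipeline() -> None:
    table = create_table.submit()
    # State dependency: insert_rows takes no input from create_table,
    # but must wait for it to finish successfully before starting
    count = insert_rows.submit(wait_for=[table])
    # Data dependency: notify consumes insert_rows' return value
    notify.submit(count)


if __name__ == "__main__":
    pipeline()
```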