Scheduled vs. Event-driven Data Pipelines — Orchestrate Anything with Prefect

When and when not to schedule your workflows and why sensors and daemon processes are a waste of resources

Anna Geller
4 min read · Sep 6, 2022
[Image: Marvin observes what’s currently happening in the galaxy]

One of the most common data engineering challenges is triggering workflows in response to events, such as a new file arriving in a certain directory. The approach taken by legacy orchestrators is to deploy continuously running background processes, such as sensors or daemons, that poll an external system until a condition is met. In this post, we’ll discuss why this approach is a waste of resources and demonstrate an alternative: event-driven workflows that stay observable with Prefect.
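
To make that resource cost concrete, here is a minimal sketch of such a polling sensor: a process that runs around the clock solely to check whether a file has landed, occupying a worker slot even when nothing arrives. The watched directory and the trigger_pipeline helper are hypothetical placeholders, not part of any orchestrator’s API.

```python
# Minimal sketch of the "legacy" sensor pattern: a long-running process
# that polls a directory and only then kicks off the actual work.
import time
from pathlib import Path

WATCH_DIR = Path("/data/incoming")  # hypothetical landing directory
POLL_INTERVAL_SECONDS = 30


def trigger_pipeline(file_path: Path) -> None:
    """Placeholder for whatever starts the downstream workflow."""
    print(f"Starting pipeline for {file_path}")


def poll_forever() -> None:
    seen: set[Path] = set()
    # The process occupies compute even when nothing ever arrives.
    while True:
        if WATCH_DIR.exists():
            for file_path in WATCH_DIR.glob("*.csv"):
                if file_path not in seen:
                    seen.add(file_path)
                    trigger_pipeline(file_path)
        time.sleep(POLL_INTERVAL_SECONDS)


if __name__ == "__main__":
    poll_forever()
```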

Use cases for scheduled dataflows

One prevalent use case for scheduled workflows is batch data ingestion and transformation. This process typically involves extracting data from source systems (often only the delta since the last run), ingesting it into your data warehouse, and transforming it as needed. Once this is finished, the same workflow may run data quality tests, refresh data extracts for reporting, or trigger downstream automated actions such as alerting on incorrect KPIs or anomalies detected in the data.
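
As an illustration, here is a minimal sketch of what such a batch flow could look like with Prefect 2’s flow and task decorators. The extract, load, and transform steps are hypothetical placeholders, and in practice the schedule (for example, a daily cron) would be attached when creating a deployment for the flow rather than living in the code itself.

```python
# Minimal sketch of a scheduled batch flow in Prefect 2. The extract,
# load, and transform steps are hypothetical placeholders.
from prefect import flow, task


@task
def extract_new_rows(since: str) -> list[dict]:
    """Pull only the delta since the last successful run (placeholder)."""
    return [{"id": 1, "amount": 42.0, "loaded_at": since}]


@task
def load_to_warehouse(rows: list[dict]) -> int:
    """Ingest the raw rows into the warehouse (placeholder)."""
    print(f"Loaded {len(rows)} rows")
    return len(rows)


@task
def run_transformations() -> None:
    """Run downstream transformations and data quality tests (placeholder)."""
    print("Transformations and tests finished")


@flow
def daily_ingestion(since: str = "2022-09-06") -> None:
    rows = extract_new_rows(since)
    load_to_warehouse(rows)
    run_transformations()


if __name__ == "__main__":
    # A local run; a deployment would attach the actual cron schedule.
    daily_ingestion()
```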

Apart from these common data engineering tasks, many data science use cases…

Written by Anna Geller

Data Engineering, AWS Cloud, Serverless & .py. Get my articles via email: https://annageller.medium.com/subscribe · YouTube: https://www.youtube.com/@anna__geller