Scheduled vs. Event-driven Data Pipelines — Orchestrate Anything with Prefect
When to schedule your workflows, when not to, and why sensors and daemon processes are a waste of resources
One of the most common data engineering challenges is triggering workflows in response to events such as when a new file arrives in a certain directory. The approach taken by legacy orchestrators is to deploy continuously running background processes, such as sensors or daemons that poll for status. In this post, we’ll discuss why this approach is a waste of resources and demonstrate an alternative — event-driven workflows observable with Prefect.
Use cases for scheduled dataflows
One prevalent use case for scheduled workflows is batch data ingestion and transformation. This process typically involves extracting data from source systems (often only the delta since the last run), ingesting it into your data warehouse, and transforming it as needed. Once this is finished, the same workflow may run data quality tests, refresh data extracts for reporting, or trigger downstream automated actions such as alerting on incorrect KPIs or detected anomalies in the data.
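To make the pattern concrete, here is a minimal sketch of such a scheduled batch flow, assuming Prefect 2.x. The extract_delta, load_to_warehouse, and run_quality_checks tasks are hypothetical placeholders for your real source, warehouse, and validation logic, and the cron expression is purely illustrative.

```python
from prefect import flow, task


@task
def extract_delta(since: str) -> list[dict]:
    # Placeholder: pull only the records that changed since the last run,
    # e.g. via an updated_at filter against the source system.
    return [{"id": 1, "updated_at": since}]


@task
def load_to_warehouse(records: list[dict]) -> int:
    # Placeholder: bulk-insert the extracted records into the warehouse
    # and return the number of rows loaded.
    return len(records)


@task
def run_quality_checks(row_count: int) -> None:
    # Placeholder: fail the flow run if the batch looks anomalous.
    if row_count == 0:
        raise ValueError("No rows ingested, investigate the upstream source")


@flow(log_prints=True)
def daily_ingestion(since: str = "yesterday") -> None:
    records = extract_delta(since)
    row_count = load_to_warehouse(records)
    run_quality_checks(row_count)
    print(f"Ingested {row_count} rows")


if __name__ == "__main__":
    # Attach a cron schedule when serving the flow (Prefect 2.x `serve` API);
    # on older versions, create a Deployment with a cron schedule instead.
    daily_ingestion.serve(name="daily-ingestion", cron="0 2 * * *")
```

With a deployment like this, the orchestrator only spins up work when the schedule fires, which is exactly the property the next section contrasts with always-on sensors.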
Apart from these common data engineering tasks, many data science use cases…