Serverless Real-Time Data Pipelines on AWS with Prefect, ECS and GitHub Actions

A guide to fully automated serverless real-time data pipelines

Anna Geller
15 min read · Jul 25, 2022


Most data platforms are still operated using batch processing. Even though streaming technology has matured, building automated and reliable real-time data pipelines remains difficult and often requires a team of engineers to operate the underlying platform. But it doesn’t have to be that way. We already wrote about this last year.

In this post, we’ll get more hands-on. You’ll see how to turn any batch processing Python script into a real-time data pipeline orchestrated by Prefect. We’ll deploy the real-time streaming flow to a serverless containerized service running on AWS ECS Fargate. All resources will be deployed with Infrastructure as Code (using CloudFormation), and the deployment process can be triggered with a single click from a GitHub Actions workflow.

With a CI/CD template, we’ll then ensure that future changes are deployed automatically, with no manual intervention and no downtime.

Table of contents:
1. Why Prefect 2.0 for real-time data pipelines?
Drawbacks of batch processing
Opinion: why you likely don’t need a distributed message queue
How can Prefect 2.0 handle such low-latency real-time workflows?
Benefits of moving towards real-time workflows with Prefect 2.0
Why can I not just run a single DAG 24/7?
2. Demo time!
Typical batch-processing flow
Turn it into a streaming service
What if something goes wrong?
3. Getting value from real-time: take automated action!
Using Prefect Blocks to store key-value pairs
Validate the data
Conclusion on a local demo
4. Deploy the real-time data pipeline as a serverless container
Configure repository secrets
Deploy the entire infrastructure in a single click
Observe the real-time data pipelines in your Prefect UI
5. Automate future deployments with CI/CD
Making changes to the code
(Optional) One flow run gets stuck in a Running state
What happens when there are infrastructure issues?
Limitations of the approach presented in


