Serverless Real-Time Data Pipelines on AWS with Prefect, ECS and GitHub Actions

A guide to fully automated serverless real-time data pipelines

15 min read · Jul 25, 2022

Most data platforms these days are still operated using batch processing. Even though streaming technology has matured, building automated and reliable real-time data pipelines is still difficult and often requires a team of engineers to operate the underlying platform. But it doesn't have to be that way. We already wrote about this last year.

In this post, we'll get more hands-on. You'll see how to turn any batch-processing Python script into a real-time data pipeline orchestrated by Prefect. We'll deploy the real-time streaming flow as a serverless containerized service running on AWS ECS Fargate. All resources will be deployed with Infrastructure as Code (leveraging CloudFormation), and the deployment process can be triggered with a single click from a GitHub Actions workflow.
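
To make the idea concrete before we dive in, here is a minimal sketch of what such a conversion can look like. It is not the exact code from this post: the extract, transform, and load tasks and the poll_interval_seconds parameter are illustrative placeholders for your own batch logic; the only structural change is that the flow keeps polling instead of exiting after one batch.

```python
import time
from datetime import datetime, timezone

from prefect import flow, task, get_run_logger


@task
def extract() -> list[dict]:
    # Hypothetical source read - replace with your real API, queue, or database call
    return [{"value": 42, "ingested_at": datetime.now(timezone.utc).isoformat()}]


@task
def transform(records: list[dict]) -> list[dict]:
    # The plain-Python business logic from your original batch script goes here
    return [{**record, "value": record["value"] * 2} for record in records]


@task
def load(records: list[dict]) -> None:
    logger = get_run_logger()
    logger.info("Loaded %d records", len(records))


@flow
def real_time_pipeline(poll_interval_seconds: int = 5):
    # Instead of exiting after a single batch, keep polling for new data
    while True:
        records = extract()
        load(transform(records))
        time.sleep(poll_interval_seconds)


if __name__ == "__main__":
    real_time_pipeline()
```

The same script runs unchanged on your laptop and inside a container on ECS Fargate; Prefect tracks every task run regardless of where the flow process lives.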

With a CI/CD template, we'll then ensure that future changes are redeployed automatically, with no manual intervention and no downtime.

Table of contents:
1. Why Prefect 2.0 for real-time data pipelines?
- Drawbacks of batch processing
- Opinion: why you likely don't need a distributed message queue
- How can Prefect 2.0 handle

Written by Anna Geller

Data Engineering, AWS Cloud, Serverless & .py. Get my articles via email https://annageller.medium.com/subscribe YouTube: https://www.youtube.com/@anna__geller
