Serverless Real-Time Data Pipelines on AWS with Prefect, ECS and GitHub Actions

A guide to fully automated serverless real-time data pipelines

15 min read · Jul 25, 2022

Most data platforms today still rely on batch processing. Even though streaming technology has matured, building automated and reliable real-time data pipelines is still difficult and often requires a team of engineers to operate the underlying platform. But it doesn’t have to be that way. We already wrote about this last year.

In this post, we’ll get more hands-on. You’ll see how to turn any batch processing Python script into a real-time data pipeline orchestrated by Prefect. We’ll deploy the real-time streaming flow to a serverless containerized service running on AWS ECS Fargate. All resources will be deployed with Infrastructure as Code (using CloudFormation), and the deployment process can be triggered with a single click from a GitHub Actions workflow.

With a CI/CD template, we’ll then ensure that future changes will be automatically redeployed with no manual intervention and no downtime.
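
To make the core idea concrete before diving in, here is a minimal sketch of what "turning a batch script into a real-time pipeline" can look like: the same batch step is simply run over and over at a short interval. This is an illustration only. It uses just the Python standard library and does not show Prefect's API; `process_batch`, `run_continuously`, and their parameters are hypothetical names chosen for this example, and the `max_iterations` argument exists only so the loop can be stopped in a demo.

```python
import time
from typing import Callable, Optional


def process_batch() -> int:
    """Stand-in for an existing batch script (hypothetical placeholder).

    A real script would read whatever records are new right now,
    process them, and return how many it handled.
    """
    new_records = ["record"]  # assumption: one new record per run
    return len(new_records)


def run_continuously(
    step: Callable[[], int],
    interval_seconds: float = 1.0,
    max_iterations: Optional[int] = None,
) -> int:
    """Run a batch step repeatedly and return the total records processed.

    With max_iterations=None the loop never ends, which is the behavior
    of a long-running service; a finite value is useful for testing.
    """
    total = 0
    iteration = 0
    while max_iterations is None or iteration < max_iterations:
        total += step()
        iteration += 1
        time.sleep(interval_seconds)  # pause between runs
    return total


if __name__ == "__main__":
    processed = run_continuously(process_batch, interval_seconds=0.1, max_iterations=3)
    print(f"Processed {processed} records")
```

In a real deployment the loop would run indefinitely inside a container, and an orchestrator such as Prefect would add the observability, retries, and failure handling that this bare loop lacks.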

Table of contents:
1. Why Prefect 2.0 for real-time data pipelines?
  ∘ Drawbacks of batch processing
  ∘ Opinion: why you likely don’t need a distributed message queue
  ∘ How can Prefect 2.0 handle…

Written by Anna Geller