
Orchestrating ELT on Kubernetes with Prefect, dbt & Snowflake (Part 2)

A flow of flows: a guide on how to deploy large-scale data pipelines to production

Anna Geller
9 min read · Jan 4, 2022
Photo by Andrea Piacquadio from Pexels

This article is the second in a series of tutorials about orchestrating ELT data pipelines. The first post demonstrated how to organize and orchestrate a variety of flows written by different teams and how to trigger them in the correct order using Prefect. This post builds on that foundation, covering more advanced use cases and showing how to deploy the entire project to a Kubernetes cluster on AWS.
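To recap the "flow of flows" pattern from the first post, the sketch below shows one way an orchestrator flow can trigger child flows in order with Prefect 1.x. The flow names ("extract_load", "dbt_transform") and the project name ("ELT") are placeholders for illustration, not the exact names used in this series.

```python
# A minimal "flow of flows" sketch, assuming Prefect 1.x (the version used in this series).
# Flow and project names below are hypothetical placeholders.
from prefect import Flow
from prefect.tasks.prefect import create_flow_run, wait_for_flow_run

with Flow("elt-orchestrator") as flow:
    # Trigger the extract & load flow first and wait for it to finish
    el_run = create_flow_run(flow_name="extract_load", project_name="ELT")
    el_done = wait_for_flow_run(el_run, raise_final_state=True)

    # Trigger the dbt transformation flow only after extract & load succeeded
    dbt_run = create_flow_run(flow_name="dbt_transform", project_name="ELT")
    dbt_run.set_upstream(el_done)
    wait_for_flow_run(dbt_run, raise_final_state=True)
```

Because `raise_final_state=True` is set, a failed child flow run fails the orchestrator flow as well, so downstream flows are not triggered.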

Table of contents:
· Snowflake configuration
Creating database credentials
SQLAlchemy connection
Using the connection to load raw data (Extract & Load)
Turning the extract & load script into a Prefect flow
· dbt configuration
· Deploying your flows to a remote Kubernetes cluster on AWS EKS
1. Building a custom Docker image
2. Pushing the image to ECR
3. Creating a demo Kubernetes cluster on AWS EKS
4. Deploying Prefect’s Kubernetes agent
Changing the run configuration in your flows to KubernetesRun
Cleaning up AWS resources that are no longer needed
· Building a repeatable CI/CD process
· Next steps

Snowflake configuration
