How to pass runtime-specific parameter values to your data pipelines

Parametrization is one of the most critical features of any modern workflow orchestration solution. It lets you dynamically override parameter values for a given run without having to redeploy your workflow. Most orchestration frameworks provide rather limited functionality in that regard, such as only allowing you to override global variables…

How to use Prefect and dbt Cloud with a Snowflake data warehouse

This is the third post in a series of articles about orchestrating ELT data pipelines with Prefect. The first post dealt with organizing a project and orchestrating flows in a local environment. The second post discussed deploying the ELT project to Snowflake and AWS EKS, and building a CI/CD process. In…

A flow of flows: a guide on how to deploy large-scale data pipelines to production

This article is the second in a series of tutorials about orchestrating ELT data pipelines. The first post demonstrated how to organize and orchestrate a variety of flows written by different teams and how to trigger those in the correct order using Prefect. This post builds on that by capturing…

Build real-time analytics fast with Timestream and Grafana

The previous post demonstrated how to ingest data into Timestream using boto3, awswrangler, and the CLI.

In this post, we’ll dive deeper into using time-series functions in Timestream’s query language, and we’ll visualize the data using Grafana. The end goal is to have a Grafana dashboard populated with new data…

Can a more flexible workflow orchestration system handle both paradigms?

Many believe that streaming technologies are the only way to achieve real-time analytics. Since most data workloads these days are orchestrated through batch-processing platforms, the real-time requirement forces data teams to adopt a new set of tools. However, maintaining two separate systems for batch processing and real-time streaming introduces additional…

How to manage dependencies between data pipelines

Illustration of an ELT flow using Prefect and dbt

Workflow orchestration platforms have historically allowed managing task dependencies within individual data pipelines. While this is a good start, what if you have dependencies between data pipelines?

Say you have some flows or directed acyclic graphs (DAGs) that ingest operational data from various sources into the staging area of your…

How to ingest data into the AWS serverless time-series database

Timestream is a serverless time-series database service offered by AWS. It can be used for operational analytics, IoT device monitoring, financial forecasting, and many more use cases that deal with time-series data. For more background on this service, check out my previous article:

In this post, we will dive deeper…

Serverless Data Engineering Pipelines in Python

Prefect is a flexible tool to orchestrate the modern data stack. In contrast to many other solutions on the market, it doesn't tie you to any specific execution framework or cloud provider, whether you want to use , , a bare-metal server, or an on-demand distributed…

Including AWS best practices to avoid them

In this article, we’ll discuss potential pitfalls that we came across when configuring ECS task definitions. While considering this AWS-specific container management platform, we’ll also examine some general best practices for working with containers in production.

Some activities we don’t spend enough time on

Engineering time is a scarce resource. We have to balance many tasks and often conflicting priorities. However, there are some activities that deserve a larger share of that time. In this article, we’ll look at ten of them.

1. Backups and Preventing Accidental Deletion

Have you ever deleted something prematurely only to figure…

Anna Geller

Data Engineer, . in BI, AWS Certified Solution Architect: .
