Anna Geller

5.8K Followers

Published in Dev Genius

Introduction to Kestra, a Declarative Orchestration Alternative to Apache Airflow

Event-driven open-source data orchestrator with a built-in code editor and a rich plugin ecosystem. This post shares the story behind Kestra and explains how you can use it to orchestrate data processing workflows in a declarative way. What is Kestra? Kestra is an open-source, event-driven data orchestrator that strives to make data workflows accessible to a broader audience. The product offers a declarative YAML interface for workflow…

Data Engineering

6 min read


Published in Level Up Coding · Aug 11

Polars, DuckDB, Pandas, Modin, Ponder, Fugue, Daft — which one is the best dataframe and SQL tool?

Comparison of open-source dataframe and SQL frameworks for data engineering, machine learning, and analytics — Tabular format with rows and columns, popularized by relational databases and Microsoft Excel, is an intuitive way of organizing and manipulating data for analytics. There are two main ways of transforming and analyzing tabular data — SQL and dataframes (sorry, Excel!). SQL is a declarative language for querying datasets of…

Data

14 min read


Published in AWS in Plain English · Aug 4

Apache Iceberg Crash Course for AWS users: Amazon S3, Athena & AWS Glue ❤️ Iceberg

How to turn an AWS data lake into a data lakehouse using Iceberg, the open table format. This crash course guides you through getting started with Apache Iceberg on AWS. By the end of this tutorial, you’ll be able to create Iceberg tables, insert and modify data stored in S3 in Parquet format, query data and table metadata in plain SQL, and declaratively manage…

Data Engineering

15 min read


Published in Level Up Coding · Jul 30

DuckDB vs. MotherDuck — should you switch to the Cloud version?

Why and when to use MotherDuck over local DuckDB. MotherDuck has recently launched its managed DuckDB product, which is currently in beta. This post shares how to get started with both DuckDB and MotherDuck, what the key differences between them are, and when to choose each option. Before diving into details, let’s clarify what DuckDB and MotherDuck are. DuckDB …

Data Engineering

8 min read


Published in Level Up Coding · Jul 5

Programmable Data Infrastructure is Finally Within Reach

Everything as Code for Data Infrastructure — Everything as Code (EaC) is a development approach aiming to express not only software but also its infrastructure and configuration in code. Changes to resources are managed programmatically using a Git workflow and a code review process rather than deployed manually. …

Data

6 min read


Published in Level Up Coding · Jun 28

End-to-End Data Ingestion, Transformation and Orchestration with Airbyte, dbt and Kestra

How you can use open-source tools to ingest, transform and orchestrate data pipelines without vendor lock-in — The key benefit of the Modern Data Stack is that you can avoid vendor lock-in by selecting best-of-breed tools rather than paying expensive license fees for one inflexible solution. However, assembling your modular stack based on multiple SaaS solutions will only marginally improve that situation. It’s undoubtedly easier to swap…

Data

5 min read


Published in Dev Genius · Jun 21

From Data Pipeline Purgatory to Production Heaven

Get rid of manual workflow deployments using Infrastructure as Code — You know the story. You write your data pipeline locally. You configure dependencies, and you validate that everything works as expected. If it does, it’s time to submit a pull request and ship it. Right? Well, that’s easier said than done. Moving to production without affecting existing applications and data…

Data Engineering

4 min read


Jun 20

Berlin Buzzwords 2023 — Highlights and Key Takeaways from Day 2

Berlin Buzzwords is Europe’s leading conference about search technologies, modern data infrastructure, and ML. This article continues the post from Day 1 and highlights key takeaways from the second (and last) conference day. Summary of selected talks: at any time, several talks ran in parallel. …

Data

7 min read


Jun 19

Berlin Buzzwords 2023 — Highlights and Key Takeaways from Day 1

Berlin Buzzwords is Europe’s leading conference for modern data infrastructure, search technologies, and ML. The participants are engineers, IT architects, data scientists, and analysts interested in data searchability, AI, and data processing at scale. Top experts present contemporary industry trends and demystify the technology buzzwords. Focusing on open-source software projects…

Data

10 min read


Published in Level Up Coding · Jun 11

When GitHub Actions Get Painful to Troubleshoot, Try This Instead

CI/CD with a GitHub Webhook instead of GitHub Actions — Have you written a GitHub Actions workflow only to discover syntax issues at runtime? Have you ever wished you could validate the syntax and run the CI/CD workflow first locally on your machine? Don’t get me wrong — I love GitHub Actions. They are serverless and have a wide range…

Engineering

5 min read


Anna Geller

Data Engineering, AWS Cloud, Serverless & .py. Get my articles via email https://annageller.medium.com/subscribe YouTube: https://www.youtube.com/@anna__geller

Following
  • Sinem Günel
  • Desiree Peralta
  • Allen Helton
  • Ryan Holiday
  • Kestra
