You No Longer Need Two Separate Systems for Batch Processing and Streaming

Can a more flexible workflow orchestration system handle both paradigms?

Anna Geller
7 min read · Nov 30, 2021

Many believe that streaming technologies are the only way to achieve real-time analytics. Since most data workloads these days are orchestrated through batch-processing platforms, the real-time requirement forces data teams to adopt a new set of tools. However, maintaining two separate systems for batch processing and real-time streaming introduces additional burdens and costs.

This post offers an alternative approach that allows you to handle both batch processing and real-time streaming pipelines from a single orchestration platform.

Understanding the problem

In the past, the data industry tried to address this problem with architectures that separate the streaming layer from the batch and serving layers (think: Lambda and Kappa architectures). But are the batch and streaming paradigms inherently different? In practice, both collect and process data for use in downstream applications. The only real distinction between batch and stream processing is that they operate at slightly different time intervals. For a long time, the inflexibility of batch orchestrators and schedulers kept forcing…

