You No Longer Need Two Separate Systems for Batch Processing and Streaming
Can a more flexible workflow orchestration system handle both paradigms?
Many believe that streaming technologies are the only way to achieve real-time analytics. Since most data workloads these days are orchestrated through batch-processing platforms, the real-time requirement forces data teams to adopt a new set of tools. However, maintaining two separate systems for batch processing and real-time streaming adds operational overhead and cost.
This post offers an alternative approach that allows you to handle both batch processing and real-time streaming pipelines from a single orchestration platform.
Understanding the problem
In the past, the data industry tried to address this problem by introducing new architectures that separate the streaming layer from the batch and serving layers (think: Lambda and Kappa architectures). But are the batch and streaming paradigms inherently different? In practice, both collect and process data for use in downstream applications. The only real distinction between batch and stream processing is that they operate at different time intervals. For a long time, the inflexibility of batch orchestrators and schedulers kept forcing…