Mini Batch, Micro Batch vs Streaming and Co-Existence

by Caroline Maier

October 16, 2020 1:08pm


Check out this brief excerpt from our live discussion between Equalum CEO Nir Livneh and Eckerson President Wayne Eckerson. Here they address the question of Batch vs Streaming a

Wayne: In recent years, we have had mini batch, where you're loading data every 15 minutes, and even micro batch, loading data every 5 minutes or less. You are turning batch into something closer to streaming. I'm wondering, if you are a batch shop, and that's what you know, why wouldn't you stick with batch or micro batch?

Nir: You're looking at batch as "as long as I hit 'x' latency, I'll be fine," but the reality is not that simple. Batch takes data at rest and loads it from scratch, most of the time. That means if you have 1 billion records sitting there at rest, but only 100 records change every minute, then every 5 minutes that you run that micro batch, you will be reloading all 1 billion records. This is not an efficient way to achieve even a latency of 5 minutes - you will overload your system. Streaming is not just about latency, but about an event-driven, or change-data-driven, architecture that essentially takes transactional data and converts it into event-driven ingestion.
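
To put rough numbers on Nir's example, here is a minimal Python sketch comparing how many rows each approach touches per 5-minute cycle. The figures (1 billion records at rest, 100 changes per minute) come from the discussion. The function names are hypothetical, and the sketch assumes each change touches a distinct record and that a full reload reads every row.

```python
# Rows touched per cycle: full-reload micro batch vs. change-driven ingestion.
# Function names are illustrative only; they do not belong to any product API.

def rows_loaded_full_reload(total_rows: int) -> int:
    """A full-reload micro batch re-reads every row at rest, changed or not."""
    return total_rows


def rows_loaded_change_driven(changes_per_minute: int, interval_minutes: int) -> int:
    """Change-driven ingestion only moves the rows that changed in the window.

    Assumes each change touches a distinct record.
    """
    return changes_per_minute * interval_minutes


TOTAL_ROWS = 1_000_000_000   # records at rest (from the discussion)
CHANGES_PER_MINUTE = 100     # change rate (from the discussion)
INTERVAL_MINUTES = 5         # micro batch interval (from the discussion)

full = rows_loaded_full_reload(TOTAL_ROWS)
delta = rows_loaded_change_driven(CHANGES_PER_MINUTE, INTERVAL_MINUTES)

print(f"Full reload per cycle:   {full:,} rows")
print(f"Change-driven per cycle: {delta:,} rows")
print(f"Full reload moves {full // delta:,}x more rows for the same 5-minute latency")
```

Under these assumptions the micro batch moves 1,000,000,000 rows every cycle to pick up 500 changes, which is the overload Nir describes.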

Audience Comment: I don't think there will be a lot of batch processing left in a few years. In manufacturing, the same data is needed for high-latency analytics and low-latency prediction, so streaming covers all use cases.

Wayne: Yes, sounds like a streaming-first architecture. You can do streaming, or do batch if you need to. One platform to support both is the ideal, and I think you would agree with that too, Nir?

Nir: Yes, we live by that rule in our company. To some extent, when you are going to load historical data for the last 3 years, you are always going to do that in batch. Streaming does not make sense there, as there is a very specific tradeoff with throughput as you move to streaming: with event-by-event streaming, you pay in throughput. Batch does have a place in the world, but I do agree that at the end of the day, much will start to move towards streaming, and historical, post-mortem items will likely stay batch - at least for now.
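
One way to picture the throughput tradeoff Nir mentions is a toy cost model in which every write call pays a fixed overhead (for example a network round trip or a commit) on top of a per-record cost. This is an illustrative assumption, not something stated in the discussion. The overhead values below are made up for the sketch and are not measurements of any system.

```python
# Toy cost model (an assumption for illustration, not a benchmark):
# each write call pays a fixed overhead plus a small cost per record.

def records_per_second(batch_size: int,
                       per_call_overhead_s: float,
                       per_record_cost_s: float) -> float:
    """Throughput when records are written in calls of `batch_size` records."""
    time_per_call = per_call_overhead_s + batch_size * per_record_cost_s
    return batch_size / time_per_call


# Hypothetical costs, chosen only to show the shape of the tradeoff.
PER_CALL_OVERHEAD_S = 0.001    # assumed 1 ms fixed cost per call
PER_RECORD_COST_S = 0.00001    # assumed 10 microseconds per record

event_by_event = records_per_second(1, PER_CALL_OVERHEAD_S, PER_RECORD_COST_S)
bulk_batch = records_per_second(10_000, PER_CALL_OVERHEAD_S, PER_RECORD_COST_S)

print(f"Event by event:   {event_by_event:,.0f} records/s")
print(f"Batches of 10000: {bulk_batch:,.0f} records/s")
```

In this model the fixed overhead is paid once per record when writing event by event, but is spread across the whole batch in a bulk load. That is one reason a multi-year historical backfill would favor batch.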

