KStream batch process windows
My actual task is to push updates from the stream to Redis, but I don't want to read / update / write individually, even though Redis is fast. My solution for now is to use KStream.process() to supply a processor that adds each record to a queue in process() and actually drains the queue in punctuate().
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;

public class BatchedProcessor extends AbstractProcessor<Long, IntentUpdateEvent> {

    private final Writer writer;
    private final long schedulePeriodic;

    BatchedProcessor(Writer writer, long schedulePeriodic) {
        this.writer = writer;
        this.schedulePeriodic = schedulePeriodic;
    }

    @Override
    public void init(ProcessorContext context) {
        super.init(context);
        // schedule punctuate() to fire every schedulePeriodic milliseconds
        context.schedule(schedulePeriodic);
    }

    @Override
    public void punctuate(long timestamp) {
        // drain the queue in one batched write, then commit the offsets
        writer.processQueue();
        context().commit();
    }

    @Override
    public void process(Long key, IntentUpdateEvent intentUpdateEvent) {
        // just enqueue; the actual Redis write happens in punctuate()
        writer.addToQueue(intentUpdateEvent);
    }
}
I still have to test it, but it solves the problem I had. One could easily write such a processor in a very generic way. The API is very neat and clean, but a processBatched((List batchedMessages) -> ..., timeInterval OR countInterval) that just uses punctuate to process the batch, commits at that point, and collects the KeyValues in a store might be a useful addition.
But maybe it was intended that this be solved with a Processor, keeping the API focused purely on low-latency, one-message-at-a-time processing. A generic version might look like the sketch below.
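Here is what I mean by a generic batching processor (hypothetical: BatchHandler is a name I'm introducing, and only the time-interval variant is shown; the in-memory buffer is not fault-tolerant, so a state store would be needed to survive restarts):

import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;

// Collects records in memory and hands them to a callback on every
// punctuate() tick, committing afterwards.
public class BatchingProcessor<K, V> extends AbstractProcessor<K, V> {

    public interface BatchHandler<V> {
        void handle(List<V> batch);
    }

    private final BatchHandler<V> handler;
    private final long intervalMs;
    private final List<V> buffer = new ArrayList<>();

    public BatchingProcessor(BatchHandler<V> handler, long intervalMs) {
        this.handler = handler;
        this.intervalMs = intervalMs;
    }

    @Override
    public void init(ProcessorContext context) {
        super.init(context);
        context.schedule(intervalMs);
    }

    @Override
    public void process(K key, V value) {
        buffer.add(value);
    }

    @Override
    public void punctuate(long timestamp) {
        if (!buffer.isEmpty()) {
            // hand over a copy so the handler can keep it around
            handler.handle(new ArrayList<>(buffer));
            buffer.clear();
        }
        context().commit();
    }
}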
Right now (as of Kafka 0.10.0.0 / 0.10.0.1): The windowing behavior you are describing is "working as expected". That is, if you are getting 1,000 incoming messages, you will (currently) always see 1,000 updates going downstream with the latest versions of Kafka / Kafka Streams.
Looking ahead: The Kafka community is working on new features to make this update-rate behavior more flexible (e.g. to allow for what you described above as your desired behavior). See KIP-63: Unify store and downstream caching in streams for more details.
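As a pointer for later versions: once KIP-63 landed (Kafka 0.10.1+), the downstream update rate became tunable via the record cache, roughly like this (a sketch; check the docs of your release for the exact semantics):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
// A larger cache means fewer, more consolidated downstream updates (KIP-63).
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 10 * 1024 * 1024L); // 10 MB
// The cache is flushed at least every commit interval.
props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 30_000L);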