Kafka Streams - joining two KTables invokes join function twice
I got the following explanation after posting a similar question to the Confluent mailing group.
I think this might be related to caching. The caches for the 2 tables are flushed independently, so there is a chance you will get the same record twice. If stream1 and stream2 both receive a record for the same key, and the cache flushes, then:
The cache from stream1 will flush, perform the join, and produce a record.
The cache from stream2 will flush, perform the join, and produce a record.
Technically this is ok as the result of the join is another KTable, so the value in the KTable will be the correct value.
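To make the explanation above concrete, here is a toy model in plain Java. It is not Kafka Streams code: the two maps stand in for the two table stores, `join` is a hypothetical value joiner, and I am assuming a left or outer join so that the joiner is still called when one side is missing. It only illustrates why two independent flushes can emit the same joined value twice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CacheFlushSketch {
    // Hypothetical value joiner; concatenating null yields the text "null".
    static String join(String leftValue, String rightValue) {
        return leftValue + "+" + rightValue;
    }

    // Returns the join results emitted for key "k" in this toy model.
    static List<String> run(boolean cachingEnabled) {
        Map<String, String> left = new HashMap<>();   // stand-in for table 1
        Map<String, String> right = new HashMap<>();  // stand-in for table 2
        List<String> emitted = new ArrayList<>();

        if (cachingEnabled) {
            // Both updates are buffered before either cache flushes.
            left.put("k", "A");
            right.put("k", "B");
            emitted.add(join(left.get("k"), right.get("k"))); // flush of cache 1
            emitted.add(join(left.get("k"), right.get("k"))); // flush of cache 2
        } else {
            // Without caching, each update triggers the join immediately.
            left.put("k", "A");
            emitted.add(join(left.get("k"), right.get("k"))); // right side not there yet
            right.put("k", "B");
            emitted.add(join(left.get("k"), right.get("k")));
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(run(true));   // [A+B, A+B]
        System.out.println(run(false));  // [A+null, A+B]
    }
}
```

In both cases the last emitted value is the same, which is why the resulting table ends up correct either way.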
After setting StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG to 0, the issue was resolved. I still get 2 records, but now one of them is joined with null, which is much clearer behavior and matches the join semantics document that was provided above.
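For reference, a minimal sketch of how that setting might be applied. To keep it runnable without the Kafka dependency, it uses the literal key "cache.max.bytes.buffering", which is the string behind StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG. The application id and bootstrap server are placeholder values I made up, not from my actual setup.

```java
import java.util.Properties;

class StreamsCacheConfigSketch {
    static Properties buildConfig() {
        Properties props = new Properties();
        // Placeholder values, replace with your own.
        props.put("application.id", "ktable-join-demo");
        props.put("bootstrap.servers", "localhost:9092");
        // Same key as StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG.
        // 0 disables record caching, so every update is forwarded downstream.
        props.put("cache.max.bytes.buffering", 0);
        return props;
    }

    public static void main(String[] args) {
        System.out.println(buildConfig().get("cache.max.bytes.buffering"));
    }
}
```

In a real application these properties would be passed to the KafkaStreams constructor together with the topology. Disabling the cache means more records are sent downstream, so check the StreamsConfig documentation for your version before using it in production.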