What does it mean to 'flush to disk'?

All data is immediately written to a persistent log on the filesystem without necessarily flushing to disk. In effect this just means that it is transferred into the kernel's pagecache.

What this means is that Kafka hands data off to the kernel with write() syscalls -- at which point in time it's visible to other processes but may or may not actually be reflected on disk and survive a reboot -- but doesn't force the kernel to rush it to disk with fsync() calls or similar (as appropriate for the OS at hand). If optimizing for throughput and not needing to guarantee that content is retrievable, this can be an appropriate decision: fsync() and its kin can be expensive calls (though by doing long contiguous writes that don't require seeking, kafka minimizes the expense of its disk IO).

Tags:

Memory

Caching