Using Kafka as a (CQRS) Eventstore. Good idea?

Kafka is meant to be a messaging system which has many similarities to an event store however to quote their intro:

The Kafka cluster retains all published messages—whether or not they have been consumed—for a configurable period of time. For example if the retention is set for two days, then for the two days after a message is published it is available for consumption, after which it will be discarded to free up space. Kafka's performance is effectively constant with respect to data size so retaining lots of data is not a problem.

So while messages can potentially be retained indefinitely, the expectation is that they will be deleted. This doesn't mean you can't use this as an event store, but it may be better to use something else. Take a look at EventStoreDB for an alternative.

UPDATE

Kafka documentation:

Event sourcing is a style of application design where state changes are logged as a time-ordered sequence of records. Kafka's support for very large stored log data makes it an excellent backend for an application built in this style.

UPDATE 2

One concern with using Kafka for event sourcing is the number of required topics. Typically in event sourcing, there is a stream (topic) of events per entity (such as user, product, etc). This way, the current state of an entity can be reconstituted by re-applying all events in the stream. Each Kafka topic consists of one or more partitions and each partition is stored as a directory on the file system. There will also be pressure from ZooKeeper as the number of znodes increases.


I am one of the original authors of Kafka. Kafka will work very well as a log for event sourcing. It is fault-tolerant, scales to enormous data sizes, and has a built in partitioning model.

We use it for several use cases of this form at LinkedIn. For example our open source stream processing system, Apache Samza, comes with built-in support for event sourcing.

I think you don't hear much about using Kafka for event sourcing primarily because the event sourcing terminology doesn't seem to be very prevalent in the consumer web space where Kafka is most popular.

I have written a bit about this style of Kafka usage here.