Topics, partitions and keys
Partitions increase the parallelism of a Kafka topic. Any number of consumers and producers can use the same partition; it's up to the application layer to define the protocol. Kafka guarantees delivery. Regarding the API, you may want to look at the Java docs, as they may be more complete. Based on my experience:
- Partitions start from 0
- Keys may be used to send messages to the same partition, for example via hash(key) % num_partitions. The partitioning logic is pluggable in the Producer: https://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/producer/Partitioner.html
- Yes, but be careful not to end up with a key that results in a "dedicated" partition. For that, you may want a dedicated topic, for example a control topic and a data topic.
- This seems to be the same question as 3.
- I believe consumers should not make assumptions about the data based on the partition. The typical approach is to have a consumer group that reads from multiple partitions of a topic. If you want dedicated channels, it is better (safer and more maintainable) to use separate topics.
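To make the hash(key) % num_partitions idea concrete, here is a minimal sketch in plain Java. It is not Kafka's partitioner: the class name KeyPartitionSketch and the use of Arrays.hashCode as the hash are assumptions made for illustration, and Kafka's built-in partitioner uses its own hash function. The point is only that the same key always lands on the same partition index.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: Arrays.hashCode is a stand-in hash chosen for this
// sketch; it is not the hash that Kafka's default partitioner uses.
final class KeyPartitionSketch {

    // Maps a key to a partition index in [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(Arrays.hashCode(keyBytes), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4; // partitions are numbered 0..3
        for (String key : new String[] {"user-1", "user-2", "user-1"}) {
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

Both "user-1" lines print the same partition. This also shows the risk mentioned above: a key that occurs very often sends all of its traffic to a single partition.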
Does it mean if I want to have more than 1 consumer (from the same group) reading from one topic I need to have more than 1 partition?
Consider the following properties of Kafka:
- each partition is consumed by exactly one consumer in the group
- one consumer in the group can consume more than one partition
- the number of active consumer processes in a group is at most the number of partitions; any extra consumers in the group sit idle
With these properties, Kafka is able to provide both ordering guarantees and load balancing over a pool of consumer processes.
To answer your question: yes, within the same group, if you want N consumers all actively reading, you need at least N partitions.
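The properties above can be modelled with a small sketch. This is not Kafka's real assignor: GroupAssignmentSketch and its simple deal-out strategy are assumptions for illustration. It only shows that each partition gets exactly one owner within a group, and that consumers beyond the partition count receive nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative model of a single consumer group; not Kafka's actual assignor.
final class GroupAssignmentSketch {

    // Deals partitions 0..numPartitions-1 out to the consumers in turn, so
    // every partition has exactly one owner within the group.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return assignment;
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Two partitions, three consumers: the third consumer stays idle.
        System.out.println(assign(Arrays.asList("c1", "c2", "c3"), 2));
        // Five partitions, two consumers: each consumer owns several partitions.
        System.out.println(assign(Arrays.asList("c1", "c2"), 5));
    }
}
```

In the first call the consumer c3 ends up with an empty list, which is why N active consumers in one group need at least N partitions.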
Does it mean I need same amount of partitions as amount of consumers for the same group?
I think this has been explained in the first answer.
How many consumers can read from one partition?
The number of consumers that can read from one partition is always equal to the number of consumer groups subscribing to that topic.
Relationship between keys and partitions with regard to API
First, we must understand that the producer is responsible for choosing which partition within the topic each record is assigned to.
Now, let's see how the producer does so. First, let's look at the class definition of ProducerRecord.java:
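public class ProducerRecord<K, V> {
    private final String topic;
    private final Integer partition;  // null when the caller does not pick a partition
    private final Headers headers;
    private final K key;              // may be null
    private final V value;
    private final Long timestamp;
    // constructors and accessors omitted
}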
Here, the field we need to understand is partition.
From the ProducerRecord docs:
- If a valid partition number is specified, that partition will be used when sending the record.
- If no partition is specified but a key is present, a partition will be chosen using a hash of the key.
- If neither key nor partition is present, a partition will be assigned in a round-robin fashion.
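The three rules above can be sketched as a small decision function. This is a plain-Java illustration, not the Kafka client's code: the class PartitionChoiceSketch, its choose method, and the Arrays.hashCode stand-in hash are all hypothetical, and the real producer's hashing and keyless behaviour may differ between versions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the documented selection order; not the Kafka client.
final class PartitionChoiceSketch {
    private final int numPartitions;
    // Shared counter used only for records with neither partition nor key.
    private final AtomicInteger roundRobin = new AtomicInteger(0);

    PartitionChoiceSketch(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
    }

    // partition and key may each be null, mirroring the nullable fields
    // of ProducerRecord.
    int choose(Integer partition, String key) {
        if (partition != null) {
            // Rule 1: an explicit, valid partition is used as-is.
            if (partition < 0 || partition >= numPartitions) {
                throw new IllegalArgumentException("invalid partition: " + partition);
            }
            return partition;
        }
        if (key != null) {
            // Rule 2: hash the key (stand-in hash, assumed for this sketch).
            byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
            return Math.floorMod(Arrays.hashCode(keyBytes), numPartitions);
        }
        // Rule 3: no partition and no key, so cycle through the partitions.
        return Math.floorMod(roundRobin.getAndIncrement(), numPartitions);
    }

    public static void main(String[] args) {
        PartitionChoiceSketch chooser = new PartitionChoiceSketch(3);
        System.out.println("explicit: " + chooser.choose(2, "ignored-key"));
        System.out.println("keyed:    " + chooser.choose(null, "order-42"));
        System.out.println("keyless:  " + chooser.choose(null, null));
        System.out.println("keyless:  " + chooser.choose(null, null));
    }
}
```

Note that an explicit partition wins even when a key is also supplied, which matches the order in which the docs list the rules.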