Friday, February 26, 2021

Kafka 101

When I interview candidates, many of them mention Kafka, so I ask them about its architecture. More often than not, they cannot answer properly or completely, so here is a summary of some key Kafka concepts.

A Kafka cluster typically consists of multiple brokers. In ZooKeeper-based deployments, the brokers use ZooKeeper to maintain the cluster state. ZooKeeper is also used to elect the controller broker, which in turn assigns the leader for each partition.

Producers push messages to brokers, and consumers pull messages from them. Brokers are stateless with respect to consumers: they do not track what each consumer has read. Instead, each consumer keeps track of how many messages it has consumed by recording its offset in every partition it reads.
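To make the consumer-side bookkeeping concrete, here is a minimal sketch in plain Python. It does not use a Kafka client; `ToyConsumer` and its `poll` method are hypothetical names, and a list stands in for one partition.

```python
class ToyConsumer:
    """Toy consumer that remembers its own position in one partition.

    Hypothetical illustration only, not a Kafka client API.
    """

    def __init__(self, partition_log):
        self.partition_log = partition_log  # plain list standing in for a partition
        self.offset = 0                     # offset of the next message to read

    def poll(self, max_records=2):
        """Return up to max_records messages and advance the stored offset."""
        records = self.partition_log[self.offset:self.offset + max_records]
        self.offset += len(records)
        return records


log = ["m0", "m1", "m2", "m3", "m4"]
consumer = ToyConsumer(log)
first_batch = consumer.poll()
second_batch = consumer.poll()
print(first_batch, second_batch, consumer.offset)
```

The log itself never changes when it is read; only the consumer's `offset` moves, which is why the reader, not the log, has to remember the position.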

Kafka has four core APIs: the Producer API, the Consumer API, the Streams API, and the Connect API. Newer versions of the Kafka documentation also list an Admin API.

A Kafka topic is a logical channel to which producers publish messages and from which consumers receive them. Within a Kafka cluster, a topic is identified by its name, which must be unique. Kafka imposes no fixed limit on the number of topics, although every topic adds some overhead to the cluster.

Topics are split into partitions, and the partitions are replicated across brokers. There is no fixed limit on the number of partitions either. Within one partition, messages are stored in sequence, and each message is assigned an incremental ID called its offset. Offsets are only meaningful within a single partition.
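A partition can be pictured as an append-only list in which a message's offset is simply its position. The sketch below is a toy model in plain Python; `ToyPartition` is a hypothetical name and nothing here is Kafka's actual storage code.

```python
class ToyPartition:
    """Append-only log: the offset of a message is its index in the list."""

    def __init__(self):
        self.records = []

    def append(self, message):
        """Store the message and return the offset it was assigned."""
        offset = len(self.records)
        self.records.append(message)
        return offset

    def read(self, offset):
        """Return the message stored at the given offset."""
        return self.records[offset]


# A hypothetical topic with three partitions.
partitions = [ToyPartition() for _ in range(3)]
offsets_p0 = [partitions[0].append(m) for m in ("a", "b", "c")]
offsets_p1 = [partitions[1].append(m) for m in ("x",)]
print(offsets_p0, offsets_p1)
```

Each partition counts from zero on its own, so an offset only identifies a message together with its partition.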

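Replication takes place at the partition level only. For a given partition, exactly one broker is the leader at any time. The other brokers that hold a copy are followers, and the followers that have kept up with the leader form the set of in-sync replicas (ISR).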
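The leader and follower roles can be sketched with a toy model. This is a simplification under stated assumptions: the class and broker names are hypothetical, and a replica counts as in sync here only if its log equals the leader's, whereas real Kafka decides this by how long a follower has lagged (the `replica.lag.time.max.ms` broker setting).

```python
class ToyReplicatedPartition:
    """One partition with a leader log and follower copies (toy model)."""

    def __init__(self, leader, followers):
        self.leader = leader
        self.logs = {broker: [] for broker in [leader, *followers]}

    def produce(self, message):
        """Writes go to the leader's log only."""
        self.logs[self.leader].append(message)

    def fetch(self, follower):
        """The follower copies the leader's current log."""
        self.logs[follower] = list(self.logs[self.leader])

    def in_sync_replicas(self):
        """Brokers whose log matches the leader's (simplified ISR rule)."""
        leader_log = self.logs[self.leader]
        return sorted(b for b, log in self.logs.items() if log == leader_log)


partition = ToyReplicatedPartition("broker-1", ["broker-2", "broker-3"])
partition.produce("m0")
partition.fetch("broker-2")          # broker-3 has not fetched yet
isr = partition.in_sync_replicas()
print(isr)
```

In this run `broker-3` drops out of the in-sync set until it fetches again.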

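If we attach a key to a message, all messages with the same key end up in the same partition, as long as the number of partitions does not change. Since order is preserved within a partition, this gives Kafka an ordering guarantee per key. Without a key, the producer spreads messages across partitions (for example, round-robin), so no ordering across them is guaranteed.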
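The idea behind key-based routing is "hash the key, then take it modulo the partition count". The sketch below uses `zlib.crc32` as a stand-in hash; that is an assumption made for illustration (Kafka's default partitioner uses a murmur2 hash), and `choose_partition` is a hypothetical helper, not a Kafka API.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition: same key and same count give the same partition."""
    return zlib.crc32(key) % num_partitions


keys = [b"user-1", b"user-2", b"user-1", b"user-3", b"user-1"]
assignments = [(key, choose_partition(key, 4)) for key in keys]

# Every message keyed b"user-1" lands in one and the same partition.
user1_partitions = {p for key, p in assignments if key == b"user-1"}
print(assignments, user1_partitions)
```

Because the result depends on `num_partitions`, adding partitions to a topic can send an existing key to a different partition, which is why the guarantee only holds while the partition count stays fixed.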

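A consumer group can have multiple consumer processes or instances running. Within a group, each partition is consumed by only one consumer at a time, so the members of the group share the work of reading a topic.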
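As a rough sketch of how a group's members can divide a topic, assume each partition is handed to exactly one consumer in the group, in round-robin order. `assign_round_robin` is a hypothetical helper; Kafka ships several assignment strategies and this only imitates the simplest idea.

```python
def assign_round_robin(partitions, consumers):
    """Give each partition to exactly one consumer, cycling through the group."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Five partitions shared by a group of two consumers.
assignment = assign_round_robin([0, 1, 2, 3, 4], ["c1", "c2"])

# More consumers than partitions: the extra consumer gets nothing to read.
crowded = assign_round_robin([0, 1], ["c1", "c2", "c3"])
print(assignment, crowded)
```

Under this assumption, running more consumers than there are partitions does not add throughput, because the extra instances stay idle.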
