Kafka architecture diagram
Kafka core controller

Definition

In a Kafka cluster, one broker is elected as the controller, which is responsible for managing the state of all partitions and replicas in the cluster.
Duties

- Monitor broker changes by watching the /brokers/ids node in Zookeeper
- Monitor topic changes in real time by watching the /brokers/topics node in Zookeeper
- Manage topic, partition, and broker related information
- Update cluster metadata and synchronize it to the other broker nodes
Election process

Controller election relies on Zookeeper's ephemeral nodes. When the Kafka cluster starts, every broker tries to become the controller by creating the controller node in Zookeeper. Because of how Zookeeper works, creating a node that already exists fails, so only one broker succeeds; that broker becomes the controller, and all the other brokers set a watch on the controller node.

Since the controller node is ephemeral, when the controller broker goes down its Zookeeper session is disconnected and the node disappears. Once the other brokers are notified that the controller node is gone, they compete again to create it and elect a new controller.
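The "first creator wins" behavior can be sketched with a toy stand-in for Zookeeper. This is purely illustrative: `FakeZookeeper` is a hypothetical class, and a real cluster would use an actual Zookeeper client with an ephemeral node.

```python
# Toy simulation of controller election via an "ephemeral znode":
# the first broker to create /controller wins; later attempts fail.
class FakeZookeeper:
    def __init__(self):
        self.nodes = {}

    def create_ephemeral(self, path, owner):
        if path in self.nodes:
            raise FileExistsError(f"{path} already exists")
        self.nodes[path] = owner

def elect_controller(zk, broker_ids):
    controller = None
    for broker in broker_ids:
        try:
            zk.create_ephemeral("/controller", broker)
            controller = broker          # first creator becomes controller
        except FileExistsError:
            pass                         # the rest just watch the node
    return controller

zk = FakeZookeeper()
print(elect_controller(zk, [2, 0, 1]))   # broker 2 registered first -> 2
```

When the session of the winning broker expires, the node vanishes and the surviving brokers simply race to create it again.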
Partition replica leader election

When the controller detects that the broker hosting a partition's replica leader has gone down, it picks the broker hosting the first replica in that partition's replica list as the new leader, provided that broker is in the ISR (the set of in-sync, surviving replicas). If the first replica does not qualify, it tries the second, then the third, and so on until one does. (One might wonder why Kafka's authors don't simply pick directly from the ISR set; walking the replica list adds an extra step.)
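The selection rule just described, walking the replica list in order and taking the first broker that is also in the ISR, can be sketched as:

```python
def elect_leader(replicas, isr):
    """Pick the first replica (in assignment order) that is in the ISR.
    Mirrors the described behaviour: walk the replica list, not the ISR."""
    for broker in replicas:
        if broker in isr:
            return broker
    return None  # no in-sync replica is available

# replica list [1, 2, 3]; broker 1 fell out of the ISR, so broker 2 leads
print(elect_leader([1, 2, 3], {2, 3}))  # -> 2
```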
How does Kafka record the offsets consumed by consumers?

The offset each consumer has consumed in each partition is recorded in Kafka's internal topic __consumer_offsets. Kafka creates 50 partitions for this topic by default to handle high concurrency. When an offset is committed to this topic, the key is the consumer group ID + topic + partition number, and the value is the current offset. Which partition Kafka sends this record to is determined by the formula:

hash(consumer group id) % number of __consumer_offsets partitions (50 by default)

A consumer reads its committed offset from here before it starts consuming.
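A sketch of that placement formula, reproducing Java's String.hashCode in Python so the result matches what the broker would compute (the non-negative masking mirrors how Kafka makes the hash positive before the modulo):

```python
def offsets_partition(group_id, num_partitions=50):
    """Which __consumer_offsets partition holds a group's offsets:
    Java String.hashCode, made non-negative, modulo the partition count."""
    h = 0
    for ch in group_id:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF   # 32-bit Java hashCode
    return (h & 0x7FFFFFFF) % num_partitions  # drop the sign bit, then mod

print(offsets_partition("a"))   # hashCode("a") = 97 -> 97 % 50 = 47
print(offsets_partition("my-consumer-group"))
```

Because the formula is deterministic, every member of the same group always commits to (and reads from) the same partition.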
What is consumer rebalance?

Definition

When a consumer in a consumer group crashes or leaves, the partitions assigned to it are automatically reassigned to the other consumers in the group. Likewise, when a new consumer joins the group, a rebalance operation is triggered.
Rebalance process

- Select the group coordinator (GroupCoordinator)
  Because the __consumer_offsets partition that a consumer group commits its offsets to is fixed (computed as hash(consumer group id) % number of __consumer_offsets partitions), Kafka simply uses the broker hosting the replica leader of that partition as the group coordinator.
- Join the consumer group (JoinGroup)
  After the group coordinator is determined, each consumer sends a JoinGroupRequest to the GroupCoordinator and processes the response. The GroupCoordinator chooses one consumer, the first to join the group, as the group leader, sends it the consumer group information, and the leader is then responsible for working out the partition assignment plan.
- Synchronize the group (SyncGroup)
  The leader consumer sends a SyncGroupRequest to the GroupCoordinator, which then delivers the partition assignment plan to each consumer. Each consumer then connects to the leader broker of its assigned partitions and starts consuming.
Rebalance partition allocation strategies

The allocation strategy is set with the consumer client parameter partition.assignment.strategy; the default is range.
- range
  Suppose there are 10 partitions and 4 consumers. First compute the average number of partitions per consumer: 10 / 4 = 2 with a remainder of 2, so each consumer gets two partitions and the remaining two go to the first consumers in order. The final assignment:
  first consumer: 0, 1, 2
  second consumer: 3, 4, 5
  third consumer: 6, 7
  fourth consumer: 8, 9
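The range strategy above can be sketched in a few lines (the consumer names `c1`..`c4` are just illustrative labels):

```python
def range_assign(num_partitions, consumers):
    """Range strategy sketch: each consumer gets floor(n/c) partitions,
    and the first (n mod c) consumers each get one extra."""
    per, extra = divmod(num_partitions, len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        count = per + (1 if i < extra else 0)
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment

print(range_assign(10, ["c1", "c2", "c3", "c4"]))
# {'c1': [0, 1, 2], 'c2': [3, 4, 5], 'c3': [6, 7], 'c4': [8, 9]}
```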
- round-robin (polling allocation)
  Partitions are dealt out one at a time. With the same 10 partitions and 4 consumers:
  first consumer: 0, 4, 8
  second consumer: 1, 5, 9
  third consumer: 2, 6
  fourth consumer: 3, 7
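Round-robin is the simplest of the three to sketch: partition p goes to consumer p mod c.

```python
def round_robin_assign(num_partitions, consumers):
    """Round-robin sketch: deal partitions to consumers one at a time."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

print(round_robin_assign(10, ["c1", "c2", "c3", "c4"]))
# {'c1': [0, 4, 8], 'c2': [1, 5, 9], 'c3': [2, 6], 'c4': [3, 7]}
```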
- sticky
  - Distribute partitions as evenly as possible
  - Keep each assignment as close as possible to the previous one
  If the two rules conflict, the first takes priority. Suppose the current assignment is:
  first consumer: 0, 4, 8
  second consumer: 1, 5, 9
  third consumer: 2, 6
  fourth consumer: 3, 7
  If the fourth consumer now goes down, the reassignment is:
  first consumer: 0, 4, 8, 7
  second consumer: 1, 5, 9
  third consumer: 2, 6, 3
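A simplified sketch of the sticky idea (not the real assignor, which is considerably more involved): survivors keep what they have, and the dead consumer's partitions go one by one to whichever survivor currently holds the fewest.

```python
def sticky_reassign(assignment, dead_consumer):
    """Sticky sketch: survivors keep their partitions; orphaned partitions
    each go to the survivor that currently has the fewest partitions."""
    orphaned = assignment.pop(dead_consumer)
    for p in orphaned:
        target = min(assignment, key=lambda c: len(assignment[c]))
        assignment[target].append(p)
    return assignment

current = {"c1": [0, 4, 8], "c2": [1, 5, 9], "c3": [2, 6], "c4": [3, 7]}
print(sticky_reassign(current, "c4"))
# {'c1': [0, 4, 8, 7], 'c2': [1, 5, 9], 'c3': [2, 6, 3]}
```

Note that no surviving consumer loses a partition it already owned, which is exactly the "stickiness" that minimizes disruption during a rebalance.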
Producer publishing message flow

- Write mode
  The producer pushes messages to the broker; each message is appended to a partition, which makes writes sequential on disk.
- The broker publishes a message to a partition according to the following rules:
  - If a partition is specified, the message is sent directly to that partition
  - If no partition is specified but a key is, the partition is chosen by hashing the key modulo the number of partitions
  - If neither a key nor a partition is specified, partitions are chosen round-robin
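The three rules above can be sketched as a single function. This is an illustration, not the real partitioner: actual Kafka producers hash keys with murmur2, and crc32 is used here only as a stable stand-in.

```python
import zlib
from itertools import count

def choose_partition(num_partitions, key=None, partition=None, _rr=count()):
    """Sketch of the three partition-selection rules above."""
    if partition is not None:
        return partition                                  # rule 1: explicit
    if key is not None:
        return zlib.crc32(key.encode()) % num_partitions  # rule 2: key hash
    return next(_rr) % num_partitions                     # rule 3: round-robin

print(choose_partition(6, partition=3))   # -> 3
print(choose_partition(6, key="user-1"))  # same partition on every call
print(choose_partition(6), choose_partition(6))  # keyless: rotates 0, 1, ...
```

Keyed messages always land on the same partition, which is what gives Kafka per-key ordering.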
- Sending process
  The important parameter acks:
  - acks=0: the producer sends the message and does not wait for any response; the message is considered sent successfully as soon as it leaves
  - acks=1: after the producer sends the message to the leader and the leader writes it to disk, the leader responds to the producer with an ACK; the producer then considers the send successful
  - acks=all (or -1): after the leader writes to disk, it also waits for ACKs from the other replicas; only when all of them have written successfully does the leader ACK the producer, and only then does the producer consider the send successful
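As a configuration sketch, here is how the acks trade-off looks with the kafka-python client (the client library, topic name, and broker address are all assumptions for illustration; this fragment needs a running broker to actually execute):

```python
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed broker address
    acks="all",  # 0: fire-and-forget, 1: leader-only ack, "all"/-1: full ISR ack
)
producer.send("my-topic", b"hello")  # "my-topic" is an illustrative name
producer.flush()                     # block until the broker acknowledges
```

Higher acks settings trade throughput and latency for durability: acks=0 is fastest but can lose messages, while acks=all survives the loss of the leader.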
- The producer finds the broker id of the replica leader from the partition's state node in Zookeeper
- The producer sends the message to the leader
- The leader writes the message to its log
- The followers pull the message from the leader, write it to their local logs, and reply to the leader with an ACK
- After the leader has received ACKs from all replicas in the ISR, it advances the HW and sends an ACK to the producer
What are HW and LEO?

HW (high watermark) is the maximum offset that consumers are allowed to consume. LEO (log end offset) is the largest offset visible inside the broker. So how do these offsets move?

Normally the LEO and the HW are the same. When a new message arrives at the leader, the leader's LEO increases, so the LEO and HW now differ. The replicas then start pulling the message from the leader, and each replica's LEO increases in turn. Once the last replica has synchronized the message, the HW catches up with the LEO and they are equal again. The purpose of this is data consistency: a message becomes visible to consumers only after replication has completed.