Distributed topics | I have been wrestling with Kafka's design principles lately, to the point of nausea

Kafka architecture diagram

[Figure: Kafka architecture diagram]

Kafka's core controller

Definition

In a Kafka cluster, one broker is elected as the controller. It is responsible for managing the state of all partitions and replicas in the cluster.

Responsibilities

  1. Watch the /brokers/ids node in ZooKeeper to detect brokers joining or leaving
  2. Watch the /brokers/topics node in ZooKeeper to detect topic changes in real time
  3. Manage topic, partition, and broker information
  4. Update the cluster metadata and synchronize it to the other brokers

Election process

Controller election relies on ZooKeeper's ephemeral nodes. When the Kafka cluster starts, every broker tries to become the controller by creating the /controller node in ZooKeeper. Because ZooKeeper refuses to create a node that already exists, only one broker succeeds, and that broker becomes the controller. The other brokers set a watch on the /controller node.


Because /controller is an ephemeral node, when the controller broker goes down its ZooKeeper session ends and the node disappears. The other brokers, which are watching the node, notice that it is gone and compete again to create it; the winner becomes the new controller.
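
The election can be sketched with an in-memory stand-in for ZooKeeper. This is only an illustration, not Kafka's code: the FakeZooKeeper class, its method names, and the broker ids are made up for the example, and real brokers react to watch notifications instead of being looped over.

```python
class FakeZooKeeper:
    """In-memory stand-in for ZooKeeper; it only models ephemeral nodes."""

    def __init__(self):
        self.nodes = {}  # path -> id of the session that owns the node

    def create_ephemeral(self, path, owner):
        """Create the node; fail if it already exists, as ZooKeeper does."""
        if path in self.nodes:
            return False
        self.nodes[path] = owner
        return True

    def close_session(self, owner):
        """Ending a session removes every ephemeral node it created."""
        self.nodes = {p: o for p, o in self.nodes.items() if o != owner}


def elect_controller(zk, live_brokers):
    """Every live broker races to create /controller; return the winner."""
    for broker_id in live_brokers:
        zk.create_ephemeral("/controller", broker_id)
    return zk.nodes["/controller"]


zk = FakeZooKeeper()
brokers = [2, 0, 1]                              # arrival order decides the race
controller = elect_controller(zk, brokers)       # broker 2 creates the node first

zk.close_session(controller)                     # the controller broker goes down
brokers.remove(controller)
new_controller = elect_controller(zk, brokers)   # the remaining brokers race again
```

In this run broker 2 wins the first race and broker 0 takes over once broker 2's session is closed.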

Partition replica leader election

When the controller detects that the broker hosting a partition's leader replica is down, it walks the partition's replica list in order and picks the first replica whose broker is in the partition's ISR (the set of in-sync replicas, that is, the replicas that are alive and caught up). If the first replica does not qualify, it tries the second, then the third, and so on until one does. I don't know why Kafka's authors don't simply pick from the ISR directly; this looks like one extra step.
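
A minimal sketch of this selection rule, assuming the replica list, the ISR, and the set of live brokers are available as plain Python collections. The function name and the sample broker ids are hypothetical, and the explicit liveness check is an assumption added for the example.

```python
def select_leader(assigned_replicas, isr, live_brokers):
    """Return the first replica in list order that is alive and in the ISR."""
    for broker_id in assigned_replicas:
        if broker_id in isr and broker_id in live_brokers:
            return broker_id
    return None  # no eligible replica: the partition stays without a leader


# Broker 1 was the leader and has just died; broker 2 is alive but not in sync.
new_leader = select_leader(
    assigned_replicas=[1, 2, 3],
    isr={1, 3},
    live_brokers={2, 3},
)
```

Here the choice falls on broker 3. One possible reason for walking the replica list rather than the ISR is that the replica list has a fixed order, so the result is deterministic and favors the replicas listed first.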

How are consumed offsets recorded?

The offset each consumer has reached in a partition is recorded in Kafka's internal topic __consumer_offsets. By default Kafka creates 50 partitions for this topic so that it can absorb a high volume of concurrent commits. In each record committed to this topic, the key is consumer group ID + topic + partition number and the value is the current offset. Which partition of __consumer_offsets the record goes to is determined by the following formula:
hash(consumer group id) % number of partitions of __consumer_offsets (50 by default)
A consumer fetches its committed offset from here when it starts consuming a partition, for example after a restart or a rebalance.
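
As a rough sketch of the formula, the snippet below assumes the hash is Java's String.hashCode made non-negative; treat that as an assumption, not a specification. The function names are invented for the example, and the hash emulation only matches Java for characters in the Basic Multilingual Plane.

```python
def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode as a signed 32-bit integer."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h


def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    """Map a consumer group id to a partition of __consumer_offsets."""
    h = java_string_hash(group_id)
    # Assumed handling of negative hashes: absolute value, with the one
    # value that has no positive counterpart mapped to 0.
    non_negative = 0 if h == -2**31 else abs(h)
    return non_negative % num_partitions


partition = offsets_partition_for("order-service")  # hypothetical group id
```

Because the result depends only on the group id, every commit from the same group lands in the same partition.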

What is a consumer rebalance?

Definition

When a consumer in a consumer group crashes or leaves, the partitions assigned to it are automatically reassigned to the other consumers in the group. A rebalance is also triggered when a new consumer joins the group.

Rebalance process

  1. Select the group coordinator (GroupCoordinator)
    The partition of __consumer_offsets that a consumer group commits its offsets to is fixed: it is computed by the formula hash(consumer group id) % number of partitions of __consumer_offsets. Kafka therefore simply uses the broker hosting that partition's leader replica as the group coordinator.

  2. Join the consumer group (JOIN GROUP)
    Once the group coordinator is known, the consumers join the group. Each consumer sends a JoinGroupRequest to the GroupCoordinator and processes the response. The GroupCoordinator picks the first consumer that joined the group as the leader (the consumer leader, not to be confused with the group coordinator) and sends it the group membership information. The leader is then responsible for drawing up the partition plan.

  3. Synchronize the group (SYNC GROUP)
    The consumer leader sends the partition plan to the GroupCoordinator in a SyncGroupRequest, and the GroupCoordinator delivers the plan to each consumer. Each consumer then connects to the leader broker of its assigned partitions and starts consuming messages.
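
The join and sync steps can be sketched with a toy coordinator. Everything here is hypothetical (class, method names, member ids, the simple partition plan); the real protocol is asynchronous and holds responses until the whole group has joined, which this sketch does not model.

```python
class ToyGroupCoordinator:
    """Toy coordinator: tracks members and relays the leader's plan."""

    def __init__(self):
        self.members = []  # member ids in join order
        self.plan = {}     # member id -> list of partitions

    def join_group(self, member_id):
        """JOIN GROUP: register the member and report who the leader is."""
        self.members.append(member_id)
        return self.members[0]  # the first consumer to join acts as leader

    def sync_group(self, member_id, plan=None):
        """SYNC GROUP: the leader uploads the plan, each member gets its share."""
        if plan is not None:
            self.plan = plan
        return self.plan.get(member_id, [])


coordinator = ToyGroupCoordinator()
for member in ["c1", "c2", "c3"]:
    leader = coordinator.join_group(member)

# The leader draws up the plan (here: deal 6 partitions out in turn).
partitions = list(range(6))
plan = {m: partitions[i::3] for i, m in enumerate(coordinator.members)}

# The leader syncs first and uploads the plan; the others just collect theirs.
assignments = {leader: coordinator.sync_group(leader, plan)}
for member in coordinator.members:
    if member != leader:
        assignments[member] = coordinator.sync_group(member)
```

Note that the coordinator never computes the plan itself; it only chooses the leader and passes the leader's plan along.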

Rebalance partition allocation strategy

The allocation strategy is set with the consumer client parameter partition.assignment.strategy; the default is range.

  • range
    With 10 partitions and 4 consumers, first compute the average number of partitions per consumer: 10 / 4 = 2 with a remainder of 2. Each consumer gets 2 partitions, and the 2 remaining partitions go to the first two consumers, one each. The final distribution:
    first consumer: 0, 1, 2
    second consumer: 3, 4, 5
    third consumer: 6, 7
    fourth consumer: 8, 9

  • round-robin
    This one is easy to understand. As above, with 10 partitions and 4 consumers:
    first consumer: 0, 4, 8
    second consumer: 1, 5, 9
    third consumer: 2, 6
    fourth consumer: 3, 7

  • sticky

    • Distribute partitions as evenly as possible
    • Keep the assignment as close as possible to the previous one

    Suppose the current assignment is:

    first consumer: 0, 4, 8
    second consumer: 1, 5, 9
    third consumer: 2, 6
    fourth consumer: 3, 7

    If the fourth consumer now goes down, the partitions are redistributed as follows:
    first consumer: 0, 4, 8, 7
    second consumer: 1, 5, 9
    third consumer: 2, 6, 3

    If the two rules conflict, the first rule takes priority
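
The three strategies can be sketched for a single topic as follows. These are simplified illustrations with made-up function names, not Kafka's assignor classes; in particular sticky_reassign only handles consumers leaving, by handing each orphaned partition to the least loaded survivor.

```python
def range_assign(num_partitions, consumers):
    """Contiguous ranges; the first (remainder) consumers get one extra."""
    base, extra = divmod(num_partitions, len(consumers))
    plan, start = {}, 0
    for i, consumer in enumerate(sorted(consumers)):
        count = base + (1 if i < extra else 0)
        plan[consumer] = list(range(start, start + count))
        start += count
    return plan


def round_robin_assign(num_partitions, consumers):
    """Deal partitions out one at a time, like cards."""
    members = sorted(consumers)
    plan = {c: [] for c in members}
    for p in range(num_partitions):
        plan[members[p % len(members)]].append(p)
    return plan


def sticky_reassign(previous, live_consumers):
    """Keep surviving assignments; spread orphaned partitions evenly."""
    plan = {c: list(previous.get(c, [])) for c in sorted(live_consumers)}
    orphans = sorted(
        p for c, parts in previous.items() if c not in plan for p in parts
    )
    for p in orphans:
        # Least loaded consumer first; ties are broken by consumer name.
        target = min(plan, key=lambda c: (len(plan[c]), c))
        plan[target].append(p)
    return plan


consumers = ["c1", "c2", "c3", "c4"]
range_plan = range_assign(10, consumers)
rr_plan = round_robin_assign(10, consumers)
sticky_plan = sticky_reassign(rr_plan, ["c1", "c2", "c3"])  # c4 goes down
```

With 10 partitions and 4 consumers these reproduce the distributions listed above, including the sticky result after the fourth consumer goes down.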

How a producer publishes messages

  1. Write mode
    The producer publishes messages to the broker in push mode. Each message is appended to a partition, which is a sequential disk write
  2. The producer routes each message to a partition according to the following rules
  • If a partition is specified, the message goes directly to that partition
  • If no partition is specified but a key is, the partition is chosen by hashing the key modulo the number of partitions
  • If neither a key nor a partition is specified, partitions are chosen in round-robin fashion (the default in older clients; since Kafka 2.4 the default partitioner sticks to one partition per batch instead)
  3. Sending process
    [Figure: message sending process]
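
The routing rules in step 2 can be sketched as a small class. The class name and sample key are hypothetical, and zlib.crc32 is only a stand-in for the hash function: Kafka's own partitioner uses a different hash, so the partition numbers will not match a real cluster.

```python
import itertools
import zlib


class ToyPartitioner:
    """Pick a partition: explicit partition, else key hash, else round-robin."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._counter = itertools.count()  # drives the round-robin case

    def partition(self, key=None, partition=None):
        if partition is not None:
            return partition                              # rule 1: explicit
        if key is not None:
            return zlib.crc32(key) % self.num_partitions  # rule 2: hash of key
        return next(self._counter) % self.num_partitions  # rule 3: take turns


partitioner = ToyPartitioner(num_partitions=3)
explicit = partitioner.partition(partition=2)
keyed = partitioner.partition(key=b"order-42")
keyless = [partitioner.partition() for _ in range(4)]
```

Messages with the same key always map to the same partition, which is what preserves per-key ordering.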

Explanation of the important acks parameter:

  • acks=0
    The producer sends the message and considers it sent successfully without waiting for any response
  • acks=1
    After the producer sends the message to the leader, the leader writes it to its local log and replies to the producer with an ACK; the producer then considers the message sent successfully
  • acks=all/-1
    After the producer sends the message to the leader and the leader has written it to its local log, the leader also waits for ACKs from the other replicas in the ISR. Only when all of them have written the message does the leader reply to the producer with an ACK, and the producer then considers the message sent successfully
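
The three settings can be compared with a tiny decision function. It is a model of the rules above, not client code: the function name and its boolean inputs are invented for the example.

```python
def send_acknowledged(acks, leader_written, isr_follower_acks):
    """Return True if the producer would treat the send as successful.

    leader_written: the leader appended the message to its local log.
    isr_follower_acks: one boolean per follower in the ISR.
    """
    if acks == 0:
        return True                      # fire and forget
    if acks == 1:
        return leader_written            # the leader's write is enough
    if acks in ("all", -1):
        return leader_written and all(isr_follower_acks)
    raise ValueError(f"unsupported acks value: {acks!r}")


# The leader has the message, but one ISR follower has not replicated it yet.
results = {
    acks: send_acknowledged(acks, leader_written=True, isr_follower_acks=[True, False])
    for acks in (0, 1, "all")
}
```

In that situation acks=0 and acks=1 already report success, while acks=all is still waiting.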

Steps of the sending process:

  1. The producer finds the broker id of the partition's leader replica (the leader is recorded in the partition's state node in ZooKeeper; current clients obtain it from a broker through a metadata request)
  2. The producer sends the message to the leader
  3. The leader writes the message to its local log
  4. The followers pull the message from the leader, write it to their local logs, and reply to the leader with an ACK
  5. After the leader has received ACKs from all replicas in the ISR, it advances the HW and sends an ACK to the producer

What are HW and LEO?

HW, the high watermark, is the boundary up to which consumers can read: only messages below the HW are visible to consumers. LEO, the log end offset, is the end of a replica's log as seen inside the broker, that is, the offset at which the next message will be written. So how do these two offsets move?
Under normal circumstances, the LEO and the HW are the same.
After a new message is sent to the leader, the leader's LEO increases, so the LEO and the HW now differ. The followers then start pulling the message from the leader, and each follower's LEO increases in turn. Once the last replica in the ISR has synchronized the message, the HW catches up and the LEO and the HW are the same again. What is the purpose of this? Data consistency: a message becomes visible to consumers only after it has been replicated.
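
The way the two offsets move can be sketched with a small model in which the HW is the smallest LEO among the replicas in the ISR. The class and method names are hypothetical, and a follower is assumed to catch up completely in one fetch.

```python
class ToyPartition:
    """Track each replica's LEO and the partition's HW."""

    def __init__(self, leader, followers):
        self.leader = leader
        self.leo = {broker: 0 for broker in [leader, *followers]}
        self.isr = set(self.leo)  # assume every replica starts in sync
        self.hw = 0

    def append(self, count=1):
        """The producer writes to the leader: only the leader's LEO moves."""
        self.leo[self.leader] += count

    def follower_fetch(self, follower):
        """The follower copies the leader's log, then the HW is recomputed."""
        self.leo[follower] = self.leo[self.leader]
        self.hw = min(self.leo[broker] for broker in self.isr)


p = ToyPartition(leader=0, followers=[1, 2])
p.append(3)            # leader LEO = 3, HW still 0: nothing is visible yet
hw_after_write = p.hw
p.follower_fetch(1)    # one follower caught up, the other still at 0
hw_after_first = p.hw
p.follower_fetch(2)    # last ISR member caught up: the HW reaches the LEO
hw_after_all = p.hw
```

The HW stays at 0 until the slowest in-sync replica has the messages, then jumps to 3.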


Origin blog.csdn.net/weixin_34311210/article/details/110676348