How Kafka works

2.2.1 Message routing strategy

When publishing a message through the producer API, the producer wraps it in a Record. A Record contains a key and a value: the value is the actual message payload, and the key determines which partition the message is routed to. The partition a message is written to is not chosen at random; it follows a routing strategy.

1) If a partition is specified, the message is written directly to that partition;

2) If no partition is specified but a key is, the hash of the key modulo the number of partitions gives the index of the partition to write to;

3) If neither a partition nor a key is specified, a partition is chosen by round-robin polling.
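The three cases correspond to the three ProducerRecord constructors. The following is a minimal sketch of all three, assuming a topic named demo-topic and a broker at localhost:9092 (both assumptions, not values from this article):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class RoutingDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // 1) Partition specified: written directly to partition 0
            producer.send(new ProducerRecord<>("demo-topic", 0, "order-1", "msg-a"));
            // 2) Key only: partition = hash(key) % number of partitions
            producer.send(new ProducerRecord<>("demo-topic", "order-1", "msg-b"));
            // 3) Neither: the default partitioner spreads records across partitions
            //    (round-robin; newer client versions batch stickily)
            producer.send(new ProducerRecord<>("demo-topic", "msg-c"));
        }
    }
}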

2.2.2 Message writing algorithm

 

The path a message takes from the producer to the broker, and on into the final log that consumers can read, is a fairly involved process:

1) The producer sends a connection request to the broker cluster; whichever broker it connects to returns the communication URL of the broker controller, i.e. the listeners address in the configuration file on the broker controller host;

2) Once the producer knows which topic it will produce messages to, it sends a request to the broker controller asking for the addresses of the leaders of all partitions of that topic;

3) After receiving the request, the broker controller looks up the leaders of all partitions of the specified topic in ZooKeeper and returns them to the producer;

4) After receiving the leader list, the producer uses the message routing strategy to find the partition leader for the current message and sends the message to that leader;

5) The leader writes the message to the local log and notifies the followers in the ISR

6) The followers in the ISR send an ACK to the leader after synchronizing the message from the leader

7) After the leader has received ACKs from all followers in the ISR, it advances the HW (high watermark), which marks the position up to which consumers may consume.
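From the client side, steps 4) to 7) are visible through the send callback: the producer routes the record to a partition leader, and the broker's acknowledgement comes back as RecordMetadata carrying the partition and the offset that was assigned. A minimal sketch, again assuming demo-topic and localhost:9092:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class AckDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "order-1", "payload"),
                (metadata, exception) -> {
                    if (exception != null) {
                        exception.printStackTrace();   // the broker never acknowledged the write
                    } else {
                        System.out.printf("acked: partition=%d offset=%d%n",
                                          metadata.partition(), metadata.offset());
                    }
                });
            producer.flush();   // block until outstanding sends have been acknowledged
        }
    }
}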

2.2.3 HW truncation mechanism

If the partition leader receives a new message while the followers in the ISR are still synchronizing it, and the leader crashes before the synchronization completes, a new leader has to be elected. Without the HW truncation mechanism, the data held by the leader and the followers of that partition would end up inconsistent; with it, a replica that rejoins first truncates its log back to the HW and then synchronizes from the new leader, so all replicas agree on the committed messages again.
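The following is a purely illustrative sketch, not Kafka's internal code, of what truncation to the HW means for a single replica's log: entries beyond the high watermark are discarded before the replica re-synchronizes from the new leader.

import java.util.ArrayList;
import java.util.List;

class ReplicaLog {
    private final List<String> entries = new ArrayList<>();
    private int highWatermark = 0;   // highest position known to be replicated to the whole ISR

    void append(String msg) { entries.add(msg); }
    void advanceHw(int newHw) { highWatermark = newHw; }

    // On rejoining as a follower of the new leader, drop uncommitted entries first.
    void truncateToHighWatermark() {
        while (entries.size() > highWatermark) {
            entries.remove(entries.size() - 1);
        }
    }

    public static void main(String[] args) {
        ReplicaLog replica = new ReplicaLog();
        replica.append("m1");
        replica.append("m2");
        replica.append("m3");     // m3 was accepted by the old leader but never fully replicated
        replica.advanceHw(2);     // only m1 and m2 are committed
        replica.truncateToHighWatermark();
        System.out.println(replica.entries.size());   // 2: the replica now matches the committed prefix
    }
}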

2.2.4 Reliability mechanism of message sending

When the producer sends messages to Kafka, it can choose the reliability level it needs, which is set through the value of the acks parameter.

(1) acks = 0

Asynchronous send. The producer sends the message to Kafka without waiting for Kafka to return a success ack. This mode is the most efficient but the least reliable: messages may be lost.

(2) acks = 1

Synchronous send, the default value. The producer sends a message to Kafka, and the partition leader on the broker returns a success ack as soon as it has received the message (without waiting for the followers in the ISR to finish synchronizing). On receiving the ack, the producer knows the message was sent successfully and moves on to the next one. If no ack arrives from Kafka, the producer assumes the send failed and resends the message.

(3) acks = -1

Synchronous send; -1 is equivalent to all. The producer sends a message to Kafka, and Kafka does not return a success ack to the producer until every replica in the ISR has synchronized the message. If no ack is received, the producer assumes the send failed and automatically resends the message.
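A minimal sketch of choosing one of the three levels through the producer configuration; the broker address, topic name and retry count are assumptions:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class AcksDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringSerializer");

        // Pick one of the three levels discussed above:
        // props.put(ProducerConfig.ACKS_CONFIG, "0");    // fire and forget, may lose messages
        // props.put(ProducerConfig.ACKS_CONFIG, "1");    // leader ack only
        props.put(ProducerConfig.ACKS_CONFIG, "all");     // wait for the whole ISR (same as -1)
        props.put(ProducerConfig.RETRIES_CONFIG, 3);      // resend if no ack arrives (assumed value)

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "k", "v"));
        }
    }
}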

2.2.5 Analysis of the consumer consumption process

The producer sends messages to the topic, and consumers can then consume them. The consumption process is as follows:

1) The consumer sends a connection request to the broker cluster; whichever broker it connects to returns the communication URL of the broker controller, i.e. the listeners address in the configuration file on the broker controller host;

2) When the consumer has specified the topic it wants to consume, it sends a poll request to the broker controller;

3) The broker controller assigns one or more partition leaders to the consumer and sends the current offset of those partitions to the consumer;

4) The consumer consumes messages from the partitions assigned to it by the broker controller;

5) After the consumer has processed a message, it sends feedback to the broker confirming that the message has been consumed, namely the offset of that message;

6) When the broker receives the consumer's offset, it updates the corresponding __consumer_offsets topic;

7) This process repeats until the consumer stops requesting messages;

8) The consumer can also reset the offset, which makes it possible to consume the messages stored on the broker flexibly.
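A minimal sketch of this loop, assuming a group id demo-group, a topic demo-topic and a broker at localhost:9092; auto-commit is disabled so that the offset feedback of steps 5) and 6) is explicit:

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ConsumeDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");  // commit offsets ourselves
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));   // steps 1-3
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1)); // step 4
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                                      record.partition(), record.offset(), record.value());
                }
                consumer.commitSync();   // steps 5-6: offsets are recorded in __consumer_offsets
            }
        }
    }
}

Step 8) corresponds to APIs such as consumer.seek() and consumer.seekToBeginning(), which reposition the offset of an assigned partition before the next poll.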

2.2.6 Partition Leader election scope

When the leader goes down, the broker controller selects a follower from the ISR to become the new leader. But what if every replica in the ISR is also down? The scope of the leader election can be set through the value of unclean.leader.election.enable.

(1) false

A new leader can only be elected after a replica in the ISR list comes back online. This strategy guarantees reliability, but availability is low.

(2) true

When there is no live replica in the ISR, any replica of the partition on a broker that is still up can be chosen as the new leader. This strategy gives high availability, but reliability is not guaranteed, because messages that only the old ISR had are lost.
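The switch can be set in each broker's server.properties or per topic. A minimal sketch of setting it at the topic level with the Java AdminClient, assuming a topic named demo-topic and a broker at localhost:9092:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import java.util.Collections;
import java.util.Properties;

public class UncleanElectionConfig {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "demo-topic");
            // false = wait for an ISR replica (reliable), true = allow any live replica (available)
            AlterConfigOp op = new AlterConfigOp(
                    new ConfigEntry("unclean.leader.election.enable", "false"),
                    AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(
                    Collections.singletonMap(topic, Collections.singletonList(op)))
                 .all().get();
        }
    }
}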

2.2.7 Repeated consumption problems and solutions

The two most common cases of repeated consumption are:

(1) Repeated consumption by the same consumer

When a consumer's processing capacity is too low and its consumption times out, repeated consumption may occur, because the offsets of the messages it already processed have not yet been committed when the timeout is handled.

(2) Repeated consumption by a different consumer

When a consumer has consumed messages but goes down before committing the offset, the messages it already consumed will be consumed again by the consumer that takes over the partition.
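The usual mitigations are to commit offsets promptly after a batch has been processed and to make the processing itself idempotent, so that a redelivered message has no extra effect. A minimal sketch; the in-memory set used for deduplication here is an assumption standing in for a durable store such as a database unique key:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.HashSet;
import java.util.Set;

public class IdempotentConsumeSketch {
    private final Set<String> processedKeys = new HashSet<>();   // stand-in for a durable dedup store

    void run(KafkaConsumer<String, String> consumer) {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                String dedupKey = record.topic() + "-" + record.partition() + "-" + record.offset();
                if (processedKeys.add(dedupKey)) {    // only handle a record the first time we see it
                    handle(record);
                }
            }
            consumer.commitSync();   // commit right after the batch to shrink the redelivery window
        }
    }

    private void handle(ConsumerRecord<String, String> record) {
        System.out.println("processing " + record.value());
    }
}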

2.3 Kafka cluster construction

To prevent single points of failure in a production environment, Kafka is deployed as a cluster. Next we will build a Kafka cluster consisting of three Kafka hosts, that is, three brokers. To be continued in the next chapter.

2.3.1 Downloading Kafka

 
