Kafka basics: terminology, partition strategy, and message acknowledgment mechanism

Basic knowledge of Kafka

1. Basic terms

  • Message: a Record, the main object that Kafka processes.
  • Message offset: the Offset is the position of a message within its partition. It increases monotonically, and once assigned to a message it never changes.
  • Topic: a logical container for messages. In practice, topics are mostly used to separate business services, with a different topic for each service.
  • Producer: a client that publishes messages to topics.
  • Consumer: a client that subscribes to topics and reads their messages. Multiple consumer instances together form a Consumer Group. The instances in a group not only divide the partitions of the subscribed topics among themselves but also back each other up: if one instance in the group fails, Kafka detects it automatically and transfers the partitions that the failed instance was responsible for to the surviving consumers.
  • Consumer offset: the Consumer Offset records how far a consumer has progressed in its consumption; each consumer has its own consumer offset.
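
The way a consumer group divides partitions and recovers from a failed instance can be illustrated with a small simulation. This is only a sketch: the class name ConsumerGroupSketch, the consumer names, and the simple take-turns assignment are assumptions made for illustration, and Kafka's real partition assignors are more sophisticated.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration of how a consumer group divides partitions.
// This is NOT Kafka's real assignor; it only mimics the observable behaviour.
final class ConsumerGroupSketch {

    // Spread partitions 0..partitionCount-1 over the live consumers in turn.
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("need at least one live consumer");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = consumers.get(partition % consumers.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> group = new ArrayList<>(List.of("c1", "c2", "c3"));
        System.out.println("before failure: " + assign(group, 6));

        // c2 fails; recomputing the assignment hands its partitions to the survivors.
        group.remove("c2");
        System.out.println("after failure:  " + assign(group, 6));
    }
}
```

After c2 is removed, every partition still has exactly one owner inside the group, which is the property the rebalance is meant to preserve.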

2. Understanding Kafka's high availability from its architecture

  • Both consumers and producers are Kafka clients. The server side is made up of service processes called Brokers; that is, a Kafka cluster consists of multiple Brokers.

  • A Broker receives and processes client requests and persists messages. Although several Broker processes can run on the same machine, it is more common to run each Broker on a different machine. That way, if one machine in the cluster goes down and every Broker process on it fails, the Brokers on other machines can still serve clients. This is one of the ways Kafka provides high availability.

  • The backup mechanism (Replication) copies the same data to multiple broker machines; these copies are called replicas in Kafka. Replicas are divided into leader replicas (Leader Replica) and follower replicas (Follower Replica).

    • The leader replica serves clients: producers always write messages to the leader replica, and consumers always read messages from the leader replica.
    • A follower replica only follows the leader passively and does not interact with clients. It does just one thing: it sends requests to the leader replica asking for the most recently produced messages, so that it stays in sync with the leader.
  • Because the leader replica is the only one that serves clients, a single broker might not be able to hold all of the data. Kafka therefore has the concept of partitions: each topic is divided into multiple partitions, and each partition is an ordered message log. Replicas are defined at the partition level, and each partition can be configured with several replicas. In other words, at the broker level, producers write messages to partitions and consumers read messages from partitions.

    • The Kafka source code validates the replica count: it must be greater than 0 and no greater than the number of brokers, so that the replicas of a partition can be placed on different brokers. The leader replicas of different partitions are also spread across different brokers as far as possible, which balances the load among brokers.
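
The replica-count check and the idea of spreading leaders over brokers can be sketched as follows. The class ReplicaAssignmentSketch and its shift-by-one placement rule are assumptions for illustration only; Kafka's actual assignment algorithm is more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch: validates the replication factor and spreads replicas over brokers.
// Kafka's real assignment algorithm is more involved than this shift-by-one rule.
final class ReplicaAssignmentSketch {

    // Returns, for each partition, the broker ids hosting its replicas.
    // The first id in each list plays the role of the leader replica.
    static List<List<Integer>> assign(int brokerCount, int partitionCount, int replicationFactor) {
        if (replicationFactor <= 0 || replicationFactor > brokerCount) {
            throw new IllegalArgumentException(
                "replication factor must be in 1.." + brokerCount + ", got " + replicationFactor);
        }
        List<List<Integer>> assignment = new ArrayList<>();
        for (int partition = 0; partition < partitionCount; partition++) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < replicationFactor; r++) {
                // Consecutive brokers, so replicas of one partition never share a broker.
                replicas.add((partition + r) % brokerCount);
            }
            assignment.add(replicas);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 3 brokers, 3 partitions, 2 replicas each: every broker leads one partition.
        System.out.println(assign(3, 3, 2));
    }
}
```

Because the replication factor never exceeds the broker count, the inner loop cannot wrap around onto a broker it has already used for the same partition.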

The following figure gives a more vivid picture of Kafka as a whole:

Note: The picture comes from the Internet
[Figure: overview of Kafka as a whole; image not preserved]

3. Partition strategy

A partition strategy is simply the algorithm that decides which partition a producer sends a message to.
If a partition is specified in the Java API, the message is sent directly to that partition; if no partition is specified but a key is, the partition is chosen from the hash of the key; if neither a partition nor a key is specified, the round-robin (polling) strategy is used.
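
The three-way decision described above can be sketched in plain Java. The class PartitionChoiceSketch and its choose method are hypothetical names, and Arrays.hashCode is only a stand-in for the hash function: Kafka's own default partitioner hashes the serialized key with murmur2.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the partition decision; all names here are hypothetical.
// Arrays.hashCode is a stand-in: Kafka's default partitioner uses murmur2 on the key bytes.
final class PartitionChoiceSketch {
    private final AtomicInteger counter = new AtomicInteger();

    int choose(Integer explicitPartition, String key, int partitionCount) {
        if (explicitPartition != null) {
            return explicitPartition; // 1. the caller named the partition
        }
        if (key != null) {
            int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
            return Math.floorMod(hash, partitionCount); // 2. same key, same partition
        }
        // 3. neither partition nor key: take the partitions in turn
        return Math.floorMod(counter.getAndIncrement(), partitionCount);
    }

    public static void main(String[] args) {
        PartitionChoiceSketch sketch = new PartitionChoiceSketch();
        System.out.println("explicit:  " + sketch.choose(2, "ignored", 3));
        System.out.println("by key:    " + sketch.choose(null, "order-42", 3));
        System.out.println("no key #1: " + sketch.choose(null, null, 3));
        System.out.println("no key #2: " + sketch.choose(null, null, 3));
    }
}
```

Math.floorMod is used instead of % so that a negative hash value still maps to a valid partition index.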

  • Round-robin (polling) strategy
    The default strategy in older client versions (since Kafka 2.4, the default partitioner uses a sticky strategy for records without a key). Messages are sent to the partitions in turn. For example, if a topic has 3 partitions, the first message is sent to partition 0, the second to partition 1, the third to partition 2, the fourth to partition 0 again, and so on.
  • Random strategy
    Places the message on an arbitrary partition, for example by calling random.nextInt() with the total number of partitions as the bound; the value returned is the partition to send to.
  • Custom strategy
    Write a class that implements the Partitioner interface (mainly its partition method) and set the producer-side parameter partitioner.class to that class.
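
The three strategies can be compared side by side in a small sketch. The PartitionStrategy interface below is hypothetical and deliberately simpler than Kafka's real org.apache.kafka.clients.producer.Partitioner interface, whose partition method takes more arguments (topic, key, value, and cluster metadata).

```java
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified interface for illustration only.
// The real Kafka interface is org.apache.kafka.clients.producer.Partitioner.
interface PartitionStrategy {
    int partition(int partitionCount);
}

// Round-robin: partitions are used in turn: 0, 1, 2, 0, 1, 2, ...
final class RoundRobinStrategy implements PartitionStrategy {
    private final AtomicInteger next = new AtomicInteger();

    @Override
    public int partition(int partitionCount) {
        return Math.floorMod(next.getAndIncrement(), partitionCount);
    }
}

// Random: any partition index below the partition count.
final class RandomStrategy implements PartitionStrategy {
    private final Random random = new Random();

    @Override
    public int partition(int partitionCount) {
        return random.nextInt(partitionCount);
    }
}

final class StrategyDemo {
    public static void main(String[] args) {
        PartitionStrategy roundRobin = new RoundRobinStrategy();
        PartitionStrategy random = new RandomStrategy();
        // A custom strategy is just another implementation; this one always picks the last partition.
        PartitionStrategy alwaysLast = partitionCount -> partitionCount - 1;

        for (int i = 0; i < 4; i++) {
            System.out.println("round-robin=" + roundRobin.partition(3)
                + " random=" + random.partition(3)
                + " custom=" + alwaysLast.partition(3));
        }
    }
}
```

With the real client, the custom class would be registered through the partitioner.class producer parameter rather than instantiated by hand.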

4. Message Confirmation Mechanism

The acknowledgment mechanism for messages sent by the producer is controlled by the producer configuration parameter acks.

  • acks = 0: The producer does not wait for any acknowledgment from the server. The message is added to the socket buffer immediately and considered sent. In this case there is no guarantee that the server has received the message, and the offset returned for each record is always -1.
  • acks = 1: The leader replica writes the message to its local log and responds without waiting for acknowledgment from the followers. If the leader fails right after acknowledging a message but before a follower has replicated it, the message is lost.
  • acks = all / acks = -1: The leader waits until all replicas in the in-sync replica set (ISR) have acknowledged the message. This guarantees that the message is not lost as long as at least one in-sync replica remains alive.
  • The default setting is all since Kafka 3.0; earlier versions default to 1.
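
As a minimal sketch, the acks setting is passed to the producer together with the other configuration entries. The class ProducerAckConfig and the address localhost:9092 are placeholders chosen for illustration; the property keys are the standard producer configuration names.

```java
import java.util.Properties;

// Builds producer settings with java.util.Properties only.
// In a real application the result is passed to the KafkaProducer constructor.
final class ProducerAckConfig {

    static Properties build(String bootstrapServers, String acks) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", acks); // "0", "1", or "all" (equivalent to "-1")
        return props;
    }

    public static void main(String[] args) {
        // localhost:9092 is a placeholder broker address.
        Properties props = build("localhost:9092", "all");
        System.out.println("acks=" + props.getProperty("acks"));
    }
}
```

Choosing "all" trades some latency for durability, while "0" does the opposite.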

In-sync replicas (ISR): The leader of each partition maintains the ISR and tracks how far each follower in it lags behind. When a producer sends a message to the broker, the leader writes the message and it is replicated to all followers in the ISR. Because replication latency is bounded by the slowest follower, slow replicas must be detected quickly: if a follower falls too far behind or fails, the leader removes it from the ISR. All replicas of a partition are collectively called Assigned Replicas, and the ISR is a subset of them. A follower always needs some time to synchronize data from the leader; any follower whose lag exceeds the threshold replica.lag.time.max.ms is removed from the ISR.
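
The threshold rule can be modelled in a few lines. This is a simplified sketch: the class IsrSketch, its inSync method, and the timestamps are assumptions for illustration, and a real broker keeps considerably more state per partition.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified model of ISR maintenance, for illustration only.
final class IsrSketch {

    // followerLastCaughtUpMs maps a follower's broker id to the last time (in ms)
    // at which it was fully caught up with the leader.
    static Set<Integer> inSync(int leaderId, Map<Integer, Long> followerLastCaughtUpMs,
                               long nowMs, long replicaLagTimeMaxMs) {
        Set<Integer> isr = new TreeSet<>();
        isr.add(leaderId); // the leader is always part of its own ISR
        for (Map.Entry<Integer, Long> entry : followerLastCaughtUpMs.entrySet()) {
            long lagMs = nowMs - entry.getValue();
            if (lagMs <= replicaLagTimeMaxMs) {
                isr.add(entry.getKey()); // still within the allowed lag
            }
        }
        return isr;
    }

    public static void main(String[] args) {
        Map<Integer, Long> followers = new LinkedHashMap<>();
        followers.put(1, 99_000L); // caught up 1 s ago
        followers.put(2, 60_000L); // caught up 40 s ago
        // The 30 s threshold is an illustrative value for replica.lag.time.max.ms.
        System.out.println("ISR = " + inSync(0, followers, 100_000L, 30_000L));
    }
}
```

Follower 2 drops out of the result because its lag exceeds the threshold, while the leader and follower 1 remain.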

A few thoughts:

Why doesn't Kafka's replica mechanism let follower replicas serve reads as well, the way MySQL replicas do?

1. In MySQL, serving reads from replicas spreads the read load and relieves the read pressure on the primary. In Kafka, the rules for assigning partitions to brokers already balance the load across multiple brokers.
2. The data stored in Kafka differs in nature from the data in a database: Kafka data is consumed, and consumption requires offsets, a concept that does not exist for database records. If consumers read from Kafka followers, controlling consumer offsets would become more complicated.
3. If consumers read from followers, then after the leader receives a producer's message the followers must also have synchronized it, so that the replicas do not serve inconsistent data. Under Kafka's acknowledgment mechanism, this would mean waiting for every follower replica to synchronize before the message is truly acknowledged, which may take even longer than with acks=all.


Origin blog.csdn.net/weixin_47407737/article/details/128063496