Kafka Learning Notes

This article briefly summarizes the principles and core concepts of Kafka.

1. Kafka is a message queue

Its main characteristics are:

  • High throughput: up to hundreds of thousands of messages per second
  • High concurrency: supports thousands of clients reading and writing at the same time
  • Low latency: as low as a few milliseconds
  • Message persistence and reliability: messages are persisted to local disk, and data replication is supported
  • Cluster fault tolerance: tolerates the failure of some nodes
  • Scalability: supports dynamic cluster expansion

2. The main terms in Kafka

  1. broker
  2. topic
  3. partition
  4. consumer
  5. producer
  6. cluster
  7. message

  Kafka is a distributed message middleware. A message is simply a piece of data, just like a row we store in MySQL. The Kafka service runs as a cluster, and each node in the cluster is called a broker. Messages in Kafka are organized by topic (much like a table in a database). Each topic can in turn be divided into one or more partitions (which can be thought of as sub-tables in MySQL), used to increase parallelism (on both the producer side and the consumer side).
  The party that produces messages is called the producer, and the party that consumes them is called the consumer. The producer sends messages to a partition of a topic on a broker, and the consumer pulls data from the partitions of one or more specified topics. Kafka can guarantee ordered production and consumption of data (although strict ordering requires further understanding and configuration).

3. Kafka broker

The work done at the Kafka broker level falls into two broad areas: management of the cluster and management of the data.

3.1 Data redundancy of topic partitions and the leader replica

  Because Kafka is distributed, it uses data redundancy to tolerate the failure of some nodes in the cluster. By setting each topic's replication factor to 3 (or another value), every partition of that topic gets 3 replicas. This way, even when a replica, or the broker hosting it, goes down, the service can still be provided (depending on the configuration, a small amount of data may be lost; more on this later). Since each partition has multiple replicas, Kafka stipulates that only one of them can be the leader replica, and all reads and writes go through this leader.
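
  The replication factor is requested when a topic is created. A minimal sketch with the Java AdminClient follows; the topic name and broker address are placeholders, not anything from the original article.

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateReplicatedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, each with 3 replicas spread across brokers;
            // one replica of each partition becomes the leader.
            NewTopic topic = new NewTopic("demo-topic", 3, (short) 3);
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```
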
  The Kafka cluster as a whole also has the master role found in most distributed services. It is responsible for leader election for the topic partitions of the entire cluster and is called the controller.

3.2 Controller in the cluster

  Like all distributed clusters that have a master node (ZooKeeper has one, and so does Elasticsearch), Kafka also has a master role, but in Kafka it is called the controller. The controller can be thought of as being responsible for the availability management of the cluster, and its main duty is leader election for each partition.
  So how is the controller elected in Kafka? Thanks to ZooKeeper, the election logic is greatly simplified. At startup, every broker tries to create the same ephemeral node (/controller) in ZooKeeper. ZooKeeper's distributed consistency guarantees that only one create request can succeed; the broker that succeeds becomes the controller, and the brokers that fail add a watch on this node. When the current controller fails, the ephemeral node (/controller) is deleted by ZooKeeper, and everyone competes again.
  After each election, the new controller confirms through ZooKeeper that it is the new controller and then increments the epoch (epoch += 1). What is the epoch? It is just a number, which can be loosely understood as an emperor's era name: each emperor has his own era, and the next emperor's era number is always larger than the current one. The controller attaches this epoch to every message it sends. This is mainly to prevent split-brain: after a new controller has been elected, the old, stuck controller may come back to life, still believe it is the controller, and issue orders to other nodes, but because its epoch is older, such messages are ignored by the other nodes.
  Epoch is a common technique for avoiding split-brain in distributed systems; it is also used in ZooKeeper and Raft.
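
  The election pattern described above can be sketched with the plain ZooKeeper client API. This is only an illustration of the pattern, not Kafka's actual controller code; the connection string, session timeout, and broker-id handling are placeholder assumptions.

```java
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ControllerElectionSketch implements Watcher {

    private static final String CONTROLLER_PATH = "/controller";
    private final ZooKeeper zk;
    private final String brokerId;

    public ControllerElectionSketch(String zkConnect, String brokerId) throws Exception {
        this.zk = new ZooKeeper(zkConnect, 15000, this);
        this.brokerId = brokerId;
    }

    /** Try to become the controller by creating the ephemeral /controller node. */
    public void elect() throws KeeperException, InterruptedException {
        try {
            // Ephemeral: ZooKeeper deletes the node automatically if our session dies.
            zk.create(CONTROLLER_PATH, brokerId.getBytes(StandardCharsets.UTF_8),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println(brokerId + " won the election and is now the controller");
        } catch (KeeperException.NodeExistsException e) {
            // Someone else won; watch the node so we can compete again when it disappears.
            zk.exists(CONTROLLER_PATH, true);
            System.out.println(brokerId + " lost the election and is watching " + CONTROLLER_PATH);
        }
    }

    @Override
    public void process(WatchedEvent event) {
        // The old controller's session expired, so /controller was deleted: re-elect.
        if (event.getType() == Event.EventType.NodeDeleted
                && CONTROLLER_PATH.equals(event.getPath())) {
            try {
                elect();
            } catch (Exception ignored) {
                // A real implementation would retry and log here.
            }
        }
    }
}
```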

3.3 Leader, follower, and in-sync replica (ISR) in a partition

Leader, follower, and in-sync replica are all runtime concepts that change dynamically; at the level of static storage they are all simply called replicas.
  As mentioned above, in order to improve fault tolerance, Kafka keeps multiple replicas of each partition. One of these replicas becomes the leader, and every request from producers and consumers is handled by the leader replica.
  When the producer sends a message to the leader, the leader stores the message in its own log, while the followers keep pulling data from the leader, that is, they replicate the leader's data. Kafka allows some replicas to lag slightly, which improves overall performance. Kafka maintains a subset of the followers called the in-sync replica set (ISR); this set also includes the leader itself, and it contains the replicas that the leader considers to be keeping up with it.

1. How does the leader determine whether a follower belongs to the in-sync replica set (ISR)?

  1. In earlier versions, this was determined by setting replica.lag.max.messages=5 (or another value): if a follower lagged behind the leader by more than 5 messages, it was kicked out of the in-sync replica set.
  2. Starting from 0.9, that parameter was removed and replaced by replica.lag.time.max.ms=10000 (the default is 10s). This configuration means that if a follower in the ISR does not send a fetch request to the leader (or fails to catch up with the leader's log end) within 10s, the follower is kicked out of the ISR.
  3. The advantage of this change: during a momentary peak in message traffic, the followers in the ISR would likely fall behind the leader by more than a fixed message count, get kicked out, and rejoin once they caught up. This kick-out-and-rejoin churn is unnecessary, and replica.lag.time.max.ms avoids the problem.

2. Leader election

  When a leader goes down, how does the controller pick the new leader? It simply selects one replica from the in-sync replica set. This is a bit rough, but it also shows that, because the followers in the ISR may lag slightly behind the leader, the newly elected leader's data may be behind the original leader's, which can cause data loss (a success response was already returned to the producer, but the message was never fully persisted, so the consumer can never consume it). Of course, we can avoid such data loss through configuration, which requires cooperation between the broker side and the producer side.

3. Minimum ISR setting

  The broker side can also be configured with min.insync.replicas. If min.insync.replicas=2, then there must be at least two in-sync replicas before data can be written to the partition. If only one in-sync replica remains, the broker stops accepting produce requests and effectively becomes read-only; a producer trying to send data receives a NotEnoughReplicasException, but consumers can still read the existing data.
  This avoids unexpected behavior when data is written and read around an incomplete (unclean) election. Clearly this parameter is also an important part of achieving high availability and consistency. If min.insync.replicas=1 is set, the ISR can shrink to just the leader, and once that leader goes down there may be no in-sync replica left to elect.
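
  The interaction between acks and min.insync.replicas can be sketched on the producer side as follows. This is a hedged example, not code from the article: the broker address and topic name are placeholders, and it assumes the topic's brokers have min.insync.replicas=2 configured.

```java
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.NotEnoughReplicasException;
import org.apache.kafka.common.serialization.StringSerializer;

public class MinIsrAwareProducer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // acks=all waits for the current ISR; the broker/topic setting
        // min.insync.replicas=2 is what enforces at least two in-sync copies.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            try {
                producer.send(new ProducerRecord<>("demo-topic", "key", "value")).get();
            } catch (ExecutionException e) {
                if (e.getCause() instanceof NotEnoughReplicasException) {
                    // The ISR has shrunk below min.insync.replicas; retry later or alert.
                    System.err.println("Not enough in-sync replicas: " + e.getCause().getMessage());
                }
            }
        }
    }
}
```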

3.4 The high watermark (HW) of messages in a partition

1. What is the high watermark and what is it used for

  Messages in Kafka are appended to a partition in arrival order. The high watermark (HW) marks the position up to which messages are visible to consumers. To preserve data consistency as much as possible, a message that exists only on the partition leader and has not yet been replicated to the followers is not safe for consumers to see: such a message may not yet have been acknowledged to the producer as successfully persisted, and if the leader goes down at this point the data becomes inconsistent, because the producer may be told the write failed while a consumer has already consumed that data. Therefore, Kafka designed the high watermark to mark which messages can be consumed.

2. HW's update mechanism:

1. LEO (log-end-offset)

  Before discussing the HW update mechanism, you need to understand the LEO (log-end-offset). This is the offset at the tail of the log stored in each replica of the partition (the position right after the last stored message). An offset is essentially the sequence number of a log entry and can be understood as an array index.
  Each time the producer sends a message, the corresponding broker appends it to the specified partition, and each message is assigned an offset.
  At the same time, the leader keeps not only its own LEO but also the LEO of every follower (this information comes from the fetch offset that a follower passes when it pulls data from the leader), and the leader updates the HW based on these LEOs: HW = min(LEO). We can walk through the HW update via the following questions.

  1. When does the follower update its LEO?

    1. The follower replica's dedicated thread continuously sends FETCH requests to the broker where the leader replica is located, carrying its own fetch offset (that is, its own LEO).
    2. The leader replica sends a FETCH response to the follower replica.
    3. After the follower receives the response, it takes out the data and writes it to its local log, updating its LEO in the process.
  2. How does the leader update the follower LEOs that it records?
    The LEO value of each remote (non-self) replica object on the leader side is updated while the leader processes that follower's FETCH request.

  3. When does the follower update its HW?

    1. The follower replica object updates its HW after updating its local LEO.
    2. Once the follower finishes writing data to its local log, it tries to update its HW value.
    3. The algorithm takes the smaller of the local LEO and the leader's HW carried in the FETCH response; in other words, the leader passes its own HW along whenever the follower fetches.
  4. When does the leader update its HW?

    1. While processing a follower FETCH request, the leader replica object tries to update its own HW after updating the LEO of that remote replica object.
    2. When the producer writes messages, which updates the leader replica's LEO.
    3. When a replica is kicked out of the ISR.
    4. When a partition replica becomes the leader replica.
  5. How the leader updates its HW during normal synchronization

    1. The leader updates its HW based on the LEOs of all in-sync followers (together with its own), taking the minimum.
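
  As a toy illustration of the HW = min(LEO) rule (a worked example only, not the broker's real implementation):

```java
import java.util.List;

public class HighWatermarkSketch {
    /**
     * Toy illustration of HW = min(LEO) over the leader and its in-sync followers.
     * The real broker tracks this per replica object; this is only a worked example.
     */
    static long highWatermark(long leaderLeo, List<Long> inSyncFollowerLeos) {
        long hw = leaderLeo;
        for (long leo : inSyncFollowerLeos) {
            hw = Math.min(hw, leo);
        }
        return hw;
    }

    public static void main(String[] args) {
        // The leader has appended up to offset 10, the followers have replicated up to 8 and 9:
        // consumers may only read up to the high watermark, i.e. offset 8.
        System.out.println(highWatermark(10L, List.of(8L, 9L))); // prints 8
    }
}
```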

4. Producer

  The main purpose here is to understand the producer's sending mechanism and some of its more important configurations.

1. How does the producer connect directly to the node hosting the corresponding topic partition

Producers send messages directly to the leader partition on the broker, without any routing or forwarding through intermediaries. To make this possible:

  1. Every broker in the Kafka cluster can answer a producer's metadata request and return meta-information about the topic: which brokers are alive, where the topic's leader partitions are, and which leader partitions are currently directly accessible.
  2. The producer client itself controls which partitions messages are pushed to. This can be done by random assignment (a simple random load-balancing algorithm) or by a specified partitioning algorithm (see the sketch after this list).
    1. Kafka provides an interface for implementing custom partitioners. Users can specify a partition key for each message and use this key in a hash-based partitioning algorithm; for example, if the userid is used as the partition key, messages with the same userid are pushed to the same partition.
    2. Pushing data in batches greatly improves efficiency. The Kafka producer accumulates a certain number of messages in memory and then sends them as one batch. Batching can be controlled by producer parameters such as the number of accumulated messages (e.g. 500), the accumulated time interval (e.g. 100 ms), or the accumulated data size (e.g. 64 KB). Increasing the batch size reduces the number of network requests and disk I/O; the concrete settings are a trade-off between efficiency and timeliness.
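
  A rough sketch of the custom partitioner idea from point 1 above: the class name and hash function are illustrative, assuming the Java client's org.apache.kafka.clients.producer.Partitioner interface; Kafka's built-in default partitioner uses a murmur2 hash instead.

```java
import java.util.Arrays;
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

/** Routes all messages that share the same key (e.g. a userid) to the same partition. */
public class UserIdPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (keyBytes == null) {
            return 0; // no key: a real partitioner would spread these records out
        }
        // Simple deterministic hash; the built-in partitioner uses murmur2 instead.
        return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
    }

    @Override
    public void close() { }

    @Override
    public void configure(Map<String, ?> configs) { }
}
```

  A producer would use it by pointing the partitioner.class configuration at this class's name.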

2. Related configuration

acks:

  • acks=0: the producer considers a message sent successfully as soon as it hands it to the network; it does not wait for any result from the broker. In this case the retries setting below also has no effect. This mode has the highest throughput but also loses messages most easily.
  • acks=1: the producer considers the send successful once the partition leader has written the message and returned success. If the leader fails to write the message, the producer receives an error response and retries. This avoids message loss to a certain extent, but if the leader goes down before the message is replicated to the other replicas, the message is still lost.
  • acks=all: the producer waits until all in-sync replicas have written the message. This is the safest mode and can ensure that messages are not lost, but it also has the highest latency. It is generally used in scenarios with high requirements on durability and consistency and less demanding throughput requirements.

retries

  • When the producer receives a recoverable error after sending a message, it retries; this parameter specifies the number of retries. In practice it should be used together with retry.backoff.ms (the wait interval between retries): it is recommended that the total retry time be longer than the time the cluster needs to re-elect a partition leader, so that the producer does not give up too early.

batch.size

  • When multiple messages are sent to the same partition, the producer batches them. This parameter specifies the upper limit of a batch's size in bytes. When a batch reaches this size, the producer sends the whole batch to the broker; even if the size is not reached, a timing mechanism (linger.ms) still sends the messages, avoiding excessive delay.

linger.ms

  • This parameter specifies how long the producer waits before sending a batch. With it set, the producer sends the batch to the broker once this time is reached even if the configured batch size has not been filled. By default, the producer's sender thread sends messages as soon as it is free, even if the batch contains only one message. Setting this parameter makes the sender wait a while, which improves throughput through batching but also increases latency.

buffer.memory

  • This parameter sets the size of the memory buffer the producer uses to hold messages waiting to be sent.

client.id

  • This parameter can be any string; the broker uses it to identify which client a message came from. It is used in broker logs, metrics, and quota enforcement.

max.in.flight.requests.per.connection

  • This parameter specifies how many requests the producer can send to the broker without waiting for responses. A higher value increases throughput but also memory usage. To strictly guarantee message ordering it can be set to 1; since 2.0 the default is 5. With the idempotent producer enabled, Kafka guarantees exactly-once delivery from a single producer while also preserving ordering, which is usually the better option.

max.request.size

  • This parameter limits the size of the request the producer sends. The request size depends on both the message size and the number of messages: if the maximum request size is 1 MB, then the largest single message is 1 MB, or up to 1000 messages of 1 KB each can be sent in one batch. In addition, the broker has a message.max.bytes parameter controlling the size of the requests it accepts. In practice these values should be kept consistent so that the producer does not send more data than the broker allows.
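
  Putting the producer settings above together, a minimal configuration sketch (the broker address, topic name, and concrete values are placeholders chosen for illustration):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerConfigExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        props.put(ProducerConfig.ACKS_CONFIG, "all");               // wait for all in-sync replicas
        props.put(ProducerConfig.RETRIES_CONFIG, 5);                // retry recoverable errors
        props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 500);     // wait between retries
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024);     // 64 KB per batch
        props.put(ProducerConfig.LINGER_MS_CONFIG, 100);            // wait up to 100 ms to fill a batch
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 64L * 1024 * 1024); // 64 MB send buffer
        props.put(ProducerConfig.CLIENT_ID_CONFIG, "demo-producer");
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);
        props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 1024 * 1024); // 1 MB, align with broker message.max.bytes

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "user-42", "hello"));
        }
    }
}
```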

5. Consumer

  Kafka consumers, as the name suggests, pull from the Kafka brokers the data that producers have written. Consumption granularity goes down to topic and partition, and the starting offset can be specified explicitly or looked up by timestamp before consuming.
  Each consumer has a group id; multiple consumers can share a group id and divide the partitions of a topic among themselves (improving parallelism).
  Consumers can commit their consumption offsets automatically or manually. There is in fact no per-message ack mechanism: each offset commit is simply a message produced to the broker's internal topic __consumer_offsets. On the next start, the consumer by default retrieves the relevant offsets from this topic and uses them to pull data from the broker.
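
  A minimal poll loop with manual offset commits might look like the following sketch (broker address, group id, and topic name are placeholders):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ManualCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // commit offsets manually

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("demo-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                // Writes the processed offsets to the internal __consumer_offsets topic.
                consumer.commitSync();
            }
        }
    }
}
```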

1. Consumer's rebalance process

1. Timing of rebalance

1. Normal conditions

  • The number of group members changes, for example a new consumer instance joins or leaves the consumer group.
  • The number of subscribed topics changes.
  • The number of partitions of a subscribed topic changes.

2. Abnormal cases

  • The session expires.
  • max.poll.interval.ms is exceeded: when this time is reached, the heartbeat thread automatically stops sending heartbeats and then sends a leave-group request.

Either of these situations triggers a rebalance.

2. The rebalance process

Let's take adding a new consumer as an example:

  1. The consumer client sends a join-group request. If the group does not exist, it is created, and the group's state is Empty;

  2. Since the group has no members yet, the member is added to the group and the current member (client) is made the group leader, and a rebalance is started. The group's state becomes PreparingRebalance, and the coordinator waits up to rebalance.timeout.ms for the other members to resend join-group requests (while the group is in PreparingRebalance, needRejoin() returns true when a consumer client polls, meaning that client needs to rejoin the group). When the group membership has been updated, the group's state becomes AwaitingSync and a join-group response is returned to all members of the group;

  3. After a client receives the join-group result, if it finds that its role is the group leader, it performs the partition assignment and sends the result to the GroupCoordinator in a sync-group request; the followers also send sync-group requests to the GroupCoordinator (with the assignment field left empty);

  4. When the GroupCoordinator receives the request from the group leader, it takes the assignment result and sends each member its own assignment; if the client is a follower, no processing is needed. At this point the group's state becomes Stable. In other words, the sync-group result is returned to all members only after the leader's request has been received; it is sent only once and is triggered by the leader's request.
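
  On the client side, an application can observe this rebalance cycle through a ConsumerRebalanceListener. A minimal sketch (broker address, group id, and topic name are placeholders):

```java
import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class RebalanceAwareConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("demo-topic"), new ConsumerRebalanceListener() {
                @Override
                public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                    // Called before a rebalance takes partitions away: commit offsets here.
                    System.out.println("Revoked: " + partitions);
                }

                @Override
                public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                    // Called after the coordinator hands out the leader's assignment.
                    System.out.println("Assigned: " + partitions);
                }
            });

            while (true) {
                consumer.poll(Duration.ofMillis(500)); // polling drives joins and rebalances
            }
        }
    }
}
```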

2. Some configurations of consumers

fetch.min.bytes

  • This parameter lets the consumer specify the minimum amount of data returned when fetching from the broker. If the amount of available data is below this threshold, the broker waits until there is enough before responding to the consumer. For topics with low write volume, this reduces pressure on brokers and consumers because it reduces the number of round trips; for topics with many consumers, it can noticeably reduce broker load.

fetch.max.wait.ms

  • While fetch.min.bytes above specifies the minimum amount of data a fetch returns, this parameter specifies the maximum time the consumer's fetch will wait, avoiding long blocking. The default is 500 ms.

max.partition.fetch.bytes

  • This parameter specifies the maximum number of bytes returned per partition; the default is 1 MB. In other words, when KafkaConsumer.poll() returns a record list, each partition contributes at most 1 MB of record bytes. If a topic has 20 partitions and there are 5 consumers, each consumer needs about 4 MB of memory for these records. In practice, allocate more than that, so that when one consumer goes down the others can take over its partitions.

  • Note that max.partition.fetch.bytes must be larger than the largest message the broker will accept (set by message.max.bytes), otherwise consumers may be unable to consume such messages. Also, as the example above suggests, we usually call poll() in a loop; if max.partition.fetch.bytes is set too large, processing the returned records takes longer, which may delay the next poll and let the session expire. In that case, either reduce max.partition.fetch.bytes or lengthen the session timeout.

session.timeout.ms

  • This parameter sets the consumer session timeout (about 10 seconds by default in older clients). If the consumer sends no heartbeat within this period, the broker considers the session expired and rebalances its partitions. It works together with heartbeat.interval.ms, which controls how often the consumer sends heartbeats; that value should be smaller than session.timeout.ms, typically about one third of it. A smaller session.timeout.ms lets Kafka detect failures and rebalance quickly, but it also increases the chance of false positives (for example, a consumer that is merely slow at processing rather than down).

auto.offset.reset

  • This parameter specifies the behavior when the consumer reads a partition for the first time, or when its last committed offset is no longer valid (for example, the consumer has been offline too long). The value can be latest (start from the newest messages) or earliest (start from the oldest messages).

enable.auto.commit

  • This parameter specifies whether the consumer commits consumption offsets automatically; the default is true. To reduce duplicate consumption or data loss, set it to false. If it is true, you may need to pay attention to the automatic commit interval, set by auto.commit.interval.ms.

partition.assignment.strategy

  • We already know that when a consumer group has multiple consumers, the topic partitions must be distributed among them according to some strategy. The strategy is determined by the PartitionAssignor class, and there are two built-in strategies:

    1. Range: for each topic, every consumer is responsible for a contiguous range of partitions. If consumers C1 and C2 subscribe to two topics that each have 3 partitions, this strategy gives C1 partitions 0 and 1 of each topic (indices start at 0) and C2 partition 2. When the number of partitions is not evenly divisible by the number of consumers, the first consumers end up with a few extra partitions (how many depends on the number of topics).
    2. Round robin: all subscribed topic partitions are assigned to consumers one by one in order. In the example above, C1 is responsible for partition 0 and partition 2 of the first topic and partition 1 of the second topic; C2 is responsible for the remaining partitions. This strategy is clearly more balanced: the difference in partition count between any two consumers is at most 1.
  • partition.assignment.strategy sets the assignment strategy. The default is org.apache.kafka.clients.consumer.RangeAssignor (the range strategy); it can be set to org.apache.kafka.clients.consumer.RoundRobinAssignor (the round-robin strategy), or you can implement your own strategy and point partition.assignment.strategy at the implementation class.

client.id

  • This parameter can be any string and identifies which client requests come from. It is generally used in logging, metrics, and quota allocation.

max.poll.records

  • This parameter controls the maximum number of records returned by a single poll() call and can be used to limit the amount of data processed in each iteration of the poll loop.

max.poll.interval.ms

  • The maximum allowed interval between two polls. Set it larger if messages take a long time to process; if poll() is not called within this time, the consumer automatically stops sending heartbeats and sends a leave-group request.
    If the processing between two polls is heavy, consider doing it asynchronously, because the coordinator also uses this parameter as the maximum time to wait for the other consumers to rejoin during a rebalance; if the other consumers also set a large max.poll.interval.ms, the whole rebalance can take a long time.

receive.buffer.bytes、send.buffer.bytes

  • These two parameters control the TCP receive and send buffers used when reading and writing data; set them to -1 to use the operating system default. If the consumer and the broker are in different data centers, the buffers can be increased somewhat, since the latency between data centers is usually higher.
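
  Putting the consumer settings above together, a minimal configuration sketch (broker address, group id, and the concrete values are placeholders for illustration):

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.RoundRobinAssignor;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerConfigExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 1024);                  // wait for at least 1 KB
        props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, 500);                 // ...but at most 500 ms
        props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 1024 * 1024); // 1 MB per partition
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 10_000);
        props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 3_000);           // ~1/3 of the session timeout
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 500);
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300_000);
        props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, RoundRobinAssignor.class.getName());

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // ... subscribe and poll as in the earlier examples
        consumer.close();
    }
}
```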

In principle, Kafka compression and decompression are not handled by the broker unless it is configured separately (for example, a broker-side compression type different from the producer's), in which case broker CPU usage may increase.
http://zhongmingmao.me/2019/08/02/kafka-compression/

6. The secret of high-performance reading and writing

  1. Sequential reads and writes
  2. Zero copy (a sketch of the zero-copy idea follows)
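
  A minimal sketch of the zero-copy idea in Java: FileChannel.transferTo lets the kernel move file bytes directly to a socket (sendfile on Linux), which is the mechanism Kafka relies on when serving log segments to consumers. The file path, host, and port here are placeholders, and this is an illustration of the technique, not Kafka's actual code path.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySketch {
    // Sends a log segment file to a socket without copying it through user space.
    public static void send(Path segment, String host, int port) throws IOException {
        try (FileChannel file = FileChannel.open(segment, StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(new InetSocketAddress(host, port))) {
            long position = 0;
            long size = file.size();
            while (position < size) {
                // transferTo maps to sendfile(2) where the OS supports it.
                position += file.transferTo(position, size - position, socket);
            }
        }
    }
}
```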

