Kafka's data consumption process and message non-loss mechanism

Kafka data writing process

Implementation process:

  1. The producer obtains the location of the leader for the target partition
  2. The producer sends the data to the leader
  3. The leader on the broker writes the message to its local log
  4. The other followers pull the message from the leader, write it to their local logs, and send an ACK to the leader
  5. After the leader receives ACKs from all replicas in the ISR, it returns an ACK to the producer to indicate success
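The five steps above can be sketched as a toy simulation: the leader appends to its log, every follower in the ISR replicates the record and ACKs, and only then does the producer get a success ACK. All class and method names here are illustrative, not Kafka APIs.

```python
# Toy model of the write path: leader writes locally (step 3),
# ISR followers replicate and ACK (step 4), producer is ACKed
# only after all ISR replicas confirm (step 5).

class Replica:
    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []

    def append(self, record):
        self.log.append(record)
        return True  # ACK back to the leader

class Partition:
    def __init__(self, leader, followers):
        self.leader = leader
        self.isr = followers  # in-sync replicas

    def produce(self, record):
        self.leader.append(record)                   # leader writes to its local log
        acks = [f.append(record) for f in self.isr]  # followers replicate and ACK
        return all(acks)                             # ACK producer only when all ISR ACKed

leader = Replica(0)
partition = Partition(leader, [Replica(1), Replica(2)])
assert partition.produce("hello") is True
assert leader.log == ["hello"]
```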

Kafka data consumption process

Among message queues in general, there are two data consumption models: push mode and pull mode.

  • Push mode: the message queue records the consumption state of every message. Once a message is marked as consumed, consumers can no longer consume it.
  • Pull mode: each consumer records its own consumption state, and consumers pull data sequentially and independently of one another.
  • Kafka adopts the pull model: the consumption state is recorded by the consumer itself, and each consumer pulls the messages of each partition sequentially and independently of the others.
  • Consumers can consume messages in any order.
    For example, a consumer can reset to an old offset and reprocess messages it has already consumed, or jump directly to the latest position and start consuming from the current moment.
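The pull model described above can be illustrated with a minimal sketch: the partition log is immutable on the broker side, while the offset lives in the consumer, so the consumer can rewind and re-read, or skip ahead. The names here are illustrative, not the Kafka client API.

```python
# Toy pull-mode consumer: consumption state (the offset) belongs to the
# consumer, so rewinding and skipping are just offset assignments.

log = ["m0", "m1", "m2", "m3"]  # the partition log on the broker

class PullConsumer:
    def __init__(self):
        self.offset = 0  # consumption state lives in the consumer

    def poll(self, max_records=2):
        batch = log[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        self.offset = offset  # rewind or skip freely

c = PullConsumer()
assert c.poll() == ["m0", "m1"]
c.seek(0)                        # reset to an old offset...
assert c.poll() == ["m0", "m1"]  # ...and reprocess consumed messages
c.seek(len(log))                 # or jump to the latest position
assert c.poll() == []
```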

Implementation process:

  1. The consumer obtains the partition and its committed offset from ZooKeeper (by default, the offset of the last consumption is fetched from ZK)
  2. The consumer finds the leader of that partition and pulls data from it
  3. The leader reads the data from its local log and returns it to the consumer
  4. Finally, after pulling the data, the consumer commits the new offset back to ZK


Message non-loss mechanism

Broker data is not lost

After the producer writes data through the partition leader, all followers in the ISR replicate the data from the leader. This ensures that even if the leader crashes, the data remains available on the other followers.
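Broker-side durability is typically reinforced with topic-level replication settings. A hedged example using the standard `kafka-topics.sh` tool (the topic name, broker address, and replica counts are illustrative, not from the original):

```shell
# Create a topic with 3 replicas. min.insync.replicas=2 means a write with
# acks=all succeeds only if at least 2 ISR replicas have the record, so a
# single broker crash cannot lose acknowledged data.
kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic my-topic \
  --partitions 3 \
  --replication-factor 3 \
  --config min.insync.replicas=2
```

This is a configuration fragment: it assumes a running broker at the given address.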

Producer data is not lost

  • When the producer connects to the leader to write data, it can use the ACK mechanism to confirm that the data has been written successfully. The ACK mechanism has three configurable levels:
    • acks = -1 / all — all replicas (the leader and every follower in the ISR) must have received the data
    • acks = 1 — the leader has received the data (the default configuration)
    • acks = 0 — the producer only sends the data and does not care whether it arrives (data may be lost, but performance is best)
  • The producer can send data synchronously or asynchronously:
    • Synchronous: after sending a batch of data to Kafka, wait for Kafka to return the result before executing the next statement.
    • Asynchronous: send a batch of data to Kafka and just provide a callback function.
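The three ACK levels can be modeled with a toy `send` function: what differs is how many replicas must confirm the write before the producer considers it done. This is illustrative only; real clients set `acks=0/1/'all'` in the producer configuration.

```python
# Toy model of the acks setting: the return point of send() depends on
# how many replicas must confirm the write.

def send(record, leader_log, follower_logs, acks):
    if acks == 0:
        # fire-and-forget: don't wait for anything (fastest, may lose data)
        leader_log.append(record)
        return "sent"
    leader_log.append(record)
    if acks == 1:
        return "leader-acked"  # leader wrote it; followers may still lag
    # acks == -1 / all: wait until every ISR follower has the record
    for log in follower_logs:
        log.append(record)
    return "all-isr-acked"

leader, f1, f2 = [], [], []
assert send("x", leader, [f1, f2], acks=0) == "sent"
assert send("y", leader, [f1, f2], acks=1) == "leader-acked"
assert send("z", leader, [f1, f2], acks=-1) == "all-isr-acked"
assert f1 == ["z"] and f2 == ["z"]  # only acks=-1 waited for followers
```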

Note: if the broker does not return an ACK for a long time and the producer's buffer fills up, the developer can configure whether to clear the data in the buffer directly.

Consumer data is not lost

When consumers consume data, as long as each consumer records its offset value correctly, the data will not be lost. In particular, committing the offset only after a message has been processed ensures the message is redelivered rather than dropped if the consumer crashes.


Origin blog.csdn.net/weixin_45970271/article/details/126556976