Basic concepts of Kafka and analysis of ISR

Basic concepts of Kafka

 

What are the characteristics of Kafka?

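Distributed: messages are split into partitions that are spread across brokers.
Cross-platform: clients are available for many languages and platforms.
Real-time: data can be processed in real time, and the broker can hold a large backlog of accumulated messages.
Scalability: supports horizontal scaling.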

 

Reasons for Kafka's high performance



1. Sequential writes: Kafka appends messages to the end of its log files on disk, and consumers read them sequentially. Consuming a message does not delete it from disk, which avoids random disk writes. By contrast, Alibaba Cloud's RocketMQ supports deleting an individual message; this is probably done by marking and dumping the message rather than physically deleting it.
2. Page cache "mid-air relay" of reads and writes:

  1. When a producer sends a message, the broker writes it to the log with the pwrite() system call at the corresponding offset, and the data first lands in the page cache.
  2. When a consumer fetches messages, the broker uses the sendfile() system call to move the data from the page cache to the socket buffer with zero copy, and then sends it over the network.
  3. Meanwhile, dirty data in the page cache is written back to disk by the kernel's flusher threads, or immediately when sync()/fsync() is called. Because the page cache belongs to the kernel rather than to the Kafka process, data that has reached the page cache survives a crash of the broker process (though not a power failure of the machine).
  4. If the message a consumer wants is not in the page cache, it is read from disk, and the kernel also reads ahead some adjacent blocks into the page cache so that the next read is faster.
  5. This leads to an important conclusion: if the producer's production rate is close to the consumer's consumption rate, the whole produce-consume cycle can be completed almost entirely through reads and writes to the broker's page cache, with very few disk accesses. This is commonly called the "read/write mid-air relay".
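To make the relay conclusion in point 5 concrete, here is a toy simulation under simplifying assumptions: the page cache is an LRU set of fixed-size pages, the producer appends one page per step (sequential writes, nothing deleted after consumption), and the consumer reads the page `consumer_lag` steps behind the producer. Write-back and readahead are not modeled, and `PageCache` and `simulate` are hypothetical names, not part of Kafka or the kernel.

```python
from collections import OrderedDict


class PageCache:
    """Toy LRU page cache: a fixed number of page slots, least recently used evicted first."""

    def __init__(self, capacity_pages: int) -> None:
        self.capacity = capacity_pages
        self.pages = OrderedDict()  # page number -> True, ordered from LRU to MRU
        self.disk_reads = 0

    def _touch(self, page: int) -> None:
        # Mark the page as most recently used; evict the oldest page if over capacity.
        self.pages[page] = True
        self.pages.move_to_end(page)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)

    def write(self, page: int) -> None:
        # Like pwrite(): the write lands in the page cache (write-back is not modeled).
        self._touch(page)

    def read(self, page: int) -> None:
        # Like a consumer fetch: a miss means the page has to come from disk.
        if page not in self.pages:
            self.disk_reads += 1
        self._touch(page)


def simulate(total_pages: int, consumer_lag: int, cache_pages: int) -> int:
    """Producer appends pages 0..total_pages-1 in order; the consumer trails it
    by consumer_lag pages. Returns how many consumer reads had to go to disk."""
    cache = PageCache(cache_pages)
    for produced in range(total_pages):
        cache.write(produced)              # sequential append at the end of the log
        consumed = produced - consumer_lag
        if consumed >= 0:
            cache.read(consumed)           # sequential read; the page is not deleted
    return cache.disk_reads
```

With a 100-page cache, `simulate(1000, 10, 100)` returns 0, while `simulate(1000, 200, 100)` returns 800: a consumer that stays close to the producer is served entirely from memory, but once its lag exceeds what the cache can hold, every read falls through to disk.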

Kafka high-performance zero copy

Page cache:
The page cache is the main disk-caching mechanism the operating system uses to reduce disk I/O. Put simply, it caches disk data in memory, so that an access to data on disk becomes an access to memory, which improves performance.
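The sendfile() step described above can be tried with Python's standard library: `socket.sendfile()` uses `os.sendfile()` where the platform supports it, so file bytes go from the page cache to the socket without passing through the program's own buffers, and it falls back to plain `send()` elsewhere. This is a minimal sketch, not Kafka code (Kafka, as a JVM program, reaches sendfile() through `FileChannel.transferTo()`); a local socket pair stands in for the consumer's network connection, a temporary file stands in for a log segment, and `send_file` and `demo` are hypothetical helpers.

```python
import os
import socket
import tempfile
from typing import Tuple


def send_file(path: str, sock: socket.socket) -> int:
    """Send a whole file over a connected stream socket; returns bytes sent."""
    with open(path, "rb") as f:
        # Uses os.sendfile() when available (kernel-side copy from the page cache
        # to the socket); otherwise socket.sendfile() falls back to send().
        return sock.sendfile(f)


def demo(payload: bytes) -> Tuple[int, bytes]:
    """Write payload to a temp file, send it through a local socket pair and
    return (bytes_sent, bytes_received). Keep payload small: nothing reads
    concurrently, so it must fit in the socket buffer."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    sender, receiver = socket.socketpair()
    try:
        sent = send_file(path, sender)
        sender.shutdown(socket.SHUT_WR)  # tell the receiver no more data is coming
        chunks = []
        while True:
            chunk = receiver.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return sent, b"".join(chunks)
    finally:
        sender.close()
        receiver.close()
        os.remove(path)
```

For example, `demo(b"kafka-log-segment " * 100)` sends and receives the same 1800 bytes.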

Schematic diagram of ISR and OSR

 

  1. Follower replicas that keep up with the leader replica belong to the ISR (in-sync replicas) set, while followers that lag too far behind are moved to the OSR (out-of-sync replicas) set. When an OSR replica catches up with the leader again, it is moved back into the ISR. Together, ISR and OSR make up the AR (assigned replicas).
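  2. ISR leader-follower synchronization: HW (high watermark, the offset below which messages are visible to consumers) and LEO (log end offset, the offset at which the next record will be written, i.e. one past the last record in the log).

    Example:

    Messages can be consumed only after the leader has replicated them to every replica in the ISR. For example, suppose there are three messages. The first two, a and b, have been replicated to all three replicas A, B and C, but message c, because of a network problem or some other reason, has reached only A and B, so node C does not have c yet. At this point the HW lets consumers fetch only a and b; c cannot be consumed because it has not been replicated to C. In offset terms, a, b and c sit at offsets 0, 1 and 2, the LEOs of A, B and C are 3, 3 and 2, and the HW is the smallest LEO in the ISR, 2, so only offsets 0 and 1 are visible. Once c reaches C, the HW advances to 3 and all three messages a, b and c can be consumed.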



Important parameters of Kafka producer

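acks: specifies how many partition replicas must receive a message before the broker reports the send as successful. Default 1 (since Kafka 3.0 the default is all).
acks=1: the producer gets a success response as soon as the leader replica has written the message.
acks=0: the producer does not wait for any response from the broker.
acks=-1 or acks=all: the producer gets a success response only after every replica in the ISR has written the message; this is usually combined with min.insync.replicas (for example min.insync.replicas=2).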
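The acks settings above can be summarized as a decision function. This is a hypothetical, simplified model of what the producer sees for a single message (it ignores timeouts, retries and the case where the ISR shrinks after the append); the function name and the return strings are made up for illustration, except that `NOT_ENOUGH_REPLICAS` mirrors the name of the error Kafka returns when, under acks=all, the ISR is smaller than `min.insync.replicas`.

```python
from typing import Set


def produce_outcome(acks: str, leader: str, isr: Set[str], written: Set[str],
                    min_insync_replicas: int = 1) -> str:
    """Hypothetical model: the producer's view of one message.
    isr = replicas currently in sync; written = replicas that have the message."""
    if acks == "0":
        # The producer does not wait for the broker at all.
        return "no-wait"
    if acks == "1":
        # Only the leader's write matters.
        return "success" if leader in written else "waiting"
    if acks in ("-1", "all"):
        # The broker first checks the ISR size against min.insync.replicas.
        if len(isr) < min_insync_replicas:
            return "NOT_ENOUGH_REPLICAS"
        # Success only once every ISR member has the message.
        return "success" if isr <= written else "waiting"
    raise ValueError(f"unsupported acks value: {acks!r}")
```

The design point of min.insync.replicas=2 shows up when the ISR shrinks to just the leader: without it, acks=all would succeed with a single copy of the data; with it, the producer gets an error instead.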

Essential parameters for Kafka consumers

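bootstrap.servers: the list of broker addresses used to connect to the Kafka cluster

key.deserializer and value.deserializer: the deserializers for message keys and values

group.id: the consumer group the consumer belongs to

subscribe(): subscribes to topics; accepts either a collection of topic names or a regular expression

assign(): consumes only specific partitions of a topic (manual partition assignment)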
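The consumer settings above can be illustrated without a running cluster. In this sketch the property keys and the `StringDeserializer` class name are the real ones a Java `KafkaConsumer` expects in its `Properties`; the host names and group name are made-up examples. The helper functions are hypothetical and only model which topics or partitions `subscribe()` with a pattern and `assign()` select (assuming the pattern must match the whole topic name); they are not the client API.

```python
import re
from typing import Dict, List, Tuple

# Real consumer config keys; the values are illustrative placeholders.
CONSUMER_PROPS: Dict[str, str] = {
    "bootstrap.servers": "broker1:9092,broker2:9092",  # hypothetical broker hosts
    "group.id": "order-service",                        # hypothetical consumer group
    "key.deserializer": "org.apache.kafka.common.serialization.StringDeserializer",
    "value.deserializer": "org.apache.kafka.common.serialization.StringDeserializer",
}


def deserialize_utf8(raw: bytes) -> str:
    """What a string deserializer does with its default settings: decode UTF-8."""
    return raw.decode("utf-8")


def topics_for_pattern(pattern: str, cluster_topics: List[str]) -> List[str]:
    """Model of subscribe(Pattern): topics whose full name matches the regex."""
    regex = re.compile(pattern)
    return sorted(t for t in cluster_topics if regex.fullmatch(t))


def manual_assignment(topic: str, partitions: List[int]) -> List[Tuple[str, int]]:
    """Model of assign(): the caller lists exact (topic, partition) pairs,
    and the consumer group does not rebalance them."""
    return [(topic, p) for p in partitions]
```

For example, the pattern `orders-.*` over the topics `orders-eu`, `orders-us` and `payments` selects only the two `orders-` topics, whereas `manual_assignment("orders-eu", [0, 2])` pins the consumer to partitions 0 and 2 of a single topic.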

 

Schematic diagram: practical application scenarios of the Kafka partitioner
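For keyed records, the Java client's default partitioner hashes the serialized key with murmur2 and takes the result modulo the number of partitions, so the same key always lands in the same partition as long as the partition count does not change. The Python port below follows the Java client's `Utils.murmur2`; treat it as a sketch and check it against the real client before relying on exact partition numbers. Records without a key are handled differently (round-robin in older clients, sticky partitioning since Kafka 2.4) and are not covered here; `partition_for_key` is a hypothetical helper name.

```python
MASK32 = 0xFFFFFFFF


def murmur2(data: bytes) -> int:
    """32-bit murmur2 hash, emulating Java int arithmetic with unsigned masking."""
    length = len(data)
    m = 0x5BD1E995
    r = 24
    h = (0x9747B28C ^ length) & MASK32          # seed XOR length
    for i in range(length // 4):                # mix 4 bytes at a time (little-endian)
        i4 = i * 4
        k = data[i4] | (data[i4 + 1] << 8) | (data[i4 + 2] << 16) | (data[i4 + 3] << 24)
        k = (k * m) & MASK32
        k ^= k >> r
        k = (k * m) & MASK32
        h = (h * m) & MASK32
        h ^= k
    tail = length & ~3                          # handle the last 1-3 bytes
    rem = length % 4
    if rem == 3:
        h ^= data[tail + 2] << 16
    if rem >= 2:
        h ^= data[tail + 1] << 8
    if rem >= 1:
        h ^= data[tail]
        h = (h * m) & MASK32
    h ^= h >> 13                                # final avalanche
    h = (h * m) & MASK32
    h ^= h >> 15
    return h


def partition_for_key(key: bytes, num_partitions: int) -> int:
    """Clear the sign bit (like Utils.toPositive), then take the modulo."""
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions
```

A typical use of this property is ordering: sending all events for one user with the user ID as the key keeps them in one partition, and Kafka only guarantees order within a partition.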

 


Source: blog.csdn.net/weixin_38305866/article/details/109308827