A detailed explanation of Kafka's important concepts and key cluster configuration

Important concepts

broker

A broker is a Kafka instance, responsible for receiving, forwarding, and storing messages. A Kafka cluster is composed of multiple brokers.

topic

A Kafka topic is a logical concept: messages are grouped and categorized by topic so that messages for different business logic can be processed separately. A topic is similar to an index in Elasticsearch.

partition

A Kafka partition is a physical concept that corresponds to a folder in the file system. Partitions belong to a topic and exist mainly because a topic may hold a large amount of data: by splitting a topic's data into partitions, the data can be processed in parallel, increasing concurrency.

A Kafka partition is similar to a shard in Elasticsearch: both physically split data that belongs to a single logical classification.
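As a minimal sketch of this split (assuming a broker reachable at localhost:9092 and a hypothetical topic name my-topic), the Java AdminClient can create a topic with several partitions:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions so the topic's data can be processed in parallel,
            // replication factor 2 so each partition also has one follower replica
            NewTopic topic = new NewTopic("my-topic", 3, (short) 2);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}

On disk, each partition then appears as its own folder, for example my-topic-0, my-topic-1 and my-topic-2.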

segment

A Kafka partition corresponds to a folder. With a little thought you will realize that the messages themselves must ultimately be stored in files, so what are the files in which Kafka stores messages called?

The answer is the segment, usually called a log segment.

A segment is a further physical split of a partition (not of the whole topic). Each segment consists of a .log data file plus offset index files (.index, and .timeindex in newer Kafka versions) named after the segment's base offset. By sizing segments sensibly for the actual machine and relying on Kafka's own indexing mechanism, read and write operations can be performed faster.

offset

The offset is the position of a message within a partition. It counts messages, not bytes.
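A small, illustrative consumer snippet (the broker address, group id and topic name my-topic are assumptions) that shows an offset counts messages, not bytes:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class OffsetSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("group.id", "offset-demo");                // hypothetical group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0);   // hypothetical topic, partition 0
            consumer.assign(Collections.singletonList(tp));
            consumer.seek(tp, 5);   // offset 5 = the 6th message in the partition, not a byte position
            consumer.poll(Duration.ofMillis(500))
                    .forEach(r -> System.out.println(r.offset() + ": " + r.value()));
        }
    }
}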

replica

A replica is a copy of a partition's data; essentially all distributed middleware has the concept of replicas.

In Kafka, replicas are per partition, not per topic.

In a Kafka cluster, availability is improved by distributing a partition's replicas across different brokers: when one broker goes down, the other replicas are still available.

A replica has two important attributes: LEO and HW.

Log End Offset (LEO): the offset of the next message to be written to the log
High Watermark (HW): the smallest LEO among all replicas

Why is the smallest LEO among all replicas called the high watermark (HW)?

Mainly because Kafka does not allow consumers to consume messages beyond the smallest LEO across all replicas, hence the name high watermark. For example, if the leader's LEO is 5 and its two followers' LEOs are 4 and 3, the HW is 3 and consumers can only read messages up to offset 2.

This is mainly to avoid data inconsistency: if the leader's LEO is well ahead of the followers' and the leader then goes down, another replica becomes the leader, and messages beyond the HW would not exist on the new leader, so consumers must never have been allowed to read them.

producer

Message producer, a service that publishes messages to the Kafka cluster

consumer

Message consumer, a service that consumes messages from the Kafka cluster

Consumer group

The consumer group is a concept from the high-level consumer API. Each consumer belongs to a consumer group; within a group, each message is consumed by only one consumer, but the same message can be consumed by multiple consumer groups.

By assigning different consumer groups, one message can be consumed by several groups, which is very practical. For example, a login message may be needed by both the data-statistics business and the activity business; you only need to set up two different consumer groups, and both can consume the same login messages, as the sketch below shows.
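A hedged sketch of the login-message example above: the two businesses run separate consumers whose only relevant difference is the group.id (the topic name login-events and the group names are hypothetical):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerGroupSketch {
    static KafkaConsumer<String, String> loginConsumer(String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("group.id", groupId);                      // only the group id differs
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("login-events"));   // hypothetical topic
        return consumer;
    }

    public static void main(String[] args) {
        // Both groups receive every login message; within each group a message goes to only one consumer.
        KafkaConsumer<String, String> statsConsumer = loginConsumer("data-statistics");
        KafkaConsumer<String, String> activityConsumer = loginConsumer("activity");
    }
}

Within the data-statistics group each login message is delivered to exactly one consumer instance, while the activity group independently receives the same messages.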

leader and follower

A replica plays one of two roles: leader or follower.

Among the replicas of the same partition there is only one leader; the other replicas are followers.

Producers and consumers interact only with the leader; the leader then interacts with the followers.

For example, a producer sends a message to the leader, the leader forwards it to the followers, and the response to the producer depends on the producer's acks configuration, which is described in detail later.

controller

The controller is a broker-level role. The brokers in a Kafka cluster elect one of themselves to handle partition leader election, failover and other cluster-management operations; the broker elected in this way is the controller.

The controller election depends on ZooKeeper: every broker tries to register the same ephemeral node on ZooKeeper, and because only one broker can register it successfully, the others fail. The broker that successfully registers the ephemeral node becomes the controller, and the other brokers are called broker followers.

The controller monitors the state of all other brokers. If the controller goes down, its ephemeral node on ZooKeeper disappears; all brokers then race to register the ephemeral node again, only one succeeds, and that broker becomes the new controller. The sketch below illustrates this ephemeral-node election pattern.
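A hedged sketch of this ephemeral-node election pattern, using the ZooKeeper Java client (this illustrates the pattern, not Kafka's actual controller code; the /controller path and the 6000 ms session timeout mirror the description in this article):

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ControllerElectionSketch {
    public static void main(String[] args) throws Exception {
        // Session timeout of 6000 ms, matching zookeeper.session.timeout.ms below
        ZooKeeper zk = new ZooKeeper("localhost:2181", 6000, event -> { });
        String brokerId = "1";
        try {
            // Only one broker can create the ephemeral node; that broker becomes the controller.
            zk.create("/controller", brokerId.getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("broker " + brokerId + " is now the controller");
        } catch (KeeperException.NodeExistsException e) {
            // Another broker won the election; watch the node so we can retry when it disappears.
            zk.exists("/controller", event -> System.out.println("controller changed: " + event));
            System.out.println("broker " + brokerId + " is a broker follower");
        }
    }
}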

Once a broker goes down, the controller reads the state of all partitions hosted on that broker from ZooKeeper and selects a replica from each partition's ISR list as the new partition leader.

If the replicas in the ISR list are all down, a surviving replica is chosen as the leader; if all replicas of the partition are down, the new leader is set to -1 and Kafka waits for recovery: either it waits for some replica in the ISR to come back and elects it as the leader, or it elects the first replica that comes back to life, which is not necessarily in the ISR.

When a broker goes down, the controller also notifies ZooKeeper, and ZooKeeper notifies the other brokers.

The broker split-brain problem: after the controller registers itself on ZooKeeper, the default timeout for communication with ZooKeeper is 6s; that is, if the controller has not heartbeated with ZooKeeper within 6s, ZooKeeper considers the controller dead.

The ephemeral node is then deleted from ZooKeeper, so the other brokers conclude that the controller is gone, rush to register the ephemeral node again, and the broker that registers it successfully becomes the new controller.

The previous controller, however, still needs to shut down its watches on the various nodes and events. When Kafka's read and write traffic is very heavy, the cluster can briefly contain two controllers, and messages coming in from producers may fail to be persisted, resulting in a data backlog.

coordinator

The Group Coordinator is a service; each broker starts one when it starts up.

The Group Coordinator's job is to store the group's metadata and to record the offset of each consumed partition in Kafka's internal topic __consumer_offsets.

Before 0.9, Kafka stored partition offsets in ZooKeeper (consumers/{group}/offsets/{topic}/{partition}); because ZooKeeper is not suited to frequent writes, from 0.9 onwards the offset of each partition is recorded in the built-in topic instead.

Important configuration of Kafka

broker related

# Unique id of this broker within the cluster; a non-negative integer
broker.id=1

# Port the broker server listens on
port=9091

# Directory where Kafka stores its data; multiple directories are comma-separated, e.g. D:\\data11,D:\\data12
log.dirs=D:\\kafkas\\datas\\data1

# Address of the ZooKeeper cluster; multiple addresses are comma-separated, e.g. hostname1:port1,hostname2:port2
zookeeper.connect=localhost:2181

# ZooKeeper connection timeout
zookeeper.connection.timeout.ms=6000

# ZooKeeper session timeout
zookeeper.session.timeout.ms=6000

# Size limit of a segment's index file; overridden by the parameter specified when the topic is created
log.index.size.max.bytes =10*1024*1024

# Segment size; when this size is reached a new segment file is created; overridden by the parameter specified when the topic is created
log.segment.bytes =1024*1024*1024

# Maximum size of a message body accepted by the broker
message.max.bytes =1000012

# Number of threads the broker uses to handle network requests
num.network.threads=3

# Number of threads the broker uses for disk I/O
num.io.threads=8

# Socket send buffer
socket.send.buffer.bytes=102400
# Socket receive buffer
socket.receive.buffer.bytes=102400
# Maximum size of a socket request; message.max.bytes must be smaller than socket.request.max.bytes
socket.request.max.bytes=104857600

# Default number of partitions per topic; overridden by the parameter specified when the topic is created
num.partitions=1

# Number of replicas per partition; the default 1 means no extra copies, 2 means one follower in addition to the leader
default.replication.factor =1

# Whether topics may be created automatically; if false, topics must be created manually
auto.create.topics.enable =true

producer related

# 0: do not wait for acknowledgement; 1: only the leader must write the message successfully; all: the leader and the followers in the ISR must all write it successfully
acks = 1

# Size of the producer's memory buffer, used to buffer messages waiting to be sent to the server.
# If the application produces messages faster than they can be sent, the producer runs out of buffer space; send() then either blocks or throws an exception
buffer.memory = 10240

# How long send() may block when buffer.memory is exhausted before throwing an exception
max.block.ms = 3000

# By default messages are sent uncompressed; can be set to snappy, gzip or lz4
compression.type = snappy

# Number of retries
retries = 0

# Interval between retries
retry.backoff.ms = 100

# Size of each batch sent to the same partition, default 16384
batch.size = 10240

# Batching by batch.size only happens when messages are produced faster than they are sent;
# linger.ms makes the producer wait n milliseconds before sending, to encourage batching
linger.ms = 0

# Maximum size of each request the producer sends, default 1 MB
max.request.size = 1048576

# How many requests the producer may send on a connection before receiving responses from the server
max.in.flight.requests.per.connection = 1

# TCP buffer sizes
receive.buffer.bytes = 4096
send.buffer.bytes = 4096

The snappy compression algorithm uses less CPU and offers good performance and a reasonable compression ratio; the gzip compression algorithm uses more CPU but provides a higher compression ratio.

max.in.flight.requests.per.connection can cause message ordering problems. If retries > 0 and max.in.flight.requests.per.connection > 1, then when the first batch of messages fails to be written while the second batch is written successfully, the producer retries the first batch; if that retry succeeds, the order of the two batches is reversed.

With max.in.flight.requests.per.connection = 1, messages are guaranteed to be written in the order they were sent, even if retries occur; a producer sketch using these settings follows.
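A minimal producer sketch wiring up the settings discussed above (the broker address and the topic name my-topic are assumptions, not values from this article):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OrderedProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");                              // leader and ISR followers must confirm the write
        props.put("retries", 3);                               // retry failed sends
        props.put("max.in.flight.requests.per.connection", 1); // with retries > 0, keep 1 to preserve ordering
        props.put("batch.size", 16384);                        // per-partition batch size in bytes
        props.put("linger.ms", 5);                             // wait up to 5 ms to fill a batch
        props.put("compression.type", "snappy");               // compress batches with snappy

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key1", "hello"),
                    (metadata, exception) -> {
                        if (exception != null) {
                            exception.printStackTrace();       // send failed after all retries
                        }
                    });
        } // close() flushes any buffered records
    }
}

With acks = all, the send callback only reports success once the leader and the ISR followers have written the message, matching the leader/follower interaction described earlier.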

consumer related

# List of broker servers
bootstrap.servers=localhost:9092,localhost:9093,localhost:9094

# Maximum number of records returned per poll
max.poll.records = 500

# If true, offsets are committed automatically
enable.auto.commit = true

# Interval at which offsets are auto-committed
auto.commit.interval.ms = 5000

# If the consumer does not respond to the coordinator's heartbeats within this time, it is considered dead and a rebalance is triggered
session.timeout.ms = 10000

# Heartbeat interval to the coordinator
heartbeat.interval.ms = 2000

# What to do when there is no initial offset, default latest
# earliest: automatically reset to the earliest offset
# latest: automatically reset to the latest offset
# none: throw an exception if there is no previous offset for the consumer group
auto.offset.reset=latest

# Minimum number of bytes fetched per request, default 1 byte
fetch.min.bytes=1

# Maximum number of bytes fetched per request, default 50 MB
fetch.max.bytes=52428800

# Maximum time in milliseconds to wait for a fetch, default 500
fetch.max.wait.ms=500
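A minimal consumer sketch under the same assumptions (brokers on localhost, a hypothetical topic my-topic and group login-stats), using the auto-commit settings shown above:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092,localhost:9093,localhost:9094");
        props.put("group.id", "login-stats");               // hypothetical consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "true");            // offsets are committed to __consumer_offsets automatically
        props.put("auto.commit.interval.ms", "5000");
        props.put("auto.offset.reset", "latest");
        props.put("max.poll.records", "500");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}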

replica related

# Maximum time the leader waits for a follower; if exceeded, the follower is removed from the ISR (in-sync replicas)
replica.lag.time.max.ms =10000

# Maximum number of messages a follower may lag behind the leader before it is removed from the ISR; in environments with few brokers or limited network capacity, raising this value is recommended
replica.lag.max.messages =4000

# Socket timeout between follower and leader
replica.socket.timeout.ms=30*1000

# Socket receive buffer size used for replication from the leader
replica.socket.receive.buffer.bytes=64*1024

# Maximum size of data a replica fetches per request
replica.fetch.max.bytes =1024*1024

# Maximum wait time for communication between a replica and the leader; failures are retried
replica.fetch.wait.max.ms =500

# Minimum amount of data for a fetch; if the leader has less unsynchronized data than this, the fetch blocks until the condition is met
replica.fetch.min.bytes =1

# Number of fetcher threads used to replicate from leaders; increasing this value increases follower I/O
num.replica.fetchers=1

log related

# Segment file size; overridden by the parameter specified when the topic is created
log.segment.bytes =1024*1024*1024

# Segment roll time; a new segment is created even if log.segment.bytes has not been reached; can be overridden per topic
log.roll.hours =24*7

# Log cleanup policy: delete or compact, mainly for handling expired data
log.cleanup.policy = delete

# Maximum time data is stored; once exceeded, data is handled according to the policy set by log.cleanup.policy
log.retention.minutes=6000

# Size limit per partition; a topic's size limit = number of partitions * log.retention.bytes; -1 means no limit
log.retention.bytes=-1

# Interval at which file sizes are checked
log.retention.check.interval.ms=50000

# Whether log cleaning is enabled, default true
log.cleaner.enable=true

# Number of log cleaner threads
log.cleaner.threads = 2

# Maximum I/O the log cleaner may use while cleaning
log.cleaner.io.max.bytes.per.second=None

# Buffer space used for deduplication during log cleaning; the larger the better, as space allows
log.cleaner.dedupe.buffer.size=500*1024*1024

# I/O block size used during log cleaning; usually does not need to be changed
log.cleaner.io.buffer.size=512*1024

# The larger the value, the more is cleaned per pass, but hash collisions also become more severe
log.cleaner.io.buffer.load.factor=0.9

# Interval for checking whether there are logs that need cleaning
log.cleaner.backoff.ms =15000

# Controls how frequently a log is eligible for cleaning; a larger value means more efficient cleaning but some wasted space; can be overridden per topic
log.cleaner.min.cleanable.ratio=0.5

# How long deleted records are retained for compacted logs; overridden by the parameter specified when the topic is created
log.cleaner.delete.retention.ms =100000

# Size limit of a segment's index file; overridden by the parameter specified when the topic is created
log.index.size.max.bytes =10*1024*1024

# Number of message bytes between offset index entries; a larger interval means a sparser index
log.index.interval.bytes =4096

# Number of messages after which a flush to disk is performed
log.flush.interval.messages=9223372036854775807

# Flush to disk after this many milliseconds; if unset, log.flush.scheduler.interval.ms is used
log.flush.interval.ms = null

# Interval for checking whether a flush to disk is needed
log.flush.scheduler.interval.ms =3000

# How long a file is retained after being removed from the index; usually does not need to be changed
log.delete.delay.ms =60000

# Records the point of the last flush to disk, to aid data recovery
log.flush.offset.checkpoint.interval.ms =60000

The log.cleanup.policy parameter controls log cleanup; the default is delete, and the parameter can also be set to "delete,compact".

Compact here does not mean compression; it consolidates messages by key: among different values with the same key, only the latest version is kept.

Note the difference between compaction and compression. Compaction is more like the marking and collection done by a garbage collector, while compression in Kafka applies to the message content itself. The sketch below shows what compaction keeps.
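A minimal illustration of the keep-the-latest-value-per-key rule (plain Java, not Kafka code; the keys user1 and user2 are hypothetical):

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CompactionSketch {
    public static void main(String[] args) {
        // Hypothetical message history in one partition, in offset order: (key, value)
        List<String[]> log = List.of(
                new String[]{"user1", "v1"},
                new String[]{"user2", "v1"},
                new String[]{"user1", "v2"},
                new String[]{"user1", "v3"});

        // Compaction keeps only the latest value for each key
        Map<String, String> compacted = new LinkedHashMap<>();
        for (String[] record : log) {
            compacted.put(record[0], record[1]);   // a later value for the same key overwrites the earlier one
        }
        System.out.println(compacted);   // {user1=v3, user2=v1}
    }
}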

Kafka log deletion can be based on three criteria:

  1. Based on time
  2. Based on size
  3. Based on offset

Time-based retention is configured with log.retention.hours, log.retention.minutes and log.retention.ms; their priority, from high to low, is:

  1. log.retention.ms
  2. log.retention.minutes
  3. log.retention.hours

By default, only the log.retention.hours parameter is configured, and its value is 168. Therefore, the retention time of the log segment file is 7 days by default.

The size-based deletion is controlled by the log.retention.bytes parameter, the default is -1, and there is no size limit.

If either the size-based (log.retention.bytes) or the time-based retention condition is met, the data is deleted; both settings can be overridden by the parameters specified when the topic is created.

After each log cleaning, Kafka merges segments; the merged size does not exceed the log.segment.bytes configuration, which defaults to 1 GB.

controller related

# Whether controlled shutdown of a broker is allowed; if true, all leaders on the broker are closed and moved to other brokers before shutdown
controlled.shutdown.enable=false

# Number of controlled shutdown attempts
controlled.shutdown.max.retries=3

# Interval between shutdown attempts
controlled.shutdown.retry.backoff.ms=5000

# Socket timeout for communication between partition leaders and replicas
controller.socket.timeout.ms =30000

# Message queue size used when the partition leader synchronizes data with the replicas
controller.message.queue.size=10

Setting controlled.shutdown.enable=true is mainly for graceful shutdown:

  1. It speeds up restarts
  2. It lets leaders switch faster, reducing each partition's unavailability to a few milliseconds

Summary

Kafka summary

Origin blog.csdn.net/trayvontang/article/details/106388942