Kafka tuning: hardware selection
Assumption: 1 million daily active users, each generating ~100 logs per day, so total logs per day = 1,000,000 × 100 = 100 million.
Average rate = 100 million / (24 × 3600 s) ≈ 1150 records/s.
At ~1 KB per log: 1150 records/s × 1 KB ≈ 1 MB/s.
Peak rate (assume 20× the average): 1150 records × 20 = 23,000 records/s, i.e. about 20 MB/s.
Number of servers:
Servers = 2 × (producer peak production rate in MB/s × replicas / 100) + 1 = 2 × (20 × 2 / 100) + 1 = 1.8, rounded up; take 3 servers (with 2 replicas, three brokers is also the practical minimum for surviving a broker failure).
Disk selection: Kafka's storage layer is dominated by sequential writes, and the sequential-write speed of solid-state and mechanical drives is similar, so ordinary mechanical hard drives are sufficient.
Total data per day: 100 million × 1 KB ≈ 100 GB.
100 GB × 2 replicas × 3 days retention / 0.7 (keep ~30% free space) ≈ 1 TB.
Recommendation: the total disk capacity across the three servers should be at least 1 TB.
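The sizing arithmetic above can be sketched as a small calculation (the 20× peak factor, 2 replicas, 3-day retention, and 0.7 headroom factor are the assumptions stated above):

```java
// Worked sizing arithmetic: 1M DAU, 100 logs/user/day, ~1 KB/log.
public class KafkaSizing {
    public static void main(String[] args) {
        long logsPerDay = 1_000_000L * 100;                  // 100 million logs/day
        long avgRecordsPerSec = logsPerDay / (24 * 3600);    // ~1150 records/s
        long peakRecordsPerSec = avgRecordsPerSec * 20;      // ~23,000 records/s at 20x peak
        long peakMBPerSec = Math.round(peakRecordsPerSec / 1024.0); // ~20 MB/s at 1 KB/record

        // servers = 2 * (peak MB/s * replicas / 100) + 1, rounded up, minimum 3
        int servers = Math.max(3, (int) Math.ceil(2 * (20.0 * 2 / 100) + 1));

        // disk = 100 GB/day * 2 replicas * 3 days / 0.7 headroom ~= 857 GB -> ~1 TB
        long diskGB = Math.round(100.0 * 2 * 3 / 0.7);

        System.out.println(avgRecordsPerSec + " rec/s avg, " + peakRecordsPerSec
                + " rec/s peak (~" + peakMBPerSec + " MB/s), " + servers
                + " servers, ~" + diskGB + " GB disk");
    }
}
```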
Memory selection: heap memory + page cache.
Recommended Kafka heap per node: 10g–15g (the default is only 1g).
How to modify the heap: edit kafka-server-start.sh:
if [ "x$KAFKA_HEAP_OPTS" = "x" ]; then
export KAFKA_HEAP_OPTS="-Xmx10G -Xms10G"
fi
Check Kafka's GC status (watch the YGC/FGC counts; if they stay low, no further heap change is needed):
# Find the Kafka process ID
jps
# Check Kafka's GC status (sample every 1 s, 10 times)
jstat -gc <pid> 1s 10
View memory usage:
jmap -heap <pid>
Configure the page cache
The page cache is the Linux server's own memory. We only need to ensure that 25% of the active segment (1g) fits in memory.
Page cache per node = (number of partitions × 1g × 25%) / number of nodes. For example, with 10 partitions on 3 nodes: (10 × 1g × 25%) / 3 ≈ 1g.
CPU selection
The three thread pools below consume the most CPU in Kafka; together they are recommended to use about 2/3 of the total cores:
num.io.threads = 8: the threads that write to disk; should account for 50% of the total cores.
num.replica.fetchers = 1: the replica fetch threads; should account for 1/3 of that 50%.
num.network.threads = 3: the network transfer threads; should account for 2/3 of that 50%.
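As a sketch, applying these ratios to a hypothetical 24-core broker gives the following thread counts:

```java
// Thread-pool sizing from the core-count ratios above (24 cores is an assumed example).
public class KafkaCpuSplit {
    public static void main(String[] args) {
        int cores = 24;                          // hypothetical broker
        int ioThreads = cores / 2;               // 50% of cores -> 12
        int replicaFetchers = ioThreads / 3;     // 1/3 of that 50% -> 4
        int networkThreads = ioThreads * 2 / 3;  // 2/3 of that 50% -> 8
        System.out.println("num.io.threads=" + ioThreads
                + " num.replica.fetchers=" + replicaFetchers
                + " num.network.threads=" + networkThreads);
    }
}
```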
Producer optimization
Producer send flow: the main thread serializes and partitions each record into the RecordAccumulator buffer; the sender thread then pulls completed batches from the buffer and sends them to the broker.
How to increase throughput
Parameter | Description |
---|---|
buffer.memory | The total size of the RecordAccumulator buffer, the default is 32m |
batch.size | The maximum size of one batch in the buffer; the default is 16k. Increasing this value appropriately can improve throughput, but setting it too large increases transmission latency. |
linger.ms | If a batch has not reached batch.size, the sender sends it after waiting linger.ms. The default is 0 ms (no delay); 5–100 ms is recommended in production. |
compression.type | The compression method for all data sent by the producer. The default is none, which means no compression. Supported compression types: none, gzip, snappy, lz4, and zstd. |
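A minimal sketch of the throughput-oriented settings above using plain java.util.Properties (the broker address hadoop102:9092 and the doubled values are assumptions; real code passes these properties to a KafkaProducer from the kafka-clients library):

```java
import java.util.Properties;

public class ProducerThroughputConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "hadoop102:9092"); // assumed broker address
        props.put("buffer.memory", "67108864");   // 64m: double the 32m default
        props.put("batch.size", "32768");         // 32k: double the 16k default
        props.put("linger.ms", "50");             // wait up to 50 ms to fill a batch
        props.put("compression.type", "snappy");  // trade a little CPU for smaller batches
        System.out.println("linger.ms=" + props.getProperty("linger.ms"));
    }
}
```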
Data reliability: set acks = -1, keep partition replicas ≥ 2, and set the minimum number of in-sync replicas that must respond (min.insync.replicas) ≥ 2.
acks: 0 — the producer does not wait for the data to be persisted before continuing. 1 — the leader responds after it has received the data. -1 (all) — the leader responds only after the leader and all nodes in the ISR have received the data. The default is -1; -1 and all are equivalent.
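A sketch of the producer-side reliability settings (min.insync.replicas is a broker/topic-side setting, so it appears only as a comment):

```java
import java.util.Properties;

public class ProducerReliabilityConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("acks", "all");           // equivalent to -1: wait for the full ISR
        props.put("retries", "2147483647"); // retry transient failures (Integer.MAX_VALUE, the default)
        // Broker/topic side (server.properties or topic config), not a producer property:
        //   min.insync.replicas=2  together with replication factor >= 2
        System.out.println("acks=" + props.getProperty("acks"));
    }
}
```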
Data deduplication
Enable idempotence: guarantees no duplicates within a single partition and a single session.
enable.idempotence: whether to enable idempotence; the default is true (enabled).
Complete deduplication: use transactions.
Transaction API:
// 1 Initialize the transaction
void initTransactions();
// 2 Begin the transaction
void beginTransaction() throws ProducerFencedException;
// 3 Commit already-consumed offsets within the transaction (mainly for consumers)
void sendOffsetsToTransaction(Map<TopicPartition, OffsetAndMetadata> offsets,
                              String consumerGroupId) throws ProducerFencedException;
// 4 Commit the transaction
void commitTransaction() throws ProducerFencedException;
// 5 Abort the transaction (similar to rolling back)
void abortTransaction() throws ProducerFencedException;
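A minimal usage sketch of this API under assumptions: the broker address, topic name first, and transactional.id order-tx-0 are placeholders, and the commented calls show the API order above (real code builds a KafkaProducer from kafka-clients with these properties):

```java
import java.util.Properties;

public class TransactionalProducerConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "hadoop102:9092"); // assumed broker address
        props.put("enable.idempotence", "true");          // required for transactions
        props.put("transactional.id", "order-tx-0");      // hypothetical id, unique per producer
        // With a KafkaProducer<K,V> producer built from these props:
        //   producer.initTransactions();
        //   producer.beginTransaction();
        //   try {
        //       producer.send(new ProducerRecord<>("first", "value"));
        //       producer.commitTransaction();
        //   } catch (Exception e) {
        //       producer.abortTransaction();
        //   }
        System.out.println("transactional.id=" + props.getProperty("transactional.id"));
    }
}
```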
Ordered data: send all related data to a single partition.
Within a single partition, data is ordered (under the conditions below); across multiple partitions, there is no ordering guarantee.
Out-of-order data: enable idempotence and bound the number of in-flight requests.
Parameter | Description |
---|---|
enable.idempotence | Whether to enable idempotence; the default is true (enabled). |
max.in.flight.requests.per.connection | The maximum number of unacknowledged requests allowed per connection; the default is 5. With idempotence enabled, keep this value between 1 and 5 to preserve ordering. |
Broker core parameter configuration
Parameter | Description |
---|---|
replica.lag.time.max.ms | If a follower in the ISR has not sent a communication or synchronization request to the leader within this time threshold, it is removed from the ISR. The default is 30s. |
auto.leader.rebalance.enable | The default is true: automatic leader partition balancing. Disabling it in production is recommended. |
leader.imbalance.per.broker.percentage | The default is 10%: the ratio of imbalanced leaders allowed per broker. If a broker exceeds this value, the controller triggers leader rebalancing. |
leader.imbalance.check.interval.seconds | The default value is 300 seconds. The interval to check whether the leader load is balanced. |
log.segment.bytes | Kafka's log is stored in segments; this sets the size of each segment file. The default is 1G. |
log.index.interval.bytes | The default is 4kb. Every time 4kb of data is written to a log (.log) file, Kafka records an entry in the index file. |
log.retention.hours | How long Kafka retains data, in hours; the default is 7 days. |
log.retention.minutes | Retention time in minutes; disabled by default. |
log.retention.ms | Retention time in milliseconds; disabled by default. |
log.retention.check.interval.ms | How often to check for data past its retention time; the default is 5 minutes. |
log.retention.bytes | The default is -1, meaning unlimited. If the total size of all logs exceeds this value, the oldest segment is deleted. |
log.cleanup.policy | The default is delete, applying the deletion policy to all data; if set to compact, log compaction is applied to all data. |
num.io.threads | The default is 8: the number of threads that write to disk. Should account for 50% of the total cores. |
num.replica.fetchers | The default is 1: the number of replica fetch threads. Should account for 1/3 of that 50%. |
num.network.threads | The default is 3: the number of network transfer threads. Should account for 2/3 of that 50%. |
log.flush.interval.messages | The number of messages accumulated before forcing a flush to disk; the default is Long.MAX_VALUE (9223372036854775807). Generally not recommended to change; leave flushing to the operating system. |
log.flush.interval.ms | How often to flush data to disk; the default is null. Generally not recommended to change; leave flushing to the operating system. |
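A hedged server.properties fragment pulling together the values above (a sketch of the defaults plus the recommended tweaks, not a complete broker configuration):

```properties
# Thread pools (size against total cores as described above)
num.io.threads=8
num.replica.fetchers=1
num.network.threads=3
# Segment and retention (defaults: 1 GB segments, 7-day retention, 5-minute checks)
log.segment.bytes=1073741824
log.retention.hours=168
log.retention.check.interval.ms=300000
# Recommended tweak: avoid costly automatic leader re-election in production
auto.leader.rebalance.enable=false
```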
Commissioning new nodes / decommissioning old nodes
Create a list of topics to move:
vim topics-to-move.json
# contents:
{
"topics": [
{"topic": "first"}
],
"version": 1
}
Generate a reassignment plan:
bin/kafka-reassign-partitions.sh --bootstrap-server hadoop102:9092 --topics-to-move-json-file topics-to-move.json --broker-list "0,1,2,3" --generate
Create a replica placement plan (replicas spread across broker0, broker1, broker2, and broker3):
vim increase-replication-factor.json
Execute the reassignment plan:
bin/kafka-reassign-partitions.sh --bootstrap-server hadoop102:9092 --reassignment-json-file increase-replication-factor.json --execute
Verify the reassignment plan:
bin/kafka-reassign-partitions.sh --bootstrap-server hadoop102:9092 --reassignment-json-file increase-replication-factor.json --verify
Increase partitions: the number of partitions can only be increased, never decreased:
bin/kafka-topics.sh --bootstrap-server 192.168.6.100:9092 --alter --topic first --partitions 3
Increase the replication factor
Create a topic:
bin/kafka-topics.sh --bootstrap-server hadoop102:9092 --create --partitions 3 --replication-factor 1 --topic four
Manually increase the replicas: create a replica placement plan (all replicas placed on broker0, broker1, and broker2):
vim increase-replication-factor.json
{
"version":1,
"partitions":[{"topic":"four","partition":0,"replicas":[0,1,2]},
{"topic":"four","partition":1,"replicas":[0,1,2]},
{"topic":"four","partition":2,"replicas":[0,1,2]}]
}
Execute the replica placement plan:
bin/kafka-reassign-partitions.sh --bootstrap-server hadoop102:9092 --reassignment-json-file increase-replication-factor.json --execute
Manually adjust partition replica placement
Create a replica placement plan (all replicas placed on broker0 and broker1):
vim increase-replication-factor.json
{
"version":1,
"partitions":[{"topic":"three","partition":0,"replicas":[0,1]},
{"topic":"three","partition":1,"replicas":[0,1]},
{"topic":"three","partition":2,"replicas":[1,0]},
{"topic":"three","partition":3,"replicas":[1,0]}]
}
Execute the replica placement plan:
bin/kafka-reassign-partitions.sh --bootstrap-server 192.168.6.100:9092 --reassignment-json-file increase-replication-factor.json --execute
Verify the replica placement plan:
bin/kafka-reassign-partitions.sh --bootstrap-server 192.168.6.100:9092 --reassignment-json-file increase-replication-factor.json --verify
Leader Partition Load Balancing
Parameter | Description |
---|---|
auto.leader.rebalance.enable | The default is true: automatic leader partition balancing. In production, leader re-election is relatively expensive and may hurt performance, so setting it to false is recommended. |
leader.imbalance.per.broker.percentage | The default is 10%: the ratio of imbalanced leaders allowed per broker. If a broker exceeds this value, the controller triggers leader rebalancing. |
leader.imbalance.check.interval.seconds | The default is 300 seconds: the interval at which leader load balance is checked. |
Automatic topic creation
If the broker-side parameter auto.create.topics.enable is set to true (the default), then when a producer sends a message to a topic that does not yet exist, a topic is created automatically with num.partitions partitions (default 1) and a replication factor of default.replication.factor (default 1). A topic is likewise created automatically when a consumer starts reading from an unknown topic, or when any client requests metadata for an unknown topic. Such implicit topic creation is unexpected and makes topic management and maintenance harder. In production, set this parameter to false.
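The recommended broker-side override in server.properties can be sketched as:

```properties
# Disable implicit topic creation; create topics explicitly with kafka-topics.sh instead
auto.create.topics.enable=false
```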
Kafka consumers
Consumer group initialization flow:
1. coordinator: assists with consumer group initialization and partition assignment.
Coordinator node selection = groupid.hashCode() % 50 (the number of partitions of __consumer_offsets).
For example, if groupid.hashCode() = 1, then 1 % 50 = 1, so whichever broker hosts partition 1 of the __consumer_offsets topic provides the coordinator for this consumer group. All consumers in the group commit their offsets to that same partition.
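The coordinator partition selection above can be sketched as follows (the group id my-group is a placeholder; 50 is the default partition count of __consumer_offsets; Kafka's own code uses a non-negative abs rather than Math.abs):

```java
// Which __consumer_offsets partition (and hence which broker) coordinates a group.
public class CoordinatorPartition {
    public static void main(String[] args) {
        String groupId = "my-group"; // hypothetical consumer group
        int offsetsPartitions = 50;  // default offsets.topic.num.partitions
        int partition = Math.abs(groupId.hashCode()) % offsetsPartitions;
        // The broker hosting this partition of __consumer_offsets is the group's coordinator
        System.out.println("coordinator partition = " + partition);
    }
}
```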
Detailed consumption flow within a consumer group
Optimized configuration:
Parameter | Description |
---|---|
enable.auto.commit | The default is true: the consumer automatically and periodically commits offsets to the server. |
auto.commit.interval.ms | If enable.auto.commit is true, this defines how often the consumer commits offsets to Kafka; the default is 5s. |
auto.offset.reset | What to do when Kafka has no initial offset, or the current offset no longer exists on the server (e.g. the data was deleted). earliest: automatically reset to the earliest offset. latest (default): automatically reset to the latest offset. none: throw an exception to the consumer if the group's previous offset is not found. anything else: throw an exception to the consumer. |
offsets.topic.num.partitions | The number of partitions of __consumer_offsets; the default is 50. Not recommended to change. |
heartbeat.interval.ms | The heartbeat interval between a Kafka consumer and the coordinator; the default is 3s. Must be less than session.timeout.ms, and should be no more than 1/3 of it. Not recommended to change. |
session.timeout.ms | The connection timeout between a Kafka consumer and the coordinator; the default is 45s. Beyond this value the consumer is removed and the group rebalances. |
max.poll.interval.ms | The maximum time a consumer may take to process messages; the default is 5 minutes. Beyond this value the consumer is removed and the group rebalances. |
fetch.max.wait.ms | The default is 500ms. If the server has not accumulated the minimum number of bytes for a batch by this time, it returns the data anyway. |
fetch.max.bytes | The default is 52428800 (50m): the maximum number of bytes the consumer fetches from the server in one batch. If a single batch on the server is larger than this value, it can still be fetched, so this is not an absolute maximum. The batch size is also bounded by message.max.bytes (broker config) or max.message.bytes (topic config). |
max.poll.records | The maximum number of records returned by one poll; the default is 500. |
key.deserializer and value.deserializer | Specify the deserializer types for message keys and values. |
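A minimal consumer-tuning sketch with java.util.Properties (the broker address and group id are assumptions; real code passes these properties to a KafkaConsumer from kafka-clients):

```java
import java.util.Properties;

public class ConsumerTuningConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "hadoop102:9092"); // assumed broker address
        props.put("group.id", "my-group");                // hypothetical consumer group
        props.put("enable.auto.commit", "false");   // commit manually for tighter control
        props.put("auto.offset.reset", "earliest"); // read from the beginning when no offset exists
        props.put("max.poll.records", "500");       // records per poll (the default)
        props.put("fetch.max.bytes", "52428800");   // 50m per fetch (the default)
        System.out.println("auto.offset.reset=" + props.getProperty("auto.offset.reset"));
    }
}
```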
Consumer rebalancing
Parameter | Description |
---|---|
heartbeat.interval.ms | The heartbeat interval between a Kafka consumer and the coordinator; the default is 3s. Must be less than session.timeout.ms, and should be no more than 1/3 of it. |
session.timeout.ms | The connection timeout between a Kafka consumer and the coordinator; the default is 45s. Beyond this value the consumer is removed and the group rebalances. |
max.poll.interval.ms | The maximum time a consumer may take to process messages; the default is 5 minutes. Beyond this value the consumer is removed and the group rebalances. |
partition.assignment.strategy | The consumer partition assignment strategy; the default is Range + CooperativeSticky. Kafka can use multiple assignment strategies at the same time. Available strategies: Range, RoundRobin, Sticky, CooperativeSticky. |