Detailed explanation of Spring Kafka configuration


Using spring-integration-kafka to send messages
1. The Outbound Channel Adapter is used to send messages to Kafka.
2. Messages are sent through a Spring Integration channel (a MessageChannel). Once this channel is configured, you can send messages to it and they will be forwarded to Kafka.


1. int:channel configures the Spring Integration channel; here it is backed by a queue.
2. int-kafka:outbound-channel-adapter is the outbound channel adapter, which uses a thread pool internally to process messages; the key attribute is kafka-producer-context-ref.
3. int-kafka:producer-context configures the list of producers and the topics they handle; these producer configurations are eventually turned into Kafka producers.
4. task:executor configures the task executor and its queue. A usage sketch follows this list.
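
The following is a minimal sketch of sending through such a channel. It assumes an XML context file named kafka-producer-context.xml that defines the elements above and a queue-backed channel bean named inputToKafka (both names are illustrative); package names assume Spring Integration 4.x.

import org.springframework.context.support.ClassPathXmlApplicationContext;
import org.springframework.integration.support.MessageBuilder;
import org.springframework.messaging.MessageChannel;

public class KafkaChannelSender {
    public static void main(String[] args) {
        // Load the XML context declaring int:channel, int-kafka:outbound-channel-adapter,
        // int-kafka:producer-context and task:executor (file name is an assumption).
        ClassPathXmlApplicationContext ctx =
                new ClassPathXmlApplicationContext("kafka-producer-context.xml");
        // Send to the channel; the adapter forwards the payload to Kafka,
        // the destination topic coming from the producer-context configuration.
        MessageChannel toKafka = ctx.getBean("inputToKafka", MessageChannel.class);
        toKafka.send(MessageBuilder.withPayload("hello kafka").build());
        ctx.close();
    }
}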


BROKER global configuration
------------------------------------------- System related -------------------------------------------
## The unique identifier of each broker in the cluster; must be a positive integer. If the broker's IP address changes but broker.id does not, consumers are unaffected
broker.id = 1

## The directory where Kafka stores its data; multiple directories are separated by commas, e.g. /tmp/kafka-logs-1,/tmp/kafka-logs-2
log.dirs = /tmp/kafka-logs

## The port on which the broker listens for client connections
port = 6667

##The maximum size of the message body, in bytes
message.max.bytes = 1000000

## The maximum number of threads the broker uses to process network requests; normally does not need to be modified
num.network.threads = 3

## The number of threads the broker uses for disk I/O; the value should be greater than the number of disks you have
num.io.threads = 8

## The number of threads used for background tasks, such as deleting expired log files; generally does not need to be modified
background.threads = 4

## The maximum number of requests that may be queued waiting for the I/O threads. If the number of waiting requests exceeds this value, the broker stops accepting new requests; this is a self-protection mechanism
queued.max.requests = 500

## The broker's host name. If set, the broker binds only to this address; if not, it binds to all interfaces and publishes one of them to ZooKeeper. Generally left unset
host.name

## The advertised host name. If set, this is the address given to producers, consumers and other brokers to connect to (not explored in depth here)
advertised.host.name

## The advertised port, published to clients; useful when it differs from the port the broker actually binds to
advertised.port

## The socket send buffer, the socket tuning parameter SO_SNDBUF
socket.send.buffer.bytes = 100 * 1024

## The socket receive buffer, the socket tuning parameter SO_RCVBUF
socket.receive.buffer.bytes = 100 * 1024

## The maximum size of a socket request, to prevent the server from running out of memory; message.max.bytes must be smaller than socket.request.max.bytes. Can be overridden by the parameters specified when a topic is created
socket.request.max.bytes = 100 * 1024 * 1024

------------------------------------------- LOG related -------------------------------------------
## Each partition of a topic is stored as a set of segment files. This controls the size of each segment; can be overridden by the parameters specified when the topic is created
log.segment.bytes = 1024 * 1024 * 1024

## Forces a new segment to be rolled after this many hours even if the segment has not reached log.segment.bytes; can be overridden by the parameters specified when the topic is created
log.roll.hours = 24*7

## The log cleanup policy; the options are delete and compact. It governs how expired data, or a log that has reached its size limit, is handled; can be overridden by the parameters specified when the topic is created
log.cleanup.policy = delete

## The maximum time data is retained; once exceeded, it is handled according to log.cleanup.policy. This also bounds how long a consumer has to read the data.
## Deletion is triggered when either log.retention.bytes or log.retention.minutes is reached; can be overridden by the parameters specified when the topic is created
log.retention.minutes=7 days

## The maximum log size of each partition of a topic; the size limit of a topic = number of partitions * log.retention.bytes. -1 means no size limit.
## Deletion is triggered when either log.retention.bytes or log.retention.minutes is reached; can be overridden by the parameters specified when the topic is created
log.retention.bytes=-1

## How often to check file sizes to see whether the policy set in log.cleanup.policy should be applied
log.retention.check.interval.ms=5 minutes

## Whether to enable log compaction
log.cleaner.enable=false

## Number of threads to run log compaction
log.cleaner.threads =1

## Throttles the log cleaner so that its I/O does not exceed this value, in bytes per second
log.cleaner.io.max.bytes.per.second=None

## The memory used for log-compaction deduplication; the larger the better, space permitting
log.cleaner.dedupe.buffer.size=500*1024*1024

## The I/O buffer size used by the log cleaner; generally does not need to be modified
log.cleaner.io.buffer.size=512*1024

## The load factor of the hash table used in log cleaning; generally does not need to be modified
log.cleaner.io.buffer.load.factor = 0.9

## The interval between the log cleaner's checks for logs that need cleaning
log.cleaner.backoff.ms =15000

## Controls how frequently logs are cleaned: a larger value means more efficient cleaning at the cost of some wasted space; can be overridden by the parameters specified when the topic is created
log.cleaner.min.cleanable.ratio=0.5

## The maximum time deleted records are retained in a compacted log, and therefore the longest time a client has to consume them. It differs from log.retention.minutes in that this setting applies to compacted data; can be overridden by the parameters specified when the topic is created
log.cleaner.delete.retention.ms = 1 day

## The size limit of the offset index file for a log segment; can be overridden by the parameters specified when the topic is created
log.index.size.max.bytes = 10 * 1024 * 1024

## After a fetch, some space is needed to scan for the most recent offsets; the larger the setting, the faster the scan but the more memory it uses. Normally this parameter can be left alone
log.index.interval.bytes = 4096

## The number of messages accumulated before the log is flushed (fsync) to disk.
## Disk I/O is slow, but flushing is necessary for data reliability,
## so this setting is a trade-off between data reliability and performance.
## If the value is too large, each fsync takes a long time (blocking I/O);
## if it is too small, fsync happens more often, adding some latency to client requests.
## Messages that have not been fsynced will be lost if the physical server fails.
log.flush.interval.messages=None

## The interval at which to check whether any log needs to be flushed to disk
log.flush.scheduler.interval.ms = 3000

## Controlling flushes only by message count is not sufficient.
## This parameter controls the fsync time interval: even if the message-count threshold has not been reached,
## a flush is also triggered once this much time has passed since the last flush.
log.flush.interval.ms = None

## How long a log file is kept after it has been removed from the index before it is deleted; generally does not need to be modified
log.delete.delay.ms = 60000

## How often to checkpoint the last flush point, which acts as a recovery point for data; generally does not need to be modified
log.flush.offset.checkpoint.interval.ms =60000

------------------------------------------- TOPIC related -------------------------------------------
## Whether topics may be created automatically; if false, topics must be created with the command-line tools
auto.create.topics.enable =true

## The default replication factor for a topic; it should not be greater than the number of brokers in the cluster
default.replication.factor =1

## The default number of partitions for each topic, used when none is specified; can be overridden by the parameters specified when the topic is created
num.partitions = 1

Example: --replication-factor 3 --partitions 1 --topic replicated-topic creates a topic named replicated-topic with one partition, and that partition is replicated across three brokers (full command below).
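
For reference, the full creation command for that example would look like the other kafka-topics.sh commands below (the ZooKeeper address is illustrative):
bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic replicated-topic --replication-factor 3 --partitions 1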

------------------------------------------- Replication (Leader, replicas) related -------------------------------------------------------
## The socket timeout for communication between the partition leader and the replicas
controller.socket.timeout.ms = 30000

## The size of the message queue used while the partition leader synchronizes data with the replicas
controller.message.queue.size=10

## The longest a replica may go without responding to the partition leader. If it exceeds this time, the replica is removed from the ISR (in-sync replicas) and considered dead
replica.lag.time.max.ms = 10000

## If a follower falls too far behind the leader, the follower (or partition replica) is considered to have failed.
## Usually, network delay or a broken link between the follower and the leader causes message replication on the replica to lag.
## If too many messages accumulate, the leader assumes the follower has high network latency or limited throughput
## and removes the replica from the in-sync set.
## In environments with few brokers or an unreliable network, it is recommended to increase this value.
replica.lag.max.messages = 4000

## Socket timeout between the follower and the leader
replica.socket.timeout.ms= 30 * 1000

## The socket receive buffer size used when replicating from the leader
replica.socket.receive.buffer.bytes=64 * 1024

## The maximum amount of data a replica fetches in each request
replica.fetch.max.bytes = 1024 * 1024

## The maximum time a replica waits for a response from the leader; on failure, the fetch is retried
replica.fetch.wait.max.ms = 500

## The minimum amount of data for a fetch; if the leader has less unreplicated data than this, the request blocks until the condition is met
replica.fetch.min.bytes =1

## The number of fetcher threads used for replication from the leader; increasing this value increases I/O on the follower
num.replica.fetchers=1

## How often each replica checkpoints its high watermark to disk
replica.high.watermark.checkpoint.interval.ms = 5000

## Whether to allow controlled shutdown via the controller; if true, all leaders on this broker are moved to other brokers before it shuts down
controlled.shutdown.enable = false

## The number of retries for a controlled shutdown
controlled.shutdown.max.retries = 3

## The time interval between each shutdown attempt
controlled.shutdown.retry.backoff.ms = 5000

## Whether to automatically rebalance leadership between brokers
auto.leader.rebalance.enable = false

## The allowed percentage of leader imbalance per broker; if exceeded, partition leadership is rebalanced
leader.imbalance.per.broker.percentage = 10

## Interval to check if the leader is unbalanced
leader.imbalance.check.interval.seconds = 300

## The maximum amount of metadata the client can store with an offset commit
offset.metadata.max.bytes

------------------------------------------- ZooKeeper related -------------------------------------------
## The ZooKeeper connection string; multiple hosts are separated by commas, e.g. hostname1:port1,hostname2:port2,hostname3:port3
zookeeper.connect = localhost:2181

## The ZooKeeper session timeout, which is also the heartbeat interval. If the broker does not respond within this time it is considered dead; the value should not be too large
zookeeper.session.timeout.ms=6000

## ZooKeeper connection timeout
zookeeper.connection.timeout.ms = 6000

## How far a ZooKeeper follower may lag behind the ZooKeeper leader
zookeeper.sync.time.ms = 2000

Configuration modification
Some of the settings above can be overridden by each topic's own configuration, for example:

Add configuration:
bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic my-topic --partitions 1 --replication-factor 1 --config max.message.bytes=64000 --config flush.messages=1

Change configuration:
bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic --config max.message.bytes=128000

Delete configuration:
bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic --deleteConfig max.message.bytes



CONSUMER CONFIGURATION
## The consumer group ID. Consumers sharing the same group.id divide the messages of a topic (queue semantics), while different groups each receive all messages (publish-subscribe semantics)
 group.id

## The consumer ID; if not set, one is generated automatically
 consumer.id

## An ID used for tracking and debugging; it is best to keep it the same as group.id
 client.id = group id value

## The ZooKeeper connection string; can list multiple hosts, hostname1:port1,hostname2:port2,hostname3:port3. Must point to the same ZooKeeper ensemble as the broker
 zookeeper.connect=localhost:2182

## The ZooKeeper session (heartbeat) timeout; if the consumer does not respond within this time it is considered dead
 zookeeper.session.timeout.ms = 6000

## The ZooKeeper connection timeout
 zookeeper.connection.timeout.ms = 6000

## How far a ZooKeeper follower may lag behind the ZooKeeper leader
 zookeeper.sync.time.ms = 2000

## What to do when there is no initial offset in ZooKeeper. smallest: reset to the smallest offset; largest: reset to the largest offset; anything else: throw an exception
 auto.offset.reset = largest

## The socket timeout; the effective timeout is max.fetch.wait + socket.timeout.ms
 socket.timeout.ms= 30 * 1000

## The size of the socket's receive buffer space
 socket.receive.buffer.bytes=64 * 1024

## message size limit fetched from each partition
 fetch.message.max.bytes = 1024 * 1024

## Whether to commit the offset to ZooKeeper after consuming messages; if the consumer fails, the latest committed offset can then be recovered from ZooKeeper
 auto.commit.enable = true

## Autocommit interval
 auto.commit.interval.ms = 60 * 1000

## The number of message chunks buffered for consumption; each chunk can be up to fetch.message.max.bytes
 queued.max.message.chunks = 10

## When a new consumer joins the group, a rebalance takes place and partitions are reassigned to consumers.
## When a consumer obtains consumption rights for a partition, it registers a node under the
## "Partition Owner registry" in ZooKeeper, but the previous owner may not have released that node yet.
## This value controls the number of retries for registering the node.
 rebalance.max.retries = 4

## The time interval between rebalance attempts
 rebalance.backoff.ms = 2000

## How long to wait before trying to determine the new leader of a partition that has just lost its leader
 refresh.leader.backoff.ms

## The minimum amount of data the server returns to the consumer; if less data is available, the request waits until the requirement is met
 fetch.min.bytes = 1

## The maximum time the consumer request waits if fetch.min.bytes is not yet satisfied
 fetch.wait.max.ms = 100

## Throws an exception if no message arrives within the specified time, generally does not need to be changed
 consumer.timeout.ms = -1
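
As a worked example of how these settings are used, here is a minimal sketch of the classic ZooKeeper-based high-level consumer (Kafka 0.8.x API; the topic and group names are illustrative):

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

public class SimpleHighLevelConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");   // same ZooKeeper as the brokers
        props.put("group.id", "my-group");                  // consumers sharing a group.id form a queue
        props.put("auto.offset.reset", "largest");          // start from the latest offset if none is stored
        props.put("auto.commit.enable", "true");            // commit offsets to ZooKeeper periodically

        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // One stream (thread) for the topic "my-topic" (name is illustrative).
        Map<String, Integer> topicCountMap = new HashMap<String, Integer>();
        topicCountMap.put("my-topic", 1);
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(topicCountMap);

        ConsumerIterator<byte[], byte[]> it = streams.get("my-topic").get(0).iterator();
        while (it.hasNext()) {
            System.out.println(new String(it.next().message()));
        }
    }
}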



PRODUCER CONFIGURATION
## The broker addresses the producer uses to obtain message metadata (topics, partitions and replicas); the format is host1:port1,host2:port2. A VIP in front of the brokers can also be used
 metadata.broker.list

## Message acknowledgement mode
 ## 0: no acknowledgement that the message arrived; just send. Low latency, but messages can be lost if a server fails (a bit like TCP)
 ## 1: send the message and wait for the leader's acknowledgement; some reliability
 ## -1: send the message and wait for the leader's acknowledgement and for replication to complete before returning; highest reliability
 request.required.acks = 0

## Maximum wait time for message sending
 request.timeout.ms = 10000

## socket buffer size
 send.buffer.bytes=100*1024

## The serializer class for the key; if not set, defaults to serializer.class
 key.serializer.class

## The partitioning strategy; the default partitioner uses the key hash modulo the number of partitions
 partitioner.class=kafka.producer.DefaultPartitioner

## The message compression codec; the default is none, and gzip and snappy are available
 compression.codec = none

## Limits compression to specific topics; by default (null) compression applies to all topics
 compressed.topics=null

## The number of retries after the message fails to be sent
 message.send.max.retries = 3

## Interval after each failure
 retry.backoff.ms = 100

## The interval at which the producer regularly updates the topic metadata. If it is set to 0, the data will be updated after each message is sent.
 topic.metadata.refresh.interval.ms = 600 * 1000

## Can be set to any value by the user, but should not be duplicated; mainly used for tracking and logging messages
 client.id=""

------------------------------------------- Message mode related -------------------------------------------
 ## Producer type. async: send messages asynchronously; sync: send messages synchronously
 producer.type=sync

## In asynchronous mode, messages are buffered for this long and then sent in one batch
 queue.buffering.max.ms = 5000

## The maximum number of messages buffered in asynchronous mode
 queue.buffering.max.messages = 10000

## In asynchronous mode, how long to wait to enqueue a message; if set to 0, the message is either enqueued immediately or dropped, and -1 blocks until space is available
 queue.enqueue.timeout.ms = -1

## In asynchronous mode, the maximum number of messages sent in one batch, once the queue.buffering.max.messages or queue.buffering.max.ms limit is reached
 batch.num.messages=200

## The serializer class for the message body, which converts it into a byte stream for transmission
 serializer.class = kafka.serializer.DefaultEncoder
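
To show how these producer settings fit together, here is a minimal sketch using the old Scala producer API that these properties belong to (Kafka 0.8.x; the broker list and topic name are illustrative):

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class SimpleSyncProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // used to fetch metadata
        props.put("serializer.class", "kafka.serializer.StringEncoder"); // message body serializer
        props.put("request.required.acks", "1");                         // wait for the leader's ack
        props.put("producer.type", "sync");                              // synchronous sending

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
        // The key "key1" is run through the default partitioner (hash modulo partition count).
        producer.send(new KeyedMessage<String, String>("my-topic", "key1", "hello kafka"));
        producer.close();
    }
}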






Reference (configuration): http://kafka.apache.org/documentation.html#producerconfigs
Reference: http://blog.csdn.net/huanggang028/article/details/47830529
Reference: http://blog.csdn.net/zxae86/article/details/47069409
Reference: http://www.inter12.org/archives/842
Reference: http://www.aboutyun.com/thread-10322-1-1.html
