Use spring-integration-kafka to send messages
1. The Outbound Channel Adapter is used to send messages to Kafka.
2. Messages are sent through a Spring Integration channel (the MessageChannel class). Once the channel is configured, you can use it to send messages to Kafka.
The XML configuration uses the following elements:
1. int:channel configures the Spring Integration channel, which is backed by a queue.
2. int-kafka:outbound-channel-adapter defines the outbound channel adapter, which processes messages internally with a thread pool. Its key attribute is kafka-producer-context-ref.
3. int-kafka:producer-context configures the list of producers and the topics they handle. These producers are eventually converted into Kafka producers.
4. task:executor configures the task executor and its task queue.
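The flow described above (a queue-backed channel drained by a thread pool that hands messages to a producer) can be pictured with a plain-Java analogy. This is only a sketch of the idea and does not use the spring-integration-kafka API: the class ChannelPipelineSketch, the method run, and the list standing in for a Kafka producer are all hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Plain-Java analogy only; none of these names come from Spring Integration.
class ChannelPipelineSketch {

    // Puts the payloads on a queue (the "channel"), lets poolSize workers
    // (the adapter's thread pool) drain it, and returns what the stand-in
    // "producer" received. Ordering across workers is not guaranteed.
    static List<String> run(List<String> payloads, int poolSize) {
        BlockingQueue<String> channel = new LinkedBlockingQueue<>(payloads);
        List<String> sent = new CopyOnWriteArrayList<>(); // stand-in for a Kafka producer
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);

        for (int i = 0; i < poolSize; i++) {
            pool.submit(() -> {
                String payload;
                // poll() returns null once the queue is empty, ending the worker
                while ((payload = channel.poll()) != null) {
                    sent.add(payload);
                }
            });
        }

        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sent;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a", "b", "c"), 2));
    }
}
```

In the real setup the queue, the thread pool and the producer are declared in XML and wired together by Spring; application code only sends a message to the channel.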
BROKER global configuration
------------------------------------------- System related -------------------------------------------

## Unique identifier of each broker in the cluster; must be a non-negative integer. Changing the IP address without changing broker.id does not affect consumers
broker.id = 1

## Directories where Kafka stores its data. Separate multiple directories with commas, e.g. /tmp/kafka-logs-1,/tmp/kafka-logs-2
log.dirs = /tmp/kafka-logs

## Port on which the broker accepts client connections
port = 6667

## Maximum size of a message body, in bytes
message.max.bytes = 1000000

## Number of threads the broker uses to handle network requests; normally does not need to be changed
num.network.threads = 3

## Number of threads the broker uses for disk I/O; should be at least the number of your hard disks
num.io.threads = 8

## Number of threads for background tasks such as deleting expired message files; generally does not need to be changed
background.threads = 4

## Maximum number of requests queued for the I/O threads. When the number of waiting requests exceeds this value, the broker stops accepting new requests, as a self-protection mechanism
queued.max.requests = 500

## Host address of the broker. If set, the broker binds only to this address; if not set, it binds to all interfaces and publishes one of them to ZooKeeper. Generally left unset
host.name

## Advertised host name. If set, this is the address given to producers, consumers and other brokers to connect to (its use has not been studied in depth here)
advertised.host.name

## Advertised port. Only needs to be set when it differs from the value of port
advertised.port

## Socket send buffer (the SO_SNDBUF socket tuning parameter)
socket.send.buffer.bytes = 100 * 1024

## Socket receive buffer (the SO_RCVBUF socket tuning parameter)
socket.receive.buffer.bytes = 100 * 1024

## Maximum size of a socket request, to protect the server from running out of memory. message.max.bytes must be smaller than socket.request.max.bytes, and it can be overridden per topic when the topic is created
socket.request.max.bytes = 100 * 1024 * 1024

------------------------------------------- LOG related -------------------------------------------

## A topic partition is stored as a set of segment files. This controls the size of each segment. Can be overridden per topic when the topic is created
log.segment.bytes = 1024 * 1024 * 1024

## Forces a new segment to be rolled after this time even if the current segment has not reached log.segment.bytes. Can be overridden per topic when the topic is created
log.roll.hours = 24*7

## Log cleanup policy: delete or compact. It applies to expired data and to logs that have reached their size limit. Can be overridden per topic when the topic is created
log.cleanup.policy = delete

## Maximum time data is kept. Older data is handled according to log.cleanup.policy, so this is also how long consumers have to consume the data
## Deletion runs as soon as either log.retention.bytes or log.retention.minutes is reached. Can be overridden per topic when the topic is created
log.retention.minutes = 7 days

## Maximum size of each partition of a topic; the size limit of a topic = number of partitions * log.retention.bytes. -1 means no size limit
## Deletion runs as soon as either log.retention.bytes or log.retention.minutes is reached. Can be overridden per topic when the topic is created
log.retention.bytes = -1

## Interval at which the broker checks whether the policy set in log.cleanup.policy should be triggered
log.retention.check.interval.ms = 5 minutes

## Whether to enable log compaction
log.cleaner.enable = false

## Number of threads that run log compaction
log.cleaner.threads = 1

## Maximum I/O throughput the log cleaner may use while compacting
log.cleaner.io.max.bytes.per.second = None

## Buffer used for deduplication during log compaction; if memory allows, the bigger the better
log.cleaner.dedupe.buffer.size = 500*1024*1024

## I/O block size used during log cleaning; generally does not need to be changed
log.cleaner.io.buffer.size = 512*1024

## Load factor of the hash table used during log cleaning; generally does not need to be changed
log.cleaner.io.buffer.load.factor = 0.9

## Interval at which to check whether log cleaning should be triggered
log.cleaner.backoff.ms = 15000

## Controls how often the log is cleaned. A larger value means more efficient cleaning at the cost of some wasted space. Can be overridden per topic when the topic is created
log.cleaner.min.cleanable.ratio = 0.5

## Maximum retention time for compacted logs, which is also the longest time a client has to consume the messages. It differs from log.retention.minutes in that one controls uncompacted data and the other controls compacted data. Can be overridden per topic when the topic is created
log.cleaner.delete.retention.ms = 1 day

## Size limit of the index file of a log segment. Can be overridden per topic when the topic is created
log.index.size.max.bytes = 10 * 1024 * 1024

## An index entry is added every this many bytes. A fetch scans from the nearest index entry to the requested offset, so a smaller value gives a denser index and a shorter scan, at the cost of a larger index. Generally this parameter does not need attention
log.index.interval.bytes = 4096

## Number of messages accumulated before the log file is synced to disk
## Disk I/O is slow, but it is also what makes data reliable,
## so this setting is a trade-off between data reliability and performance.
## If the value is too large, each fsync takes a long time (I/O blocking)
## If the value is too small, fsync runs more often, which adds latency to client requests overall.
## If the physical server fails, messages that have not been fsynced are lost.
log.flush.interval.messages = None

## Interval at which to check whether data needs to be flushed to disk
log.flush.scheduler.interval.ms = 3000

## Controlling disk writes by message count alone is not enough.
## This parameter sets a time limit between fsyncs: if the message count has not reached its threshold
## but the time since the last disk sync has reached this value, a flush is triggered as well.
log.flush.interval.ms = None

## How long a file is kept on disk after it has been removed from the index; generally does not need to be changed
log.delete.delay.ms = 60000

## Interval at which the last flush point is checkpointed, so that it can be used for data recovery; generally does not need to be changed
log.flush.offset.checkpoint.interval.ms = 60000

------------------------------------------- TOPIC related -------------------------------------------

## Whether topics may be created automatically. If false, topics must be created with the command-line tool
auto.create.topics.enable = true

## Default replication factor for a topic's partitions; should not be greater than the number of brokers in the cluster
default.replication.factor = 1

## Default number of partitions per topic, used when none is specified at topic creation; a value given when the topic is created overrides it
num.partitions = 1

Example: --replication-factor 3 --partitions 1 --topic replicated-topic
The topic named replicated-topic has one partition, and that partition is replicated on three brokers.

------------------------------------------- Replication (leader, replicas) related -------------------------------------------

## Socket timeout for commands sent from the partition management controller to the replicas
controller.socket.timeout.ms = 30000

## Size of the message queue on the channel from the controller to the brokers
controller.message.queue.size = 10

## Longest time a replica may take to respond to the partition leader. If it exceeds this time, the replica is removed from the ISR (in-sync replicas), considered dead, and no longer managed
replica.lag.time.max.ms = 10000

## If a follower falls too far behind the leader, the follower (or partition replica) is considered to have failed
## Network delay or broken links between follower and leader routinely cause replicas to lag in message synchronization
## If a replica lags by too many messages, the leader concludes that the follower has high network latency or limited throughput, and migrates the replica
## to other followers.
## In an environment with few brokers or a poor network, it is recommended to increase this value.
replica.lag.max.messages = 4000

## Socket timeout between follower and leader
replica.socket.timeout.ms = 30 * 1000

## Socket receive buffer size used when replicating from the leader
replica.socket.receive.buffer.bytes = 64 * 1024

## Maximum amount of data a replica fetches at a time
replica.fetch.max.bytes = 1024 * 1024

## Maximum wait time for a fetch between a replica and the leader; on failure the fetch is retried
replica.fetch.wait.max.ms = 500

## Minimum amount of data per fetch. If the leader has less unsynchronized data than this, the fetch blocks until the condition is met
replica.fetch.min.bytes = 1

## Number of threads used to replicate from the leader; increasing this value increases the follower's I/O
num.replica.fetchers = 1

## How often each replica checkpoints its high watermark to disk
replica.high.watermark.checkpoint.interval.ms = 5000

## Whether to enable controlled shutdown of the broker. If true, all partition leaders on this broker are moved to other brokers before it shuts down
controlled.shutdown.enable = false

## Number of controlled shutdown attempts
controlled.shutdown.max.retries = 3

## Interval between controlled shutdown attempts
controlled.shutdown.retry.backoff.ms = 5000

## Whether to automatically rebalance partition leadership between brokers
auto.leader.rebalance.enable = false

## Leader imbalance ratio per broker; if it is exceeded, partition leadership is rebalanced
leader.imbalance.per.broker.percentage = 10

## Interval at which to check whether leadership is imbalanced
leader.imbalance.check.interval.seconds = 300

## Maximum size of the metadata a client may store with an offset
offset.metadata.max.bytes

------------------------------------------- ZooKeeper related -------------------------------------------

## Address of the ZooKeeper cluster; multiple addresses are separated by commas: hostname1:port1,hostname2:port2,hostname3:port3
zookeeper.connect = localhost:2181

## ZooKeeper session timeout. If no heartbeat is received within this time, the broker is considered dead; the value should not be too large
zookeeper.session.timeout.ms = 6000

## ZooKeeper connection timeout
zookeeper.connection.timeout.ms = 6000

## How far a follower in the ZooKeeper cluster may lag behind the ZooKeeper leader
zookeeper.sync.time.ms = 2000

Configuration modification

Some settings can be overridden by a topic's own configuration, for example:

Add configuration:
bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic my-topic --partitions 1 --replication-factor 1 --config max.message.bytes=64000 --config flush.messages=1

Change a setting:
bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic --config max.message.bytes=128000

Delete a setting:
bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my-topic --deleteConfig max.message.bytes
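
The retention rules in the LOG section (deletion runs when either the time limit or the size limit is reached, and the size limit of a topic is the number of partitions times log.retention.bytes) can be written down as a small sketch. This is a simplified model for illustration, not broker code: the class RetentionSketch and its methods are hypothetical, and the real broker applies retention to individual segments.

```java
// Simplified model of the retention rules; not how the broker implements them.
class RetentionSketch {

    // log.retention.bytes = -1 means "no size limit"
    static final long NO_LIMIT = -1L;

    // Size limit of a whole topic = number of partitions * log.retention.bytes
    static long topicSizeLimit(int partitions, long retentionBytes) {
        return retentionBytes == NO_LIMIT ? NO_LIMIT : partitions * retentionBytes;
    }

    // Data becomes eligible for cleanup when EITHER limit is exceeded
    static boolean shouldDelete(long ageMinutes, long partitionBytes,
                                long retentionMinutes, long retentionBytes) {
        boolean expired = ageMinutes > retentionMinutes;
        boolean oversized = retentionBytes != NO_LIMIT && partitionBytes > retentionBytes;
        return expired || oversized;
    }

    public static void main(String[] args) {
        long sevenDaysInMinutes = 7L * 24 * 60;
        System.out.println(topicSizeLimit(3, 1024L * 1024 * 1024));
        System.out.println(shouldDelete(sevenDaysInMinutes + 1, 0, sevenDaysInMinutes, NO_LIMIT));
    }
}
```

With the defaults listed above (7 days, -1), only the age of the data ever triggers deletion.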
CONSUMER configuration
## ID of the group the consumer belongs to. Very important: the broker decides between queue mode and publish-subscribe mode according to group.id
group.id

## Consumer ID; generated automatically if not set
consumer.id

## An ID used for tracing requests, preferably the same as group.id
client.id = group id value

## Address of the ZooKeeper cluster; may list several hosts: hostname1:port1,hostname2:port2,hostname3:port3. Must use the same ZooKeeper configuration as the broker
zookeeper.connect = localhost:2182

## ZooKeeper heartbeat timeout; a consumer that stays silent longer than this is considered dead
zookeeper.session.timeout.ms = 6000

## Time to wait for a ZooKeeper connection
zookeeper.connection.timeout.ms = 6000

## How far a ZooKeeper follower may lag behind the ZooKeeper leader
zookeeper.sync.time.ms = 2000

## What to do when there is no initial offset in ZooKeeper.
## smallest: reset to the smallest offset
## largest: reset to the largest offset
## anything else: throw an exception
auto.offset.reset = largest

## Socket timeout; the actual timeout is fetch.wait.max.ms + socket.timeout.ms
socket.timeout.ms = 30 * 1000

## Size of the socket receive buffer
socket.receive.buffer.bytes = 64 * 1024

## Limit on the size of messages fetched from each partition
fetch.message.max.bytes = 1024 * 1024

## Whether to commit the offsets of consumed messages to ZooKeeper automatically; when a consumer fails, the latest committed offset can be read back from ZooKeeper
auto.commit.enable = true

## Auto-commit interval
auto.commit.interval.ms = 60 * 1000

## Maximum number of message chunks buffered for consumption; each chunk can be as large as fetch.message.max.bytes
queued.max.message.chunks = 10

## When a new consumer joins a group, the group is rebalanced and some partitions are migrated to the new
## consumer. A consumer that obtains the right to consume a partition registers a
## "Partition Owner registry" node in ZooKeeper, but the old consumer may not have released that node yet.
## This value controls how many times the registration is retried.
rebalance.max.retries = 4

## Interval between rebalance attempts
rebalance.backoff.ms = 2000

## Time to wait before refreshing the leader of a partition each time a leader is re-elected
refresh.leader.backoff.ms

## Minimum amount of data the server sends to the consumer; if less is available, the request waits until the requirement is met
fetch.min.bytes = 1

## Longest time a consumer request waits when the minimum size (fetch.min.bytes) is not met
fetch.wait.max.ms = 100

## Throw an exception if no message arrives within the specified time; generally does not need to be changed
consumer.timeout.ms = -1
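
Two of the consumer rules above are easy to misread, so here is a minimal sketch of them: how auto.offset.reset picks a starting offset when ZooKeeper holds none, and how the effective socket timeout is derived. The class ConsumerSettingsSketch and its method names are hypothetical and exist only to illustrate the rules; this is not the consumer's actual code.

```java
// Illustration of two consumer rules; the names here are hypothetical.
class ConsumerSettingsSketch {

    // Chooses the starting offset when no offset is stored in ZooKeeper.
    // "smallest" -> smallest available offset, "largest" -> largest available offset,
    // any other value -> an exception, as described for auto.offset.reset.
    static long initialOffset(String autoOffsetReset, long smallest, long largest) {
        switch (autoOffsetReset) {
            case "smallest":
                return smallest;
            case "largest":
                return largest;
            default:
                throw new IllegalStateException(
                        "no initial offset and auto.offset.reset=" + autoOffsetReset);
        }
    }

    // Actual socket timeout = fetch.wait.max.ms + socket.timeout.ms
    static long effectiveSocketTimeoutMs(long fetchWaitMaxMs, long socketTimeoutMs) {
        return fetchWaitMaxMs + socketTimeoutMs;
    }

    public static void main(String[] args) {
        System.out.println(initialOffset("largest", 5, 90));
        System.out.println(effectiveSocketTimeoutMs(100, 30 * 1000));
    }
}
```

With auto.offset.reset = largest, a consumer group seen for the first time starts at the end of the log and therefore skips messages that were already there.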
PRODUCER configuration
## Addresses the producer uses to fetch metadata (topics, partitions and replicas). The format is host1:port1,host2:port2; you can also put a VIP in front of the brokers
metadata.broker.list

## Message acknowledgement mode
## 0: delivery is not confirmed, the message is just sent; low latency, but messages can be lost if a server fails (a bit like UDP)
## 1: send the message and wait for the leader to acknowledge receipt; moderate reliability
## -1: send the message and wait until the leader has received it and replication has completed before returning; highest reliability
request.required.acks = 0

## Maximum time to wait for a send request to be acknowledged
request.timeout.ms = 10000

## Socket send buffer size
send.buffer.bytes = 100*1024

## Serializer class for the key; if not set, the same as serializer.class
key.serializer.class

## Partitioning strategy; the default takes the key's hash modulo the number of partitions
partitioner.class = kafka.producer.DefaultPartitioner

## Compression codec for messages; the default is none, and gzip and snappy are available
compression.codec = none

## Compression can be limited to specific topics; the default is null
compressed.topics = null

## Number of retries after a message fails to be sent
message.send.max.retries = 3

## Interval to wait after each failure before retrying
retry.backoff.ms = 100

## Interval at which the producer refreshes topic metadata. If set to 0, metadata is refreshed after every message sent
topic.metadata.refresh.interval.ms = 600 * 1000

## Chosen freely by the user, but must not be duplicated. Mainly used to trace and log messages
client.id = ""

------------------------------------------- Message mode related -------------------------------------------

## Producer type. async: send messages asynchronously. sync: send messages synchronously
producer.type = sync

## In asynchronous mode, messages are buffered for up to this long and then sent together
queue.buffering.max.ms = 5000

## Maximum number of messages that may wait in the queue in asynchronous mode
queue.buffering.max.messages = 10000

## In asynchronous mode, how long to wait for space in the queue. If set to 0, a message is either enqueued immediately or discarded
queue.enqueue.timeout.ms = -1

## In asynchronous mode, the number of messages sent in one batch; a batch is sent once this many messages are ready or queue.buffering.max.ms has elapsed
batch.num.messages = 200

## Serializer class for the message body, which converts it into a byte stream for transmission
serializer.class = kafka.serializer.DefaultEncoder
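
As an illustration of how the producer settings above fit together, the sketch below collects a few of them into a java.util.Properties object and shows what "modulo" partitioning means. It is a hedged sketch: the class ProducerSettingsSketch, its methods, and the broker address localhost:6667 are assumptions made for the example, and partitionFor only mirrors the idea of hash-modulo partitioning rather than reproducing kafka.producer.DefaultPartitioner.

```java
import java.util.Properties;

// Hypothetical helper class; property names and values are the ones listed above.
class ProducerSettingsSketch {

    // Settings for a synchronous producer, as plain key/value pairs
    static Properties syncProducerProperties(String brokerList) {
        Properties props = new Properties();
        props.setProperty("metadata.broker.list", brokerList);
        props.setProperty("request.required.acks", "0");
        props.setProperty("producer.type", "sync");
        props.setProperty("serializer.class", "kafka.serializer.DefaultEncoder");
        props.setProperty("partitioner.class", "kafka.producer.DefaultPartitioner");
        props.setProperty("compression.codec", "none");
        props.setProperty("message.send.max.retries", "3");
        props.setProperty("retry.backoff.ms", "100");
        return props;
    }

    // Modulo partitioning: non-negative hash of the key, modulo the partition count.
    // Masking the sign bit keeps the result non-negative even for Integer.MIN_VALUE.
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // "localhost:6667" is an assumed broker address for this example
        Properties props = syncProducerProperties("localhost:6667");
        System.out.println(props.getProperty("producer.type"));
        System.out.println(partitionFor("a", 4));
    }
}
```

Because the partition depends only on the key's hash, messages with the same key always land in the same partition as long as the number of partitions does not change.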
Reference text (configuration): http://kafka.apache.org/documentation.html#producerconfigs
Reference text: http://blog.csdn.net/huanggang028/article/details/47830529
Reference text: http://blog.csdn.net/zxae86/article/details/47069409
Reference text: http://www.inter12.org/archives/842
Reference text: http://www.aboutyun.com/thread-10322-1-1.html