Kafka cluster configuration and deployment

1. Configure and run the Kafka server

1. Set up the ZooKeeper environment before running the Kafka server

This step is omitted here; see http://code727.iteye.com/blog/2360944

2. Configure the broker in server.properties

# The unique identifier of this broker in the cluster, analogous to ZooKeeper's myid
broker.id=0

# The port on which Kafka provides service to clients; the default is 9092
port=9092

# Commented out by default; in 0.8.1 this parameter had bugs (DNS resolution and failure-rate problems)
# host.name=192.168.1.100

# The number of threads the broker uses for network processing
num.network.threads=3

# The number of threads the broker uses for disk I/O
num.io.threads=8

# The maximum number of requests queued for execution by the I/O threads; the default is 500
queued.max.requests=128

# The directories where messages are stored, as a comma-separated list; num.io.threads should be at least the number of directories listed here
# If multiple directories are configured, new topic partitions are placed in the directory that currently holds the fewest partitions
log.dirs=/kafka/9092/logs/

# The size of the TCP send buffer (SO_SNDBUF); data is not sent immediately but accumulates in the buffer first, which improves performance
socket.send.buffer.bytes=102400

# The size of the TCP receive buffer (SO_RCVBUF); received data accumulates here before being processed and written to disk
socket.receive.buffer.bytes=102400

# The maximum size in bytes of a single request to or from Kafka; it must not exceed the JVM heap size
socket.request.max.bytes=104857600

# Whether topics are created automatically when first referenced; the default is true, but false is recommended
auto.create.topics.enable=true

# The default number of partitions per topic (Kafka's built-in default is 1); it is recommended that a single topic have no more than 64 partitions
# For example, a topic named test with 8 partitions produces 8 folders under log.dirs: test-0, test-1, ..., test-7
num.partitions=8

# The maximum size in bytes of a single message (5242880 = 5 MB)
message.max.bytes=5242880

# The replication factor for messages; the default of 1 means no replication, so raising it is recommended
# With N>1 replicas, if one replica fails, the remaining N-1 can continue to provide service
default.replication.factor=2

# The maximum number of bytes a replica fetches per request when copying messages from the leader
replica.fetch.max.bytes=5242880

# Kafka appends messages to segment files; when a segment exceeds this size, a new file is created
log.segment.bytes=1073741824

# The maximum retention time for messages: 168 hours = 7 days
log.retention.hours=168

# How often, in milliseconds, to check for messages that have exceeded log.retention.hours and delete any that have expired
log.retention.check.interval.ms=300000

# Whether to enable the log cleaner (log compaction); generally not needed, though it is required for topics that use cleanup.policy=compact
log.cleaner.enable=false

# To keep the service consistent, configure the address and port of every node in the ZooKeeper cluster, comma-separated
zookeeper.connect=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183
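
As a quick sanity check of the constraint noted above (num.io.threads should be at least the number of log.dirs entries), here is a small hypothetical Java snippet that loads a server.properties file and verifies it; the file path and fallback defaults are assumptions for illustration:

import java.io.FileInputStream;
import java.util.Properties;

public class CheckServerProps {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Path is an assumption; point this at your actual server.properties
        try (FileInputStream in = new FileInputStream("../config/server.properties")) {
            props.load(in);
        }
        int ioThreads = Integer.parseInt(props.getProperty("num.io.threads", "8"));
        int logDirs = props.getProperty("log.dirs", "/tmp/kafka-logs").split(",").length;
        if (ioThreads < logDirs) {
            System.out.printf("Warning: num.io.threads (%d) is below the number of log.dirs (%d)%n",
                    ioThreads, logDirs);
        } else {
            System.out.println("num.io.threads covers all log.dirs");
        }
    }
}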

3. Start the broker

# Linux command line: start in daemon mode with the specified configuration file
./kafka-server-start.sh -daemon ../config/server.properties

# Windows command line
kafka-server-start.bat ../config/server.properties

4. Run the producer test

# Linux command line: connect to the test topic on the broker at port 9092
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test  

# Windows command line
bin/windows/kafka-console-producer.bat --broker-list localhost:9092 --topic test

5. Run the consumer test

# Linux command line: connect to the ZooKeeper node used by the cluster and listen on the test topic
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning

# Windows command line
bin/windows/kafka-console-consumer.bat --zookeeper localhost:2181 --topic test --from-beginning

6. Type a string on the producer side and press Enter; if it appears on the consumer side, the setup works.

7. Manually create a topic and its partitions

When auto.create.topics.enable is set to false in server.properties, topics and their partitions must be created manually:

# Linux command line: create a topic named test with a replication factor of 3 and 8 partitions
bin/kafka-topics.sh --create --zookeeper localhost:2181,localhost:2182,localhost:2183 --replication-factor 3 --partitions 8 --topic test

# Windows command line
bin/windows/kafka-topics.bat --create --zookeeper localhost:2181,localhost:2182,localhost:2183 --replication-factor 3 --partitions 8 --topic test
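
For reference, a topic can also be created programmatically. The sketch below uses the AdminClient API, which exists only in Kafka client libraries 0.11 and newer (the ZooKeeper-based CLI above predates it); the broker address is an assumption, and the replication factor and partition count mirror the command:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // Topic "test" with 8 partitions and replication factor 3, as in the CLI example
            NewTopic topic = new NewTopic("test", 8, (short) 3);
            admin.createTopics(Collections.singletonList(topic)).all().get();
            System.out.println("Topic created");
        }
    }
}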

 

2. About Producer and Consumer configuration

1. Producer configuration

Descriptions of each setting can be found in org.apache.kafka.clients.producer.ProducerConfig

 

# The host:port addresses used to connect to the Kafka cluster; in a cluster environment, separate multiple addresses with commas
bootstrap.servers=127.0.0.1:9092,127.0.0.1:9093

# The acknowledgment strategy: which broker ACKs the producer requires before a send is considered successful
# 0: do not wait for any broker to acknowledge; in this mode the retries setting no longer takes effect, and the record offset returned to the producer is always -1
# 1: the default; the send is considered successful once the leader has acknowledged
# all: the send is considered successful only after all replicas have acknowledged
acks=1

# Upper bound on batching delay, in milliseconds; the default is 0 (no delay)
# Under load, records can arrive faster than they can be sent out; this setting adds a small artificial delay so more records are batched per request. For example, linger.ms=5 reduces the number of requests sent, at the cost of up to 5 ms of added latency
linger.ms=5

# Producer's unique identifier
# A logical application name included in server-side request logs, making it possible to trace a request's origin beyond just its ip/port
client.id=producer-0

# The size of the TCP send buffer (SO_SNDBUF) used when sending data. If the value is -1, the OS default will be used
send.buffer.bytes=1024

# The size of the TCP receive buffer (SO_RCVBUF) used when reading data. If the value is -1, the operating system default will be used. For a producer, the data received is mainly ACK responses from brokers
receive.buffer.bytes=1024

# The maximum size in bytes of a single request, mainly to avoid sending overly large requests
max.request.size=1024

# How long to wait before attempting to reconnect to a given host, in milliseconds
# Mainly avoids a burst of connection attempts piling up immediately after a host fails in a concurrent environment
reconnect.backoff.ms=3000

# The send and partitionsFor methods can block when the buffer is full or metadata is unavailable; this setting bounds that blocking time in milliseconds, and a TimeoutException is thrown if the call has not returned when it expires
# Can be used to replace the deprecated metadata.fetch.timeout.ms and block.on.buffer.full configurations
max.block.ms=3000

# How long to wait before retrying a failed request to a given topic, in milliseconds
# This avoids tight loops of repeated requests in some failure scenarios
retry.backoff.ms=3000

# The total bytes of memory the producer uses to buffer records waiting to be sent to the server (33554432 bytes = 32 MB)
# When records are produced faster than they can be delivered, the producer blocks for up to max.block.ms and then throws an exception, so this setting also helps pace the send rate and reduce such exceptions
# Typically, this setting should correspond roughly to the total memory the producer will use
buffer.memory=33554432

# The compression type for all data generated by the producer
# none: no compression (the default)
# The other options are gzip, snappy, and lz4; compression is applied to whole batches, so better batching also improves the compression ratio
compression.type=none

# The time window over which metric samples are computed, in milliseconds
metrics.sample.window.ms=3000

# The number of samples maintained to compute metrics
metrics.num.samples=3

# A list of classes used as metrics reporters; each must implement the org.apache.kafka.common.metrics.MetricsReporter interface. JmxReporter is one such implementation and registers JMX statistics
metric.reporters=org.apache.kafka.common.metrics.JmxReporter

# The maximum number of unacknowledged requests the client sends on a single connection before blocking
# If set greater than 1 and a send fails, messages may be reordered due to retries (when retries are enabled)
max.in.flight.requests.per.connection=1

# The number of times to retry after a send fails
# Note that if retries is enabled while max.in.flight.requests.per.connection is greater than 1, record ordering may change: if two batches are sent to the same partition, the first fails and is retried while the second succeeds, the second batch's records may end up first
retries=3

# The serializer implementation class for message keys
key.serializer=org.apache.kafka.common.serialization.StringSerializer

# The serializer implementation class for message values
value.serializer=org.apache.kafka.common.serialization.StringSerializer

# Close an idle connection after this many milliseconds, i.e., the maximum time an idle connection is kept open
connections.max.idle.ms=60000

# The implementation class of the partitioner interface (org.apache.kafka.clients.producer.Partitioner)
partitioner.class=org.apache.kafka.clients.producer.internals.DefaultPartitioner

# How long the producer waits for a response after sending a request, in milliseconds
# If no response arrives in time, the producer resends the request, up to retries times, until a response arrives or the retries are exhausted
# Replaces the deprecated timeout.ms setting
request.timeout.ms=5000
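
Tying several of the settings above together, here is a minimal producer sketch (assuming a local broker on port 9092 and the test topic from the deployment steps; the values are illustrative, not prescriptive):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "127.0.0.1:9092");
        props.put("acks", "1");      // wait for the leader's ACK only
        props.put("retries", "3");
        props.put("linger.ms", "5"); // batch records for up to 5 ms
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test", "key-0", "hello kafka"));
        } // close() flushes any records still sitting in the buffer
    }
}
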
2. Consumer configuration

Descriptions of each setting can be found in org.apache.kafka.clients.consumer.ConsumerConfig

# The Kafka servers the consumer reads from; in a cluster environment, separate multiple addresses with commas
bootstrap.servers=127.0.0.1:9092,127.0.0.1:9093

# The unique identifier of the consumer group this consumer belongs to
# Required when the consumer uses group management via subscribe(), or the Kafka-based offset management strategy
# When several identical consumers share the same group.id, each message is consumed by only one member of the group; in a clustered consumer deployment this prevents duplicate consumption, which is worth keeping in mind
group.id=consumer_group_0

# The maximum number of records returned by a single call to poll()
max.poll.records=10

# The maximum allowed interval between consecutive calls to poll(), in milliseconds
max.poll.interval.ms=1500

# The timeout used to detect consumer failure, in milliseconds
# The consumer sends periodic heartbeats to signal liveness; if the broker receives no heartbeat before this session timeout expires, it removes the consumer from the group and rebalances the load
# Must satisfy group.min.session.timeout.ms <= session.timeout.ms <= group.max.session.timeout.ms
session.timeout.ms=60000

# The interval between heartbeats, in milliseconds
# Keeps the consumer's session active and triggers a rebalance when consumers join or leave the group
# Must be less than session.timeout.ms, and should usually be no more than 1/3 of it; it can be lowered further to control the expected rebalance time
heartbeat.interval.ms=6000

# If true, the consumer's offsets are periodically committed in the background
enable.auto.commit=true

# When enable.auto.commit is true, the interval in milliseconds at which the consumer commits offsets to the broker
auto.commit.interval.ms=5000

# The class name of the partition assignment strategy
# Used by the client to distribute partition ownership among consumer instances when group management is in use
partition.assignment.strategy=

# What to do when Kafka has no initial offset, or the current offset no longer exists on the server
# earliest: automatically reset to the earliest offset, which may cause some messages to be consumed twice
# latest: automatically reset to the latest offset, which may cause unconsumed messages to be missed
# none: throw an exception to the consumer if no previous offset is found for the group
auto.offset.reset=none

# The minimum amount of data, in bytes, the server returns for a fetch request
# If insufficient data is available, the request waits for that much data to accumulate before answering
# The default of 1 byte means a fetch is answered as soon as a single byte of data is available, or when the fetch times out waiting for data; setting this greater than 1 makes the server wait for more data to accumulate, which can improve server throughput at the cost of some extra latency
fetch.min.bytes=1

# The maximum amount of data, in bytes, the server returns for a fetch request; the default is 52428800 (50 MB)
# This is not an absolute maximum: if the first message in the first non-empty partition fetched is larger than this value, it is still returned so the consumer can make progress
fetch.max.bytes=52428800

# The maximum time the server blocks before answering a fetch request when there is not enough data to satisfy fetch.min.bytes
fetch.max.wait.ms=3000

# The maximum age of metadata, in milliseconds
# After this period the client forces a metadata refresh, proactively discovering any new brokers or partitions even if no partition leadership has changed
metadata.max.age.ms=60000

# The maximum amount of data per partition the server returns; the default is 1048576 (1 MB)
# If the first message in the first non-empty partition fetched is larger than this value, it is still returned so the consumer can make progress
# This value must not exceed the broker's message.max.bytes, the topic's max.message.bytes, or fetch.max.bytes
max.partition.fetch.bytes=1048576

# Same as the producer's configuration of the same name
send.buffer.bytes=1024

# Same as the producer's configuration of the same name
receive.buffer.bytes=1024

# Same as the producer's configuration of the same name
client.id=consumer-0
 
# Same as the producer's configuration of the same name
reconnect.backoff.ms
 
# Same as the producer's configuration of the same name
retry.backoff.ms
 
# Same as the producer's configuration of the same name
metrics.sample.window.ms
 
# Same as the producer's configuration of the same name
metrics.num.samples
 
# Same as the producer's configuration of the same name
metric.reporters
 
# Automatically verify the CRC32 checksum of consumed records, ensuring no on-disk corruption of messages has occurred
# The check adds some overhead, so it may be disabled when seeking maximum performance
check.crcs=false
 
# Deserializer implementation class for message keys
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
 
# The deserializer implementation class for message values
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
 
# Same as the producer's configuration of the same name
connections.max.idle.ms=600000
 
# Same as the producer's configuration of the same name
request.timeout.ms=3000
 
# Interceptor implementation classes applied to consumed records
# Each class must implement the org.apache.kafka.clients.consumer.ConsumerInterceptor interface
interceptor.classes=org.apache.kafka.clients.consumer.ConsumerInterceptor
 
# Whether records from internal topics (such as offsets) should be exposed to the consumer
# If set to true (the default), the only way to receive records from an internal topic is to subscribe to it explicitly
exclude.internal.topics=true
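
And likewise, a minimal consumer sketch combining several of these settings (assuming a Kafka 2.x client, where poll takes a Duration, plus the same local broker and test topic; the values are illustrative):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "127.0.0.1:9092");
        props.put("group.id", "consumer_group_0");  // members of one group share the work
        props.put("enable.auto.commit", "true");
        props.put("auto.offset.reset", "earliest"); // start from the beginning if no committed offset
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
                for (ConsumerRecord<String, String> record : records)
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
            }
        }
    }
}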

 
