1. Configure and run the kafka server
1. Set up the zookeeper environment before running the kafka server
This step is omitted here; see http://code727.iteye.com/blog/2360944
2. Configure the broker in server.properties
# Unique identifier of this broker in the cluster (analogous to zookeeper's myid)
broker.id=0
# Port on which kafka serves clients, default 9092
port=9092
# Disabled by default; in 0.8.1 it had bugs (DNS resolution problems, failure rates)
# host.name=192.168.1.100
# Number of threads the broker uses for network processing
num.network.threads=3
# Number of threads the broker uses for disk I/O
num.io.threads=8
# Number of requests queued for the I/O threads, default 500
queued.max.requests=128
# Directories where messages are stored, comma-separated; num.io.threads should be
# at least the number of directories
# With multiple directories, a newly created topic persists messages to the
# directory holding the fewest partitions
log.dirs=/kafka/9092/logs/
# Send buffer size: data is not sent immediately but buffered until it reaches
# a certain size, which improves performance
socket.send.buffer.bytes=102400
# Receive buffer size: data is buffered until it reaches a certain size before
# being written to disk
socket.receive.buffer.bytes=102400
# Maximum size of a request sent to or fetched from kafka; must not exceed the JVM heap size
socket.request.max.bytes=104857600
# Whether topics are created automatically on first use; default true, false is recommended
auto.create.topics.enable=true
# Default number of partitions per topic (the built-in default is 1); keeping a
# single topic at or below 64 partitions is recommended
# For example, a "test" topic with 8 partitions creates 8 folders under
# log.dirs: test-0, test-1, ..., test-7
num.partitions=8
# Maximum size of a single message (5 MB)
message.max.bytes=5242880
# Number of replicas per partition; the default of 1 means no replication, so
# raising it is recommended. With N>1, up to N-1 replicas can fail and the
# remaining replicas continue to provide service
default.replication.factor=2
# Maximum number of bytes fetched per replication request (5 MB)
replica.fetch.max.bytes=5242880
# Kafka appends messages to segment files; when a segment exceeds this size
# (1 GB), a new file is created
log.segment.bytes=1073741824
# Maximum retention time for messages: 168 hours = 7 days
log.retention.hours=168
# How often (in milliseconds) to check for messages that have exceeded
# log.retention.hours and delete them
log.retention.check.interval.ms=300000
# Whether to enable log compaction; generally unnecessary, though enabling it
# can improve performance for compacted topics
log.cleaner.enable=false
# Zookeeper connection string; for consistent service in a cluster, list every
# node's address and port, comma-separated
zookeeper.connect=127.0.0.1:2181,127.0.0.1:2182,127.0.0.1:2183
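The byte-valued settings above are easy to misread, so it helps to sanity-check them in human units. A small sketch (plain Python; the helper name is mine):

```python
# Convert the byte-valued broker settings above into human-readable units.
SETTINGS = {
    "socket.request.max.bytes": 104857600,   # 100 MB
    "message.max.bytes": 5242880,            # 5 MB
    "log.segment.bytes": 1073741824,         # 1 GB
}

def human(n: float) -> str:
    """Render a byte count using binary units (1 KB = 1024 bytes)."""
    for unit in ("bytes", "KB", "MB", "GB"):
        if n < 1024:
            return f"{n:g} {unit}"
        n /= 1024
    return f"{n:g} TB"

for key, value in SETTINGS.items():
    print(f"{key} = {human(value)}")
```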
3. Start the broker
# Linux: start as a daemon, loading the specified configuration file
./kafka-server-start.sh -daemon ../config/server.properties
# Windows
kafka-server-start.bat ../config/server.properties
4. Run the producer test
# Linux: connect to the broker on port 9092 and produce to the "test" topic
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
# Windows
bin/windows/kafka-console-producer.bat --broker-list localhost:9092 --topic test
5. Run the consumer test
# Linux: point at the same zookeeper the producer's cluster uses and consume
# the "test" topic from the beginning
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
# Windows
bin/windows/kafka-console-consumer.bat --zookeeper localhost:2181 --topic test --from-beginning
6. Type a string on the producer side and press Enter; if it appears on the consumer side, the round trip works.
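The round trip in steps 4-6 can be modeled with an append-only log: a topic stores records at increasing offsets, and `--from-beginning` means the consumer starts at offset 0. A toy sketch (the class and names are mine, not client code):

```python
# Toy model of the console round trip: a topic is an append-only log;
# --from-beginning means the consumer reads from offset 0.
class ToyTopic:
    def __init__(self):
        self.log = []  # append-only list of records

    def produce(self, msg):
        self.log.append(msg)
        return len(self.log) - 1   # offset of the new record

    def consume(self, from_beginning=False):
        # Without --from-beginning, a console consumer only sees records
        # produced after it starts, i.e. nothing that is already in the log.
        start = 0 if from_beginning else len(self.log)
        return self.log[start:]

t = ToyTopic()
t.produce("hello")
t.produce("kafka")
print(t.consume(from_beginning=True))  # ['hello', 'kafka']
```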
7. Manually create a Topic partition
When auto.create.topics.enable in server.properties is false, topics and their partitions must be created manually before use.
# Linux: create a topic named "test" with replication factor 3 and 8 partitions
bin/kafka-topics.sh --create --zookeeper localhost:2181,localhost:2182,localhost:2183 --replication-factor 3 --partitions 8 --topic test
# Windows
bin/windows/kafka-topics.bat --create --zookeeper localhost:2181,localhost:2182,localhost:2183 --replication-factor 3 --partitions 8 --topic test
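The broker spreads these 8 partitions and their 3 replicas across the cluster. The sketch below shows the round-robin idea only; Kafka's real assignment additionally randomizes the starting broker and replica shift, so the exact layout differs. The function name is mine.

```python
# Simplified round-robin replica placement: partition p's replicas go to
# brokers (p, p+1, ..., p+rf-1) modulo the broker count. This also shows
# why the replication factor cannot exceed the number of brokers.
def assign_replicas(brokers, partitions, rf):
    if rf > len(brokers):
        raise ValueError("replication factor cannot exceed broker count")
    return {
        p: [brokers[(p + r) % len(brokers)] for r in range(rf)]
        for p in range(partitions)
    }

# 8 partitions, replication factor 3, across a 3-broker cluster
layout = assign_replicas([0, 1, 2], partitions=8, rf=3)
print(layout[0])  # replicas of partition 0: [0, 1, 2]
```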
2. About Producer and Consumer configuration
1. Producer configuration
Instructions can be found in org.apache.kafka.clients.producer.ProducerConfig
# Host and port of the kafka brokers; in a cluster, separate addresses with commas
bootstrap.servers=127.0.0.1:9092,127.0.0.1:9093
# How many broker ACKs the producer requires before considering a send successful
# 0: do not wait for any ACK; retries no longer takes effect and the record
#    offset returned to the producer is always -1
# 1: the default; wait only for the partition leader's ACK
# all: wait for ACKs from all replicas
acks=1
# Upper bound on batching delay, in milliseconds; default 0 (no delay)
# Under load, when records arrive faster than they can be sent, a small
# artificial delay (e.g. linger.ms=5) reduces the number of requests at the
# cost of up to 5 ms of extra latency per record
linger.ms=5
# Unique identifier of the producer: a logical application name included in
# broker-side request logs, making it possible to trace requests beyond ip/port
client.id=producer-0
# TCP send buffer size (SO_SNDBUF) used when sending data; -1 means the OS default
send.buffer.bytes=1024
# TCP receive buffer size (SO_RCVBUF) used when reading data; -1 means the OS
# default (for a producer, incoming data is mainly broker ACK replies)
receive.buffer.bytes=1024
# Maximum size of a single request in bytes, mainly to avoid sending huge requests
max.request.size=1024
# Time to wait before attempting to reconnect to a host, in milliseconds;
# avoids a flood of reconnections when a host fails under concurrency
reconnect.backoff.ms=3000
# send() and partitionsFor() may block when the buffer is full or metadata is
# unavailable; this controls how long (in milliseconds) they may block before
# a TimeoutException is thrown
# Replaces the deprecated metadata.fetch.timeout.ms and block.on.buffer.full
max.block.ms=3000
# Time to wait before retrying a failed request to a given topic, in
# milliseconds; avoids tight retry loops in some failure scenarios
retry.backoff.ms=3000
# Total memory (bytes) the producer uses to buffer records waiting to be sent
# (33554432 bytes = 32 MB)
# When records are produced faster than they can be delivered, the producer
# blocks for up to max.block.ms and then throws, so this setting throttles the
# send rate; it should roughly correspond to the total memory the producer will use
buffer.memory=33554432
# Compression type for all data generated by the producer
# none (the default), gzip, snappy, or lz4; compression is applied to whole
# batches, so better batching also improves the compression ratio
compression.type=none
# Time window for computing metric samples, in milliseconds
metrics.sample.window.ms=3000
# Number of samples maintained to compute metrics
metrics.num.samples=3
# Classes used as metric reporters, implementing the
# org.apache.kafka.common.metrics.MetricsReporter interface; JmxReporter is one
# such implementation and registers JMX statistics
metric.reporters=org.apache.kafka.common.metrics.JmxReporter
# Maximum number of unacknowledged requests per connection before the client
# blocks. If set greater than 1 and a send fails, retries (when enabled) can
# reorder messages
max.in.flight.requests.per.connection=1
# Number of retries after a failed send
# If retries > 0 and max.in.flight.requests.per.connection is greater than 1,
# record order may change: when two batches go to the same partition, the first
# fails and is retried while the second succeeds, the second batch's records
# can end up first
retries=3
# Serializer implementation class for message keys
key.serializer=org.apache.kafka.common.serialization.StringSerializer
# Serializer implementation class for message values
value.serializer=org.apache.kafka.common.serialization.StringSerializer
# Close idle connections after this many milliseconds, i.e. the maximum hold
# time of an idle connection
connections.max.idle.ms=60000
# Implementation class of the partitioner interface
# (org.apache.kafka.clients.producer.Partitioner)
partitioner.class=org.apache.kafka.clients.producer.internals.DefaultPartitioner
# How long the producer waits for a response after sending a request, in
# milliseconds. On timeout the request is retried, once per timeout period,
# until a response arrives or the retries are exhausted
# Replaces the deprecated timeout.ms configuration
request.timeout.ms=5000
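How a key-based partitioner like the DefaultPartitioner above routes records can be sketched as "hash the key, take it modulo the partition count". Kafka actually uses murmur2 for this; `zlib.crc32` below is a stand-in I chose to keep the sketch dependency-free, so the computed partition numbers will differ from Kafka's.

```python
import zlib

# Key-based partitioning sketch: hash the key bytes and reduce modulo the
# partition count. The same key therefore always lands on the same partition,
# which is what preserves per-key ordering.
def partition_for(key: bytes, num_partitions: int) -> int:
    return zlib.crc32(key) % num_partitions

# The same key maps to the same partition every time.
p1 = partition_for(b"order-42", 8)
p2 = partition_for(b"order-42", 8)
assert p1 == p2 and 0 <= p1 < 8
```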
2. Consumer configuration
Instructions can be found in org.apache.kafka.clients.consumer.ConsumerConfig
# Kafka brokers the consumer reads from; in a cluster, separate addresses with commas
bootstrap.servers=127.0.0.1:9092,127.0.0.1:9093
# Unique identifier of the group this consumer belongs to
# Required when the consumer uses group management via subscribe, or the
# kafka-based offset management strategy
# When several identical consumers listen with the same group.id, each message
# is consumed by only one member of the group, which avoids duplicate
# consumption in a consumer cluster
group.id=consumer_group_0
# Maximum number of records returned by each call to poll
max.poll.records=10
# Maximum interval between consecutive calls to poll, in milliseconds
max.poll.interval.ms=1500
# Timeout used to detect consumer failure, in milliseconds
# The consumer sends periodic heartbeats to signal liveness; if the broker
# receives no heartbeat within this session timeout, it removes the consumer
# from the group and rebalances
# group.min.session.timeout.ms <= session.timeout.ms <= group.max.session.timeout.ms
session.timeout.ms=60000
# Interval between heartbeats, in milliseconds
# Keeps the consumer's session alive and triggers a rebalance when consumers
# join or leave the group
# Must be less than session.timeout.ms, usually no more than 1/3 of it; lower
# values shorten the expected time to rebalance
heartbeat.interval.ms=6000
# If true, the consumer's offset is committed periodically in the background
enable.auto.commit=true
# When enable.auto.commit is true, the interval at which offsets are
# auto-committed to the broker
auto.commit.interval.ms=5000
# Class name of the partition assignment strategy, used to distribute partition
# ownership among consumer instances under group management
partition.assignment.strategy=
# What to do when there is no initial offset, or the current offset no longer
# exists on the server
# earliest: reset to the earliest offset, which may cause duplicate consumption
# latest: reset to the latest offset, which may lose unconsumed messages
# none: throw an exception to the consumer if no previous offset exists for the group
auto.offset.reset=none
# Minimum amount of data (bytes) the server returns for a fetch request
# With insufficient data, the request waits for that much data to accumulate
# before answering
# The default of 1 byte means a fetch is answered as soon as a single byte is
# available, or when the fetch times out waiting for data
# Values greater than 1 make the server wait for more data to accumulate,
# trading some extra latency for higher server throughput
fetch.min.bytes=1
# Maximum amount of data (bytes) the server returns for a fetch request;
# default 52428800 (50 MB)
# Not an absolute maximum: if the first message in the first non-empty
# partition fetched is larger than this, it is still returned so the consumer
# can make progress
fetch.max.bytes=52428800
# Maximum time the server blocks before answering a fetch request when there is
# not enough data to satisfy fetch.min.bytes
fetch.max.wait.ms=3000
# Maximum age of metadata, in milliseconds; after this period the client forces
# a metadata refresh even if no partition leadership has changed, so new
# brokers and partitions are discovered proactively
metadata.max.age.ms=60000
# Maximum amount of data the server returns per partition; default 1048576 (1 MB)
# If the first message in the first non-empty partition fetched is larger than
# this, it is still returned so the consumer can make progress
# This value should not be smaller than the broker's message.max.bytes or the
# topic's max.message.bytes, and cannot exceed fetch.max.bytes
max.partition.fetch.bytes=1048576
# Same as the producer setting of the same name
send.buffer.bytes=1024
# Same as the producer setting of the same name
receive.buffer.bytes=1024
# Same as the producer setting of the same name
client.id=consumer-0
# reconnect.backoff.ms, retry.backoff.ms, metrics.sample.window.ms,
# metrics.num.samples and metric.reporters also behave as in the producer
# Automatically check the CRC32 (error detection) of consumed records; this
# ensures no on-disk corruption of the message has occurred
# The check adds some overhead, so it may be disabled when chasing extreme performance
check.crcs=false
# Deserializer implementation class for message keys
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
# Deserializer implementation class for message values
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
# Same as the producer setting of the same name
connections.max.idle.ms=600000
# Same as the producer setting of the same name
request.timeout.ms=3000
# Interceptor applied to consumption; the class must implement the
# org.apache.kafka.clients.consumer.ConsumerInterceptor interface
interceptor.classes=org.apache.kafka.clients.consumer.ConsumerInterceptor
# Whether records from internal topics (e.g. offsets) should be exposed to the
# consumer. If true (the default), the only way to receive records from an
# internal topic is to subscribe to it explicitly
exclude.internal.topics=true
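The three auto.offset.reset policies above only matter when the group has no committed offset for a partition. The decision can be sketched as follows (an illustrative model, not client code; the function name is mine):

```python
# What auto.offset.reset does when a consumer group has no committed offset
# for a partition whose log spans [log_start, log_end).
def starting_offset(committed, log_start, log_end, reset_policy):
    if committed is not None:
        return committed        # normal case: resume where the group left off
    if reset_policy == "earliest":
        return log_start        # may re-read old records (duplicate consumption)
    if reset_policy == "latest":
        return log_end          # skips records produced before the group joined
    # "none": surface the problem to the application instead of guessing
    raise RuntimeError("no committed offset and auto.offset.reset=none")

# Fresh group on a partition currently holding offsets 100..199:
print(starting_offset(None, 100, 200, "earliest"))  # 100
print(starting_offset(None, 100, 200, "latest"))    # 200
```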