Kafka Series 2: An In-depth Understanding of the Kafka Producer


The previous article gave an overview of Kafka, covering its basic concepts and core design principles. This article focuses on the Kafka producer, including the following:

How producers produce messages

How to create a producer

Sending messages to Kafka

Producer configuration

Partitioning

How producers produce messages

First, let's look at the Kafka producer component diagram.

The first step: Kafka wraps the message to be sent in a ProducerRecord object. A ProducerRecord contains the target topic and the content to send, and may optionally specify a key and a partition. Before sending the ProducerRecord, the producer serializes the key and value into byte arrays so that they can be transmitted over the network. The second step: the data is passed to the partitioner. If a partition was already specified in the ProducerRecord, the partitioner does nothing; if not, it selects a partition based on the ProducerRecord's key. Next, the record is added to a record batch; all messages in a batch are sent to the same topic and partition. A separate thread is responsible for sending these batches to the appropriate brokers. The server returns a response when it receives the messages. If the messages were successfully written to Kafka, it returns a RecordMetadata object containing the topic and partition information and the record's offset within the partition. If the write fails, it returns an error. On receiving an error, the producer attempts to resend the message; if it still has not succeeded after the configured number of retries, it throws an exception and stops retrying.

How to create a producer

Setting properties

When creating a producer object, you need to set some properties. Three of them are mandatory:

bootstrap.servers: specifies the list of broker addresses, in host:port format. The list does not need to include every broker in the cluster; the producer will discover the other brokers from the ones given. However, it is recommended to list at least two brokers to ensure fault tolerance.

key.serializer: specifies the serializer for keys. Brokers expect the keys and values of received messages to be byte arrays. This property must be set to a class that implements the org.apache.kafka.common.serialization.Serializer interface; the producer uses it to serialize keys into byte arrays. The Kafka client ships with ByteArraySerializer, StringSerializer, and IntegerSerializer by default, so you usually do not need to implement a custom serializer. Note that key.serializer must be set even if you only send values.

value.serializer: specifies the serializer for values. If both keys and values are strings, you can use the same serializer as key.serializer; otherwise a different serializer is needed.

Project dependencies

Taking a Maven project as an example, to use the Kafka client you need to add the kafka-clients dependency:

    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>2.2.0</version>
    </dependency>

Example

The code to create a simple Kafka producer looks like this:

    Properties props = new Properties();
    props.put("bootstrap.servers", "producer1:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    /* create the producer */
    Producer<String, String> producer = new KafkaProducer<>(props);

    for (int i = 0; i < 10; i++) {
        ProducerRecord<String, String> record = new ProducerRecord<>(topicName, "k" + i, "v" + i);
        /* send a message */
        producer.send(record);
    }
    /* close the producer */
    producer.close();

This example sets only the three mandatory properties; everything else uses the default configuration.

Sending messages to Kafka

Once the producer object is instantiated, you can start sending messages. There are three main ways to send a message:

Fire-and-forget: send the message to the server without caring whether it arrives; this is the style used in the example above. In most cases the message does arrive, which Kafka's high availability and automatic retry mechanism help ensure. But messages are occasionally lost.

Synchronous send: call the send() method, which returns a Future object, then call its get() method to wait, so we can find out whether the message was sent successfully.

Asynchronous send: call the send() method with a callback function, which is invoked when the server's response is received.

Fire-and-forget

This is the simplest way to send a message: just send it and ignore the result. Sample code:

ProducerRecord<String, String> record = new ProducerRecord<>("Topic", "k", "v"); // 1
try {
    producer.send(record); // 2
} catch (Exception e) {
    e.printStackTrace(); // 3
}

Points to note in this code:

The producer's send() method takes a ProducerRecord object as its parameter. The ProducerRecord constructor used here requires the name of the target topic and the key and value to be sent, all of which are strings. The types of the key and value must match the serializers configured for the producer.

The producer's send() method sends the ProducerRecord object. The message is first placed in a buffer and then sent to the server by a separate thread. send() returns a Future object containing RecordMetadata, but here we do not care about the return value.

The producer can still throw exceptions before the message reaches the server, for example when serializing the message fails, the buffer is exhausted, the request times out, or the sending thread is interrupted.

Synchronous send

The only difference between fire-and-forget and a synchronous send is how the result is handled. The sample code below shows the simplest synchronous send; compare it with the previous example to see the difference:

ProducerRecord<String, String> record = new ProducerRecord<>("Topic", "k", "v");
try {
    producer.send(record).get();
} catch (Exception e) {
    e.printStackTrace();
}

As you can see, the difference between the two is whether the result of the send is received. A synchronous send takes the return value of send(), a Future object, and waits for Kafka's response by calling the Future's get() method. If the server returns an error, get() throws an exception. If no error occurred, we get a RecordMetadata object, from which we can read the offset of the message.

Asynchronous send

Applications that need high throughput as well as reliable delivery cannot use fire-and-forget (too unreliable) or synchronous sends (throughput suffers); they need to send messages asynchronously. Most of the time the producer does not need to wait for a response; it only needs to know when a message fails to send, so that it can throw an exception, log the error, or write the message to an "error log" file for later analysis. Sample code:

ProducerRecord<String, String> record = new ProducerRecord<>("Topic", "k", "v");
// send the message asynchronously and listen for the callback
producer.send(record, new Callback() { // 1
    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) { // 2
        if (exception != null) {
            // handle the exception
        } else {
            System.out.printf("topic=%s, partition=%d, offset=%d%n",
                    metadata.topic(), metadata.partition(), metadata.offset());
        }
    }
});

As the code above shows, to use a callback you only need to implement the org.apache.kafka.clients.producer.Callback interface, which has a single method, onCompletion.

If Kafka returns an error, the onCompletion method receives a non-null exception. When send() is called with this callback object, the producer invokes it when the response arrives, either with an exception to handle or with the metadata of the successful send.

Producer configuration

The section on creating a producer covered the three mandatory properties; this section introduces the other producer properties one by one:

acks

The acks parameter specifies how many partition replicas must receive a message before the producer considers the write successful:

acks=0: the producer considers the message successfully sent as soon as it is dispatched, and does not wait for any response from the server;

acks=1: the producer receives a success response from the server as soon as the partition leader receives the message;

acks=all: the producer receives a success response from the server only after all replicas participating in replication have received the message.
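To make the trade-off concrete, here is a minimal sketch of building producer properties with a chosen acks level. The broker addresses and the helper class are purely illustrative, not part of the Kafka API:

```java
import java.util.Properties;

public class AcksConfig {
    /** Builds producer properties for a given acks level.
     *  Broker addresses are placeholders for illustration only. */
    public static Properties producerProps(String acks) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092,broker2:9092"); // placeholder addresses
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // acks=0: no wait; acks=1: leader only; acks=all: all in-sync replicas
        props.put("acks", acks);
        return props;
    }
}
```

acks=all gives the strongest durability guarantee at the cost of latency; acks=0 is the reverse.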

buffer.memory

This parameter sets the size of the memory buffer the producer uses to hold messages waiting to be sent to the server. If the application produces messages faster than they can be sent, the producer's buffer space runs out, and subsequent calls to send() either block or throw an exception.

compression.type

By default, messages are sent uncompressed. This parameter specifies which compression algorithm is applied to messages before they are sent to the broker. Possible values are snappy (low CPU usage, a good choice when both performance and network bandwidth matter), gzip (more CPU, higher compression ratio, a good choice when bandwidth is limited), and lz4.

retries

Specifies how many times the producer retries sending a message after an error. If the configured limit is reached, the producer gives up retrying and returns the error.

batch.size

When multiple messages need to be sent to the same partition, the producer puts them into the same batch. This parameter specifies the amount of memory a single batch can use, measured in bytes.

linger.ms

This parameter specifies how long the producer waits for additional messages to join a batch before sending it. KafkaProducer sends a batch out when the batch is full or when the linger.ms limit is reached.
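The batching-related settings work together; the sketch below collects them on one Properties object. All values are illustrative examples, not recommendations, and the helper class is hypothetical:

```java
import java.util.Properties;

public class BatchingConfig {
    /** Illustrative batching-related producer settings; numbers are examples only. */
    public static Properties batchingProps() {
        Properties props = new Properties();
        props.put("batch.size", "16384");        // up to 16 KB of records per batch (bytes)
        props.put("linger.ms", "5");             // wait up to 5 ms for more records to join a batch
        props.put("buffer.memory", "33554432");  // 32 MB total buffer for unsent records
        props.put("compression.type", "snappy"); // compress batches before sending
        return props;
    }
}
```

Larger batches and a small linger time generally raise throughput at the cost of a few milliseconds of latency.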

client.id

A client id, used by the server to identify the source of messages.

max.in.flight.requests.per.connection

Specifies how many requests the producer can send before receiving a response from the server. The higher the value, the more memory is used, but throughput also increases. Setting it to 1 guarantees that messages are written to the server in the order they were sent, even when retries occur.

timeout.ms, request.timeout.ms and metadata.fetch.timeout.ms

timeout.ms specifies how long the broker waits for acknowledgments from the in-sync replicas;

request.timeout.ms specifies how long the producer waits for the server's response when sending data;

metadata.fetch.timeout.ms specifies how long the producer waits for the server's response when fetching metadata (such as which broker is the leader of a partition).

max.block.ms

This parameter specifies how long the producer may block when calling send() or when fetching metadata via partitionsFor(). These methods block when the producer's send buffer is full or when no metadata is available. Once the blocking time reaches max.block.ms, the producer throws a timeout exception.

max.request.size

This parameter controls the size of requests sent by the producer. It caps both the size of a single message and the total size of all messages in a single request. For example, with a value of 1000 KB, the largest single message that can be sent is 1000 KB; alternatively, the producer can send a single request containing a batch of 1,000 messages of 1 KB each.
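The arithmetic in that example can be captured in a tiny check; the helper below is purely illustrative and not part of any Kafka API:

```java
public class RequestSizeMath {
    /** Reproduces the article's example arithmetic: a request fits within
     *  max.request.size when messageSize * messageCount does not exceed it. */
    public static boolean fits(int messageSizeBytes, int messageCount, int maxRequestSizeBytes) {
        return (long) messageSizeBytes * messageCount <= maxRequestSizeBytes;
    }
}
```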

receive.buffer.bytes and send.buffer.bytes

These parameters specify the sizes of the TCP socket receive and send buffers; the default value of -1 means the operating system defaults are used.

Partitioning

The partitioner

The message-sending examples above all contain a line like this:

ProducerRecord<String, String> record = new ProducerRecord<>("Topic", "k", "v");

This specifies the Kafka message's target topic, key, and value. A ProducerRecord object contains a topic, a key, and a value. The key plays two roles:

It serves as additional information attached to the message;

It determines which partition of the topic the message is written to; messages with the same key are written to the same partition.

The key can be null (the default), and the behavior differs depending on whether it is:

If the key is null, the partitioner uses a round-robin algorithm to distribute messages evenly across the partitions;

If the key is not null, the partitioner hashes the key using Kafka's built-in hashing algorithm and then maps the hash to a partition.

Note that the mapping between keys and partitions remains stable only as long as the number of partitions of the topic does not change.
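The key-to-partition idea can be sketched with a stand-in hash function. To be clear, Kafka's actual default partitioner uses murmur2 over the serialized key bytes, not the JDK hash used below; this sketch only demonstrates that equal keys map to the same partition while the partition count is fixed:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KeyPartitionSketch {
    /** Illustrative stand-in for the default partitioner. Kafka really uses
     *  murmur2 over the serialized key bytes; the point here is only that
     *  the same key always yields the same partition for a fixed
     *  partition count, and that changing the count changes the mapping. */
    public static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // floorMod keeps the result in [0, numPartitions) even for negative hashes
        return Math.floorMod(Arrays.hashCode(keyBytes), numPartitions);
    }
}
```

This also shows why adding partitions breaks the mapping: the same hash taken modulo a different partition count can land elsewhere.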

Ensuring ordering

Kafka guarantees that messages within the same partition are ordered. Consider the following case: retries is a non-zero integer and max.in.flight.requests.per.connection is greater than 1, meaning the producer may send several batches before receiving a response, and failed batches will be retried. If writing the first batch fails while the second succeeds, and the broker then accepts the retried first batch, the order of the two batches is reversed. In other words, when message ordering matters, whether each write succeeds matters too. So what can you do? In scenarios that strictly require ordering, you can set retries greater than 0 and max.in.flight.requests.per.connection to 1, so that while the producer is retrying the first batch, no other messages are being sent to the broker. Of course, this severely limits the producer's throughput.
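The strict-ordering settings described above can be collected in a small sketch; the helper class is illustrative and the retries value is an arbitrary example:

```java
import java.util.Properties;

public class OrderingConfig {
    /** Producer settings for strict per-partition ordering: retry failed
     *  batches, but allow only one in-flight request so a retry can never
     *  overtake a later batch. Throughput suffers as a result. */
    public static Properties strictOrderingProps() {
        Properties props = new Properties();
        props.put("retries", "3"); // retry failed sends (value is illustrative)
        props.put("max.in.flight.requests.per.connection", "1"); // prevents reordering on retry
        return props;
    }
}
```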


Origin blog.csdn.net/WANXT1024/article/details/104414780