In-depth understanding of Kafka series (2): Kafka producer

Series Article Directory

Kafka: The Definitive Guide series articles

Preface

This series contains my notes and thoughts after reading the book "Kafka: The Definitive Guide".


Kafka producer

The main steps of sending a message to Kafka

The main steps of sending a message to Kafka are shown in the figure below.
[Figure: the main steps of sending a message to Kafka]
In words:

  1. Create a ProducerRecord object. It contains the target topic and the value to send, and can optionally specify a key and a partition.
  2. When the ProducerRecord is sent, the producer first serializes the key and the value into byte arrays (using the serializers we configure) so that they can be transmitted over the network.
  3. The data is then passed to the partitioner. If the ProducerRecord specifies a partition, the partitioner simply uses it; otherwise it chooses a partition, usually based on the record's key. Once the topic and partition are known, the record is added to a batch of records bound for the same topic and partition (held in the RecordAccumulator, the message accumulator).
  4. A separate Sender thread sends these batches to the corresponding brokers, typically once a batch is full or has waited long enough.
  5. When the broker receives the messages, it returns a response:

1. If the write succeeds:
it returns a RecordMetadata object containing the topic, partition, and offset of the record.
2. If the write fails:
it returns an error. On receiving the error, the producer may retry sending the message; if it still fails after the configured number of retries, the error is returned to the application.
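Step 3 says the partitioner picks a partition when the record does not name one. Below is a minimal, stdlib-only sketch of how a record with a non-null key is mapped to a partition. It assumes the behaviour of the Java client's default partitioner as I understand it: a murmur2 hash of the serialized key, with the sign bit cleared, modulo the partition count. `KeyPartitionSketch` is a hypothetical name, and records without a key are handled differently by the real client and are not covered here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: mimics how a keyed record is assigned to a partition.
public class KeyPartitionSketch {

    // 32-bit murmur2 hash, written to follow the variant used by the Kafka Java client.
    static int murmur2(byte[] data) {
        int length = data.length;
        int seed = 0x9747b28c;
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = seed ^ length;
        int length4 = length / 4;
        for (int i = 0; i < length4; i++) {
            int i4 = i * 4;
            int k = (data[i4] & 0xff) + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }
        // Mix in the trailing 1 to 3 bytes (intentional fall-through).
        switch (length % 4) {
            case 3:
                h ^= (data[(length & ~3) + 2] & 0xff) << 16;
            case 2:
                h ^= (data[(length & ~3) + 1] & 0xff) << 8;
            case 1:
                h ^= data[length & ~3] & 0xff;
                h *= m;
        }
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Same key bytes and same partition count always give the same partition.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8); // what StringSerializer produces
        return (murmur2(keyBytes) & 0x7fffffff) % numPartitions; // clear the sign bit, then modulo
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            String key = Integer.toString(i);
            System.out.println("key " + key + " -> partition " + partitionFor(key, 3));
        }
    }
}
```

Because the result depends on the partition count, adding partitions to a topic later changes which partition a given key lands in.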


Creating a Kafka producer (API)

Remember to start Kafka first. The installation steps are in the previous post:
Deep Understanding of Kafka Series (1)-First Know Kafka

The pom dependencies:

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>0.11.0.0</version>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.12</artifactId>
    <version>0.11.0.0</version>
</dependency>

Simple demo:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class Test {
    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put("bootstrap.servers", "192.168.237.130:9092");
        properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
        for (int i = 0; i < 3; i++) {
            // topic "test", key i, value "message" + i
            producer.send(new ProducerRecord<String, String>("test", Integer.toString(i), "message" + i));
        }
        // close() sends any records still buffered before shutting the producer down
        producer.close();
    }
}

After running it, the terminal shows the following output:
[Figure: terminal output after running the demo]
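Verification:
Open another terminal session and run:

./bin/kafka-console-consumer.sh --zookeeper 192.168.237.130:2181 --from-beginning --topic test

(The --zookeeper option works with the 0.11 version used here; Kafka 2.0 and later removed it, and the equivalent is --bootstrap-server 192.168.237.130:9092.)

The output shows that the messages were written successfully.
[Figure: console consumer output]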


The following is a detailed introduction to the producer API:

Detailed explanation of Kafka producer parameters

The demo above is the most basic Kafka producer. As you can see, it sets only 3 properties, and all 3 are required.

  1. bootstrap.servers

This property specifies the list of broker addresses, each in host:port format.
If Kafka runs as a cluster, you do not need to list every broker, because the producer discovers the other brokers from any broker it can reach. However, it is recommended to list at least 2 brokers so that the producer can still connect if one of them is down.

  1. key.serializer

The broker expects the keys and values of messages as byte arrays. key.serializer must be set to the name of a class that implements the org.apache.kafka.common.serialization.Serializer interface (StringSerializer is one such implementation); the producer uses this class to serialize the key object into a byte array for network transmission.
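To make the serializer contract concrete: a serializer turns an object into a `byte[]`, and returns `null` for a `null` input. The stdlib-only sketch below approximates what `StringSerializer` does with its default UTF-8 encoding. `StringSerializerSketch` is a hypothetical stand-in, not the Kafka class (which also lets you configure the encoding).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in approximating Kafka's StringSerializer with its default UTF-8 encoding.
public class StringSerializerSketch {
    public byte[] serialize(String topic, String data) {
        if (data == null) {
            return null; // null keys/values stay null
        }
        return data.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = new StringSerializerSketch().serialize("test", "message0");
        System.out.println(bytes.length); // 8: one byte per ASCII character
    }
}
```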

  1. value.serializer

Like key.serializer, it must be set; it names the class used to serialize the message value.

  1. acks

The acks parameter specifies how many partition replicas must receive the message before the producer considers the write successful. (This parameter is very important: it strongly affects how likely messages are to be lost.)
(1) acks=0:
1. The producer does not wait for any response from the broker before considering the message successfully written.
2. That is, if something goes wrong and the broker does not receive the message, the producer will not know, and the message will be lost.
3. Because the producer does not wait for a response, this mode can send messages as fast as the network allows, giving the highest throughput.
(2) acks=1:
1. The producer receives a success response as soon as the leader replica of the partition receives the message.
2. If the message cannot reach the leader (for example, the leader has crashed and a new one has not been elected yet), the producer receives an error and can retry. Messages can still be lost: if the leader crashes and a follower that has not yet received the message becomes the new leader, the message is gone.
(3) acks=all:
1. The producer receives a success response only after all in-sync replicas have received the message. This mode is the safest, but it has the highest latency and typically lower throughput, because the producer waits for more than one replica.
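The three acks levels can be summarized as a small, simplified model: how many replica acknowledgements the producer waits for before treating a send as successful. `AcksModel` and `replicasToWaitFor` are hypothetical names, and the model deliberately ignores details such as the broker-side min.insync.replicas setting and timeouts. In real code you only set the string, for example `properties.put("acks", "all")`.

```java
// Simplified, hypothetical model of the acks setting: how many replica
// acknowledgements the producer effectively waits for before a send succeeds.
public class AcksModel {
    static int replicasToWaitFor(String acks, int inSyncReplicas) {
        switch (acks) {
            case "0":
                return 0;              // no response awaited at all
            case "1":
                return 1;              // only the leader replica
            case "all":
            case "-1":                 // "-1" is accepted as a synonym for "all"
                return inSyncReplicas; // every in-sync replica, leader included
            default:
                throw new IllegalArgumentException("unknown acks value: " + acks);
        }
    }

    public static void main(String[] args) {
        for (String acks : new String[] {"0", "1", "all"}) {
            System.out.println("acks=" + acks + " -> waits for "
                    + replicasToWaitFor(acks, 3) + " of 3 in-sync replicas");
        }
    }
}
```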

  1. buffer.memory

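1. Sets the size of the producer's memory buffer, which holds messages waiting to be sent to the brokers.
2. Messages are first placed in this buffer, and the separate Sender thread sends them to the brokers. If the application produces messages faster than they can be delivered, the buffer fills up; send() then blocks, and throws an exception if it is still blocked after max.block.ms.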

  1. compression.type

1. By default, messages are sent uncompressed. This parameter sets the compression algorithm used for messages.
2. Supported values: snappy, gzip, and lz4 (zstd was added in Kafka 2.1).
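Compression is applied to whole batches of records, which is why it pays off for repetitive message data. The stdlib-only sketch below uses `java.util.zip.GZIPOutputStream` (gzip being one of the supported codecs) to compress a hypothetical batch of similar messages and compare sizes. It does not use Kafka's own compression code or record format; `CompressionSketch` and `sampleBatch` are illustrative names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class CompressionSketch {
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with an in-memory stream
        }
        // The stream is closed at this point, so the gzip trailer has been written.
        return out.toByteArray();
    }

    // Hypothetical batch: 100 similar messages concatenated, like records sharing one batch.
    static byte[] sampleBatch() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            sb.append("{\"customerId\":").append(i).append(",\"customerName\":\"hello\"}");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = sampleBatch();
        byte[] compressed = gzip(raw);
        System.out.println("raw=" + raw.length + " bytes, gzip=" + compressed.length + " bytes");
    }
}
```

The trade-off is CPU time on the producer (and on the consumer for decompression) in exchange for less network traffic and storage.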

  1. retries

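This parameter sets how many times the producer will retry sending a message after a transient error (such as a partition that temporarily has no leader). Once this number is reached, the producer gives up and returns the error. The wait between retries is controlled by retry.backoff.ms.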

  1. batch.size

1. When multiple messages are sent to the same partition, the producer puts them into the same batch.
2. This parameter sets the amount of memory, in bytes, that one batch can use.

  1. linger.ms

1. This parameter specifies how long the producer waits for more messages to join a batch before sending it.
2. The producer sends a batch when it is full or when linger.ms is reached, whichever comes first.
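To see how batch.size and linger.ms interact, here is a toy, single-partition model: a record that would push the batch past batch.size bytes does not fit (so the batch should be sent first), and a batch is ready once it is full or linger.ms has passed since its first record. This is a simplified assumption for illustration; `ToyBatch` is hypothetical, and the real RecordAccumulator also accounts for per-record overhead, compression, and the memory pool from buffer.memory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of one partition's batch; time is passed in explicitly.
public class ToyBatch {
    private final int batchSize;   // plays the role of batch.size (bytes)
    private final long lingerMs;   // plays the role of linger.ms
    private final List<byte[]> records = new ArrayList<>();
    private int bytes = 0;
    private long firstAppendMs = -1;

    public ToyBatch(int batchSize, long lingerMs) {
        this.batchSize = batchSize;
        this.lingerMs = lingerMs;
    }

    // Returns false when the record does not fit, meaning this batch should be sent first.
    // An oversized first record is still accepted, so it gets a batch of its own.
    public boolean tryAppend(byte[] record, long nowMs) {
        if (!records.isEmpty() && bytes + record.length > batchSize) {
            return false;
        }
        if (records.isEmpty()) {
            firstAppendMs = nowMs;
        }
        records.add(record);
        bytes += record.length;
        return true;
    }

    // Ready to send: full, or waited at least linger.ms since the first record.
    public boolean isReady(long nowMs) {
        if (records.isEmpty()) {
            return false;
        }
        boolean full = bytes >= batchSize;
        boolean lingered = nowMs - firstAppendMs >= lingerMs;
        return full || lingered;
    }

    public int recordCount() {
        return records.size();
    }

    public static void main(String[] args) {
        ToyBatch batch = new ToyBatch(16, 5);
        batch.tryAppend(new byte[8], 0);
        System.out.println(batch.isReady(1));                // false: 8 of 16 bytes, 1 ms waited
        System.out.println(batch.isReady(5));                // true: linger.ms reached
        batch.tryAppend(new byte[8], 2);
        System.out.println(batch.isReady(2));                // true: 16 bytes, batch is full
        System.out.println(batch.tryAppend(new byte[1], 3)); // false: would exceed batch.size
    }
}
```

With linger.ms set to 0 (as in the model with lingerMs = 0), every non-empty batch is immediately ready, which matches the idea that a small linger trades a little latency for fuller batches.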

  1. client.id

This can be any string. The broker uses it to identify the source of messages (for example in logs and metrics).

  1. max.in.flight.requests.per.connection

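1. This parameter sets how many requests (message batches) the producer can send on one connection before receiving responses from the broker.
2. A higher value uses more memory but increases throughput. Setting it to 1 guarantees that messages are written in the order they were sent, even when retries occur.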

  1. max.block.ms

1. This parameter specifies how long the producer may block when send() is called or when metadata is explicitly requested through partitionsFor().
2. When does blocking occur? send() blocks when the producer's send buffer is full or when no metadata is available yet.
3. If the blocking time exceeds max.block.ms, an exception is thrown.

  1. max.request.size

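This parameter controls the size of a produce request sent by the producer. It caps both the size of the largest single message and the total size of the messages the producer can send in one request. The broker has its own limit (message.max.bytes), and it is best to keep the two consistent.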

  1. receive.buffer.bytes and send.buffer.bytes

These two parameters set the sizes of the TCP socket receive and send buffers, respectively. A value of -1 uses the operating system defaults.
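Putting the parameters together, here is one possible configuration sketch using the same string keys as the demo. The values are illustrative assumptions to be tuned for your workload (some happen to match client defaults, but check the defaults of your client version before relying on them); `ProducerConfigSketch`, `tunedProducerProperties`, and the client id are hypothetical. max.in.flight.requests.per.connection is set to 1 here because, with retries enabled, several in-flight requests can let a retried batch land after a later one and change the order of messages within a partition.

```java
import java.util.Properties;

public class ProducerConfigSketch {
    // Hypothetical helper: builds producer settings covering the parameters described above.
    static Properties tunedProducerProperties(String bootstrapServers) {
        Properties props = new Properties();
        // Required settings
        props.put("bootstrap.servers", bootstrapServers); // e.g. "host1:9092,host2:9092"
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Durability and ordering
        props.put("acks", "all");
        props.put("retries", "3");
        props.put("max.in.flight.requests.per.connection", "1"); // keep per-partition order across retries
        // Batching, memory and compression
        props.put("buffer.memory", "33554432");     // 32 MB buffer for unsent records
        props.put("batch.size", "16384");           // 16 KB per batch
        props.put("linger.ms", "5");                // wait up to 5 ms to fill a batch
        props.put("compression.type", "snappy");
        // Limits, identification and sockets
        props.put("max.block.ms", "60000");         // how long send() may block
        props.put("max.request.size", "1048576");   // 1 MB per produce request
        props.put("client.id", "demo-producer");    // hypothetical client name
        props.put("send.buffer.bytes", "131072");   // TCP send buffer; -1 = OS default
        props.put("receive.buffer.bytes", "32768"); // TCP receive buffer; -1 = OS default
        return props;
    }

    public static void main(String[] args) {
        Properties props = tunedProducerProperties("192.168.237.130:9092");
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The resulting Properties object can be passed to new KafkaProducer<>(props) exactly like the one in the demo; the producer parses the string values itself.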


Detailed explanation of Kafka producer sending methods

There are three main ways to send messages in Kafka:

  1. Send and forget (fire-and-forget)
  2. Send synchronously
  3. Asynchronous send

1. Fire-and-forget, the most basic sending method:

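KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
// Constructor arguments: topic, key, value
ProducerRecord<String, String> record = new ProducerRecord<>("test", Integer.toString(4), "message" + 4);
try {
    // Fire-and-forget: the returned Future is ignored, so failed writes go unnoticed
    producer.send(record);
} catch (Exception e) {
    // Only errors raised before sending (e.g. serialization failure, full buffer) end up here
    e.printStackTrace();
} finally {
    producer.close();
}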

2. Send messages synchronously:

// Needs: import org.apache.kafka.clients.producer.RecordMetadata;
KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
ProducerRecord<String, String> record = new ProducerRecord<>("test", Integer.toString(4), "message" + 4);
try {
    // send() returns a Future<RecordMetadata>; get() blocks until Kafka responds.
    // If the write succeeds, get() returns the RecordMetadata of the written record.
    RecordMetadata recordMetadata = producer.send(record).get();
    System.out.println(recordMetadata.offset());
} catch (Exception e) {
    // If the write fails, get() throws an exception wrapping the underlying error
    e.printStackTrace();
} finally {
    producer.close();
}

Result:
[Figure: console output showing the offset of the written record]

3. Sending messages asynchronously:
First define a custom callback class (make sure Callback and RecordMetadata are imported from org.apache.kafka.clients.producer):

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.RecordMetadata;

public class DemoProducerCallBack implements Callback {
    @Override
    public void onCompletion(RecordMetadata recordMetadata, Exception e) {
        if (e == null) {
            // No exception: the record was written successfully
            System.out.println("Sent successfully!");
        } else {
            // Non-null exception: the send failed (after any retries)
            e.printStackTrace();
        }
    }
}

KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
ProducerRecord<String, String> record = new ProducerRecord<>("test", Integer.toString(5), "message" + 5);
try {
    // Simply pass an instance of the custom callback class to send()
    producer.send(record, new DemoProducerCallBack());
} catch (Exception e) {
    e.printStackTrace();
} finally {
    producer.close();
}

Result:
[Figure: console output printed by the callback]
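The difference between the synchronous and asynchronous styles lies in how the returned Future is consumed: blocking on get() versus reacting in a callback. The sketch below mimics both with plain java.util.concurrent; `simulatedSend` is a hypothetical stand-in for producer.send() that "acknowledges" with a fake, increasing offset, so no Kafka client or broker is involved.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class SendStylesSketch {
    private static final AtomicLong nextOffset = new AtomicLong();

    // Hypothetical stand-in for producer.send(): completes later on another thread
    // with a fake offset. The value is ignored in this sketch.
    static CompletableFuture<Long> simulatedSend(String value) {
        return CompletableFuture.supplyAsync(nextOffset::getAndIncrement);
    }

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        // Synchronous style: block on get() until the "broker" answers.
        long offset = simulatedSend("message4").get();
        System.out.println("sync send acknowledged at offset " + offset);

        // Asynchronous style: register a callback and move on; it runs when the send completes.
        CountDownLatch done = new CountDownLatch(1);
        simulatedSend("message5").whenComplete((off, e) -> {
            if (e == null) {
                System.out.println("async send acknowledged at offset " + off);
            } else {
                e.printStackTrace();
            }
            done.countDown();
        });
        // Only needed so this demo does not exit before the callback prints.
        done.await(5, TimeUnit.SECONDS);
    }
}
```

In the real producer, callbacks generally run on the producer's I/O thread, so they should be kept short and should not block.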


Serializer

First, we should understand what a serializer is for.
Simply put, it turns the objects the producer sends into byte arrays so they can be transmitted over the network.
If we want to send an object of our own class as a message, the built-in String serializer cannot handle it, so we need a custom serializer (or a serialization library); otherwise an error is reported.

Custom serializer Demo

Suppose we want to transmit the Customer object as a message.
Customer class:

public class Customer {
    private int customerId;
    private String customerName;

    public Customer(int customerId, String customerName) {
        this.customerId = customerId;
        this.customerName = customerName;
    }

    public int getCustomerId() {
        return customerId;
    }

    public void setCustomerId(int customerId) {
        this.customerId = customerId;
    }

    public String getCustomerName() {
        return customerName;
    }

    public void setCustomerName(String customerName) {
        this.customerName = customerName;
    }
}

Custom serializer (be careful to import the classes from the right packages):
CustomerSerializer

import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Serializer;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class CustomerSerializer implements Serializer<Customer> {
    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // Nothing to configure
    }

    /**
     * Layout: 4-byte customerId, 4-byte length of customerName in bytes,
     * followed by the UTF-8 bytes of customerName.
     */
    @Override
    public byte[] serialize(String topic, Customer data) {
        try {
            if (data == null) {
                return null;
            }
            byte[] serializedName;
            int stringSize;
            if (data.getCustomerName() != null) {
                serializedName = data.getCustomerName().getBytes(StandardCharsets.UTF_8);
                stringSize = serializedName.length;
            } else {
                serializedName = new byte[0];
                stringSize = 0;
            }
            ByteBuffer buffer = ByteBuffer.allocate(4 + 4 + stringSize);
            buffer.putInt(data.getCustomerId());
            buffer.putInt(stringSize);
            buffer.put(serializedName);
            return buffer.array();
        } catch (Exception e) {
            // Keep the original cause so the failure can be diagnosed
            throw new SerializationException("Error when serializing Customer to byte[]", e);
        }
    }

    @Override
    public void close() {
        // Nothing to close
    }
}
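To double-check the byte layout that CustomerSerializer writes (4-byte id, 4-byte name length, then the UTF-8 name bytes), the stdlib-only sketch below encodes a customer the same way and decodes it back, as a matching deserializer on the consumer side would have to. `CustomerCodecSketch` and its nested `SimpleCustomer` are hypothetical and mirror the fields of the Customer class; they do not implement Kafka's Serializer or Deserializer interfaces.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CustomerCodecSketch {
    // Mirrors the fields of the article's Customer class.
    static final class SimpleCustomer {
        final int customerId;
        final String customerName;

        SimpleCustomer(int customerId, String customerName) {
            this.customerId = customerId;
            this.customerName = customerName;
        }
    }

    // Same layout as CustomerSerializer: id, name length, name bytes.
    static byte[] encode(SimpleCustomer c) {
        byte[] name = c.customerName == null
                ? new byte[0]
                : c.customerName.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(4 + 4 + name.length);
        buffer.putInt(c.customerId);
        buffer.putInt(name.length);
        buffer.put(name);
        return buffer.array();
    }

    // Reads the fields back in the order they were written.
    static SimpleCustomer decode(byte[] data) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        int id = buffer.getInt();
        int nameSize = buffer.getInt();
        byte[] name = new byte[nameSize];
        buffer.get(name);
        return new SimpleCustomer(id, new String(name, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] bytes = encode(new SimpleCustomer(1, "hello"));
        SimpleCustomer back = decode(bytes);
        // prints: 13 bytes -> id=1, name=hello
        System.out.println(bytes.length + " bytes -> id=" + back.customerId + ", name=" + back.customerName);
    }
}
```

Note that this layout cannot distinguish a null name from an empty one: both decode as an empty string.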

Test class (note that the value serializer has changed):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class Test2 {
    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put("bootstrap.servers", "192.168.237.130:9092");
        // The key type is String, so keys keep using the String serializer
        properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Use the fully qualified class name of your custom serializer (here it lives in package kafka)
        properties.put("value.serializer", "kafka.CustomerSerializer");
        KafkaProducer<String, Customer> producer = new KafkaProducer<>(properties);
        Customer customer = new Customer(1, "hello");
        // No key is given, so only the value goes through CustomerSerializer
        ProducerRecord<String, Customer> record = new ProducerRecord<>("test", customer);
        try {
            producer.send(record);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            producer.close();
        }
    }
}

Result:
[Figure: output after sending the Customer object]
For comparison, create a class named Customer2 (simply copy Customer).
If the code is changed to use Customer2:
[Figure: the modified code]
what happens?
[Figure: the resulting error]
Why? Because our custom serializer is declared for exactly one type:
[Figure: the CustomerSerializer declaration]
Serializer<Customer> only accepts Customer objects, so the serializer and the class it serializes correspond one-to-one!

Disadvantages of using a custom serializer:

  1. The demo above is only a simple introduction. A real project has many types of objects: if 100 different classes need to be sent as messages, do we have to write 100 custom serializers?
  2. Therefore, custom serializers are not recommended; it is better to use a serialization format or library such as JSON, Avro, or Protobuf.

Summary

This article covered:
1. The general flow of sending messages with the Kafka producer.
2. The main parameters of the Kafka producer API and the ways to send messages.
3. Kafka serializers.
The next article will cover the Kafka consumer and its API in detail.


Origin blog.csdn.net/Zong_0915/article/details/109357082