Kafka Learning (2): Kafka Producer - Asynchronous and Synchronous Send APIs, Partitioning, and Production Experience

1. Kafka producer

1.1 Producer message sending process

1.1.1 Sending principle

In the message sending process, two threads are involved: the main thread and the Sender thread. The main thread creates a double-ended queue, the RecordAccumulator, and appends messages to it; the Sender thread continuously pulls messages from the RecordAccumulator and sends them to the Kafka Broker.

  • batch.size: The Sender sends data only after the accumulated data reaches batch.size. Default: 16 KB.
  • linger.ms: If the accumulated data has not reached batch.size, the Sender sends it anyway once the linger.ms timeout expires. Unit: ms; default: 0 ms, meaning no delay.

Acks responses:

  • 0: the producer does not wait for any acknowledgment after sending the data.
  • 1: the Leader acknowledges after it has received the data.
  • -1 (all): the Leader and all nodes in the ISR queue acknowledge after receiving the data. -1 is equivalent to all.
1.1.2 Important producer parameters

  • bootstrap.servers: the list of broker addresses the producer uses to connect to the cluster, e.g. hadoop102:9092,hadoop103:9092,hadoop104:9092. One or more addresses can be given, separated by commas. Not every broker address is required, because the producer can discover the other brokers from any given broker.
  • key.serializer and value.serializer: the serialization types for the key and value of the messages being sent. The fully qualified class name must be given.
  • buffer.memory: the total size of the RecordAccumulator buffer. Default: 32 MB.
  • batch.size: the maximum size of a batch of data in the buffer. Default: 16 KB. Increasing this value appropriately can improve throughput, but setting it too large increases data transmission latency.
  • linger.ms: if the accumulated data has not reached batch.size, the Sender sends it after waiting linger.ms. Unit: ms; default: 0 ms, meaning no delay. In production, a value between 5 and 100 ms is recommended.
  • acks: 0 - the producer does not wait for an acknowledgment after sending data; 1 - the Leader acknowledges after receiving the data; -1 (all) - the Leader and all nodes in the ISR queue acknowledge after receiving the data. Default: -1 (-1 and all are equivalent).
  • max.in.flight.requests.per.connection: the maximum number of requests allowed in flight without an acknowledgment. Default: 5. When idempotence is enabled, the value must be a number from 1 to 5.
  • retries: when sending a message fails, the producer automatically resends it; retries is the number of retries. Default: Integer.MAX_VALUE (2147483647). If retries are enabled and message order must be preserved, also set max.in.flight.requests.per.connection=1; otherwise, while a failed message is being retried, other messages may be sent successfully ahead of it.
  • retry.backoff.ms: the interval between two retries. Default: 100 ms.
  • enable.idempotence: whether to enable idempotence. Default: true (enabled).
  • compression.type: how the producer compresses all the data it sends. Default: none (no compression). Supported types: none, gzip, snappy, lz4, and zstd.

1.2 Asynchronous send API

1.2.1 Ordinary asynchronous sending

1. Requirement: create a Kafka producer and send messages to the Kafka Broker asynchronously
2. Code writing
(1) Create a project (KafkaDemo)
(2) Import dependencies

<dependencies>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>3.0.0</version>
    </dependency>
</dependencies>

(3) Create the package org.zhm.producer
(4) Write the API code without a callback function

package org.zhm.producer;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

/**
 * @ClassName CustomProducer
 * @Description TODO
 * @Author Zouhuiming
 * @Date 2023/6/12 18:35
 * @Version 1.0
 */
public class CustomProducer {

    public static void main(String[] args) {

        // 1. Create the Kafka producer configuration object
        Properties properties = new Properties();

        // 2. Add configuration to the object: bootstrap.servers
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop102:9092");

        // Key/value serialization (required): key.serializer, value.serializer
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");

        // 3. Create the Kafka producer object
        KafkaProducer<String, String> kafkaProducer = new KafkaProducer<String, String>(properties);

        // 4. Call send() to send messages
        for (int i = 0; i < 5; i++) {
            kafkaProducer.send(new ProducerRecord<>("first", "zhm" + i));
        }

        // 5. Close resources
        kafkaProducer.close();
    }
}


(5) Test
① Start the Kafka consumer on hadoop102.

bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic first

② Execute the code in IDEA and observe whether the message is received in the hadoop102 console.

1.2.2 Asynchronous sending with callback function

The callback function is invoked when the Producer receives an ack, and the call is asynchronous. It has two parameters: the metadata (RecordMetadata) and the exception (Exception). If Exception is null, the message was sent successfully; if Exception is not null, sending failed.

Note: if a message fails to be sent, it is retried automatically; there is no need to retry manually in the callback.

package org.zhm.producer;

import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

/**
 * @ClassName CustoProducerCallback
 * @Description TODO
 * @Author Zouhuiming
 * @Date 2023/6/12 18:44
 * @Version 1.0
 */
public class CustoProducerCallback {

    public static void main(String[] args) throws InterruptedException {

        // 1. Create the Kafka producer configuration object
        Properties properties = new Properties();

        // 2. Add configuration to the object
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop102:9092");

        // Key/value serialization (required)
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // 3. Create the Kafka producer object
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);

        // 4. Call send() to send messages
        for (int i = 0; i < 6; i++) {
            // Attach a callback
            producer.send(new ProducerRecord<>("first", "zhm" + i), new Callback() {

                // Invoked asynchronously when the Producer receives an ack
                @Override
                public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                    if (e == null) {
                        // No exception: print the metadata to the console
                        System.out.println("Topic: " + recordMetadata.topic() + " -> "
                                + "Partition: " + recordMetadata.partition());
                    } else {
                        // An exception occurred: print it
                        e.printStackTrace();
                    }
                }
            });

            // Sleep briefly to see data being sent to different partitions
            Thread.sleep(20);
        }

        // 5. Close resources
        producer.close();
    }
}


1. Test
① Start the Kafka consumer on hadoop102.

bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic first

② Execute the code in IDEA and observe whether the message is received in the hadoop102 console.

1.3 Synchronous send API

Synchronous sending simply calls get() on the Future returned by the asynchronous send().

package org.zhm.producer;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;
import java.util.concurrent.ExecutionException;

/**
 * @ClassName CustomProducerSync
 * @Description TODO
 * @Author Zouhuiming
 * @Date 2023/6/12 18:58
 * @Version 1.0
 */
public class CustomProducerSync {

    public static void main(String[] args) throws ExecutionException, InterruptedException {

        // 1. Create the Kafka producer configuration object
        Properties properties = new Properties();

        // 2. Add configuration to the object
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop102:9092");

        // Key/value serialization
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // 3. Create the Kafka producer object
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);

        // 4. Call send() to send messages
        for (int i = 0; i < 10; i++) {
            // Asynchronous send (the default)
//            producer.send(new ProducerRecord<>("first", "zhm" + i));
            // Synchronous send: block on the returned Future via get()
            producer.send(new ProducerRecord<>("first", "zhmzhm" + i)).get();
        }

        // 5. Close resources
        producer.close();
    }
}


1. Test
① Start the Kafka consumer on hadoop102.

bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic first

② Execute the code in IDEA and observe whether the message is received in the hadoop102 console.


1.4 Producer Partition

1.4.1 Benefits of Partitioning

1. It makes reasonable use of storage resources. Each partition is stored on a broker, so massive amounts of data can be split into chunks by partition and stored across multiple brokers. Allocating partition tasks sensibly can achieve load balancing.
2. It improves parallelism: producers can send data in units of partitions, and consumers can consume data in units of partitions.

1.4.2 Which partition the producer sends messages to

1. The default partitioner, DefaultPartitioner
(1) When a partition is specified, the specified value is used directly as the partition. For example, with partition=0, all data is written to partition 0.
(2) When no partition is specified but there is a key, the partition is obtained by taking the key's hash value modulo the topic's partition count (a sketch of this routing follows this list). For example: if hash(key1)=5, hash(key2)=6, and the topic has 2 partitions, then key1's value1 is written to partition 1 and key2's value2 is written to partition 0.
(3) When there is neither a partition nor a key, Kafka uses the Sticky Partitioner: it randomly picks a partition and sticks with it for as long as possible, until that partition's batch is full or otherwise completed, after which Kafka randomly picks a different partition (different from the previous one).
For example: partition 0 is picked at random first; once partition 0's current batch is full (default 16 KB) or the linger.ms time is up, Kafka randomly switches to another partition (if it draws 0 again, it draws again).
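As a side note, rule (2) can be reproduced with a minimal sketch using the murmur2 hash from the kafka-clients utilities (the same hash the Java client's default partitioner applies to the serialized key); the class name and the partition count of 3 are assumptions, matching Case 2 below:

package org.zhm.producer;

import org.apache.kafka.common.utils.Utils;

public class KeyPartitionDemo {

    public static void main(String[] args) {
        // Assumed partition count of topic "first" (3, as in Case 2 below)
        int numPartitions = 3;
        for (String key : new String[]{"a", "b", "f"}) {
            byte[] keyBytes = key.getBytes();
            // hash(key) % numPartitions, with the sign bit masked off
            int partition = Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
            System.out.println("key=" + key + " -> partition " + partition);
        }
    }
}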
2. Case 1: sending data to a specified partition

package org.zhm.producer;

import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

/**
 * @ClassName CustomProducerCallbackPartitions
 * @Description TODO
 * @Author Zouhuiming
 * @Date 2023/6/12 19:10
 * @Version 1.0
 */
public class CustomProducerCallbackPartitions {

    public static void main(String[] args) {

        // 1. Create the Kafka producer configuration object
        Properties properties = new Properties();

        // 2. Add configuration to the object
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop102:9092");

        // Key/value serialization
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // 3. Create the producer object
        KafkaProducer<String, String> producer = new KafkaProducer<String, String>(properties);

        // 4. Call send() to send messages
        for (int i = 0; i < 5; i++) {
            // Send the data to partition 1, with an empty string as the key
            producer.send(new ProducerRecord<>("first", 1, "", "zhm" + i), new Callback() {
                @Override
                public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                    if (e == null) {
                        System.out.println("Topic: " + recordMetadata.topic() + " -> " + "Partition: " + recordMetadata.partition());
                    } else {
                        e.printStackTrace();
                    }
                }
            });
        }

        // 5. Close resources
        producer.close();
    }
}


(1) Test
① Start the Kafka consumer on hadoop102.

bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic first

② Execute the code in IDEA and observe whether the message is received in the hadoop102 console.
3. Case 2: when no partition is specified but a key is present, the partition is obtained by taking the key's hash value modulo the topic's partition count.

package org.zhm.producer;

import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

/**
 * @ClassName CustomProducerCallback1
 * @Description TODO
 * @Author Zouhuiming
 * @Date 2023/6/12 19:21
 * @Version 1.0
 */
public class CustomProducerCallback1 {

    public static void main(String[] args) {

        Properties properties = new Properties();
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop102:9092");

        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        KafkaProducer<String, String> kafkaProducer = new KafkaProducer<>(properties);

        // Keys a, b, and f in turn: hash(key) % 3 routes the records to partitions 1, 2, and 0 respectively
        for (int i = 0; i < 5; i++) {
            kafkaProducer.send(new ProducerRecord<>("first", "a", "zhm" + i), new Callback() {
                @Override
                public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                    if (e == null) {
                        System.out.println("When key is a: " + "Topic: " + recordMetadata.topic() + " Partition: " + recordMetadata.partition());
                    } else {
                        e.printStackTrace();
                    }
                }
            });
        }
        for (int i = 0; i < 5; i++) {
            kafkaProducer.send(new ProducerRecord<>("first", "b", "zhm" + i), new Callback() {
                @Override
                public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                    if (e == null) {
                        System.out.println("When key is b: " + "Topic: " + recordMetadata.topic() + " Partition: " + recordMetadata.partition());
                    } else {
                        e.printStackTrace();
                    }
                }
            });
        }
        for (int i = 0; i < 5; i++) {
            kafkaProducer.send(new ProducerRecord<>("first", "f", "zhm" + i), new Callback() {
                @Override
                public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                    if (e == null) {
                        System.out.println("When key is f: " + "Topic: " + recordMetadata.topic() + " Partition: " + recordMetadata.partition());
                    } else {
                        e.printStackTrace();
                    }
                }
            });
        }
        kafkaProducer.close();
    }
}


(1) Test
① Start the Kafka consumer on hadoop102.

bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic first

② Execute the code in IDEA and observe whether the message is received in the hadoop102 console.

1.4.3 Custom Partitioner

1. Developers can re-implement the partitioner according to business needs. For example, we implement a partitioner such that data containing zhm is sent to partition 0 and data not containing zhm is sent to partition 1.
2. Case implementation
(1) Define the class to implement the Partitioner interface.
(2) Rewrite the partition() method.

package org.zhm.producer;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

import java.util.Map;

/**
 * @ClassName MyPartitioner
 * @Description TODO
 * @Author Zouhuiming
 * @Date 2023/6/12 19:28
 * @Version 1.0
 */

/**
 * 1. Implement the Partitioner interface
 * 2. Implement its three methods: partition, close, configure
 * 3. Write the partition() method to return the partition number
 */
public class MyPartitioner implements Partitioner {

    /**
     * Returns the partition for a message.
     *
     * @param topic      the topic
     * @param key        the message key
     * @param keyBytes   the serialized key bytes
     * @param value      the message value
     * @param valueBytes the serialized value bytes
     * @param cluster    cluster metadata, from which partition info can be read
     * @return the partition number
     */
    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
        // Get the message value as a string
        String msgValue = value.toString();

        // The partition to return
        int partition;

        // Check whether the message contains "zhm"
        if (msgValue.contains("zhm")) {
            partition = 0;
        } else {
            partition = 1;
        }
        // Return the partition number
        return partition;
    }

    @Override
    public void close() {
    }

    @Override
    public void configure(Map<String, ?> map) {
    }
}


(3) Use the custom partitioner: add the partitioner.class parameter to the producer configuration.

package org.zhm.producer;

import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

/**
 * @ClassName CustomProducerCallbackPartitionsMine
 * @Description TODO
 * @Author Zouhuiming
 * @Date 2023/6/12 19:35
 * @Version 1.0
 */
public class CustomProducerCallbackPartitionsMine {

    public static void main(String[] args) {

        Properties properties = new Properties();

        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop102:9092");

        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Register the custom partitioner
        properties.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, "org.zhm.producer.MyPartitioner");

        KafkaProducer<String, String> kafkaProducer = new KafkaProducer<String, String>(properties);

        // Values containing "zhm" should go to partition 0
        for (int i = 0; i < 5; i++) {
            kafkaProducer.send(new ProducerRecord<>("first", "zhm" + i), new Callback() {
                @Override
                public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                    if (e == null) {
                        System.out.println("Topic: " + recordMetadata.topic() + " Partition: " + recordMetadata.partition());
                    } else {
                        e.printStackTrace();
                    }
                }
            });
        }
        // Values not containing "zhm" should go to partition 1
        for (int i = 0; i < 5; i++) {
            kafkaProducer.send(new ProducerRecord<>("first", "hello" + i), new Callback() {
                @Override
                public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                    if (e == null) {
                        System.out.println("Topic: " + recordMetadata.topic() + " Partition: " + recordMetadata.partition());
                    } else {
                        e.printStackTrace();
                    }
                }
            });
        }

        kafkaProducer.close();
    }
}


(4) Test
① Start the Kafka consumer on hadoop102.

bin/kafka-console-consumer.sh --bootstrap-server hadoop102:9092 --topic first

② Observe the callback information on the IDEA console.

1.5 Production Experience - How Producers Can Improve Throughput

  • batch.size: batch size. Default: 16 KB.
  • linger.ms: wait time. Change from the default 0 ms to 5-100 ms.
  • compression.type: compress the data, e.g. with snappy.
  • buffer.memory (the RecordAccumulator): buffer size. Can be raised from the default 32 MB to 64 MB.
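A minimal sketch of these throughput settings applied to the producer configuration, in the style of the earlier examples (the class name and the concrete values are illustrative assumptions, following the recommendations above):

package org.zhm.producer;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class CustomProducerParameters {

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop102:9092");
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // batch.size: 16 KB (the default, shown explicitly)
        properties.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
        // linger.ms: wait up to 5 ms for a batch to fill
        properties.put(ProducerConfig.LINGER_MS_CONFIG, 5);
        // compression.type: compress batches with snappy
        properties.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
        // buffer.memory (RecordAccumulator): 64 MB
        properties.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 67108864L);

        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);
        for (int i = 0; i < 5; i++) {
            producer.send(new ProducerRecord<>("first", "zhm" + i));
        }
        producer.close();
    }
}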

1.6 Production Experience - Data Reliability

1. Principle of ack response
Reliability summary:

  • acks=0: the producer does not wait for any acknowledgment; lowest reliability, highest efficiency.
  • acks=1: the producer waits for the Leader's acknowledgment; medium reliability, medium efficiency.
  • acks=-1 (all): the producer waits for the Leader and all followers in the ISR queue to acknowledge; highest reliability, lowest efficiency.

In production, acks=0 is rarely used; acks=1 is generally used for transmitting ordinary logs, where losing the occasional record is acceptable; acks=-1 is generally used for transmitting data related to money or anything else requiring high reliability.
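As a sketch, a reliability-oriented configuration added to the Properties setup from the earlier examples (the retry count is illustrative; min.insync.replicas is a topic/broker-side setting and appears only as a comment):

        // acks=-1 (all): wait for the Leader and all replicas in the ISR
        properties.put(ProducerConfig.ACKS_CONFIG, "all");
        // Retries: the default is Integer.MAX_VALUE; 3 here is purely illustrative
        properties.put(ProducerConfig.RETRIES_CONFIG, 3);
        // At Least Once additionally requires replication.factor >= 2 and
        // min.insync.replicas >= 2, configured on the topic/broker side.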

Data duplication analysis: even with acks=-1, duplicates can still occur, which motivates the deduplication features in the next section.

1.7 Production experience - data deduplication

1.7.1 Data transfer semantics
  • At Least Once = acks set to -1 + partition replication factor >= 2 + min.insync.replicas >= 2
  • At Most Once = acks set to 0
  • Summary:
    • At Least Once guarantees that data is not lost, but not that it is not duplicated.
    • At Most Once guarantees that data is not duplicated, but not that it is not lost.
  • Exactly Once: for critical information, such as data related to money, the data must be neither duplicated nor lost. Kafka 0.11 introduced two major features for this: idempotence and transactions.
1.7.2 Idempotency

Idempotence means that no matter how many times the Producer sends the same data to the Broker, the Broker persists only one copy, guaranteeing no duplication.
Exactly Once = idempotence + At Least Once (acks=-1 + partition replication factor >= 2 + min.insync.replicas >= 2).

The criterion for detecting duplicates: messages with the same <PID, Partition, SeqNumber> primary key are persisted only once by the Broker. PID is the producer ID, reassigned each time the producer restarts; Partition is the partition number; SeqNumber increases monotonically.
Idempotence therefore only guarantees no duplication within a single partition and a single session.
How to enable idempotence: set the enable.idempotence parameter. It defaults to true (enabled); false disables it.
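As a sketch, pinning idempotence explicitly in the Properties setup from the earlier examples (redundant in kafka-clients 3.0.0, where it already defaults to true):

        // enable.idempotence: true by default; set explicitly for clarity
        properties.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);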

1.7.3 Producer Transactions

1. Kafka transaction principle
Note: to use transactions, idempotence must be enabled.
2. Kafka transactions provide the following 5 APIs:

// 1. Initialize the transaction
void initTransactions();
// 2. Begin the transaction
void beginTransaction() throws ProducerFencedException;
// 3. Commit already-consumed offsets within the transaction (mainly used by consumers)
void sendOffsetsToTransaction(Map<TopicPartition, OffsetAndMetadata> offsets,
        String consumerGroupId) throws ProducerFencedException;
// 4. Commit the transaction
void commitTransaction() throws ProducerFencedException;
// 5. Abort the transaction (similar to rolling back)
void abortTransaction() throws ProducerFencedException;
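A minimal sketch that strings these APIs together, in the style of the earlier examples (the class name and the transactional.id value "transaction_id_01" are assumptions; a unique transactional.id must be set before initTransactions() can be called):

package org.zhm.producer;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class CustomProducerTransactions {

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop102:9092");
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // A transactional.id must be specified (any unique name; this one is an assumption)
        properties.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "transaction_id_01");

        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);

        // 1. Initialize the transaction
        producer.initTransactions();
        // 2. Begin the transaction
        producer.beginTransaction();
        try {
            for (int i = 0; i < 5; i++) {
                producer.send(new ProducerRecord<>("first", "zhm" + i));
            }
            // 4. Commit the transaction if all sends succeed
            producer.commitTransaction();
        } catch (Exception e) {
            // 5. Abort the transaction on failure
            producer.abortTransaction();
        } finally {
            producer.close();
        }
    }
}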

1.8 Production Experience - Data Ordering

Kafka guarantees message ordering within a single partition only (not across partitions); the conditions are described in the next section.

1.9 Production Experience - Data Out of Order

1. Before Kafka 1.x, in-order delivery within a partition was guaranteed under the following condition:
max.in.flight.requests.per.connection=1 (no need to consider whether idempotence is enabled).
2. In Kafka 1.x and later, in-order delivery within a single partition is guaranteed under the following conditions (a configuration sketch follows below):
(1) If idempotence is not enabled, max.in.flight.requests.per.connection must be set to 1.
(2) If idempotence is enabled, max.in.flight.requests.per.connection must be set to at most 5.
Reason: since Kafka 1.x, once idempotence is enabled, the Kafka server caches the metadata of the last 5 requests sent by the producer, so ordering is guaranteed as long as no more than 5 requests are in flight.
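A minimal sketch of the two ordering configurations, added to a producer Properties setup like the earlier examples (which option to use depends on whether idempotence is wanted):

        // Option (1): ordering without idempotence - only one in-flight request
        properties.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, false);
        properties.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);

        // Option (2): ordering with idempotence - up to 5 in-flight requests allowed
        // properties.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        // properties.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);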




