How a Kafka + ZooKeeper cluster ensures data reliability, ordering, and partition load balancing

Kafka producer acknowledgment (acks) modes

• 0: the producer does not wait for any acknowledgment; it considers a send complete as soon as the data leaves the client, without waiting for the broker to write it.

• 1: the leader replies as soon as it has received the data, without waiting for the followers.

Possible failure: the leader acknowledges, then crashes before the followers have synchronized, and a new leader is elected. The acknowledged data is lost, because the producer will not resend it.

• -1 (all): the leader replies only after it and every follower in the ISR have received the data; -1 is equivalent to all.

Possible failure: if a follower hangs, the acknowledgment would be delayed indefinitely, so the leader maintains a dynamic ISR (the set of in-sync replicas). A follower that has not communicated with the leader within 30 s by default (replica.lag.time.max.ms) is removed from the ISR.
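You can inspect the leader, replicas, and current ISR of each partition with the kafka-topics tool; the broker address and topic name below are assumptions matching the rest of this post:

```shell
# Show each partition's leader, replica set, and current ISR
bin/kafka-topics.sh --bootstrap-server 192.168.6.100:9092 --describe --topic first
```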

Guaranteeing data reliability

If the partition replication factor is 1, or the minimum number of in-sync replicas (min.insync.replicas, default 1) is set to 1, then acks=-1 behaves the same as acks=1 and data can still be lost (the leader alone acknowledges, with no follower in the ISR).

• Completely reliable data requires: acks = -1, partition replication factor >= 2, and min.insync.replicas >= 2.
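A topic satisfying these conditions could be created as follows; the broker address and topic name are assumptions:

```shell
# 3 replicas per partition; combined with acks=-1 on the producer,
# min.insync.replicas=2 means a write is acknowledged only after at
# least 2 in-sync replicas have it.
bin/kafka-topics.sh --bootstrap-server 192.168.6.100:9092 \
  --create --topic first --partitions 3 --replication-factor 3 \
  --config min.insync.replicas=2
```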

Code

import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class CustomProducer {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        // Producer configuration
        Properties properties = new Properties();
        // Connect to the cluster
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.6.101:9092");
        // Key/value serializer classes
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks level
        properties.put(ProducerConfig.ACKS_CONFIG, "1");
        // Retries: the default is Integer.MAX_VALUE (2147483647)
        properties.put(ProducerConfig.RETRIES_CONFIG, 3);
        // Create the producer
        KafkaProducer<String, String> kafkaProducer = new KafkaProducer<>(properties);
        // Send to partition 1 of the "first" topic; get() makes the send synchronous
        kafkaProducer.send(new ProducerRecord<>("first", 1, "", "lzq"), new Callback() {
            @Override
            public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                if (e == null) {
                    System.out.println("Sent OK, topic=" + recordMetadata.topic()
                            + " partition=" + recordMetadata.partition());
                }
            }
        }).get();
        // Release resources
        kafkaProducer.close();
    }
}

Data duplication problem

The leader receives the data and the followers finish synchronizing it, but the leader crashes before the acknowledgment reaches the producer. The producer then retries, and the same data is written a second time.

Idempotency

Idempotency means that no matter how many times the producer sends the same record to the broker, the broker persists it only once, guaranteeing no duplicates. Exactly-once = idempotency + at-least-once (acks = -1, partition replication factor >= 2, min.insync.replicas >= 2).

Duplicate detection: a message is identified by the key <PID, Partition, SeqNumber>, and the broker persists at most one message per key. PID is the producer ID, newly assigned each time the producer restarts; Partition is the partition number; SeqNumber increases monotonically. Idempotency can therefore only guarantee no duplicates within a single partition and a single producer session.
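This check can be illustrated with a minimal sketch (not Kafka's actual code): the broker remembers, per <PID, Partition>, the last persisted sequence number and drops anything it has already seen:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of broker-side deduplication keyed by <PID, Partition, SeqNumber>.
public class DedupSketch {
    private final Map<String, Long> lastSeq = new HashMap<>();

    /** Returns true if the message should be persisted, false if it is a duplicate. */
    public boolean accept(long pid, int partition, long seq) {
        String key = pid + "-" + partition;
        Long last = lastSeq.get(key);
        if (last != null && seq <= last) {
            return false; // already persisted -> duplicate retry, drop it
        }
        lastSeq.put(key, seq);
        return true;
    }

    public static void main(String[] args) {
        DedupSketch broker = new DedupSketch();
        System.out.println(broker.accept(1001L, 0, 0)); // true  (new message)
        System.out.println(broker.accept(1001L, 0, 0)); // false (retry of the same message)
        System.out.println(broker.accept(1001L, 0, 1)); // true  (next sequence number)
    }
}
```

Note that after a producer restart the PID changes, so the same payload would be accepted again: exactly the single-session limitation described above.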

The switch is the parameter enable.idempotence, which defaults to true; set it to false to disable idempotency.

Working process

Code

import java.util.Properties;
import java.util.concurrent.ExecutionException;

import org.apache.kafka.clients.producer.Callback;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class CustomProducer {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        // Producer configuration
        Properties properties = new Properties();
        // Connect to the cluster
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.6.101:9092");
        // Key/value serializer classes
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Custom partitioner
        properties.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, "com.lzq.producer.MyPartitioner");
        // acks level
        properties.put(ProducerConfig.ACKS_CONFIG, "all");
        // Retries: the default is Integer.MAX_VALUE (2147483647)
        properties.put(ProducerConfig.RETRIES_CONFIG, 3);
        // Transactional id (must be unique per producer)
        properties.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "t1");
        // Create the producer
        KafkaProducer<String, String> kafkaProducer = new KafkaProducer<>(properties);
        // Initialize and begin the transaction
        kafkaProducer.initTransactions();
        kafkaProducer.beginTransaction();
        try {
            // Send to partition 1 of the "first" topic
            kafkaProducer.send(new ProducerRecord<>("first", 1, "", "lzq"), new Callback() {
                @Override
                public void onCompletion(RecordMetadata recordMetadata, Exception e) {
                    if (e == null) {
                        System.out.println("Sent OK, topic=" + recordMetadata.topic()
                                + " partition=" + recordMetadata.partition());
                    }
                }
            });
            // Commit the transaction
            kafkaProducer.commitTransaction();
        } catch (Exception e) {
            // Roll back the transaction
            kafkaProducer.abortTransaction();
        } finally {
            // Release resources
            kafkaProducer.close();
        }
    }
}

Data out of order

Kafka 1.x and later guarantee that data within a single partition is ordered, under the following conditions:

(1) If idempotency is not enabled, max.in.flight.requests.per.connection must be set to 1.

(2) If idempotency is enabled, max.in.flight.requests.per.connection must be less than or equal to 5.

Reason: since Kafka 1.x, with idempotency enabled, the broker caches the metadata of the last five requests from each producer. If a request arrives out of order, the broker holds it and reorders it against the cached sequence numbers before writing, so the data of the most recent five requests is guaranteed to land in order.
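The producer settings for the in-order case with idempotency can be sketched as follows. Plain property keys are used here so the snippet compiles without the Kafka client on the classpath; with the client library you would normally use the ProducerConfig constants instead:

```java
import java.util.Properties;

public class OrderedProducerConfig {
    // Producer properties for single-partition ordering with idempotency on.
    static Properties orderedProps() {
        Properties properties = new Properties();
        // Idempotency enabled (the default in recent Kafka versions)
        properties.put("enable.idempotence", "true");
        // At most 5 in-flight requests per connection: the broker caches
        // the metadata of the last 5 requests and can reorder on arrival
        properties.put("max.in.flight.requests.per.connection", "5");
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(orderedProps());
    }
}
```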

Load balancing (partition reassignment)

Create a JSON file: vim topics-to-move.json

{
 "topics": [
 {"topic": "first"}
 ],
 "version": 1
}

Generate a load balanced plan

bin/kafka-reassign-partitions.sh --bootstrap-server 192.168.6.100:9092 --topics-to-move-json-file topics-to-move.json --broker-list "0,1,2,3" --generate

A load balancing plan is automatically generated

Create a JSON file and copy the generated plan into it:

vim increase-replication-factor.json
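The plan printed by the --generate step is what goes into this file. Its shape is roughly the following; the actual partition-to-broker assignments are produced by your cluster, so treat this as an illustration rather than real output:

```json
{
  "version": 1,
  "partitions": [
    {"topic": "first", "partition": 0, "replicas": [0, 1, 2], "log_dirs": ["any", "any", "any"]},
    {"topic": "first", "partition": 1, "replicas": [1, 2, 3], "log_dirs": ["any", "any", "any"]},
    {"topic": "first", "partition": 2, "replicas": [2, 3, 0], "log_dirs": ["any", "any", "any"]}
  ]
}
```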

Execute the plan

bin/kafka-reassign-partitions.sh --bootstrap-server 192.168.6.100:9092 --reassignment-json-file increase-replication-factor.json --execute

Verify the plan

bin/kafka-reassign-partitions.sh --bootstrap-server 192.168.6.100:9092 --reassignment-json-file increase-replication-factor.json --verify

Decommission old nodes

Regenerate the execution plan with only the remaining brokers, then execute and verify it as above

bin/kafka-reassign-partitions.sh --bootstrap-server 192.168.6.100:9092 --topics-to-move-json-file topics-to-move.json --broker-list "0,1,2" --generate

startup script

#!/bin/bash
case $1 in
"start")
    for i in ip1 ip2 ip3
    do
        ssh $i "absolute path of the Kafka start command"
    done
;;
"stop")
    for i in ip1 ip2 ip3
    do
        ssh $i "absolute path of the Kafka stop command"
    done
;;
esac


Origin blog.csdn.net/weixin_52210557/article/details/123540618