Kafka transactions (pseudo-transactions)

Essential background knowledge

  • Kafka's transaction control principle

Main principle: transaction boundaries are marked in the log by special ControlBatch messages:

Begin transaction --> the producer opens a new transaction with the transaction coordinator

Commit the transaction --> a ControlBatch (COMMIT marker) is written to the affected partitions (transaction committed)

Abort the transaction --> a ControlBatch (ABORT marker) is written to the affected partitions (transaction aborted)

  • Configuration parameters required to enable transactions (Kafka does not roll back data that has already been written; instead, all messages in a transaction share one fate: either they all become visible, or they are all discarded by transactional readers)

Properties props = new Properties();
props.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,"doit01:9092");
props.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
// acks: wait for all in-sync replicas
props.setProperty(ProducerConfig.ACKS_CONFIG,"-1");
// number of producer retries
props.setProperty(ProducerConfig.RETRIES_CONFIG,"3");
// maximum number of unacknowledged in-flight requests per connection
props.setProperty(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION,"3");
// enable idempotence
props.setProperty(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG,"true");
// set the transactional id
props.setProperty(ProducerConfig.TRANSACTIONAL_ID_CONFIG,"trans_001");
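
With these properties in place, the transactional producer can be constructed and its transactional.id registered with the transaction coordinator (a minimal sketch):

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
// must be called exactly once, before the first beginTransaction()
producer.initTransactions();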

Code template for transaction control

// initialize transactions (once per producer instance)
producer.initTransactions();

try {
    // begin the transaction
    producer.beginTransaction();

    // do the work (send records, etc.)

    // commit the transaction
    producer.commitTransaction();

} catch (KafkaException e) {
    // on failure, abort (abandon) the transaction inside the catch block
    producer.abortTransaction();
}

The consumer API will still fetch data belonging to uncommitted transactions; whether users get to see that data is up to you!

Visibility of uncommitted transactional data is controlled by a consumer-side parameter:

isolation.level=read_uncommitted (default)

isolation.level=read_committed
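
For example, a consumer that should only see committed data can be configured like this (a minimal sketch; the broker address and group id are placeholders):

Properties cprops = new Properties();
cprops.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "doit01:9092");
cprops.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "g1");
cprops.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
cprops.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
// only deliver messages from committed transactions
cprops.setProperty(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cprops);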

  • Kafka also has an "advanced" form of transaction control, which targets exactly one scenario:

The user's program reads its source data from Kafka, and the processing results are written back to Kafka.

For this scenario, Kafka can provide end-to-end transaction control (compared with the "basic" transactions above, it adds one capability: the consumer's consumption offsets can be committed through the producer, bound to the transaction):

producer.sendOffsetsToTransaction(offsets, consumerGroupId);
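
A sketch of how the offsets argument is typically built (assuming records is the ConsumerRecords<String, String> batch returned by poll(), and "shouwei" is the consumer's group id):

Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
for (TopicPartition tp : records.partitions()) {
    List<ConsumerRecord<String, String>> partitionRecords = records.records(tp);
    long lastOffset = partitionRecords.get(partitionRecords.size() - 1).offset();
    // the committed offset is the next offset to read, hence +1
    offsets.put(tp, new OffsetAndMetadata(lastOffset + 1));
}
producer.sendOffsetsToTransaction(offsets, "shouwei");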

Transaction API example

To use transactions, the application must provide a unique transactional.id and enable producer idempotence:

properties.put("transactional.id", "transactionid00001");
properties.put("enable.idempotence", true);

The transaction-related methods provided by the Kafka producer are as follows:
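(signatures as in the standard org.apache.kafka.clients.producer.KafkaProducer client API)

void initTransactions();
void beginTransaction() throws ProducerFencedException;
void sendOffsetsToTransaction(Map<TopicPartition, OffsetAndMetadata> offsets,
                              String consumerGroupId) throws ProducerFencedException;
void commitTransaction() throws ProducerFencedException;
void abortTransaction() throws ProducerFencedException;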

Example of the code structure for the typical "consume from Kafka -> process -> produce results to Kafka" scenario:

package com.doit.day04;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.time.Duration;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class Exercise_kafka2kafka {
    public static void main(String[] args) {

        Properties props = new Properties();
        // consumer properties
        props.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "linux01:9092");
        props.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "shouwei");
        // disable auto-commit; offsets are committed through the transaction instead
        props.setProperty(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.setProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        // producer properties
        props.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "linux01:9092");
        props.setProperty(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.setProperty(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // acks, plus the parameters that must be set to enable idempotence
        props.setProperty(ProducerConfig.ACKS_CONFIG, "-1");
        props.setProperty(ProducerConfig.RETRIES_CONFIG, "3");
        props.setProperty(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "3");
        // enable idempotence
        props.setProperty(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
        // set the transactional id to enable transactions
        props.setProperty(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "doit40");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // initialize transactions
        producer.initTransactions();
        // subscribe to the source topic
        consumer.subscribe(Arrays.asList("eventlog"));
        while (true) {
            // pull a batch of records
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(Integer.MAX_VALUE));
            try {
                // begin the transaction
                producer.beginTransaction();
                for (ConsumerRecord<String, String> record : records) {
                    String value = record.value();
                    // write the value to another topic
                    producer.send(new ProducerRecord<>("k2k", value));
                }
                // bind the consumed offsets to the transaction instead of
                // committing them through the consumer
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (TopicPartition tp : records.partitions()) {
                    List<ConsumerRecord<String, String>> partitionRecords = records.records(tp);
                    long lastOffset = partitionRecords.get(partitionRecords.size() - 1).offset();
                    offsets.put(tp, new OffsetAndMetadata(lastOffset + 1));
                }
                producer.sendOffsetsToTransaction(offsets, "shouwei");
                // commit the transaction: results and offsets become visible together
                producer.commitTransaction();
            } catch (ProducerFencedException e) {
                // another producer with the same transactional.id took over; unrecoverable
                producer.close();
                break;
            } catch (KafkaException e) {
                // for other errors, abort the transaction and retry on the next poll
                producer.abortTransaction();
            }
        }
        consumer.close();
    }
}

Practical business case

In real data processing, consume-transform-produce is a common and typical scenario.

In this scenario, we often need the entire flow, from reading the source data, through business processing, to writing the results to Kafka, to be atomic:

Either the whole chain succeeds, or the whole chain fails!

(The consumption offsets are committed only once processing and output have succeeded; if processing or output fails, the consumption offsets are not committed either.)

To meet this requirement, you can use Kafka's transaction mechanism:

It allows an application to treat consuming messages, producing messages, and committing consumption offsets as a single atomic operation, even when the production or consumption spans multiple topics and partitions.
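
Condensed to its core, the atomic loop body looks like this (a sketch; offsets and groupId are built as in the full example above):

try {
    producer.beginTransaction();
    // process the polled records and send the results ...
    producer.sendOffsetsToTransaction(offsets, groupId); // the offset commit joins the transaction
    producer.commitTransaction();  // results and offsets become visible together
} catch (KafkaException e) {
    producer.abortTransaction();   // neither the results nor the offsets are committed
}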

On the consumer side there is a parameter, isolation.level, that is closely related to transactions. Its default value is "read_uncommitted", meaning the consumer application can see (consume) messages from uncommitted transactions as well as committed ones. It can instead be set to "read_committed", in which case the consumer application cannot see messages from uncommitted transactions.

A control message (ControlBatch: COMMIT/ABORT) marks whether a transaction was committed or aborted.

Origin: blog.csdn.net/m0_53400772/article/details/131074464