Kafka offset commits

The previous article described how Kafka moved consumer offset management out of ZooKeeper and into Kafka itself. This article covers how offsets are committed.
A consumer needs to report its consumption progress to Kafka, a process called committing offsets. Because a consumer can consume from multiple partitions at the same time, offsets are committed at the partition level: the consumer commits an offset for each partition assigned to it.
Committing offsets records the consumer's progress, so that when a consumer fails and restarts, it can read the last committed offsets from Kafka and resume from there instead of reprocessing everything from the beginning. In other words, offset committing is a tool, or a semantic guarantee, that Kafka gives you, and you are responsible for maintaining that semantic: if you commit offset X, Kafka assumes that every message with an offset less than X has been successfully consumed.
Maintaining that guarantee is the user's responsibility; Kafka simply accepts whatever offset you commit, no questions asked. How you manage offset commits therefore directly affects the message-delivery semantics your consumer provides.
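To make the convention concrete: the offset you commit is one greater than the offset of the last record you processed, because the committed offset names the *next* message to read. A minimal plain-Java sketch (the class and method names here are made up for illustration; no Kafka client is involved):

```java
// The committed offset marks the NEXT message to consume: everything
// below it is considered successfully processed. So after handling a
// record at offset N, the offset to commit is N + 1.
public class OffsetSemantics {
    static long offsetToCommit(long lastProcessedOffset) {
        return lastProcessedOffset + 1;
    }

    public static void main(String[] args) {
        // After processing the record at offset 41, commit 42:
        // a restarted consumer resumes at offset 42.
        System.out.println(offsetToCommit(41));
    }
}
```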
From the user's perspective, Kafka offset commits are either automatic or manual, with automatic commits being the default: the user does nothing and the consumer commits on its own. From the consumer's perspective, commits are either synchronous or asynchronous.
Let's start with automatic versus manual commits. With automatic commits, the Kafka consumer quietly commits offsets for you in the background, and as a user you don't have to worry about it; with manual commits, you commit offsets yourself, and the consumer stays out of it. Enabling auto-commit is simple: the consumer parameter enable.auto.commit controls it, and since its default value is true, the Java consumer commits offsets automatically unless you set it to false. With auto-commit enabled, a second parameter comes into play: auto.commit.interval.ms. Its default value is 5000, meaning Kafka automatically commits offsets for you every 5 seconds.
Here is an example:

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AutoCommitExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "test");
        props.put("enable.auto.commit", "true");
        props.put("auto.commit.interval.ms", "5000");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList("test1"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records)
                System.out.printf("offset = %d, key = %s, value = %s%n", record.offset(), record.key(), record.value());
        }
    }
}

The opposite of automatic commits is manual commits. Looking at the example above, it's easy to think that setting enable.auto.commit to false is all it takes. It isn't: that only tells the consumer not to commit automatically; you still have to call the appropriate API to commit offsets yourself. The simplest such API is KafkaConsumer.commitSync(), which commits the latest offsets returned by KafkaConsumer.poll(). As the name suggests, it is a synchronous operation: the method blocks until the commit succeeds, and throws an exception if something goes wrong during the commit.
Here is an example:

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
    process(records);
    try {
        consumer.commitSync();
    } catch (CommitFailedException e) {
        e.printStackTrace();
    }
}

We should commit offsets only after the processing logic has finished; committing too early can lead to message loss.
With that caveat about manual commits in mind, let's analyze what automatic commits do.
Once enable.auto.commit is set to true, Kafka guarantees that when poll() is called, it commits the offsets of the messages returned by the previous poll(). Logically, poll() first commits the offsets of the previous batch and then hands over the next batch for processing, so messages cannot be lost this way. The problem with automatic commits is that they can cause repeated consumption.
By default, the consumer commits offsets automatically every 5 seconds. Now suppose a rebalance occurs 3 seconds after the last commit. After the rebalance, every consumer resumes from the last committed offsets, but those offsets are 3 seconds stale, so all messages consumed in the 3 seconds before the rebalance are consumed again. You can lower auto.commit.interval.ms to commit more frequently, but that only narrows the window of repeated consumption; it can never eliminate it. This is a flaw of the auto-commit mechanism.
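The size of that re-consumption window can be sketched with a toy simulation. Everything here is an assumption for illustration only (one message consumed per millisecond, commits landing exactly on interval ticks); the real client's timing differs:

```java
// Toy model of auto-commit replay (NOT the real client's behavior):
// offsets advance one per millisecond, but the committed offset only
// advances every intervalMs. A rebalance resumes consumers from the
// committed offset, so everything consumed since the last tick replays.
public class AutoCommitReplay {
    static long replayedAfterRebalance(long intervalMs, long rebalanceAtMs) {
        long consumed = rebalanceAtMs;                                // one message per ms
        long lastCommit = (rebalanceAtMs / intervalMs) * intervalMs;  // last auto-commit tick
        return consumed - lastCommit;                                 // messages re-consumed
    }

    public static void main(String[] args) {
        // Commit every 5 s, rebalance 3 s after the last commit:
        // 3000 ms worth of messages are replayed.
        System.out.println(replayedAfterRebalance(5000, 8000));
        // A shorter interval shrinks the window but never removes it.
        System.out.println(replayedAfterRebalance(1000, 8500));
    }
}
```

Shrinking `intervalMs` makes `replayedAfterRebalance` smaller on average, but any rebalance that does not land exactly on a commit tick still replays something, which is the flaw described above.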
Manual commits, in contrast, are more flexible: the timing and frequency of commits are completely under your control. But they have a drawback too: when you call commitSync(), the consumer program blocks until the remote broker returns the commit result. In any system, blocking caused by the program itself rather than by resource limits is a likely bottleneck, and it hurts the application's overall TPS. Of course, you could commit less often, but then the commit frequency drops, and when a rebalance occurs, more messages have to be re-consumed.

To address this problem, the Kafka community provides another manual-commit API: KafkaConsumer.commitAsync(). As the name suggests, it is an asynchronous rather than synchronous operation: the call returns immediately without blocking, so it does not hurt the consumer application's TPS. Because it is asynchronous, Kafka lets you pass a callback to handle the result of the commit. Here is an example:

while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
    process(records);
    consumer.commitAsync((offsets, exception) -> {
        if (exception != null) {
            exception.printStackTrace();
        }
    });
}

Can commitAsync replace commitSync? The answer is no. The problem with commitAsync is that it does not automatically retry when something goes wrong. Because it is an asynchronous operation, by the time a failed commit were retried, the offset it carries might already be stale, superseded by a later, larger commit. Retrying an asynchronous commit is therefore pointless, so commitAsync does not retry.
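To see why a blind retry would be useless, consider this toy model (the monotonic guard below is purely illustrative; the real client does not apply such a guard, it simply chooses not to retry):

```java
// Async commits can complete out of order. If a failed commit for offset 100
// were retried after a later commit for offset 200 had already succeeded,
// the committed position would move backwards. A guard that only accepts
// newer offsets shows that the stale retry accomplishes nothing.
public class StaleCommit {
    static long applyCommit(long committed, long attempted) {
        return Math.max(committed, attempted); // ignore commits older than what we have
    }

    public static void main(String[] args) {
        long committed = 0;
        committed = applyCommit(committed, 200); // newer commit lands first
        committed = applyCommit(committed, 100); // stale retry arrives late
        System.out.println(committed);           // still 200: the retry bought nothing
    }
}
```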
So, since each of the two manual-commit methods has its own flaw, can they compensate for each other? Let's analyze:

1. We can use commitSync's automatic retries to ride out transient errors, such as momentary network jitter or a GC pause on the broker. These problems are short-lived and a retry usually succeeds, so rather than retrying ourselves, we want the Kafka consumer to do it for us.
2. We don't want the program to be blocked all the time, hurting TPS.

Combining the two, here is an example:


try {
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
        process(records);
        consumer.commitAsync(); // non-blocking commit on the hot path
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    try {
        consumer.commitSync(); // final synchronous commit before closing
    } finally {
        consumer.close();
    }
}
        

This code uses both commitSync() and commitAsync(). For routine commits in the loop, we call commitAsync() to avoid blocking the program; before the consumer closes, we call commitSync() to perform a final synchronous, blocking commit, ensuring the offsets are saved correctly before shutdown. Combining the two gives us non-blocking offset management in the common path while still guaranteeing that the consumer's offsets are correct.

Origin www.cnblogs.com/haha174/p/11286211.html