Learning Kafka with You (9) - Offset Replay


Definition

Kafka offset replay means resetting a consumer's offset so that consumption resumes from a different position.

Scenarios

  • Historical messages need to be re-consumed
  • Kafka data migration

Strategy

Offset dimension

Earliest

Adjust the offset to the current earliest offset

The Earliest strategy adjusts the offset to the earliest offset currently available in the topic. This earliest offset is not necessarily 0: in a production environment, Kafka automatically deletes messages that are old enough, so the current earliest offset is likely to be greater than 0. If you want to re-consume all messages in a topic, you can use the Earliest strategy.

Latest

Adjust the offset to the current latest offset

The Latest strategy resets the offset to the latest end offset. If you have sent a total of 15 messages to a single-partition topic, the latest end offset is 15. If you want to skip all historical messages and start consuming from the newest message, you can use the Latest strategy.
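
The relationship between the earliest and the latest offset can be illustrated with a toy model in plain Java. This is only a sketch: PartitionLogModel and its methods are hypothetical names made up for illustration, they are not part of the Kafka client, and the model covers a single partition.

```java
/**
 * Toy model of one partition's offset range. Hypothetical class for
 * illustration only; it is not part of the Kafka client API.
 */
final class PartitionLogModel {
    private long logStartOffset = 0L; // earliest offset still retained
    private long logEndOffset = 0L;   // offset the next message will receive

    /** Appends one message and returns the offset assigned to it. */
    long append() {
        return logEndOffset++;
    }

    /** Simulates retention cleanup: drops every message below the given offset. */
    void deleteBefore(long offset) {
        // The start offset never moves backwards and never passes the end offset.
        logStartOffset = Math.max(logStartOffset, Math.min(offset, logEndOffset));
    }

    long earliest() {
        return logStartOffset;
    }

    long latest() {
        return logEndOffset;
    }

    public static void main(String[] args) {
        PartitionLogModel log = new PartitionLogModel();
        for (int i = 0; i < 15; i++) {
            log.append(); // messages receive offsets 0..14
        }
        log.deleteBefore(5); // pretend retention removed offsets 0..4
        System.out.println("earliest=" + log.earliest() + ", latest=" + log.latest());
    }
}
```

After 15 appends the latest offset is 15, one past the last message, and once the first five messages are deleted the earliest offset becomes 5, so the program prints earliest=5, latest=15.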

Current

Adjust the offset to the most recently committed offset

The Current strategy adjusts the offset to the latest offset the consumer has committed. You may run into a scenario like this: you modify the consumer code and restart the consumer, then discover a problem in the new code. You need to roll back the code change and also reset the offset to the position the consumer was at when it restarted. The Current strategy lets you do this.

Specified-Offset

Adjust the offset to the specified offset

The Specified-Offset strategy is more general: the consumer adjusts its offset to the value you specify. A typical use case is manually "skipping" a message that the consumer program cannot process. In practice, a corrupted message may be impossible to consume, in which case the consumer program throws an exception and cannot continue. If you run into this problem, you can use the Specified-Offset strategy to move past the bad message.

Shift-By-N

Adjust the offset to the current offset + N (N can be negative)

Where the Specified-Offset strategy requires an absolute offset, the Shift-By-N strategy takes a relative one: you give the number of messages to jump over. The jump works in both directions, forward or backward. For example, to move the offset back 100 messages from the current offset, specify N as -100.
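
The Shift-By-N arithmetic is simple, but the result can land outside the range of offsets that actually exist. Below is a minimal sketch in plain Java, assuming you have already looked up the partition's earliest and latest offsets. ShiftByN and shiftedOffset are hypothetical names, not Kafka APIs, and clamping into the valid range is one reasonable policy rather than the only one.

```java
/** Sketch of the Shift-By-N arithmetic; hypothetical helper, not a Kafka API. */
final class ShiftByN {

    /**
     * Returns current + n, clamped into the range [earliest, latest].
     * A negative n moves backward, a positive n moves forward.
     */
    static long shiftedOffset(long current, long n, long earliest, long latest) {
        if (earliest > latest) {
            throw new IllegalArgumentException("earliest must not exceed latest");
        }
        long target = Math.addExact(current, n); // throws on long overflow
        return Math.max(earliest, Math.min(latest, target));
    }

    public static void main(String[] args) {
        System.out.println(shiftedOffset(500L, -100L, 0L, 1000L));
        System.out.println(shiftedOffset(50L, -100L, 20L, 1000L));
    }
}
```

With a current offset of 500 and N = -100 the target is 400. With a current offset of 50, N = -100 and an earliest offset of 20, the raw result would be -50, so the target is clamped to 20.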

Time dimension

DateTime

Adjust the offset to the smallest offset whose timestamp is at or after the given time

DateTime lets you specify a point in time and resets the offset to the earliest offset at or after that time. A common scenario is re-consuming yesterday's data: use this strategy to reset the offset to 00:00 yesterday.
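
The timestamp for "00:00 yesterday" depends on the time zone you mean. The following is a small sketch using only java.time; DateTimeTarget and startOfYesterdayMillis are hypothetical names, and the Asia/Shanghai zone in main is an assumption chosen for illustration.

```java
import java.time.LocalDate;
import java.time.ZoneId;

/** Sketch: compute the epoch-millisecond timestamp for a DateTime reset. */
final class DateTimeTarget {

    /** Epoch milliseconds of 00:00 on the day before {@code today} in {@code zone}. */
    static long startOfYesterdayMillis(LocalDate today, ZoneId zone) {
        return today.minusDays(1)
                .atStartOfDay(zone) // handles zones where midnight is skipped by DST
                .toInstant()
                .toEpochMilli();
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("Asia/Shanghai"); // assumed zone
        long ts = startOfYesterdayMillis(LocalDate.now(zone), zone);
        System.out.println("Reset target timestamp (ms): " + ts);
    }
}
```

The resulting value is the kind of timestamp you would pass when looking up offsets by time.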

Duration

Adjust the offset to the offset at a specified interval before the current time

The Duration strategy takes a relative time interval and adjusts the offset to the offset that was current that long ago. The format is PnDTnHnMnS. If you know the Duration class introduced in Java 8, this format will look familiar: it is the ISO-8601 duration format, starting with the letter P, followed by four parts, D, H, M and S, for days, hours, minutes and seconds, with the letter T separating the day part from the time part. For example, to move the offset back to 15 minutes ago, specify PT0H15M0S.
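
Because the format is the one java.time.Duration parses, turning a duration string into a target timestamp takes only a few lines. This is a sketch under that assumption; DurationTarget and targetTimestampMillis are hypothetical names introduced for illustration.

```java
import java.time.Duration;

/** Sketch: turn an ISO-8601 duration such as PT0H15M0S into a target timestamp. */
final class DurationTarget {

    /** Returns nowMillis minus the parsed duration, in epoch milliseconds. */
    static long targetTimestampMillis(String isoDuration, long nowMillis) {
        Duration interval = Duration.parse(isoDuration); // throws DateTimeParseException on bad input
        if (interval.isNegative()) {
            throw new IllegalArgumentException("duration must not be negative");
        }
        return nowMillis - interval.toMillis();
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long target = targetTimestampMillis("PT0H15M0S", now);
        System.out.println("15 minutes ago (ms): " + target);
    }
}
```

PT0H15M0S parses to 15 minutes, or 900,000 milliseconds, so the target is always that far behind the supplied clock value.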

Operation

API

Use the seek method of KafkaConsumer, or its variants seekToBeginning and seekToEnd.

package org.apache.kafka.clients.consumer;
.....
public class KafkaConsumer<K, V> implements Consumer<K, V> {
    .....
    @Override
    public void seek(TopicPartition partition, long offset) {
        ....
    }
    @Override
    public void seekToBeginning(Collection<TopicPartition> partitions) {
        ....
    }
    @Override
    public void seekToEnd(Collection<TopicPartition> partitions) {
        ....
    }
    ....
}

Implementation example

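Earliest implementation

import java.util.Collections;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

Properties consumerProperties = new Properties();
......
String topic = "test"; // Kafka topic whose offsets will be reset
// String keys and values are an assumption; match your deserializers
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProperties)) {
    consumer.subscribe(Collections.singleton(topic));
    // Join the group and receive the partition assignment. poll(long) is deprecated
    // in newer clients; poll(Duration) may need repeating until assignment() is non-empty.
    consumer.poll(0);
    // Evaluated lazily, on the next poll() or position() call
    consumer.seekToBeginning(consumer.partitionsFor(topic).stream()
        .map(partitionInfo -> new TopicPartition(topic, partitionInfo.partition()))
        .collect(Collectors.toList()));
}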

Latest implementation

consumer.seekToEnd(
    consumer.partitionsFor(topic).stream()
        .map(partitionInfo -> new TopicPartition(topic, partitionInfo.partition()))
        .collect(Collectors.toList()));

Current implementation

consumer.partitionsFor(topic).stream()
    .map(info -> new TopicPartition(topic, info.partition()))
    .forEach(tp -> {
        // committed(tp) is null when the group has no committed offset for tp
        OffsetAndMetadata committed = consumer.committed(tp);
        if (committed != null) consumer.seek(tp, committed.offset());
    });

Specified-Offset implementation

long targetOffset = 1234L; 
for (PartitionInfo info : consumer.partitionsFor(topic)) { 
    TopicPartition tp = new TopicPartition(topic, info.partition()); 
    consumer.seek(tp, targetOffset); 
}

Shift-By-N implementation

for (PartitionInfo info : consumer.partitionsFor(topic)) {
    TopicPartition tp = new TopicPartition(topic, info.partition());
    OffsetAndMetadata committed = consumer.committed(tp);
    if (committed == null) continue; // no committed offset to shift from
    // Assume we jump forward by 123 messages
    consumer.seek(tp, committed.offset() + 123L);
}

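DateTime implementation

// 2020-07-20 20:00 at UTC+8, as epoch milliseconds
long ts = LocalDateTime.of(2020, 7, 20, 20, 0)
    .toInstant(ZoneOffset.ofHours(8)).toEpochMilli();
Map<TopicPartition, Long> timeToSearch = consumer.partitionsFor(topic).stream()
    .map(info -> new TopicPartition(topic, info.partition()))
    .collect(Collectors.toMap(Function.identity(), tp -> ts));
for (Map.Entry<TopicPartition, OffsetAndTimestamp> entry :
        consumer.offsetsForTimes(timeToSearch).entrySet()) {
    // The value is null when the partition has no message at or after ts
    if (entry.getValue() != null) {
        consumer.seek(entry.getKey(), entry.getValue().offset());
    }
}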

Duration implementation

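// Look up the offsets as of 30 minutes ago
long ts = System.currentTimeMillis() - 30 * 60 * 1000L;
Map<TopicPartition, Long> timeToSearch = consumer.partitionsFor(topic).stream()
    .map(info -> new TopicPartition(topic, info.partition()))
    .collect(Collectors.toMap(Function.identity(), tp -> ts));
for (Map.Entry<TopicPartition, OffsetAndTimestamp> entry :
        consumer.offsetsForTimes(timeToSearch).entrySet()) {
    // The value is null when the partition has no message at or after ts
    if (entry.getValue() != null) {
        consumer.seek(entry.getKey(), entry.getValue().offset());
    }
}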

Closing

If you would like to discuss and learn together, follow the public account [Reviewing the Old and Knowing the New Java] so we can learn from each other and make progress together.

Origin juejin.im/post/7085225980905652255