Definition
Resetting Kafka offsets (sometimes called offset replay) means moving a consumer's offset to a different position, so that the consumer re-reads or skips messages.
Scenarios
- Historical messages need to be re-consumed
- Kafka data migration
Strategies
Offset dimension
Earliest
Reset the offset to the current earliest offset.
The Earliest strategy resets the offset to the topic's current earliest offset. This earliest offset is not necessarily 0: in a production environment, old messages are eventually deleted by Kafka's retention policy, so the current earliest offset is often greater than 0. Use the Earliest strategy when you want to re-consume all of a topic's retained messages.
Latest
Reset the offset to the current latest offset.
The Latest strategy resets the offset to the topic's latest end offset. If 15 messages in total have been produced to a topic, the latest end offset is 15. Use the Latest strategy when you want to skip all historical messages and consume only new ones.
Current
Reset the offset to the latest committed offset.
The Current strategy resets the offset to the consumer's latest committed offset. A typical scenario: you modify the consumer code and restart the consumer, only to find the new code is buggy. You need to roll back the code change and also reset the offset to where the consumer was when it restarted. The Current strategy does exactly that.
Specified-Offset
Reset the offset to a specified offset.
The Specified-Offset strategy is the most general one: the consumer sets the offset to whatever value you specify. A typical use case is manually "skipping" a message the consumer cannot process. In practice you may encounter a corrupted message that cannot be consumed; the consumer program then throws an exception and cannot make progress. When that happens, the Specified-Offset strategy lets you step past the bad message.
Shift-By-N
Reset the offset to the current offset + N (N can be negative).
Where the Specified-Offset strategy takes an absolute offset, the Shift-By-N strategy takes a relative one: you specify how many messages to jump over. The jump is bidirectional, forward or backward. For example, to rewind to 100 messages before the current offset, specify N as -100.
Time dimension
DateTime
Reset the offset to the earliest offset at or after a given time.
DateTime lets you specify a point in time and resets the offset to the earliest offset whose timestamp is at or after that time. A common scenario: to re-consume yesterday's data, reset the offset to midnight yesterday.
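The "midnight yesterday" target is an epoch-millisecond timestamp, which the standard java.time API can compute. Below is a minimal sketch; the MidnightTimestamp class name and the 2020-07-20 / UTC+8 example values are illustrative, not from the strategy itself:

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

public class MidnightTimestamp {
    // Convert a calendar date at midnight in a given zone offset to epoch
    // milliseconds, the unit Kafka's timestamp-based lookup expects.
    static long midnightMillis(LocalDate date, ZoneOffset zone) {
        return date.atStartOfDay().toInstant(zone).toEpochMilli();
    }

    public static void main(String[] args) {
        // Midnight of 2020-07-20 in UTC+8 is 2020-07-19T16:00:00Z
        long ts = midnightMillis(LocalDate.of(2020, 7, 20), ZoneOffset.ofHours(8));
        System.out.println(ts); // prints 1595174400000
    }
}
```

Such a timestamp can then be used as the lookup value for every partition, as the DateTime implementation later in this article does.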
Duration
Reset the offset to the offset a given interval before the current time.
The Duration strategy takes a relative time interval and resets the offset to the position that interval before now. The interval format is PnDTnHnMnS; if you are familiar with the Duration class introduced in Java 8, this format should look familiar. It is the ISO-8601 duration format: it starts with the letter P, the day part (nD) comes before the T separator, and the hour, minute, and second parts follow it. For example, to rewind the offset to 15 minutes ago, specify PT0H15M0S.
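A quick way to sanity-check such an interval string is to parse it with Java 8's own Duration class. A small illustrative sketch (the DurationDemo class name is ours):

```java
import java.time.Duration;

public class DurationDemo {
    public static void main(String[] args) {
        // PT0H15M0S: 0 hours, 15 minutes, 0 seconds
        Duration d = Duration.parse("PT0H15M0S");
        System.out.println(d.toMillis()); // prints 900000

        // P1DT2H: 1 day and 2 hours; the day part precedes the T separator
        System.out.println(Duration.parse("P1DT2H").toHours()); // prints 26
    }
}
```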
Operations
API
Use KafkaConsumer's seek method, or its variants seekToBeginning and seekToEnd.
```java
package org.apache.kafka.clients.consumer;
......
public class KafkaConsumer<K, V> implements Consumer<K, V> {
    ......
    @Override
    public void seek(TopicPartition partition, long offset) {
        ......
    }

    public void seekToBeginning(Collection<TopicPartition> partitions) {
        ......
    }

    public void seekToEnd(Collection<TopicPartition> partitions) {
        ......
    }
    ......
}
```
Implementation examples
Earliest implementation
```java
Properties consumerProperties = new Properties();
...... // bootstrap.servers, group.id, deserializers, etc.
String topic = "test"; // the Kafka topic whose offsets will be reset
try (final KafkaConsumer<String, String> consumer =
         new KafkaConsumer<>(consumerProperties)) {
    consumer.subscribe(Collections.singleton(topic));
    consumer.poll(0); // join the group and get partition assignments
                      // (poll(long) is deprecated in newer clients; use poll(Duration))
    consumer.seekToBeginning(
        consumer.partitionsFor(topic).stream()
            .map(partitionInfo -> new TopicPartition(topic, partitionInfo.partition()))
            .collect(Collectors.toList()));
}
```
Latest implementation
```java
consumer.seekToEnd(
    consumer.partitionsFor(topic).stream()
        .map(partitionInfo -> new TopicPartition(topic, partitionInfo.partition()))
        .collect(Collectors.toList()));
```
Current implementation
```java
consumer.partitionsFor(topic).stream()
    .map(info -> new TopicPartition(topic, info.partition()))
    .forEach(tp -> {
        // committed() returns null if the group has no committed offset for tp
        OffsetAndMetadata committed = consumer.committed(tp);
        if (committed != null) {
            consumer.seek(tp, committed.offset());
        }
    });
```
Specified-Offset implementation
```java
long targetOffset = 1234L;
for (PartitionInfo info : consumer.partitionsFor(topic)) {
    TopicPartition tp = new TopicPartition(topic, info.partition());
    consumer.seek(tp, targetOffset);
}
```
Shift-By-N implementation
```java
for (PartitionInfo info : consumer.partitionsFor(topic)) {
    // suppose we jump forward by 123 messages
    // (committed() returns null if the group has no committed offset for tp)
    TopicPartition tp = new TopicPartition(topic, info.partition());
    long targetOffset = consumer.committed(tp).offset() + 123L;
    consumer.seek(tp, targetOffset);
}
```
DateTime implementation
```java
long ts = LocalDateTime.of(2020, 7, 20, 20, 0)
        .toInstant(ZoneOffset.ofHours(8)).toEpochMilli();
Map<TopicPartition, Long> timeToSearch = consumer.partitionsFor(topic).stream()
        .map(info -> new TopicPartition(topic, info.partition()))
        .collect(Collectors.toMap(Function.identity(), tp -> ts));
for (Map.Entry<TopicPartition, OffsetAndTimestamp> entry :
        consumer.offsetsForTimes(timeToSearch).entrySet()) {
    // the value is null if the partition has no message at or after ts
    if (entry.getValue() != null) {
        consumer.seek(entry.getKey(), entry.getValue().offset());
    }
}
```
Duration implementation
```java
Map<TopicPartition, Long> timeToSearch = consumer.partitionsFor(topic).stream()
        .map(info -> new TopicPartition(topic, info.partition()))
        .collect(Collectors.toMap(Function.identity(),
            tp -> System.currentTimeMillis() - 30 * 1000 * 60)); // 30 minutes ago
for (Map.Entry<TopicPartition, OffsetAndTimestamp> entry :
        consumer.offsetsForTimes(timeToSearch).entrySet()) {
    // the value is null if the partition has no message in the interval
    if (entry.getValue() != null) {
        consumer.seek(entry.getKey(), entry.getValue().offset());
    }
}
```