Solutions for Kafka repeated consumption and safe shutdown of consumer threads

Background and cause analysis

Every time the Kafka consumer program is restarted, some messages are consumed again. The likely cause is that when the program is killed, the offsets of some already-consumed records have not yet been committed.

props.setProperty("enable.auto.commit", "true");

This enables automatic commit, which is a delayed commit: on each poll, the consumer checks whether the configured auto-commit interval (auto.commit.interval.ms) has elapsed and commits only if it has. If the program is killed before the next commit point, the offsets of the records consumed since the last commit are never committed, so those records are delivered again after the restart.
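
The effect can be worked through with a toy model. The sketch below does not use the Kafka client at all; the class name AutoCommitReplayDemo and the parameter commitEvery are hypothetical, and the commit interval is expressed in records rather than milliseconds purely to keep the arithmetic simple.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of delayed auto-commit; no Kafka client is involved. */
final class AutoCommitReplayDemo {

    /**
     * Returns the offsets that are delivered again after a restart.
     *
     * @param processedCount records processed before the kill (offsets 0..processedCount-1)
     * @param commitEvery    hypothetical stand-in for the auto-commit interval,
     *                       counted in records instead of milliseconds
     */
    static List<Long> replayedOffsets(long processedCount, long commitEvery) {
        // The last automatic commit happened at the most recent multiple of commitEvery.
        long committedPosition = (processedCount / commitEvery) * commitEvery;
        List<Long> replayed = new ArrayList<>();
        // After the restart the consumer resumes from the committed position,
        // so everything between it and the kill point is consumed a second time.
        for (long offset = committedPosition; offset < processedCount; offset++) {
            replayed.add(offset);
        }
        return replayed;
    }

    public static void main(String[] args) {
        // 13 records processed, commit after every 5 records.
        System.out.println(replayedOffsets(13, 5)); // prints [10, 11, 12]
    }
}
```

In this model, the last commit covers offsets 0 to 9, so a kill after 13 records causes offsets 10, 11 and 12 to be consumed again.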

Solution for repeated consumption:

Turn off automatic commit and commit offsets with a combination of asynchronous and synchronous commits.

// Turn off automatic commit
props.setProperty("enable.auto.commit", "false");
// Consumption logic
try {
    // Nothing may follow this loop inside the try block: a statement after
    // while (true) is unreachable and would not compile
    while (true) {
        ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(100));
        for (ConsumerRecord<String, byte[]> record : records) {
            // Business logic
        }
        // Non-blocking commit on the normal path
        consumer.commitAsync();
    }
} catch (Exception e) {
    System.err.println("consume error..." + e.getMessage());
} finally {
    try {
        // Blocking commit as a last attempt before closing
        consumer.commitSync();
        System.out.println("commit sync suc.");
    } catch (Exception e) {
        System.err.println("commit sync error." + e.getMessage());
    } finally {
        consumer.close();
        System.out.println("close.");
    }
}

This alone is not enough. If you kill the program, you will find that execution never reaches the finally block, which indicates that the thread was stopped abnormally.

Solution for shutting down the thread safely:

1. Use the thread pool to run the thread
2. Use the end flag to manually stop the thread before the instance is destroyed
3. Use CountDownLatch to wait for the thread to stop
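
The three steps above can be sketched with plain java.util.concurrent before bringing in Spring or Kafka. This is only a minimal illustration under assumptions: the class GracefulWorker and its methods are hypothetical names, a short sleep stands in for poll plus business logic, and a boolean stands in for the commitSync() and close() calls.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical worker that shows the flag + latch pattern without Kafka. */
final class GracefulWorker {
    private volatile boolean running = true;                       // step 2: end flag
    private final CountDownLatch stopped = new CountDownLatch(1);  // step 3: completion signal
    private final AtomicBoolean cleanedUp = new AtomicBoolean(false);

    /** Step 1: run the loop on a thread pool instead of a bare thread. */
    void start(ExecutorService pool) {
        pool.execute(() -> {
            try {
                while (running) {
                    Thread.sleep(10); // placeholder for poll + business logic
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                cleanedUp.set(true);  // placeholder for commitSync() and close()
                stopped.countDown();  // tell the waiting thread we are done
            }
        });
    }

    /** Clears the flag, then waits until the loop has left its finally block. */
    boolean stop(long timeoutMillis) {
        running = false;
        try {
            return stopped.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    boolean isCleanedUp() {
        return cleanedUp.get();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        GracefulWorker worker = new GracefulWorker();
        worker.start(pool);
        boolean finished = worker.stop(2000);
        pool.shutdown();
        System.out.println("finished=" + finished + ", cleanedUp=" + worker.isCleanedUp());
    }
}
```

The point of the latch is ordering: stop() returns only after the finally block has run, so whatever the caller tears down next is no longer in use by the worker.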

Step 1: Define the thread pool

@Bean
public ThreadPoolTaskExecutor threadPoolTaskExecutor() {
    int cpuCoreNum = Runtime.getRuntime().availableProcessors();
    ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
    threadPoolTaskExecutor.setCorePoolSize(cpuCoreNum);
    threadPoolTaskExecutor.setMaxPoolSize(cpuCoreNum * 2);
    threadPoolTaskExecutor.setQueueCapacity(2000);
    threadPoolTaskExecutor.setKeepAliveSeconds(60);
    threadPoolTaskExecutor.setThreadNamePrefix("global_thread_pool_task_executor");
    threadPoolTaskExecutor.setRejectedExecutionHandler(new ThreadPoolExecutor.DiscardOldestPolicy());
    threadPoolTaskExecutor.setWaitForTasksToCompleteOnShutdown(true);
    // Make sure this value covers the longest time any thread in the pool may block
    threadPoolTaskExecutor.setAwaitTerminationSeconds(10);
    threadPoolTaskExecutor.initialize();
    return threadPoolTaskExecutor;
}

Two of the configuration parameters here are crucial.
setWaitForTasksToCompleteOnShutdown(true) makes the executor wait for running and queued tasks to complete when it is shut down.
setAwaitTerminationSeconds(10) sets the maximum wait time. Even when the executor is configured to wait for running and queued tasks, Spring still proceeds to close the rest of the container, which may release resources that the tasks need and cause them to fail. Configuring a maximum wait time blocks the container-level shutdown for up to the specified period.
The wait time should be chosen according to the longest time a business thread in the pool can take.
If the thread is never told to stop, the pool's wait time will be exceeded. The following WARN log shows that a business thread was still running when the pool was shut down, so a flag must be defined to stop the thread manually.

WARN 11472 --- [extShutdownHook] o.s.s.concurrent.ThreadPoolTaskExecutor  : Timed out while waiting for executor 'threadPoolTaskExecutor' to terminate
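
The timeout behind this warning can be reproduced with a plain ExecutorService. The sketch below assumes that the Spring executor performs roughly the same shutdown() followed by awaitTermination() sequence when waitForTasksToCompleteOnShutdown is true; the class AwaitTerminationDemo and its method are hypothetical names used only for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Shows that waiting on a pool times out when the task's loop is never told to stop. */
final class AwaitTerminationDemo {

    /**
     * Starts a looping task, shuts the pool down, and reports whether the pool
     * terminated within waitMillis. When clearFlag is false the loop never
     * exits on its own, which mirrors the situation behind the WARN log.
     */
    static boolean terminatesInTime(boolean clearFlag, long waitMillis) {
        AtomicBoolean running = new AtomicBoolean(true);
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.execute(() -> {
            while (running.get()) {
                try {
                    Thread.sleep(5); // placeholder for poll + business logic
                } catch (InterruptedException e) {
                    return; // only reached after shutdownNow() below
                }
            }
        });
        if (clearFlag) {
            running.set(false);
        }
        pool.shutdown(); // rejects new tasks; running tasks are NOT interrupted
        boolean terminated;
        try {
            terminated = pool.awaitTermination(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            terminated = false;
        }
        pool.shutdownNow(); // interrupt the demo thread in the timed-out case
        return terminated;
    }

    public static void main(String[] args) {
        System.out.println("flag cleared:  " + terminatesInTime(true, 2000));
        System.out.println("flag left set: " + terminatesInTime(false, 200));
    }
}
```

With the flag cleared the pool terminates well within the wait; with the flag left set the wait expires, which is the case the log line reports.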

Step 2: Define the end flag and stop the thread before the object is destroyed

// Thread stop flag
public volatile boolean running = true;
while (running) {
    ...
}

Then implement the destroy method in the DisposableBean interface, and set running to false to stop the thread before the instance is destroyed.

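@Override
public void destroy() throws Exception {
    // The loop does not stop immediately; it stops only after the current
    // iteration has finished, so the time spent waiting here must be consistent
    // with the thread pool's setAwaitTerminationSeconds setting
    this.running = false;
}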

When destroy() returns, the container destroys the current instance and then begins destroying its dependencies (those not referenced by other instances). Note that at this point the thread has not actually ended. This creates a problem: the thread is still running, but resources it needs (such as Redis connections) have already been closed, which causes exceptions. Therefore, after setting running to false, use CountDownLatch to wait for the thread to end before the other dependencies are destroyed.
Step 3 is not shown separately; the complete sample code follows:

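import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import java.util.concurrent.CountDownLatch;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.springframework.beans.factory.DisposableBean;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import org.springframework.stereotype.Component;

@Component
public class ConsumerClosedSafely implements CommandLineRunner, DisposableBean {

    // End flag checked by the consume loop
    private volatile boolean running = true;
    // Released once the consumer thread has committed and closed
    private final CountDownLatch latch = new CountDownLatch(1);
    private final String[] topics = new String[]{"test"};

    @Autowired
    private ThreadPoolTaskExecutor threadPoolTaskExecutor;

    public void consume() throws Exception {
        Properties props = new Properties();
        // TODO other properties
        props.setProperty("enable.auto.commit", "false");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Arrays.asList(topics));
        // Consumption logic
        try {
            while (running) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    // TODO business logic
                }
                consumer.commitAsync();
            }
            System.out.println("while end.");
        } catch (Exception e) {
            System.err.println("consume error..." + e.getMessage());
        } finally {
            try {
                consumer.commitSync();
                System.out.println("commit sync suc.");
            } catch (Exception e) {
                System.err.println("commit sync error." + e.getMessage());
            } finally {
                consumer.close();
                System.out.println("close.");
                // Decrement the latch
                latch.countDown();
                System.out.println("latch count down .");
            }
        }
    }

    @Override
    public void run(String... args) throws Exception {
        Runnable r = () -> {
            try {
                consume();
            } catch (Exception e) {
                System.exit(1);
            }
        };
        threadPoolTaskExecutor.execute(r);
    }

    @Override
    public void destroy() throws Exception {
        // Stop the loop
        this.running = false;
        // Wait for the consumer thread to finish
        latch.await();
    }
}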
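
One design point worth considering: latch.await() in destroy() waits without a limit. In the sample, latch.countDown() is reached only from the finally block inside consume(), so if the KafkaConsumer could not be constructed, the latch is never released and shutdown would block. A bounded wait is one possible variant. The sketch below is an assumption, not part of the original sample; the helper name awaitStop and the timeout values are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: wait for the consumer thread, but not forever. */
final class BoundedLatchWait {

    /**
     * Waits for the latch to reach zero, giving up after timeoutMillis.
     * Returns true if the worker signalled completion in time.
     */
    static boolean awaitStop(CountDownLatch latch, long timeoutMillis) {
        try {
            return latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        CountDownLatch done = new CountDownLatch(1);
        done.countDown();                              // worker already finished
        System.out.println(awaitStop(done, 100));      // prints true

        CountDownLatch stuck = new CountDownLatch(1);  // worker never signals
        System.out.println(awaitStop(stuck, 100));     // prints false
    }
}
```

If such a helper were used in destroy(), a timeout no larger than the value passed to setAwaitTerminationSeconds would keep the two waits consistent, and a false return could be logged as a sign that the thread did not stop cleanly.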

Origin blog.csdn.net/weixin_43932590/article/details/128865349