[MybatisPlus] Handling a table with more than one million rows: a case study

1. Requirement

The database table holds more than one million rows, and a scheduled task needs to push all of this data to a third party through Kafka.

2. Problems encountered

Scenario 1 (failed)

My first idea was to page through the table and send each page to Kafka, but this ran into two problems.

  • First, paging gets slower and slower toward the end of the table. The reason: when the table is very large and the offset in LIMIT offset, length grows, MySQL still has to scan and discard all of the skipped rows, so the query becomes very slow (see the sketch after this list).
  • Second, because the project wraps multi-data-source switching around the framework, the paging plugin also misbehaved during the query. I did not resolve that for the time being and switched approaches instead.
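
For context, below is a minimal sketch of what that offset-based paging looked like, assuming a standard MyBatis-Plus BaseMapper<AlertEntity> (alertMapper) and the pagination interceptor registered; the method and helper names are illustrative, not taken from the original project.

// Sketch of the offset-paging approach (Scenario 1); alertMapper and sendToKafka
// are assumed/illustrative names, not from the original code.
private void pushByOffsetPaging(Timestamp startTime, Timestamp endTime) {
    long size = 100;
    for (long current = 1; ; current++) {
        QueryWrapper<AlertEntity> wrapper = new QueryWrapper<>();
        wrapper.between("first_alert_time", startTime, endTime);
        // MyBatis-Plus rewrites this into "LIMIT (current-1)*size, size";
        // the larger the offset, the more rows MySQL scans and throws away.
        Page<AlertEntity> page = alertMapper.selectPage(new Page<>(current, size), wrapper);
        if (page.getRecords().isEmpty()) {
            break;
        }
        page.getRecords().forEach(this::sendToKafka); // sendToKafka is a placeholder
    }
}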

Scenario 2 (succeeded)

After some research, I settled on the "max ID" (keyset) method: fetch the data batch by batch, always starting from the largest ID seen so far.

The original SQL was:

select * from alert where 1=1 and (first_alert_time BETWEEN ? AND ? ) limit 1000000,100

The optimized version fetches 100 rows whose ID is greater than the current ID each time; after a batch is processed, the ID of its last row becomes the comparison ID for the next query.

Note: the columns used for filtering and ordering are indexed.

select * from alert where 1=1 and (first_alert_time BETWEEN ? AND ? ) and alert_seq >1655600780346079 order by alert_seq asc LIMIT 100

If the primary key is not an auto-incrementing sequence, the creation time can be used instead:

select * from alert where 1=1 and (first_alert_time BETWEEN ? AND ? ) and create_time >'2020-01-01 12:29:23'  order by create_time asc LIMIT 100
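
The same keyset query can also be written with a MyBatis-Plus QueryWrapper instead of raw SQL; the following is only a sketch, where alertMapper and lastAlertSeq are assumed names:

// Keyset ("max ID") pagination with QueryWrapper; the column names follow the SQL above,
// alertMapper and lastAlertSeq are illustrative.
QueryWrapper<AlertEntity> wrapper = new QueryWrapper<>();
wrapper.between("first_alert_time", startTime, endTime)
       .gt("alert_seq", lastAlertSeq)   // only rows after the last one already processed
       .orderByAsc("alert_seq")
       .last("limit 100");              // appended verbatim to the generated SQL
List<AlertEntity> batch = alertMapper.selectList(wrapper);
// after processing, keep batch.get(batch.size() - 1).getAlertSeq() as the next lastAlertSeq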

3. Solution

Scenario 2 solves both problems: the paging issue is gone and the query is no longer slow.

Finally, the code:

public void ltAssetSafeProcess(String username, String password, Integer businessType, String topic, Boolean isToday) {

        Long startTime = null;
        Long endTime = null;
        Integer count = eventSearchTaskMapper.getCount(businessType);
        if (Objects.nonNull(count) && count == 0) {
            // First report: push the full history
            startTime = 946656000000L;
            endTime = new Date().getTime();
        } else {
            // Incremental report: continue from the end time of the latest task
            QueryWrapper<EventSearchTask> queryWrapper = new QueryWrapper<>();
            queryWrapper.eq("business_type", businessType);
            queryWrapper.orderByDesc("create_time");
            queryWrapper.last("limit 1");
            EventSearchTask selectOne = eventSearchTaskMapper.selectOne(queryWrapper);
            startTime = selectOne.getEndTime();
            endTime = new Date().getTime();
        }
        // Build the report batch record
        EventSearchTask eventSearchTask = new EventSearchTask();
        eventSearchTask.setId($.toString(SnowFlake.nextId()));
        eventSearchTask.setDataSource(FeignTemplateType.LT_ZC_INFO.getType());
        eventSearchTask.setStartTime(startTime);
        eventSearchTask.setEndTime(endTime);
        eventSearchTask.setTimeDimension("TODAY");
        eventSearchTask.setBusinessType(businessType);
        eventSearchTask.setCreateTime(SystemClock.nowDate());

        // Count the alerts in the time window and persist the batch record
        QueryWrapper<AlertEntity> queryWrapper = new QueryWrapper<>();
        queryWrapper.lambda().between(AlertEntity::getFirstAlertTime, new Timestamp(startTime), new Timestamp(endTime));
        Integer alertCounts = alertService.getCount(queryWrapper); // total number of rows
        eventSearchTask.setAlertCount(alertCounts);
        eventSearchTaskMapper.insert(eventSearchTask);
        int pageSize = 1000;
        int pageCount = PageUtils.pageCount(alertCounts, pageSize); // compute the number of pages
        // Start reporting
        String lastAlertSeq = null;
        for (int i = 0; i <= pageCount; i++) {
            log.info("Processing page {}", i);
            // Fetch the next batch incrementally by alertSeq
            Map<String, Object> params = new HashMap<>();
            params.put("startTime", new Timestamp(startTime));
            params.put("endTime", new Timestamp(endTime));
            params.put("pageSize", pageSize);
            if (i == 0) {
                // Use the earliest record as the base point for the first batch
                AlertEntity alertEntity = alertService.getStratAlertOne();
                params.put("alertSeq", alertEntity.getAlertSeq());
            } else {
                params.put("alertSeq", lastAlertSeq);
            }

            List<AlertEntity> records = alertService.getAlertPages(params);
            if (records.size() == 0) {
                log.info("No more data: {} records", records.size());
                break; // nothing left to send in this window
            }
            log.info("Processing {} records", records.size());
            for (AlertEntity entity : records) {
                lastAlertSeq = entity.getAlertSeq();
                // Send to Kafka asynchronously
                threadPoolTaskExecutor.submit(() -> {
                    try {
                        JSON.toJSONString(entity, SerializerFeature.WriteMapNullValue);
                    } catch (Exception e) {
                        log.error("Bad record, skipping --------------------", e);
                        return;
                    }
                    kafkaAnalyzeProducer.send(topic, JSON.toJSONString(entity, SerializerFeature.WriteMapNullValue), new SendCallBack() {
                        @Override
                        public void sendSuccessCallBack(String topic, String msg) {
                            log.info("sendSuccessCallBack--------------{}----------{}", topic, msg);
                        }

                        @Override
                        public void sendFailCallBack(String topic, String msg, Throwable ex) {
                            log.info("sendFailCallBack--------------{}----------{}---------{}", topic, msg, ex);
                        }
                    });
                });
            }
        }
    }
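
The post does not show alertService.getAlertPages(params); assuming it simply wraps the keyset SQL above, the mapper behind it might look roughly like this (hypothetical, for illustration only):

// Hypothetical mapper for getAlertPages, matching the keyset SQL shown earlier.
// The single Map parameter carries the keys set in the loop above
// (startTime, endTime, alertSeq, pageSize).
public interface AlertPageMapper extends BaseMapper<AlertEntity> {

    @Select("select * from alert "
            + "where first_alert_time between #{startTime} and #{endTime} "
            + "and alert_seq > #{alertSeq} "
            + "order by alert_seq asc limit #{pageSize}")
    List<AlertEntity> getAlertPages(Map<String, Object> params);
}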

Origin: blog.csdn.net/daohangtaiqian/article/details/130320822