Querying millions of rows from a database in Java: optimizing with multithreading + indexes

Optimizing a million-row query in Java

Business needs

In an interview today, HR asked me a question about querying a large amount of data.

Interviewer: "Our company does data analysis. Each time, we need to query 1 million rows from the database for analysis, and pagination cannot be used. How would you optimize the SQL or the Java code?"

An ordinary single query can take more than 5 minutes to complete, so we use an index plus multithreading to speed it up.

Let's get started!

Database Design

Define the table fields

Then generate 1 million rows of test data
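
The original table definition was only shown in a screenshot, so the column names below (`id`, `name`, `age`) are placeholders, not the author's schema. As a rough sketch under that assumption, test rows for the `sysuser` table could be produced by building multi-row INSERT statements in batches:

```java
import java.util.ArrayList;
import java.util.List;

class SysUserDataGenerator {

    // Hypothetical column names; adjust them to the real sysuser table definition.
    private static final String INSERT_PREFIX = "INSERT INTO sysuser (id, name, age) VALUES ";

    /** Builds multi-row INSERT statements covering ids 1..totalRows, batchSize rows per statement. */
    static List<String> buildInsertStatements(int totalRows, int batchSize) {
        List<String> statements = new ArrayList<>();
        for (int first = 1; first <= totalRows; first += batchSize) {
            int last = Math.min(first + batchSize - 1, totalRows);
            StringBuilder sql = new StringBuilder(INSERT_PREFIX);
            for (int id = first; id <= last; id++) {
                if (id > first) {
                    sql.append(", ");
                }
                // Only generated values are concatenated here, never user input.
                sql.append('(').append(id)
                   .append(", 'user").append(id).append("', ")
                   .append(18 + id % 50).append(')');
            }
            statements.add(sql.toString());
        }
        return statements;
    }

    public static void main(String[] args) {
        List<String> statements = buildInsertStatements(1_000_000, 1_000);
        System.out.println(statements.size() + " statements generated");
        System.out.println(statements.get(0).substring(0, 80) + "...");
    }
}
```

Each generated statement can then be executed against the database; batching many rows per statement is usually much faster than inserting one row at a time.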


Add an index to the database


I don't know much about indexing yet; readers who do are welcome to optimize the index.

Code

Written in Java

Write the Controller class

package com.neu.controller;

import com.neu.mapper.UserMapper;
import com.neu.pojo.User;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import javax.annotation.Resource;
import java.util.*;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Controller that queries users with multiple threads.
 * @author 薄荷蓝柠
 * @since 2023/6/6
 */
// @RestController so the returned List<User> is written to the response body
// instead of being resolved as a view name.
@RestController
public class ExecutorUtils {

   @Resource
   private UserMapper userMapper;

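   // Maximum number of rows handled by one thread
   private static final int THREAD_COUNT_SIZE = 5000;

   @RequestMapping("Executor")
   public List<User> executeThreadPool() {

      // Count the rows in the table
      int total = userMapper.UserSum();

      // Record the start time
      long start = System.currentTimeMillis();

      // Result list, pre-sized to the row count
      List<User> threadList = new ArrayList<>(total);

      // Number of chunks: one per 5000 rows, rounded up (at least 1)
      int round = Math.max(1, (total + THREAD_COUNT_SIZE - 1) / THREAD_COUNT_SIZE);

      // Temporary map from chunk number to its rows, used to merge the chunks in order.
      // Several threads write to it, so it must be thread-safe (a plain HashMap is not).
      Map<Integer, List<User>> temporaryMap = Collections.synchronizedMap(new TreeMap<>());

      // Latch that is counted down once per finished chunk
      final CountDownLatch count = new CountDownLatch(round);

      // Thread pool with one thread per chunk
      ExecutorService executor = Executors.newFixedThreadPool(round);
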
      // Submit one query task per chunk
      for (int i = 0; i < round; i++) {
         // Starting offset of this chunk
         int startLen = i * THREAD_COUNT_SIZE;
         int k = i + 1;
         executor.execute(() -> {
            try {
               ArrayList<User> users = userMapper.subList(startLen);
               // Store the chunk in the temporary map under its chunk number
               temporaryMap.put(k, users);
               System.out.println("Processing chunk [" + k + "], rows: " + users.size());
            } finally {
               // Count down even if the query fails, so the waiting thread is not blocked forever
               count.countDown();
            }
         });
      }
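      try {
         // Block until every chunk has been processed
         count.await();
         // Record the end time
         long end = System.currentTimeMillis();
         System.out.println("Time to query 1 million rows: " + (end - start) + "ms");
         // Append the chunks to the result list in chunk order
         for (int k = 1; k <= round; k++) {
            List<User> chunk = temporaryMap.get(k);
            if (chunk != null) {
               threadList.addAll(chunk);
            }
         }
      } catch (InterruptedException e) {
         // Restore the interrupt flag so callers can see the interruption
         Thread.currentThread().interrupt();
         e.printStackTrace();
      } finally {
         // Clear the temporary map to release memory
         temporaryMap.clear();
         // Shut down the pool: previously submitted tasks still run, but no new tasks
         // are accepted. Calling it again after shutdown has no effect.
         executor.shutdown();
      }
      // Print the size of the result list
      System.out.println("Result list size: " + threadList.size());
      return threadList;
   }
}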
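
The controller above keeps the chunks in order by storing each result under its chunk number and merging afterwards. As an alternative sketch (not the author's code), `ExecutorService.invokeAll` returns its futures in submission order, so the chunks can be concatenated without a temporary map. Here `ChunkLoader` is a hypothetical stand-in for `userMapper.subList`, and the pool size of 8 is an arbitrary assumption:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class OrderedChunkQuery {

    /** Hypothetical stand-in for userMapper.subList: loads up to chunkSize rows from offset. */
    interface ChunkLoader<T> {
        List<T> load(int offset, int chunkSize);
    }

    static <T> List<T> queryAll(int total, int chunkSize, int poolSize, ChunkLoader<T> loader)
            throws InterruptedException, ExecutionException {
        // One task per chunk, created in offset order.
        List<Callable<List<T>>> tasks = new ArrayList<>();
        for (int offset = 0; offset < total; offset += chunkSize) {
            final int start = offset;
            tasks.add(() -> loader.load(start, chunkSize));
        }
        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        try {
            List<T> result = new ArrayList<>(total);
            // invokeAll blocks until every task is done and returns futures in task order.
            for (Future<List<T>> future : executor.invokeAll(tasks)) {
                result.addAll(future.get()); // a failed chunk surfaces as ExecutionException
            }
            return result;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // In-memory loader that returns the offsets themselves instead of database rows.
        ChunkLoader<Integer> fake = (offset, size) -> {
            List<Integer> rows = new ArrayList<>();
            for (int i = offset; i < Math.min(offset + size, 12_345); i++) {
                rows.add(i);
            }
            return rows;
        };
        List<Integer> all = queryAll(12_345, 5_000, 8, fake);
        System.out.println("rows: " + all.size());
    }
}
```

With a bounded pool the number of threads no longer grows with the table size, and a failed chunk is reported to the caller instead of being silently missing from the result.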

Write the Mapper

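package com.neu.mapper;

import java.util.ArrayList;

import org.apache.ibatis.annotations.*;

import com.neu.pojo.User;

/**
 * Mapper for the multithreaded user query.
 * @author 薄荷蓝柠
 * @since 2023/6/6
 */
@Mapper
public interface UserMapper {

	/**
	 * Counts the rows in the sysuser table.
	 * @return number of rows
	 */
	@Select("SELECT count(*) as sum FROM sysuser")
	Integer UserSum();

	/**
	 * Fetches one chunk of up to 5000 rows from the sysuser table.
	 * ORDER BY id keeps the chunk boundaries stable across queries;
	 * without it, LIMIT may return rows in an unspecified order.
	 * @param startLen offset of the first row of the chunk
	 * @return the rows of the chunk
	 */
	@Select("select * from sysuser ORDER BY id LIMIT #{startLen},5000")
	ArrayList<User> subList(@Param("startLen") int startLen);
}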

With the code written, let's run a test.

The query finishes within 20 seconds, much faster than the ordinary query.

Fuzzy query

What about fuzzy queries?

Let's test it:

Modify the Mapper

package com.neu.mapper;

import java.util.ArrayList;

import org.apache.ibatis.annotations.*;

import com.neu.pojo.User;

/**
 * Mapper for the multithreaded fuzzy user query.
 * @author 薄荷蓝柠
 * @since 2023/6/6
 */
@Mapper
public interface UserMapper {

	/**
	 * Counts the rows in the sysuser table whose id contains "0".
	 * @return number of matching rows
	 */
	@Select("SELECT count(*) as sum FROM sysuser where id like concat('%',0,'%')")
	Integer UserSum();

	/**
	 * Fetches one chunk of up to 5000 rows whose id contains "0".
	 * ORDER BY id keeps the chunk boundaries stable across queries.
	 * @param startLen offset of the first row of the chunk
	 * @return the rows of the chunk
	 */
	@Select("select * from sysuser where id like concat('%',0,'%') ORDER BY id LIMIT #{startLen},5000")
	ArrayList<User> subList(@Param("startLen") int startLen);
}

After the modification, let's test again.


The query takes about 5 seconds, which meets the business requirement.
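
One likely contributor to the shorter time is that the fuzzy query simply matches fewer rows. Assuming the ids are the consecutive integers 1 to 1,000,000 (an assumption; the actual test data was only shown in a screenshot), the following sketch counts how many ids contain the digit 0 and how many 5000-row chunks that produces:

```java
class FuzzyMatchCount {

    /** Counts the integers in 1..maxId whose decimal form contains the given digit character. */
    static int countIdsContaining(int maxId, char digit) {
        int matches = 0;
        for (int id = 1; id <= maxId; id++) {
            if (String.valueOf(id).indexOf(digit) >= 0) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int matches = countIdsContaining(1_000_000, '0');
        int chunks = (matches + 4_999) / 5_000; // ceiling division by the chunk size
        // prints "402130 matching ids, 81 chunks"
        System.out.println(matches + " matching ids, " + chunks + " chunks");
    }
}
```

Under that assumption only about 40% of the rows match, so fewer chunks have to be fetched and merged. This does not account for the whole difference in timing, which also depends on the database and its caches.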

Finish

The basic query is now complete.

Readers can take the optimization further in the following areas:

  1. Optimize the indexes.
  2. Find out how many rows each thread should query for the best performance.
  3. If a thread pool is configured, use it: total number of rows / number of threads = number of rows each thread needs to query.
  4. Optimize the code, especially the time-consuming parts.
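
Point 3 can be sketched as follows: with a fixed number of threads, divide the total row count by the thread count to get each thread's offset and row count, spreading any remainder over the first threads. The class and method names are illustrative, and the thread count of 8 is an assumption rather than a measured optimum:

```java
import java.util.ArrayList;
import java.util.List;

class ThreadRangePlanner {

    /** Offset and row count for one thread, usable as "LIMIT offset,count". */
    static final class Range {
        final int offset;
        final int count;

        Range(int offset, int count) {
            this.offset = offset;
            this.count = count;
        }

        @Override
        public String toString() {
            return "LIMIT " + offset + "," + count;
        }
    }

    /** Splits totalRows into at most `threads` contiguous ranges of nearly equal size. */
    static List<Range> plan(int totalRows, int threads) {
        List<Range> ranges = new ArrayList<>();
        int base = totalRows / threads;
        int remainder = totalRows % threads;
        int offset = 0;
        for (int i = 0; i < threads; i++) {
            // The first `remainder` threads take one extra row each.
            int count = base + (i < remainder ? 1 : 0);
            if (count > 0) {
                ranges.add(new Range(offset, count));
            }
            offset += count;
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (Range r : plan(1_000_000, 8)) {
            System.out.println(r);
        }
    }
}
```

To use such a plan, the mapper would need a second parameter for the row count, which the `subList` method in this article does not have, since it hard-codes 5000.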


Origin blog.csdn.net/m0_57647880/article/details/131064291