SpringBoot: efficiently batch-inserting tens of thousands of rows

Preparation

1. The relevant dependencies in the Maven project's pom.xml:

<dependencies>
    <!-- Spring Boot web starter -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <!-- MyBatis-Plus -->
    <dependency>
        <groupId>com.baomidou</groupId>
        <artifactId>mybatis-plus-boot-starter</artifactId>
        <version>3.3.1</version>
    </dependency>

    <!-- MySQL JDBC driver -->
    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
    </dependency>

    <!-- Lombok: annotations that cut boilerplate -->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
    </dependency>
</dependencies>

2. application.yml configuration (key point: enable the JDBC batch rewrite mode)

server:
    # port
    port: 8080

# MySQL connection settings (only the basics; see the driver docs for more options)
spring:
    datasource:
        # Connection URL (UTF-8 avoids garbled Chinese text; serverTimezone corrects the time zone)
        # (rewriteBatchedStatements=true enables batch rewriting)
        url: jdbc:mysql://127.0.0.1:3306/bjpowernode?useUnicode=true&characterEncoding=UTF-8&serverTimezone=Asia/Shanghai&rewriteBatchedStatements=true
        # username
        username: root
        # password
        password: xxx
        # driver class
        driver-class-name: com.mysql.cj.jdbc.Driver

3. Entity class (for testing)

/**
 * Student test entity.
 *
 * @Data (Lombok) generates the getters and setters.
 */
@Data
@TableName(value = "student")
public class Student {

    /** Primary key, auto-increment */
    @TableId(type = IdType.AUTO)
    private int id;

    /** Name */
    private String name;

    /** Age */
    private int age;

    /** Address */
    private String addr;

    /** Address number; @TableField maps it to the table column */
    @TableField(value = "addr_num")
    private String addrNum;

    public Student(String name, int age, String addr, String addrNum) {
        this.name = name;
        this.age = age;
        this.addr = addr;
        this.addrNum = addrNum;
    }
}

4. Database student table structure (note: no indexes)



Tests

1. For-loop insertion, one row at a time (total time: 177 seconds)

Summary: the average run takes about 177 seconds, which is painful to watch. With a for loop issuing single-row inserts, every iteration must obtain a Connection, execute one statement, then release the connection and close its resources. With a large data set this burns resources and takes a very long time.

@GetMapping("/for")
public void forSingle(){
    // start time
    long startTime = System.currentTimeMillis();
    for (int i = 0; i < 50000; i++){
        Student student = new Student("李毅" + i, 24, "张家界市" + i, i + "号");
        studentMapper.insert(student);
    }
    // end time
    long endTime = System.currentTimeMillis();
    System.out.println("insert elapsed time: " + (endTime - startTime));
}

Test time: (screenshot omitted)


2. Concatenated SQL statement (total time: 2.9 seconds)

Approach: concatenate all rows into a single statement of the form insert into student(...) values (...),(...),(...)...
Summary: all of the data is folded into the VALUES list of one SQL statement, so far fewer insert statements reach the server, network overhead drops, and performance improves. As the data volume grows, however, you risk memory overflow and long SQL-parsing times. Still, compared with approach 1 the performance gain is huge.

/**
 * Concatenated-SQL approach
 */
@GetMapping("/sql")
public void sql(){
    ArrayList<Student> arrayList = new ArrayList<>();
    long startTime = System.currentTimeMillis();
    for (int i = 0; i < 50000; i++){
        Student student = new Student("李毅" + i, 24, "张家界市" + i, i + "号");
        arrayList.add(student);
    }
    studentMapper.insertSplice(arrayList);
    long endTime = System.currentTimeMillis();
    System.out.println("insert elapsed time: " + (endTime - startTime));
}

Mapper:

public interface StudentMapper extends BaseMapper<Student> {

    @Insert("<script>" +
            "insert into student (name, age, addr, addr_num) values " +
            "<foreach collection='studentList' item='item' separator=','> " +
            "(#{item.name}, #{item.age}, #{item.addr}, #{item.addrNum}) " +
            "</foreach> " +
            "</script>")
    int insertSplice(@Param("studentList") List<Student> studentList);
}

Test results: (screenshot omitted)


3. Batch insert with saveBatch (total time: 2.7 seconds)

Summary: use the saveBatch() method that MyBatis-Plus provides on the IService interface. Reading the underlying source, it is still a loop over the rows, so why is it faster than approach 1? Because it shards the work (batchSize = 1000) and commits the transaction in batches, instead of spending time on per-row Connection handling. (Personally, I consider this one of the better options.)

/**
 * MyBatis-Plus batch mode
 */
@GetMapping("/saveBatch1")
public void saveBatch1(){
    ArrayList<Student> arrayList = new ArrayList<>();
    long startTime = System.currentTimeMillis();
    // build test data
    for (int i = 0; i < 50000; i++){
        Student student = new Student("李毅" + i, 24, "张家界市" + i, i + "号");
        arrayList.add(student);
    }
    // batch insert
    studentService.saveBatch(arrayList);
    long endTime = System.currentTimeMillis();
    System.out.println("insert elapsed time: " + (endTime - startTime));
}

Important note: by default, the MySQL JDBC driver does not truly batch the statements queued by executeBatch() inside saveBatch(); it splits the batch and sends the statements to the server one at a time. The result is still effectively row-by-row insertion: better than approach 1, but without a substantial performance gain.
Test: run with a connection URL that is missing the rewriteBatchedStatements=true parameter.

# MySQL connection settings
spring:
    datasource:
        # Connection URL (batch rewriting NOT enabled)
        url: jdbc:mysql://127.0.0.1:3306/bjpowernode?useUnicode=true&characterEncoding=UTF-8&serverTimezone=Asia/Shanghai
        # username
        username: root
        # password
        password: xxx
        # driver class
        driver-class-name: com.mysql.cj.jdbc.Driver

Test result: 10541 ms, roughly 10.5 seconds (batch rewriting not enabled). (screenshot omitted)


4. Loop insertion + batch processing mode (total time: 1.7 seconds) (key point: a single commit)

Summary: enable batch executor mode, turn off auto-commit, and reuse the same SqlSession. This substantially improves on the single-insert for loop: sharing one SqlSession avoids the per-row resource overhead and cuts transaction-handling time, greatly improving throughput. (Personally, I consider this one of the better options.)

/**
 * Reuse a single SqlSession
 */
@GetMapping("/forSaveBatch")
public void forSaveBatch(){
    // open batch executor mode (BATCH) and disable auto-commit (false)
    SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH, false);
    // obtain the mapper from this session
    StudentMapper studentMapper = sqlSession.getMapper(StudentMapper.class);
    long startTime = System.currentTimeMillis();
    for (int i = 0; i < 50000; i++){
        Student student = new Student("李毅" + i, 24, "张家界市" + i, i + "号");
        studentMapper.insert(student);
    }
    // commit everything in one go
    sqlSession.commit();
    // release resources
    sqlSession.close();
    long endTime = System.currentTimeMillis();
    System.out.println("total elapsed time: " + (endTime - startTime));
}

5. ThreadPoolTaskExecutor (total time taken: 1.7 seconds)

(Personally, I consider this one of the better options.)

@Autowired
private ThreadPoolTaskExecutor threadPoolTaskExecutor;
@Autowired
private PlatformTransactionManager transactionManager;

@GetMapping("/batchInsert2")
public void batchInsert2() {
    ArrayList<Student> arrayList = new ArrayList<>();
    long startTime = System.currentTimeMillis();
    // build test data
    for (int i = 0; i < 50000; i++){
        Student student = new Student("李毅" + i, 24, "张家界市" + i, i + "号");
        arrayList.add(student);
    }
    int count = arrayList.size();
    int pageSize = 1000; // rows per batch
    // ceiling division; the original count / pageSize + 1 creates an empty extra
    // sublist when count is a multiple of pageSize, which breaks the foreach SQL
    int threadNum = (count + pageSize - 1) / pageSize;
    CountDownLatch countDownLatch = new CountDownLatch(threadNum);
    for (int i = 0; i < threadNum; i++) {
        int startIndex = i * pageSize;
        int endIndex = Math.min(count, (i + 1) * pageSize);
        List<Student> subList = arrayList.subList(startIndex, endIndex);
        threadPoolTaskExecutor.execute(() -> {
            DefaultTransactionDefinition transactionDefinition = new DefaultTransactionDefinition();
            TransactionStatus status = transactionManager.getTransaction(transactionDefinition);
            try {
                studentMapper.insertSplice(subList);
                transactionManager.commit(status);
            } catch (Exception e) {
                transactionManager.rollback(status);
                throw e;
            } finally {
                // always count down so the main thread cannot hang
                countDownLatch.countDown();
            }
        });
    }
    try {
        countDownLatch.await();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    long endTime = System.currentTimeMillis();
    System.out.println("total elapsed time: " + (endTime - startTime));
}

First, a thread pool (ThreadPoolTaskExecutor) manages the threads' life cycle and runs the tasks. The data to be inserted is split into sublists of the configured batch size, and each sublist is handed to the pool as an insertion task.
A PlatformTransactionManager is injected, and a DefaultTransactionDefinition describes the transaction attributes. Inside each task, transactionManager.getTransaction() obtains a TransactionStatus, the insert runs under that transaction, and transactionManager.commit() or transactionManager.rollback() is called depending on the outcome. When each task finishes, it calls countDown() on the CountDownLatch, so the main thread's await() returns only after every batch has completed.
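The split-and-await pattern described above can be shown without Spring. The sketch below is a simplified stand-in, not the article's code: a plain ExecutorService replaces ThreadPoolTaskExecutor, and an AtomicInteger counter simulates the per-chunk insert (no database or transaction manager involved). The names ParallelBatchDemo and parallelInsert are invented for this example:

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelBatchDemo {

    /** Insert stand-in: counts rows instead of writing to MySQL. */
    public static int parallelInsert(List<String> rows, int chunkSize, int threads) {
        int chunkCount = (rows.size() + chunkSize - 1) / chunkSize; // ceiling division
        CountDownLatch latch = new CountDownLatch(chunkCount);
        AtomicInteger inserted = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < chunkCount; i++) {
            int start = i * chunkSize;
            int end = Math.min(rows.size(), start + chunkSize);
            List<String> chunk = rows.subList(start, end); // read-only view is thread-safe here
            pool.execute(() -> {
                try {
                    inserted.addAndGet(chunk.size()); // real code: mapper.insertSplice(chunk)
                } finally {
                    latch.countDown(); // always count down, even if the insert fails
                }
            });
        }
        try {
            latch.await(); // wait for all chunks before reporting
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return inserted.get();
    }
}
```

The latch guarantees a happens-before edge between every worker's update and the final read, so the returned count is accurate even though the chunks run concurrently.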

Source: blog.csdn.net/weixin_44030143/article/details/130825037