Optimizing Excel import and export of millions of rows

Background

When I was looking for an internship the year before last, an interviewer asked me: how should you export millions of rows from MySQL to Excel? My first reaction was: where would I even get access to millions of rows, and what data am I supposed to be exporting? I was still a college student looking for an internship. Later I kept running into canned interview answers introducing this kind of import/export optimization. But I refuse to learn by swallowing things whole and memorizing boilerplate, so I tested it myself here. Thanks in advance to shigen for providing high-quality reference code and analysis cases.

Analysis

Exporting millions of rows to Excel

Loop export

Novices and programmers who have never done this, don't be embarrassed; I know what you're thinking. Isn't it just querying data and writing it into Excel? Watch me: read the rows one by one into a list, then use Apache POI to write them into an Excel file, and then offer it for download.

Good or bad, I won't comment here, but I'm sure I'd regret it: with millions of rows, how long would that have to run?
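
For concreteness, here is a minimal sketch of that naive approach; the User record and findAll() are hypothetical stand-ins, not code from the original post:

    import java.io.FileOutputStream;
    import java.util.List;
    import org.apache.poi.ss.usermodel.*;
    import org.apache.poi.xssf.usermodel.XSSFWorkbook;

    // Naive export: load everything, then write everything.
    public class NaiveExport {
        record User(long id, String name) {}

        // Hypothetical stand-in; imagine "SELECT * FROM user" pulled in one go
        static List<User> findAll() { return List.of(); }

        public static void main(String[] args) throws Exception {
            List<User> users = findAll(); // millions of rows land on the heap at once
            try (Workbook wb = new XSSFWorkbook(); // in-memory workbook: every cell lives on the heap too
                 FileOutputStream out = new FileOutputStream("users.xlsx")) {
                Sheet sheet = wb.createSheet("users");
                int r = 0;
                for (User u : users) {
                    Row row = sheet.createRow(r++);
                    row.createCell(0).setCellValue(u.id());
                    row.createCell(1).setCellValue(u.name());
                }
                wb.write(out);
            }
        }
    }

The whole result set and the whole in-memory workbook sit on the heap at the same time, which is exactly why this is slow and OOM-prone at the million-row scale.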

Batch query export

Anyone familiar with this idea knows the SQL part can be optimized: query and write in batches, then assemble everything into a single Excel file to download. shigen will just write a little pseudocode.

Excel excel = new Excel();
for (int i = 0; i < page; i++) {
    // fetch one page of rows from the database
    List<Data> data = getFromDB(i, pageSize);
    // append the page to the workbook
    excel.write(data);
}
// flush and close the stream to finish the file
excel.close();
Enter the thread pool

We have a loop, and every iteration calls the same method with different parameters; that immediately suggests a thread pool. However, I need to know when the final Excel write has completed, and this is where CompletableFuture comes in handy: only after all tasks are done is the stream flushed, marking the end of the Excel write. Take a look at shigen's code design.

Thread pool asynchronous export
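
The code design is shown above as an image; here is a minimal sketch of the same idea, with a hypothetical getFromDB page query standing in for the real DAO call:

    import java.util.List;
    import java.util.concurrent.*;
    import java.util.stream.Collectors;
    import java.util.stream.IntStream;

    public class AsyncExportSketch {
        // Hypothetical page query; replace with a real DAO/mapper call.
        static List<Object> getFromDB(int page, int pageSize) { return List.of(); }

        public static void main(String[] args) {
            int pages = 100, pageSize = 10_000;
            ExecutorService pool = Executors.newFixedThreadPool(8);

            // One future per page: the queries run in parallel, and the list
            // keeps results indexed by page so the output order stays stable.
            List<CompletableFuture<List<Object>>> futures = IntStream.range(0, pages)
                    .mapToObj(i -> CompletableFuture.supplyAsync(() -> getFromDB(i, pageSize), pool))
                    .collect(Collectors.toList());

            // Wait until every page query has completed.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();

            // Write the pages in order on one thread (POI/EasyExcel writers are
            // not thread-safe), then flush and close to mark the file complete.
            for (CompletableFuture<List<Object>> f : futures) {
                List<Object> page = f.join();
                // excelWriter.write(page, sheet); // hypothetical writer
            }
            pool.shutdown();
        }
    }

The design point that matters is CompletableFuture.allOf: the writer is only flushed after every page task has completed, which is exactly the completion signal described above.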

Loop export, revisited

Why mention this again? Didn't the batch query export section just say that looping over reads and writes is not recommended? Yes, that is exactly what shigen said. However, if either of the following two situations applies, this method may be your first choice and even the best solution.

  • You are not yet comfortable with asynchronous tasks and thread pools
  • The exported data has a continuous primary key ID

I won't say much about the first situation; reaching for what you already know is a natural instinct when solving problems. I'll only talk about the second, which involves some SQL optimization.

select * from user limit 10, 1000;            -- offset paging
select * from user where id >= 10 limit 1000; -- keyset paging on the primary key

Of these two SQL statements, can you guess which is more efficient? shigen will reveal the answer directly: the second. With limit offset, n, MySQL has to scan past and discard the first offset rows, while where id >= x seeks straight to the starting row through the primary key index. Readers who already knew the reason are welcome to discuss it in the comments.

The code I wrote using the second approach looks like this.

Cyclic paging export
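
The code is shown above as an image, so here is a minimal sketch of the same keyset-pagination export, assuming a user(id, name) table, local MySQL credentials, and POI's streaming SXSSFWorkbook; the original may differ:

    import java.io.FileOutputStream;
    import java.sql.*;
    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.ss.usermodel.Sheet;
    import org.apache.poi.xssf.streaming.SXSSFWorkbook;

    public class KeysetExport {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:mysql://localhost:3306/test", "root", "password");
                 SXSSFWorkbook wb = new SXSSFWorkbook(100); // keep only 100 rows in memory
                 FileOutputStream out = new FileOutputStream("users.xlsx")) {
                Sheet sheet = wb.createSheet("users");
                PreparedStatement ps = conn.prepareStatement(
                        "select id, name from user where id > ? order by id limit ?");
                long lastId = 0;
                int batchSize = 1000, rowIdx = 0;
                while (true) {
                    ps.setLong(1, lastId); // seek via the PK index instead of an offset scan
                    ps.setInt(2, batchSize);
                    int count = 0;
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            Row row = sheet.createRow(rowIdx++);
                            row.createCell(0).setCellValue(lastId = rs.getLong("id"));
                            row.createCell(1).setCellValue(rs.getString("name"));
                            count++;
                        }
                    }
                    if (count < batchSize) break; // last page reached
                }
                wb.write(out);
                wb.dispose(); // remove SXSSF temp files
            }
        }
    }

SXSSFWorkbook keeps only a window of rows in memory and spills finished rows to temporary files, which keeps the heap flat no matter how many rows you export.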

I then tested both approaches; the execution times came out to 271ms and 125ms respectively. It's also obvious that the second version of the code is simpler, right?

Importing millions of rows from Excel

This one is trickier. Some will say: isn't it just the previous operation in reverse? Yes, but if the performance isn't handled well, it either takes forever or simply OOMs.

Here is shigen's analysis:

Importing 1,000,000 rows from Excel into MySQL

  • First, EasyExcel reads the 1,000,000 rows from Excel in batches; EasyExcelGeneralDataListener reads the data row by row from each sheet.
  • Second, inserting into the DB: how do you insert each batch of 200,000 rows? You can't use MyBatis's batch insert here, because it would read all the data into memory and commit the transaction as a single whole.
  • Instead, use JDBC batch operations plus transactions to insert the data into the database (batched reads + JDBC batch insert + manual transaction control).

That's the analysis; so how do we implement it? Here is the code shigen wrote:

    @PostMapping("/importExcel") // file uploads must be sent with POST, not GET
    public void importExcel(@RequestParam("file") MultipartFile file) throws IOException {
        if (file == null || file.isEmpty()) {
            throw new RuntimeException("file is empty");
        }
        InputStream inputStream = file.getInputStream();
        // Record when we start reading the Excel file; this is also when the import begins
        long startReadTime = System.currentTimeMillis();
        log.info("------ Start reading Excel sheets (includes the import itself): " + startReadTime + "ms ------");
        // Read all sheets; the listener is called back as each sheet is read
        EasyExcel.read(inputStream, new EasyExcelGeneralDataListener(userService)).doReadAll();
        long endReadTime = System.currentTimeMillis();
        log.info("------ Finished reading; total time " + (endReadTime - startReadTime) + "ms ------");
    }

The key then lies inside EasyExcelGeneralDataListener. For background on how it is used, see the blog post "using easyexcel to read excel (implementing a general listener)". shigen went straight to the code.

Implementation of EasyExcelGeneralDataListener
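
The actual implementation is shown above as an image, so below is a minimal sketch of the same pattern. It assumes a user(name, age) table, headerless rows arriving as Map<Integer, String>, and plain JDBC credentials; shigen's real listener takes a userService, as the controller above shows.

    import com.alibaba.excel.context.AnalysisContext;
    import com.alibaba.excel.event.AnalysisEventListener;
    import java.sql.*;
    import java.util.*;

    // Sketch of a generic listener: buffer rows as they stream in, then flush each
    // buffer with a JDBC batch inside one manually controlled transaction.
    public class EasyExcelGeneralDataListener extends AnalysisEventListener<Map<Integer, String>> {
        private static final int BATCH_SIZE = 200_000; // flush threshold from the analysis above
        private final List<Map<Integer, String>> buffer = new ArrayList<>();

        @Override
        public void invoke(Map<Integer, String> row, AnalysisContext context) {
            buffer.add(row); // called once per row; only the buffer is held in memory
            if (buffer.size() >= BATCH_SIZE) {
                flush();
            }
        }

        @Override
        public void doAfterAllAnalysed(AnalysisContext context) {
            flush(); // insert whatever is left after the final sheet
        }

        private void flush() {
            if (buffer.isEmpty()) return;
            // In real code, take a pooled connection instead of opening one per batch
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/test", "root", "password")) {
                conn.setAutoCommit(false); // manual transaction control
                try (PreparedStatement ps = conn.prepareStatement(
                        "insert into user(name, age) values (?, ?)")) {
                    for (Map<Integer, String> row : buffer) {
                        ps.setString(1, row.get(0)); // cell values arrive as strings
                        ps.setString(2, row.get(1));
                        ps.addBatch(); // JDBC batching
                    }
                    ps.executeBatch();
                    conn.commit(); // one commit per 200,000-row batch
                } catch (SQLException e) {
                    conn.rollback();
                    throw e;
                }
            } catch (SQLException e) {
                throw new RuntimeException(e);
            }
            buffer.clear();
        }
    }

All three points from the analysis appear here: rows are buffered as EasyExcel streams them in, each buffer is flushed with a single executeBatch, and the commit is issued manually so every batch is exactly one transaction.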

Summary

The above are the optimization ideas for importing and exporting millions of rows with Excel. They can serve as a reference case and a code template; the code address is here. Comments and exchanges are welcome. If you think the article is good, remember to like, save, share, and follow.

With shigen, every day is different!

Origin blog.csdn.net/weixin_55768452/article/details/132428990