Apache POI in practice: handling million-row Excel files, with JDK performance monitoring using JVisualVM

Million-row report overview

1. Overview

Excel files come in two formats: the older Excel 2003 format (.xls, handled in POI by the HSSF classes) and the Excel 2007 format (.xlsx, handled in POI by the XSSF classes). Their capacity for million-row data differs as follows:

  • Excel 2003: With POI's HSSF classes, a sheet can hold at most 65,536 rows, so this format is generally used for smaller data sets. A single sheet cannot hold a million rows.

  • Excel 2007: POI's XSSF classes support Excel 2007 and later, which use the OOXML format. A single sheet can hold 1,048,576 rows, just over a million. Problems can still occur in practice: the row, cell, and font objects that POI creates while building the report all stay in memory until the workbook is written, which creates a risk of OutOfMemoryError (OOM).
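
To make the two limits concrete, here is a minimal sketch that computes how many sheets an export would need under each format. The class and method names are illustrative and not part of POI, and reserving one header row per sheet is an assumption.

```java
// Illustrative helper (not part of Apache POI): how many sheets does an export need?
final class SheetLimits {
    static final int XLS_MAX_ROWS = 65_536;      // Excel 2003 (.xls, HSSF)
    static final int XLSX_MAX_ROWS = 1_048_576;  // Excel 2007+ (.xlsx, XSSF/SXSSF)

    /** Sheets needed for dataRows rows when every sheet also carries one header row. */
    static long sheetsNeeded(long dataRows, int maxRowsPerSheet) {
        long usable = maxRowsPerSheet - 1L;       // one row reserved for the header (assumption)
        return (dataRows + usable - 1) / usable;  // ceiling division
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        System.out.println(".xls sheets:  " + sheetsNeeded(rows, XLS_MAX_ROWS));
        System.out.println(".xlsx sheets: " + sheetsNeeded(rows, XLSX_MAX_ROWS));
    }
}
```

A million data rows fit in a single .xlsx sheet but would have to be split across many .xls sheets.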

2. Introduction to JDK performance monitoring tools

Without performance monitoring tools, every inference stays theoretical. Java performance monitoring tools let us observe a running program, including CPU usage, garbage collection, and memory allocation and usage. This makes the program's runtime behavior more controllable and lets us verify our conjectures. Here we use Jvisualvm, the performance tool provided with the JDK, to monitor program execution.
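
The same kind of figures can also be read from inside the program through the standard java.lang.management API. The following is a minimal sketch (the class name HeapSnapshot is made up for illustration) that could be called before and after an export to log heap usage and GC activity; it complements a graphical monitor rather than replacing it.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Illustrative helper: reads heap usage and GC counts from the running JVM.
final class HeapSnapshot {

    /** Heap currently in use, in MiB. */
    static long usedHeapMiB() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed() / (1024 * 1024);
    }

    /** Total number of collections reported by all garbage collectors so far. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report a count
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("used heap: " + usedHeapMiB() + " MiB, GC runs: " + totalGcCount());
    }
}
```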

2.1. Overview of Jvisualvm

VisualVM began as a profiler subproject of NetBeans and was bundled with the JDK starting with JDK 6 update 7. It can monitor threads and memory usage, and it can show the CPU time of methods, the objects in memory, objects that have been garbage collected, and allocation stack traces in reverse view.

2.2. Location of Jvisualvm

Jvisualvm is located in the JAVA_HOME/bin directory and can be started by double-clicking it. To monitor only local Java processes, no parameters need to be configured; just open the tool. First start a Java program locally (here, the employee microservice). The jvisualvm window then lists the running Java processes, including those related to IDEA.

2.3. Use of Jvisualvm

Jvisualvm is simple to use: double-click a running process to open its monitoring view.

  • Overview: shows the startup parameters of the process.
  • Monitor: CPU utilization and GC activity (upper left), heap and permanent generation usage (upper right), loaded classes (lower left), and threads (lower right).
  • Threads: shows the name and state of each thread, which is essential when debugging multithreaded code. Click a thread to see its detailed state.
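
The thread names and states shown in the thread view are also available programmatically. As a minimal sketch using only the standard java.lang.management API (the class name ThreadSnapshot is made up for illustration):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: lists the live threads of the current JVM with their states.
final class ThreadSnapshot {

    static List<String> describeThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds())) {
            if (info != null) { // a thread may have ended between the two calls
                lines.add(info.getThreadName() + " -> " + info.getThreadState());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        describeThreads().forEach(System.out::println);
    }
}
```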

3. Solution analysis

For importing and exporting millions of rows, only solutions based on Excel 2007 are discussed here. The Apache POI documentation provides guidance for working with large amounts of data.

POI offers three ways to read and write Excel 2007 files:

  • User mode: offers many well-encapsulated methods that are simple to use, but it creates a large number of objects and consumes a lot of memory (this is the approach used so far).
  • Event mode: parses the XML with SAX (Simple API for XML), an event-driven alternative to DOM parsing. Whereas DOM loads the whole document into memory before processing it, SAX reads the document sequentially and parses it as it goes.
  • SXSSF: used to generate very large Excel files. It works by writing data to temporary storage on disk while the file is being generated.

Apache POI officially provides a comparison chart (image not reproduced here) of the three approaches, user mode, event mode, and SXSSF, which describes their features as well as their CPU and memory usage.

4. Exporting a million-row report

4.1 Requirements analysis

Use Apache POI to export an Excel report with a million rows of data.

4.2 Solution

4.2.1 Approach

An export based on XSSFWorkbook keeps every cell object in memory. Only after all cells have been created is the workbook written out in one go. When a million rows are exported, more and more objects accumulate in memory as rows are created, until the JVM runs out of memory. Apache POI provides SXSSFWorkbook specifically for exporting Excel reports with large amounts of data.

4.2.2 How it works

When you instantiate SXSSFWorkbook, you can specify how many rows are kept in memory (100 by default). Once the number of rows in memory exceeds this value, the oldest rows are written to a temporary file on disk (in XML format) and can be released from memory. The same happens every time the limit is reached, until the export is complete.
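
The windowing idea can be illustrated without POI. The following is a toy model under stated assumptions: the class RowWindowDemo and its methods are invented for this sketch and do not reflect how POI is implemented internally. It only shows the principle of keeping a bounded number of rows in memory and spilling the oldest ones to a temporary file.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a bounded row window; NOT POI's actual implementation. */
final class RowWindowDemo implements AutoCloseable {
    private final int windowSize;
    private final Deque<String> inMemory = new ArrayDeque<>();
    private final Path tempFile;
    private final BufferedWriter out;
    private long flushed;

    RowWindowDemo(int windowSize) throws IOException {
        this.windowSize = windowSize;
        this.tempFile = Files.createTempFile("rows-", ".xml");
        this.out = Files.newBufferedWriter(tempFile, StandardCharsets.UTF_8);
    }

    /** Adds a row; rows beyond the window are written to disk, oldest first. */
    void addRow(String rowXml) throws IOException {
        inMemory.addLast(rowXml);
        while (inMemory.size() > windowSize) {
            out.write(inMemory.removeFirst());
            out.newLine();
            flushed++;
        }
    }

    int rowsInMemory() { return inMemory.size(); }

    long rowsFlushed() { return flushed; }

    /** Closes the writer and deletes the temporary file. */
    @Override
    public void close() throws IOException {
        out.close();
        Files.deleteIfExists(tempFile);
    }

    public static void main(String[] args) throws IOException {
        try (RowWindowDemo window = new RowWindowDemo(100)) {
            for (int i = 0; i < 250; i++) {
                window.addRow("<row r=\"" + (i + 1) + "\"/>");
            }
            System.out.println(window.rowsInMemory() + " rows in memory, "
                    + window.rowsFlushed() + " flushed to disk");
        }
    }
}
```

However many rows are added, the number held in memory never exceeds the window size, which is why memory usage stays flat during a large export.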

4.3 Code implementation

Starting from the earlier code, replace XSSFWorkbook with SXSSFWorkbook to create the workbook.

4.3.1 UserReportResult
package com.example.payment.pojo;

import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.Setter;
import lombok.ToString;

/**
 * One row of the user report. Created 2022/12/22.
 */
@Getter
@Setter
@NoArgsConstructor
@ToString
public class UserReportResult {

    private String userId;
    private String username;
    private String departmentName;
    private String mobile;
    private String timeOfEntry;
    private String companyId;
    private String sex;
    /** Date of birth */
    private String dateOfBirth;
    /** Highest level of education */
    private String theHighestDegreeOfEducation;
    /** Country/region */
    private String nationalArea;
    /** Passport number */
    private String passportNo;
    /** ID card number */
    private String idNumber;
    /** ID card photo (front) */
    private String idCardPhotoPositive;
    /** ID card photo (back) */
    private String idCardPhotoBack;
    /** Native place */
    private String nativePlace;
    /** Ethnicity */
    private String nation;
    /** English name */
    private String englishName;
    /** Marital status */
    private String maritalStatus;
    /** Staff photo */
    private String staffPhoto;
    /** Birthday */
    private String birthday;
    /** Chinese zodiac sign */
    private String zodiac;
    /** Age */
    private String age;
    /** Constellation (star sign) */
    private String constellation;
    /** Blood type */
    private String bloodType;
    /** Registered domicile */
    private String domicile;
    /** Political affiliation */
    private String politicalOutlook;
    /** Date of joining the party */
    private String timeToJoinTheParty;
    /** Archiving organization */
    private String archivingOrganization;
    /** Status of children */
    private String stateOfChildren;
    /** Whether the children have commercial insurance */
    private String doChildrenHaveCommercialInsurance;
    /** Any violations of law or discipline */
    private String isThereAnyViolationOfLawOrDiscipline;
    /** Any major medical history */
    private String areThereAnyMajorMedicalHistories;
    /** QQ */
    private String qq;
    /** WeChat */
    private String wechat;
    /** Residence permit city */
    private String residenceCardCity;
    /** Residence permit issue date */
    private String dateOfResidencePermit;
    /** Residence permit expiry date */
    private String residencePermitDeadline;
    /** Current place of residence */
    private String placeOfResidence;
    /** Postal address */
    private String postalAddress;
    /** Contact mobile phone */
    private String contactTheMobilePhone;
    /** Personal email */
    private String personalMailbox;
    /** Emergency contact */
    private String emergencyContact;
    /** Emergency contact phone number */
    private String emergencyContactNumber;
    /** Social security computer number */
    private String socialSecurityComputerNumber;
    /** Provident fund account */
    private String providentFundAccount;
    /** Bank card number */
    private String bankCardNumber;
    /** Account-opening bank */
    private String openingBank;
    /** Education type */
    private String educationalType;
    /** School graduated from */
    private String graduateSchool;
    /** Enrollment date */
    private String enrolmentTime;
    /** Graduation date */
    private String graduationTime;
    /** Major */
    private String major;
    /** Graduation certificate */
    private String graduationCertificate;
    /** Degree certificate */
    private String certificateOfAcademicDegree;
    /** Previous company */
    private String homeCompany;
    /** Professional title */
    private String title;
    /** Resume */
    private String resume;
    /** Any non-compete restriction */
    private String isThereAnyCompetitionRestriction;
    /** Proof of departure from the previous company */
    private String proofOfDepartureOfFormerCompany;
    /** Remarks */
    private String remarks;

    /** Resignation date */
    private String resignationTime;
    /** Type of departure */
    private String typeOfTurnover;
    /** Reason given for leaving */
    private String reasonsForLeaving;
}

4.3.2 UserReportController
package com.example.payment.controller;

import com.example.payment.pojo.UserReportResult;
import lombok.extern.slf4j.Slf4j;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import javax.servlet.http.HttpServletResponse;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

/**
 * Exports a million-row user report with SXSSFWorkbook. Created 2022/12/23.
 */
@Controller
@RequestMapping("/userReport")
@Slf4j
public class UserReportController {

    @GetMapping("/download")
    public void download(HttpServletResponse response) {
        log.info("[userReport-download] start");
        // SXSSFWorkbook is the workbook for million-row reports.
        // The argument is the maximum number of rows kept in memory.
        // Workbook wb = new XSSFWorkbook();
        SXSSFWorkbook wb = new SXSSFWorkbook(100);
        try {
            // 1. Simulate the user report data
            List<UserReportResult> list = new ArrayList<>();
            for (int i = 0; i < 100000; i++) {
                UserReportResult userReportResult = new UserReportResult();
                userReportResult.setEducationalType(i + "");
                userReportResult.setAge(i + "");
                userReportResult.setBirthday(i + "");
                userReportResult.setUserId(i + "");
                userReportResult.setUsername(i + "");
                userReportResult.setMobile(i + "");
                userReportResult.setTheHighestDegreeOfEducation(i + "");
                userReportResult.setNationalArea(i + "");
                userReportResult.setPassportNo(i + "");
                userReportResult.setNativePlace(i + "");
                userReportResult.setZodiac(i + "");
                userReportResult.setTimeOfEntry(i + "");
                userReportResult.setTypeOfTurnover(i + "");
                userReportResult.setReasonsForLeaving(i + "");
                userReportResult.setResignationTime(i + "");
                list.add(userReportResult);
            }
            // 2. Build the Excel sheet
            Sheet sheet = wb.createSheet();
            // Header row; the last three titles are aligned with the departure fields written below
            String[] titles = ("ID,Name,Mobile,Highest education,Country/region,Passport no.,Native place,"
                    + "Birthday,Zodiac,Entry date,Departure type,Reason for leaving,Resignation date").split(",");
            Row row = sheet.createRow(0);
            int titleIndex = 0;
            for (String title : titles) {
                row.createCell(titleIndex++).setCellValue(title);
            }

            // Data rows: 10 passes over 100,000 records give 1,000,000 rows
            int rowIndex = 1;
            for (int i = 0; i < 10; i++) {
                for (UserReportResult r : list) {
                    row = sheet.createRow(rowIndex++);
                    row.createCell(0).setCellValue(r.getUserId());                       // ID
                    row.createCell(1).setCellValue(r.getUsername());                     // name
                    row.createCell(2).setCellValue(r.getMobile());                       // mobile
                    row.createCell(3).setCellValue(r.getTheHighestDegreeOfEducation());  // highest education
                    row.createCell(4).setCellValue(r.getNationalArea());                 // country/region
                    row.createCell(5).setCellValue(r.getPassportNo());                   // passport number
                    row.createCell(6).setCellValue(r.getNativePlace());                  // native place
                    row.createCell(7).setCellValue(r.getBirthday());                     // birthday
                    row.createCell(8).setCellValue(r.getZodiac());                       // zodiac
                    row.createCell(9).setCellValue(r.getTimeOfEntry());                  // entry date
                    row.createCell(10).setCellValue(r.getTypeOfTurnover());              // departure type
                    row.createCell(11).setCellValue(r.getReasonsForLeaving());           // reason for leaving
                    row.createCell(12).setCellValue(r.getResignationTime());             // resignation date
                }
            }
            // 3. Send the workbook as a download
            String fileName = URLEncoder.encode("2022-12-22-user-info.xlsx", "UTF-8");
            response.setContentType("application/octet-stream");
            response.setHeader("content-disposition", "attachment; filename*=UTF-8''" + fileName);
            response.setHeader("filename", fileName);
            wb.write(response.getOutputStream());
        } catch (Exception e) {
            log.error("[userReport-download] error", e);
        } finally {
            // Delete the temporary files that SXSSF wrote to disk
            wb.dispose();
        }
    }
}
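
File names that contain spaces or non-ASCII characters need to be encoded before they go into the Content-Disposition header. The following is a minimal sketch of one common way to do this, the extended `filename*` parameter from RFC 6266 and RFC 5987. The class and method names are illustrative, and URLEncoder is only an approximation of the percent-encoding those RFCs describe.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative helper: builds a Content-Disposition value for a download.
final class DownloadHeaders {

    static String contentDisposition(String fileName) {
        try {
            // URLEncoder targets HTML forms and encodes a space as '+'; a header needs %20
            String encoded = URLEncoder.encode(fileName, "UTF-8").replace("+", "%20");
            return "attachment; filename*=UTF-8''" + encoded;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(contentDisposition("user info.xlsx"));
        System.out.println(contentDisposition("\u7528\u6237.xlsx")); // a Chinese file name
    }
}
```

The result can be passed to response.setHeader("Content-Disposition", ...) in a controller such as the one above.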

4.4 Comparison test

(1) Generating a million-row report with XSSFWorkbook

Generating the Excel report with XSSFWorkbook takes a long time, and memory usage keeps climbing as the export runs until the JVM runs out of memory.

(2) Generating a million-row report with SXSSFWorkbook

With SXSSFWorkbook, memory usage stays relatively flat.

5. Reading a million-row report

5.1 Requirements analysis

Use POI's event mode to parse the Excel file provided with this example.

5.2 Solution

5.2.1 Approach

  • User mode: when an Excel file is read, all of its data is loaded into memory at once and the content of each cell is then parsed. With a large file, this can lead to memory shortage or even an OOM error, depending on the runtime environment.
  • Event mode: reads the document sequentially and parses it as it goes. Because the application only examines data as it is read, the data does not need to be held in memory, which is a major advantage when parsing large documents.

5.2.2 Steps

(1) Set up POI's event mode

  • Obtain an input stream for the Excel file;
  • Create an OPCPackage from the stream;
  • Create an XSSFReader object;

(2) SAX parsing

  • Write a custom sheet handler;
  • Create a SAX XMLReader object;
  • Register the sheet event handler;
  • Read the data row by row;

5.2.3 How it works

An Excel 2007 file is essentially a ZIP package of XML documents, so it can be read with a SAX-based XML parser. SAX provides a mechanism for reading data from XML documents: it reads the document sequentially and parses it as it goes. Because the application only examines data as it is read, the whole document never has to be held in memory, which is a major advantage when parsing large documents.
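
The following minimal sketch shows the SAX callback style using only the JDK. The XML string is a simplified stand-in for a worksheet part (a real worksheet also declares a namespace and more attributes), and the class name SaxRowDemo is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative demo: streams sheet-like XML and collects cell references as they are seen.
final class SaxRowDemo {

    static List<String> cellRefs(String xml) throws Exception {
        List<String> refs = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("c".equals(qName)) {            // <c r="A1"> is a cell element
                    refs.add(attrs.getValue("r"));  // handled immediately, no document tree is built
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return refs;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sheetData>"
                + "<row r=\"1\"><c r=\"A1\"><v>1</v></c><c r=\"B1\"><v>2</v></c></row>"
                + "<row r=\"2\"><c r=\"A2\"><v>3</v></c></row>"
                + "</sheetData>";
        System.out.println(cellRefs(xml));
    }
}
```

The handler sees each element once, in document order, and nothing is retained unless the handler itself stores it.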


5.3 Code implementation

5.3.1 Custom handler

Pay particular attention to the endRow method.
package com.example.payment.utils;

import com.example.payment.pojo.PoiEntity;
import lombok.extern.slf4j.Slf4j;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.usermodel.XSSFComment;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Custom SAX-based sheet handler. Created 2022/12/22.
 */
@Slf4j
public class SheetHandler implements XSSFSheetXMLHandler.SheetContentsHandler {

    // Entity populated from the current row
    private PoiEntity entity;

    private AtomicInteger count = new AtomicInteger(0);

    /**
     * Called when a row starts.
     */
    @Override
    public void startRow(int rowNum) {
        // Row 0 is skipped (presumably the header row)
        if (rowNum > 0) {
            entity = new PoiEntity();
        }
    }

    /**
     * Called for every cell in the row.
     */
    @Override
    public void cell(String cellReference, String formattedValue, XSSFComment comment) {
        if (entity != null) {
            // The first letter of the reference (e.g. "B" in "B7") identifies the column
            switch (cellReference.substring(0, 1)) {
                case "A":
                    entity.setId(formattedValue);
                    break;
                case "B":
                    entity.setBreast(formattedValue);
                    break;
                case "C":
                    entity.setAdipocytes(formattedValue);
                    break;
                case "D":
                    entity.setNegative(formattedValue);
                    break;
                case "E":
                    entity.setStaining(formattedValue);
                    break;
                case "F":
                    entity.setSupportive(formattedValue);
                    break;
                default:
                    break;
            }
        }
    }

    /**
     * Called when a row ends.
     */
    @Override
    public void endRow(int rowNum) {
        // TODO: persist the entity, e.g. save it to a table
        log.info("[row end] row {}, entity: {}", count.incrementAndGet(), entity);
    }

    /**
     * Called for headers and footers.
     */
    @Override
    public void headerFooter(String text, boolean isHeader, String tagName) {
    }

}
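
The handler above takes the first character of the cell reference as the column, which is enough for columns A to F. For sheets with more than 26 columns, references such as "AA10" need all leading letters. A minimal sketch of such a helper follows; the class name CellRefs is made up for illustration, and POI's own org.apache.poi.ss.util.CellReference class is another option for this.

```java
// Illustrative helper: extracts the column letters from an A1-style cell reference.
final class CellRefs {

    /** Returns the leading letters of the reference, e.g. "AA" for "AA10". */
    static String columnOf(String cellReference) {
        int i = 0;
        while (i < cellReference.length() && Character.isLetter(cellReference.charAt(i))) {
            i++;
        }
        return cellReference.substring(0, i);
    }

    public static void main(String[] args) {
        System.out.println(columnOf("B7"));
        System.out.println(columnOf("AA10"));
    }
}
```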

5.3.2 Custom parser
package com.example.payment.utils;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.model.StylesTable;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;
import java.io.InputStream;

/**
 * Custom Excel parser based on POI's event mode. Created 2022/12/22.
 */
public class ExcelParser {

    public void parse(InputStream is) throws Exception {
        // 1. Open the Excel file as an OPCPackage
        OPCPackage pkg = OPCPackage.open(is);
        try {
            // 2. Create the XSSFReader
            XSSFReader reader = new XSSFReader(pkg);
            // 3. Get the shared strings table
            SharedStringsTable sst = reader.getSharedStringsTable();
            // 4. Get the styles table
            StylesTable styles = reader.getStylesTable();
            // 5. Create the SAX XMLReader
            XMLReader parser = XMLReaderFactory.createXMLReader();
            // 6. Register the sheet handler
            parser.setContentHandler(new XSSFSheetXMLHandler(styles, sst, new SheetHandler(), false));
            XSSFReader.SheetIterator sheets = (XSSFReader.SheetIterator) reader.getSheetsData();
            // 7. Parse each sheet; the handler is called back row by row
            while (sheets.hasNext()) {
                InputStream sheetStream = sheets.next();
                InputSource sheetSource = new InputSource(sheetStream);
                try {
                    parser.parse(sheetSource);
                } finally {
                    sheetStream.close();
                }
            }
        } finally {
            pkg.close();
        }
    }

}

5.3.3 PoiEntity
package com.example.payment.pojo;

public class PoiEntity {

    private String id;
    private String breast;
    private String adipocytes;
    private String negative;
    private String staining;
    private String supportive;

    public String getId() { return id; }

    public void setId(String id) { this.id = id; }

    public String getBreast() { return breast; }

    public void setBreast(String breast) { this.breast = breast; }

    public String getAdipocytes() { return adipocytes; }

    public void setAdipocytes(String adipocytes) { this.adipocytes = adipocytes; }

    public String getNegative() { return negative; }

    public void setNegative(String negative) { this.negative = negative; }

    public String getStaining() { return staining; }

    public void setStaining(String staining) { this.staining = staining; }

    public String getSupportive() { return supportive; }

    public void setSupportive(String supportive) { this.supportive = supportive; }

    @Override
    public String toString() {
        return "PoiEntity{" +
                "id='" + id + '\'' +
                ", breast='" + breast + '\'' +
                ", adipocytes='" + adipocytes + '\'' +
                ", negative='" + negative + '\'' +
                ", staining='" + staining + '\'' +
                ", supportive='" + supportive + '\'' +
                '}';
    }
}

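5.3.4 Put the data file under resources

Place the Excel file to be parsed in the project's resources directory so that it can be loaded from the classpath.

![Data file under the resources directory](https://img-blog.csdnimg.cn/13908ab340034e50998a90b6a61755b8.png)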

5.3.5 UserReportController
package com.example.payment.controller;

import com.example.payment.utils.ExcelParser;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import javax.servlet.http.HttpServletResponse;
import java.io.InputStream;

/**
 * Reads a large Excel file in event mode. Created 2022/12/23.
 */
@Controller
@RequestMapping("/userReport")
@Slf4j
public class UserReportController {

    @GetMapping("/read")
    public void read(HttpServletResponse response) {
        log.info("[userReport-read] start");
        // demo.xlsx is loaded from the classpath (the resources directory)
        try (InputStream is = this.getClass().getClassLoader().getResourceAsStream("demo.xlsx")) {
            ExcelParser excelParser = new ExcelParser();
            excelParser.parse(is);
        } catch (Exception e) {
            log.error("[userReport-read] error", e);
        }
    }
}

5.4 Summary

Comparing the two modes shows that user mode reads Excel with simpler code, but its CPU and memory usage are poor when reading large files. Event mode takes more cumbersome code, but it has the advantage in CPU and memory usage when reading large files.

Origin blog.csdn.net/Java__EE/article/details/128412444