POI memory overflow analysis and solution

When using POI for Excel operations, memory overflow exceptions often occur once the amount of data grows large. Below we analyze the cause of this problem and how to solve it.

1. POI structure diagram

[Figure: POI component structure diagram]

2. The memory overflow problem

In our project, writing 200,000 rows of data to Excel caused a memory overflow. The usual remedy is to increase Tomcat's heap, but even at 2048 MB the error was still reported, so we analyzed the real cause. Reading the POI source shows that HSSFWorkbook parses the entire workbook stream from an InputStream into a list of records, and each sheet then stores every HSSFRow in a TreeMap; with a large amount of data this inevitably exhausts the heap. The relevant constructor follows:

  public HSSFWorkbook(DirectoryNode directory, boolean preserveNodes)
            throws IOException
    {
        super(directory);
        String workbookName = getWorkbookDirEntryName(directory);

        this.preserveNodes = preserveNodes;

        // If we're not preserving nodes, don't track the
        //  POIFS any more
        if(! preserveNodes) {
            clearDirectory();
        }

        _sheets = new ArrayList<HSSFSheet>(INITIAL_CAPACITY);
        names  = new ArrayList<HSSFName>(INITIAL_CAPACITY);

        // Grab the data from the workbook stream, however
        //  it happens to be spelled.
        InputStream stream = directory.createDocumentInputStream(workbookName);

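        // every record in the workbook stream is materialized in memory at once here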
        List<Record> records = RecordFactory.createRecords(stream);

        workbook = InternalWorkbook.createWorkbook(records);
        setPropertiesFromWorkbook(workbook);
        int recOffset = workbook.getNumRecords();

        // convert all LabelRecord records to LabelSSTRecord
        convertLabelRecords(records, recOffset);
        RecordStream rs = new RecordStream(records, recOffset);
        while (rs.hasNext()) {
            try {
                InternalSheet sheet = InternalSheet.createSheet(rs);
                _sheets.add(new HSSFSheet(this, sheet));
            } catch (UnsupportedBOFType eb) {
                // Hopefully there's a supported one after this!
                log.log(POILogger.WARN, "Unsupported BOF found of type " + eb.getType());
            }
        }

        for (int i = 0 ; i < workbook.getNumNames() ; ++i){
            NameRecord nameRecord = workbook.getNameRecord(i);
            HSSFName name = new HSSFName(this, nameRecord, workbook.getNameCommentRecord(nameRecord));
            names.add(name);
        }
    }

Each sheet then accumulates its rows through HSSFSheet.addRow():

    /**
     * add a row to the sheet
     *
     * @param addLow whether to add the row to the low level model - false if its already there
     */

    private void addRow(HSSFRow row, boolean addLow) {
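        // _rows is a TreeMap<Integer, HSSFRow>; rows only accumulate here and are never evicted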
        _rows.put(Integer.valueOf(row.getRowNum()), row);
        if (addLow) {
            _sheet.addRow(row.getRowRecord());
        }
        boolean firstRow = _rows.size() == 1;
        if (row.getRowNum() > getLastRowNum() || firstRow) {
            _lastrow = row.getRowNum();
        }
        if (row.getRowNum() < getFirstRowNum() || firstRow) {
            _firstrow = row.getRowNum();
        }
    }

The Excel data rows read into memory are therefore stored in a TreeMap that retains every row for the lifetime of the workbook.
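
To make the cost concrete, here is a minimal sketch in plain Java (not POI source; the class name RowMapSketch is illustrative) of how an ever-growing TreeMap of rows consumes memory the way HSSFSheet's _rows map does:

import java.util.TreeMap;

public class RowMapSketch {

    public static void main(String[] args) {
        // mirrors HSSFSheet._rows: one entry per row, nothing is ever evicted
        TreeMap<Integer, String[]> rows = new TreeMap<Integer, String[]>();
        for (int rownum = 0; rownum < 200000; rownum++) {
            String[] cells = new String[10]; // stands in for the row's cell records
            for (int c = 0; c < cells.length; c++) {
                cells[c] = "r" + rownum + "c" + c;
            }
            rows.put(Integer.valueOf(rownum), cells);
        }
        // on a small heap this alone can trigger OutOfMemoryError,
        // which is exactly the failure mode seen with HSSF
        System.out.println("rows held in memory: " + rows.size());
    }
}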


3. Solutions

The POI website provides a way to write large batches of data: the streaming SXSSFWorkbook class. Using it for bulk writes solves the problem, because only a sliding window of rows is kept in memory while older rows are flushed to a temporary file. If we monitor this sample, the heap profile is a sawtooth: flushed rows are reclaimed promptly by the garbage collector, so overall memory usage stays relatively flat. The sample below is adapted from the POI site:

package org.bird.poi;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URL;

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.util.CellReference;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import org.junit.Assert;

public class XSSFWriter {

	private static SXSSFWorkbook wb;

	public static void main(String[] args) throws IOException {
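        // keep a sliding window of at most 10,000 rows in memory;
        // older rows are flushed to a temporary file on disk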
        wb = new SXSSFWorkbook(10000);
        Sheet sh = wb.createSheet();  
        for(int rownum = 0; rownum < 100000; rownum++){  
            Row row = sh.createRow(rownum);  
            for(int cellnum = 0; cellnum < 10; cellnum++){  
                Cell cell = row.createCell(cellnum);  
                String address = new CellReference(cell).formatAsString();  
                cell.setCellValue(address);  
            }  
  
        }  
  
        // Rows with rownum < 90000 have been flushed to disk and are no longer accessible
        for(int rownum = 0; rownum < 90000; rownum++){  
          Assert.assertNull(sh.getRow(rownum));  
        }  
  
        // the last 10000 rows are still in memory
        for(int rownum = 90000; rownum < 100000; rownum++){  
            Assert.assertNotNull(sh.getRow(rownum));  
        }  
        URL url = XSSFWriter.class.getClassLoader().getResource("");
          
        FileOutputStream out = new FileOutputStream(url.getPath() + File.separator + "writer.xlsx");
        wb.write(out);  
        out.close();  
  
        // dispose of temporary files backing this workbook on disk  
        wb.dispose();  
	}
}
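
For finer-grained control, the POI documentation also describes disabling the automatic window and flushing rows explicitly. The sketch below shows that pattern; the class name ManualFlushWriter and the output file name are illustrative, and the code assumes a POI version where SXSSFWorkbook(-1) and SXSSFSheet.flushRows(int) are available:

package org.bird.poi;

import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.xssf.streaming.SXSSFSheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

public class ManualFlushWriter {

    public static void main(String[] args) throws IOException {
        SXSSFWorkbook wb = new SXSSFWorkbook(-1); // -1 turns off auto-flushing
        SXSSFSheet sh = (SXSSFSheet) wb.createSheet();
        for (int rownum = 0; rownum < 100000; rownum++) {
            Row row = sh.createRow(rownum);
            for (int cellnum = 0; cellnum < 10; cellnum++) {
                row.createCell(cellnum).setCellValue("r" + rownum + "c" + cellnum);
            }
            if (rownum % 10000 == 0) {
                sh.flushRows(100); // keep only the last 100 rows in memory
            }
        }
        FileOutputStream out = new FileOutputStream("manual-flush.xlsx");
        wb.write(out);
        out.close();

        // dispose of temporary files backing this workbook on disk
        wb.dispose();
    }
}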
