Writing the same Excel file from multiple threads (export)

Today is 2018.03.22, and it has been a long time since I last updated the blog. It has been a busy stretch, and I have gained a lot from it. I recently spent a long time on an Excel export task, and I want to summarize the pitfalls I ran into.

---------------------------------------- Separator ----------------------------------------

 

Foreword: For Java developers, POI is currently the most popular tool for reading and writing Microsoft Office files. But when a single thread has to write a large amount of data, or when each write is slow (such as writing pictures into Excel), an export is often too slow and times out. This article uses a multi-threaded export into one Excel file (the same sheet) as an example of how to improve efficiency.

 

1. Threads and tasks
Thread pool: manages threads, including assigning tasks to them and recycling them.
Task: the unit of work that a thread processes. When submitting tasks, make sure that no two tasks cover the same work.
Thread: the executor of tasks.
(Note: tasks and threads must be distinguished! After finishing one task, a thread goes on to execute tasks that have not been processed yet. Submitting a task to the thread pool does not mean that a thread will execute it immediately.)
How do we make sure that all tasks have finished before writing to the IO stream?
Use a synchronization utility class, CountDownLatch, to keep count. Initialize it with the number of tasks and call countDown once at the end of each task. A call to await then blocks the calling thread until every task has counted down, so the code after await only runs once all tasks are done.
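
The points above can be sketched without POI. This is only a minimal illustration, not the export code itself: the class name LatchDemo, the method runTasks, and the pool and task sizes are all made up for the example. It submits more tasks than there are threads, records which threads ran them, and uses a CountDownLatch so that the caller only continues once every task has finished.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; none of these names come from the original export code.
class LatchDemo {

    /** Runs taskCount tasks on poolSize threads; returns {tasks completed, distinct threads used}. */
    static int[] runTasks(int poolSize, int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch latch = new CountDownLatch(taskCount); // one count per task
        AtomicInteger completed = new AtomicInteger();
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        try {
            for (int i = 0; i < taskCount; i++) {
                pool.submit(() -> {
                    try {
                        // The "work": remember which thread ran this task.
                        threadNames.add(Thread.currentThread().getName());
                        completed.incrementAndGet();
                    } finally {
                        latch.countDown(); // count down even if the work throws
                    }
                });
            }
            latch.await(); // blocks the caller until all tasks have counted down
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } finally {
            pool.shutdown();
        }
        return new int[] {completed.get(), threadNames.size()};
    }

    public static void main(String[] args) {
        int[] result = runTasks(4, 20);
        System.out.println("tasks completed: " + result[0] + ", threads used: " + result[1]);
    }
}
```

With 4 threads and 20 tasks, all 20 tasks complete but no more than 4 threads are ever used, which is the task/thread distinction from the note above: each thread picks up further tasks after finishing one.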

 

2. Thought process
In POI, an Excel file is represented by a Workbook, and a Workbook contains multiple Sheets. Each Sheet has Rows, and each Row has Cells.
First, prepare all the data to be written into the Sheet, split by row: build the data set to be filled, where each element is the object for one row (code omitted).
Then create a thread pool to manage threads and assign tasks:

ExecutorService es = Executors.newFixedThreadPool(40);
// Counter for finished tasks; its initial value is the size of the data set to be filled
CountDownLatch latch = new CountDownLatch(results.size());
for (int i = 0; i < results.size(); i++) {
    ExcelResultVo excelResultVo = results.get(i);
    es.submit(new Runnable() {
        @Override
        public void run() {
            try {
                PoiWrite.writeData(excelResultVo, wb, sheet, patriarch, styleContent);
            } finally {
                // Count down once per task, even if writeData throws; otherwise await() would block forever
                latch.countDown();
            }
        }
    });
}
latch.await(); // Block until the counter reaches 0, then let the main thread continue
es.shutdown(); // Shut down the thread pool

Because all tasks operate on the same sheet, look at what the sheet does internally when a row is added:

public void insertRow(RowRecord row) {
		// Integer integer = Integer.valueOf(row.getRowNumber());
		_rowRecords.put(Integer.valueOf(row.getRowNumber()), row);
		// Clear the cached values
		_rowRecordValues = null;
		if ((row.getRowNumber() < _firstrow) || (_firstrow == -1)) {
			_firstrow = row.getRowNumber();
		}
		if ((row.getRowNumber() > _lastrow) || (_lastrow == -1)) {
			_lastrow = row.getRowNumber();
		}
	}

_rowRecords is a Map<Integer, RowRecord>, and this Map is not thread-safe, so row creation needs to be locked:

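// Despite the name, this creates the row; synchronized lets only one thread create a row at a time
public static synchronized HSSFRow getRow(HSSFSheet sheet, int rownum) {
    return sheet.createRow(rownum);
}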
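
The same idea can be tried out without POI. The sketch below is an assumption-laden stand-in, not POI's code: RowRegistryDemo, insertRow, rowCount and fillConcurrently are hypothetical names, and a plain TreeMap plays the role of the sheet's internal row map. Every access to the map goes through a synchronized method, so concurrent writers cannot corrupt it.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a sheet; it does not use or reproduce POI classes.
class RowRegistryDemo {

    // Plays the role of the sheet's row map. TreeMap is not thread-safe.
    private final Map<Integer, String> rows = new TreeMap<>();

    // All writers go through this lock, like the synchronized helper above.
    synchronized void insertRow(int rowNum, String value) {
        rows.put(rowNum, value);
    }

    synchronized int rowCount() {
        return rows.size();
    }

    /** Inserts rowTotal distinct rows from a pool of the given size; returns the final row count. */
    static int fillConcurrently(int threads, int rowTotal) {
        RowRegistryDemo registry = new RowRegistryDemo();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch latch = new CountDownLatch(rowTotal);
        try {
            for (int i = 0; i < rowTotal; i++) {
                final int rowNum = i; // each task owns exactly one row number
                pool.submit(() -> {
                    try {
                        registry.insertRow(rowNum, "row-" + rowNum);
                    } finally {
                        latch.countDown();
                    }
                });
            }
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while filling rows", e);
        } finally {
            pool.shutdown();
        }
        return registry.rowCount();
    }
}
```

One difference worth noting: the helper above is static synchronized, so it locks on the class and serializes row creation for every sheet, while this sketch locks per instance. Either way only the short map update is serialized; preparing each row's data can still run in parallel.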

In addition: writing a picture into Excel requires the picture's byte array. (Converting a picture URL into a byte array is also time-consuming; you can do that conversion with multiple threads beforehand and store the result as an attribute of the row data.) Under the hood this in effect does a createRow first and then writes the picture into the specified cells. The general syntax:

// Anchor the picture to the cell range (col1, row1) to (col2, row2); this is in effect a createRow
HSSFClientAnchor anchor = new HSSFClientAnchor(0, 0, 0, 0, (short) col1, row1, (short) col2, row2);
HSSFPicture pict = patriarch.createPicture(anchor, wb.addPicture(imgData, HSSFWorkbook.PICTURE_TYPE_JPEG));
pict.resize(1.0, 1.0);

Therefore, the method that writes pictures also needs to be synchronized, otherwise an error is reported. (A lesson learned the hard way.)

Test result: exporting 1000 records (including 1100 pictures) with a single thread takes more than 5 minutes and easily times out, while the multi-threaded version takes less than 20s!

Note: Pay close attention to thread safety whenever multiple threads operate on the same object. For example, if multiple threads add to the same List, use a thread-safe List such as CopyOnWriteArrayList, or wrap the original ArrayList with Collections.synchronizedList, to avoid thread-safety problems.
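
A small sketch of the List advice, with made-up names (SafeListDemo, collect). It uses a parallel stream only as a quick way to get several writer threads; that is a substitution for the thread pool used earlier in this article, not what the export code does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the export code.
class SafeListDemo {

    /** Adds count integers to one shared list from multiple threads; returns the final size. */
    static int collect(int count) {
        // The wrapper synchronizes every add(), so concurrent writers cannot corrupt the ArrayList.
        List<Integer> results = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, count).parallel().forEach(results::add);
        return results.size();
    }

    public static void main(String[] args) {
        System.out.println("elements collected: " + collect(10_000));
    }
}
```

The insertion order is not guaranteed, only that no element is lost. Iterating over a synchronizedList still requires synchronizing on the list manually. CopyOnWriteArrayList copies its backing array on every write, so it fits read-heavy lists better than a list that many tasks append to.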

 

Reflection: I used to understand multithreading only as a concept, with little practice. Now I have a more intuitive picture of thread safety: do different threads work on different resources (tasks), with no case of multiple threads repeatedly executing the same resource? In this article's example, the data to be filled into the same sheet is first divided by Row, each task specifies the position of its Row, and each thread fills its row data into the specified position without any row being filled twice. In this way, multiple threads execute the small tasks split out of one big task, which saves time.
