How to Improve the Efficiency of Reading Excel: NPOI Multi-threaded Reading in Detail

How to Improve the Efficiency of Reading Excel

When an Excel file holds a large amount of data, for example hundreds of thousands of rows, reading it can become very slow. Here are some suggestions to improve the efficiency of reading Excel:

1. Select the appropriate Excel library

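Choosing an appropriate library can significantly improve reading efficiency. Some commonly used Excel libraries include:

  • EPPlus : A library for reading and writing .xlsx files. Versions up to 4.x are open source (LGPL); version 5 and later need a commercial license for commercial use
  • NPOI : An open source .NET port of Apache POI that reads and writes both .xls and .xlsx files
  • Microsoft.Office.Interop.Excel : The COM interop assembly officially provided by Microsoft. It automates an installed copy of Excel, so Excel must be present on the machine, and it is usually the slowest option for bulk reads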

2. Using cell ranges

When reading Excel data, do not fetch cells one at a time or row by row; request a whole cell range in a single call instead. This reduces the number of calls made into the Excel library and therefore improves reading efficiency.

Here is an example using a range of cells:

using System.IO;
using OfficeOpenXml;

using (var package = new ExcelPackage(new FileInfo(filePath)))
{
    ExcelWorksheet worksheet = package.Workbook.Worksheets[worksheetName];
    var range = worksheet.Cells[2, 1, 50000, 20]; // select the cell range covering rows 2-50000, columns 1-20
    var data = range.Value; // read all values in the range with one call
}
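Once the range has been read, the values still have to be walked. The sketch below shows one way to turn the block of values into per-row string arrays. It assumes that EPPlus returns an object[,] from Value for a multi-cell range and that empty cells come back as null; RangeValues and ToRows are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class RangeValues
{
    // Converts a 2-D block of cell values into one string array per row.
    // Empty cells (null) are mapped to an empty string.
    public static List<string[]> ToRows(object[,] values)
    {
        int rowCount = values.GetLength(0);
        int colCount = values.GetLength(1);
        var rows = new List<string[]>(rowCount);

        for (int r = 0; r < rowCount; r++)
        {
            var row = new string[colCount];
            for (int c = 0; c < colCount; c++)
            {
                row[c] = values[r, c]?.ToString() ?? string.Empty;
            }
            rows.Add(row);
        }
        return rows;
    }
}
```

With the example above, a call might look like RangeValues.ToRows((object[,])range.Value). The cast fails for a single-cell range, where Value holds the cell value itself.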

3. Use multithreading

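Using multiple threads can improve efficiency when reading Excel files. For example, one thread can read the Excel file while another thread processes the data that has already been read. Any thread that touches the package must finish before the package is disposed.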

Here is an example using multithreading:

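using (var package = new ExcelPackage(new FileInfo(filePath)))
{
    ExcelWorksheet worksheet = package.Workbook.Worksheets[worksheetName];
    var range = worksheet.Cells[2, 1, 50000, 20]; // select the cell range covering rows 2-50000, columns 1-20

    // Read the data in the cell range on a separate thread
    var thread = new Thread(() =>
    {
        var data = range.Value; // read the data in the cell range
        // process the data here
    });
    thread.Start();

    // The calling thread can do other work here.
    thread.Join(); // wait for the worker before the using block disposes the package
}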
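The "one thread reads, another processes" idea can be sketched without any Excel library at all. The following is a minimal producer/consumer sketch using BlockingCollection from the .NET base class library; the rows parameter stands in for rows pulled from a worksheet, and ReadProcessPipeline, Run, and processRow are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ReadProcessPipeline
{
    // Runs a reader and a processor concurrently, connected by a bounded queue,
    // and returns the sum of the per-row results.
    public static int Run(IEnumerable<string[]> rows, Func<string[], int> processRow)
    {
        using (var queue = new BlockingCollection<string[]>(boundedCapacity: 1000))
        {
            // Producer: the only place that would touch the Excel library.
            var reader = Task.Run(() =>
            {
                try
                {
                    foreach (var row in rows) queue.Add(row);
                }
                finally
                {
                    queue.CompleteAdding(); // lets the consumer loop end
                }
            });

            // Consumer: works on plain arrays, never on workbook objects.
            var processor = Task.Run(() =>
            {
                int total = 0;
                foreach (var row in queue.GetConsumingEnumerable())
                {
                    total += processRow(row);
                }
                return total;
            });

            Task.WaitAll(reader, processor);
            return processor.Result;
        }
    }
}
```

Keeping all library calls on the reading side means the worker never shares workbook objects across threads, and the bounded queue keeps memory use in check when reading is faster than processing.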

4. Close the Excel application

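After reading the Excel file, make sure to release whatever was opened. With Microsoft.Office.Interop.Excel this means quitting the Excel application; otherwise the Excel process may remain in the background, resulting in high system memory usage. EPPlus does not start an Excel process, so disposing the ExcelPackage is enough.

Here is an example of releasing the package with EPPlus:

using (var package = new ExcelPackage(new FileInfo(filePath)))
{
    ExcelWorksheet worksheet = package.Workbook.Worksheets[worksheetName];
    var range = worksheet.Cells[2, 1, 50000, 20]; // select the cell range covering rows 2-50000, columns 1-20
    var data = range.Value; // read the data in the cell range

    // No explicit package.Dispose() call is needed:
    // the using block disposes the package when it exits.
}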

These are some suggestions for improving the efficiency of reading Excel. Which ones pay off depends on the size and data structure of the Excel file.

The following is an example of reading Excel with NPOI using multiple threads:

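using System.Collections.Generic;
using System.Threading.Tasks;
using NPOI.SS.UserModel;
using NPOI.SS.Util;
using NPOI.XSSF.UserModel;

// ...

public void ReadExcelWithMultipleThreads(string filePath, string worksheetName)
{
    var workbook = new XSSFWorkbook(filePath);
    var worksheet = workbook.GetSheet(worksheetName);
    // Get the cell range: data rows start at row 1, row 0 is assumed to be a header row
    var range = new CellRangeAddress(1, worksheet.LastRowNum, 0, worksheet.GetRow(0).LastCellNum - 1);

    // Split the cell range into chunks of 1000 rows each.
    // Note: Chunks() may not exist on CellRangeAddress in your NPOI version;
    // if it does not compile, use the manual partitioning shown later in this article.
    var chunks = range.Chunks(1000);
    var tasks = new List<Task>();

    foreach (var chunk in chunks)
    {
        var task = Task.Run(() =>
        {
            for (var i = chunk.FirstRow; i <= chunk.LastRow; i++)
            {
                var row = worksheet.GetRow(i);
                if (row == null) continue; // skip rows that were never created

                for (var j = chunk.FirstColumn; j <= chunk.LastColumn; j++)
                {
                    var cell = row.GetCell(j);
                    if (cell == null) continue; // skip empty cells

                    var cellValue = cell.ToString();
                    // process the cell data here
                }
            }
        });
        tasks.Add(task);
    }

    // Wait for every chunk to finish before closing the workbook
    Task.WaitAll(tasks.ToArray());

    workbook.Close();
}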

In this example, the CellRangeAddress.Chunks() method is used to divide the range of cells into chunks of 1000 rows each. Then, the Task.Run() method queues the read operation for each chunk on the thread pool. Finally, the Task.WaitAll() method blocks until all tasks have completed, after which the Excel workbook is closed. NPOI does not document its workbook objects as thread-safe, so validate the results of concurrent reads before relying on them.

Note that in this example, only one worksheet is used. If multiple worksheets are to be read at the same time, it will need to be modified as needed.

CellRangeAddress is in the NPOI.SS.Util namespace.

The CellRangeAddress.Chunks() method is reported to be available in NPOI 2.5.x and above, and it does not exist in earlier versions. Check that your version actually provides it before relying on it. If it is missing, you can partition the range manually or upgrade NPOI. Either way, the goal is the same: divide the cell range into multiple blocks so that the Excel file can be read efficiently in a multi-threaded environment.

The following code implements the chunking manually to achieve the same effect as the Chunks() method:

using System.Collections.Generic;
using NPOI.SS.Util;
using NPOI.XSSF.UserModel;

// ...

public void ReadExcelManually(string filePath, string worksheetName)
{
    var workbook = new XSSFWorkbook(filePath);
    var worksheet = workbook.GetSheet(worksheetName);
    // Get the cell range: data rows start at row 1, row 0 is assumed to be a header row
    var range = new CellRangeAddress(1, worksheet.LastRowNum, 0, worksheet.GetRow(0).LastCellNum - 1);
    var chunkSize = 1000; // number of rows in each chunk
    var chunks = new List<CellRangeAddress>();

    // Split the cell range into chunks of chunkSize rows each; the last chunk may be shorter
    for (var i = range.FirstRow; i <= range.LastRow; i += chunkSize)
    {
        var firstRow = i;
        var lastRow = i + chunkSize - 1;
        if (lastRow > range.LastRow) lastRow = range.LastRow;
        chunks.Add(new CellRangeAddress(firstRow, lastRow, range.FirstColumn, range.LastColumn));
    }

    foreach (var chunk in chunks)
    {
        for (var i = chunk.FirstRow; i <= chunk.LastRow; i++)
        {
            var row = worksheet.GetRow(i);
            if (row == null) continue; // skip rows that were never created

            for (var j = chunk.FirstColumn; j <= chunk.LastColumn; j++)
            {
                var cell = row.GetCell(j);
                if (cell == null) continue; // skip empty cells

                var cellValue = cell.ToString();
                // process the cell data here
            }
        }
    }

    workbook.Close();
}

In this example, we manually divide the range of cells into chunks of 1000 rows each, then use nested loops to read the cells one by one and perform the necessary operations on each cell. The chunks are processed one after another on the calling thread. Manual chunking requires more code, but it works in earlier versions of NPOI.

You may need to adjust the code to your specific situation. If your data is laid out differently from the example, for instance with no header row or with data starting in a different column, modify the block size or the loop indexes accordingly.
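The partitioning arithmetic is independent of NPOI, so it can be isolated and checked on its own. Below is a minimal sketch, assuming inclusive row indexes as in the example above; RowChunker, Split, and SumInParallel are hypothetical names, and the int array stands in for values already extracted from the worksheet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class RowChunker
{
    // Splits the inclusive row range [firstRow, lastRow] into consecutive
    // blocks of at most chunkSize rows; the last block may be shorter.
    public static List<(int First, int Last)> Split(int firstRow, int lastRow, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var chunks = new List<(int First, int Last)>();
        for (int start = firstRow; start <= lastRow; start += chunkSize)
        {
            int end = Math.Min(start + chunkSize - 1, lastRow);
            chunks.Add((start, end));
        }
        return chunks;
    }

    // Processes each block on the thread pool and adds up the per-block sums.
    public static long SumInParallel(int[] rowValues, int chunkSize)
    {
        var tasks = Split(0, rowValues.Length - 1, chunkSize)
            .Select(chunk => Task.Run(() =>
            {
                long partial = 0;
                for (int i = chunk.First; i <= chunk.Last; i++) partial += rowValues[i];
                return partial;
            }))
            .ToArray();

        Task.WaitAll(tasks);
        return tasks.Sum(t => t.Result);
    }
}
```

Each task keeps its own partial result and the results are combined only after Task.WaitAll, so no locking is needed. The same shape applies when the per-chunk work is parsing or validating cell values rather than summing.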

Origin blog.csdn.net/Documentlv/article/details/130752174