MiniExcel: an open source .NET framework for working with Excel efficiently and with low memory use

This article was first published on the WeChat official account 「编程乐趣」 (Programming Fun); you are welcome to follow it.

There are two main ways to work with Excel on the .NET platform. The first treats the Excel file as a database and reads and manipulates it through OleDb; the second calls Excel's COM components. Each approach has its own characteristics.

Today I will introduce a third approach: a plug-in library. Most current mainstream frameworks load all the data into memory to make it easy to work with, which causes memory consumption problems. MiniExcel instead builds its underlying logic around Stream, which can reduce memory usage from more than 1,000 MB to a few MB and avoid running out of memory.

MiniExcel is a simple, efficient .NET tool for querying, writing, and filling Excel data while avoiding OOM.
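The difference between loading everything and streaming can be sketched in plain C#. This is not MiniExcel's internal code: `StreamingSketch` and `ReadRows` are hypothetical names, and `ReadRows` only stands in for a row source to show that a lazily evaluated `IEnumerable` produces rows on demand.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StreamingSketch
{
    // Counts how many rows the source has actually produced so far.
    public static int RowsProduced;

    // Hypothetical row source: yields one row at a time instead of building a list.
    public static IEnumerable<string[]> ReadRows(int rowCount, int columnCount)
    {
        for (var r = 0; r < rowCount; r++)
        {
            RowsProduced++;
            yield return Enumerable.Repeat("HelloWorld", columnCount).ToArray();
        }
    }

    public static void Main()
    {
        // Streaming: Take(5) stops the source after five rows.
        RowsProduced = 0;
        var firstFive = ReadRows(1_000_000, 10).Take(5).ToList();
        Console.WriteLine($"kept {firstFive.Count}, produced {RowsProduced}");

        // Materializing: ToList() forces every row into memory first.
        RowsProduced = 0;
        var all = ReadRows(100_000, 10).ToList();
        Console.WriteLine($"kept {all.Count}, produced {RowsProduced}");
    }
}
```

In the streaming case only five rows are ever created, while the materializing case creates all 100,000 before anything else can happen.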

Features

  • Low memory consumption; avoids OOM and frequent Full GC

  • Supports operating on each row of data in real time

  • Supports LINQ deferred execution, enabling low-consumption, fast paging and other complex queries

  • Lightweight: no need to install Microsoft Office or COM+, and the DLL is smaller than 150 KB

  • Easy-to-use API style

Performance comparison and tests

Import/query Excel comparison

Logic: use Test1,000,000x10.xlsx as the benchmark file for performance tests against mainstream frameworks. It contains 1,000,000 rows * 10 columns of "HelloWorld", and the file size is 23 MB.

Export/create Excel comparison

Logic: create an Excel file containing 10 million "HelloWorld" values.

Usage examples

1. Read/import Excel

1.1 Query Excel and return strongly typed IEnumerable data

public class UserAccount
{
    public Guid ID { get; set; }
    public string Name { get; set; }
    public DateTime BoD { get; set; }
    public int Age { get; set; }
    public bool VIP { get; set; }
    public decimal Points { get; set; }
}

var rows = MiniExcel.Query<UserAccount>(path);
// or

using (var stream = File.OpenRead(path))
{
    // The query is lazy, so enumerate rows inside this block while the stream is open.
    var rows = stream.Query<UserAccount>();
}
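Because the stream query is evaluated lazily, the rows have to be consumed while the stream is still open. One way to keep that guarantee is to wrap the query in an iterator method, sketched below. `QuerySketch` and `StreamAccounts` are hypothetical names, and `UserAccount` is trimmed to two properties to keep the example short.

```csharp
using System.Collections.Generic;
using System.IO;
using MiniExcelLibs;

public class UserAccount
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class QuerySketch
{
    // The using block lives inside the iterator, so the file stays open
    // until the caller finishes (or abandons) the enumeration.
    public static IEnumerable<UserAccount> StreamAccounts(string path)
    {
        using (var stream = File.OpenRead(path))
        {
            foreach (var row in stream.Query<UserAccount>())
            {
                yield return row;
            }
        }
    }
}
```

A caller can then write `foreach (var account in QuerySketch.StreamAccounts(path))` without holding the whole sheet in memory and without touching a disposed stream.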
 
 

1.2 Query supports deferred execution and can be combined with LINQ First/Take/Skip for low-consumption, high-efficiency complex queries

var row = MiniExcel.Query(path).First();
Assert.Equal("HelloWorld", row.A);

// or

using (var stream = File.OpenRead(path))
{
    var row = stream.Query().First();
    Assert.Equal("HelloWorld", row.A);
}
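The same deferred behavior makes simple paging possible with Skip and Take. The helper below is a minimal sketch; `PagingSketch` and `ReadPage` are hypothetical names, not part of MiniExcel. Rows before the requested page still have to be read from the file, but they are not kept in memory.

```csharp
using System.Collections.Generic;
using System.Linq;
using MiniExcelLibs;

public static class PagingSketch
{
    // Returns one page of dynamic rows; pageIndex is zero-based.
    public static List<dynamic> ReadPage(string path, int pageIndex, int pageSize)
    {
        return MiniExcel.Query(path)
            .Skip(pageIndex * pageSize)   // rows before the page are read and discarded
            .Take(pageSize)               // reading stops once the page is full
            .ToList();                    // only this page is materialized
    }
}
```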

Efficiency comparison with other frameworks:

1.3 Disk cache for reading large files (Disk-Based Cache - SharedString)

Concept: when MiniExcel detects that a file's SharedString size exceeds 5 MB, it uses a local disk cache by default. For example, with 10x100000.xlsx (one million data items), maximum memory usage is about 195 MB when the disk cache is disabled and drops to 65 MB when it is enabled. Note that this optimization trades time for lower memory, so reading is slower: in this example, the reading time increases from 7.4 seconds to 27.2 seconds. If you do not need it, you can turn off the disk cache with the following code:

var config = new OpenXmlConfiguration { EnableSharedStringCache = false };
MiniExcel.Query(path, configuration: config);

You can also use SharedStringCacheSize to change the threshold: the sharedString file is cached on disk only when it exceeds the specified size.

var config = new OpenXmlConfiguration { SharedStringCacheSize=500*1024*1024 };
MiniExcel.Query(path, configuration: config);
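To see the time cost of the cache on your own files, a rough measurement along the following lines may help. This is only a sketch: `CacheTimingSketch` and `TimeFullRead` are hypothetical names, a single Stopwatch run is not a rigorous benchmark, and the cache only takes effect for files whose SharedString size exceeds the threshold.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using MiniExcelLibs;
using MiniExcelLibs.OpenXml;

public static class CacheTimingSketch
{
    // Reads every row once and returns the elapsed time for the given cache setting.
    public static TimeSpan TimeFullRead(string path, bool enableCache)
    {
        var config = new OpenXmlConfiguration { EnableSharedStringCache = enableCache };
        var watch = Stopwatch.StartNew();
        var rowCount = MiniExcel.Query(path, configuration: config).Count();
        watch.Stop();
        Console.WriteLine($"cache={enableCache}, rows={rowCount}, elapsed={watch.Elapsed}");
        return watch.Elapsed;
    }
}
```

Calling `TimeFullRead(path, true)` and `TimeFullRead(path, false)` on the same large file gives a first impression of the trade-off described above.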

2. Write/export Excel

  1. The data type must be a non-abstract class with a public parameterless constructor

  2. MiniExcel SaveAs supports deferred execution for IEnumerable parameters; unless necessary, do not use methods such as ToList, which read all data into memory

2.1 Supports collections of <anonymous type> or <strong type>
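The second point can be illustrated with an iterator that produces rows on demand and is passed straight to SaveAs. This is a minimal sketch: `OrderRow`, `LazyExportSketch`, `GenerateRows`, and `Export` are hypothetical names introduced only for illustration.

```csharp
using System.Collections.Generic;
using MiniExcelLibs;

public class OrderRow
{
    public int Id { get; set; }
    public string Product { get; set; }
}

public static class LazyExportSketch
{
    // Produces rows one at a time; nothing is stored in a list.
    public static IEnumerable<OrderRow> GenerateRows(int count)
    {
        for (var i = 1; i <= count; i++)
        {
            yield return new OrderRow { Id = i, Product = "HelloWorld" };
        }
    }

    public static void Export(string path, int count)
    {
        // Pass the lazy sequence directly; calling ToList() first
        // would hold every row in memory before writing starts.
        MiniExcel.SaveAs(path, GenerateRows(count));
    }
}
```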

var path = Path.Combine(Path.GetTempPath(), $"{Guid.NewGuid()}.xlsx");
MiniExcel.SaveAs(path, new[] {
    new { Column1 = "MiniExcel", Column2 = 1 },
    new { Column1 = "Github", Column2 = 2 }
});

2.2 IDataReader

  • Recommended, because it avoids loading all data into memory

    Recommended: the DataReader multi-sheet export method (using Dapper ExecuteReader is recommended)

using (var cnn = Connection)
{
    cnn.Open();
    var sheets = new Dictionary<string,object>();
    sheets.Add("sheet1", cnn.ExecuteReader("select 1 id"));
    sheets.Add("sheet2", cnn.ExecuteReader("select 2 id"));
    MiniExcel.SaveAs("Demo.xlsx", sheets);
}

3. Template filling Excel

  • Template declarations are similar to Vue templates: {{variable name}}, or {{collection name.field name}} for collection rendering

  • Collection rendering supports IEnumerable/DataTable/DapperRow

3.1 Basic filling

// 1. By POCO
var value = new
{
    Name = "Jack",
    CreateDate = new DateTime(2021, 01, 01),
    VIP = true,
    Points = 123
};
MiniExcel.SaveAsByTemplate(path, templatePath, value);
// 2. By Dictionary
var value = new Dictionary<string, object>()
{
    ["Name"] = "Jack",
    ["CreateDate"] = new DateTime(2021, 01, 01),
    ["VIP"] = true,
    ["Points"] = 123
};
MiniExcel.SaveAsByTemplate(path, templatePath, value);

3.2 Complex data filling

// 1. By POCO
var value = new
{
    title = "FooCompany",
    managers = new[] {
        new {name="Jack",department="HR"},
        new {name="Loan",department="IT"}
    },
    employees = new[] {
        new {name="Wade",department="HR"},
        new {name="Felix",department="HR"},
        new {name="Eric",department="IT"},
        new {name="Keaton",department="IT"}
    }
};
MiniExcel.SaveAsByTemplate(path, templatePath, value);
// 2. By Dictionary
var value = new Dictionary<string, object>()
{
    ["title"] = "FooCompany",
    ["managers"] = new[] {
        new {name="Jack",department="HR"},
        new {name="Loan",department="IT"}
    },
    ["employees"] = new[] {
        new {name="Wade",department="HR"},
        new {name="Felix",department="HR"},
        new {name="Eric",department="IT"},
        new {name="Keaton",department="IT"}
    }
};
MiniExcel.SaveAsByTemplate(path, templatePath, value);
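Since collection rendering also supports DataTable, the same fill can be driven from a table instead of an array. The sketch below assumes a template whose cells contain {{title}}, {{managers.name}}, and {{managers.department}}; `TemplateDataTableSketch`, `BuildManagers`, and `Fill` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Data;
using MiniExcelLibs;

public static class TemplateDataTableSketch
{
    // Builds the collection as a DataTable; column names match the template fields.
    public static DataTable BuildManagers()
    {
        var managers = new DataTable();
        managers.Columns.Add("name");
        managers.Columns.Add("department");
        managers.Rows.Add("Jack", "HR");
        managers.Rows.Add("Loan", "IT");
        return managers;
    }

    public static void Fill(string path, string templatePath)
    {
        var value = new Dictionary<string, object>
        {
            ["title"] = "FooCompany",
            ["managers"] = BuildManagers(),
        };
        MiniExcel.SaveAsByTemplate(path, templatePath, value);
    }
}
```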

4. Excel Column Attribute

4.1 Specify the column name, specify the column index, or ignore the column

public class ExcelAttributeDemo
{
    [ExcelColumnName("Column1")]
    public string Test1 { get; set; }
    [ExcelColumnName("Column2")]
    public string Test2 { get; set; }
    [ExcelIgnore]
    public string Test3 { get; set; }
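    [ExcelColumnIndex("I")] // "I" is converted automatically to column index 8
    public string Test4 { get; set; }
    public string Test5 { get; } // no setter, so this column is ignored
    public string Test6 { get; private set; } // non-public setter, so this column is ignored
    [ExcelColumnIndex(3)] // index starts at 0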
    public string Test7 { get; set; }
}
var rows = MiniExcel.Query<ExcelAttributeDemo>(path).ToList();
Assert.Equal("Column1", rows[0].Test1);
Assert.Equal("Column2", rows[0].Test2);
Assert.Null(rows[0].Test3);
Assert.Equal("Test7", rows[0].Test4);
Assert.Null(rows[0].Test5);
Assert.Null(rows[0].Test6);
Assert.Equal("Test4", rows[0].Test7); 
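The mapping from a column letter such as "I" to a zero-based index follows the usual base-26 spreadsheet naming. The helper below is a plain C# sketch of that arithmetic, not MiniExcel's own conversion code; `ColumnIndexSketch` and `ToZeroBasedIndex` are hypothetical names.

```csharp
using System;

public static class ColumnIndexSketch
{
    // Converts an Excel column name ("A", "I", "AA") to a zero-based index.
    public static int ToZeroBasedIndex(string columnName)
    {
        if (string.IsNullOrEmpty(columnName))
            throw new ArgumentException("Column name must not be empty.", nameof(columnName));

        var index = 0;
        foreach (var ch in columnName.ToUpperInvariant())
        {
            if (ch < 'A' || ch > 'Z')
                throw new ArgumentException("Column name must contain only letters A-Z.", nameof(columnName));

            // Base-26 positional value where A = 1 ... Z = 26.
            index = index * 26 + (ch - 'A' + 1);
        }
        return index - 1; // shift from one-based to zero-based
    }
}
```

With this arithmetic, "A" maps to 0, "D" to 3, "I" to 8, and "AA" to 26, which is why `[ExcelColumnIndex("I")]` and an index of 8 refer to the same column.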

4.2 DynamicColumnAttribute: set columns dynamically

var config = new OpenXmlConfiguration
{
    DynamicColumns = new DynamicExcelColumn[] {
        new DynamicExcelColumn("id") { Ignore = true },
        new DynamicExcelColumn("name") { Index = 1, Width = 10 },
        new DynamicExcelColumn("createdate") { Index = 0, Format = "yyyy-MM-dd", Width = 15 },
        new DynamicExcelColumn("point") { Index = 2, Name = "Account Point" },
    }
};
var path = PathHelper.GetTempPath();
var value = new[] { new { id = 1, name = "Jack", createdate = new DateTime(2022, 04, 12), point = 123.456 } };
MiniExcel.SaveAs(path, value, configuration: config);

Automatic Excel type detection

  • By default, MiniExcel uses the file extension to decide whether a file is xlsx or csv. This can be inaccurate, so specify the type yourself when needed.

  • For a Stream, MiniExcel cannot determine which Excel type the data comes from, so specify it yourself

stream.SaveAs(value, excelType: ExcelType.CSV);
//or
stream.SaveAs(value, excelType: ExcelType.XLSX);
//or
stream.Query(excelType: ExcelType.CSV);
//or
stream.Query(excelType: ExcelType.XLSX);
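Put together, writing and reading CSV through streams with an explicit type might look like the sketch below. `StreamTypeSketch` and `RoundTripCsv` are hypothetical names; the file streams are used only so that each stream has a clear lifetime.

```csharp
using System.Collections.Generic;
using System.IO;
using MiniExcelLibs;

public static class StreamTypeSketch
{
    // Writes two rows as CSV through a stream, then reads them back as dictionaries.
    public static List<IDictionary<string, object>> RoundTripCsv(string path)
    {
        var value = new[]
        {
            new { Column1 = "MiniExcel", Column2 = 1 },
            new { Column1 = "Github", Column2 = 2 },
        };

        using (var stream = File.Create(path))
        {
            // A stream carries no file extension, so the type is stated explicitly.
            stream.SaveAs(value, excelType: ExcelType.CSV);
        }

        var rows = new List<IDictionary<string, object>>();
        using (var stream = File.OpenRead(path))
        {
            // useHeaderRow: true uses the first line as the dictionary keys.
            foreach (IDictionary<string, object> row in stream.Query(useHeaderRow: true, excelType: ExcelType.CSV))
            {
                rows.Add(row);
            }
        }
        return rows;
    }
}
```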


Origin blog.csdn.net/daremeself/article/details/125802311