A summary of third-party Python libraries for reading and writing Excel files


Introduction to the common libraries

xlrd

xlrd is a library for reading data and formatting information from Excel files. At the time of these tests it supported both .xls and .xlsx files (xlrd 2.0 and later read only .xls).
http://xlrd.readthedocs.io/en/latest/
1. xlrd supports reading .xls and .xlsx files.
2. Passing on_demand=True to the open_workbook() function loads only the sheets that are actually requested, which saves time and memory (this has no effect for .xlsx files).
3. The xlrd.Book object has an unload_sheet method that releases a worksheet from memory; the worksheet is identified by its index or its name (this has no effect for .xlsx files).
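
As a minimal sketch of points 2 and 3, the helper below opens a workbook with on_demand=True, reads one sheet, and unloads it again. The function name read_sheet_on_demand is hypothetical, and xlrd is imported inside the function only so that the snippet can be loaded on a machine without xlrd installed.

```python
def read_sheet_on_demand(path, sheet_index=0):
    """Return the rows of one worksheet, loading only that sheet."""
    import xlrd  # third-party; imported lazily (an assumption of this sketch)

    book = xlrd.open_workbook(path, on_demand=True)
    try:
        # With on_demand=True the sheet is parsed here, not in open_workbook()
        sheet = book.sheet_by_index(sheet_index)
        rows = [sheet.row_values(r) for r in range(sheet.nrows)]
        # Release the memory held by this sheet once its data has been copied
        book.unload_sheet(sheet_index)
        return rows
    finally:
        # Close the underlying file resources held by the on-demand workbook
        book.release_resources()
```

Calling read_sheet_on_demand on an .xls workbook with several sheets should parse only the sheet at the given index; the other sheets are never loaded.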

xlwt

xlwt is a library for writing data and formatting information to Excel files in the older .xls format.
https://xlwt.readthedocs.io/en/latest/
1. xlwt supports writing .xls files.

xlutils

xlutils is a library of utilities for working with Excel files; it depends on xlrd and xlwt.
http://xlutils.readthedocs.io/en/latest/
1. xlutils supports .xls files.
2. Supports Excel operations.

xlwings

xlwings is a library that lets you call Python from Excel and drive Excel from Python.
http://docs.xlwings.org/en/stable/index.html
1. xlwings supports reading .xls files and reading and writing .xlsx files.
2. Supports Excel operations.
3. Supports VBA.
4. Its powerful converters handle most data types in both directions, including NumPy arrays and pandas DataFrames.

openpyxl

openpyxl is a library for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files.
https://openpyxl.readthedocs.io/en/stable/
1. openpyxl supports reading and writing .xlsx files.
2. Supports Excel operations.
3. Large .xlsx files can be loaded in read_only mode.
4. Large .xlsx files can be written in write_only mode.
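
Points 3 and 4 can be illustrated with a small round trip. This is only a sketch: the function names write_large and count_cells and the sheet title are made up for the example, and openpyxl is imported inside the functions so the snippet loads even where openpyxl is missing.

```python
def write_large(path, rows, cols, value=65536):
    """Stream rows into a new .xlsx file using write_only mode."""
    from openpyxl import Workbook  # third-party; imported lazily

    wb = Workbook(write_only=True)
    ws = wb.create_sheet(title="data")  # write_only workbooks start with no sheet
    row = [value] * cols
    for _ in range(rows):
        ws.append(row)  # write_only sheets accept data only through append()
    wb.save(path)


def count_cells(path):
    """Count the cells of the first sheet using read_only mode."""
    from openpyxl import load_workbook  # third-party; imported lazily

    wb = load_workbook(path, read_only=True)
    try:
        ws = wb.worksheets[0]
        # Rows are parsed lazily while the generator is consumed
        return sum(len(row) for row in ws.iter_rows(values_only=True))
    finally:
        wb.close()  # read_only mode keeps the file open until it is closed
```

Writing 20 rows of 30 columns with write_large and then calling count_cells on the same file should report 600 cells.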

xlsxwriter

xlsxwriter is a library for creating Excel .xlsx files.
https://xlsxwriter.readthedocs.io/
1. xlsxwriter supports writing .xlsx files.
2. Supports VBA.
3. Offers a memory-optimized mode for writing large .xlsx files.
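
The memory-optimized mode in point 3 is switched on with the constant_memory workbook option. The sketch below assumes that option; the function name write_constant_memory is hypothetical, and xlsxwriter is imported inside the function so the snippet loads without it.

```python
def write_constant_memory(path, rows, cols, value=65536):
    """Write a rows x cols sheet of integers with constant_memory enabled."""
    import xlsxwriter  # third-party; imported lazily

    workbook = xlsxwriter.Workbook(path, {"constant_memory": True})
    worksheet = workbook.add_worksheet()
    row = [value] * cols
    for r in range(rows):
        # In this mode a row is flushed once a later row is started,
        # so rows have to be written in increasing order.
        worksheet.write_row(r, 0, row)
    workbook.close()
```

The trade-off is that data must be written row by row in order; going back to an earlier row after moving on is not possible in this mode.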

win32com

win32com, which ships as part of the pywin32 package, is a library for reading, writing, and manipulating Excel files.
http://pythonexcels.com/python-excel-mini-cookbook/
1. win32com supports reading .xls and .xlsx files and writing .xlsx files.
2. Supports Excel operations.

DataNitro

DataNitro is a plug-in embedded in Microsoft Excel.
https://datanitro.com/docs/
1. DataNitro supports reading and writing .xls and .xlsx files.
2. Supports Excel operations.
3. Supports VBA.
4. It is a paid product.

pandas

pandas reads and writes Excel files as a means of data input and output.
http://pandas.pydata.org/
1. pandas supports reading and writing .xls and .xlsx files.
2. Supports loading just a single worksheet from a workbook.
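
A short sketch of point 2: read_excel takes a sheet_name argument, so a single worksheet can be loaded without touching the others. The helper names load_one_sheet and load_all_sheets are hypothetical, and pandas is imported inside the functions so the snippet loads without it.

```python
def load_one_sheet(path, sheet_name):
    """Load a single worksheet into a DataFrame."""
    import pandas as pd  # third-party; imported lazily

    return pd.read_excel(path, sheet_name=sheet_name)


def load_all_sheets(path):
    """Load every worksheet; returns a dict mapping sheet name to DataFrame."""
    import pandas as pd  # third-party; imported lazily

    # sheet_name=None asks pandas for all sheets at once
    return pd.read_excel(path, sheet_name=None)
```

For a workbook with sheets named Sheet1 to Sheet5, load_one_sheet(path, "Sheet3") returns one DataFrame, while load_all_sheets(path) returns a dict with five entries.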

Supported operations and required environment

Note: DataNitro runs only as a plug-in, so it depends on Excel itself being installed.
Reference: https://zhuanlan.zhihu.com/p/23998083

Reading and writing tests

Test machine hardware and operating system

Computer model: MSI MS-7846 desktop
Operating system: Windows 7 Ultimate 64-bit SP1 (DirectX 11)
Processor: Intel Pentium G3260 @ 3.30GHz, dual-core
Motherboard: MSI H81M-P32L (MS-7846) (Intel Haswell - Lynx Point)
Memory: 4 GB (Kingston DDR3 1600MHz)
Main hard drive: Western Digital WDC WD5000AZLX-00ZR6A0 (500 GB / 7200 rpm)
Graphics: Intel Haswell Integrated Graphics Controller (256 MB / MSI)

Test Case

Case 1. Read an entire .xls workbook (five worksheets, each with 2000 rows by 1200 columns of integers).
Case 2. Read an entire .xlsx workbook (five worksheets, each with 2000 rows by 1200 columns of integers).
Case 3. Read an entire .xls workbook (one worksheet with 2000 rows by 1200 columns of integers).
Case 4. Read an entire .xlsx workbook (one worksheet with 2000 rows by 1200 columns of integers).
Case 5. Write an entire .xls workbook (five worksheets, each with 2000 rows by 1200 columns of integers).
Case 6. Write an entire .xlsx workbook (five worksheets, each with 2000 rows by 1200 columns of integers).
Case 7. Write an entire .xls workbook (one worksheet with 2000 rows by 1200 columns of integers).
Case 8. Write an entire .xlsx workbook (one worksheet with 2000 rows by 1200 columns of integers).
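
The eight cases follow a regular pattern (read or write, five sheets or one, .xls or .xlsx), so they can be written down as data. The sketch below is only an illustration: the Case tuple and the helpers build_cases and make_grid are hypothetical names, and the fill value 65536 is taken from the test code further down.

```python
from collections import namedtuple

# One test case: what to do, with which file type, and how much data
Case = namedtuple("Case", "number action extension sheets rows cols")


def build_cases(rows=2000, cols=1200):
    """Enumerate the eight cases in the order they are listed above."""
    cases = []
    number = 1
    for action in ("read", "write"):
        for sheets in (5, 1):
            for extension in (".xls", ".xlsx"):
                cases.append(Case(number, action, extension, sheets, rows, cols))
                number += 1
    return cases


def make_grid(rows, cols, value=65536):
    """Build one worksheet's worth of integer data as a list of rows."""
    return [[value] * cols for _ in range(rows)]


if __name__ == "__main__":
    for case in build_cases():
        print(case)
```

Keeping the cases as data makes it easy to shrink the sheet size for a quick trial run, for example build_cases(rows=20, cols=12).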

Test Results

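Note 1. xlwt and pandas can write at most 256 columns per worksheet, so for them the test case was changed to 2000 rows by 256 columns of integers per sheet.
Note 2. xlutils relies on xlrd and xlwt for reading and writing, so it is not tested separately.
Note 3. openpyxl is tested in two modes: normal loading and writing, and loading and writing in read_only/write_only mode.
Note 4. DataNitro is a paid product and can only be used from within Excel, so it is not tested here.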
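
The 256-column cap in Note 1 matches the sheet size limit of the .xls format, which also caps a sheet at 65536 rows. A minimal sketch of the adjustment, with hypothetical constant and function names:

```python
# Sheet size limits of the legacy .xls format
XLS_MAX_ROWS = 65536
XLS_MAX_COLS = 256


def clamp_to_xls(rows, cols):
    """Shrink a requested sheet size so that it fits into an .xls sheet."""
    return min(rows, XLS_MAX_ROWS), min(cols, XLS_MAX_COLS)


if __name__ == "__main__":
    # The 2000 x 1200 test sheet becomes 2000 x 256 for the .xls writers
    print(clamp_to_xls(2000, 1200))
```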

Read and write performance comparison

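Judged on read and write performance alone, win32com is the fastest, with xlwings second.

openpyxl is powerful when it comes to manipulating Excel, but its read and write performance is poor. Writing a large sheet in particular consumes a great deal of memory (it used up all 4 GB on my machine). Turning on read_only and write_only mode improves its performance substantially, especially for reading, which then takes almost no time (0.01 seconds looks exaggerated, but the workbook really was loaded; most likely this is because read_only mode defers parsing the cell data until the rows are iterated, and the test only calls load_workbook).

pandas treats Excel as a container for data input and output in service of its data analysis features, so its read and write performance is unremarkable. Its compatibility with Excel files is the best, though: it reads and writes both .xls and .xlsx files, and it can read a single worksheet out of a workbook.

xlrd can also read a single worksheet, but it only reads and does not write, and its performance does not stand out. It has to be combined with xlutils for Excel operations and with xlwt for saving data, and xlwt can write only .xls files, with average performance. (The other library that can write .xls files is pandas. Files written by either of them are limited to 256 columns, and as far as I know the remaining libraries can write only .xlsx files.)

xlsxwriter is single-purpose and is generally used to create .xlsx files; its write performance is middling.

win32com has the best read and write performance, but it lives inside the pywin32 package and has no thorough documentation of its own, so it takes some effort to use.

xlwings has read and write performance on a par with win32com, and its powerful converters handle most data types, including two-dimensional NumPy arrays and pandas DataFrames, which makes data analysis work easy.

All things considered, xlwings performs best. As its name suggests: xlwings, Make Excel Fly!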

Ease-of-use comparison

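This test covers only reading and writing Excel files and does not touch Excel operations. In terms of read and write convenience alone, the libraries are hard to tell apart. However, win32com and xlwings can operate on an open Excel file in real time while the program runs, which makes the process visible. In addition, the data-structure converters in xlwings let you quickly add two-dimensional data to an Excel file without having to work out the target rows and columns yourself. So for convenience of reading and writing, xlwings again comes out ahead.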

Test code

Timing
import timeit

if __name__ == '__main__':
    # Time a single call with timeit; replace ?? with the name of the test function to measure
    t = timeit.Timer('??()', setup='from __main__ import ??')
    print(t.timeit(number=1))
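
As a variation on the timing harness, timeit.Timer also accepts a callable, which avoids the setup string. The sketch below uses a hypothetical helper time_call and a stand-in workload instead of one of the Excel test functions.

```python
import timeit


def time_call(func, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` single runs."""
    timer = timeit.Timer(func)  # Timer accepts a callable as well as a string
    return min(timer.repeat(repeat=repeat, number=1))


def sample_workload():
    # Stand-in for one of the test functions
    return sum(range(100000))


if __name__ == '__main__':
    print(time_call(sample_workload))
```

Several of the test functions in this post save or close a workbook that is created at module level, so they can only run once; for those, call time_call with repeat=1.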

 

xlrd
import xlrd

def test_xlrd_on_demand_false():
    # Load the whole workbook up front
    # f = xlrd.open_workbook('test_cases\\read_xls.xls', on_demand=False)
    f = xlrd.open_workbook('test_cases\\read_xlsx.xlsx', on_demand=False)

def test_xlrd_on_demand_true():
    # Load a sheet only when it is requested
    # f = xlrd.open_workbook('test_cases\\read_xls.xls', on_demand=True)
    f = xlrd.open_workbook('test_cases\\read_xlsx.xlsx', on_demand=True)
    f.sheet_by_index(0)

 

xlwt
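import xlwt

# The workbook is created outside the timed function
book = xlwt.Workbook()

def test_xlwt():
    for s in range(5):
        sheet = book.add_sheet(str(s))
        # 256 columns is the most xlwt can write to one sheet
        for i in range(2000):
            for j in range(256):
                sheet.write(i, j, 65536)
    book.save('test_cases\\write_xls.xls')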

 

xlwings
import numpy as np
import xlwings

def test_xlwings_read():
    # f = xlwings.Book('test_cases\\read_xls.xls')
    f = xlwings.Book('test_cases\\read_xlsx.xlsx')

# A new, empty workbook and a 2000 x 1200 array filled with 65536,
# both created outside the timed function
f = xlwings.Book()
d = np.zeros([2000, 1200])
d += 65536

def test_xlwings_write():
    for s in range(1):
        sheet = f.sheets.add()
        # Assigning the array to A1 writes the whole block in one call
        sheet.range('A1').value = d
    f.save('test_cases\\write_xlsx.xlsx')

 

openpyxl
import openpyxl

def test_openpyxl_read():
    f = openpyxl.load_workbook('test_cases\\read_xlsx.xlsx', read_only=True)

# One row of 1200 integers and a write_only workbook, created outside the timed function
c = [65536] * 1200
f = openpyxl.Workbook(write_only=True)

def test_openpyxl_write():
    for i in range(1):
        sheet = f.create_sheet(title=str(i))
        for row in range(2000):
            sheet.append(c)
    f.save('test_cases\\write_xlsx.xlsx')

 

xlsxwriter
import xlsxwriter

# The workbook is created outside the timed function
workbook = xlsxwriter.Workbook('test_cases\\write_xlsx.xlsx')

def test_xlsxwriter():
    for s in range(1):
        worksheet = workbook.add_worksheet()
        for i in range(2000):
            for j in range(1200):
                worksheet.write(i, j, 65536)
    # close() is what actually writes the file to disk
    workbook.close()

 

win32com
import win32com.client as win32

# Start (or attach to) Excel outside the timed functions
excel = win32.gencache.EnsureDispatch('Excel.Application')

def test_win32com_read():
    # wb = excel.Workbooks.Open('E:\\excel\\test_cases\\read_xls.xls')
    wb = excel.Workbooks.Open('E:\\excel\\test_cases\\read_xlsx.xlsx')
    # excel.Visible = True

wb = excel.Workbooks.Add()

def test_win32com_write():
    for i in range(1):
        ws = wb.Worksheets.Add()
        # ATD is column 1200, so this range covers 2000 rows by 1200 columns
        ws.Range("A1:ATD2000").Value = 65536

    wb.SaveAs('E:\\excel\\test_cases\\write_xlsx.xlsx')
    excel.Application.Quit()
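
The range address "A1:ATD2000" in the win32com test is hard-coded. As a small sketch of how such an address can be derived from the sheet size, the hypothetical helpers below convert a 1-based column number into Excel column letters and build the range string.

```python
def column_letter(index):
    """Convert a 1-based column index to Excel letters (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index > 0:
        # Column letters are a base-26 system without a zero digit,
        # hence the shift by one before dividing
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def range_address(rows, cols):
    """Address of the block that starts at A1 and spans rows x cols cells."""
    return "A1:{}{}".format(column_letter(cols), rows)


if __name__ == "__main__":
    print(range_address(2000, 1200))  # A1:ATD2000
```

With this, the write test could use ws.Range(range_address(2000, 1200)) and stay correct if the sheet size changes.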

 

pandas
import numpy as np
import pandas as pd

def test_pandas_read():
    # Read the five worksheets one at a time
    for i in range(1, 6):
        sheet_name = "Sheet" + str(i)
        # df = pd.read_excel('test_cases\\read_xls.xls', sheet_name)
        df = pd.read_excel('test_cases\\read_xlsx.xlsx', sheet_name)

# 255 data columns plus the index column written by to_excel give 256 columns per sheet
d = np.zeros([2000, 255])
d += 65536
df = pd.DataFrame(d)
# writer = pd.ExcelWriter('test_cases\\write_xls.xls')
writer = pd.ExcelWriter('test_cases\\write_xlsx.xlsx')

def test_pandas_write():
    df.to_excel(writer, sheet_name='Sheet1')
    df.to_excel(writer, sheet_name='Sheet2')
    df.to_excel(writer, sheet_name='Sheet3')
    df.to_excel(writer, sheet_name='Sheet4')
    df.to_excel(writer, sheet_name='Sheet5')
    # close() saves the file; writer.save() was removed in pandas 2.0
    writer.close()

 



Origin www.cnblogs.com/qingdeng123/p/11567714.html