Probably the most complete summary of Python libraries for working with Excel on the entire network!

Preface



To better understand the similarities and differences between the libraries, so that each can be used flexibly in different scenarios, this article compares 7 commonly used modules for working with Excel files side by side, and reviews the common operations of each module along the way.

First, let us get an overall picture of what each library does:
1. xlrd, xlwt, and xlutils each have their limitations, but the three complement each other and together cover the operation of Excel files, especially .xls files. xlwt can generate .xls files, xlrd can read existing .xls files, and xlutils connects the two so that users can read and write the same .xls file. Simply put, xlrd is responsible for reading, xlwt for writing, and xlutils for bridging the two.
2. xlwings can easily read and write data in Excel files and can modify cell formats.
3. XlsxWriter is a module for writing files in the .xlsx format. It can write text, numbers, and formulas, and supports cell formatting, pictures, charts, document configuration, autofilters, and other features. However, it cannot read or modify existing Excel files.
4. openpyxl can read, write, and modify .xlsx files through a workbook-worksheet-cell model, and can adjust styles.
5. pandas needs no introduction: it is a powerful module for data processing and analysis, and it can sometimes also be used to automate Excel.

If you would rather skip the detailed comparison, you can go straight to the final summary table at the end of the article.

 

1. Installation

None of the 7 modules is part of the standard library, so each one needs to be installed with pip on the command line:

pip install xlrd
pip install xlwt
pip install xlutils
pip install xlwings
pip install XlsxWriter
pip install openpyxl
pip install pandas
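To confirm which of the seven modules are actually available after installation, a check along the following lines may help. It is only a sketch: the helper name `installed_modules` is made up for this example, and it relies solely on the standard library's `importlib.util.find_spec`.

```python
from importlib.util import find_spec

# Import names of the seven modules (the XlsxWriter package is imported as "xlsxwriter")
MODULES = ["xlrd", "xlwt", "xlutils", "xlwings", "xlsxwriter", "openpyxl", "pandas"]


def installed_modules(names=MODULES):
    """Return {import name: True/False} depending on whether the module can be found."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in installed_modules().items():
        print(f"{name:<12}{'installed' if ok else 'missing'}")
```

Running the script prints one line per module, which makes it easy to see what still has to be installed with pip.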

2. Module import

Most modules are imported directly by name; a few are conventionally given a short alias:

import xlrd
import xlwt
import xlwings as xw
import xlsxwriter
import openpyxl
import pandas as pd

The xlutils module is the bridge between xlrd and xlwt. Its core function is to take a .xls workbook that xlrd has read into memory and copy it into an object that xlwt can write to: xlutils converts xlrd's Book object into xlwt's Workbook object. In practice, the copy submodule is usually imported:

import xlutils.copy

3. Read the Excel file

3.1 Obtaining files

Not all 7 modules can read Excel files, and those that can differ in which file suffixes they support:
1. xlwt, xlutils, and XlsxWriter cannot read files
2. xlrd can read .xls files (versions before 2.0 could also read .xlsx files)
3. xlwings can read .xls and .xlsx files
4. openpyxl can read .xlsx files
5. pandas can read .xls and .xlsx files
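The list above can be turned into a small lookup that suggests candidate modules from a file's suffix. This is just a sketch: the names `READERS` and `readers_for` are invented for this example, and the table assumes xlrd 2.0 or later, which dropped .xlsx support and reads .xls only.

```python
from pathlib import Path

# Which modules can read which suffix, following the list above.
# Assumption: xlrd is version 2.0 or later, so it is listed for .xls only.
READERS = {
    ".xls": ["xlrd", "xlwings", "pandas"],
    ".xlsx": ["xlwings", "openpyxl", "pandas"],
}


def readers_for(path):
    """Return the modules from the list above that can read this file type."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"unsupported suffix: {suffix!r}")
    return READERS[suffix]


print(readers_for("test.xls"))
print(readers_for("test.xlsx"))
```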

The tests below use one .xls file and one .xlsx file, each 10MB in size:

xls_path = r'C:\xxx\Desktop\test.xls'
xlsx_path = r'C:\xxx\Desktop\test.xlsx'

3.1.1 xlrd read file

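xlrd can read .xls files. Versions before 2.0 could also read .xlsx files, but xlrd 2.0 and later support .xls only:

xls = xlrd.open_workbook(xls_path)
# works only with xlrd < 2.0; newer versions raise XLRDError for .xlsx files
xlsx = xlrd.open_workbook(xlsx_path)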

3.1.2 xlwings read file

xlwings connects directly to apps, that is, to Excel application instances, and from there to workbooks and worksheets. It therefore requires an environment in which the Excel application is installed. xlwings can read .xls and .xlsx files:

app = xw.App(visible=True, add_book=False) # show the Excel window; open only, without creating a new workbook
app.display_alerts = False # turn off alerts
app.screen_updating = False # turn off screen updating
# wb = app.books.open(xls_path)
wb = app.books.open(xlsx_path)
wb.save() # save the file
wb.close() # close the file
app.quit() # quit the application

3.1.3 openpyxl read file

openpyxl can read .xlsx files

wb = openpyxl.load_workbook(xlsx_path)

Trying to read a .xls file raises an error:

wb = openpyxl.load_workbook(xls_path)

openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support the old .xls file format, please use xlrd to read this file, or convert it to the more recent .xlsx file format.

3.1.4 pandas read file

pandas can read .xls and .xlsx files

xls = pd.read_excel(xls_path, sheet_name='Sheet1')
xlsx = pd.read_excel(xlsx_path, sheet_name='Sheet1')

Next, compare the time the four modules take to read the 10MB .xlsx file on the same computer (each run 3 times and averaged). The code used is:

import time
# import the module under test here

time_start = time.time()
# ... the read call being timed goes here ...
time_end = time.time()
print('time cost: ', time_end - time_start, 's')

In the final test, xlwings read the 10MB file the fastest, xlrd came second, and openpyxl was the slowest (the results depend on the computer and are for reference only).
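The "run 3 times and average" procedure can be wrapped in a small helper. The sketch below is not the code used for the measurements above: the function name `average_time` is made up, and it uses `time.perf_counter`, which is better suited to measuring short intervals than `time.time`.

```python
import time


def average_time(func, *args, repeat=3, **kwargs):
    """Call func(*args, **kwargs) `repeat` times and return the mean duration in seconds."""
    durations = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations)


# Stand-in workload; replace it with a real read call,
# for example average_time(openpyxl.load_workbook, xlsx_path)
print(f"time cost: {average_time(sum, range(1_000_000)):.4f} s")
```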

The file-reading part is summarized in the following table:

 

3.2 Get the worksheet

For the above 4 modules that can read Excel files, we will further discuss how to obtain the worksheet.

3.2.1 xlrd get worksheet

You can search by sheet name:

sheet = xlsx.sheet_by_name("Sheet1")

You can also search by index:

sheet = xlsx.sheet_by_index(0)

3.2.2 xlwings get worksheet

In xlwings, the active worksheet can be taken either from the active workbook or from a specific workbook:

sheet = xw.sheets.active  # active sheet of the active workbook
sheet = wb.sheets.active  # active sheet of a specific workbook

3.2.3 openpyxl get worksheet

The .active property returns the workbook's active worksheet, which is the first worksheet by default:

sheet = wb.active

In addition, the worksheet can also be obtained by specifying the worksheet name:

sheet = wb['Sheet1']

3.2.4 pandas get worksheet

pandas has no separate step for getting a worksheet: the worksheet is selected with the sheet_name argument when the file is read (the first sheet is used if it is omitted):

xlsx = pd.read_excel(xlsx_path, sheet_name='Sheet1')

4. Create an Excel file

A brief summary of creating Excel files:
1. xlrd and xlutils cannot create Excel files
2. xlwt can only create .xls files, not .xlsx files
3. xlwings can create .xls and .xlsx files
4. XlsxWriter can create .xlsx files
5. openpyxl can create .xlsx files
6. pandas has no concept of creating an Excel file, but can generate .xls or .xlsx files when saving

4.1 xlwt create file

xlwt can only create .xls files, not .xlsx files

xls = xlwt.Workbook(encoding= 'ascii')
# create a new sheet
worksheet = xls.add_sheet("Sheet1")

4.2 xlwings create file

xlwings can create .xls and .xlsx files; just give the right suffix when saving. A new workbook is created with:

wb = app.books.add()

Whether you create or open a workbook, you need to save it, close it, and quit the application:

# path is the target folder, defined elsewhere
wb.save(path + r'\new_practice.xlsx')
wb.close()
app.quit()

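4.3 XlsxWriter create file

XlsxWriter can create .xlsx files. The output file name is given when the workbook is created:

xlsx = xlsxwriter.Workbook('new_test.xlsx')  # example file name
# add a worksheet
sheet = xlsx.add_worksheet('Sheet1')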

4.4 openpyxl create file

openpyxl can create .xlsx files; the file name is given when saving. A new workbook is created with:

wb = openpyxl.Workbook()
# a new workbook already contains one worksheet, available as .active
sheet = wb.active

4.5 pandas create file

pandas only needs the right suffix when the data is written out. The idea is somewhat abstract: pandas does not create an Excel file up front. You work on a DataFrame and then call .to_excel with a file name ending in .xlsx (or .xls on older pandas versions that still support the xlwt engine). If you must generate a blank Excel file, you can use:

df = pd.DataFrame([])
df.to_excel(r'C:\xxx\test1.xlsx')

5. Save the file

A brief summary of saving Excel files:
1. xlrd cannot save Excel files
2. xlwt can save .xls files
3. xlutils can copy xlrd objects into xlwt objects and save .xls files
4. xlwings can save .xls and .xlsx files
5. XlsxWriter can save .xlsx files
6. openpyxl can save .xlsx files
7. pandas can save .xls or .xlsx files

5.1 xlwt save file

xlwt can save .xls files

# xls = xlwt.Workbook(encoding= 'ascii')
# worksheet = xls.add_sheet("Sheet1")
xls.save("new_table.xls")

5.2 xlutils save files

xlutils can copy xlrd objects into xlwt objects and save .xls files

# xls_path = r'C:\xxxx\test.xls'
# xls = xlrd.open_workbook(xls_path)
xls_xlutils = xlutils.copy.copy(xls)
xls_xlutils.save('new_text.xls')

5.3 xlwings save files

xlwings can save .xls and .xlsx files

# wb = app.books.open(xls_path)
wb = app.books.open(xlsx_path)
wb.save() # save the file
wb.close() # close the file
app.quit() # quit the application

5.4 XlsxWriter saves files

XlsxWriter can save .xlsx files. After the .close command is executed, the files are closed and saved at the same time:

# xlsx = xlsxwriter.Workbook('new_test.xlsx')
# sheet = xlsx.add_worksheet('Sheet1')
xlsx.close()

5.5 openpyxl save file

openpyxl can save .xlsx files

# wb = openpyxl.load_workbook(xlsx_path)
# wb = openpyxl.Workbook()
# sheet = wb.active
wb.save('new_test.xlsx')
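Creating, saving, and reading back a workbook with openpyxl can be combined into one short round trip. The sketch below assumes openpyxl is installed; the file is written to a temporary directory and the cell content is made up for illustration.

```python
import os
import tempfile

import openpyxl

# Create a workbook in memory; it already contains one worksheet
wb = openpyxl.Workbook()
sheet = wb.active
sheet.title = "Sheet1"
sheet["A1"] = "hello"

# Save to a temporary directory (hypothetical file name), then read it back
path = os.path.join(tempfile.mkdtemp(), "new_test.xlsx")
wb.save(path)

reloaded = openpyxl.load_workbook(path)
print(reloaded.sheetnames)
print(reloaded["Sheet1"]["A1"].value)
```

Nothing is written to disk until `wb.save` is called, which is why the same pattern covers both creating and saving a file.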

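5.6 pandas save file

pandas can save .xlsx files. Saving .xls files relies on the xlwt engine, which was removed in pandas 2.0, so the .xls line below works only with older pandas versions: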

df1 = pd.DataFrame([1, 2, 3])
df2 = pd.DataFrame([1, 2, 4])
df1.to_excel(r'C:\xxxx\test1.xls')
df2.to_excel(r'C:\xxxx\test2.xlsx')

6. Get the value of a cell

Getting the value of a cell requires being able to read the file, so this section focuses on xlrd, xlwings, openpyxl, and pandas. xlutils operates on a workbook opened with xlrd, so cells are read with exactly the same xlrd methods.

6.1 xlrd/xlutils get cell

Because xlutils works from a workbook opened by xlrd, cells are read in exactly the same way as with xlrd. xlwt cannot read cells.

# xls = xlrd.open_workbook(xls_path)
# sheet = xls.sheet_by_name("Sheet1")
value = sheet.cell_value(4, 6) # cell in row 5, column 7 (indices are zero-based)
print(value)
rows = sheet.row_values(4) # all values in row 5
cols = sheet.col_values(6) # all values in column 7
for cell in rows:
    print(cell)

6.2 xlwings get cell

# app = xw.App(visible=True, add_book=False)
# app.display_alerts = False
# app.screen_updating = False
# wb = app.books.open(xls_path)
# sheet = wb.sheets.active

# get the value of a single cell
A1 = sheet.range('A1').value
print(A1)
# get the values of several cells in one row or one column; returns a list
A1_A3 = sheet.range('A1:A3').value
print(A1_A3)
# get the values of a rectangular range; returns a nested list, one inner list per row
A1_C4 = sheet.range('A1:C4').value
print(A1_C4)

6.3 openpyxl get cell

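# wb = openpyxl.load_workbook(xlsx_path)
# wb = openpyxl.Workbook()
# sheet = wb.active

# 1. Cells in a coordinate range (a tuple of rows)
cells = sheet['A1:B5']
# 2. Cells in one column (a tuple of cells) or several columns (a tuple of columns)
cells = sheet['A']
cells = sheet['A:C']
# 3. Cells in one row (a tuple of cells) or several rows (a tuple of rows)
cells = sheet[5]
cells = sheet[5:7]
# Get the cell values; cells now holds rows 5 to 7, so loop over rows first
for row in cells:
    for cell in row:
        print(cell.value)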
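The shapes returned by the different openpyxl indexing forms are easy to mix up, so here is a small self-contained check. It is a sketch with made-up data, built in memory rather than loaded from a file.

```python
import openpyxl

wb = openpyxl.Workbook()
sheet = wb.active
# Fill A1:C3 with 1..9, row by row
for values in ([1, 2, 3], [4, 5, 6], [7, 8, 9]):
    sheet.append(values)

block = sheet["A1:B2"]   # tuple of rows, each row a tuple of cells
column = sheet["A"]      # tuple of cells in column A
row2 = sheet[2]          # tuple of cells in row 2

block_values = [[cell.value for cell in row] for row in block]
column_values = [cell.value for cell in column]
row_values = [cell.value for cell in row2]
print(block_values, column_values, row_values)
```

A rectangular range needs a nested loop, while a single row or column can be iterated directly.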

6.4 pandas get cell

After pandas reads an Excel file, the result is a DataFrame object, and its contents are accessed with the standard pandas indexers such as .iloc[], .loc[], and .at[]:

# assumes df1 and df2 are DataFrames whose rows and columns are labelled 'a', 'b', ...
print(df1.iloc[0:1, [1]])
print(df1.loc['b'])
print(df2.at['a', 'a']) # .ix was removed in pandas 1.0; use .at (labels) or .iat (positions)
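The indexers are easier to follow on a concrete frame. The sketch below uses a small, made-up DataFrame in place of data read from Excel; the row labels 'a', 'b', 'c' and column labels 'a', 'b' are assumptions chosen to match the snippet above.

```python
import pandas as pd

# A small labelled frame standing in for data read from Excel (values are made up)
df = pd.DataFrame(
    {"a": [1, 2, 3], "b": [4, 5, 6]},
    index=["a", "b", "c"],
)

by_position = df.iloc[0:1, [1]]   # first row, second column, returned as a DataFrame
by_label = df.loc["b"]            # the row labelled "b", returned as a Series
single_label = df.at["a", "a"]    # one value by row and column label
single_pos = df.iat[0, 0]         # the same value by integer position
print(by_position, by_label, single_label, single_pos, sep="\n")
```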

7. Write data

A brief summary of writing data to Excel files:
1. xlrd cannot write data
2. xlwt can write data
3. xlutils can write data by borrowing xlwt's methods
4. xlwings can write data
5. XlsxWriter can write data
6. openpyxl can write data
7. After pandas reads an Excel file into a DataFrame, all operations happen at the DataFrame level; there is no concept of writing to or modifying individual Excel cells

7.1 xlwt/xlutils write data

# xls = xlrd.open_workbook(xls_path)
# xls_xlutils = xlutils.copy.copy(xls)
# sheet = xls_xlutils.get_sheet(0)  # the copy is an xlwt Workbook, so use xlwt's get_sheet
sheet.write(4, 6, "new content")

7.2 xlwings write data

# app = xw.App(visible=True, add_book=False)
# app.display_alerts = False
# app.screen_updating = False
# wb = app.books.open(xls_path)
# sheet = wb.sheets.active

# write a single cell
sheet.range('A2').value = 'Daming'
# write several cells in one row or one column
# write horizontally into A1:C1
sheet.range('A1').value = [1,2,3]
# write vertically into A1:A3
sheet.range('A1').options(transpose=True).value = [1,2,3]
# write several cells in a rectangular range
sheet.range('A1').options(expand='table').value = [[1,2,3], [4,5,6]]

7.3 XlsxWriter write data

The new_format in the code is a preset style, which will be introduced below

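# xlsx = xlsxwriter.Workbook('new_test.xlsx')
# sheet = xlsx.add_worksheet('Sheet1')

# write a single cell (row and col are zero-based)
sheet.write(row, col, data, new_format)
# starting at cell A1, write a list of values along the row
sheet.write_row('A1', data, new_format)
# starting at cell A1, write a list of values down the column
sheet.write_column('A1', data, new_format)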

7.4 openpyxl write data

# wb = openpyxl.load_workbook(xlsx_path)
# wb = openpyxl.Workbook()
# sheet = wb.active

# 1. Write a single cell
cell = sheet['A1']
cell.value = 'business requirements'
# 2. Write one or more rows of data
data1 = [1, 2, 3]
sheet.append(data1)
data2 = [[1, 2, 3], [4, 5, 6]]
for row in data2:  # append() takes one row at a time
    sheet.append(row)
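One detail worth seeing in action is that openpyxl's `append` adds exactly one row per call, below the last used row, so a nested list has to be written row by row. The following sketch, with made-up values, writes a few rows in memory and reads them back with `iter_rows(values_only=True)`.

```python
import openpyxl

wb = openpyxl.Workbook()
sheet = wb.active

sheet["A1"] = "header"            # write a single cell
sheet.append([1, 2, 3])           # one row, placed below the last used row
for row in [[4, 5, 6], [7, 8, 9]]:
    sheet.append(row)             # several rows: one append call per row

rows = list(sheet.iter_rows(values_only=True))
print(rows)
```

Cells that were never written, such as B1 and C1 here, come back as None.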

8. Style adjustment

A brief summary of adjusting the style of Excel files:
1. xlrd and xlutils cannot adjust the style (one could say xlutils can, but only by borrowing xlwt's methods)
2. xlwt can adjust the style
3. xlwings can adjust the style
4. XlsxWriter can adjust the style
5. openpyxl can adjust the style
6. pandas cannot adjust the style

8.1 xlwt adjustment style

xlwt supports adjusting fonts, borders, colors and other styles

# Font section
# initialize a style
style1 = xlwt.XFStyle()
# create a font for the style
font = xlwt.Font()
font.name = 'Times New Roman'   # font name
font.bold = True                # bold
font.underline = True           # underline
font.italic = True              # italic
# attach the font to the style
style1.font = font
# use the style
sheet.write(4, 6, "new content 1", style1)

# Border section
borders = xlwt.Borders()
# set the line type
borders.left = xlwt.Borders.DASHED
borders.right = xlwt.Borders.DASHED
borders.top = xlwt.Borders.DASHED
borders.bottom = xlwt.Borders.DASHED
# set the colour
borders.left_colour = 0x40
borders.right_colour = 0x40
borders.top_colour = 0x40
borders.bottom_colour = 0x40
# attach the borders to a new style
style2 = xlwt.XFStyle()
style2.borders = borders
# use the style
sheet.write(5, 8, "new content 2", style2)

8.2 xlwings adjustment style

A brief introduction to xlwings' color adjustment:

# get the colour
print(sheet.range('C1').color)
# set the colour
sheet.range('C1').color = (255, 0, 120)
# clear the colour
sheet.range('C1').color = None

8.3 XlsxWriter adjustment style

XlsxWriter offers a large number of formatting functions; after creating a worksheet you can apply highly customized style modifications to it:

new_format = xlsx.add_format({
        'bold':  True,  # bold font
        'border': 1,  # cell border style
        'align': 'left',  # horizontal alignment
        'valign': 'vcenter',  # vertical alignment
        'fg_color': '#F4B084',  # cell background colour
        'text_wrap': True  # wrap text automatically
    })

sheet.write(row, col, data, new_format)
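Putting the XlsxWriter pieces together (create, format, write, close) gives a complete script. This is a minimal sketch assuming XlsxWriter is installed; the output path, the format name `bold_left`, and the cell values are made up for illustration.

```python
import os
import tempfile

import xlsxwriter

# Hypothetical output location in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "styled.xlsx")

workbook = xlsxwriter.Workbook(path)
sheet = workbook.add_worksheet("Sheet1")
bold_left = workbook.add_format({"bold": True, "border": 1, "align": "left"})

sheet.write(0, 0, "title", bold_left)            # single cell, zero-based (row, col)
sheet.write_row("A2", [1, 2, 3], bold_left)      # fills A2, B2, C2
sheet.write_column("A3", [4, 5, 6], bold_left)   # fills A3, A4, A5

workbook.close()   # the file is only written to disk here
print(os.path.getsize(path) > 0)
```

Formats are created on the workbook, not on the worksheet, and can be reused for any number of cells.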

8.4 openpyxl adjustment style

The openpyxl styles mainly include fonts, borders, paragraph alignment styles, etc.

# Font style
from openpyxl.styles import Font
cell = sheet['A1']
font = Font(name='Arial', size=12, bold=True, italic=True, color='FF0000')
cell.font = font

# Paragraph alignment
from openpyxl.styles import Alignment
cell = sheet['B2']
alignment = Alignment(horizontal='center', vertical='center', text_rotation=45, wrap_text=True)
cell.alignment = alignment

# Border style
from openpyxl.styles import Side, Border
cell = sheet['B2']
side1 = Side(style='thin', color='FF0000')
side2 = Side(style='dashed')
border = Border(left=side1, right=side1, top=side2, bottom=side2)
cell.border = border
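The three openpyxl style objects can also be applied to the same cell and inspected afterwards. The sketch below works on an in-memory workbook with a made-up cell value, so it can be run without an existing file.

```python
import openpyxl
from openpyxl.styles import Alignment, Border, Font, Side

wb = openpyxl.Workbook()
sheet = wb.active

cell = sheet["B2"]
cell.value = "styled"
cell.font = Font(name="Arial", size=12, bold=True, color="FF0000")
cell.alignment = Alignment(horizontal="center", vertical="center", wrap_text=True)
thin_red = Side(style="thin", color="FF0000")
cell.border = Border(left=thin_red, right=thin_red, top=thin_red, bottom=thin_red)

# Read the style settings back from the cell
print(cell.font.bold, cell.alignment.horizontal, cell.border.left.style)
```

Style objects are assigned as a whole; to change one attribute later, create a new Font, Alignment, or Border and assign it again.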

9. Insert a picture

A brief summary of inserting pictures into Excel files:
1. xlrd and xlutils cannot insert pictures (one could say xlutils can, but only by borrowing xlwt's method)
2. xlwt can insert .bmp pictures
3. xlwings can insert pictures
4. XlsxWriter can insert pictures
5. openpyxl can insert pictures
6. pandas cannot insert pictures

9.1 xlwt insert picture

To insert a picture with xlwt, the picture must be in .bmp format:

sheet.insert_bitmap("test.bmp", 2, 3, 2, 2, 0.5, 0.5)

In insert_bitmap(img, x, y, x1, y1, scale_x, scale_y), img is the path of the image to insert; x and y are the row and column of the target cell; x1 and y1 are the pixel offsets from that position, downward and to the right; scale_x and scale_y are the scaling ratios for the image width and height, so the image can be enlarged or shrunk.

9.2 xlwings insert picture

Below is the code for inserting a picture with xlwings; the position can also be specified:

sheet.pictures.add(r'C:\xxx.jpg')
# insert at a given position
sheet.pictures.add(r'C:\xxx.jpg', left=sheet.range('A2').left, top=sheet.range('A2').top, width=100, height=100)

9.3 XlsxWriter insert picture

The first parameter is the starting cell to insert, and the second parameter is the absolute path of the image file

sheet.insert_image('A1', r'C:\xxx.jpg')

9.4 openpyxl insert picture

openpyxl can also insert a specified picture into Excel and modify the size

from openpyxl.drawing.image import Image
img = Image('test.jpg')
newsize = (180, 360)
img.width, img.height = newsize # set the width and height of the picture
sheet.add_image(img, 'A2') # insert the picture at cell A2

Summary

That concludes the comparison of common Excel operations across the different Python modules. The final results are summarized in the table below:

 

 

Please note that the purpose of this article is not to pick the best library, but to compare the libraries from different angles so that everyone knows what each one is good at. For example, pandas is convenient for processing data, but it cannot insert images or modify styles; openpyxl supports a wide range of operations, but it is relatively slow.

Only by fully understanding the characteristics of different tools can we flexibly use different methods to solve problems efficiently in different scenarios!

Origin blog.csdn.net/pythonxuexi123/article/details/114696590