This article introduces openpyxl, the Python module used to read and write Excel spreadsheets. I hope it serves as a useful reference for anyone who needs it.
1. Excel document
Workbook: an .xlsx file, which can contain multiple sheets (worksheets).
Active sheet: the sheet the user is currently viewing, or the sheet that was last viewed before Excel was closed.
2. Install the openpyxl module
Install the module with pip (pip install openpyxl), then check that it can be imported:
import openpyxl
3. Read Excel table
First, create a sample workbook named 1.xlsx.
1. Use the openpyxl module to open the Excel document
The openpyxl module provides the openpyxl.load_workbook() function to open Excel documents.
The openpyxl.load_workbook() function opens an Excel document and returns a Workbook object.
The Workbook object represents the Excel file, much as a File object represents an open text file.
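A minimal sketch of load_workbook() that does not depend on 1.xlsx: it assumes openpyxl is installed, writes a hypothetical sample.xlsx into a temporary directory, and opens it again to show that the return value is a Workbook object.

```python
import os
import tempfile

import openpyxl


def load_sample():
    """Create a hypothetical sample.xlsx in a temp dir and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'sample.xlsx')

        # Build and save a tiny workbook so there is something to open.
        new_wb = openpyxl.Workbook()
        new_wb.active['A1'] = 'hello'
        new_wb.save(path)

        # load_workbook() returns a Workbook object representing the file.
        loaded = openpyxl.load_workbook(path)
        return loaded, loaded.active['A1'].value


wb, first_value = load_sample()
print(type(wb).__name__, first_value)  # Workbook hello
```

The loaded workbook lives in memory, so it stays usable after the temporary directory is deleted.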
2. Get the worksheet from the workbook
>>> import openpyxl
>>> wb = openpyxl.load_workbook('example.xlsx')  # open the Excel document and get a Workbook object
>>> wb.sheetnames  # names of all sheets in the workbook (older versions: wb.get_sheet_names())
['Sheet1', 'Sheet2', 'Sheet3']
>>> sheet = wb['Sheet3']  # index by sheet name to get a Worksheet object (older versions: wb.get_sheet_by_name('Sheet3'))
>>> sheet
<Worksheet "Sheet3">
>>> type(sheet)
<class 'openpyxl.worksheet.worksheet.Worksheet'>
>>> sheet.title  # the Worksheet's title attribute holds the sheet name
'Sheet3'
>>> anotherSheet = wb.active  # the Workbook's active attribute returns the active sheet (older versions: wb.get_active_sheet())
>>> anotherSheet
<Worksheet "Sheet1">
openpyxl.load_workbook('example.xlsx'): opens the Excel file and returns a Workbook object.
wb['Sheet3']: indexing the Workbook object with a sheet name returns a Worksheet object. Older openpyxl versions used wb.get_sheet_by_name('Sheet3'), which has since been deprecated.
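To try these lookups without example.xlsx, here is a minimal sketch (assuming openpyxl is installed) that builds an equivalent three-sheet workbook in memory; the sheet names simply mirror the session above. It uses wb.sheetnames, wb[name] and wb.active, the access patterns current openpyxl versions favour.

```python
import openpyxl

wb = openpyxl.Workbook()          # starts with one sheet named "Sheet"
wb.active.title = 'Sheet1'        # rename it to match the example above
wb.create_sheet('Sheet2')         # appended after Sheet1
wb.create_sheet('Sheet3')         # appended after Sheet2

print(wb.sheetnames)              # ['Sheet1', 'Sheet2', 'Sheet3']
sheet = wb['Sheet3']              # look up a worksheet by name
print(sheet.title)                # Sheet3
print(wb.active.title)            # Sheet1 (the active sheet is unchanged)
```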
3. Get cells from the table
The object hierarchy is: Workbook object -> Worksheet object -> Cell object.
import openpyxl
wb = openpyxl.load_workbook('1.xlsx')  # get the Workbook object
sheet = wb['Sheet5']  # get the Worksheet object
print(type(sheet['A1']))  # indexing a sheet with a coordinate gives a Cell object
print(sheet['A1'].value)
c=sheet['B1']
print(c)
print(c.value)
A Cell object has value, row, column and coordinate attributes.
You can also get a Cell object by calling the worksheet's cell() method and passing integers as the row and column keyword arguments.
import openpyxl
wb = openpyxl.load_workbook('1.xlsx')  # get the Workbook object
sheet = wb['Sheet5']  # get the Worksheet object
a=sheet.cell(row=1,column=2)
print(type(a))
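To see the four Cell attributes mentioned earlier side by side, here is a minimal sketch on an in-memory sheet with a hypothetical value in B1. In openpyxl 3.x, column is an integer; as far as I know, older releases returned the column letter instead.

```python
import openpyxl

wb = openpyxl.Workbook()
sheet = wb.active
sheet['B1'] = 'Apples'            # hypothetical sample value

c = sheet.cell(row=1, column=2)   # the same cell as sheet['B1']
print(c.value, c.row, c.column, c.coordinate)  # Apples 1 2 B1
```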
Using the cell() method and its keyword arguments, you can write a for loop to print out the values of a series of cells.
Exercise: Print the values of all cells in column B.
The worksheet methods get_highest_row() and get_highest_column() used to return the size of the sheet, but they have been removed from current openpyxl versions; use the max_row and max_column attributes instead.
import openpyxl
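wb = openpyxl.load_workbook('1.xlsx')  # get the Workbook object
sheet = wb['Sheet5']  # get the Worksheet object
r = sheet.max_row
c = sheet.max_column
print(r, c)
# Print every value in column B (column=2), from row 1 to the last row
for i in range(1, r + 1):
    a = sheet.cell(row=i, column=2)
    print(a.value)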
4. Conversion between column letters and numbers
>>> import openpyxl
>>> from openpyxl.utils import get_column_letter, column_index_from_string
>>> get_column_letter(1)
'A'
>>> get_column_letter(2)
'B'
>>> get_column_letter(27)
'AA'
>>> get_column_letter(900)
'AHP'
>>> wb = openpyxl.load_workbook('example.xlsx')
>>> sheet = wb['Sheet1']
>>> get_column_letter(sheet.max_column)
'C'
>>> column_index_from_string('A')
1
>>> column_index_from_string('AA')
27
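Column letters are a bijective base-26 numbering: A to Z are 1 to 26, and after Z comes AA. The sketch below is an illustrative re-implementation of that scheme (the helper names to_letters and to_index are hypothetical, not part of openpyxl), compared against openpyxl.utils.get_column_letter.

```python
from openpyxl.utils import column_index_from_string, get_column_letter


def to_letters(n):
    """Illustrative re-implementation of get_column_letter (bijective base 26)."""
    letters = ''
    while n > 0:
        n, rem = divmod(n - 1, 26)    # shift to 0-based so 26 -> 'Z', not 'A0'
        letters = chr(ord('A') + rem) + letters
    return letters


def to_index(letters):
    """Illustrative re-implementation of column_index_from_string."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord('A') + 1)
    return n


for n in (1, 26, 27, 900):
    print(n, to_letters(n), get_column_letter(n))
```

In real code, prefer the openpyxl helpers; the re-implementation only shows why 27 maps to AA and 900 to AHP.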
5. Get rows and columns from the table
You can get all the Cell objects in a row, a column, or a rectangular area of the worksheet, and then loop through every cell in that slice.
import openpyxl
wb = openpyxl.load_workbook('1.xlsx')  # get the Workbook object
sheet = wb['Sheet5']  # get the Worksheet object
print(tuple(sheet['A1':'C3']))
You can then use loops to output the values in the selected range.
Here we asked for the Cell objects in the rectangular area from A1 to C3. In current openpyxl versions, slicing a worksheet this way returns a tuple; older versions returned a Generator object, which is why the example wraps the result in the built-in tuple() function to display its Cell objects.
The result is a tuple of tuples: the outer tuple contains one inner tuple per row, and each inner tuple holds the Cell objects in that row.
Therefore, printing the values of every cell in the area takes two for loops: the outer loop goes through each row in the slice, and the inner loop goes through each cell in that row.
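The two nested loops just described can be sketched as follows. This assumes openpyxl is installed and fills a fresh in-memory sheet with hypothetical values (each cell holds its own coordinate) instead of reading 1.xlsx.

```python
import openpyxl

wb = openpyxl.Workbook()
sheet = wb.active
# Hypothetical sample data: fill A1:C3 with each cell's own coordinate.
for r in range(1, 4):
    for col in range(1, 4):
        cell = sheet.cell(row=r, column=col)
        cell.value = cell.coordinate

area = sheet['A1':'C3']          # tuple of row tuples
for row_of_cells in area:        # outer loop: one tuple per row
    for cell in row_of_cells:    # inner loop: each Cell in that row
        print(cell.coordinate, cell.value)
    print('--- end of row ---')
```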
You can also access the cells of a specific row or column. To do this, use the rows and columns attributes of the Worksheet object.
import openpyxl
wb = openpyxl.load_workbook('1.xlsx')  # get the Workbook object
sheet = wb['Sheet5']  # get the Worksheet object
b = list(sheet.columns)[1]  # sheet.columns is a generator; convert it before indexing (1 = column B)
print(b)
The rows attribute of a Worksheet object yields one tuple per row, each containing the Cell objects in that row. The columns attribute likewise yields one tuple per column, each containing the Cell objects in that column. For example.xlsx, which has 7 rows and 3 columns, rows gives 7 tuples (each with 3 Cell objects) and columns gives 3 tuples (each with 7 Cell objects). To access a specific tuple, use its index. In current openpyxl versions rows and columns are generators, so convert them first: tuple(sheet.columns)[1] is the tuple for column B, and tuple(sheet.columns)[0] is the tuple for column A. Once you have the tuple for a row or column, you can loop through its Cell objects and print their values.
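A short sketch, assuming openpyxl is installed and using a hypothetical 7-row, 3-column sheet built in memory rather than example.xlsx, shows rows and columns in action, including the tuple() conversion before indexing.

```python
import openpyxl

wb = openpyxl.Workbook()
sheet = wb.active
# Hypothetical 7-row x 3-column table, matching the shape described above.
for i in range(1, 8):
    sheet.append([f'A{i}', f'B{i}', f'C{i}'])

columns = tuple(sheet.columns)   # materialise the generator so it can be indexed
rows = tuple(sheet.rows)
print(len(rows), len(columns))   # 7 3

column_b = columns[1]            # index 1 -> column B
for cell in column_b:
    print(cell.value)
```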
6. Workbook, worksheet, cell
4. Project: Reading data from a spreadsheet
5. Write to Excel table
1. Create and save Excel document
The book I am currently studying, Quick Start with Python Programming, uses the openpyxl module to process Excel spreadsheets in Chapter 13. However, its openpyxl tutorial is outdated: many of the functions it uses have been deprecated or replaced. So I decided to find an up-to-date tutorial online and learn the openpyxl module from it.
openpyxl
1. Introduction
Official documentation: https://openpyxl.readthedocs.io/en/stable/
Note: The openpyxl module only supports the xlsx/xlsm/xltx/xltm formats; it does not support the older xls format.
2. Create
1. Create a new workbook
from openpyxl import Workbook: imports the Workbook class from the openpyxl module.
wb = Workbook(): creates a new Workbook object, wb, from the Workbook class.
ws = wb.active: the active property of wb returns the active sheet (a new workbook always contains one sheet).
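Putting these three lines together with saving, a minimal sketch (the sheet name Data and file name new.xlsx are hypothetical; a temporary directory keeps the example self-contained) creates a workbook, saves it, and reads it back.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

wb = Workbook()                 # a new workbook always has one worksheet
ws = wb.active                  # active is a property, not a method call
ws.title = 'Data'               # hypothetical sheet name
ws['A1'] = 'Created with openpyxl'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'new.xlsx')   # hypothetical file name
    wb.save(path)                          # nothing is written until save()
    reopened = load_workbook(path)
    print(reopened.sheetnames, reopened['Data']['A1'].value)
```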
2. Create a new worksheet
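You can use the create_sheet() method to create a new worksheet. Its first positional argument is the sheet title and its second is the index, so use index=0 (not a bare 0) to insert a sheet at the front.
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
# New sheets are named automatically (Sheet1, Sheet2, ...) when no title is given
ws_1 = wb.create_sheet()  # by default, the new sheet is appended at the end
ws_2 = wb.create_sheet(index=0)  # insert a new sheet at the given index (0 = first)
# Set the sheet name through the title attribute
ws_1.title = "New Sheet"
# Create a sheet and name it at the same time
ws_3 = wb.create_sheet(title="New Sheet-2", index=0)
ws_4 = wb.create_sheet("New Sheet-1", 0)
# Change the sheet tab color (no color by default)
ws.sheet_properties.tabColor = "F22F27"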
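Because create_sheet() takes the title first and the index second, the position of each new sheet can be checked through wb.sheetnames. A small sketch with hypothetical sheet names:

```python
from openpyxl import Workbook

wb = Workbook()                               # starts with ['Sheet']
wb.create_sheet('Last')                       # no index: appended at the end
wb.create_sheet('First', 0)                   # index 0: inserted at the front
wb.create_sheet(title='Second', index=1)      # keyword form, inserted at position 1
print(wb.sheetnames)                          # ['First', 'Second', 'Sheet', 'Last']
```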
3. Operation
1. Set the workbook to read-only
from openpyxl import load_workbook
wb = load_workbook(filename='data.xlsx', read_only=True)  # open the workbook in read-only mode
2. Worksheet operations
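# Import the openpyxl module and its load_workbook function
import openpyxl
from openpyxl import load_workbook
# Not read-only: copy_worksheet() does not work on workbooks opened with read_only=True
wb = load_workbook(filename='1.xlsx')
print(wb.sheetnames)  # names of all sheets in the workbook
for sheet in wb:
    print(sheet.title)  # iterate over every sheet in the workbook
# Get a sheet by name
ws = wb['Sheet1']  # Worksheet object
print('***********' + ws.title)  # print the sheet name
ws_copy = wb.copy_worksheet(ws)  # copy the sheet
# Delete a sheet
# Method 1
ws = wb["Sheet1"]
wb.remove(ws)
# Method 2 (an alternative to Method 1; Sheet1 is already gone at this point)
# del wb["Sheet1"]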
3. Row and column operations
Get the cell range
Insert blank rows and columns
Delete rows and columns
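The three operations listed above map onto Worksheet members: dimensions gives the used cell range, while insert_rows(), insert_cols(), delete_rows() and delete_cols() take a 1-based index and an optional amount. A minimal sketch on hypothetical data:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
for i in range(1, 4):                   # hypothetical 3 x 3 block of numbers
    ws.append([i, i * 10, i * 100])

first_dims = ws.dimensions
print(first_dims)                       # A1:C3, the range of cells in use

ws.insert_rows(2)                       # one blank row before row 2
ws.insert_cols(1, amount=2)             # two blank columns before column A
after_insert = (ws.max_row, ws.max_column)
print(after_insert)                     # (4, 5)

ws.delete_rows(2)                       # delete the blank row again
ws.delete_cols(1, amount=2)             # delete the two blank columns
print([[c.value for c in row] for row in ws.iter_rows()])
# [[1, 10, 100], [2, 20, 200], [3, 30, 300]]
```

These methods move cells, but according to the openpyxl documentation they do not update dependent formulas, tables or charts, so check such sheets after inserting or deleting.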
4. Access cells
Note: A newly created worksheet contains no cells; a cell is created only when it is first accessed.
This avoids creating cells that are never used and so reduces memory consumption.
Access a single cell
cell_A2=ws['A2']
cell_C3 = ws.cell(row=3, column=3)
Access multiple cells
Access via slices:
cell_area = ws['A1':'B4']
cell_exact = ws.iter_rows(min_row=1, max_row=3, min_col=1, max_col=2)  # i.e. A1:B3
Access via rows and columns:
col_A = ws['A'] #Column A
col_area = ws['A:B']  # Columns A and B
row_2 = ws[2]  # Row 2
row_area = ws[2:5] #Rows 2-5
# Iterate over all rows
all_by_row = ws.rows
# Iterate over all columns
all_by_col = ws.columns
The results can be processed using tuple(), list(), and loops
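As a sketch (hypothetical data, built in memory), here is how iter_rows() results and a column slice can be turned into plain Python lists. The values_only=True argument, which yields values instead of Cell objects, exists in recent openpyxl releases (2.6 and later, as far as I know).

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
for row in [['Name', 'Score'], ['Ann', 90], ['Bob', 85]]:   # hypothetical data
    ws.append(row)

# iter_rows(..., values_only=True) yields tuples of values instead of Cells.
values = list(ws.iter_rows(min_row=2, max_row=3, min_col=1, max_col=2, values_only=True))
print(values)                      # [('Ann', 90), ('Bob', 85)]

# ws['A'] is already a tuple of the Cell objects in column A.
names = [cell.value for cell in ws['A']]
print(names)                       # ['Name', 'Ann', 'Bob']
```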
Example, using 1.xlsx:
import openpyxl
from openpyxl import load_workbook
wb = load_workbook('1.xlsx')
ws = wb['Sheet5']
# Get the value of a single cell
cell_A2 = ws['A2']
cell_C3 = ws.cell(row=3, column=3)
print(cell_A2.value, cell_C3.value)
print('---------------------------')
# Get the values of multiple cells
cell_area = ws['A1':'D4']
for row in cell_area:
    for cell in row:
        print(cell.value)
5. Set row height and column width
Set the height of an entire row
# Set the height of row 2
ws.row_dimensions[2].height = 40
Set the width of an entire column
# Set the width of column C
ws.column_dimensions['C'].width = 30
Finally, remember to save with wb.save('file name'); otherwise the changes are not written to the file.
Example, using 1.xlsx:
import openpyxl
from openpyxl import load_workbook
wb = load_workbook('1.xlsx')
ws = wb['Sheet5']
# Set the height of row 2
ws.row_dimensions[2].height = 40
# Set the width of column C
ws.column_dimensions['C'].width = 30
wb.save('1.xlsx')
6. Merge cells
Merge
Note: To write data into merged cells, write it to the upper-left cell of the merged area.
If other cells in the area already contain data, only the value in the upper-left cell is kept.
ws.merge_cells('A2:D4')
ws.merge_cells(start_row=2, start_column=1, end_row=4, end_column=4)
import openpyxl
from openpyxl import load_workbook
wb=load_workbook('1.xlsx')
ws=wb['Sheet5']
ws.merge_cells('A5:B6')
ws['A5'] = 'This is a merged cell'
wb.save('1.xlsx')
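The note about the upper-left cell can be checked directly. This sketch (hypothetical values, in-memory workbook) writes to two cells, merges the range, and inspects the result; ws.merged_cells.ranges lists the merged areas.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws['A5'] = 'kept'          # upper-left cell of the range to merge
ws['B6'] = 'discarded'     # another cell inside the same range

ws.merge_cells('A5:B6')

print(ws['A5'].value)                              # kept
print(ws['B6'].value)                              # None: the other values are dropped
print([str(r) for r in ws.merged_cells.ranges])    # ['A5:B6']
```

Cells other than the upper-left one become read-only MergedCell objects, so assigning to ws['B6'].value after merging raises an error.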
Unmerge
ws.unmerge_cells('A2:D4')
ws.unmerge_cells(start_row=2, start_column=1, end_row=4, end_column=4)
import openpyxl
from openpyxl import load_workbook
wb=load_workbook('1.xlsx')
ws=wb['Sheet5']
ws.merge_cells('A5:B6')
ws['A5'] = 'This is a merged cell'
ws.unmerge_cells('A5:B6')
wb.save('1.xlsx')
4. Write
1. Write data
# Write data into cells
ws['A1'] = 42  # write to A1
ws.cell(row=1, column=2, value=42)  # write to B1
ws.cell(1, 3).value = 42  # write to C1
# Append a new row of data
ws.append([1, 2, 3, 4])
For example:
import openpyxl
from openpyxl import load_workbook
wb = load_workbook('1.xlsx')
ws = wb['Sheet5']
ws['A1'] = 'Name'
ws.cell(row=5, column=1, value='Han Meimei')  # write to A5
ws.cell(5, 2).value = 42  # write to B5
ws.append(['Li Mei', 2, 3, 4])  # append a new row of data
wb.save('1.xlsx')
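ws.append() writes to the row after the lowest row used so far, which is why the example above places its new row below row 5. A minimal sketch with hypothetical names on an in-memory sheet:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws['A1'] = 'Name'                          # header in row 1
ws.cell(row=5, column=1, value='Alice')    # hypothetical value in A5
ws.append(['Bob', 2, 3, 4])                # lands below the lowest used row

print(ws.max_row)                          # 6
print([c.value for c in ws[6]])            # ['Bob', 2, 3, 4]
```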
2. Write the formula
# Write a formula
ws['B2'] = "=SUM(A2:A4)"
ws.cell(row=2, column=2, value = "=SUM(A2:A4)")
ws.cell(2,2).value = "=SUM(A2:A4)"
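openpyxl stores formulas as text and does not calculate them. This sketch (hypothetical numbers, temporary file) shows that reading the cell back gives the formula string, while load_workbook(..., data_only=True), which reads the cached result, gives None because no spreadsheet application has recalculated and saved the file yet.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

wb = Workbook()
ws = wb.active
for i, n in enumerate([1, 2, 3], start=2):   # hypothetical numbers in A2:A4
    ws.cell(row=i, column=1, value=n)
ws['B2'] = '=SUM(A2:A4)'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'formula.xlsx')
    wb.save(path)
    as_formula = load_workbook(path).active['B2'].value
    cached = load_workbook(path, data_only=True).active['B2'].value

print(as_formula)   # =SUM(A2:A4)
print(cached)       # None: no cached result until Excel recalculates and saves
```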
3. Insert pictures
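from openpyxl.drawing.image import Image  # requires the Pillow package
img = Image('image.png')  # path to the image to insert (placeholder name)
ws.add_image(img, 'B1')  # anchor the image at cell B1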
5. Set cell style
Cell styles include: number_format (data format), Font (font), Fill (fill), Border (border), Alignment (alignment), Protection (protection).
1. Number format
import openpyxl
from openpyxl import Workbook
import datetime
wb = Workbook()
ws = wb.active
ws['A1'] = 'Text'
print(ws['A1'].number_format) #-->>> General
ws['A2'] = 5
print(ws['A2'].number_format) #-->>> General
ws['A3'] = 0.05
ws['A3'].number_format = '0.00%'  # custom format
print(ws['A3'].number_format) # -->>> 0.00%
ws['B1'] = datetime.datetime.now()
print(ws['B1'].number_format) # -->>> yyyy-mm-dd h:mm:ss
ws['B2'] = datetime.datetime.now()
ws['B2'].number_format = 'yyyy-mm-dd'  # custom format
print(ws['B2'].number_format) #-->>> yyyy-mm-dd
wb.save("2.xlsx")
2. Font
Import
# Import Font
from openpyxl.styles import Font
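Parameter description
name  # font name
size  # font size, default 11
bold  # bold or not, default False; bold: True
italic  # italic or not, default False; italic: True
vertAlign  # superscript/subscript, default None; normal: 'baseline', superscript: 'superscript', subscript: 'subscript'
color  # font color, default black (FF000000)
strikethrough  # strikethrough, not set by default; to enable: True
underline  # underline, none by default; single: 'single', double: 'double', single accounting: 'singleAccounting', double accounting: 'doubleAccounting'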
Example
import openpyxl
from openpyxl import Workbook
from openpyxl.styles import Font
wb = Workbook()
ws = wb.active
ws['A1'] = 'Default'  # write to A1
ws['B2'] = 'Formatted'  # write to B2
ws['C3'] = 'Superscript'  # write to C3
# Set the font of B2
ws['B2'].font = Font(name='Calibri', size=12, color="00FF9900", italic=True, underline='double', strikethrough=True)
# Set the font of C3 (vertAlign accepts 'superscript', 'subscript' or 'baseline')
ws['C3'].font = Font(vertAlign='superscript', bold=True)
# Save
wb.save("2.xlsx")
3. Filling
1. Solid color fill (PatternFill)
Import
# Import
from openpyxl.styles import PatternFill
Parameter description
fill_type  # pattern style; if it is not set, no color is shown
'''
Values accepted by fill_type: solid, lightHorizontal,
darkTrellis, darkUp, darkGray, darkVertical, lightDown,
lightTrellis, lightUp, darkDown, darkHorizontal, mediumGray,
lightVertical, gray0625, gray125, lightGrid, darkGrid, lightGray
'''
fgColor/start_color  # foreground color, i.e. the fill color
bgColor/end_color  # background color, i.e. the pattern color
Example
from openpyxl import Workbook
from openpyxl.styles import PatternFill
wb = Workbook()
ws = wb.active
ws['A1'] = 'Default'  # write to A1
ws['B2'] = 'Foreground'  # write to B2
ws['C3'] = 'Background'  # write to C3
# Foreground color, i.e. the fill color; this is what is usually meant by "fill color"
ws['B2'].fill = PatternFill(fill_type='solid', fgColor='00FF9900')
# Background color, i.e. the pattern color
ws['C3'].fill = PatternFill(fill_type='solid', bgColor='00FF9900')
wb.save("demo.xlsx")
2. Gradient Fill (GradientFill)
Import
# Import
from openpyxl.styles import GradientFill
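Parameter description
type/fill_type  # gradient fill type: linear or path
'''
linear:
The gradient interpolates colors between a set of specified stops across the length of the area. By default the gradient runs from left to right, but the degree attribute can change this direction. Alternatively, you can supply a list of colors, and they will be evenly spaced.
path:
A linear gradient is applied from each edge of the area. The top, right, bottom and left attributes specify how far the fill extends from each border; for example, top="0.2" fills the top 20% of the cell.
'''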
Example
from openpyxl.styles import GradientFill
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
# Merge cells
ws.merge_cells('B2:F4')
# Apply a gradient fill to the upper-left cell of the merged range
top_left_cell = ws['B2']
top_left_cell.fill = GradientFill(type='linear', degree=0, stop=('FFFFFF', '99ccff', '000000'))  # gradient fill
wb.save("demo.xlsx")
4. Border
When setting a border style, you also need to pass Side objects.
Import
# Import
from openpyxl.styles import Border, Side
Parameter description
# Border parameters
left = Side(style, color)  # left border
right = Side(style, color)  # right border
top = Side(style, color)  # top border
bottom = Side(style, color)  # bottom border
diagonalDown  # show the top-left to bottom-right diagonal: True
diagonalUp  # show the bottom-left to top-right diagonal: True
diagonal = Side(style, color)  # diagonal border; the diagonal must be enabled first
# Side parameters
style/border_style  # border style
'''
Available border styles:
thick, mediumDashDot, dashed, mediumDashDotDot,
dashDot, slantDashDot, dotted, double, thin,
hair, dashDotDot, mediumDashed, medium
'''
color  # border color
Example
from openpyxl import Workbook
from openpyxl.styles import Border, Side
wb = Workbook()
ws = wb.active
ws['A1'] = 'Default'  # write to A1
ws['B2'] = 'Border'  # write to B2
ws['C3'] = 'Diagonal'  # write to C3
# Border line format
line_format = Side(style='medium', color='00FF9900')
# Give B2 top, bottom, left and right borders
ws['B2'].border = Border(left=line_format, right=line_format, top=line_format, bottom=line_format)
# Give C3 diagonal lines
ws['C3'].border = Border(diagonalDown=True, diagonalUp=True, diagonal=line_format)
# Save
wb.save("demo.xlsx")