Text | Sun Snow
Source: Python technology "ID: pythonall"
Whether in daily office work or in programming, Excel is indispensable: importing and exporting data, recording data, statistical analysis, drawing prototypes. A grandfather in Japan even uses Excel to create drawings.
Although Excel is powerful and easy to operate, some scenarios are still inconvenient by hand, such as importing a large amount of data into Excel, reading data from Excel into another system, reformatting raw data into a given structure, or batch processing many Excel documents. Fortunately, there are many Python libraries that let us control Excel programmatically and complete tasks that are hard to do manually. Let's take a look.
Excel libraries for Python
Python has a large number of third-party packages for working with Excel, each with its own strengths, which can be a bit overwhelming for anyone who has just started using Python with Excel. So let's first briefly sort out some common ones.
OpenPyXL is a Python library that reads and writes Excel 2010 xlsx / xlsm / xltx / xltm files. It is simple to use and has a wide range of features, including cell formats, pictures, tables, formulas, filters, comments, and document protection. Its chart support is one of the highlights.
xlwings is a Python library released under the BSD license. It makes it easy to operate Excel from Python and to call Python from Excel, giving Excel programming with a syntax close to VBA. It supports Excel macros and can also act as a web server that exposes a REST API.
pandas: data processing is the core of pandas, and Excel serves as a container for its input / output data.
win32com: as the name suggests, this is an extension for working with Windows applications, and Excel is only a small part of what it can do; it also supports many other Office operations. Note that the library is not distributed on its own; it is obtained by installing pypiwin32 or pywin32.
XlsxWriter has rich features and supports pictures, tables, charts, filters, formats, formulas, and so on. Its functionality is similar to openpyxl. Its advantage over openpyxl is that it also supports VBA file import, sparklines, and other features; its disadvantage is that it cannot open or modify existing files, which means files must be written from scratch.
DataNitro is a paid add-in embedded in Excel that can completely replace VBA, letting you use Python scripts inside Excel. Since it runs Python in Excel, it can also work with other Python libraries.
xlutils is an older Python package based on xlrd / xlwt and is considered a pioneer in this field. Its feature set is adequate; its relatively big disadvantage is that it only supports xls files.
To summarize:
If you do not want to use the GUI and want to give Excel more capabilities, choose either openpyxl or xlsxwriter;
If you need to carry out scientific computing and process large amounts of data, pandas + xlsxwriter or pandas + openpyxl is a good choice;
If you want to write Excel scripts and know Python but not VBA, consider xlwings or DataNitro;
win32com is very powerful, but it requires some Windows programming experience to get started. It is essentially a wrapper around Windows COM, and its documentation is incomplete.
OpenPyXL
OpenPyXL covers almost all Excel functionality, its interface is clear, its documentation is rich, and the learning cost is relatively low. Today we will take OpenPyXL as an example to see how to operate Excel from Python.
Installation
Install with pip:
pip install openpyxl
After the installation succeeds, you can run the following check:
python -c "import openpyxl"
Basic concepts
Workbook: equivalent to an Excel file; each created or opened Excel file is an independent Workbook object
sheet: a worksheet in the Excel document; each Excel document requires at least one sheet
cell: the indivisible basic unit of data storage
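The three concepts nest inside each other, and a minimal sketch may make that concrete. The sheet title "Report" and the cell value are made up for illustration.

```python
from openpyxl import Workbook

wb = Workbook()            # Workbook: one in-memory Excel file
ws = wb.active             # sheet: a new Workbook always holds one
ws.title = "Report"        # hypothetical title, for illustration only
cell = ws["B3"]            # cell: addressed by column letter + row number
cell.value = "hello"

# Each level keeps a reference to the level above it
print(wb.sheetnames)       # names of all sheets in the workbook
print(cell.parent.title)   # title of the sheet that owns the cell
print(cell.coordinate, cell.row)
```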
A quick trial
Let's first run a test:
from openpyxl import Workbook
# Create a workbook
wb = Workbook()
# Get the active worksheet
ws = wb.active
# Set the content of a cell
ws['A1'] = 42
# Append a row of values
ws.append([1, 2, 3])
# Python data types are converted automatically
import datetime
ws['A2'] = datetime.datetime.now()
# Save the Excel file
wb.save("sample.xlsx")
Things to be aware of:
A newly created Workbook object comes with one sheet named Sheet, whereas a blank workbook created in Office Excel itself may contain 3 sheets, depending on the Excel version
The created workbook activates the first sheet, and a reference to it is obtained through wb.active
As with the python-docx library, the save method saves immediately without any prompt and overwrites an existing file of the same name, so it is recommended to save under a different file name
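These notes can be checked with a short sketch. Because save overwrites silently, the helper safe_save below refuses to replace an existing file. It is a hypothetical helper written for this example, not part of OpenPyXL, and the temporary directory is only there to keep the example self-contained.

```python
import os
import tempfile
from openpyxl import Workbook

def safe_save(workbook, path):
    """Hypothetical helper: save only if `path` does not exist yet."""
    if os.path.exists(path):
        raise FileExistsError(f"{path} already exists")
    workbook.save(path)

wb = Workbook()
default_names = wb.sheetnames                    # sheets of a new workbook
active_is_first = wb.active is wb.worksheets[0]  # first sheet is active

target = os.path.join(tempfile.mkdtemp(), "sample.xlsx")
safe_save(wb, target)                  # first save succeeds
try:
    safe_save(wb, target)              # second save is refused
    overwritten = True
except FileExistsError:
    overwritten = False
```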
Common functions
OpenPyXL has many features, from cell handling to chart display, covering almost all of Excel's functionality. Some commonly used ones are shown here. For more, refer to the OpenPyXL documentation (linked at the end of the article).
Creating and opening Excel files
The trial section above showed how to create an Excel file.
To load an existing Excel file, use the load_workbook method: given the file path, it returns a workbook object:
from openpyxl import load_workbook
wb = load_workbook('test.xlsx')
# Show the names of the sheets in the document
print(wb.sheetnames)
In addition to filename, load_workbook has some useful parameters:
read_only: whether to open in read-only mode; for very large files this helps efficiency
keep_vba: whether to keep VBA code, that is, preserve macros when opening the Excel file
guess_types: whether to infer the data type when reading cell values (accepted by older openpyxl releases; recent versions appear to have dropped this parameter)
data_only: whether to return the result instead of the formula, that is, for a cell containing a formula, whether to show the most recently calculated result
keep_links: whether to keep external links
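A minimal sketch of how data_only and read_only behave, assuming a reasonably recent openpyxl (values_only in iter_rows is not available in very old releases). The file name formula.xlsx and its contents are made up. Note that OpenPyXL does not evaluate formulas itself, so a file that has never been opened and saved by Excel has no cached result to return.

```python
import os
import tempfile
from openpyxl import Workbook, load_workbook

# Build a small file that contains a formula
path = os.path.join(tempfile.mkdtemp(), "formula.xlsx")
wb = Workbook()
ws = wb.active
ws["A1"] = 1
ws["A2"] = 2
ws["A3"] = "=A1+A2"
wb.save(path)

# Default mode: the formula text is returned
formula = load_workbook(path)["Sheet"]["A3"].value

# data_only=True: the result cached in the file is returned instead;
# this file was never calculated by Excel, so nothing is cached
cached = load_workbook(path, data_only=True)["Sheet"]["A3"].value

# read_only=True: rows are read lazily; close the workbook afterwards
wb_ro = load_workbook(path, read_only=True)
first_column = [row[0] for row in wb_ro["Sheet"].iter_rows(values_only=True)]
wb_ro.close()
```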
Working with sheets
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
ws1 = wb.create_sheet("sheet")  # create a sheet named "sheet"
ws1.title = "New Title"  # set the sheet title
ws2 = wb.create_sheet("mysheet", 0)  # create a sheet at the front (the default is at the end)
ws2.title = "Hello"  # set the sheet title
ws1.sheet_properties.tabColor = "1072BA"  # set the background color of the sheet tab
# Get sheets by name (wb.get_sheet_by_name is deprecated in favor of wb[name])
ws3 = wb["Hello"]
ws4 = wb["New Title"]
# Copy a sheet
ws1_copy = wb.copy_worksheet(ws1)
# Delete a sheet
wb.remove(ws1)
Each Workbook has an active sheet, generally the first one, which can be obtained directly through active
You can get a sheet object by its sheet name
When creating a sheet, you can pass the sheet name. If a sheet with this name already exists, 1 is appended to the name, then 2 for the next duplicate, and so on
After obtaining the sheet object, you can set properties such as the title and the tab color
Within the same Workbook object, you can copy a sheet by passing the source sheet object as the parameter; the new copy is placed at the end
You can delete a sheet; the parameter is the target sheet object
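The naming and ordering rules above can be observed with a short sketch. The sheet names "Data" and "Summary" are made up for illustration.

```python
from openpyxl import Workbook

wb = Workbook()
first = wb.create_sheet("Data")
second = wb.create_sheet("Data")       # name is taken, so a number is appended
front = wb.create_sheet("Summary", 0)  # index 0 puts it before all others

copied = wb.copy_worksheet(first)      # the copy is appended at the end
wb.remove(second)                      # delete by passing the sheet object

print(first.title, second.title)
print(wb.sheetnames)
```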
Working with cells
A cell is the smallest unit of data storage in Excel; it is one small box of the grid in the graphical interface
OpenPyXL can operate on single cells or on batches of cells
Single cells
To operate on a single cell, get it by its Excel cell name or by its row and column coordinates
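from openpyxl import Workbook
wb = Workbook()
ws1 = wb.create_sheet("Mysheet")  # create a sheet
# Set values by cell name
ws1["A1"] = 123.11
ws1["B2"] = "Hello"
# Set a value by row and column coordinates
d = ws1.cell(row=4, column=2, value=10)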
A cell can be set by its name, much like accessing an item of the sheet
It can also be set through its row and column coordinates with the cell method
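The two addressing styles refer to the same cell. A small sketch, using the conversion helpers from openpyxl.utils, shows how to move between a cell name and 1-based row and column numbers.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter, column_index_from_string

wb = Workbook()
ws = wb.active

ws["B4"] = 10                          # by cell name
same_cell = ws.cell(row=4, column=2)   # by coordinates (both are 1-based)

# Convert between the two addressing styles
name = f"{get_column_letter(2)}{4}"
col_index = column_index_from_string("B")
```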
Batch operations
When you need to operate on many cells at once, batch operations are more efficient
Specified rows and columns
# A single column
for cell in ws["A"]:
    print(cell.value)
# A single row
for cell in ws["1"]:
    print(cell.value)
# Several columns
for column in ws['A:C']:
    for cell in column:
        print(cell.value)
# Several rows
for row in ws['1:3']:
    for cell in row:
        print(cell.value)
# A specified range
for row in ws['A1:C3']:
    for cell in row:
        print(cell.value)
All rows or columns
# All rows
for row in ws.iter_rows():
    for cell in row:
        print(cell.value)
# All columns
for column in ws.iter_cols():
    for cell in column:
        print(cell.value)
Setting an entire row of data
ws.append((1,2,3))
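append and iter_rows combine naturally: write rows, then read back only the part you need. This is a minimal sketch with made-up data; it assumes a reasonably recent openpyxl, since the values_only argument is not available in very old releases.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.append(["item", "qty", "price"])   # header row
ws.append(["pen", 3, 1.5])
ws.append(["book", 2, 12.0])

# Skip the header, read only columns B and C, and get plain values
totals = [
    qty * price
    for qty, price in ws.iter_rows(min_row=2, min_col=2, max_col=3,
                                   values_only=True)
]
shape = (ws.max_row, ws.max_column)   # extent of the used area
```

Limiting the range with min_row / min_col / max_col avoids touching cells that are not needed, which matters on large sheets.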
Merging cells
# Merge
ws.merge_cells('A2:D2')
# Unmerge
ws.unmerge_cells('A2:D2')
# The same operations with named row / column arguments
ws.merge_cells(start_row=2, start_column=1, end_row=2, end_column=4)
ws.unmerge_cells(start_row=2, start_column=1, end_row=2, end_column=4)
The merge_cells method of the sheet object merges cells, and unmerge_cells undoes a merge
The range can be given in two ways: as a cell-name range string, or through named row / column arguments
Note: calling unmerge_cells on a range that has not been merged raises an error
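Since unmerging an unmerged range raises an error, a small guard can be handy. The helper unmerge_if_merged below is hypothetical, written for this example only; the exact exception type is an assumption that may vary between openpyxl versions, so both ValueError and KeyError are caught.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws["A2"] = "title"
ws.merge_cells("A2:D2")
merged = [r.coord for r in ws.merged_cells.ranges]   # currently merged ranges

def unmerge_if_merged(sheet, cell_range):
    """Hypothetical helper: unmerge and report whether anything changed."""
    try:
        sheet.unmerge_cells(cell_range)
        return True
    except (ValueError, KeyError):   # exception type may vary by version
        return False

first_try = unmerge_if_merged(ws, "A2:D2")    # range was merged
second_try = unmerge_if_merged(ws, "A2:D2")   # nothing left to unmerge
```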
Cell formats
OpenPyXL uses 6 classes to style cells:
NumberFormat: number format
Alignment: alignment
Font: font
Border: border
PatternFill: fill
Protection: protection
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Border, Side, Alignment, Protection
from openpyxl.styles import numbers
wb = Workbook()
ws = wb.active
# Font (宋体 is the SimSun font), alignment, fill, border, protection, number format
ws.cell(row=1, column=1, value='SimSun').font = Font(name='宋体', size=12, bold=True, color='FF0000')
ws.cell(row=2, column=2, value='Right aligned').alignment = Alignment(horizontal='right')
ws.cell(row=3, column=3, value='Filled').fill = PatternFill(fill_type='solid', start_color='FF0000')
ws.cell(row=4, column=4, value='With borders').border = Border(left=Side(border_style='thin', color='FF0000'), right=Side(border_style='thin', color='FF0000'))
ws.cell(row=5, column=5, value='Protected').protection = Protection(locked=True, hidden=True)
ws.cell(row=6, column=6, value=0.54).number_format = numbers.FORMAT_PERCENTAGE
Import the style classes
Use the cell method to set the value and the format of a cell at the same time
Each format has its own attribute on the cell, which is assigned the matching format object
The number format is a bit different: it is set by format name, and numbers.FORMAT_PERCENTAGE is a string
The Border class needs to be used together with the Side class; both are defined in openpyxl.styles
Note that a cell's style attribute can only be assigned a whole style object; it cannot be modified through the attributes of that object. For example, ws.cell(1, 1).font.color = '00FF00' raises an error. If you really want to change it, create a new style object and assign it again.
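A minimal sketch of that last point: the in-place change is rejected, and the working approach is to build a new Font from the attributes of the old one and assign it as a whole. The font name "Arial" and the colors are illustrative choices.

```python
from openpyxl import Workbook
from openpyxl.styles import Font

wb = Workbook()
ws = wb.active
cell = ws.cell(row=1, column=1, value="text")
cell.font = Font(name="Arial", size=12, bold=True, color="FF0000")

try:
    cell.font.color = "00FF00"      # in-place change is rejected
    rejected = False
except AttributeError:
    rejected = True

# Build a new Font from the old attributes and assign it as a whole
old = cell.font
cell.font = Font(name=old.name, size=old.size, bold=old.bold, color="00FF00")
```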
The above shows how to format a single cell; formats can also be set in batches. There are two ways: loop over a range and set every cell one by one, or set an entire row or column:
font = Font(bold=True)
# Loop over the cells in a range
for row in ws['A1:C3']:
    for cell in row:
        cell.font = font
# Set an entire row
row = ws.row_dimensions[1]
row.font = font
# Set an entire column
column = ws.column_dimensions["A"]
column.font = font
For more style class definitions and parameters, see the OpenPyXL documentation
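As one possible application of batch styling, the sketch below formats a header row. style_header is a hypothetical helper written for this example, and the grey color DDDDDD is an arbitrary choice.

```python
from openpyxl import Workbook
from openpyxl.styles import Alignment, Font, PatternFill

def style_header(sheet, row=1):
    """Hypothetical helper: bold, centered, grey-filled header cells."""
    font = Font(bold=True)
    fill = PatternFill(fill_type="solid", start_color="DDDDDD")
    align = Alignment(horizontal="center")
    for cell in sheet[row]:          # every used cell in the given row
        cell.font = font
        cell.fill = fill
        cell.alignment = align

wb = Workbook()
ws = wb.active
ws.append(["name", "score"])
ws.append(["Ann", 90])
style_header(ws)
```

The same style objects are shared by all header cells, which is fine because style objects are never modified after creation.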
Charts
Charts are a very important part of Excel and an efficient tool for data visualization. OpenPyXL can create charts in Excel programmatically, and the process is almost the same as in Excel itself. The bar chart and the pie chart are used as examples below.
Bar chart
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference
wb = Workbook()
ws = wb.active
rows = [
    ('Month', 'Apples', 'Bananas'),
    (1, 43, 25),
    (2, 10, 30),
    (3, 40, 60),
    (4, 50, 70),
    (5, 20, 10),
    (6, 10, 40),
    (7, 50, 30),
]
for row in rows:
    ws.append(row)
chart1 = BarChart()
chart1.type = "col"
chart1.style = 10
chart1.title = "Sales bar chart"
chart1.y_axis.title = 'Sales'
chart1.x_axis.title = 'Month'
# Data columns B and C, including the header row used for series titles
data = Reference(ws, min_col=2, min_row=1, max_row=8, max_col=3)
# Category labels from column A, without the header
series = Reference(ws, min_col=1, min_row=2, max_row=8)
chart1.add_data(data, titles_from_data=True)
chart1.set_categories(series)
ws.add_chart(chart1, "A10")
Import BarChart, the bar chart class, and Reference, the data reference class
Create a Workbook and add data to the active sheet
Create a bar chart object and set its properties; type col gives a vertical column chart, and bar gives a horizontal bar chart
Create a data reference object, specifying the sheet and the data range
Create the reference object for the category labels
Add the data and the categories to the chart object
Finally, add the chart object to the sheet with add_chart
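The steps above can be wrapped in a function. This is a sketch under stated assumptions: add_bar_chart is a hypothetical helper, it assumes category labels in column A with a header in row 1, and the data and the output file name bar.xlsx are made up.

```python
import os
import tempfile
from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

def add_bar_chart(sheet, anchor, horizontal=False):
    """Hypothetical helper: chart columns B..last against column A."""
    chart = BarChart()
    chart.type = "bar" if horizontal else "col"
    # The header row is included so series names come from the first row
    data = Reference(sheet, min_col=2, max_col=sheet.max_column,
                     min_row=1, max_row=sheet.max_row)
    categories = Reference(sheet, min_col=1, min_row=2, max_row=sheet.max_row)
    chart.add_data(data, titles_from_data=True)
    chart.set_categories(categories)
    sheet.add_chart(chart, anchor)
    return chart

wb = Workbook()
ws = wb.active
for row in [("month", "apples", "bananas"), (1, 43, 25), (2, 10, 30)]:
    ws.append(row)

chart = add_bar_chart(ws, "E2", horizontal=True)
path = os.path.join(tempfile.mkdtemp(), "bar.xlsx")
wb.save(path)   # the chart only becomes visible after saving and opening
```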
Pie chart
from openpyxl import Workbook
from openpyxl.chart import PieChart, Reference
data = [
    ['Fruit', 'Sales'],
    ['Apple', 50],
    ['Cherry', 30],
    ['Orange', 10],
    ['Banana', 40],
]
wb = Workbook()
ws = wb.active
for row in data:
    ws.append(row)
pie = PieChart()
pie.title = "Share of fruit sales"
# Labels from column A, values from column B (header row gives the series title)
labels = Reference(ws, min_col=1, min_row=2, max_row=5)
data = Reference(ws, min_col=2, min_row=1, max_row=5)
pie.add_data(data, titles_from_data=True)
pie.set_categories(labels)
ws.add_chart(pie, "D1")
Import PieChart and the data reference class Reference
Create the chart data
Create the chart object and set the chart title
Define the label reference and the data reference, and add them to the chart
Add the chart object to the specified position in the sheet
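Once a pie chart exists, individual slices can be adjusted through the data points of its series. The sketch below pulls the first slice out of the pie, following the pattern shown in the OpenPyXL chart documentation; the data and title are made up.

```python
from openpyxl import Workbook
from openpyxl.chart import PieChart, Reference
from openpyxl.chart.series import DataPoint

wb = Workbook()
ws = wb.active
for row in [["fruit", "sales"], ["apple", 50], ["cherry", 30], ["banana", 40]]:
    ws.append(row)

pie = PieChart()
pie.title = "Share of sales"
pie.add_data(Reference(ws, min_col=2, min_row=1, max_row=4),
             titles_from_data=True)
pie.set_categories(Reference(ws, min_col=1, min_row=2, max_row=4))

# Pull the first slice away from the centre to highlight it
first_slice = DataPoint(idx=0, explosion=20)
pie.series[0].data_points = [first_slice]

ws.add_chart(pie, "D1")
```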
Summary
Today, taking the OpenPyXL library as an example, we covered the basics of operating Excel from Python. Due to space limitations, many features could not be introduced here. I hope this short article gets you interested in operating Excel programmatically and makes your work and study more efficient. As the famous saying goes: "Life is short, I use Python".
References
OpenPyXl documentation https://openpyxl.readthedocs.io
Excel drawing https://zhuanlan.zhihu.com/p/34917620
https://www.jianshu.com/p/be1ed0c5218e
https://www.douban.com/note/706513912/
https://blog.csdn.net/weixin_41595432/article/details/79349995