Python file processing - Excel automation with openpyxl

Introduction to openpyxl

openpyxl is a third-party Python library, developed by Eric Gazoni and Charlie Clark, for reading and writing Excel spreadsheets. Because it is not part of the standard library, you need to install it first, using the command that matches your system:

Windows system: pip install openpyxl
MacOS system: pip3 install openpyxl

openpyxl handles the spreadsheet formats used by Excel 2010 and later: xlsx/xlsm/xltx/xltm. It does not read the older binary .xls format.

Import of openpyxl

To use the openpyxl library, you need to import it first. Let’s take a look at the following code.

import openpyxl

# Open an existing workbook by its file path
wb1 = openpyxl.load_workbook('./demo_excel.xlsx')
# Create a new workbook with Workbook()
wb2 = openpyxl.Workbook()

In the above code, we first directly import the openpyxl library using import, then use openpyxl.load_workbook() to open an existing workbook, and use openpyxl.Workbook() to create a new workbook.

But we have to prefix every call with openpyxl., which is a bit tedious.

For convenience, today I will teach you a new import method: from...import....

from...import... is a variant of the import statement, which can import functions, methods, classes or variables in libraries or modules.

The syntax is: from library/module import function/method/class/variable.

You can use from...import... to import multiple names in one line, and separate different names with commas.

For example: from library/module import function 1, class 1.

So the above code can be rewritten as:

from openpyxl import load_workbook, Workbook

# Open an existing workbook by its file path
wb1 = load_workbook('./demo_excel.xlsx')
# Create a new workbook with Workbook()
wb2 = Workbook()

Note the first line: from openpyxl import load_workbook, Workbook. After this import, load_workbook and Workbook can be used directly, without the openpyxl. prefix.

Excel table operations

Typically, the sequence of manual operations can be summarized as:

1. Open the workbook
2. Confirm the worksheet
3. Manipulate cells

I will explain the relevant parts of openpyxl in the same order as these manual steps. Please refer to the figure below.
[Figure: Excel table operations]
In openpyxl, an Excel file in .xlsx format corresponds to a workbook object. So let's first learn how to obtain a workbook object, and then learn some of its basic operations.

How to get the workbook object

load_workbook(filename)

The first way to obtain a workbook object should already look familiar: the load_workbook(filename) function of the openpyxl library. The parameter filename is the path of the workbook, that is, the path of the .xlsx file, and the function returns a workbook object.

Workbook()

We can also obtain a workbook object by instantiating the Workbook class. More precisely, this is a way to create a workbook object.

The syntax is very simple, just write it as Workbook(), and there is no need to write any parameters in the brackets.
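A minimal sketch of what Workbook() gives you: a new, in-memory workbook that already contains one worksheet. The printed sheet name is openpyxl's default title for that first sheet; nothing is written to disk until save() is called.

```python
from openpyxl import Workbook

# Instantiate the Workbook class: no arguments are needed
new_wb = Workbook()

# A freshly created workbook already contains one worksheet
print(new_wb.sheetnames)      # ['Sheet']
print(type(new_wb).__name__)  # Workbook
```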

The above are the two ways to obtain the workbook object. The following is an introduction to the basic operations related to the workbook object.

Workbook object operations

save(filename)

Just now, we created a new workbook object by instantiating the Workbook class. To save it to disk, we use the save() method of the workbook object.
The syntax is: workbook object.save(filename). The parameter filename is the file path of the new workbook; I recommend giving it an .xlsx extension.

The workbook object obtained through load_workbook(filename) can also use the method save(filename).

If the parameter filename remains unchanged, it will be saved in the original path, which is equivalent to modifying the original file; if the parameter filename changes, it will be saved in a new path, which is equivalent to saving as a new file.
To sum it up in a figure:
[Figure: the save() method]
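The "save in place" versus "save as" behaviour can be sketched as follows. The file names and the temporary folder are hypothetical, chosen only so the example does not touch real files: saving under a new name leaves the original file as it was, while saving under the original name overwrites it.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Hypothetical scratch folder and file names
folder = tempfile.mkdtemp()
original_path = os.path.join(folder, 'demo.xlsx')
copy_path = os.path.join(folder, 'demo_copy.xlsx')

# Create a workbook, write one value, and save it for the first time
wb = Workbook()
wb.active['A1'] = 'first version'
wb.save(original_path)

# Reopen it, change the value, and "save as" under a new name
wb = load_workbook(original_path)
wb.active['A1'] = 'second version'
wb.save(copy_path)
before_overwrite = load_workbook(original_path).active['A1'].value

# Saving to the same path again overwrites the original file
wb.save(original_path)
after_overwrite = load_workbook(original_path).active['A1'].value

print(before_overwrite)  # first version
print(after_overwrite)   # second version
```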

How to get the worksheet object

Worksheets are the tabs at the bottom of the workbook window. In Excel, we switch between worksheets by clicking different tabs.
[Figure: worksheet tabs]
In openpyxl, each worksheet corresponds to a worksheet object (Worksheet object), which represents one worksheet in the workbook.

active attribute

The active attribute of a workbook object (workbook object.active) returns the active worksheet. What is the active worksheet?

Normally, the active worksheet refers to the currently selected worksheet. After opening an .xlsx file, the worksheet displayed by default is the active worksheet.
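A small sketch of the active attribute on a newly created workbook. The extra sheet names and the use of create_sheet() are illustrative assumptions not covered in the text above; switching the active sheet is done here by 0-based index.

```python
from openpyxl import Workbook

wb = Workbook()
# create_sheet() appends more worksheets (tabs); it does not change the active one
wb.create_sheet('Second')
wb.create_sheet('Third')

default_title = wb.active.title
print(default_title)  # Sheet

# Select a different tab by its position (0-based index)
wb.active = 2
new_title = wb.active.title
print(new_title)  # Third
```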

Get table by table name

If we know the name of a worksheet, we can index the workbook object with it, written as workbook object['sheet name'], to get that worksheet object. This approach is most useful when the workbook has several worksheets and we know their names.
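A minimal sketch of fetching a worksheet by name, assuming a hypothetical sheet called "Sales". Indexing with a name that does not exist raises KeyError, so checking wb.sheetnames first is a simple guard.

```python
from openpyxl import Workbook

wb = Workbook()
wb.create_sheet('Sales')   # hypothetical worksheet name

print(wb.sheetnames)       # ['Sheet', 'Sales']

# Index the workbook object by worksheet name
sales_ws = wb['Sales']
print(sales_ws.title)      # Sales

# Check sheetnames first (or catch KeyError) when a name might not exist
name = 'Missing'
if name in wb.sheetnames:
    ws = wb[name]
else:
    print(f'no worksheet called {name}')
```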

Compare the two ways of obtaining worksheet objects in the figure below, and choose whichever suits your needs.
[Figure: getting a worksheet]

Basic operations on worksheet objects

Get a single row or column

In Excel, rows are identified by numbers and columns by letters.
In openpyxl, worksheet object[row number] or worksheet object['column letter'] returns a tuple of the cell objects in the specified row or column, up to the last column or row that holds data. Each cell's value attribute holds the actual data.
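A quick sketch of row and column indexing on a small in-memory sheet. The sample data is hypothetical; the point is that ws[1] and ws['C'] return tuples of cell objects, whose values are read through .value.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
# Hypothetical sample data: header row plus two staff rows
ws.append(['ID', 'Name', 'Salary'])
ws.append(['S1', 'Alice', 3000])
ws.append(['S2', 'Bob', 5000])

row_1 = ws[1]      # tuple of the cells in row 1
col_c = ws['C']    # tuple of the cells in column C

print([cell.value for cell in row_1])  # ['ID', 'Name', 'Salary']
print([cell.value for cell in col_c])  # ['Salary', 3000, 5000]
```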

Get multiple rows of data (iter_rows())

We can use the iter_rows() method of the worksheet object to obtain multiple rows of data within the specified range in the table.

The syntax of iter_rows() is shown in the figure below.
[Figure: iter_rows() syntax]
The parameters min_row and max_row are the smallest and largest row indexes to read. min_row defaults to 1, and max_row defaults to the last row in the worksheet that contains data.

The parameters min_col and max_col are the smallest and largest column indexes to read. min_col defaults to 1, and max_col defaults to the rightmost column in the worksheet that contains data.

The parameter values_only decides what each row contains: if it is True, cell values are returned; if it is False (the default), cell objects are returned. When you only need to read data, set it to True; when you want to modify cells, keep the default False so that you get cell objects you can write to.

The iter_rows() method of the worksheet object returns an iterable (in practice, a generator) that yields one tuple per row in the specified range; each tuple represents one row of the table.

Therefore, usually, iter_rows() is used in conjunction with a for loop, allowing us to take out each tuple in the iterable object it returns, that is, each row of data within the specified range in the table.
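A minimal sketch of the usual iter_rows() loop on hypothetical data: the header row is skipped with min_row=2, only columns B and C are read, and values_only=True yields plain values instead of cell objects.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
# Hypothetical sample data
ws.append(['ID', 'Name', 'Salary'])
ws.append(['S1', 'Alice', 3000])
ws.append(['S2', 'Bob', 5000])
ws.append(['S3', 'Carol', 4000])

# Skip the header (min_row=2) and read columns B..C as plain values
rows = []
for row in ws.iter_rows(min_row=2, min_col=2, max_col=3, values_only=True):
    print(row)
    rows.append(row)
# ('Alice', 3000)
# ('Bob', 5000)
# ('Carol', 4000)
```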

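Note that iter_rows() does not skip rows without data: a row inside the range that holds no data is still returned as a tuple, filled with None when values_only is True (or with cell objects whose value is None when it is False).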
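A short sketch of that behaviour, using a hypothetical sheet where row 2 is deliberately left empty. The empty row still appears in the output as a tuple holding None.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws['A1'] = 'first'
ws['A3'] = 'third'   # row 2 is left empty on purpose

values = list(ws.iter_rows(values_only=True))
print(values)  # [('first',), (None,), ('third',)]
```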

When values_only keeps its default value False, iter_rows() returns the cell objects in the specified range. Besides the value, a cell object has further attributes, such as its coordinate and its formatting (font, number format, and so on).
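A sketch of what a cell object carries beyond its value, on hypothetical data. It reads only coordinate, value and number_format; "General" is the number format a cell has when no formatting has been applied.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.append(['ID', 'Salary'])
ws.append(['S1', 3000])

# values_only stays False, so each row is a tuple of Cell objects
info = []
for row in ws.iter_rows(min_row=2):
    for cell in row:
        info.append((cell.coordinate, cell.value, cell.number_format))
        print(cell.coordinate, cell.value, cell.number_format)
# A2 S1 General
# B2 3000 General
```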

Adding data

We can use append() of the worksheet object to add a row of data. I believe this method is familiar to you.

append() takes an iterable (commonly a list or a tuple) and writes it to the worksheet object as a new row, after the last row of the table.

The syntax is simple: worksheet object.append(list/tuple).

Note, however, that append() only changes the workbook in memory. To see the added rows in the Excel file on disk, you must save the workbook with the save() method of the workbook object.

See the code below for a concrete example.

from openpyxl import load_workbook

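# Open the workbook [公司人员名单.xlsx] (company staff list)
staff_wb = load_workbook('./公司人员名单.xlsx')
# Get the active worksheet
active_ws = staff_wb.active

# Employee ID, name, salary, department (内容 = content, 销售 = sales)
info_list = ['S1911', '萧爵瑟', 3000, '内容']
info_tuple = ('S1912', '吴琐薇', 5000, '销售')

# Each append() call writes one new row after the last row
active_ws.append(info_list)
active_ws.append(info_tuple)

# Save the workbook as [append_demo.xlsx]
staff_wb.save('./append_demo.xlsx')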

The code above adds two rows, one passed as a list and one as a tuple, and both end up at the bottom of the worksheet. If you run it locally, it generates the file append_demo.xlsx.
[Figure: Example]
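Because the code above needs an existing staff file, here is a self-contained sketch of the same append-then-save flow. The temporary folder, header row and romanized sample rows are hypothetical stand-ins; reloading the saved file confirms the rows reached disk only after save().

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Hypothetical output location
path = os.path.join(tempfile.mkdtemp(), 'append_demo.xlsx')

wb = Workbook()
ws = wb.active
ws.append(['ID', 'Name', 'Salary', 'Department'])  # header row
ws.append(['S1911', 'Xiao', 3000, 'Content'])       # appended from a list
ws.append(('S1912', 'Wu', 5000, 'Sales'))           # appended from a tuple

# Without save(), the appended rows exist only in memory
wb.save(path)

reloaded = load_workbook(path).active
print(reloaded.max_row)                   # 3
print([c.value for c in reloaded[3]])     # ['S1912', 'Wu', 5000, 'Sales']
```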

Summary of knowledge points

[Figure]

Cell object

The cell object represents a single cell in the worksheet, as shown in the figure below:
[Figure]
Almost every operation on rows and columns ultimately comes down to operations on cells.

How to get the cell object

Method 1: use iter_rows() to get the rows in a specified range, then index into a row to get a cell object.

Write for row in worksheet object.iter_rows() to loop over the rows in the range. When values_only keeps its default False, each row is a tuple of cell objects, so individual cell objects can be obtained by indexing the tuple or by looping over it with another for loop.
Example:
[Figure]
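A minimal sketch of Method 1 on hypothetical data: each row from iter_rows() is a tuple of cell objects, so row[1] is the cell in the second column (B) of that row.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
# Hypothetical sample data
ws.append(['ID', 'Name', 'Salary'])
ws.append(['S1', 'Alice', 3000])
ws.append(['S2', 'Bob', 5000])

found = []
for row in ws.iter_rows(min_row=2):   # values_only=False: tuples of Cell objects
    name_cell = row[1]                # index 1 is the second column (B)
    found.append((name_cell.coordinate, name_cell.value))
    print(name_cell.coordinate, name_cell.value)
# B2 Alice
# B3 Bob
```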

Method 2: select a row by its number or a column by its letter, then loop over it with a for loop to get each cell object in that row or column.

for cell in worksheet object[row number]
for cell in worksheet object['column letter']

from openpyxl import load_workbook

# Open the workbook [公司人员名单.xlsx] (company staff list)
staff_wb = load_workbook('./codes/material/公司人员名单.xlsx')
# Get the active worksheet
staff_ws = staff_wb.active

# Loop over all cell objects in row 3
for row_cell in staff_ws[3]:
    print(row_cell)

# Loop over all cell objects in column 3 (column C)
for col_cell in staff_ws['C']:
    print(col_cell)

The code prints every cell object in the specified row and column.
[Figure]

Method 3: select a specific cell by its coordinates

Write worksheet object['cell coordinate'] to get a cell object directly. For example, worksheet object['A1'] returns the cell object for cell A1.
Sample code:

from openpyxl import load_workbook

# Open the workbook [公司人员名单.xlsx] (company staff list)
staff_wb = load_workbook('./codes/material/公司人员名单.xlsx')
# Get the active worksheet
staff_ws = staff_wb.active

# Print the cell object A1
print(staff_ws['A1'])

Output result:

bash:root$ python /home/python-class/root/main14.py
<Cell '下半年公司名单'.A1>
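A self-contained sketch of Method 3 that does not need the staff file. The worksheet title "Staff" is a hypothetical stand-in; the printed form follows the same pattern as the output above, with the worksheet title followed by the cell coordinate.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.title = 'Staff'     # hypothetical worksheet name

cell = ws['B2']        # look up a single cell by its coordinate
cell.value = 'hello'
print(cell)                       # <Cell 'Staff'.B2>
print(cell.row, cell.coordinate)  # 2 B2
```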

Basic operations on cell objects

value attribute

The value attribute of a cell object gives us the cell's data; we can also assign to this attribute to change an existing value or fill an empty cell.
The syntax is as follows:

# Get the value of a cell
cell_object.value
# Assign a value to a cell
cell_object.value = new_value

Example:

from openpyxl import load_workbook

# Open the workbook [公司人员名单.xlsx] (company staff list)
staff_wb = load_workbook('./codes/material/公司人员名单.xlsx')
# Get the active worksheet
staff_ws = staff_wb.active

# Print the value of cell C2
print(staff_ws['C2'].value)

# Change the value of cell C2 to 10000
staff_ws['C2'].value = 10000

# Print the value of C2 after the change
print(staff_ws['C2'].value)

# Save the result as [公司人员名单_new.xlsx]
staff_wb.save('./codes/material/公司人员名单_new.xlsx')

The first print() call outputs the original value of cell C2, 8000. The next statement assigns 10000 to C2 through cell_object.value, replacing the original value, and the second print() call shows the modified result. Finally, save() writes the change to a new file.
The result is shown below:
[Figure]
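A self-contained sketch of the same read, modify and save steps. The file names are hypothetical and a tiny stand-in sheet is built with C2 set to 8000 to mirror the example; reloading both files shows the new value lives only in the file saved under the new name.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

folder = tempfile.mkdtemp()
src = os.path.join(folder, 'staff.xlsx')       # hypothetical input file
dst = os.path.join(folder, 'staff_new.xlsx')   # hypothetical output file

# Build a small stand-in for the staff list: C2 starts at 8000
wb = Workbook()
ws = wb.active
ws.append(['ID', 'Name', 'Salary'])
ws.append(['S1', 'Alice', 8000])
wb.save(src)

wb = load_workbook(src)
ws = wb.active
old_value = ws['C2'].value     # read the current value
ws['C2'].value = 10000         # replace it in memory
wb.save(dst)                   # persist under a new name

new_value = load_workbook(dst).active['C2'].value
untouched = load_workbook(src).active['C2'].value
print(old_value, new_value, untouched)  # 8000 10000 8000
```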

Summary

[Figure]

Summary of knowledge points

Let's summarize with a mind map:
[Figure]
Summary of the key and difficult points:
[Figure]


Source: blog.csdn.net/qq_41308872/article/details/131458711#comments_28559234