Using openpyxl in Python to read and write Excel files

Foreword

According to the official documentation, openpyxl is a third-party library for reading and writing Excel files in xlsx/xlsm format ("A Python library to read/write Excel 2010 xlsx/xlsm files").
openpyxl has three main concepts: Workbook (the whole Excel file), Sheet (a worksheet, i.e. one tab) and Cell (a single cell).
The basic workflow in openpyxl is: open a Workbook, locate a Sheet, then operate on Cells.
The main reading and writing methods are described below.
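
The three concepts map directly onto code. The following is a minimal sketch, assuming openpyxl 3.x is installed; the file name demo.xlsx and the sheet title Demo are made up for illustration, and the file is written to a temporary directory so nothing is left behind.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Workbook: the whole .xlsx file; a new workbook already contains one sheet
wb = Workbook()
# Sheet: take the active worksheet and give it a (hypothetical) title
ws = wb.active
ws.title = "Demo"
# Cell: address it with an Excel-style coordinate and assign a value
ws["A1"] = "hello"
ws["B1"] = 42

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.xlsx")
    wb.save(path)

    # Re-open the file and walk Workbook -> Sheet -> Cell again
    loaded = load_workbook(path)
    sheet = loaded["Demo"]
    values = (sheet["A1"].value, sheet["B1"].value)

print(values)  # ('hello', 42)
```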

Main content

Installation

Install with pip: pip install openpyxl (or sudo pip install openpyxl for a system-wide install)

Install from source: download the source package (link at the bottom), unpack it, and run python setup.py install
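
Several calls in this article differ between openpyxl releases (for example get_sheet_names() and as_template belong to older versions), so it can help to check which version was installed. A minimal sketch; treating 3 as the dividing major version is a rough assumption, not an official compatibility rule.

```python
import openpyxl

# __version__ is a string such as "3.1.2"
version = openpyxl.__version__
major = int(version.split(".")[0])

print("openpyxl", version)
if major < 3:
    # Rough heuristic: older releases expose some of the legacy calls shown here
    print("Older release: some calls in this article may behave differently")
```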

Workbook operations

Reading a workbook

from openpyxl import load_workbook
# Load an existing Excel file; by default it can be both read and modified
wb = load_workbook("sample.xlsx")
# Open the file in read-only mode
wb = load_workbook("sample.xlsx", read_only=True)
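
Read-only mode is mainly useful for large files. The sketch below first builds a small stand-in for sample.xlsx (its contents are invented for the example), then reads it back in read-only mode, assuming openpyxl 3.x where iter_rows() accepts values_only.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xlsx")

    # Build a small file to read back (a stand-in for an existing sample.xlsx)
    src = Workbook()
    src.active.append(["name", "score"])
    src.active.append(["alice", 90])
    src.active.append(["bob", 85])
    src.save(path)

    # read_only=True streams rows instead of loading every cell into memory
    wb = load_workbook(path, read_only=True)
    ws = wb.active
    # values_only=True yields tuples of plain values instead of cell objects
    rows = list(ws.iter_rows(values_only=True))
    # A read-only workbook keeps the file open until close() is called
    wb.close()

print(rows)  # [('name', 'score'), ('alice', 90), ('bob', 85)]
```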

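Writing a workbook

from openpyxl import Workbook
# Create a new workbook in memory (not saved yet); it starts with one empty sheet
wb = Workbook()
# Save the file; an existing file at this path, including the one the workbook was loaded from, is overwritten
wb.save(r"F:\sample.xlsx")
# Save the workbook as a template.
# Older releases used wb.save("template.xltx", as_template=True) (as_template defaulted to False);
# newer releases removed that parameter and use the template attribute instead
wb.template = True
wb.save("template.xltx")
# Write-only mode: no default sheet, rows are added with append(), and it can be saved only once
wb = Workbook(write_only=True)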
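
Write-only mode needs a slightly different pattern from a normal workbook, because it starts with no sheets and only supports appending. A minimal sketch assuming openpyxl 3.x; the sheet name Data and the squared numbers are invented test data.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "big.xlsx")

    # A write-only workbook starts without sheets, so create one explicitly
    wb = Workbook(write_only=True)
    ws = wb.create_sheet("Data")
    # Rows can only be added in order with append()
    ws.append(["id", "value"])
    for i in range(1, 1001):
        ws.append([i, i * i])
    # A write-only workbook can be saved only once
    wb.save(path)

    # Re-open the file in normal mode to check what was written
    check = load_workbook(path)
    data = check["Data"]
    summary = (data.max_row, data["B1001"].value)

print(summary)  # (1001, 1000000)
```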

Sheet operations

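Reading sheets

# Get the names of all sheets (older versions: wb.get_sheet_names())
name_list = wb.sheetnames
# Get each sheet by its name (older versions: wb.get_sheet_by_name(name))
for name in name_list:
    my_sheet = wb[name]
    # Print the sheet's name
    print(my_sheet.title)
# Get the currently active (displayed) sheet; older versions also offered wb.get_active_sheet()
my_sheet = wb.active
# Get a sheet by index; the index starts at 0
my_sheet = wb.worksheets[0]
# Largest row index that contains data
my_sheet.max_row
# Largest column index that contains data
my_sheet.max_column
# Set the color of the sheet tab (tabs are uncolored by default)
my_sheet.sheet_properties.tabColor = "FF0000"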
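
The sheet-reading calls can be checked on a throwaway workbook. A minimal sketch assuming openpyxl 3.x; the sheet names Summary, Jan and Feb are invented for the example.

```python
from openpyxl import Workbook

wb = Workbook()
wb.active.title = "Summary"
wb.create_sheet("Jan")
wb.create_sheet("Feb")

# wb.sheetnames replaces the older wb.get_sheet_names()
names = wb.sheetnames
# Iterating over a workbook yields its worksheets in tab order
titles = [ws.title for ws in wb]
# wb[name] looks a sheet up by name; wb.worksheets is a plain 0-based list
feb_index = wb.worksheets.index(wb["Feb"])

# max_row / max_column follow the furthest cell that exists
jan = wb["Jan"]
jan["C4"] = "x"
extent = (jan.max_row, jan.max_column)

print(names, titles, feb_index, extent)
```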

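Writing sheets

# Get the names of all sheets
wb.sheetnames
# Rename a worksheet
my_sheet.title = "Sheet1"
# Create a sheet named "Data" at index 1 (the second position; 0 is the first)
wb.create_sheet("Data", index=1)
# Without an index, the new sheet is appended at the end of the workbook
my_sheet = wb.create_sheet()
# Delete a worksheet by passing the worksheet object
wb.remove(my_sheet)
# Or delete a worksheet by its name
del wb["Data"]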
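
To see how index positions and the two deletion styles interact, here is a small sketch assuming openpyxl 3.x; the names Main, Data, Cover and Log are hypothetical.

```python
from openpyxl import Workbook

wb = Workbook()                    # starts with one sheet named "Sheet"
wb.active.title = "Main"           # rename it
wb.create_sheet("Data", index=1)   # index 1 = second position
wb.create_sheet("Cover", 0)        # index 0 = first position
log = wb.create_sheet("Log")       # no index: appended at the end
after_create = list(wb.sheetnames)

wb.remove(log)                     # delete by worksheet object
del wb["Data"]                     # delete by sheet name
after_delete = list(wb.sheetnames)

print(after_create)  # ['Cover', 'Main', 'Data', 'Log']
print(after_delete)  # ['Cover', 'Main']
```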

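Cell operations

Reading cells

# Get a cell by its coordinate; as in Excel, the letter comes first and then the number, i.e. column first, then row
c3 = my_sheet["C3"]
# Column: current releases return the number 3 (use c3.column_letter for "C"); older releases returned "C"
c3.column
# Row, i.e. 3
c3.row
# Coordinate, i.e. "C3"
c3.coordinate
# The cell's value
c3.value
# Besides subscript notation, the cell() method takes numeric indices; this also refers to C3
c3_cell = my_sheet.cell(row=3, column=3)
print(c3_cell.value)
# Get the maximum row and the maximum column
print(my_sheet.max_row)
print(my_sheet.max_column)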
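# Read by row: cells are returned in the order A1, B1, C1, ...
for row in my_sheet.rows:
    for cell in row:
        print(cell.value)
# Read by column: cells are returned in the order A1, A2, A3, ...
for column in my_sheet.columns:
    for cell in column:
        print(cell.value)

# Get the data of one row, e.g. the third row (a tuple of cells)
for cell in list(my_sheet.rows)[2]:
    print(cell.value)

# Get data from a rectangular range: rows 1-3, columns 1-2 (A1:B3)
for i in range(1, 4):
    for j in range(1, 3):
        print(my_sheet.cell(row=i, column=j).value)

# iter_rows() returns several cells at once; here the range A1:C2
for row in my_sheet.iter_rows(min_row=1, max_row=2, min_col=1, max_col=3):
    for cell in row:
        print(cell)

# Use a slice, as with a list
for row_cell in my_sheet["A1":"B3"]:
    for cell in row_cell:
        print(cell)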
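
The range-reading options differ mainly in what they return (values or Cell objects, rows or columns). A minimal sketch assuming openpyxl 3.x, filled with invented numbers where each value encodes its row and column (e.g. 21 is row 2, column 1):

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
for r in range(1, 5):
    ws.append([r * 10 + c for c in range(1, 4)])  # row 1 is 11, 12, 13

# iter_rows with explicit 1-based, inclusive bounds; values_only gives plain values
block = [list(row) for row in ws.iter_rows(min_row=2, max_row=3,
                                           min_col=1, max_col=2,
                                           values_only=True)]
# iter_cols walks a range column by column and yields Cell objects
first_col = [cell.value for col in ws.iter_cols(min_col=1, max_col=1) for cell in col]
# Slicing returns a tuple of row tuples holding Cell objects
corner = [[cell.coordinate for cell in row] for row in ws["A1":"B2"]]

print(block)      # [[21, 22], [31, 32]]
print(first_col)  # [11, 21, 31, 41]
print(corner)     # [['A1', 'B1'], ['A2', 'B2']]
```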

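Writing cells

# Simply assign a value to the cell
my_sheet["A1"] = "test"
# Write a formula in B9 that computes the average of B2:B8
my_sheet["B9"] = "=AVERAGE(B2:B8)"
# Append one row below the last used row
row = [1, 2, 3, 4, 5]
my_sheet.append(row)
# Append several rows: append() takes one row at a time, so loop over them
rows = [
    ["ID", "data1", "data2"],
    [2, 40, 20],
    [3, 40, 25],
    [4, 40, 30],
    [5, 40, 35],
    [6, 45, 40],
    [7, 40, 45],
]
for r in rows:
    my_sheet.append(r)
# Add several columns: each tuple in columns holds one column's values;
# write them to the right of the existing data with cell()
columns = list(zip(*rows))
first_col = my_sheet.max_column + 1
for col_offset, column in enumerate(columns):
    for row_idx, value in enumerate(column, start=1):
        my_sheet.cell(row=row_idx, column=first_col + col_offset, value=value)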
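
One detail worth knowing about append(): it writes to the row after the furthest row any cell has touched so far, not to the first empty row. A minimal sketch assuming openpyxl 3.x; the values are invented.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws["A1"] = "title"            # occupies row 1
ws.append(["ID", "data1"])    # goes to row 2
ws["D6"] = "note"             # touching row 6 moves the append position
ws.append([1, 40])            # goes to row 7, not row 3

# Rows whose column A holds a value
rows_used = [row[0].row for row in ws.iter_rows(min_col=1, max_col=1)
             if row[0].value is not None]
print(rows_used)  # [1, 2, 7]
```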

Converting between column letters and column numbers

from openpyxl.utils import get_column_letter, column_index_from_string
# Return the column letter for a column number
print(get_column_letter(3))  # C
# Return the column number for a column letter
print(column_index_from_string("C"))  # 3
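
These helpers also handle columns past Z, and combined with max_row/max_column they can build an A1-style range for the used area of a sheet. A minimal sketch assuming openpyxl 3.x; range_boundaries comes from openpyxl.utils.cell and is not mentioned in the original text.

```python
from openpyxl import Workbook
from openpyxl.utils import get_column_letter, column_index_from_string
from openpyxl.utils.cell import range_boundaries

# Conversions continue past "Z": 27 -> "AA", 28 -> "AB"
letters = [get_column_letter(n) for n in (1, 26, 27, 28)]
numbers = [column_index_from_string(s) for s in ("A", "Z", "AA", "AB")]

# Build the used range of a sheet as an A1-style string
wb = Workbook()
ws = wb.active
ws.cell(row=3, column=28, value="last")
used = f"A1:{get_column_letter(ws.max_column)}{ws.max_row}"
# range_boundaries turns a range back into (min_col, min_row, max_col, max_row)
bounds = range_boundaries(used)

print(letters, numbers)  # ['A', 'Z', 'AA', 'AB'] [1, 26, 27, 28]
print(used, bounds)      # A1:AB3 (1, 1, 28, 3)
```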

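Setting cell styles

from openpyxl.styles import Font, Alignment, PatternFill
# Set the font: DengXian (等线), size 24, bold and italic, red text;
# color takes an RGB hex string such as "FF0000"
bold_italic_24_font = Font(name="等线", size=24, italic=True, color="FF0000", bold=True)
my_sheet["A1"].font = bold_italic_24_font
# Set a solid red fill; for a solid fill, fgColor is the color that is displayed
my_sheet["A2"].fill = PatternFill(fill_type="solid", fgColor="00FF0000", bgColor="00FF0000")
# Alignment: center the data in B1 vertically and horizontally
my_sheet["B1"].alignment = Alignment(horizontal="center", vertical="center")
# Set the row height and the column width
my_sheet.row_dimensions[2].height = 40
my_sheet.column_dimensions["C"].width = 30
# Merging and unmerging cells
# After merging, only the top-left cell (the first coordinate of the range) holds data, so write to it
my_sheet.merge_cells("B1:G1")  # merge several cells in one row
my_sheet.merge_cells("A3:C5")  # merge a rectangular area (one that does not overlap B1:G1)
my_sheet.unmerge_cells("A3:C5")  # after unmerging, the value stays in the top-left cell A3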
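
A common use of these style objects is formatting a header row. The sketch below assumes openpyxl 3.x; the colors, the "Report" title and the column names are invented, and Border/Side are additional style classes from openpyxl.styles not used in the original snippet.

```python
from openpyxl import Workbook
from openpyxl.styles import Alignment, Border, Font, PatternFill, Side

wb = Workbook()
ws = wb.active

# Title row merged across the three data columns; write to the top-left cell
ws["A1"] = "Report"
ws.merge_cells("A1:C1")
ws.append(["ID", "data1", "data2"])  # header goes to row 2

# Build each style object once and assign it to every header cell
header_font = Font(bold=True, color="FFFFFF")
header_fill = PatternFill(fill_type="solid", fgColor="4F81BD")
thin = Side(style="thin", color="000000")
header_border = Border(left=thin, right=thin, top=thin, bottom=thin)
center = Alignment(horizontal="center", vertical="center")

for cell in ws[2]:  # ws[2] is the tuple of cells in row 2
    cell.font = header_font
    cell.fill = header_fill
    cell.border = header_border
    cell.alignment = center

merged = [str(r) for r in ws.merged_cells.ranges]
print(merged)  # ['A1:C1']
```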

Other notes:

To match Excel, openpyxl numbers rows and columns starting from 1, not from 0 as is usual in programming languages.
The exception is wb.worksheets[index]: it is an ordinary Python list, so the index starts at 0.
Suppose sheet["B9"] = "=AVERAGE(B2:B8)". If the workbook is loaded with data_only=True, reading B9 returns the value of the formula as last calculated and cached by Excel; without this parameter, the formula itself, "=AVERAGE(B2:B8)", is returned. openpyxl does not calculate formulas, so a file written by openpyxl and never opened and saved in Excel has no cached value (None is returned).
If the text is encoded as gb2312, it will appear garbled after reading; convert it to Unicode first.
A newly created worksheet contains no cells; a cell is created only when it is first accessed. This avoids creating cells that are never used and reduces memory consumption.
When saving, the file extension should match the file type (.xlsx for a workbook, .xltx for a template, .xlsm for a macro-enabled workbook).
To keep the VBA macros of an xlsm file when saving it, load it with keep_vba=True.
For example, wb = load_workbook("sample.xltm", keep_vba=True). To save it as a template, pass as_template=True; to save it as a regular document, pass as_template=False (newer openpyxl versions set wb.template = True or False instead).
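
Two of these notes, lazy cell creation and the data_only behavior, can be observed directly. A minimal sketch assuming openpyxl 3.x; the numbers in B2:B8 and the file name formula.xlsx are invented for the example.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

wb = Workbook()
ws = wb.active
for r in range(2, 9):
    ws.cell(row=r, column=2, value=r)
ws["B9"] = "=AVERAGE(B2:B8)"

# Merely reading a cell creates it, which grows max_row
before = ws.max_row
_ = ws["A20"].value
after = ws.max_row

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "formula.xlsx")
    wb.save(path)
    # Default: the formula text is returned
    formula = load_workbook(path)["Sheet"]["B9"].value
    # data_only=True: the cached result is returned; openpyxl never calculates
    # formulas, so a file it wrote itself has no cached value yet
    cached = load_workbook(path, data_only=True)["Sheet"]["B9"].value

print(before, after)    # 9 20
print(formula, cached)  # =AVERAGE(B2:B8) None
```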

Links:

openpyxl official documentation: http://openpyxl.readthedocs.io/en/default/
Common examples: http://openpyxl.readthedocs.io/en/default/usage.html
BitBucket address: https://bitbucket.org/openpyxl/openpyxl
openpyxl source download: https://pypi.python.org/pypi/openpyxl
A good tutorial: https://automatetheboringstuff.com/chapter12/

If there are any mistakes, please point them out.

email: dxmdxm1992#gmail.com

blog: http://blog.csdn.net/david_dai_1108