Excel power tool: OpenPyXL

Text |  Sun Snow

Source: Python technology "ID: pythonall"

Whether for everyday office work or for programming, Excel is indispensable: people use it to import and export data, keep records, run statistical analyses, and sketch prototypes, and an elderly man in Japan even uses Excel to create paintings.

Although Excel is powerful and easy to operate, some tasks are still inconvenient to do by hand, such as importing large amounts of data into Excel, reading data from Excel into another system, reformatting raw data into a given structure, or batch processing many Excel documents. Fortunately, many Python libraries let us control Excel programmatically and finish tasks that are hard to do manually. Let's take a look.

Excel libraries for Python

Python has many packages for working with Excel, each with its own strengths. That variety can be overwhelming for anyone who has just started using Python with Excel, so here is a brief overview of the common ones:

  • OpenPyXL is a Python library that reads and writes Excel 2010 xlsx / xlsm / xltx / xltm files. It is simple to use and covers a wide range of features, including cell formats, images, tables, formulas, filters, comments, and document protection. Chart support is one of its highlights.

  • xlwings is a Python library released under the BSD license. It makes it easy to operate Excel from Python, and Python can also be called from within Excel, giving a programming style close to VBA. It supports Excel macros and can act as a web server that exposes a REST API.

  • pandas is built for data processing; Excel serves as a container for the data that pandas reads and writes.

  • win32com, as the name suggests, is an extension for working with Windows applications. Excel is only a small part of what it can do, and it supports many other Office operations as well. Note that it is not distributed on its own; it is obtained by installing pypiwin32 or pywin32.

  • XlsxWriter is rich in features and supports images, tables, charts, filters, formats, formulas, and more. Its functionality is similar to OpenPyXL. Its advantage over OpenPyXL is support for importing VBA files, sparklines, and a few other features; its disadvantage is that it cannot open or modify existing files, so every file has to be written from scratch.

  • DataNitro is a paid add-in embedded in Excel. It can replace VBA entirely and lets you run Python scripts inside Excel. Because it runs Python inside Excel, it can also be combined with other Python libraries.

  • xlutils is based on xlrd / xlwt. It is an older Python package and a pioneer in this field. Its feature set is modest, and its main drawback is that it only supports xls files.

To summarize:

  • If you do not need a GUI and want to add more functionality to Excel files, choose either OpenPyXL or XlsxWriter;

  • If you need scientific computing or have to process large amounts of data, pandas + XlsxWriter or pandas + OpenPyXL is a good choice;

  • If you want to script Excel and know Python but not VBA, consider xlwings or DataNitro;

  • win32com is very powerful, but getting started requires some Windows programming experience. It is essentially a wrapper around Windows COM, and its documentation is incomplete.

OpenPyXL

OpenPyXL covers most of what Excel can do. Its interface is clear, its documentation is thorough, and the learning curve is relatively gentle. Today we take OpenPyXL as an example to see how to operate Excel from Python.

Installation

Install with pip:

pip install openpyxl

After the installation is successful, you can run the following test:

python -c "import openpyxl"

Basic concepts

  • Workbook: equivalent to one Excel file. Every Excel file that is created or opened is an independent Workbook object

  • Sheet: a worksheet inside the Excel document. Every Excel document needs at least one sheet

  • Cell: the basic, indivisible unit of data storage
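The three concepts nest inside each other, and a short sketch can make that concrete. The snippet below is only an illustration (the cell address and value are arbitrary); it relies on the `parent` attribute that OpenPyXL worksheets and cells carry.

```python
from openpyxl import Workbook

wb = Workbook()          # the workbook: one Excel file
ws = wb.active           # a worksheet inside that workbook
cell = ws["B3"]          # a cell inside that worksheet
cell.value = "hello"

# Each level keeps a reference to its container.
print(cell.coordinate)        # address of the cell within its sheet
print(cell.parent.title)      # title of the worksheet that owns the cell
print(ws.parent is wb)        # the worksheet knows which workbook it belongs to
```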

A first try

Let's start with a quick test:

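import datetime

from openpyxl import Workbook

# Create a workbook
wb = Workbook()
# Get the active worksheet
ws = wb.active
# Set the content of a single cell
ws['A1'] = 42
# Append a row of values after the last used row (here, row 2)
ws.append([1, 2, 3])
# Python types are converted automatically; this overwrites A2 with a datetime
ws['A2'] = datetime.datetime.now()
# Save the Excel file
wb.save("sample.xlsx")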

Points to note:

  • A newly created Workbook object comes with one sheet named "Sheet", whereas a new workbook created in Excel itself may contain three, depending on the Excel version

  • The first sheet of a new workbook is the active one; get a reference to it through wb.active

  • As with the python-docx library, the save method writes the file immediately without any prompt and overwrites an existing file of the same name, so it is advisable to save under a different file name
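Because save overwrites silently, one possible safeguard is to pick a new name when the target already exists. The helper below is a hypothetical sketch, not part of OpenPyXL: `save_without_overwrite` and its numbering scheme are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

from openpyxl import Workbook


def save_without_overwrite(wb, path):
    """Save wb to path; if the file exists, append _1, _2, ... to the name."""
    original = Path(path)
    target = original
    counter = 1
    while target.exists():
        target = original.with_name(f"{original.stem}_{counter}{original.suffix}")
        counter += 1
    wb.save(str(target))
    return target


# Usage: saving twice to the same path produces two different files.
with tempfile.TemporaryDirectory() as tmp:
    wb = Workbook()
    first_name = save_without_overwrite(wb, Path(tmp) / "sample.xlsx").name
    second_name = save_without_overwrite(wb, Path(tmp) / "sample.xlsx").name

print(first_name, second_name)
```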

Common Functions

OpenPyXL has many features, from cell handling to charts, and covers most of what Excel offers. This section demonstrates some commonly used ones. For more, see the OpenPyXL documentation (linked at the end of this article).

Create and open Excel

The first example above showed how to create an Excel file.

To load an existing Excel file, use the load_workbook function: given a file path, it returns a Workbook object:

from openpyxl import load_workbook

wb = load_workbook('test.xlsx')

# Print the names of the sheets in the document
print(wb.sheetnames)

Besides filename, load_workbook takes several useful parameters:

  • read_only: whether to open the file in read-only mode, which is more efficient for very large files

  • keep_vba: whether to preserve VBA code, that is, keep the macros when the file is opened

  • guess_types: whether to infer data types when reading cell values; this parameter exists only in older releases and has been removed from recent versions of OpenPyXL

  • data_only: for cells that contain formulas, whether to return the most recently calculated result instead of the formula

  • keep_links: whether to keep links to external workbooks
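The effect of data_only and read_only is easier to see on a tiny workbook. The following is a minimal sketch that builds the file in memory (the cell layout is made up for the example). Note that OpenPyXL does not evaluate formulas, so a file that has never been opened and saved by Excel has no cached result to return.

```python
from io import BytesIO

from openpyxl import Workbook, load_workbook

# Build a small workbook in memory so the example needs no file on disk.
wb = Workbook()
ws = wb.active
ws.append([1, 2])
ws["C1"] = "=A1+B1"
buffer = BytesIO()
wb.save(buffer)
raw = buffer.getvalue()

# Default load: a formula cell returns the formula text.
formula_view = load_workbook(BytesIO(raw)).active["C1"].value

# data_only=True: the cached result is returned instead of the formula.
# This file was never calculated by Excel, so no cached value exists.
cached_view = load_workbook(BytesIO(raw), data_only=True).active["C1"].value

# read_only=True: rows are read lazily, which helps with very large files.
ro_wb = load_workbook(BytesIO(raw), read_only=True)
first_row = next(ro_wb.active.iter_rows(values_only=True))
ro_wb.close()

print(formula_view, cached_view, first_row)
```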

Working with sheets

from openpyxl import Workbook

wb = Workbook()
ws = wb.active

ws1 = wb.create_sheet("sheet")       # create a sheet named "sheet" (appended at the end by default)
ws1.title = "New sheet"              # rename the sheet
ws2 = wb.create_sheet("mysheet", 0)  # create a sheet and insert it at the front
ws2.title = "Hello"                  # rename the sheet

ws1.sheet_properties.tabColor = "1072BA"  # set the color of the sheet tab

# Get sheets by name (get_sheet_by_name is deprecated; index the workbook instead)
ws3 = wb["Hello"]
ws4 = wb["New sheet"]

# Copy a sheet
ws1_copy = wb.copy_worksheet(ws1)

# Remove a sheet
wb.remove(ws1)

  • Every Workbook has one active sheet, usually the first, which is available directly through active

  • A sheet object can be looked up by its name

  • create_sheet accepts an optional sheet name. If a sheet with that name already exists, 1 is appended to the name, then 2 on the next repeat, and so on

  • Once you have a sheet object, you can set properties such as its title and tab color

  • A sheet can be copied within the same Workbook object by passing the source sheet object as the argument; the new copy is placed at the end

  • A sheet can be removed by passing the target sheet object as the argument
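The naming and ordering rules in the list above can be checked with a few lines. This is a small sketch with made-up sheet names; it only prints the resulting sheet order.

```python
from openpyxl import Workbook

wb = Workbook()                              # starts with one sheet, "Sheet"
wb.create_sheet("Data")
wb.create_sheet("Data")                      # duplicate title gets a numeric suffix
front = wb.create_sheet("Summary", 0)        # index 0 inserts at the front
duplicate = wb.copy_worksheet(wb["Data"])    # the copy is appended at the end

names_before = list(wb.sheetnames)
print(names_before)

wb.remove(front)                             # remove takes the sheet object itself
names_after = list(wb.sheetnames)
print(names_after)
```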

Working with cells

A cell is the smallest unit of data storage in Excel; in the graphical interface it is one box in the grid.

OpenPyXL can operate on single cells or on batches of cells.

Single cells

To work with a single cell, get it by its Excel cell name or by its row and column coordinates, then read or set it.

ws1 = wb.create_sheet("Mysheet")  # create a sheet
# Set values by cell name
ws1["A1"] = 123.11
ws1["B2"] = "Hello"

# Set a value by row and column coordinates
d = ws1.cell(row=4, column=2, value=10)

  • A value can be set by cell name, in the same way as assigning to a dictionary key on the sheet

  • It can also be set through the cell method using row and column coordinates, which start at 1
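Both addressing styles reach the same cell, which a short sketch can confirm. The values here are arbitrary.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

ws["A1"] = 123.11                            # address by cell name
cell = ws.cell(row=4, column=2, value=10)    # address by 1-based row and column

# Row 4, column 2 is the cell named B4.
same_cell = ws["B4"]
print(cell.coordinate, same_cell.value)

# Reading works through either style as well.
print(ws["A1"].value, ws.cell(row=1, column=1).value)
```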

Bulk operations

When you need to operate on many cells at once, batch operations are more efficient.

  • Specific rows and columns

# A single column
for cell in ws["A"]:
    print(cell.value)
# A single row
for cell in ws["1"]:
    print(cell.value)
# Several columns
for column in ws['A:C']:
    for cell in column:
        print(cell.value)
# Several rows
for row in ws['1:3']:
    for cell in row:
        print(cell.value)
# A rectangular range
for row in ws['A1:C3']:
    for cell in row:
        print(cell.value)
  • All rows or columns

# All rows
for row in ws.iter_rows():
    for cell in row:
        print(cell.value)
# All columns
for column in ws.iter_cols():
    for cell in column:
        print(cell.value)
  • Append an entire row of data

ws.append((1,2,3))
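iter_rows also accepts bounds, so a block of cells can be read without slicing by name. The sketch below uses made-up numbers and the values_only option, which yields plain values instead of cell objects.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
for r in range(1, 5):
    ws.append([r, r * 10, r * 100])   # rows 1..4, columns A..C

# Read rows 2-3 of columns A-B as tuples of plain values.
block = list(
    ws.iter_rows(min_row=2, max_row=3, min_col=1, max_col=2, values_only=True)
)
print(block)       # [(2, 20), (3, 30)]

# Read one whole column by its letter.
column_b = [cell.value for cell in ws["B"]]
print(column_b)    # [10, 20, 30, 40]
```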

Merging cells

# Merge
ws.merge_cells('A2:D2')
# Unmerge
ws.unmerge_cells('A2:D2')

# The same range, given as keyword arguments
ws.merge_cells(start_row=2, start_column=1, end_row=2, end_column=4)
ws.unmerge_cells(start_row=2, start_column=1, end_row=2, end_column=4)

  • The sheet object's merge_cells method merges cells, and unmerge_cells undoes the merge

  • The range can be given in two ways: as a cell-range string, or as keyword arguments for the start and end row and column

  • Note: calling unmerge_cells on a range that has not been merged raises an error
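To avoid that error, one option is to check the sheet's merged ranges first. `unmerge_if_merged` below is a hypothetical helper written for this article, not an OpenPyXL function; it relies on `ws.merged_cells.ranges` and the `coord` attribute of each range.

```python
from openpyxl import Workbook


def unmerge_if_merged(ws, cell_range):
    """Unmerge cell_range only if it is currently merged; report what happened."""
    merged = {r.coord for r in ws.merged_cells.ranges}
    if cell_range in merged:
        ws.unmerge_cells(cell_range)
        return True
    return False


wb = Workbook()
ws = wb.active
ws.merge_cells("A2:D2")

first_attempt = unmerge_if_merged(ws, "A2:D2")    # range was merged
second_attempt = unmerge_if_merged(ws, "A2:D2")   # nothing left to unmerge
print(first_attempt, second_attempt)
```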

Cell formats

OpenPyXL styles cells through six kinds of settings:

  • NumberFormat: number format

  • Alignment: alignment

  • Font: font

  • Border: border

  • PatternFill: fill

  • Protection: protection

from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill, Border, Side, Alignment, Protection
from openpyxl.styles import numbers

wb = Workbook()
ws = wb.active
# 宋体 is the SimSun font
ws.cell(row=1, column=1, value='SimSun').font = Font(name='宋体', size=12, bold=True, color='FF0000')
ws.cell(row=2, column=2, value='Right aligned').alignment = Alignment(horizontal='right')
ws.cell(row=3, column=3, value='Solid fill').fill = PatternFill(fill_type='solid', start_color='FF0000')
ws.cell(row=4, column=4, value='Borders').border = Border(left=Side(border_style='thin', color='FF0000'), right=Side(border_style='thin', color='FF0000'))
ws.cell(row=5, column=5, value='Protected').protection = Protection(locked=True, hidden=True)
ws.cell(row=6, column=6, value=0.54).number_format = numbers.FORMAT_PERCENTAGE

  • Import the style classes

  • Use the cell method to set a cell's value and its format in one statement

  • Each kind of format has its own attribute on the cell; assign the matching format object to it

  • The number format is slightly different: it is set by assigning a format string, and numbers.FORMAT_PERCENTAGE is such a string (the format code '0%')

  • The Border class is used together with the Side class; both are defined in openpyxl.styles

  • Note that a cell's style attribute can only be assigned a whole style object; the object's own attributes cannot be modified in place. For example, ws.cell(1, 1).font.color = '00FF00' raises an error. To change a style, create a new style object and assign it again.
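One way to change a single attribute while keeping the rest of the style is to copy the existing object, edit the copy, and assign it back. The snippet below is a sketch of that approach with arbitrary colors; it also shows the in-place change being rejected.

```python
from copy import copy

from openpyxl import Workbook
from openpyxl.styles import Font

wb = Workbook()
ws = wb.active
cell = ws.cell(row=1, column=1, value="styled")
cell.font = Font(bold=True, color="FF0000")

# Direct mutation is rejected: the style attached to a cell is immutable.
try:
    cell.font.color = "00FF00"
    mutated_in_place = True
except AttributeError:
    mutated_in_place = False

# Copy the current font, change the copy, and assign it back to the cell.
new_font = copy(cell.font)
new_font.color = "00FF00"
cell.font = new_font

print(mutated_in_place, cell.font.bold, cell.font.color.rgb)
```

The bold setting survives because the copy starts from the cell's current font.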

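The examples above format one cell at a time; formats can also be applied in batches. There are two ways: loop over every cell in a range and set each one, or set the format on an entire row or column. According to the OpenPyXL documentation, a style set on a row or column dimension applies only to cells created in Excel afterwards, so existing cells still need to be styled one by one: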

font = Font(bold=True)

# Loop over the cells in a range
for row in ws['A1:C3']:
    for cell in row:
        cell.font = font

# Set an entire row
row = ws.row_dimensions[1]
row.font = font

# Set an entire column
column = ws.column_dimensions["A"]
column.font = font
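A common use of the loop approach is formatting a header row. The helper below is a hypothetical sketch (the name `style_header` and the colors are assumptions) that applies a font, a fill, and an alignment to every cell in one row.

```python
from openpyxl import Workbook
from openpyxl.styles import Alignment, Font, PatternFill


def style_header(ws, row=1):
    """Apply a bold white font, a solid fill, and centering to one row."""
    bold = Font(bold=True, color="FFFFFF")
    fill = PatternFill(fill_type="solid", start_color="1072BA")
    center = Alignment(horizontal="center")
    for cell in ws[row]:          # ws[row] yields the cells of that row
        cell.font = bold
        cell.fill = fill
        cell.alignment = center


wb = Workbook()
ws = wb.active
ws.append(["Month", "Apples", "Bananas"])
ws.append([1, 43, 25])
style_header(ws)

print([cell.font.bold for cell in ws[1]])
```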

For more style classes and their parameters, see the OpenPyXL documentation.

Charts

Charts are a very important part of Excel and an efficient tool for data visualization. OpenPyXL can create charts in Excel programmatically, and the process is much the same as in Excel itself. The bar chart and the pie chart serve as examples below.

Bar chart

from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference

wb = Workbook()
ws = wb.active

rows = [
    ('Month', 'Apples', 'Bananas'),
    (1, 43, 25),
    (2, 10, 30),
    (3, 40, 60),
    (4, 50, 70),
    (5, 20, 10),
    (6, 10, 40),
    (7, 50, 30),
]

for row in rows:
    ws.append(row)

chart1 = BarChart()
chart1.type = "col"
chart1.style = 10
chart1.title = "Sales bar chart"
chart1.y_axis.title = 'Sales'
chart1.x_axis.title = 'Month'

# Values in columns B-C, including the header row used for series names
data = Reference(ws, min_col=2, min_row=1, max_row=8, max_col=3)
# Category labels: the month column, without its header
series = Reference(ws, min_col=1, min_row=2, max_row=8)
chart1.add_data(data, titles_from_data=True)
chart1.set_categories(series)
ws.add_chart(chart1, "A10")

# Save the workbook so the chart can be viewed in Excel
wb.save("bar.xlsx")

  • Import BarChart, the bar chart class, and Reference, the data reference class

  • Create a Workbook and add data to the active sheet

  • Create a bar chart object and set its properties; type "col" gives a vertical column chart and "bar" a horizontal bar chart

  • Create a data reference that specifies the sheet and the cell range of the values

  • Create a second reference for the category labels (the month column)

  • Add the data and the categories to the chart object

  • Finally, add the chart object to the sheet with add_chart, anchored at a cell

(Figure: the resulting bar chart)
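The steps above can be wrapped in a small function and switched to a horizontal layout with type "bar". This is a sketch under assumptions: `add_bar_chart` is a hypothetical helper that expects categories in the first column and a header row, and the check at the end simply confirms that a chart part was written into the saved file.

```python
from io import BytesIO
from zipfile import ZipFile

from openpyxl import Workbook
from openpyxl.chart import BarChart, Reference


def add_bar_chart(ws, anchor, title, horizontal=False):
    """Chart every column after the first; column 1 holds the categories."""
    chart = BarChart()
    chart.type = "bar" if horizontal else "col"
    chart.title = title
    data = Reference(ws, min_col=2, max_col=ws.max_column,
                     min_row=1, max_row=ws.max_row)
    cats = Reference(ws, min_col=1, min_row=2, max_row=ws.max_row)
    chart.add_data(data, titles_from_data=True)   # header row names the series
    chart.set_categories(cats)
    ws.add_chart(chart, anchor)
    return chart


wb = Workbook()
ws = wb.active
for row in [("Month", "Apples", "Bananas"), (1, 43, 25), (2, 10, 30), (3, 40, 60)]:
    ws.append(row)
chart = add_bar_chart(ws, "E2", "Sales by month", horizontal=True)

# Save to memory and list the chart parts inside the xlsx archive.
buffer = BytesIO()
wb.save(buffer)
chart_files = [n for n in ZipFile(buffer).namelist() if n.startswith("xl/charts/")]
print(chart_files)
```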

Pie chart

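from openpyxl import Workbook
from openpyxl.chart import PieChart, Reference

rows = [
    ['Fruit', 'Sales'],
    ['Apple', 50],
    ['Cherry', 30],
    ['Orange', 10],
    ['Banana', 40],
]

wb = Workbook()
ws = wb.active

for row in rows:
    ws.append(row)

pie = PieChart()
pie.title = "Share of fruit sales"
# Slice labels: the fruit names, without the header
labels = Reference(ws, min_col=1, min_row=2, max_row=5)
# Values: the sales column, including the header used as the series name
data = Reference(ws, min_col=2, min_row=1, max_row=5)
pie.add_data(data, titles_from_data=True)
pie.set_categories(labels)

ws.add_chart(pie, "D1")

# Save the workbook so the chart can be viewed in Excel
wb.save("pie.xlsx")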

  • Import PieChart and the data reference class Reference

  • Create the chart data

  • Create the chart object and set its title

  • Define the label reference and the data reference, and add them to the chart

  • Add the chart object to the sheet at the specified position

(Figure: the resulting pie chart)
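Since the pie chart shows shares, percentage labels on the slices are a natural addition. The sketch below assumes the same fruit data and uses `DataLabelList` from openpyxl.chart.label; the final lines read the chart XML back from the saved file only to confirm that the setting was written.

```python
from io import BytesIO
from zipfile import ZipFile

from openpyxl import Workbook
from openpyxl.chart import PieChart, Reference
from openpyxl.chart.label import DataLabelList

wb = Workbook()
ws = wb.active
for row in [["Fruit", "Sales"], ["Apple", 50], ["Cherry", 30],
            ["Orange", 10], ["Banana", 40]]:
    ws.append(row)

pie = PieChart()
pie.title = "Share of fruit sales"
pie.add_data(Reference(ws, min_col=2, min_row=1, max_row=5), titles_from_data=True)
pie.set_categories(Reference(ws, min_col=1, min_row=2, max_row=5))

# Show each slice's percentage as a data label.
pie.dataLabels = DataLabelList()
pie.dataLabels.showPercent = True

ws.add_chart(pie, "D1")

# Save to memory and read the chart definition back out of the archive.
buffer = BytesIO()
wb.save(buffer)
chart_xml = ZipFile(buffer).read("xl/charts/chart1.xml").decode("utf-8")
print("showPercent" in chart_xml)
```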

Summary

Today we used the OpenPyXL library as an example to learn the basics of operating Excel from Python. Space is limited, so this article cannot cover every feature in full. I hope it sparks your interest in automating Excel with code and makes your work and study more efficient. As the saying goes: "Life is short, I use Python".

References

  • OpenPyXL documentation https://openpyxl.readthedocs.io

  • Drawing with Excel https://zhuanlan.zhihu.com/p/34917620

  • https://www.jianshu.com/p/be1ed0c5218e

  • https://www.douban.com/note/706513912/

  • https://blog.csdn.net/weixin_41595432/article/details/79349995

Origin blog.csdn.net/ityouknow/article/details/105548886