Python library for processing tabular data: working with Excel files in openpyxl

This article introduces the Python module to install for processing spreadsheets, openpyxl, and walks through reading and writing Excel files with it. I hope you find it a useful reference.

1. Excel document

Workbook: an .xlsx file, which can contain multiple sheets (worksheets).

Active sheet: the sheet the user is currently viewing, or the last sheet that was viewed before Excel was closed.

2. Install the openpyxl module

Install it from the command line with pip install openpyxl, then import it in Python:

import openpyxl

3. Read Excel table

First, create a sample workbook named 1.xlsx.

 1. Use the openpyxl module to open the Excel document

The openpyxl module provides the openpyxl.load_workbook() function to open Excel documents.

The openpyxl.load_workbook() function opens an Excel document and returns a Workbook object.

The workbook object represents this Excel file, which is equivalent to the File object representing an open text file.

Remember that example.xlsx must be in the current working directory before you can open it by name. You can import os and use os.getcwd() to find out what the current working directory is, and os.chdir() to change it.
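A minimal sketch of that check, assuming you want relative file names to resolve against the folder that holds your workbook. A temporary folder containing an empty example.xlsx stands in for your own folder here; that folder is an assumption of this example, not part of the tutorial.

```python
import os
import tempfile

import openpyxl

# Stand-in for your own folder: a temporary directory with an example.xlsx in it
folder = tempfile.mkdtemp()
openpyxl.Workbook().save(os.path.join(folder, "example.xlsx"))

print(os.getcwd())  # where relative file names are currently resolved
os.chdir(folder)    # switch to the folder that contains the workbook
wb = openpyxl.load_workbook("example.xlsx")  # the relative name now resolves
print(type(wb))     # <class 'openpyxl.workbook.workbook.Workbook'>
```

Passing an absolute path to load_workbook() also works and avoids changing the working directory.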

2. Get the worksheet from the workbook

Call the get_sheet_names() method to get a list of all sheet names in the workbook.
>>> import openpyxl
>>> wb = openpyxl.load_workbook('example.xlsx')  # open the Excel document and get a Workbook object
>>> wb.get_sheet_names()  # the Workbook's get_sheet_names() method returns the names of all sheets
['Sheet1', 'Sheet2', 'Sheet3']
>>> sheet = wb.get_sheet_by_name('Sheet3')  # get_sheet_by_name() returns the Worksheet object with that name
>>> sheet
<Worksheet "Sheet3">
>>> type(sheet)
<class 'openpyxl.worksheet.worksheet.Worksheet'>
>>> sheet.title  # the Worksheet's title attribute holds the sheet name
'Sheet3'
>>> anotherSheet = wb.get_active_sheet()  # the Workbook's get_active_sheet() method returns the active sheet
>>> anotherSheet
<Worksheet "Sheet1">

openpyxl.load_workbook('example.xlsx') opens the Excel file and returns a Workbook object.

wb.get_sheet_by_name('Sheet3') is a Workbook method that returns a Worksheet object.

The get_sheet_names() method returns a list of all sheet names in the workbook; newer versions of openpyxl provide the wb.sheetnames property instead.
The get_sheet_by_name() method returns a Worksheet object. It is deprecated in newer versions of openpyxl (the library, not Python); index the workbook by sheet name instead:
sheet = wb['Sheet5']  # get the Worksheet object
The get_active_sheet() method returns the workbook's active sheet; newer versions use the wb.active property instead.
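For reference, here is a small sketch of the three replacements side by side, assuming a current openpyxl 3.x release. It builds a hypothetical three-sheet workbook in memory instead of loading example.xlsx, so it runs anywhere.

```python
from openpyxl import Workbook

# Build a small workbook in memory so the example does not depend on example.xlsx
wb = Workbook()              # starts with one sheet named "Sheet"
wb.active.title = "Sheet1"
wb.create_sheet("Sheet2")
wb.create_sheet("Sheet3")

print(wb.sheetnames)         # replaces get_sheet_names()
sheet = wb["Sheet3"]         # replaces get_sheet_by_name("Sheet3")
print(sheet.title)
active = wb.active           # replaces get_active_sheet()
print(active.title)
```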

3. Get cells from the table

The objects are nested: Workbook object -> Worksheet object -> Cell object.

import openpyxl

wb = openpyxl.load_workbook('1.xlsx')  # get the Workbook object
sheet = wb['Sheet5']                    # get the Worksheet object
print(type(sheet['A1']))                # sheet['A1'] is a Cell object
print(sheet['A1'].value)
c=sheet['B1']
print(c)
print(c.value)


A Cell object has value, row, column, and coordinate attributes.
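A quick sketch of those four attributes on an in-memory workbook (the value 'Apples' is made-up sample data). In current openpyxl releases column is an integer and column_letter gives the letter; older releases, like the one the book was written for, returned the letter from column.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws["B1"] = "Apples"  # hypothetical sample value

c = ws["B1"]
print(c.value)          # Apples
print(c.row)            # 1
print(c.column)         # 2 (an integer in current openpyxl releases)
print(c.column_letter)  # B
print(c.coordinate)     # B1
```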

You can also get a Cell object by calling the worksheet's cell() method with integer row and column keyword arguments (both start at 1).

import openpyxl

wb = openpyxl.load_workbook('1.xlsx')  # get the Workbook object
sheet = wb['Sheet5']                    # get the Worksheet object
a=sheet.cell(row=1,column=2)
print(type(a))

Using the cell() method and its keyword arguments, you can write a for loop that prints the values of a series of cells.

Exercise: Print the values of all cells in column B.

Older versions of openpyxl provided the worksheet methods get_highest_row() and get_highest_column() to get the size of the sheet. They have been removed from current versions; use the max_row and max_column attributes instead.

import openpyxl

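wb = openpyxl.load_workbook('1.xlsx')  # get the Workbook object
sheet = wb['Sheet5']                    # get the Worksheet object
r = sheet.max_row       # number of the last row in use
c = sheet.max_column    # number of the last column in use
print(r, c)
for i in range(1, r + 1):            # rows start at 1, so stop after max_row
    a = sheet.cell(row=i, column=2)  # column B
    print(a.value)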
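One possible solution to the exercise that does not depend on 1.xlsx: the sketch below fills a sheet with hypothetical sample data, then uses max_row as the loop bound so it visits every row that exists rather than a hard-coded count.

```python
from openpyxl import Workbook

# Hypothetical sample data standing in for Sheet5 of 1.xlsx
wb = Workbook()
ws = wb.active
for row in [("Name", "Score"), ("Ann", 90), ("Bob", 85), ("Cy", 77)]:
    ws.append(row)

# Loop over every row that exists instead of hard-coding range(1, 7)
column_b = []
for i in range(1, ws.max_row + 1):
    column_b.append(ws.cell(row=i, column=2).value)

print(ws.max_row, ws.max_column)  # 4 2
print(column_b)                   # ['Score', 90, 85, 77]
```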

4. Conversion between column letters and numbers

To convert from letters to numbers, call the column_index_from_string() function; to convert from numbers to letters, call the get_column_letter() function. In current versions of openpyxl both live in openpyxl.utils (older versions, such as the one the book uses, imported them from openpyxl.cell). Enter the following into the interactive shell:
>>> import openpyxl
>>> from openpyxl.utils import get_column_letter, column_index_from_string
>>> get_column_letter(1)
'A'
>>> get_column_letter(2)
'B'
>>> get_column_letter(27)
'AA'
>>> get_column_letter(900)
'AHP'
>>> wb = openpyxl.load_workbook('example.xlsx')
>>> sheet = wb['Sheet1']
>>> get_column_letter(sheet.max_column)
'C'
>>> column_index_from_string('A')
1 
>>> column_index_from_string('AA')
27
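The two functions are inverses of each other. A short sketch that checks this for the first 1,000 columns, assuming the openpyxl.utils import location of current releases:

```python
from openpyxl.utils import column_index_from_string, get_column_letter

# Round-trip every column number up to 1000 through the letter form
for n in range(1, 1001):
    assert column_index_from_string(get_column_letter(n)) == n

print(get_column_letter(900))           # AHP
print(column_index_from_string("AHP"))  # 900
print(get_column_letter(16384))         # XFD, the last column Excel allows
```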

5. Get rows and columns from the table

You can get a single row, a single column, or all the Cell objects in a rectangular area of the worksheet, and then loop over every cell in that slice.

import openpyxl

wb = openpyxl.load_workbook('1.xlsx')  # get the Workbook object
sheet = wb['Sheet5']                    # get the Worksheet object
print(tuple(sheet['A1':'C3']))

Use a loop to output the values of the selected range.

We asked for the Cell objects in the rectangular area from A1 to C3. In the older openpyxl used by the book this returned a Generator object; current versions return a tuple directly. Either way, passing the result to the tuple() function lists its Cell objects so that we can see them.

The result is one large tuple that contains a smaller tuple for each row; each inner tuple holds the Cell objects of that row.

Therefore, printing the values of every cell in the area takes two for loops: the outer loop iterates over the rows of the slice, and the inner loop iterates over the cells in each row.
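The two nested loops described above might look like this; the 3x3 data is hypothetical and stands in for the sample sheet.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
# Hypothetical 3x3 sample data in A1:C3
for row in [("a", 1, True), ("b", 2, False), ("c", 3, True)]:
    ws.append(row)

area = ws["A1":"C3"]            # tuple of rows; each row is a tuple of Cell objects
for row_of_cells in area:       # outer loop: one iteration per row
    for cell in row_of_cells:   # inner loop: one iteration per cell in that row
        print(cell.coordinate, cell.value)
    print("--- end of row ---")
```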

You can also access the cells of a specific row or column. For this, use the rows and columns attributes of the Worksheet object.

import openpyxl

wb = openpyxl.load_workbook('1.xlsx')  # get the Workbook object
sheet = wb['Sheet5']                    # get the Worksheet object
b = list(sheet.columns)[1]  # sheet.columns is a generator, so convert it before indexing; [1] is column B
print(b)

The rows attribute of a Worksheet object gives you a tuple of tuples. Each inner tuple represents a row and contains the Cell objects in that row. The columns attribute likewise gives a tuple of tuples, each containing the Cell objects in one column. For example.xlsx, which has 7 rows and 3 columns, rows gives a tuple of 7 tuples (each containing 3 Cell objects), and columns gives a tuple of 3 tuples (each containing 7 Cell objects). To access a specific tuple, use its index in the larger tuple: the tuple for column B is sheet.columns[1], and the tuple for column A is sheet.columns[0]. In current versions of openpyxl, rows and columns are generators, so convert them first, for example list(sheet.columns)[1]. Once you have the tuple for a row or column, you can loop over its Cell objects and print their values.
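A sketch that checks the shapes described above on a hypothetical 7-row, 3-column sheet built in memory, assuming a current openpyxl release where rows and columns are generators that must be materialized before indexing:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
# Hypothetical 7-row, 3-column data, the same shape as example.xlsx
for i in range(1, 8):
    ws.append([f"item{i}", i, i * 10])

rows = tuple(ws.rows)        # ws.rows is a generator, so materialize it first
columns = tuple(ws.columns)  # likewise for ws.columns

print(len(rows), len(rows[0]))        # 7 3
print(len(columns), len(columns[0]))  # 3 7

# Column B is columns[1]; print every value in it
for cell in columns[1]:
    print(cell.value)
```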

6. Workbook, worksheet, cell

As a quick refresher, here is a summary of the functions, methods, and data types involved in reading a cell:
1. Import the openpyxl module.
2. Call the openpyxl.load_workbook() function.
3. Get a Workbook object.
4. Use the workbook's active attribute, or index it by sheet name (wb['Sheet1']); older versions used the get_active_sheet() and get_sheet_by_name() methods.
5. Get a Worksheet object.
6. Index the worksheet (sheet['A1']) or call its cell() method with the row and column keyword arguments.
7. Get a Cell object.
8. Read the Cell object's value attribute.

4. Project: Reading data from a spreadsheet

5. Write to Excel table

OpenPyXL also provides methods for writing data, which means your program can create and edit spreadsheet files. With Python, it is easy to create a spreadsheet containing thousands of rows of data.

1. Create and save Excel document

I found that Chapter 13 of the book I am currently studying, Quick Start with Python Programming, uses the openpyxl module to process Excel spreadsheets. However, the book's openpyxl tutorial is outdated: many of the functions it uses have been deprecated or replaced. So I looked for an up-to-date tutorial online to learn the openpyxl module.

openpyxl

1. Introduction

Official documentation: https://openpyxl.readthedocs.io/en/stable/

Note: The openpyxl module supports only the xlsx/xlsm/xltx/xltm formats; it does not support the legacy xls format.

2. Creating

1. Create a new workbook

from openpyxl import Workbook imports the Workbook class from the openpyxl module.

wb = Workbook() creates a Workbook object, wb, from the Workbook class.

ws = wb.active gets the active worksheet through the workbook's active property.
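Putting creation and saving together: a minimal sketch that creates a workbook, writes one value, saves it, and reads it back. The temporary path is an assumption for the example; in practice you would pass your own file name to wb.save().

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

wb = Workbook()   # a new workbook always contains one worksheet
ws = wb.active    # .active is a property, so no parentheses
print(ws.title)   # Sheet
ws["A1"] = "hello"

# Save to a temporary path (hypothetical; use your own file name), then reopen it
path = os.path.join(tempfile.mkdtemp(), "new.xlsx")
wb.save(path)
reopened_value = load_workbook(path).active["A1"].value
print(reopened_value)  # hello
```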

2. Create a new worksheet

You can use the create_sheet() function to create a new worksheet.

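from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Create new worksheets; they are named Sheet1, Sheet2, ... automatically
ws_1 = wb.create_sheet()         # by default, appended after the last sheet
ws_2 = wb.create_sheet(index=0)  # inserted at the given index (0 = first position)

# Rename a worksheet through its title attribute
ws_1.title = "New Sheet"

# Create a new worksheet with a given name
ws_3 = wb.create_sheet(title="New Sheet-2", index=0)
ws_4 = wb.create_sheet("New Sheet-1", 0)  # positional form: title first, then index

# Change the sheet tab color (no color by default)
ws.sheet_properties.tabColor = "F22F27"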
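To make the index argument concrete, here is a small sketch with explicit, hypothetical sheet names, so the resulting order is easy to predict:

```python
from openpyxl import Workbook

wb = Workbook()
wb.active.title = "Summary"
wb.create_sheet("Data")                  # no index: appended at the end
wb.create_sheet("Cover", 0)              # index 0: inserted before every other sheet
wb.create_sheet(title="Notes", index=2)  # keyword form of the same call
print(wb.sheetnames)  # ['Cover', 'Summary', 'Notes', 'Data']
```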

 

3. Operations

1. Set the workbook to read-only

from openpyxl import load_workbook

wb = load_workbook(filename='data.xlsx', read_only=True)  # open the workbook in read-only mode
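Read-only mode is meant for reading large files row by row. A minimal sketch, assuming a small generated file in place of your own workbook; iter_rows(values_only=True) yields plain values, and a read-only workbook should be closed explicitly because it keeps the file open while reading lazily.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Create a small file to read (hypothetical data and file name)
path = os.path.join(tempfile.mkdtemp(), "data.xlsx")
wb = Workbook()
ws = wb.active
ws.append(["name", "age"])
ws.append(["Alice", 30])
ws.append(["Bob", 25])
wb.save(path)

# read_only=True streams the sheet instead of loading every cell into memory
ro = load_workbook(filename=path, read_only=True)
rows = [row for row in ro.active.iter_rows(values_only=True)]
ro.close()  # read-only workbooks keep the file open until closed
print(rows)  # [('name', 'age'), ('Alice', 30), ('Bob', 25)]
```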

 

 2. Worksheet operations

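# Import load_workbook from the openpyxl module
from openpyxl import load_workbook

# Open normally: copying and removing sheets is not allowed with read_only=True
wb = load_workbook(filename='1.xlsx')
print(wb.sheetnames)  # names of all worksheets in the workbook
for sheet in wb:
    print(sheet.title)  # iterate over the workbook and print each worksheet name
# Get a worksheet by name
ws = wb['Sheet1']  # Worksheet object
print('***********' + ws.title)  # print the worksheet name
ws_copy = wb.copy_worksheet(ws)  # copy the worksheet

# Delete a worksheet
# Method 1
ws = wb["Sheet1"]
wb.remove(ws)
# Method 2 (an alternative to method 1; after method 1 runs, "Sheet1" no longer exists)
# del wb["Sheet1"]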
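A self-contained sketch of copying and of both ways to remove a sheet, using an in-memory workbook with hypothetical sheet names so nothing depends on 1.xlsx:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws.title = "Sheet1"
ws["A1"] = "original"

# Copying raises an error on workbooks opened with read_only=True
ws_copy = wb.copy_worksheet(ws)   # a new sheet at the end, with the cell values copied
after_copy = list(wb.sheetnames)
print(after_copy)
print(ws_copy["A1"].value)        # original

# Method 1: pass the Worksheet object to remove()
wb.remove(wb["Sheet1"])

# Method 2: delete by name with del (shown on a second, hypothetical sheet)
wb.create_sheet("Temp")
del wb["Temp"]

print(wb.sheetnames)              # only the copy is left
```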

3. Row and column operations

Get the cell range

 Insert blank rows and columns

 Delete rows and columns
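A sketch of these three operations, using the Worksheet attribute dimensions and the methods insert_rows(), insert_cols(), delete_rows() and delete_cols() on a hypothetical 3x3 grid. Inserting shifts existing cells down or right; deleting shifts them back up or left.

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
for r in range(1, 4):                      # hypothetical 3x3 grid of values
    ws.append([f"{c}{r}" for c in "ABC"])

before = ws.dimensions                     # the used cell range
print(before)                              # A1:C3
print(ws.min_row, ws.max_row, ws.min_column, ws.max_column)  # 1 3 1 3

ws.insert_rows(2)             # blank row before row 2; old rows 2-3 move down
ws.insert_cols(1, amount=2)   # two blank columns before column A
after_insert = ws.dimensions
print(after_insert)           # C1:E4

ws.delete_rows(2)             # remove the blank row again
ws.delete_cols(1, amount=2)   # remove the two blank columns again
print(ws.dimensions)          # A1:C3
```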

 4. Access cells

Note: A newly created worksheet contains no cells; a cell is created in memory only when it is first accessed.

This way, cells that are never used are never created, which reduces memory consumption.
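One visible consequence of lazy creation: simply reading a cell creates it, and the new empty cell counts toward max_row and max_column. A small sketch on an empty in-memory sheet:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
before = (ws.max_row, ws.max_column)
print(before)    # (1, 1), even though no cell exists yet

cell = ws["C5"]  # merely reading C5 creates the cell in memory
print(cell.value)  # None
after = (ws.max_row, ws.max_column)
print(after)     # (5, 3): the empty cell now counts toward the sheet size
```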

Access a single cell

cell_A2=ws['A2']

cell_C3 = ws.cell(row=3, column=3)

Access multiple cells

Access via slices:

cell_area = ws['A1':'B4']
cell_exact = ws.iter_rows(min_row=1, max_row=3, min_col=1, max_col=2)  # i.e. A1:B3

Access via rows and columns:

col_A = ws['A']       # column A
col_area = ws['A:B']  # columns A and B
row_2 = ws[2]         # row 2
row_area = ws[2:5]    # rows 2-5

# Iterate over all rows
all_by_row = ws.rows 

# Iterate over all columns
all_by_col = ws.columns  

The results can be processed with tuple(), list(), or loops.

Example, using 1.xlsx:

import openpyxl
from openpyxl import load_workbook

wb=load_workbook('1.xlsx')
ws=wb['Sheet5']
# Get the values of single cells
cell_A2=ws['A2']
cell_C3=ws.cell(row=3,column=3)
print(cell_A2.value,cell_C3.value)
print('---------------------------')
# Get the values of multiple cells
cell_area = ws['A1':'D4']
for row in cell_area:
    for cell in row:
        print(cell.value)
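A sketch of the access patterns listed above on a hypothetical 5-row, 3-column sheet, checking what shape each one returns:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
for r in range(1, 6):                  # hypothetical 5-row, 3-column sheet
    ws.append([r, r * 2, r * 3])

col_a = ws["A"]        # tuple of the 5 cells in column A
col_area = ws["A:B"]   # tuple of 2 column tuples
row_2 = ws[2]          # tuple of the 3 cells in row 2
row_area = ws[2:5]     # tuple of 4 row tuples (rows 2, 3, 4 and 5)

print(len(col_a), len(col_area), len(row_2), len(row_area))  # 5 2 3 4

# iter_rows() is lazy; values_only=True yields plain values instead of Cell objects
exact = list(ws.iter_rows(min_row=1, max_row=3, min_col=1, max_col=2, values_only=True))
print(exact)  # [(1, 2), (2, 4), (3, 6)]
```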

 5. Set row height and column width

Set the entire row height

# Set the height of row 2
ws.row_dimensions[2].height = 40

Set the entire column width

# Set the width of column C
ws.column_dimensions['C'].width = 30

Finally, remember to save with wb.save('file name'); changes are only written to the file when you save.

Example, using 1.xlsx:

import openpyxl
from openpyxl import load_workbook

wb=load_workbook('1.xlsx')
ws=wb['Sheet5']

# Set the height of row 2
ws.row_dimensions[2].height = 40

# 设置C列列宽
ws.column_dimensions['C'].width = 30

wb.save('1.xlsx')
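To confirm that the sizes actually reach the file, a sketch can save a workbook and load it again. The temporary file name is hypothetical, and the value written to C2 only gives row 2 some content.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

wb = Workbook()
ws = wb.active
ws["C2"] = "tall and wide"            # hypothetical content for the resized row/column
ws.row_dimensions[2].height = 40      # row height in points
ws.column_dimensions["C"].width = 30  # column width in character units

path = os.path.join(tempfile.mkdtemp(), "sizes.xlsx")  # hypothetical file name
wb.save(path)  # without this call the new sizes never reach the file

reloaded = load_workbook(path).active
h = reloaded.row_dimensions[2].height
w = reloaded.column_dimensions["C"].width
print(h, w)
```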


6. Merge cells

Merge

Note: To write data into merged cells, write it to the top-left cell of the merged range.

If cells in the range already contain data when they are merged, only the value in the top-left cell is kept.

ws.merge_cells('A2:D4')

ws.merge_cells(start_row=2, start_column=1, end_row=4, end_column=4)

import openpyxl
from openpyxl import load_workbook

wb=load_workbook('1.xlsx')
ws=wb['Sheet5']

ws.merge_cells('A5:B6')
ws['A5'] = 'This is a merged cell'  # write to the top-left cell of the merged range
wb.save('1.xlsx')
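The two notes above can be checked directly. A sketch on an in-memory sheet with hypothetical values: after merging, every cell except the top-left one becomes a read-only MergedCell whose value is None.

```python
from openpyxl import Workbook
from openpyxl.cell.cell import MergedCell

wb = Workbook()
ws = wb.active
ws["A5"] = "kept"      # top-left cell of the range
ws["B6"] = "dropped"   # any other cell in the range loses its value

ws.merge_cells("A5:B6")
print(ws.merged_cells.ranges)            # the merged range(s) on this sheet
kept = ws["A5"].value
print(kept)                              # kept
print(ws["B6"].value)                    # None: B6 is now a read-only MergedCell
print(isinstance(ws["B6"], MergedCell))  # True

ws["A5"] = "This is a merged cell"       # write through the top-left cell only
```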

 

 Unmerge

ws.unmerge_cells('A2:D4')

ws.unmerge_cells(start_row=2, start_column=1, end_row=4, end_column=4)

import openpyxl
from openpyxl import load_workbook

wb=load_workbook('1.xlsx')
ws=wb['Sheet5']

ws.merge_cells('A5:B6')
ws['A5'] = 'This is a merged cell'  # write to the top-left cell of the merged range
ws.unmerge_cells('A5:B6')


wb.save('1.xlsx')
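A sketch of the reverse operation, using the keyword form of merge_cells() for variety; after unmerge_cells() the range is gone and the other cells are ordinary, empty cells again.

```python
from openpyxl import Workbook
from openpyxl.cell.cell import MergedCell

wb = Workbook()
ws = wb.active
ws.merge_cells(start_row=5, start_column=1, end_row=6, end_column=2)  # A5:B6
ws["A5"] = "This is a merged cell"

ws.unmerge_cells("A5:B6")
print(len(ws.merged_cells.ranges))       # 0: no merged ranges remain
print(ws["A5"].value)                    # the top-left value survives
print(isinstance(ws["B6"], MergedCell))  # False: B6 is an ordinary, empty cell again
```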

4. Write

1. Write data

# Write data to cells
ws['A1'] = 42                       # write to cell A1
ws.cell(row=1, column=2, value=42)  # write to cell B1
ws.cell(1, 3).value = 42            # write to cell C1

# Append a row of data
ws.append([1, 2, 3, 4])

For example:

import openpyxl
from openpyxl import load_workbook

wb=load_workbook('1.xlsx')
ws=wb['Sheet5']

ws['A1'] = 'Name'
ws.cell(row=5, column=1, value='Han Meimei')  # write to cell A5
ws.cell(5, 2).value = 42                       # write to cell B5

ws.append(['Li Mei', 2, 3, 4])  # append a row of data after the last used row

wb.save('1.xlsx')
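append() does not start at row 1; it writes to the row after the highest row used so far. A sketch mirroring the example above on an in-memory sheet, where the cell written in row 5 pushes the appended row to row 6:

```python
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
ws["A1"] = "Name"                             # header
ws.cell(row=5, column=1, value="Han Meimei")  # jumps ahead to row 5
ws.cell(5, 2).value = 42

# append() writes to the row after the highest row used so far, here row 6
ws.append(["Li Mei", 2, 3, 4])
appended = [c.value for c in ws[6]]
print(ws.max_row)  # 6
print(appended)    # ['Li Mei', 2, 3, 4]
```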

2. Write formulas

# Write a formula (three equivalent ways)
ws['B2'] = "=SUM(A2:A4)"
ws.cell(row=2, column=2, value="=SUM(A2:A4)")
ws.cell(2, 2).value = "=SUM(A2:A4)"
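openpyxl stores a formula as text and does not evaluate it. A sketch with hypothetical numbers in A2:A4 that saves the formula, then reloads the file both normally and with data_only=True (the temporary path is also hypothetical):

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

wb = Workbook()
ws = wb.active
for r, n in enumerate([1, 2, 3], start=2):  # hypothetical numbers in A2:A4
    ws.cell(row=r, column=1, value=n)
ws["B2"] = "=SUM(A2:A4)"                    # stored as a formula string

path = os.path.join(tempfile.mkdtemp(), "formula.xlsx")  # hypothetical file name
wb.save(path)

formula = load_workbook(path).active["B2"].value
print(formula)  # =SUM(A2:A4)
# data_only=True returns the result cached by Excel; openpyxl never calculates
# formulas itself, so a file that has not been opened and saved in Excel has none
cached = load_workbook(path, data_only=True).active["B2"].value
print(cached)   # None
```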

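3. Insert pictures

from openpyxl.drawing.image import Image  # inserting pictures requires the Pillow package

img = Image('image')     # 'image': path of the picture file to insert
ws.add_image(img, 'B1')  # insert the picture at cell B1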

5. Set cell style

Cell styles include: number_format (data format), Font (font), Fill (fill), Border (border), Alignment (alignment), Protection (protection).

1. Number format

import openpyxl
from openpyxl import Workbook
import datetime

wb = Workbook()
ws = wb.active

ws['A1'] = 'text'
print(ws['A1'].number_format)    # -->>> General

ws['A2'] = 5
print(ws['A2'].number_format)    # -->>> General

ws['A3'] = 0.05
ws['A3'].number_format = '0.00%'  # custom format
print(ws['A3'].number_format)    # -->>> 0.00%

ws['B1'] = datetime.datetime.now()
print(ws['B1'].number_format)    # -->>> yyyy-mm-dd h:mm:ss

ws['B2'] = datetime.datetime.now()
ws['B2'].number_format = 'yyyy-mm-dd'  # custom format
print(ws['B2'].number_format)    # -->>> yyyy-mm-dd

wb.save("2.xlsx") 

 2. Font

Import

# Import Font
from openpyxl.styles import Font

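Parameter description

name           # font name
size           # font size, default 11
bold           # bold; default False, set True for bold
italic         # italic; default False, set True for italic
vertAlign      # superscript/subscript; default None. Values: baseline (normal), superscript, subscript
color          # font color; not set by default, which Excel displays as black
strikethrough  # strikethrough; not set by default, set True to enable
underline      # underline; none by default. Values: single, double, singleAccounting (single accounting underline), doubleAccounting (double accounting underline)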

Example

import openpyxl
from openpyxl import Workbook
from openpyxl.styles import Font
import datetime

wb = Workbook()
ws = wb.active

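ws['A1'] = 'Default'      # write to cell A1
ws['B2'] = 'Formatted'    # write to cell B2
ws['C3'] = 'Superscript'  # write to cell C3

# Set the font of cell B2
ws['B2'].font = Font(name='Calibri', size=12, color="00FF9900", italic=True, underline='double', strikethrough=True)

# Set the font of cell C3 ('superscript' is the accepted value; 'super' raises a ValueError)
ws['C3'].font = Font(vertAlign='superscript', bold=True)

# Save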

wb.save("2.xlsx") 

 3. Filling

1. Solid color fill (PatternFill)

Import

# Import
from openpyxl.styles import PatternFill

Parameter description

fill_type    # pattern style; if it is not set, no color is displayed
'''
Values fill_type accepts: solid, lightHorizontal,
darkTrellis, darkUp, darkGray, darkVertical, lightDown,
lightTrellis, lightUp, darkDown, darkHorizontal, mediumGray,
lightVertical, gray0625, gray125, lightGrid, darkGrid, lightGray
'''
fgColor/start_color     # foreground color: the pattern color, which is the fill color for a solid fill
bgColor/end_color       # background color: shows through the gaps of a non-solid pattern

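Example

from openpyxl import Workbook
from openpyxl.styles import PatternFill

wb = Workbook()
ws = wb.active

ws['A1'] = 'Default'           # write to cell A1
ws['B2'] = 'Foreground color'  # write to cell B2
ws['C3'] = 'Background color'  # write to cell C3

# Foreground color: with fill_type='solid' this is the fill color, and the one usually set
ws['B2'].fill = PatternFill(fill_type='solid', fgColor='00FF9900')

# Background color: hidden behind a solid fill, so use a non-solid pattern to see it
ws['C3'].fill = PatternFill(fill_type='lightGrid', bgColor='00FF9900')

wb.save("example.xlsx")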

2. Gradient Fill (GradientFill)

Import

# Import
from openpyxl.styles import GradientFill

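Parameter description

type/fill_type  # gradient fill type: linear or path

'''
linear:
The gradient interpolates colors between a set of specified stops across the length of the area. By default the gradient runs from left to right, but you can change the direction with the degree attribute. You can instead supply a list of colors, which will be spaced evenly.

path:
A linear gradient is applied from each edge of the area. The top, right, bottom and left attributes specify how far the fill extends from each edge; for example, top="0.2" fills the top 20% of the cell.
'''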

Example

from openpyxl.styles import GradientFill
from openpyxl import Workbook

wb = Workbook()
ws = wb.active

# Merge cells
ws.merge_cells('B2:F4')

# Apply a gradient fill to the top-left cell of the merged range
top_left_cell = ws['B2']
top_left_cell.fill = GradientFill(type='linear', degree=0, stop=('FFFFFF', '99ccff', '000000'))  # gradient fill

wb.save("example.xlsx")

4. Border

Each edge of a Border is described by a Side object, so you import Side along with Border.

Import

# Import
from openpyxl.styles import Border, Side

Parameter description

# Border parameters
left = Side(style, color)    # left border
right = Side(style, color)   # right border
top = Side(style, color)     # top border
bottom = Side(style, color)  # bottom border

diagonalDown  # whether to show the top-left to bottom-right diagonal; set True to show it
diagonalUp    # whether to show the bottom-left to top-right diagonal; set True to show it
diagonal = Side(style, color)  # diagonal border; diagonalDown and/or diagonalUp must be set for it to appear

# Side parameters
style/border_style   # border style
'''
Available border styles:
thick, mediumDashDot, dashed, mediumDashDotDot,
dashDot, slantDashDot, dotted, double, thin,
hair, dashDotDot, mediumDashed, medium
'''
color  # border color

Example

from openpyxl import Workbook
from openpyxl.styles import Border, Side

wb = Workbook()
ws = wb.active


ws['A1'] = 'Default'   # write to cell A1
ws['B2'] = 'Border'    # write to cell B2
ws['C3'] = 'Diagonal'  # write to cell C3

# Border line style
line_format = Side(style='medium', color='00FF9900')

# Give cell B2 top, bottom, left and right borders
ws['B2'].border = Border(left=line_format, right=line_format, top=line_format, bottom=line_format)

# Give cell C3 both diagonals
ws['C3'].border = Border(diagonalDown=True, diagonalUp=True, diagonal=line_format)

# Save
wb.save("example.xlsx")

Reference link: Python automated office: openpyxl tutorial (basic)


Origin blog.csdn.net/chatgpt002/article/details/132908398