[Guide to operating Excel with Python: Complete manual for reading and writing]

Article directory

- Overview
- Write content
- Read
- Summary

Overview

Reading and writing Excel files is a common and important task in data processing and analysis. Python offers several libraries that make these operations straightforward. This guide walks through the most common ways to read and write Excel files with Python.

Write content

Three common methods are covered: openpyxl, pandas, and xlwt.
Method 1: Use the openpyxl library
Install the openpyxl library
First, make sure the openpyxl library is installed. If it is not, install it with the following command:

pip install openpyxl

Create and save Excel files

import openpyxl

# Create a new Excel workbook
workbook = openpyxl.Workbook()

# Get the default worksheet
sheet = workbook.active

# Write the header cells
sheet['A1'] = '姓名'
sheet['B1'] = '年龄'

# Append one row of data below the header
sheet.append(['Alice', 25])

# Save the workbook
workbook.save('example_openpyxl.xlsx')

Data reading, writing and formatting

import pandas as pd

# Read the workbook saved above into a DataFrame
df = pd.read_excel('example_openpyxl.xlsx')

# Print the DataFrame contents
print(df)

The above code uses the read_excel function of the pandas library to read the first worksheet of the Excel file into a DataFrame. For .xlsx files pandas relies on openpyxl as its reader, so openpyxl must be installed.
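
read_excel also accepts parameters that narrow down what is loaded. The following is a minimal sketch, assuming a hypothetical file people.xlsx with a sheet named People; the file is created in a temporary directory first so the example runs on its own.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample file, written first so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "people.xlsx")
pd.DataFrame({"name": ["Alice", "Bob", "Charlie"], "age": [25, 30, 35]}).to_excel(
    path, sheet_name="People", index=False
)

# Read one named sheet, keep only the "name" column, and stop after two data rows.
subset = pd.read_excel(path, sheet_name="People", usecols=["name"], nrows=2)
print(subset)
```

Limiting columns and rows this way can help with large workbooks where only part of the data is needed.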

Traversing and manipulating cells are common tasks when processing Excel files, and openpyxl handles both. Here are some code examples:
Traversing the values of column A

# Assume sheet is your worksheet object
for cell in sheet['A']:
    print(cell.value)

This code loops through all the cells in column A and prints out their values.
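
Column access and row access can be compared side by side. The sketch below builds a small in-memory workbook (the sample rows are made up for illustration) and reads it both ways; iter_rows with min_row=2 skips the header, and values_only=True returns plain values instead of cell objects.

```python
import openpyxl

workbook = openpyxl.Workbook()
sheet = workbook.active
sheet.append(["name", "age"])
sheet.append(["Alice", 25])
sheet.append(["Bob", 30])

# Column-wise: every cell in column A, header included.
column_a = [cell.value for cell in sheet["A"]]

# Row-wise: skip the header row and collect each row as a tuple of values.
records = list(sheet.iter_rows(min_row=2, values_only=True))

print(column_a)
print(records)
```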

Format cells

from openpyxl.styles import Font, PatternFill

# Set cell A1 to a 14-point bold font with a yellow fill
sheet['A1'].font = Font(size=14, bold=True)
sheet['A1'].fill = PatternFill(start_color="FFFF00", end_color="FFFF00", fill_type="solid")

This code sets the font size and weight of cell A1 and adds a solid yellow fill.
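
The same styling is often applied to a whole header row rather than a single cell. A minimal sketch, assuming a two-column sheet with made-up data; the column width of 20 is an arbitrary choice for illustration.

```python
import openpyxl
from openpyxl.styles import Font, PatternFill

workbook = openpyxl.Workbook()
sheet = workbook.active
sheet.append(["name", "age"])
sheet.append(["Alice", 25])

header_font = Font(size=14, bold=True)
header_fill = PatternFill(start_color="FFFF00", end_color="FFFF00", fill_type="solid")

# sheet[1] is the first row; style every cell in it.
for cell in sheet[1]:
    cell.font = header_font
    cell.fill = header_fill

# Widen column A so the larger header text fits.
sheet.column_dimensions["A"].width = 20
```
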
Create chart

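from openpyxl.chart import BarChart, Reference

# Create a bar chart
chart = BarChart()

# Set the chart's data range and category labels
data = Reference(sheet, min_col=2, min_row=1, max_col=3, max_row=5)
categories = Reference(sheet, min_col=1, min_row=2, max_row=5)

# Add the data to the chart; the first row supplies the series titles
chart.add_data(data, titles_from_data=True)
chart.set_categories(categories)

# Add the chart to the worksheet, anchored at cell E5
sheet.add_chart(chart, "E5")

This code uses openpyxl's chart module to create a bar chart. It takes the data from columns B and C (rows 1 to 5, with the first row supplying the series titles) and the category labels from column A, then adds the chart to the worksheet at cell E5. The chart is written to disk only when the workbook is saved.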
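
For a version that can be run from start to finish, the sketch below fills a sheet with made-up data, builds a single-series bar chart, and saves the workbook. The file name chart_example.xlsx and the anchor cell D2 are assumptions chosen for illustration.

```python
import os
import tempfile

import openpyxl
from openpyxl.chart import BarChart, Reference

workbook = openpyxl.Workbook()
sheet = workbook.active
# Hypothetical data: a header row, then one category column and one value column.
for row in [["name", "age"], ["Alice", 25], ["Bob", 30], ["Charlie", 35]]:
    sheet.append(row)

chart = BarChart()
chart.title = "Age by name"

# Values: column B including the header row, which becomes the series title.
data = Reference(sheet, min_col=2, min_row=1, max_row=sheet.max_row)
# Categories: column A without the header row.
categories = Reference(sheet, min_col=1, min_row=2, max_row=sheet.max_row)

chart.add_data(data, titles_from_data=True)
chart.set_categories(categories)
sheet.add_chart(chart, "D2")

# The chart only reaches disk when the workbook is saved.
path = os.path.join(tempfile.mkdtemp(), "chart_example.xlsx")
workbook.save(path)
```
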
Merge and split cells

# Merge cells B2 through C2
sheet.merge_cells('B2:C2')

# Set the value of the merged cell (it is stored in the top-left cell, B2)
sheet['B2'] = 'Merged Cells'

# Split the cells again
sheet.unmerge_cells('B2:C2')

This code merges cells B2 through C2, sets the value of the merged cell, and then splits the cells again.
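
To see what merging does to the worksheet, the merged ranges can be inspected through sheet.merged_cells. A small sketch on a fresh in-memory workbook:

```python
import openpyxl

workbook = openpyxl.Workbook()
sheet = workbook.active

sheet["B2"] = "Merged Cells"
sheet.merge_cells("B2:C2")

# The worksheet tracks merged ranges; only the top-left cell holds the value.
merged = [cell_range.coord for cell_range in sheet.merged_cells.ranges]
top_left_value = sheet["B2"].value
other_value = sheet["C2"].value
print(merged, top_left_value, other_value)

# After unmerging, no merged ranges remain.
sheet.unmerge_cells("B2:C2")
remaining = [cell_range.coord for cell_range in sheet.merged_cells.ranges]
print(remaining)
```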

Process data in batches

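# Assume the grades are in the second column, from row 2 to the last row.
# Skip empty and non-numeric cells so that sum() does not fail.
values = (sheet.cell(row=i, column=2).value for i in range(2, sheet.max_row + 1))
grades = [v for v in values if isinstance(v, (int, float))]

# Compute the average grade
average_grade = sum(grades) / len(grades) if grades else 0.0

# Print the result
print(f"Average grade: {average_grade}")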

Method 2: Use the pandas library

import pandas as pd

# Create a DataFrame
data = {'姓名': ['Bob', 'Alice'], '年龄': [30, 25]}
df = pd.DataFrame(data)

# Write the DataFrame to an Excel file without the index column
df.to_excel('example_pandas.xlsx', index=False)
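
When several DataFrames belong in one workbook, pd.ExcelWriter can place each on its own sheet. A minimal sketch with made-up data; the sheet names People and Cities and the file name multi_sheet.xlsx are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

people = pd.DataFrame({"name": ["Bob", "Alice"], "age": [30, 25]})
cities = pd.DataFrame({"name": ["Bob", "Alice"], "city": ["Paris", "Oslo"]})

path = os.path.join(tempfile.mkdtemp(), "multi_sheet.xlsx")

# ExcelWriter keeps the file open so several DataFrames can go into one workbook.
with pd.ExcelWriter(path) as writer:
    people.to_excel(writer, sheet_name="People", index=False)
    cities.to_excel(writer, sheet_name="Cities", index=False)

# sheet_name=None returns a dict mapping sheet names to DataFrames.
sheets = pd.read_excel(path, sheet_name=None)
print(list(sheets))
```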


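Method 3: Use xlwt

xlwt writes only the legacy .xls format, which is why the examples below save files with the .xls extension.

import xlwt

# Create a workbook
workbook = xlwt.Workbook()

# Add a worksheet
sheet = workbook.add_sheet('Sheet1')

# Write data; the arguments are row index, column index, value (both zero-based)
sheet.write(0, 0, '姓名')
sheet.write(0, 1, '年龄')
sheet.write(1, 0, 'Bob')
sheet.write(1, 1, 30)

# Save the workbook
workbook.save('example_xlwt.xls')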

The same approach works for data held in lists, dictionaries, and arrays:
Write list data

import xlwt

# Create a workbook
workbook = xlwt.Workbook()

# Add a worksheet
sheet = workbook.add_sheet('List_Data')

# The list data to write
data_list = ['Alice', 25, 'Bob', 30, 'Charlie', 35]

# Write each list element to its own row in the first column
for index, value in enumerate(data_list):
    sheet.write(index, 0, value)

# Save the workbook
workbook.save('list_data_example.xls')

This code creates an Excel file and writes each element of the list to its own row in the first column.
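
The sample list appears to alternate names and ages. If that reading is right (it is an assumption, not something the original states), a two-column layout may be more useful than a single column. The sketch below pairs the items with slicing and zip, and uses openpyxl rather than xlwt so the result is an .xlsx workbook; the header names are made up.

```python
import openpyxl

data_list = ['Alice', 25, 'Bob', 30, 'Charlie', 35]

# Pair even-indexed items (names) with odd-indexed items (ages).
rows = list(zip(data_list[0::2], data_list[1::2]))

workbook = openpyxl.Workbook()
sheet = workbook.active
sheet.append(["name", "age"])  # hypothetical header row
for row in rows:
    sheet.append(row)

print(rows)
```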

Write dictionary data

import xlwt

# Create a workbook
workbook = xlwt.Workbook()

# Add a worksheet
sheet = workbook.add_sheet('Dict_Data')

# The dictionary data to write
data_dict = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
}

# Write each key as a column header, then its values down that column
for col_num, key in enumerate(data_dict.keys()):
    sheet.write(0, col_num, key)
    for row_num, value in enumerate(data_dict[key]):
        sheet.write(row_num + 1, col_num, value)

# Save the workbook
workbook.save('dict_data_example.xls')

This code creates an Excel file, uses the dictionary keys as the header row, and writes each key's values down the column beneath it.

Write array data

import xlwt
import numpy as np

# Create a workbook
workbook = xlwt.Workbook()

# Add a worksheet
sheet = workbook.add_sheet('Array_Data')

# The array data to write
data_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Write the array to the worksheet, one cell per element
for row_num, row_data in enumerate(data_array):
    for col_num, value in enumerate(row_data):
        # Convert the NumPy integer to a plain Python int before writing
        sheet.write(row_num, col_num, int(value))

# Save the workbook
workbook.save('array_data_example.xls')

This code creates an Excel file and writes the elements of the array into it row by row.
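
The per-element int() conversion can be avoided by converting the whole array up front. A minimal sketch, using openpyxl instead of xlwt as an illustrative alternative: ndarray.tolist() returns nested lists of plain Python numbers, and each inner list can be appended as one worksheet row.

```python
import numpy as np
import openpyxl

data_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

workbook = openpyxl.Workbook()
sheet = workbook.active

# tolist() converts each NumPy row into a list of plain Python ints.
for row in data_array.tolist():
    sheet.append(row)

written = [list(row) for row in sheet.iter_rows(values_only=True)]
print(written)
```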

The above are three common methods of writing content to Excel. Choose one of them according to personal habits and project needs.

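Read

Method 1, the pandas read_excel function, is shown earlier in this guide. The remaining methods read the file without pandas.

Method 2: Use the openpyxl library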

import openpyxl

# Open the Excel file
workbook = openpyxl.load_workbook('example.xlsx')

# Get the active worksheet
sheet = workbook.active

# Read the data; values_only=True yields tuples of cell values
for row in sheet.iter_rows(values_only=True):
    print(row)

This code uses the openpyxl library to open the Excel file, traverse each row of the worksheet, and print out the data.
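
Rows read this way are plain tuples. When the first row holds column names, it is often convenient to turn each remaining row into a dictionary. A small sketch on an in-memory workbook with made-up data:

```python
import openpyxl

workbook = openpyxl.Workbook()
sheet = workbook.active
for row in [["name", "age"], ["Alice", 25], ["Bob", 30]]:
    sheet.append(row)

rows = sheet.iter_rows(values_only=True)
header = next(rows)  # the first row holds the column names
records = [dict(zip(header, row)) for row in rows]
print(records)
```
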
Method 3: Use the xlrd library

import xlrd

# Open the Excel file
workbook = xlrd.open_workbook('example.xls')

# Get the first worksheet
sheet = workbook.sheet_by_index(0)

# Read the data row by row
for row_num in range(sheet.nrows):
    row = sheet.row_values(row_num)
    print(row)

The above code uses the xlrd library to open the Excel file, traverse each row of the worksheet, and print out the data. Note that xlrd 2.0 and later reads only the legacy .xls format; use openpyxl or pandas for .xlsx files.
Method 4: Use the pyexcel library

import pyexcel

# Read the Excel file into a list of rows
data = pyexcel.get_array(file_name='example.xlsx')

# Print the data
for row in data:
    print(row)

This code uses the get_array function of the pyexcel library to read the Excel file and print the data. pyexcel provides a simple interface that suits cases where you need to read data quickly. Reading .xlsx files requires a format plugin such as pyexcel-xlsx to be installed alongside pyexcel.

Example:
Create and write sample data to an Excel file

import pandas as pd

# Create sample data
data = {'姓名': ['Alice', 'Bob'], '年龄': [25, 30]}
df = pd.DataFrame(data)

# Write the sample data to an Excel file
df.to_excel('example.xlsx', index=False)

Read an existing Excel file and add more data

import pandas as pd

# Read the existing Excel file
existing_data = pd.read_excel('example.xlsx')

# Prepare more data to add
new_data = {'姓名': ['Charlie', 'David'], '年龄': [28, 35]}
new_df = pd.DataFrame(new_data)

# Combine the existing and new rows
combined_data = pd.concat([existing_data, new_df], ignore_index=True)

# Write the combined data to an Excel file without the index column
combined_data.to_excel('example_updated.xlsx', index=False)
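
The pandas approach above rewrites the whole sheet. If only a few rows need to be added and existing formatting should stay untouched, appending in place with openpyxl is one alternative. A minimal sketch, assuming a hypothetical file people.xlsx that is created first so the example runs on its own:

```python
import os
import tempfile

import openpyxl

path = os.path.join(tempfile.mkdtemp(), "people.xlsx")

# Create a small file to stand in for an existing workbook.
workbook = openpyxl.Workbook()
workbook.active.append(["name", "age"])
workbook.active.append(["Alice", 25])
workbook.save(path)

# Reopen it, append below the last used row, and save over the same path.
workbook = openpyxl.load_workbook(path)
sheet = workbook.active
sheet.append(["Charlie", 28])
sheet.append(["David", 35])
workbook.save(path)

reloaded = openpyxl.load_workbook(path).active
all_rows = list(reloaded.iter_rows(values_only=True))
print(all_rows)
```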

Read the merged data and plot it

import matplotlib.pyplot as plt
import pandas as pd

# Read the merged data
combined_data = pd.read_excel('example_updated.xlsx')

# Extract the name and age columns
names = combined_data['姓名']
ages = combined_data['年龄']

# Draw a line chart
plt.plot(names, ages, marker='o', linestyle='-')
plt.title('Age line chart')
plt.xlabel('Name')
plt.ylabel('Age')
plt.grid(True)
plt.show()


Summary

Reading and writing Excel files is a common and important task in data processing and analysis. This guide covered the pandas, openpyxl, xlwt, xlrd, and pyexcel libraries and the typical reading and writing scenarios for each.

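Write content:
    Use the openpyxl library to create a new Excel workbook, add data, and save it.
    Use the pandas library to create a DataFrame, then write it to an Excel file.
    Use the xlwt library to create a workbook and worksheet, write data, and save it as an .xls file.

Read Excel files:
    Use the read_excel function of the pandas library to read a whole Excel file or a specific worksheet.
    Use the openpyxl library to open an Excel file and traverse each row of the worksheet.
    Use the xlrd library to open an .xls file and traverse each row of the worksheet.
    Use the get_array function of the pyexcel library to read an Excel file quickly.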