A guide to common methods for reading and writing Excel files in Python

There are many ways to read and write Excel files in Python, and each module has a slightly different API for doing so. This article introduces a few commonly used ones.

  • Use xlrd and xlwt to read and write Excel files;
  • Use openpyxl to read and write Excel files;
  • Use pandas to read and write Excel files.

References:

https://www.python-excel.org/
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html#pandas.read_excel
https://www.jianshu.com/p/19219542bf23

2 | 0 data preparation

For the demonstration, I created two new files, data.xls and data.xlsx. The region "A1:E5" of the first worksheet, Sheet1, contains the following content, which is used to test the code for reading and writing Excel files:

 

3 | 0 xlrd and xlwt

xlrd is a library for reading data and formatting information from Excel files in the .xls format. Since version 2.0, xlrd reads only .xls files.
xlwt is a library for writing data and formatting information to older Excel files (for example: .xls).

3 | 1 example

pip install xlrd
pip install xlwt

 


Let's start by reading the contents of the file:

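import xlrd
import os

# Build the path to data.xls next to this script (xlrd handles the .xls format)
file_path = os.path.dirname(os.path.abspath(__file__))
base_path = os.path.join(file_path, 'data.xls')
book = xlrd.open_workbook(base_path)
# Take the first worksheet
sheet1 = book.sheets()[0]
nrows = sheet1.nrows
print('Total rows:', nrows)
ncols = sheet1.ncols
print('Total columns:', ncols)
# Row and column indexes are zero-based, so index 2 is the third row or column
row3_values = sheet1.row_values(2)
print('Row 3 values:', row3_values)
col3_values = sheet1.col_values(2)
print('Column 3 values:', col3_values)
cell_3_3 = sheet1.cell(2, 2).value
print('Value of the cell at row 3, column 3:', cell_3_3)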
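Because xlrd counts rows and columns from zero, sheet1.cell(2, 2) refers to the cell that Excel displays as C3. The helper below is only a sketch to make that mapping explicit; cell_name is a hypothetical function written for this article, not part of xlrd.

```python
def cell_name(row, col):
    """Convert zero-based (row, col) indexes to an A1-style cell name."""
    if row < 0 or col < 0:
        raise ValueError('row and col must be non-negative')
    letters = ''
    col += 1  # switch to one-based for the base-26 letter conversion
    while col > 0:
        col, remainder = divmod(col - 1, 26)
        letters = chr(ord('A') + remainder) + letters
    return f'{letters}{row + 1}'


print(cell_name(2, 2))   # C3
print(cell_name(0, 26))  # AA1
```

The same mapping applies to xlwt, where worksheet.write(0, 3, ...) fills cell D1.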

 


Next, let's write a file. xlwt supports many operations; only the commonly used ones are listed here.

import xlwt
import datetime
# Create a workbook and set its encoding
workbook = xlwt.Workbook(encoding='utf-8')
# Create a worksheet
worksheet = workbook.add_sheet('Worksheet')
# write() takes the row, the column, and the value
worksheet.write(0, 0, label='Test')
# Set the column width
worksheet.col(0).width = 3333

# Set the row height
tall_style = xlwt.easyxf('font:height 520;')
worksheet.row(0).set_style(tall_style)

# Set the alignment
alignment = xlwt.Alignment()  # Create Alignment
# May be: HORZ_GENERAL, HORZ_LEFT, HORZ_CENTER, HORZ_RIGHT, HORZ_FILLED, HORZ_JUSTIFIED, HORZ_CENTER_ACROSS_SEL, HORZ_DISTRIBUTED
alignment.horz = xlwt.Alignment.HORZ_CENTER
# May be: VERT_TOP, VERT_CENTER, VERT_BOTTOM, VERT_JUSTIFIED, VERT_DISTRIBUTED
alignment.vert = xlwt.Alignment.VERT_CENTER
style = xlwt.XFStyle()  # Create Style
style.alignment = alignment  # Add Alignment to Style
worksheet.write(2, 0, 'Centered', style)

# Write data with a coloured background
pattern = xlwt.Pattern()  # Create the Pattern
# May be: NO_PATTERN, SOLID_PATTERN, or 0x00 through 0x12
pattern.pattern = xlwt.Pattern.SOLID_PATTERN
# May be: 8 through 63. 0 = Black, 1 = White, 2 = Red, 3 = Green, 4 = Blue,
# 5 = Yellow, 6 = Magenta, 7 = Cyan, 16 = Maroon, 17 = Dark Green,
# 18 = Dark Blue, 19 = Dark Yellow (almost brown), 20 = Dark Magenta,
# 21 = Teal, 22 = Light Gray, 23 = Dark Gray, the list goes on...
pattern.pattern_fore_colour = 5
style = xlwt.XFStyle()  # Create the Style
style.pattern = pattern  # Add Pattern to Style
worksheet.write(0, 1, 'Colour', style)

# Write a date
style = xlwt.XFStyle()
# Other options: D-MMM-YY, D-MMM, MMM-YY, h:mm, h:mm:ss, h:mm, h:mm:ss, M/D/YY h:mm, mm:ss, [h]:mm:ss, mm:ss.0
style.num_format_str = 'M/D/YY'
worksheet.write(0, 2, datetime.datetime.now(), style)

# Write formulas
worksheet.write(0, 3, 5)  # Outputs 5
worksheet.write(0, 4, 2)  # Outputs 2
# Should output "10" (D1[5] * E1[2])
worksheet.write(1, 3, xlwt.Formula('D1*E1'))
# Should output "7" (D1[5] + E1[2])
worksheet.write(1, 4, xlwt.Formula('SUM(D1,E1)'))

# Write a hyperlink
worksheet.write(1, 0, xlwt.Formula('HYPERLINK("http://www.baidu.com";"Search Baidu")'))
# Save the workbook
workbook.save('Excel_test.xls')
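
The numbers 3333 and 520 above are not pixels. As far as I understand xlwt's units, a column width is measured in 1/256 of the width of the character "0" in the default font, and a font height in 1/20 of a point. The two helpers below are hypothetical names used only to illustrate the arithmetic; they are not part of xlwt.

```python
def column_width_to_chars(width_units):
    """Convert an xlwt column width (1/256 of a character) to characters."""
    return width_units / 256


def font_height_to_points(height_units):
    """Convert an xlwt font height (1/20 of a point) to points."""
    return height_units / 20


print(round(column_width_to_chars(3333), 2))  # 13.02
print(font_height_to_points(520))             # 26.0
```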

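Note that workbook.save('Excel_test.xls') uses a relative path, so the file is written to the current working directory. Run the script from the directory where you want the file to appear, or pass an absolute path; otherwise the file may not end up where you expect.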
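
One way to avoid depending on the working directory is to build an absolute output path before saving. This is a minimal sketch using only the standard library; resolve_output_path is a hypothetical helper, and the call to workbook.save is left as a comment so the snippet runs without xlwt.

```python
import os


def resolve_output_path(filename, base_dir=None):
    """Return an absolute path for filename inside base_dir.

    base_dir defaults to the current working directory.
    """
    if base_dir is None:
        base_dir = os.getcwd()
    return os.path.abspath(os.path.join(base_dir, filename))


target = resolve_output_path('Excel_test.xls')
print(target)
# workbook.save(target) would then write to exactly this location
```

In a script, os.path.dirname(os.path.abspath(__file__)) can be passed as base_dir to save next to the script itself, as the reading examples do.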

 

 

4 | 0 openpyxl

openpyxl is a Python library for reading/writing Excel 2010 xlsx/xlsm/xltx/xltm files.
Install the package:

pip install openpyxl

After installation, you can start reading data

import openpyxl
import os

file_path = os.path.dirname(os.path.abspath(__file__))
base_path = os.path.join(file_path, 'data.xlsx')
workbook = openpyxl.load_workbook(base_path)
# Look up the worksheet by name (get_sheet_by_name is deprecated)
worksheet = workbook['Sheet1']
# worksheet.rows and worksheet.columns yield tuples of cells
row3 = [item.value for item in list(worksheet.rows)[2]]
print('Row 3 values:', row3)
col3 = [item.value for item in list(worksheet.columns)[2]]
print('Column 3 values:', col3)
# cell() uses one-based row and column numbers
cell_2_3 = worksheet.cell(row=2, column=3).value
print('Value at row 2, column 3:', cell_2_3)
max_row = worksheet.max_row
print('Max row:', max_row)
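
Converting worksheet.rows to a list works, but openpyxl also offers iter_rows for walking a range directly. The sketch below builds a small in-memory workbook so that it does not depend on data.xlsx; the names and ages are made-up sample data.

```python
from openpyxl import Workbook

# Build a small in-memory workbook with hypothetical sample data
workbook = Workbook()
worksheet = workbook.active
worksheet.title = 'Sheet1'
for row in [['name', 'age'], ['Alice', 30], ['Bob', 25]]:
    worksheet.append(row)

# iter_rows yields one tuple per row; values_only=True returns plain values
# instead of Cell objects, and min_row=2 skips the header row
rows = list(worksheet.iter_rows(min_row=2, values_only=True))
print(rows)

# openpyxl counts rows and columns from 1, unlike xlrd
first_name = worksheet.cell(row=2, column=1).value
print(first_name, worksheet.max_row, worksheet.max_column)
```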

 


Now let's start writing data

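import zipfile
# Open the archive for reading; the with block closes it afterwards
with zipfile.ZipFile("test.zip", 'r') as file:
    # Extract the archive contents; the password must be bytes,
    # and path is the directory to extract into
    file.extractall(path='.', pwd='123'.encode('utf-8'))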
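
For writing a workbook with openpyxl itself, a minimal sketch might look like the following. It assumes openpyxl is installed; the sheet contents and the file name openpyxl_test.xlsx are hypothetical, and the file goes to a temporary directory so nothing in the working directory is overwritten.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

workbook = Workbook()
worksheet = workbook.active
worksheet.title = 'Sheet1'

# Cells can be addressed by name or by one-based row and column numbers
worksheet['A1'] = 'item'
worksheet.cell(row=1, column=2, value='quantity')

# append() adds a full row below the last row that has data
worksheet.append(['apple', 3])
worksheet.append(['pear', 5])

# Formulas are written as strings that start with "="
worksheet['B4'] = '=SUM(B2:B3)'

out_path = os.path.join(tempfile.mkdtemp(), 'openpyxl_test.xlsx')
workbook.save(out_path)

# Read the file back to confirm what was written
reloaded = load_workbook(out_path)
print(reloaded['Sheet1']['A2'].value)
```

openpyxl stores the formula text and does not calculate it; the result appears once the file is opened in a spreadsheet application.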

 

 

5 | 0 pandas

pandas can read files with the xls, xlsx, xlsm, xlsb, odf, ods and odt extensions from the local file system or a URL, and it can read either a single worksheet or a list of worksheets.
As before, first install the package:

pip install pandas

Syntax:
pd.read_excel(io, sheet_name=0, header=0, names=None, index_col=None, usecols=None, squeeze=False,dtype=None, engine=None, converters=None, true_values=None, false_values=None, skiprows=None, nrows=None, na_values=None, parse_dates=False, date_parser=None, thousands=None, comment=None, skipfooter=0, convert_float=True, **kwds)

  • io, the path of the Excel file
  • sheet_name, the name or index of the worksheet to read
  • header, the row to use for the column names
  • names, custom column names to use
  • index_col, the column to use as the index
  • usecols, the columns to read
  • squeeze, if the data contains only one column, return a Series (available only in older pandas versions)
  • converters, a dict of functions for converting the values in specific columns
  • skiprows, the rows to skip at the start
  • nrows, the number of rows to read
  • skipfooter, the number of rows to skip at the end

import pandas as pd
import os

file_path = os.path.dirname(os.path.abspath(__file__))
base_path = os.path.join(file_path, 'data.xlsx')
df = pd.read_excel(base_path)
print(df)
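
To show a few of these parameters in action, here is a minimal sketch. It assumes pandas and openpyxl are installed, and it first writes a small hypothetical table to a temporary file so that it does not depend on data.xlsx.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data written to a temporary workbook
sample = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd'],
    'age': [11, 12, 13, 14],
    'city': ['X', 'Y', 'Z', 'W'],
})
path = os.path.join(tempfile.mkdtemp(), 'sample.xlsx')
sample.to_excel(path, sheet_name='Sheet1', index=False)

# Read only two columns and the first two data rows
subset = pd.read_excel(path, sheet_name='Sheet1',
                       usecols=['name', 'age'], nrows=2)
print(subset)

# Use the first column as the index and drop the last row
by_name = pd.read_excel(path, index_col=0, skipfooter=1)
print(by_name)
```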

 


Writing data
Syntax:
DataFrame.to_excel(excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True, encoding=None, inf_rep='inf', verbose=True, freeze_panes=None)
Parameter description:

  • excel_writer: file path or an existing ExcelWriter
  • sheet_name: name of the worksheet that will contain the DataFrame
  • na_rep: representation of missing data
  • float_format: format string for floating-point numbers. For example, float_format="%.2f" writes 0.1234 as 0.12.
  • columns: the columns to write
  • header: whether to write the column names. If a list of strings is given, it is treated as aliases for the column names.
  • index: whether to write the row names (index)
  • index_label: column label for the index column(s), if desired. If not specified, and header and index are True, the index names are used. A sequence should be given if the DataFrame uses a MultiIndex.
  • startrow: upper-left cell row at which to write the DataFrame
  • startcol: upper-left cell column at which to write the DataFrame
  • engine: the write engine to use, "openpyxl" or "xlsxwriter". It can also be set through the options io.excel.xlsx.writer, io.excel.xls.writer and io.excel.xlsm.writer.
  • merge_cells: write MultiIndex and hierarchical rows as merged cells
  • encoding: encoding of the resulting Excel file. Only necessary for xlwt; other writers support Unicode natively.
  • inf_rep: representation of infinity
  • verbose: display more information in the error logs
  • freeze_panes: the one-based bottommost row and rightmost column to freeze

from pandas import DataFrame

data = {'name': ['张三', '李四', '王五'],'age': [11, 12, 13],'sex': ['男', '女', '男']}

df = DataFrame(data)

df.to_excel('file.xlsx')
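
When several DataFrames belong in one file, an ExcelWriter can be passed as excel_writer instead of a path. The following is a minimal sketch, assuming pandas plus an xlsx engine such as openpyxl are installed; the sheet names, data, and file name are hypothetical.

```python
import os
import tempfile

import pandas as pd

people = pd.DataFrame({'name': ['a', 'b', 'c'], 'age': [11, 12, 13]})
scores = pd.DataFrame({'name': ['a', 'b', 'c'], 'score': [90.5, 80.25, 70.0]})

path = os.path.join(tempfile.mkdtemp(), 'report.xlsx')

# The with block saves and closes the file when it exits
with pd.ExcelWriter(path) as writer:
    # index=False leaves out the 0, 1, 2... row labels
    people.to_excel(writer, sheet_name='people', index=False)
    scores.to_excel(writer, sheet_name='scores', index=False)

# sheet_name=None reads every worksheet into a dict keyed by sheet name
sheets = pd.read_excel(path, sheet_name=None)
print(list(sheets))
print(sheets['people'])
```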

 

Origin blog.csdn.net/Python_sn/article/details/111553321