Working with Excel files using Python's openpyxl

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"  # display the result of every expression in a cell, not only the last one
import warnings
warnings.filterwarnings('ignore')

Excel Basic Concepts

  • Workbook: an Excel spreadsheet document, with the extension .xlsx
  • Worksheet: a workbook contains one or more worksheets (also called sheets)
  • Active sheet: the sheet the user is currently viewing, or the one last viewed before Excel was closed
  • Columns and rows: columns are addressed by letters starting at A; rows are addressed by numbers starting at 1
  • Cell: the box at the intersection of a row and a column

Installing the openpyxl module

# pip install openpyxl
import openpyxl
openpyxl.__version__  # check the installed version (from a shell: pip show openpyxl)

Read Excel documents

Open the Excel document with the openpyxl module

wb = openpyxl.load_workbook(r"C:\Users\Administrator\example.xlsx")  # wb stands for workbook
type(wb)
import os
os.getcwd()  # get the current working directory
# os.chdir()  # change the current working directory

Get a worksheet from the workbook

wb.sheetnames  # list of sheet names (get_sheet_names() is deprecated)
sheet3 = wb['Sheet3']  # get a sheet by name (get_sheet_by_name() is deprecated)
type(sheet3)  # sheet3 is a Worksheet object
sheet3.title  # the title of the sheet
anotherSheet = wb.active  # the active sheet (get_active_sheet() is deprecated)
anotherSheet.title  # the title of the active sheet

Getting cells from a sheet

import openpyxl
wb = openpyxl.load_workbook('example.xlsx')
sheet1 = wb['Sheet1']
sheet1['A1']  # the Cell object at A1
sheet1['A1'].value  # the cell's content
sheet1['A1'].row  # the cell's row number
sheet1['A1'].column  # the cell's column (an integer in recent openpyxl versions)
sheet1['A1'].coordinate  # the cell's coordinate, 'A1'
sheet1.cell(row=1, column=2)  # the cell in row 1, column 2
sheet1.cell(row=1, column=2).value  # the value of that cell
for i in range(1, 8, 2):  # rows 1, 3, 5 and 7
    print(i, sheet1.cell(row=i, column=2).value)
import openpyxl
wb = openpyxl.load_workbook(r"C:\Users\Administrator\example.xlsx")
sheet1 = wb['Sheet1']
sheet1.max_row  # number of the last row in use
sheet1.max_column  # number of the last column in use

Converting between column letters and numbers

import openpyxl
from openpyxl.utils import get_column_letter,column_index_from_string
get_column_letter(1)
get_column_letter(100)
column_index_from_string('A')
column_index_from_string('AA')
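
The conversion works like a base-26 number system without a zero digit: 'Z' is 26 and 'AA' is 27. The following is a minimal pure-Python sketch of that idea, not openpyxl's own implementation; the function names column_letter and column_index are made up for this illustration.

```python
def column_letter(index: int) -> str:
    """Convert a 1-based column number to its Excel letter name (1 -> 'A')."""
    if index < 1:
        raise ValueError("column numbers start at 1")
    letters = ""
    while index > 0:
        # Shift to 0-based before dividing, because this system has no
        # zero digit: 'Z' is 26 and the next column is 'AA'.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def column_index(letters: str) -> int:
    """Convert an Excel column name to its 1-based number ('AA' -> 27)."""
    index = 0
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"invalid column name: {letters!r}")
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


print(column_letter(1), column_letter(100))   # A CV
print(column_index("A"), column_index("AA"))  # 1 27
```

In real code the openpyxl.utils functions shown above are the better choice; the sketch only shows what they compute.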

Getting rows and columns from a sheet

import openpyxl
wb = openpyxl.load_workbook(r"C:\Users\Administrator\example.xlsx")
sheet1 = wb['Sheet1']
tuple(sheet1['A1':'C3'])  # a tuple with one inner tuple of Cell objects per row
list(sheet1['A1':'C3'])  # the same rows as a list
for rowOfCellObjects in sheet1['A1':'C3']:  # print every cell in the range
    for cellObj in rowOfCellObjects:
        print(cellObj.coordinate, cellObj.value)
    print('---End of Row---')
import openpyxl
wb = openpyxl.load_workbook(r"C:\Users\Administrator\example.xlsx")
sheet1 = wb.active
for cellObj in list(sheet1.columns)[1]:  # print a single column (index 1 is column B)
    print(cellObj.value)

Summary

  • Import the openpyxl module
  • Call the openpyxl.load_workbook() function
  • Get a Workbook object
  • Use wb.active or wb['SheetName'] on the workbook (older code calls get_active_sheet() or get_sheet_by_name())
  • Get a Worksheet object
  • Index the worksheet (sheet['A1']) or call its cell() method with the row and column keyword arguments
  • Get a Cell object
  • Read the Cell object's value, row, column or coordinate attributes
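
The steps above can be sketched end to end without a file on disk. This is only an illustration under assumptions: the sheet contents are made up, and the workbook is first saved to an in-memory buffer so that load_workbook() has something to read.

```python
from io import BytesIO

import openpyxl

# Build a small stand-in workbook in memory (made-up data).
source = openpyxl.Workbook()
ws = source.active
ws.title = "Sheet1"
ws.append(["Fruit", "Count"])
ws.append(["Apples", 73])
ws.append(["Cherries", 85])
buffer = BytesIO()
source.save(buffer)

# Load the workbook: load_workbook() returns a Workbook object.
wb = openpyxl.load_workbook(BytesIO(buffer.getvalue()))
# Pick a worksheet by name: this gives a Worksheet object.
sheet = wb["Sheet1"]
# Index the worksheet to get a Cell object, then read its attributes.
cell = sheet["B2"]
summary = (cell.coordinate, cell.row, cell.value)
print(summary)
print(sheet.max_row, sheet.max_column)
```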

Project: read data from Excel

Read spreadsheet data

import openpyxl, pprint
print("Opening workbook ...")
wb = openpyxl.load_workbook(r"C:\Users\Administrator\censuspopdata.xlsx")  # Workbook object
sheet = wb['Population by Census Tract']  # Worksheet object
countyData = {}
# TODO: Fill in countyData with each county's population and tracts.
print("Reading rows...")

Filling the data structure

for row in range(2, sheet.max_row + 1):  # row 1 is the header, so start at row 2
    # Each row in the spreadsheet has data for one census tract.
    state = sheet['B' + str(row)].value
    county = sheet['C' + str(row)].value
    pop = sheet['D' + str(row)].value
    # Make sure the key for this state exists.
    countyData.setdefault(state, {})
    # Make sure the key for this county in this state exists.
    countyData[state].setdefault(county, {'tracts': 0, 'pop': 0})

    # Each row represents one census tract, so increment by one.
    countyData[state][county]['tracts'] += 1
    # Increase the county pop by the pop in this census tract.
    countyData[state][county]['pop'] += int(pop)
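
To see the shape of the nested dictionary this loop builds, here is a small sketch that runs the same setdefault() logic over a few hand-written rows instead of the spreadsheet. The rows and their numbers are made up for illustration.

```python
# Hypothetical rows standing in for columns B, C and D of the spreadsheet:
# (state, county, tract population). The numbers are made up.
rows = [
    ("AK", "Anchorage", 3000),
    ("AK", "Anchorage", 4500),
    ("AK", "Juneau", 1200),
    ("AL", "Autauga", 1900),
]

county_data = {}
for state, county, pop in rows:
    # Create the state and county entries on first sight, then accumulate.
    county_data.setdefault(state, {})
    county_data[state].setdefault(county, {"tracts": 0, "pop": 0})
    county_data[state][county]["tracts"] += 1
    county_data[state][county]["pop"] += int(pop)

print(county_data["AK"]["Anchorage"])  # {'tracts': 2, 'pop': 7500}
```

setdefault() only inserts the default when the key is missing, so calling it on every row is safe and avoids an explicit "if key not in dict" check.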

Writing the results to a file

# Open a new text file and write the contents of countyData to it.
print("Writing results")
with open('census2010.py', 'w') as resultFile:
    resultFile.write('allData = ' + pprint.pformat(countyData))
print("Done")
# Import the saved census2010.py module to check the result.
import census2010
anchoragePop = census2010.allData['AK']['Anchorage']['pop']
print("The 2010 population of Anchorage was " + str(anchoragePop))
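
Writing the dictionary as a .py file works because pprint.pformat() returns text that is itself a valid Python literal. The sketch below checks that round trip in memory with made-up numbers; it parses the text with ast.literal_eval() instead of importing a module, which is a substitution made here for illustration.

```python
import ast
import pprint

county_data = {"AK": {"Anchorage": {"pop": 7500, "tracts": 2}}}  # made-up numbers

# pformat() produces a string that is valid Python source for the dict,
# so prefixing it with "allData = " yields the text of an importable module.
module_source = "allData = " + pprint.pformat(county_data)

# Parse the literal part back without importing or executing anything.
literal = module_source.split("=", 1)[1].strip()
restored = ast.literal_eval(literal)
print(restored == county_data)  # True
```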

Ideas for similar programs (omitted)

Write Excel documents

Create and save Excel files

import openpyxl
wb = openpyxl.Workbook()  # create an empty workbook
wb.sheetnames  # a new workbook starts with one sheet, named 'Sheet'
sheet = wb.active  # the active worksheet
sheet.title
sheet.title = 'Spam Bacon Eggs Sheet'  # rename the active sheet
wb.sheetnames  # shows the new title
import openpyxl
wb = openpyxl.load_workbook('example.xlsx')
sheet = wb.active
sheet.title = 'Spam Spam Spam'
wb.save('example_copy.xlsx')  # save a copy with the renamed sheet

Creating and removing sheets

import openpyxl
wb = openpyxl.Workbook()
wb.sheetnames
wb.create_sheet()  # appended at the end with the default name 'Sheet1'
wb.sheetnames
wb.create_sheet(index=0, title='First Sheet')
wb.sheetnames
wb.create_sheet(index=2, title='Middle Sheet')
wb.sheetnames
wb.remove(wb['Middle Sheet'])  # remove_sheet() is deprecated
wb.remove(wb['Sheet1'])
wb.sheetnames  # ['First Sheet', 'Sheet']

Writing values to cells

import openpyxl
wb = openpyxl.Workbook()
sheet = wb['Sheet']
sheet['A1'] = 'Hello World'  # assigning to a cell sets its value
sheet['A1'].value

Project: update a spreadsheet

Establish a data structure with the updated information

import openpyxl
wb = openpyxl.load_workbook('produceSales.xlsx')
sheet = wb['Sheet']
# The produce types and their updated prices
price_updates = {'Garlic': 3.07, 'Celery': 1.19, 'Lemon': 1.27}
# TODO: Loop through the rows and update the prices.

Check every row and update the incorrect prices

for rowNum in range(2,sheet.max_row+1): #the first row is heading, skip it
    produceName = sheet.cell(row=rowNum,column=1).value
    if produceName in price_updates:
        sheet.cell(row=rowNum,column=2).value = price_updates[produceName]
wb.save('updatedProduceSales.xlsx')
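
The same loop can be tried without produceSales.xlsx. The sketch below runs it on a small in-memory sheet; the rows and their prices are hypothetical, and the real file has more columns and many more rows.

```python
import openpyxl

wb = openpyxl.Workbook()
sheet = wb.active
# Hypothetical rows; the real produceSales.xlsx is much larger.
sheet.append(["PRODUCE", "COST PER POUND"])
sheet.append(["Potatoes", 0.86])
sheet.append(["Garlic", 1.55])
sheet.append(["Celery", 2.10])

price_updates = {"Garlic": 3.07, "Celery": 1.19, "Lemon": 1.27}

updated = []
for row_num in range(2, sheet.max_row + 1):  # skip the header row
    name = sheet.cell(row=row_num, column=1).value
    if name in price_updates:
        sheet.cell(row=row_num, column=2).value = price_updates[name]
        updated.append(name)

# Collect the resulting name -> price pairs for inspection.
prices = {name: price for name, price in sheet.iter_rows(min_row=2, values_only=True)}
print(updated)
print(prices)
```

Keeping the corrections in a dictionary means a new price only requires a new dictionary entry, not another branch in the loop. Entries with no matching row, such as 'Lemon' here, are simply never used.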

Ideas for similar programs (omitted)

Set the cell font style

import openpyxl
from openpyxl.styles import Font
wb = openpyxl.Workbook()
sheet = wb['Sheet']
italic24Font = Font(size=24, italic=True)  # 24 pt italic
sheet['A1'].font = italic24Font
sheet['A1'] = 'Hello world!'
wb.save('styled.xlsx')

Font Object

import openpyxl
from openpyxl.styles import Font
wb = openpyxl.Workbook()
sheet = wb['Sheet']

# Font arguments: name (typeface), size (points), bold, italic
fontobj1 = Font(name='Times New Roman', bold=True)
sheet['A1'].font = fontobj1
sheet['A1'] = 'Bold Times New Roman'

fontobj2 = Font(size=24, italic=True)
sheet['B3'].font = fontobj2
sheet['B3'] = '24 pt Italic'

wb.save('styles.xlsx')  # cells without an explicit font default to Calibri, size 11

Formulas

import openpyxl
wb = openpyxl.Workbook()
sheet = wb.active
sheet['A1'] = 200
sheet['A2'] = 300
sheet['A3'] = '=SUM(A1:A2)'  # a formula is written as a string that starts with '='
wb.save('writeFormula.xlsx')
import openpyxl
wb = openpyxl.load_workbook('writeFormula.xlsx')
sheet = wb.active
sheet['A3'].value  # the formula text, '=SUM(A1:A2)'
import openpyxl
wbDataonly = openpyxl.load_workbook('writeFormula.xlsx', data_only=True)
sheet1 = wbDataonly.active
sheet1['A3'].value  # None until the file has been opened and saved once in Excel, which stores the computed result
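
The difference between the two ways of loading can be checked in memory. This is a minimal sketch that saves to a BytesIO buffer instead of writeFormula.xlsx; it relies on the fact that openpyxl stores formulas but does not calculate them.

```python
from io import BytesIO

import openpyxl

wb = openpyxl.Workbook()
sheet = wb.active
sheet["A1"] = 200
sheet["A2"] = 300
sheet["A3"] = "=SUM(A1:A2)"
buffer = BytesIO()
wb.save(buffer)
data = buffer.getvalue()

# Default load: the formula text comes back unchanged.
formula = openpyxl.load_workbook(BytesIO(data)).active["A3"].value

# data_only=True: openpyxl returns the result cached in the file. Nothing
# has calculated the formula yet, so there is no cached result to return.
cached = openpyxl.load_workbook(BytesIO(data), data_only=True).active["A3"].value

print(repr(formula), repr(cached))
```

A single loaded workbook gives either the formulas or the cached results, never both, so code that needs both has to load the file twice.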

Adjustment of rows and columns

Row height and column width adjustment

import openpyxl
wb = openpyxl.Workbook()
sheet = wb.active
sheet['A1'] = 'Tall row'
sheet['B2'] = 'Wide column'
sheet.row_dimensions[1].height = 70
sheet.column_dimensions['B'].width = 20
wb.save('dimensions.xlsx')

Merging and splitting cells

import openpyxl
wb = openpyxl.Workbook()
sheet = wb.active
sheet.merge_cells('A1:D3')  # merge a rectangular range into one cell
sheet['A1'] = 'Twelve cells merged together'  # the value goes in the top-left cell
sheet.merge_cells('C5:D5')
sheet['C5'] = 'Two merged cells.'
wb.save('merged.xlsx')
import openpyxl
wb = openpyxl.load_workbook('merged.xlsx')
sheet = wb.active
sheet.unmerge_cells('A1:D3')  # split the merged cells again
sheet.unmerge_cells('C5:D5')
wb.save('merged.xlsx')
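
To check which ranges are currently merged, the worksheet's merged_cells attribute can be inspected. A short sketch, working on an in-memory workbook rather than merged.xlsx:

```python
import openpyxl

wb = openpyxl.Workbook()
sheet = wb.active
sheet.merge_cells("A1:D3")
sheet.merge_cells("C5:D5")
# merged_cells.ranges holds one range object per merged area.
merged_before = sorted(r.coord for r in sheet.merged_cells.ranges)

sheet.unmerge_cells("C5:D5")
merged_after = sorted(r.coord for r in sheet.merged_cells.ranges)

print(merged_before)
print(merged_after)
```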

Freeze Panes

import openpyxl
wb = openpyxl.load_workbook('produceSales.xlsx')
sheet = wb.active
sheet.freeze_panes = 'C2'
# 'A2' freezes row 1; 'B1' freezes column A; 'C2' freezes row 1 and columns A and B;
# 'A1' or None means nothing is frozen.
wb.save('freezeExample.xlsx')
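
The rule behind these settings is that every row above the given cell and every column to its left is frozen. The helper below spells that out; describe_freeze is a made-up name for this sketch and not part of openpyxl.

```python
import openpyxl
from openpyxl.utils import get_column_letter
from openpyxl.utils.cell import coordinate_to_tuple


def describe_freeze(anchor):
    """Return (frozen row numbers, frozen column letters) for a freeze_panes value."""
    row, col = coordinate_to_tuple(anchor)  # 'C2' -> (2, 3)
    frozen_rows = list(range(1, row))  # rows above the anchor
    frozen_cols = [get_column_letter(i) for i in range(1, col)]  # columns to its left
    return frozen_rows, frozen_cols


wb = openpyxl.Workbook()
sheet = wb.active
sheet.freeze_panes = "C2"
print(sheet.freeze_panes, describe_freeze(sheet.freeze_panes))
print(describe_freeze("A2"), describe_freeze("B1"), describe_freeze("A1"))
```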

Charts

import openpyxl
from openpyxl.chart import BarChart, Reference, Series
wb = openpyxl.Workbook()
sheet = wb.active
for i in range(1, 11):  # create some data in column A
    sheet['A' + str(i)] = i

refObj = Reference(sheet, min_row=1, min_col=1, max_row=10, max_col=1)  # the data range A1:A10
seriesObj = Series(refObj, title='First series')
chartObj = BarChart()
chartObj.title = 'My chart'
chartObj.append(seriesObj)

sheet.add_chart(chartObj, 'C5')  # 'C5' is where the chart's top-left corner is placed
wb.save('sampleChart.xlsx')
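
As a variation, the chart's add_data() method builds the series directly from a Reference, and with titles_from_data=True it takes the series title from the first cell of the range. A minimal sketch with made-up data; it does not save a file.

```python
import openpyxl
from openpyxl.chart import BarChart, Reference

wb = openpyxl.Workbook()
sheet = wb.active
sheet.append(["Sales"])  # header cell, used as the series title (made-up data)
for value in (5, 8, 13, 21):
    sheet.append([value])

# The reference covers the header plus the four data rows in column A.
data = Reference(sheet, min_col=1, min_row=1, max_row=sheet.max_row)
chart = BarChart()
chart.title = "My chart"
chart.add_data(data, titles_from_data=True)  # one series per column in the range
sheet.add_chart(chart, "C5")

print(len(chart.series))
```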

Exercises and Summary

  • What does the openpyxl.load_workbook() function return?
    • A Workbook object
  • What does get_sheet_names() return?
    • A list of the names of all sheets in the workbook (wb.sheetnames in current versions)
  • How do you get the Worksheet object for the sheet named "Sheet1"?
    • wb['Sheet1'] (older code: wb.get_sheet_by_name('Sheet1'))
  • How do you get the Worksheet object for the workbook's active sheet?
    • wb.active (older code: wb.get_active_sheet())
  • How do you get the value in cell C5?
    • sheet['C5'].value / sheet.cell(row=5,column=3).value
  • How do you set the value in cell C5 to 'Hello'?
    • sheet['C5'] = 'Hello'
  • How do you get a cell's row and column as integers?
    • cell.row and cell.column (cell.column is an integer in recent openpyxl versions)
    • get_column_letter(int) converts a column number to its letter
    • column_index_from_string(str) converts a column letter to its number
  • What do max_row and max_column return, and what type are the return values?
    • sheet1.max_row returns the number of the last row in use, as an integer
    • sheet1.max_column returns the number of the last column in use, as an integer
  • Which function gives the integer index of column 'M'?
    • column_index_from_string('M')
  • Which function gives the string name of column 14?
    • get_column_letter(14)
  • How do you get a tuple of all Cell objects from A1 to F1?
    • tuple(sheet['A1':'F1'])
  • How do you save a workbook under the file name example.xlsx?
    • wb.save('example.xlsx')
  • How do you set a formula in a cell?
    • sheet['B1'] = '=SUM(A1:A15)'
  • If you want a formula cell's result rather than the formula itself, what must you do first?
    • Pass data_only=True when loading the workbook
    • Open and save the workbook manually in Excel first, so that the computed results are stored in the file
  • How do you set the height of row 5 to 100?
    • sheet.row_dimensions[5].height = 100
  • How do you set the width of column C to 70?
    • sheet.column_dimensions['C'].width = 70
  • Which features does openpyxl 2.1.4 not load from an .xlsx file?
    • Charts (the library has since been updated; the behavior of newer versions was not checked here)
  • What is a freeze pane?
    • Rows or columns that stay visible while scrolling; sheet.freeze_panes = 'C2' freezes row 1 and columns A and B
  • What are the steps for creating a bar chart?
    • Load the data with openpyxl.load_workbook(), or write it into a new Workbook
    • Create a Reference object that marks the chart's data range
    • Create a Series object from the Reference object
    • Create a chart object and add the Series object to it (append method)
    • Add the chart object to the sheet (add_chart method)

These notes are compiled from the book "Automate the Boring Stuff with Python".

Data Download: http://nostarch.com/automatestuff/

Origin www.cnblogs.com/evian-jeff/p/11401207.html