A Beginner's Guide to Reading, Writing, and Processing Excel Spreadsheets with Python

Reposted from: https://blog.csdn.net/Cloudox_/article/details/53812213

Introduction

I needed to process a large number of Excel files, and rather than do it by hand I decided to write a small script. Of the options I considered, Python felt like the easiest to work with.

Installing the Libraries

Python environment

First, of course, you need a Python environment. One important reason I chose Python is that the Mac ships with Python built in, so there is nothing extra to configure, which saves some work. If you are on Windows, you will need to set up a Python environment yourself. The Python on my Mac is version 2.7.

Third-party libraries

Python has no built-in support for working with Excel files, but its strength lies in its large collection of easy-to-use third-party libraries. Here we use the xlrd library to read Excel files and the xlwt library to write them.

Installing third-party libraries is simple. First, go to the site that hosts Python libraries and download the source for both:
* Download xlrd
* Download xlwt

Note: for beginners, installing from source is the simplest approach, since there is no need to fiddle with a third-party package manager. On each library's page, click the source download:

The entry to pick is the one whose type is listed as Source.

Unpacking the downloaded archive on the Mac gives you a folder that contains a setup.py file:

You cannot install it by simply double-clicking. The .py extension marks a Python source file, so double-clicking only opens the file and shows its code. Instead, open a terminal and change into the folder that contains the file. For example, I put the download in "Downloads", so the steps are:

$ cd Downloads/
$ cd xlwt-1.1.2
$ sudo python setup.py install
  
  

Here cd changes into a folder. sudo runs the install with administrator privileges; without it you will be told that you lack permission. After you press Enter you are asked for your computer's password; type it and press Enter again. python is the command that runs a Python file, and install tells setup.py to perform the installation.

A stream of output then scrolls past, ending with a message that it has finished, which means the installation is complete.

xlrd is installed the same way.
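
One way to confirm that both installs worked is to try importing the libraries from Python. The snippet below is only an illustrative sketch, not part of the original post, and is_installed is a hypothetical helper name.

```python
def is_installed(module_name):
    """Return True if the named module can be imported (hypothetical helper)."""
    try:
        __import__(module_name)
        return True
    except ImportError:
        return False

# Report on the two libraries this article relies on.
for name in ("xlrd", "xlwt"):
    print("%s installed: %s" % (name, is_installed(name)))
```

If either line reports False, repeat the setup.py install step for that library.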

Writing the Code

With the libraries for reading and writing Excel installed, you can start writing code.

Create a file named hello.py in a folder, open it with a text editor such as Sublime, and start writing code. (PS: in Python, a line that begins with # is a comment.)

Reading Excel

# -*- coding: utf-8 -*-
import xlrd

# Open the Excel file.
# Note: xlrd 2.0 and later read only .xls files; opening an .xlsx file needs an older xlrd.
def open_excel(file='test.xlsx'):
    try:
        data = xlrd.open_workbook(file)
        return data
    except Exception as e:
        print(str(e))
        raise  # re-raise so the caller never continues with None

# Get the data of a sheet by name.
# Parameters: file: path of the Excel file; colnameindex: index of the header row; by_name: name of the sheet
def excel_table_byname(file='test.xlsx', colnameindex=0, by_name=u'Sheet1'):
    data = open_excel(file)  # open the Excel file
    table = data.sheet_by_name(by_name)  # look up the sheet by its name
    nrows = table.nrows  # number of rows
    colnames = table.row_values(colnameindex)  # values of the header row
    rows = []  # holds the rows that were read
    for rownum in range(0, nrows):  # walk through every row
        row = table.row_values(rownum)  # fetch the row by its number
        if row:  # if the row exists
            app = []  # contents of one row
            for i in range(len(colnames)):  # read the row column by column
                app.append(row[i])
            rows.append(app)  # store the row
    return rows

# Main function
def main():
    tables = excel_table_byname()
    for row in tables:
        print(row)

if __name__ == "__main__":
    main()

I commented this code heavily, so here are just a few points to note. At the very top we declare UTF-8 encoding. We must also remember to import the xlrd package in order to use its functions to read Excel. main() is the main function: it runs when the file is executed as a script and calls the other functions to read the data. Taken together, the code reads the Sheet1 sheet of the file test.xlsx row by row and prints it.
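
The row loop in excel_table_byname can also be written more compactly. The following is only a sketch under the same assumptions as the code above (a file test.xlsx with a sheet named Sheet1). read_rows and read_sheet are hypothetical names, and read_rows accepts any object that offers nrows and row_values the way an xlrd sheet does.

```python
def read_rows(table, header_row=0):
    """Copy every row of an xlrd-style sheet, trimmed to the header's width."""
    width = len(table.row_values(header_row))
    return [table.row_values(r)[:width] for r in range(table.nrows)]

def read_sheet(path, sheet_name):
    """Open a workbook with xlrd and return the rows of one sheet."""
    import xlrd  # imported here so read_rows can be tried without xlrd installed
    book = xlrd.open_workbook(path)
    return read_rows(book.sheet_by_name(sheet_name))
```

Calling read_sheet('test.xlsx', u'Sheet1') should give the same list of rows that excel_table_byname builds.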

The Excel file contains the following:

It has two rows.

To run this code, open a terminal and cd into the folder that holds the code file; the code file and the Excel file should both be in this folder. Then run it with the command python hello.py:

This is what Python read and printed. The u prefix marks a unicode string, and you can see that the content matches the Excel file.

Creating Excel

With the xlwt library we can create an Excel file:

# -*- coding: utf-8 -*-
import xlwt

def testXlwt(file='new.xls'):
    book = xlwt.Workbook()  # create a workbook
    sheet1 = book.add_sheet('hello')  # add a sheet named hello
    sheet1.write(0, 0, 'cloudox')  # write a value to row 1, column 1
    sheet1.write(1, 0, 'ox')  # write a value to row 2, column 1
    book.save(file)  # save the workbook to a file

# Main function
def main():
    testXlwt()

if __name__ == "__main__":
    main()

This code is simpler; again, remember to import the library at the top.

The code creates an Excel workbook, adds a sheet to it, writes two values, and finally saves it under the file name we chose.

Run it the same way as the code above. The terminal prints nothing, but the folder now contains a new Excel file named new.xls. Open it and you will see:

The data was written where we specified, and the sheet is named hello.
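
Note that the first two arguments of write are zero-based row and column indexes, so write(0, 0, ...) fills the cell Excel calls A1. As a rough illustration of that mapping, the hypothetical helper below (not part of xlwt or of the original post) converts an Excel-style label into the index pair that write expects.

```python
def cell_to_indices(label):
    """Convert an Excel label such as 'B3' to zero-based (row, col) indexes."""
    letters = "".join(ch for ch in label if ch.isalpha()).upper()
    digits = "".join(ch for ch in label if ch.isdigit())
    col = 0
    for ch in letters:  # column letters work like a base-26 number: A=1 ... Z=26
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1

row, col = cell_to_indices("A2")
print("A2 -> row %d, col %d" % (row, col))
```

With it, the second write call in testXlwt could be expressed as sheet1.write(row, col, 'ox').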

It is worth noting this sentence in the description of the xlwt library:

Library to create spreadsheet files compatible with MS Excel 97/2000/XP/2003 XLS files, on any platform, with Python 2.6, 2.7, 3.3+

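In other words, it can only create files in the xls format, not the current xlsx format, so it is somewhat dated. If you give the file an xlsx name, the result cannot be opened.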
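
One way to guard against that mistake is to normalize the file name before saving. This is a minimal sketch, and ensure_xls_name is a hypothetical helper introduced here for illustration only.

```python
import os

def ensure_xls_name(path):
    """Return path with its extension forced to .xls (hypothetical helper)."""
    root, ext = os.path.splitext(path)
    if ext.lower() == ".xls":
        return path
    return root + ".xls"

print(ensure_xls_name("new.xlsx"))
```

The result can then be passed to book.save in place of the raw file name.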

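Processing Excel Content

Reading and writing on their own are only the basics; what we ultimately want is to process the content of an Excel file.

Let's assume a use case: we want to pick out every row whose first and second columns are equal and save those rows to a new Excel file.

The workflow is then:

  1. Open the target Excel file
  2. Read its content
  3. While reading each row, keep only the rows whose first and second columns are equal
  4. Create a new Excel file
  5. Write the filtered content into it
  6. Save the new Excel file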
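
Step 3, the filter, can be tried in isolation on plain Python lists before any Excel file is involved. The sketch below is illustrative only; keep_equal_rows and the sample data are made up for the example.

```python
def keep_equal_rows(rows):
    """Keep only the rows whose first and second cells are equal."""
    return [row for row in rows if len(row) >= 2 and row[0] == row[1]]

# Made-up sample data: only the first row has equal first and second cells.
sample = [[u"a", u"a", 1.0], [u"a", u"b", 2.0]]
print(keep_equal_rows(sample))
```

The length check simply skips rows that have fewer than two cells.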

Here is the code:

# -*- coding: utf-8 -*-
import xlrd
import xlwt

# Open the Excel file.
# Note: xlrd 2.0 and later read only .xls files; opening an .xlsx file needs an older xlrd.
def open_excel(file='test.xlsx'):
    try:
        data = xlrd.open_workbook(file)
        return data
    except Exception as e:
        print(str(e))
        raise  # re-raise so the caller never continues with None

# Get the data of a sheet by index.
# Parameters: file: path of the Excel file; colnameindex: index of the header row; by_index: index of the sheet
def excel_table_byindex(file='test.xlsx', colnameindex=0, by_index=0):
    data = open_excel(file)  # open the Excel file
    table = data.sheets()[by_index]  # look up the sheet by its index
    nrows = table.nrows  # number of rows
    ncols = table.ncols  # number of columns
    colnames = table.row_values(colnameindex)  # values of the header row
    rows = []  # holds the rows that pass the filter
    for rownum in range(0, nrows):  # walk through every row
        row = table.row_values(rownum)  # fetch the row by its number
        if row:  # if the row exists
            app = []  # contents of one row
            for i in range(len(colnames)):  # read the row column by column
                app.append(row[i])
            if app[0] == app[1]:  # keep the row only if its first two values are equal
                rows.append(app)
    testXlwt('new.xls', rows)  # call the write function to save the rows to a new file
    return rows

# Write the given rows into a new file
def testXlwt(file='new.xls', rows=()):
    book = xlwt.Workbook()  # create a workbook
    sheet1 = book.add_sheet('hello')  # add a sheet named hello
    i = 0  # row index
    for app in rows:  # walk through every row
        j = 0  # column index
        for x in app:  # walk through every value (column) in the row
            sheet1.write(i, j, x)  # write x to row i, column j of the new sheet
            j = j + 1  # next column
        i = i + 1  # next row
    book.save(file)  # save the workbook to a file

# Main function
def main():
    tables = excel_table_byindex()
    for row in tables:
        print(row)

if __name__ == "__main__":
    main()

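This time we import both xlrd and xlwt at the top, because we need to read as well as write.

The code is much the same as the two examples above, only slightly more involved. While reading, we add a row to the result list only if its first and second columns are equal. While writing, we use two for loops to fill the cells of the new Excel file one by one, with the variables i and j tracking the position. In addition, the sheet is fetched differently from before: here we fetch it by its index (0 in this case), whereas the earlier example fetched it by name.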
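
The hand-maintained i and j counters in testXlwt can be replaced by Python's built-in enumerate, which yields the index together with each item. The following is a sketch rather than the author's code; write_rows and save_rows are hypothetical names, and write_rows works with any object that has a write(row, col, value) method like an xlwt sheet.

```python
def write_rows(sheet, rows):
    """Write rows to an xlwt-style sheet, one cell per value."""
    for i, row in enumerate(rows):        # i is the zero-based row index
        for j, value in enumerate(row):   # j is the zero-based column index
            sheet.write(i, j, value)

def save_rows(path, rows, sheet_name="hello"):
    """Create a workbook with xlwt, fill one sheet, and save it."""
    import xlwt  # imported here so write_rows can be tried without xlwt installed
    book = xlwt.Workbook()
    write_rows(book.add_sheet(sheet_name), rows)
    book.save(path)
```

Calling save_rows('new.xls', rows) would then stand in for the testXlwt call.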

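The Excel file we want to process contains the following:

By rights, only the first row should survive the filter. After running the code we get a new Excel file with the following content:

As you can see, it matches what we expected.

This is only a simple example. Both libraries offer many more operations, and many more kinds of processing are possible. If you need to handle large amounts of data, you may also have to think about memory and process the data in batches. In short, this article is only an introduction that tries to let complete beginners start using these tools to save manual labor; further uses are left for you to explore.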
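
As a rough idea of what batch processing could look like, the generator below hands out rows in fixed-size groups so that only one group is held at a time. It is a generic sketch and not something the original post implements; in_batches is a hypothetical name.

```python
def in_batches(rows, size):
    """Yield lists of at most `size` rows from any iterable of rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # hand out the final, possibly shorter, batch
        yield batch

# Example: five made-up rows processed two at a time.
for batch in in_batches([[1], [2], [3], [4], [5]], 2):
    print(len(batch))
```

Each batch could be filtered and written out before the next one is read.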

You can download my sample project: https://github.com/Cloudox/PYReadWriteExcelDemo


Copyright: http://blog.csdn.net/cloudox_

Origin blog.csdn.net/qq_26369907/article/details/89438894