Using Python to read data from Word (.docx) and write it to Excel (.xlsx)

Each line of the source Word document is one paragraph of unstructured data. The goal is to turn it into structured rows in an Excel sheet.


Packages to import:

import docx
from docx import Document
from openpyxl import Workbook
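from tools import *  # the author's own helper module (not shown here); presumably provides key_filter, is_bookname and re_filter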
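
The `tools` module is not included in the post, so `key_filter`, `is_bookname` and `re_filter` cannot be run as given. The sketch below is one guess at what such helpers might look like. Only the three function names come from the script; the function bodies, the `NOISE_WORDS` tuple and the assumption that titles are wrapped in `《》` are hypothetical and only meant to make the script runnable for experiments.

```python
import re

# Hypothetical stand-ins for the author's tools module; the real
# implementations are not shown in the original post.

NOISE_WORDS = ('(推荐)',)  # assumed noise markers, purely illustrative


def key_filter(text):
    """Remove the assumed noise markers and surrounding whitespace."""
    for word in NOISE_WORDS:
        text = text.replace(word, '')
    return text.strip()


def is_bookname(text):
    """Guess whether a fragment holds a book title (assumes 《...》 quoting)."""
    return '《' in text and '》' in text


def re_filter(text, pattern):
    """Delete every match of the regular expression `pattern` from `text`."""
    return re.sub(pattern, '', text)


print(re_filter('1.《大卫上学去》', r'[1-9]\d*.'))
```

With this version of `re_filter`, the pattern used in the script is unanchored, so it would also delete a number that appears inside a title. Anchoring it with `^` would restrict it to the leading item number.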

Create the objects used to write the xlsx file:

workbook = Workbook()
booksheet = workbook.active

Read the docx document and store its contents in the xlsx file:

dir = '/Users/b/'
file = '南京亲近母语2017年书目.docx'
f = docx.Document(dir + file)
level = ''
# Iterate over the paragraphs in the document
for para in f.paragraphs:
    bookname = ''
    publisher = ''    # not extracted in this script, so the column stays empty
    resource = '南京亲近母语2017年书目'
    text = para.text
    if len(text) == 0:
        continue

    text = key_filter(text)        # helper from tools: filters the raw text
    textlist = text.split('    ')
    if len(textlist) == 1:
        # No four-space separator: the paragraph is a heading (the level)
        level = textlist[0]
        print('level1', level)
        continue
    print('level2', level)
    # Drop empty or whitespace-only fragments left over from the split
    textlist = [t for t in textlist if t.strip()]
    if len(textlist) < 2:
        continue
    row = []
    if is_bookname(textlist[0].strip()):
        # Remove the leading item number from the title
        bookname = re_filter(textlist[0].strip(), r'[1-9]\d*.')
        print(bookname)
    else:
        continue
    row.append(bookname.strip())
    row.append(textlist[1].strip())    # author
    row.append(publisher.strip())
    row.append(resource.strip())
    row.append(level.strip())
    booksheet.append(row)
workbook.save(file.split('.')[0] + '.xlsx')
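
To try the parsing logic on its own, here is a minimal sketch that runs the same split-and-classify steps on plain strings instead of a real .docx file. The function name `parse_lines`, the sample lines and the simplified title check are assumptions for illustration. The helpers from `tools` are replaced by inline stand-ins, and each line is stripped before splitting, which the script above does not do.

```python
import re

SEPARATOR = '    '  # four spaces, the field separator used in the script


def parse_lines(lines, resource):
    """Turn heading and book lines into rows:
    [title, author, publisher, resource, level]."""
    rows = []
    level = ''
    for text in lines:
        text = text.strip()
        if not text:
            continue
        # Split on the separator and drop empty or whitespace-only fragments
        fields = [part.strip() for part in text.split(SEPARATOR) if part.strip()]
        if len(fields) == 1:
            # A single field is treated as a heading that sets the current level
            level = fields[0]
            continue
        # Stand-in for is_bookname: require an opening title mark
        if '《' not in fields[0]:
            continue
        # Stand-in for re_filter: strip a leading item number such as "1."
        title = re.sub(r'^[1-9]\d*\.', '', fields[0]).strip()
        rows.append([title, fields[1], '', resource, level])
    return rows


sample = [
    '一年级课程书目',
    '',
    '1.《大卫上学去》    [美]大卫·香农',
]
for row in parse_lines(sample, '南京亲近母语2017年书目'):
    print(row)
```

Keeping the parsing in a function that takes strings makes it possible to check the rules without Word or Excel files on disk.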

The script above is complete; the individual steps are explained separately below.

Reading the Word document:

f = docx.Document(dir+file)
for para in f.paragraphs:
    text = para.text
    print(text)

Creating a new Excel file and writing data to it; each row is appended to the sheet as a list:

from openpyxl import Workbook
workbook = Workbook()
booksheet = workbook.active
# Named row rather than list to avoid shadowing the built-in list type
row = ['《大卫上学去》','[美]大卫·香农','','南京亲近母语2017年书目','一年级课程书目(图画书书目']
booksheet.append(row)
workbook.save(file.split('.')[0]+'.xlsx')
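
As a way to check the result, the sketch below writes a header row plus one data row, saves the workbook to a temporary directory and reads it back with `load_workbook`. The header names and the sample row are assumptions (the original sheet has no header row), and the code assumes `openpyxl` is installed.

```python
import tempfile
from pathlib import Path

from openpyxl import Workbook, load_workbook

rows = [
    ['《大卫上学去》', '[美]大卫·香农', '', '南京亲近母语2017年书目', '一年级课程书目'],
]

workbook = Workbook()
booksheet = workbook.active
# Assumed column names; the original sheet has no header row
booksheet.append(['bookname', 'author', 'publisher', 'resource', 'level'])
for row in rows:
    booksheet.append(row)

with tempfile.TemporaryDirectory() as tmp:
    # with_suffix replaces only the final extension, unlike split('.')[0]
    name = Path('南京亲近母语2017年书目.docx').with_suffix('.xlsx').name
    target = str(Path(tmp) / name)
    workbook.save(target)

    # Read the file back to see what was actually written
    loaded = load_workbook(target)
    written = [list(r) for r in loaded.active.iter_rows(values_only=True)]
    loaded.close()

print(written)
```

Using `Path.with_suffix` instead of `file.split('.')[0]` keeps file names that contain extra dots intact.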




Reposted from blog.csdn.net/u012135425/article/details/80258060