[PYTHON, WORD] 1. Reading Word files with python-docx

0. Installing the python-docx module

Windows: pip install python-docx
macOS: pip3 install python-docx

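1. Structure of a Word document

Document: the whole document
Paragraph: a paragraph
Run: a run, i.e. a span of text that shares the same formatting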
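Under the hood, a .docx file is a ZIP package whose main part, word/document.xml, nests these three levels as XML elements: w:body holds w:p (paragraph) elements, and each w:p holds w:r (run) elements whose text sits in w:t. The sketch below uses only the standard library and a hand-written, hypothetical XML fragment (far simpler than what Word actually writes) to walk that Document → Paragraph → Run hierarchy.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical fragment: two paragraphs; the second has a bold run and a plain run
DOCUMENT_XML = f"""
<w:document xmlns:w="{W}">
  <w:body>
    <w:p><w:r><w:t>First paragraph</w:t></w:r></w:p>
    <w:p>
      <w:r><w:rPr><w:b/></w:rPr><w:t>Bold part, </w:t></w:r>
      <w:r><w:t>plain part</w:t></w:r>
    </w:p>
  </w:body>
</w:document>
"""


def outline(xml_text):
    """Return one list per paragraph, each holding the text of its runs."""
    root = ET.fromstring(xml_text.strip())
    body = root.find("w:body", NS)
    result = []
    for p in body.findall("w:p", NS):  # Paragraph level
        runs = []
        for r in p.findall("w:r", NS):  # Run level
            runs.append("".join(t.text or "" for t in r.findall("w:t", NS)))
        result.append(runs)
    return result


if __name__ == "__main__":
    for index, runs in enumerate(outline(DOCUMENT_XML)):
        print(index, runs)
```

python-docx wraps exactly these elements in its Document, Paragraph and Run objects, so in practice you rarely need to touch the XML yourself.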

The sample document used below contains three paragraphs in total.
2. Extracting text
2.1 Getting the paragraph objects and the number of paragraphs

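.paragraphs returns a list containing one Paragraph object for each paragraph:

from docx import Document

doc = Document("0.docx")      # open the sample file
print(doc.paragraphs)         # list of Paragraph objects
print(len(doc.paragraphs))    # number of paragraphs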

Output:

[<docx.text.paragraph.Paragraph object at 0x000001F88E2F2E80>, <docx.text.paragraph.Paragraph object at 0x000001F88E2F2C88>, <docx.text.paragraph.Paragraph object at 0x000001F88E2F2EF0>]
3

The output shows that the document has three paragraphs.
2.2 Extracting the text of each paragraph

from docx import Document

doc = Document("0.docx")
for paragraph in doc.paragraphs:
    print(paragraph.text)    # full text of the paragraph
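
Output:

以上便是
excel与python结合的第二部分内容,后续将会持续更新excel,ppt,爬虫,人工智能
等相关内容,敬请关注

(In English, the sample paragraphs read roughly: "That concludes" / "part two of combining Excel with Python; upcoming posts will cover Excel, PPT, web scraping, AI" / "and related topics, stay tuned.")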

2.3 Getting the runs

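Take the second paragraph of the sample document:

excel与python结合的第二部分内容,后续将会持续更新excel,ppt,爬虫,人工智能

Each stretch of text with its own formatting forms one run, so this sentence consists of 7 runs. (Word can also split text with identical formatting into several runs, for example because of editing history, so the run count does not always match the visible format changes.)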

from docx import Document

doc = Document("0.docx")
paragraph = doc.paragraphs[1]    # the second paragraph
runs = paragraph.runs            # list of Run objects
print(runs)

Output:

[<docx.text.run.Run object at 0x000001F88E2F2E10>, <docx.text.run.Run object at 0x000001F88E2F2C88>, <docx.text.run.Run object at 0x000001F88E2F2E80>, <docx.text.run.Run object at 0x000001F88E2F2DD8>, <docx.text.run.Run object at 0x000001F88E2F2EB8>, <docx.text.run.Run object at 0x000001F88E2F2F28>, <docx.text.run.Run object at 0x000001F88E2F2F60>]

paragraph.runs returns a list containing one Run object for each run in the paragraph.

2.4 Extracting the text of each run

from docx import Document

doc = Document("0.docx")
paragraph = doc.paragraphs[1]    # the second paragraph
runs = paragraph.runs
for run in runs:
    print(run.text)              # text of a single run

Output:
excel与python结合的第二部分内容,
后续将会持续更新excel
,
ppt
,
爬虫
,人工智能

That concludes part one of combining Word with Python. Upcoming posts will continue to cover Excel, PPT, web scraping, AI, and related topics; stay tuned.


Reposted from blog.csdn.net/AI_LINNGLONG/article/details/104342729