Installation
docx is not part of the Python standard library, so install it with pip from the command line (terminal):
pip install python-docx
Note that the package is called python-docx when installing, but the module is imported as docx!
Background
A Word document can generally be divided into three levels:
- Document
- Paragraph
- Text block (Run)
Together these form the three-level Document - Paragraph - Run structure, which covers the most common case. The text block (Run) is the hardest part to understand. As shown in the figure, the short phrase between the two symbols is one text block.
Usually that description is enough, but if the phrase contains several different styles, it is split into multiple text blocks. Taking the first yellow circle in the figure as an example: once some extra formatting details are added to that phrase, it consists of 4 text blocks.
Word documents sometimes also contain tables, which introduce a different structure. It closely resembles Excel and can be viewed as a four-level Document - Table - Row/Column - Cell structure.
Reading Word documents
1. Open a Word file
from docx import Document
path = ...  # path to an existing .docx file
wordfile = Document(path)
2. Get the paragraphs
A Word file consists of one or more paragraphs:
paragraphs = wordfile.paragraphs  # list of Paragraph objects
print(paragraphs)
3. Get the text of each paragraph
Use the .text attribute to get a paragraph's text:
for paragraph in wordfile.paragraphs:
    print(paragraph.text)
4. Get the text of each text block (run)
A paragraph consists of one or more runs (text blocks):
for paragraph in wordfile.paragraphs:
    for run in paragraph.runs:
        print(run.text)
5. Traverse tables
The steps above traverse the classic three-level structure; traversing a table works in much the same way:
# traverse by row
for table in wordfile.tables:
    for row in table.rows:
        for cell in row.cells:
            print(cell.text)
# traverse by column
for table in wordfile.tables:
    for column in table.columns:
        for cell in column.cells:
            print(cell.text)
Writing Word documents
1. Create a Word file
If no path is given, Document() creates a new, blank Word document:
from docx import Document
wordfile = Document()
2. Save the file
Remember to save the document after creating or modifying it:
wordfile.save(...)  # replace ... with the path to save to
3. Add a heading
wordfile.add_heading(…, level=…)
4. Add a paragraph
wordfile.add_paragraph(...)
For example, a new document with a level-1 heading followed by a paragraph:
wordfile = Document()
wordfile.add_heading('Level 1 heading', level=1)
wordfile.add_paragraph('New paragraph')
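5. Add a text block (run)
Runs belong to paragraphs, so add_run() is called on a paragraph, not on the document:
paragraph = wordfile.add_paragraph()
paragraph.add_run(...)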
6. Add a page break
wordfile.add_page_break()
7. Add a picture
wordfile.add_picture(..., width=…, height=…)
Setting styles
1. Text font settings
2. Other text style settings
from docx import Document
from docx.shared import RGBColor, Pt
path = ...  # path to an existing .docx file
wordfile = Document(path)
for paragraph in wordfile.paragraphs:
    for run in paragraph.runs:
        run.font.bold = True  # bold
        run.font.italic = True  # italic
        run.font.underline = True  # underline
        run.font.strike = True  # strikethrough
        run.font.shadow = True  # shadow
        run.font.size = Pt(20)  # font size
        run.font.color.rgb = RGBColor(255, 0, 0)  # font color (red)
wordfile.save(...)  # save the changes; replace ... with the output path
3. Paragraph style settings
Paragraphs are left-aligned by default. The alignment, like other paragraph formatting, can be changed per paragraph.
Summary
That covers how to use the docx module in Python for common Word operations. Once you know which kinds of operations Python can handle, you can reach for it the next time you face a tedious document task.