Stop asking me how Python operates Word!

Installation

docx is not part of the standard library, so you need to install it with pip from the command line (terminal):

pip install python-docx

Note the difference in names: the package is called python-docx when you install it, but you import it as docx in your code!

Background knowledge

A Word document can generally be broken down into three parts:

  • Document (Document)
  • Paragraph (Paragraph)
  • Text block (Run)

That is, a three-level Document - Paragraph - Run structure, which is the most common case. Of the three, the text block (Run) is the hardest to understand. In the simplest case, the short phrase between two punctuation marks is one text block.

That rule of thumb usually works, but if the phrase contains several different styles, it is split into multiple text blocks. For example, once parts of a single short phrase are given their own formatting, that one phrase can become 4 text blocks.

Sometimes a Word document also contains tables, which adds a second structure. This one is very similar to Excel and can be viewed as a four-level Document - Table - Row/Column - Cell structure.

Reading Word

1. Open Word

from docx import Document
path = ...
wordfile = Document(path)

2. Get paragraph

A Word file consists of one or more paragraphs (paragraph)

paragraphs = wordfile.paragraphs 
print(paragraphs)

3. Get paragraph text content

Use .text to get the text of a paragraph

for paragraph in wordfile.paragraphs: 
    print(paragraph.text)

4. Get the text content of the text block

A paragraph consists of one or more text blocks (run)

for paragraph in wordfile.paragraphs: 
    for run in paragraph.runs: 
        print(run.text)

5. Traverse the table

The steps above cover the traversal of the classic three-level structure. Traversing a table works in much the same way.

# Traverse by row
for table in wordfile.tables:
    for row in table.rows:
        for cell in row.cells:
            print(cell.text)
       
# Traverse by column
for table in wordfile.tables:
    for column in table.columns:
        for cell in column.cells:
            print(cell.text)

Write Word

1. Create Word

If you do not pass a path, Document() creates a new Word file by default

from docx import Document
wordfile = Document() 

2. Save the file

Remember to save after creating or modifying a document

wordfile.save(...)
Here ... stands for the path where the file should be saved

3. Add a heading

wordfile.add_heading(…, level=…)

4. Add paragraph

wordfile.add_paragraph(...)

wordfile = Document()
wordfile.add_heading('Level 1 heading', level=1)
wordfile.add_paragraph('A new paragraph')

5. Add text block

paragraph = wordfile.add_paragraph(...)
paragraph.add_run(...)  # add_run belongs to a paragraph, not to the document

6. Add a page break

wordfile.add_page_break()


7. Add pictures

wordfile.add_picture(..., width=…, height=…)

Set style

1. Text font settings


2. Other text style settings

from docx import Document
from docx.shared import RGBColor, Pt

wordfile = Document(file)  # file: path to an existing Word document
for paragraph in wordfile.paragraphs:
    for run in paragraph.runs:
        run.font.bold = True  # bold
        run.font.italic = True  # italic
        run.font.underline = True  # underline
        run.font.strike = True  # strikethrough
        run.font.shadow = True  # shadow
        run.font.size = Pt(20)  # font size
        run.font.color.rgb = RGBColor(255, 0, 0)  # font color

3. Paragraph style settings

The default alignment is left alignment, and you can change it yourself

Summary

That covers how to use the docx module in Python to perform common Word operations. Once you know which kinds of operations Python can handle, you will think of reaching for Python the next time you run into a tedious task.



Origin blog.51cto.com/15064626/2598018