In Python, there is a library called python-docx, which provides rich functions for creating, modifying, and reading Word documents.
This article walks through the use of the python-docx library in detail and provides examples that demonstrate its capabilities. For clarity, the discussion is divided into the following parts:
- Install python-docx
- Create and save Word documents
- Modify an existing document
- Manipulate paragraphs and text
- Manipulate tables
- Manipulate images
- Other common operations
Without further ado, let's get started!
1. Installing python-docx
To use the python-docx library, you first need to install it. Run the following command in a terminal:
pip install python-docx
Once the installation is complete, we can start using it.
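One quick way to confirm the installation is to check whether Python can locate the module. The following is a minimal sketch that uses only the standard library; the helper name is_installed is hypothetical and not part of python-docx. Note that the package is installed as python-docx but imported as docx.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the named top-level module can be located for import."""
    return importlib.util.find_spec(module_name) is not None


# python-docx is installed under the import name "docx"
print("python-docx available:", is_installed("docx"))
```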
2. Create and save a Word document
We can use the python-docx library to create new Word documents. Here is a simple example:
from docx import Document
# Create a new document
doc = Document()
# Add a heading
doc.add_heading('python-docx example', level=1)
# Add a paragraph
doc.add_paragraph('This is a sample document.')
# Save the document
doc.save('示例文档.docx')
In this example, we first import the Document class, which is the main class for creating and modifying Word documents. We then create a new document object, doc, and add a heading using the add_heading method. Next, we add a paragraph using the add_paragraph method. Finally, we use the save method to save the document as 示例文档.docx.
3. Modify existing documents
In addition to creating new documents, python-docx also allows us to modify existing documents. The following example shows how to open an existing document and modify its contents:
from docx import Document
# Open the existing document
doc = Document('示例文档.docx')
# Change the content of the first paragraph
doc.paragraphs[0].text = 'This is the modified content.'
# Save the document
doc.save('示例文档.docx')
In this example, we use the Document class to open an existing document named 示例文档.docx. We then change the content of the first paragraph by assigning to the text property of the first element in the paragraphs list. Finally, we save the modified document using the save method.
4. Manipulating paragraphs and text
python-docx provides a range of methods for manipulating paragraphs and text. The following example demonstrates some commonly used ones:
from docx import Document
# Create a new document
doc = Document()
# Add paragraphs
p1 = doc.add_paragraph('This is the first paragraph.')
p2 = doc.add_paragraph('This is the second paragraph.')
# Change the paragraph styles
p1.style = 'Heading 1'
p2.style = 'Heading 2'
# Append text to the first paragraph
p1.add_run('This is the added text.')
# Insert a page break
doc.add_page_break()
# Add a table and fill every cell
table = doc.add_table(rows=3, cols=3)
for i in range(3):
    for j in range(3):
        table.cell(i, j).text = f'Cell {i+1}-{j+1}'
# Save the document
doc.save('示例文档.docx')
In this example, we create a new document and add two paragraphs. We then use the style property to style the first paragraph as "Heading 1" and the second paragraph as "Heading 2".
To add text to a paragraph, we can use the add_run method, which appends a new run of text to the end of the paragraph. In the example, we added a new piece of text to the first paragraph.
To insert a page break in the document, we can use the add_page_break method. In the example, we added a page break to the document.
To add a table to a document, we can use the add_table method. In the example, we created a 3x3 table and filled the table cells using nested loops.
To summarize, we can use the Document class from python-docx to create, modify, and save Word documents. We can manipulate paragraphs and text, modify styles, and add page breaks and tables.
5. Manipulating tables
Tables are one of the most common elements in Word documents, and python-docx provides many methods for working with them. The following example demonstrates some common table operations:
from docx import Document
# Open the existing document
doc = Document('示例文档.docx')
# Get the first table
table = doc.tables[0]
# Read the content of a cell
cell_text = table.cell(0, 0).text
print(f'Content of the first cell: {cell_text}')
# Iterate over rows and the cells in each row
for row in table.rows:
    for cell in row.cells:
        print(cell.text)
# Add a new row and fill its cells
new_row = table.add_row().cells
new_row[0].text = 'New row cell 1'
new_row[1].text = 'New row cell 2'
new_row[2].text = 'New row cell 3'
# Save the document
doc.save('示例文档.docx')
In this example, we open an existing document named 示例文档.docx and get the first table through the tables property. We then use the cell method to access the contents of a cell in the table. The example also shows how to iterate through all rows and cells of the table, and how to add a new row and populate its cells.
6. Manipulating images
In addition to text and tables, python-docx also supports adding pictures to Word documents. The following example demonstrates how to add an image to a document:
from docx import Document
from docx.shared import Inches
# Create a new document
doc = Document()
# Add a picture (image.jpg must exist in the working directory)
doc.add_picture('image.jpg', width=Inches(3), height=Inches(2))
# Save the document
doc.save('示例文档.docx')
In this example, we create a new document and add a picture named image.jpg using the add_picture method. We can use the width and height parameters to set the size of the image; here we use the Inches function to set the width to 3 inches and the height to 2 inches.
7. Other common operations
In addition to the functions introduced above, python-docx also provides many other common operations. Here are some examples:
- Get all paragraphs in a document:
from docx import Document
# Open the existing document
doc = Document('示例文档.docx')
# Iterate over all paragraphs
for paragraph in doc.paragraphs:
    print(paragraph.text)
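- Delete a paragraph from a document:
from docx import Document
# Open the existing document
doc = Document('示例文档.docx')
# Delete the first paragraph. python-docx has no public delete method,
# so remove the underlying XML element (_element is an internal attribute)
p = doc.paragraphs[0]._element
p.getparent().remove(p)
# Save the document
doc.save('示例文档.docx')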
- Modify the properties of the document:
from docx import Document
# Open the existing document
doc = Document('示例文档.docx')
# Change the title stored in the document properties
doc.core_properties.title = 'New title'
# Save the document
doc.save('示例文档.docx')
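- Insert a hyperlink:
from docx import Document
from docx.opc.constants import RELATIONSHIP_TYPE
from docx.oxml import OxmlElement, parse_xml
from docx.oxml.ns import nsdecls, qn
# Create a new document
doc = Document()
# Add a paragraph
p = doc.add_paragraph()
# python-docx does not currently expose a public add_hyperlink method,
# so register the URL as an external relationship of the document part
r_id = doc.part.relate_to('https://www.example.com', RELATIONSHIP_TYPE.HYPERLINK, is_external=True)
# Create the w:hyperlink element that points at that relationship
hyperlink = OxmlElement('w:hyperlink')
hyperlink.set(qn('r:id'), r_id)
# Run properties: Hyperlink character style, bold, size, underline
# (w:sz is measured in half-points, so 14 means 7 pt)
rPr = parse_xml('<w:rPr %s><w:rStyle w:val="Hyperlink"/><w:b/><w:sz w:val="14"/><w:szCs w:val="14"/><w:u w:val="single"/></w:rPr>' % nsdecls('w'))
# Build the run that carries the link text
run = OxmlElement('w:r')
run.append(rPr)
text = OxmlElement('w:t')
text.text = 'This is a link'
run.append(text)
# Attach the run to the hyperlink and the hyperlink to the paragraph
hyperlink.append(run)
p._p.append(hyperlink)
# Save the document
doc.save('示例文档.docx')
In this example, we first create a new document and add a paragraph. Because python-docx does not currently expose a public add_hyperlink method, we register "https://www.example.com" as an external relationship of the document and build the w:hyperlink element ourselves, with the display text "This is a link". The run properties apply the Hyperlink character style together with bold, size, and underline settings, which is how the appearance of the link can be customized. Note that p._p is an internal attribute, so this approach may need adjusting between library versions.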
- Set the page layout and style:
from docx import Document
from docx.shared import Inches
# Create a new document
doc = Document()
# Set the page size
section = doc.sections[0]
section.page_width = Inches(8.5)
section.page_height = Inches(11)
# Set the page margins
section.left_margin = Inches(1)
section.right_margin = Inches(1)
section.top_margin = Inches(1)
section.bottom_margin = Inches(1)
# Save the document
doc.save('示例文档.docx')
In this example, we create a new document and get its first section. By setting the page_width and page_height properties, we can adjust the width and height of the page. Similarly, by setting the left_margin, right_margin, top_margin, and bottom_margin properties, we can adjust the page margins.
This is only a small part of the functionality of the python-docx library, which also provides many other operations, such as inserting headers and footers, adjusting font styles, inserting comments, and so on. You can learn more from the official documentation: https://python-docx.readthedocs.io/
I hope this article helps you understand and use the python-docx library!