Python's python-docx: operating office word documents

In Python, there is a python-docxlibrary called , which provides rich functions to easily create, modify and read Word documents.

This article walks through python-docxthe use of the library in detail and provides some examples to demonstrate its capabilities. For a better understanding, we will divide the discussion into the following aspects:

  1. Installpython-docx
  2. Create and save Word documents
  3. Modify an existing document
  4. Manipulate paragraphs and text
  5. Operation form
  6. Operation picture
  7. Other common operations

Without further ado, let's get started!

1. Installationpython-docx

To use python-docxthe library, you first need to install it. Run the following command in a terminal:

pip install python-docx

Once the installation is complete, we can start using it.

2. Create and save a Word document

We can use python-docxthe library to create new Word documents. Here is a simple example:

from docx import Document

# 创建新文档
doc = Document()

# 添加标题
doc.add_heading('Python-docx 示例', level=1)

# 添加段落
doc.add_paragraph('这是一个示例文档。')

# 保存文档
doc.save('示例文档.docx')

In this example, we first import Documentthe class, which is the main class for creating and modifying Word documents. We then created a new document object docand add_headingadded a title using the method . Next, we add_paragraphadded a paragraph using the method. Finally, we use savethe method to save the document as 示例文档.docx.

3. Modify existing documents

In addition to creating new documents, python-docxit also allows us to modify existing documents. The following example shows how to open an existing document and modify its contents:

from docx import Document

# 打开现有文档
doc = Document('示例文档.docx')

# 修改第一个段落的内容
doc.paragraphs[0].text = '这是修改后的内容。'

# 保存文档
doc.save('示例文档.docx')

In this example, we use Documentthe class to open an 示例文档.docxexisting document named . We then change the content of the first paragraph by modifying the properties paragraphsof the first element in the list text. Finally, we savesave the modified document using the method.

4. Manipulating paragraphs and text

python-docxProvides a range of methods to manipulate paragraphs and text. The following examples demonstrate some commonly used methods:

from docx import Document

# 创建新文档
doc = Document()

# 添加段落
p1 = doc.add_paragraph('这是第一个段落。')
p2 = doc.add_paragraph('这是第二个段落。')

# 修改段落样式
p1.style = 'Heading 1'
p2.style = 'Heading 2'

# 添加文本
p1.add_run('这是新增的文本。')

# 插入分页符
doc.add_page_break()

# 添加表格
table = doc.add_table(rows=3, cols=3)
for i in range(3):
    for j in range(3):
        table.cell(i, j).text = f'单元格{i+1}-{j+1}'

# 保存文档
doc.save('示例文档.docx')

In this example, we create a new document and add two paragraphs. We then use styleproperties to style the first paragraph as "Heading 1" and the second paragraph as "Heading 2".

When we want to add text in a paragraph, we can use add_runthe method, which allows us to insert new text in the paragraph. In the example, we added a new piece of text to the first paragraph.

If you want to insert a page break in your document, you can use add_page_breakthe method. In the example, we added a page break to the document.

To add a table to a document, you can use add_tablethe method. In the example, we created a 3x3 table and filled the table cells using nested loops.

To summarize, we can use python-docxclasses Documentto create, modify and save Word documents. We can manipulate paragraphs and text, modify styles, add page breaks and tables.

5. Operation form

Tables are one of the common elements in Word documents. python-docxMany methods are provided to manipulate the table. The following examples demonstrate some common table operations:

from docx import Document

# 打开现有文档
doc = Document('示例文档.docx')

# 获取第一个表格
table = doc.tables[0]

# 访问单元格内容
cell_text = table.cell(0, 0).text
print(f'第一个单元格的内容:{cell_text}')

# 遍历行和列
for row in table.rows:
    for cell in row.cells:
        print(cell.text)

# 添加新行
new_row = table.add_row().cells
new_row[0].text = '新行单元格1'
new_row[1].text = '新行单元格2'
new_row[2].text = '新行单元格3'

# 保存文档
doc.save('示例文档.docx')

In this example, we opened an 示例文档.docxexisting document named , and tablesgot the first table through the properties. We then use cellmethods to access the contents of cells in the table, and we also show how to iterate through all rows and columns of the table, and how to add new rows and populate cell contents.

6. Manipulating images

In addition to text and tables, python-docxit also supports adding pictures to Word documents. The following example demonstrates how to add an image to a document:

from docx import Document

# 创建新文档
doc = Document()

# 添加图片
doc.add_picture('image.jpg', width=docx.shared.Inches(3), height=docx.shared.Inches(2))

# 保存文档
doc.save('示例文档.docx')

In this example, we created a new document and add_pictureadded a image.jpgpicture named using the method. We can use the widthand heightparameters to set the width and height of the image, here we use Inchesthe function to set the width to 3 inches and the height to 2 inches.

7. Other common operations

In addition to the functions introduced above, python-docxit also provides many other common operation methods. Here are some examples:

  • Get all paragraphs in a document:
from docx import Document

# 打开现有文档
doc = Document('示例文档.docx')

# 遍历所有段落
for paragraph in doc.paragraphs:
    print(paragraph.text)
  • Delete a paragraph in a document:
from docx import Document

# 打开现有文档
doc = Document('示例文档.docx')

# 删除第一个段落
doc._body[0].getparent().remove(doc._body[0])
  • Modify the properties of the document:
from docx import Document

# 打开现有文档
doc = Document('示例文档.docx')

# 修改标题
doc.core_properties.title = '新标题'

# 保存文档
doc.save('示例文档.docx')
  • Insert hyperlink:
from docx import Document
from docx.shared import Pt
from docx.oxml.ns import nsdecls
from docx.oxml import parse_xml

# 创建新文档
doc = Document()

# 添加段落
p = doc.add_paragraph()

# 添加超链接
run = p.add_run()
hyperlink = run.add_hyperlink("https://www.example.com", "这是一个链接")

# 设置超链接样式
hyperlink.style = "Hyperlink"
r = run._r
r.insert(1, parse_xml('<w:rPr><w:rStyle w:val="Hyperlink"/></w:rPr>'))

# 设置超链接字体样式
pr = run._element.get_or_add_pPr()
hyperlink_rpr = parse_xml('<w:rPr xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:rFonts w:asciiTheme="majorEastAsia" w:cstheme="majorEastAsia"/><w:b/><w:sz w:val="14"/><w:szCs w:val="14"/><w:u w:val="single"/></w:rPr>')
pr.append(hyperlink_rpr)

# 保存文档
doc.save('示例文档.docx')

In this example, we first created a new document and added a paragraph. We then use add_hyperlinkthe method to add a hyperlink in the paragraph to "https://www.example.com" and read "This is a link". By setting the style and font style, we can customize the appearance of the hyperlink.

  • Set the page layout and style:
from docx import Document
from docx.shared import Inches

# 创建新文档
doc = Document()

# 设置页面布局
section = doc.sections[0]
section.page_width = Inches(8.5)
section.page_height = Inches(11)

# 设置页面边距
section.left_margin = Inches(1)
section.right_margin = Inches(1)
section.top_margin = Inches(1)
section.bottom_margin = Inches(1)

# 保存文档
doc.save('示例文档.docx')

In this example, we create a new document and get the first section. Through settings page_widthand page_heightproperties, we can adjust the width and height of the page. At the same time, by setting left_margin, right_margin, top_marginand bottom_marginproperties, we can adjust the page margins.

This is only python-docxa small part of the functionality of the library, which also provides many other operations, such as inserting headers and footers, adjusting font styles, inserting comments, and so on. You can learn more details through the official documentation: https://python-docx.readthedocs.io/

python-docxHope this article helps you understand and use the library!

Guess you like

Origin blog.csdn.net/naer_chongya/article/details/131429885