Common python-docx operations for working with Word files

Python has dedicated packages for processing Excel files, such as openpyxl. There is likewise a dedicated library for processing Word files: python-docx (imported as docx), which performs basic operations on Microsoft Word (.docx) files.

This article introduces the python-docx syntax that is most commonly used. Before looking at the syntax, you need to know which part of a Word document each python-docx object corresponds to. As shown in the figure below, Document represents a Word document, paragraph represents a paragraph, and run represents a stretch of text within a paragraph that shares the same formatting. Style adjustments are generally applied run by run.

Snipaste_2020-10-03_09-06-01.jpg

1. Install python-docx

Install the package with pip by running pip install python-docx on the command line. Output like the following indicates that the installation succeeded.

Snipaste_2020-09-26_08-41-23.jpg

2. Create or open Document

python-docx is imported under the name docx, much as OpenCV's Python package is imported as cv2. Documents are created and opened with Document(). A few points to note:

  • 1. Document() creates a blank document based on the default template, which you can then edit;
  • 2. Document(path) opens an existing local docx file; if no file exists at path, the program raises an error.

The following code creates a blank docx and assigns it to the variable document:

from docx import Document

document = Document()

3. Add a paragraph

Paragraphs are the main building block of a docx document's body. There are two ways to add a paragraph to the Document you created.

1. Append to the end of the document

This is the most common and the simplest way. The command is:

paragraph = document.add_paragraph('Lorem ipsum dolor sit amet.')

The method returns a reference to the newly created paragraph, which marks a position much as a cursor does. Later operations can use this variable as an anchor.

2. Insert before a specified position

Documents are normally written by appending at the end, but if you find while editing that a sentence or passage was left out, you can insert it before a specified paragraph:

prior_paragraph = paragraph.insert_paragraph_before('Lorem ipsum')

This command is used more often when revising an existing document than when creating a new one.

3. Add a heading

In a docx, the text is usually divided into parts with first-, second- and third-level headings so that its structure is clear. python-docx has a built-in method for this, which distinguishes the document title from the numbered headings.

The method takes a level parameter. If it is not set, it defaults to level=1, a top-level heading (Heading 1 in Word); level=0 produces the document title instead.

document.add_heading('The REAL meaning of the universe')

Headings come in nine levels, 1 to 9; to get a sub-heading, just set the level parameter:

document.add_heading('The role of dolphins', level=2)

4. Add page break

When writing in Word, to start the following text on a new page you add a forced page break, with this command:

document.add_page_break()

Note that add_page_break() places the break in a new paragraph of its own, so the paragraphs you add on the new page are separate from, and formatted independently of, the paragraphs on the previous page.

5. Add a table

Create a 2×2 table in the document:

table = document.add_table(rows=2, cols=2)

Each cell in the table can have its text edited and be filled with color; a specific cell is located by its row and column index, both counted from 0:

cell = table.cell(0, 1)

Set its text content:

cell.text = 'parrot, possibly dead'

Locating cells one at a time is tedious. You can instead select a whole row and set the text of its cells in turn:

row = table.rows[1]
row.cells[0].text = 'Foo bar to you.'
row.cells[1].text = 'And a hearty foo bar to you too sir!'

table.rows[index] returns the row at the given index. Both .rows and .columns are iterable collections of the table's rows and columns, so every cell can be visited with a for loop:

for row in table.rows:
    for cell in row.cells:
        print(cell.text)

Since .rows and .columns are sequences, len() gives the number of rows or columns:

row_count = len(table.rows)
col_count = len(table.columns)

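In addition to the above operations, you can also add rows and columns to the table one at a time. Note that the method for columns is add_column(), and it requires a width:

from docx.shared import Inches

row = table.add_row()
col = table.add_column(Inches(1.0))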

The operations covered above (creating a table, modifying cells, adding rows and columns, iterating over rows and columns) can be summed up in one simple example. The complete code does the following:

  • 1. items holds three records as a tuple of 3-tuples;
  • 2. Create a table in Word with one row and three columns;
  • 3. Set the header cells of the table to Qty, SKU, and Description in turn;
  • 4. Append one row to the table for each record in items.

# get table data -------------
items = (
    (7, '1024', 'Plush kittens'),
    (3, '2042', 'Furbees'),
    (1, '1288', 'French Poodle Collars, Deluxe'),
)

# add table ------------------
table = document.add_table(1, 3)

# populate header row --------
heading_cells = table.rows[0].cells
heading_cells[0].text = 'Qty'
heading_cells[1].text = 'SKU'
heading_cells[2].text = 'Description'

# add a data row for each item
# (each item is a plain tuple, so unpack it rather than using attributes)
for qty, sku, desc in items:
    cells = table.add_row().cells
    cells[0].text = str(qty)
    cells[1].text = sku
    cells[2].text = desc

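You can also change the table's style. Any table style available in the Word document can be set here; to find a style's name, hover the mouse over its thumbnail in Word's table style gallery. Note that the name used below is written with the spaces removed. Newer python-docx versions prefer the name exactly as Word displays it ('Light Shading Accent 1') and treat the space-free form as a deprecated style id: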

table.style = 'LightShading-Accent1'

6. Add pictures

Pictures are added in python-docx as follows:

document.add_picture('image-filename.png')

The example above passes a local file path. add_picture() also accepts a file-like object, which is very convenient when the picture is read from a database or from the network.

Modify picture size

By default, a picture added with python-docx appears at its native size, which is computed as pixels / dpi. Many images carry no dpi value, in which case 72 dpi is assumed; a 300×300-pixel image then measures 4.167 inches on a side, roughly half the page width. To get the size you want, specify the width or the height in a convenient unit such as inches:

from docx.shared import Inches

document.add_picture('image-filename.png', width=Inches(1.0))

7. Apply a paragraph style

There are two ways to set a paragraph style. One is to set it when creating the paragraph:

document.add_paragraph('Lorem ipsum dolor sit amet.', style='ListBullet')

The other is to set it after the paragraph has been created:

paragraph = document.add_paragraph('Lorem ipsum dolor sit amet.')
paragraph.style = 'List Bullet'

python-docx styles correspond to the styles in Word, so the paragraph styles available in Word can be applied here. A style's name is found in the same way as described for table styles. Note that the first form above, used at creation time, writes the name with the spaces removed ('ListBullet'), while the second form uses the name as Word displays it ('List Bullet').

8. Apply bold and italic

Before making text bold or italic, you need to understand how a paragraph is put together. In short, formatting happens at two levels:

  • 1. The paragraph carries all block-level formatting, such as indentation, line height, tabs, etc.;
  • 2. Character-level formatting, such as bold and italic, is applied to run objects. All content in a paragraph must sit inside a run, and a paragraph can contain more than one run.

A run object has both a .bold and an .italic property, whose values you can set:

paragraph = document.add_paragraph('Lorem ipsum ')
run = paragraph.add_run('dolor')
run.bold = True
paragraph.add_run(' sit amet.')

The text produced by the code above reads: Lorem ipsum dolor sit amet., with the word dolor in bold.

Note that bold or italic can be set directly on the result of .add_run():

paragraph.add_run('dolor').bold = True

# is equivalent to:

run = paragraph.add_run('dolor')
run.bold = True

# except you don't have a reference to `run` afterward

9. Apply character style

Besides paragraph styles, you can also apply character styles, which are specified when adding a new run; for example:

paragraph = document.add_paragraph('Normal text, ')
paragraph.add_run('text with emphasis.', 'Emphasis')

The code above produces the text: Normal text, text with emphasis. The part "text with emphasis." has the Emphasis character style applied.

The code above can also be written as:

paragraph = document.add_paragraph('Normal text, ')
run = paragraph.add_run('text with emphasis.')
run.style = 'Emphasis'

As with paragraph styles, the style name is the same as the one shown in the Word UI, where it can be found in the style manager.

Snipaste_2020-10-03_09-01-18.jpg

Origin blog.csdn.net/weixin_42512684/article/details/109264171