Reading and writing Word documents in Python with python-docx

The python-docx library can be used to create and edit Microsoft Word (.docx) files.
Official documentation: https://python-docx.readthedocs.io/en/latest/index.html

NOTE:
.doc is Microsoft's proprietary file format. .docx is the format introduced with Microsoft Office 2007; it is a compressed format based on the Office Open XML standard and
takes up less space than .doc. A .docx file is essentially a ZIP archive, so you can rename a .docx file to .zip and decompress it. After decompression,
word/document.xml contains most of the content of the Word document, and image files are stored in word/media.
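
The ZIP structure described in this note can be checked with Python's standard zipfile module. The following is a minimal sketch; the function names list_docx_parts and read_main_document_xml are made up for this illustration and are not part of python-docx.

```python
import zipfile


def list_docx_parts(path_or_file):
    """Return the names of all files stored inside a .docx (ZIP) archive."""
    with zipfile.ZipFile(path_or_file) as archive:
        return archive.namelist()


def read_main_document_xml(path_or_file):
    """Return the raw XML text of word/document.xml from a .docx archive."""
    with zipfile.ZipFile(path_or_file) as archive:
        return archive.read("word/document.xml").decode("utf-8")
```

For a file such as demo.docx, list_docx_parts('demo.docx') should include word/document.xml, plus entries under word/media if the document contains pictures.
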
python-docx does not support .doc files. An indirect solution is to convert the .doc file to .docx in code first.
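
One possible way to do this conversion in code is to call LibreOffice in headless mode. This is only a sketch under assumptions the original post does not state: LibreOffice is installed and its soffice executable is on the PATH. The helper names build_convert_command and convert_doc_to_docx are hypothetical.

```python
import subprocess
from pathlib import Path


def build_convert_command(doc_path, out_dir):
    """Build the LibreOffice command line that converts a .doc file to .docx."""
    return [
        "soffice",              # assumption: LibreOffice executable is on PATH
        "--headless",           # run without opening a window
        "--convert-to", "docx",
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def convert_doc_to_docx(doc_path, out_dir="."):
    """Run the conversion and return the expected path of the new .docx file."""
    subprocess.run(build_convert_command(doc_path, out_dir), check=True)
    return Path(out_dir) / (Path(doc_path).stem + ".docx")
```

The returned path can then be passed to Document() as usual.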


First, install the package

pip3 install python-docx
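
Note that the package is installed as python-docx but imported as docx. A small sketch for checking whether it is installed, using the standard importlib.metadata module (Python 3.8 or later); the helper name installed_version is made up for this example.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version of a pip package, or None if it is missing."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None
```

Calling installed_version('python-docx') returns a version string when the installation succeeded, and None otherwise.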

Second, create a Word document

The following example is a slightly modified version of the one in the official documentation, with comments added to explain usage and functions.

from docx import Document
from docx.shared import Inches

document = Document()

# Add a title and set its level. Range: 0 to 9, default: 1
document.add_heading('Document Title', 0)

# Add a paragraph; the text may contain tabs (\t), line feeds (\n), carriage returns (\r), etc.
p = document.add_paragraph('A plain paragraph having some ')
# Append runs of text to the paragraph and set their style
p.add_run('bold').bold = True
p.add_run(' and some ')
p.add_run('italic.').italic = True

document.add_heading('Heading, level 1', level=1)
document.add_paragraph('Intense quote', style='Intense Quote')

# Add unordered list items (preceded by a bullet)
document.add_paragraph(
    'first item in unordered list', style='List Bullet'
)
document.add_paragraph('second item in unordered list', style='List Bullet')

# Add ordered list items (preceded by a number)
document.add_paragraph('first item in ordered list', style='List Number')
document.add_paragraph('second item in ordered list', style='List Number')

# Add a picture
document.add_picture('monty-truth.png', width=Inches(1.25))

records = (
    (3, '101', 'Spam'),
    (7, '422', 'Eggs'),
    (4, '631', 'Spam, spam, eggs, and spam')
)

# Add a table with 1 row and 3 columns
# Optional values for the table style parameter:
# Normal Table
# Table Grid
# Light Shading, Light Shading Accent 1 to Light Shading Accent 6
# Light List, Light List Accent 1 to Light List Accent 6
# Light Grid, Light Grid Accent 1 to Light Grid Accent 6
# many others omitted ...
table = document.add_table(rows=1, cols=3, style='Light Shading Accent 2')
# Get the list of cells in the first row
hdr_cells = table.rows[0].cells
# The next three lines set the text of the three cells in the first row
hdr_cells[0].text = 'Qty'
hdr_cells[1].text = 'Id'
hdr_cells[2].text = 'Desc'
for qty, id, desc in records:
    # Add a row to the table and get the list of cells in that row
    row_cells = table.add_row().cells
    row_cells[0].text = str(qty)
    row_cells[1].text = id
    row_cells[2].text = desc

document.add_page_break()

# Save the .docx document
document.save('demo.docx')

Running this script creates demo.docx containing the content added above.

Third, read a Word document

from docx import Document

doc = Document('demo.docx')

# Print the text of each paragraph
for para in doc.paragraphs:
    print(para.text)

# Print the index and text of each paragraph
for i in range(len(doc.paragraphs)):
    print(str(i), doc.paragraphs[i].text)

# Tables
tbs = doc.tables
for tb in tbs:
    # Rows
    for row in tb.rows:
        # Columns
        for cell in row.cells:
            print(cell.text)
            # The following approach can also be used:
            # text = ''
            # for p in cell.paragraphs:
            #     text += p.text
            # print(text)

Output:

Document Title
A plain paragraph having some bold and some italic.
Heading, level 1
Intense quote
first item in unordered list
second item in unordered list
first item in ordered list
second item in ordered list



0 Document Title
1 A plain paragraph having some bold and some italic.
2 Heading, level 1
3 Intense quote
4 first item in unordered list
5 second item in unordered list
6 first item in ordered list
7 second item in ordered list
8 
9 

Qty
Id
Desc
3
101
Spam
7
422
Eggs
4
631
Spam, spam, eggs, and spam

 

Origin www.cnblogs.com/gdjlc/p/11407587.html