The python-docx library can be used to create and edit Microsoft Word (.docx) files.
Official documentation: https://python-docx.readthedocs.io/en/latest/index.html
NOTE:
doc is Microsoft's proprietary file format. docx is the format used since Microsoft Office 2007; it is a compressed file format based on the Office Open XML standard and takes up less space than doc.
A docx file is essentially a ZIP archive, so you can rename a .docx file to .zip and decompress it. After decompression, word/document.xml contains most of the content of the Word document, and image files are stored in word/media.
python-docx does not support .doc files. An indirect solution is to convert the .doc file to .docx in code first.
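Since a .docx file is a ZIP archive, the Python standard library alone is enough to look inside one. The following is a minimal sketch, not part of python-docx: the helper names list_parts and paragraph_texts are made up for this illustration, and the text extraction only joins plain w:t text nodes, so it ignores most of the Office Open XML format.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_parts(docx_file):
    """Return the names of all files stored in the .docx archive."""
    with zipfile.ZipFile(docx_file) as zf:
        return zf.namelist()


def paragraph_texts(docx_file):
    """Return the text of each w:p paragraph found in word/document.xml."""
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    texts = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Join the w:t text nodes that belong to this paragraph
        texts.append("".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t")))
    return texts
```

For example, list_parts('demo.docx') should include word/document.xml, and paragraph_texts('demo.docx') returns one string per paragraph, including paragraphs inside table cells.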
First, install the package
pip3 install python-docx
Second, create a Word document
The following example is a slightly modified version of the one in the official documentation, with added comments explaining usage and functions.
from docx import Document
from docx.shared import Inches

document = Document()

# Add a title and set its level; range 0 to 9, default 1
document.add_heading('Document Title', 0)

# Add a paragraph; the text may contain tabs (\t), line feeds (\n) or carriage returns (\r)
p = document.add_paragraph('A plain paragraph having some ')
# Append text to the paragraph and set its style
p.add_run('bold').bold = True
p.add_run(' and some ')
p.add_run('italic.').italic = True

document.add_heading('Heading, level 1', level=1)
document.add_paragraph('Intense quote', style='Intense Quote')

# Add list items (preceded by a bullet)
document.add_paragraph('first item in unordered list', style='List Bullet')
document.add_paragraph('second item in unordered list', style='List Bullet')

# Add list items (preceded by a number)
document.add_paragraph('first item in ordered list', style='List Number')
document.add_paragraph('second item in ordered list', style='List Number')

# Add a picture
document.add_picture('monty-truth.png', width=Inches(1.25))

records = (
    (3, '101', 'Spam'),
    (7, '422', 'Eggs'),
    (4, '631', 'Spam, spam, eggs, and spam')
)

# Add a table: one row, three columns
# Optional values for the table style:
# Normal Table
# Table Grid
# Light Shading, Light Shading Accent 1 to Light Shading Accent 6
# Light List, Light List Accent 1 to Light List Accent 6
# Light Grid, Light Grid Accent 1 to Light Grid Accent 6
# many others omitted ...
table = document.add_table(rows=1, cols=3, style='Light Shading Accent 2')
# Get the list of cells in the first row
hdr_cells = table.rows[0].cells
# The next three lines set the text of the three cells in the first row
hdr_cells[0].text = 'Qty'
hdr_cells[1].text = 'Id'
hdr_cells[2].text = 'Desc'
for qty, id, desc in records:
    # Add a row to the table and get the list of cells in that row
    row_cells = table.add_row().cells
    row_cells[0].text = str(qty)
    row_cells[1].text = id
    row_cells[2].text = desc

document.add_page_break()

# Save the .docx document
document.save('demo.docx')
The content of the generated demo.docx is as follows:
Third, read a Word document
from docx import Document

doc = Document('demo.docx')

# Content of each paragraph
for para in doc.paragraphs:
    print(para.text)

# Index and content of each paragraph
for i in range(len(doc.paragraphs)):
    print(str(i), doc.paragraphs[i].text)

# Tables
tbs = doc.tables
for tb in tbs:
    # Rows
    for row in tb.rows:
        # Columns
        for cell in row.cells:
            print(cell.text)
            # The following approach can also be used:
            # text = ''
            # for p in cell.paragraphs:
            #     text += p.text
            # print(text)
Output:
Document Title
A plain paragraph having some bold and some italic.
Heading, level 1
Intense quote
first item in unordered list
second item in unordered list
first item in ordered list
second item in ordered list
0 Document Title
1 A plain paragraph having some bold and some italic.
2 Heading, level 1
3 Intense quote
4 first item in unordered list
5 second item in unordered list
6 first item in ordered list
7 second item in ordered list
8
9
Qty
Id
Desc
3
101
Spam
7
422
Eggs
4
631
Spam, spam, eggs, and spam