Installing python-docx on Ubuntu

python-docx is a Python library for creating and updating Microsoft Word (.docx) files.
GitHub: https://github.com/python-openxml/python-docx
python-docx documentation: https://python-docx.readthedocs.io/en/latest/

wang@wang:~$ git clone https://github.com/python-openxml/python-docx.git

Installing lxml:
The lxml package depends on other packages, so install those dependencies first:

wang@wang:~$ sudo apt-get install libxml2-dev libxslt-dev python-dev

Then install lxml:

wang@wang:~$ sudo apt-get install python-lxml

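If lxml is not installed first, running the command below fails with: error: Could not find suitable distribution for Requirement.parse('lxml>=2.3.2'). With lxml in place, the installation succeeds: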

wang@wang:~/python-docx$ python setup.py install
...
Installed /usr/local/lib/python2.7/dist-packages/python_docx-0.8.10-py2.7.egg
Processing dependencies for python-docx==0.8.10
Searching for lxml==3.5.0
Best match: lxml 3.5.0

You can check the lxml version as follows:

wang@wang:~$ python
Python 2.7.12 (default, Nov 12 2018, 14:36:49) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import etree
>>> print etree.LXML_VERSION
(3, 5, 0, 0)
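
The same check can be scripted. The following is a minimal sketch, not part of the original setup: it compares the installed lxml version against the minimum quoted in the error message (lxml>=2.3.2). The names MIN_LXML, meets_minimum and installed_lxml_version are hypothetical helpers introduced here for illustration; only etree.LXML_VERSION comes from lxml itself.

```python
from __future__ import print_function

# Minimum lxml version, taken from the error message quoted earlier (lxml>=2.3.2).
MIN_LXML = (2, 3, 2)


def meets_minimum(version, minimum=MIN_LXML):
    """Return True if a version tuple such as (3, 5, 0, 0) is at least `minimum`."""
    return tuple(version[:len(minimum)]) >= tuple(minimum)


def installed_lxml_version():
    """Return lxml's version tuple, or None if lxml cannot be imported."""
    try:
        from lxml import etree
    except ImportError:
        return None
    return etree.LXML_VERSION


if __name__ == '__main__':
    found = installed_lxml_version()
    if found is None:
        print('lxml is not installed')
    elif meets_minimum(found):
        print('lxml %s is new enough' % '.'.join(map(str, found)))
    else:
        print('lxml %s is older than required' % '.'.join(map(str, found)))
```

Comparing tuples rather than version strings avoids the usual pitfall where '10' sorts before '9' as text.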

As a test, use the library to write a .docx file:

wang@wang:~$ vim testpy.py

with the following content:

from docx import Document
from docx.shared import Inches

# Open a blank document based on the default "template", roughly what you get when starting a new document in Word with the built-in defaults.
document = Document()

# Add a heading: the first argument is the heading text, the second is the heading level
document.add_heading('Document Title', 0)

# Add a paragraph
p = document.add_paragraph('A plain paragraph having some ')  # paragraph text
p.add_run('bold').bold = True  # make the run "bold" bold
p.add_run(' and some ')
p.add_run('italic.').italic = True  # make the run "italic." italic

document.add_heading('Heading, level 1', level=1)
# Apply a paragraph style
document.add_paragraph('Intense quote', style='Intense Quote')

document.add_paragraph(
    'first item in unordered list', style='List Bullet'
)
document.add_paragraph(
    'first item in ordered list', style='List Number'
)
# Add a picture
document.add_picture('monty-truth.png', width=Inches(1.25))

records = (
    (3, '101', 'Spam'),
    (7, '422', 'Eggs'),
    (4, '631', 'Spam, spam, eggs, and spam')
)
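# Add a table with 1 row and 3 columns
table = document.add_table(rows=1, cols=3)
hdr_cells = table.rows[0].cells  # the first row
hdr_cells[0].text = 'Qty'  # text of the first cell in the first row
hdr_cells[1].text = 'Id'  # text of the second cell in the first row
hdr_cells[2].text = 'Desc'  # text of the third cell in the first row
for qty, id, desc in records:  # append one row per record and fill in its cells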
    row_cells = table.add_row().cells
    row_cells[0].text = str(qty)
    row_cells[1].text = id
    row_cells[2].text = desc

# Add a page break
document.add_page_break()
# Any content added after this point goes on the next page, even if the page above is not full
document.save('demo.docx')

Also place an image named monty-truth.png in the home directory, then run:

wang@wang:~$ python testpy.py 

This generates a demo.docx file in the home directory.
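
The write script depends on monty-truth.png being present, and the add_picture call would presumably raise an error if the file is missing. One way to make the script tolerant of that is sketched below. The helper add_picture_if_present is hypothetical and not part of python-docx; it only wraps the document.add_picture call already used in the script.

```python
from __future__ import print_function
import os


def add_picture_if_present(document, image_path, **kwargs):
    """Call document.add_picture only when image_path exists.

    Returns True if the picture was added, False if the file was missing.
    Extra keyword arguments (for example width) are passed through unchanged.
    """
    if not os.path.isfile(image_path):
        print('skipping missing image: %s' % image_path)
        return False
    document.add_picture(image_path, **kwargs)
    return True
```

In testpy.py this would replace the direct call, for example add_picture_if_present(document, 'monty-truth.png', width=Inches(1.25)), so the rest of the document is still written when the image is absent.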
Next, use the library to read a .docx file, here part of the file generated above:

wang@wang:~$ vim read_testpy.py

with the following content:

# -*- coding:utf-8 -*-
import sys
import docx

path = sys.argv[1]

doc = docx.Document(path)  # named doc to avoid shadowing the Python 2 builtin "file"
for para in doc.paragraphs:
    print(para.text)

wang@wang:~$ python read_testpy.py demo.docx
Document Title
A plain paragraph having some bold and some italic.
Heading, level 1
Intense quote
first item in unordered list
first item in ordered list
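
The read script only walks the paragraphs, so the table written earlier does not show up in its output. A minimal sketch for reading the tables back follows. It assumes the tables, rows, cells and text attributes that python-docx exposes on documents and tables; the function names table_to_rows and read_tables are hypothetical and introduced here only for illustration.

```python
from __future__ import print_function
import sys


def table_to_rows(table):
    """Return a table's cell text as a list of rows, each a list of strings."""
    return [[cell.text for cell in row.cells] for row in table.rows]


def read_tables(path):
    """Open a .docx file and return the cell text of every top-level table."""
    # Imported here so that table_to_rows itself has no dependency on python-docx.
    from docx import Document
    return [table_to_rows(table) for table in Document(path).tables]


def main(argv):
    if len(argv) < 2:
        print('usage: python read_tables.py FILE.docx')
        return
    for rows in read_tables(argv[1]):
        for row in rows:
            print(' | '.join(row))


if __name__ == '__main__':
    main(sys.argv)
```

Saved under a name such as read_tables.py (an arbitrary choice), it could be run the same way as the paragraph reader: python read_tables.py demo.docx.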

Reference: https://www.cnblogs.com/ontheway703/p/5266041.html

Reposted from blog.csdn.net/u010931295/article/details/100151340