Converting Word documents to PDF and PDF files to JPG on Ubuntu

Environment setup

  Language: python3

  Install imagemagick (the PDF-to-JPG conversion calls this tool internally)

    apt-get install imagemagick

  Install libreoffice (used to convert Word documents to PDF files)

    apt-get install libreoffice

  Install the Python libraries wand and PIL (PIL is packaged for Python 3 as Pillow)

    pip install wand

    pip install Pillow
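  Once everything is installed, a quick check can confirm that the tools and libraries are visible to Python. The following is only a sketch: the executable names it looks for (convert or magick, libreoffice or soffice) are assumptions about how the packages are installed on your system, and check_environment is a name made up for this example.

```python
import importlib.util
import shutil


def check_environment():
    """Report which of the tools and libraries from this section can be found."""
    return {
        # Assumption: ImageMagick 6 installs `convert`, ImageMagick 7 installs `magick`
        'imagemagick': (shutil.which('convert') is not None
                        or shutil.which('magick') is not None),
        # Assumption: LibreOffice is on PATH as `libreoffice` or `soffice`
        'libreoffice': (shutil.which('libreoffice') is not None
                        or shutil.which('soffice') is not None),
        # find_spec returns None when a top-level module cannot be found
        'wand': importlib.util.find_spec('wand') is not None,
        'PIL': importlib.util.find_spec('PIL') is not None,
    }


if __name__ == '__main__':
    for name, found in check_environment().items():
        print('%-12s %s' % (name, 'found' if found else 'MISSING'))
```

  Note that the import name stays PIL even though the package on PyPI is called Pillow.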

  

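PDF to JPG

The PDF is converted to PNG first and then to JPG. This avoids black or transparent backgrounds, which would make the resulting image look different from the PDF.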
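To see why the background matters, here is a rough illustration in plain Python. JPG has no alpha channel, so transparent pixels have to become some color. The sketch composites a single RGBA pixel over white by hand; it is not how Pillow or ImageMagick implement it, and the function names flatten_pixel and drop_alpha are made up for this example.

```python
def flatten_pixel(pixel, background=(255, 255, 255)):
    """Composite one RGBA pixel over an opaque background color."""
    r, g, b, a = pixel
    alpha = a / 255.0
    return tuple(
        int(round(fg * alpha + bg * (1.0 - alpha)))
        for fg, bg in zip((r, g, b), background)
    )


def drop_alpha(pixel):
    """Discard the alpha channel without compositing."""
    return pixel[:3]


# A fully transparent pixel; its color channels are commonly stored as 0
transparent = (0, 0, 0, 0)
opaque_red = (255, 0, 0, 255)

print(drop_alpha(transparent))     # (0, 0, 0): black once alpha is gone
print(flatten_pixel(transparent))  # (255, 255, 255): white, like the page
print(flatten_pixel(opaque_red))   # (255, 0, 0): opaque pixels are unchanged
```

Simply dropping the alpha channel leaves whatever color was stored underneath, which is typically black. Compositing over white first gives the result a PDF viewer would show.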

    import os

    from PIL import Image as Image2
    from wand.image import Image
    from wand.color import Color

    # Not defined in the original listing: the URL prefix under which the
    # generated images are served. Set it to match your deployment.
    static_host = ''

    def convert_pdf_to_jpg(filename):
        # Base name without directory or extension
        title = os.path.splitext(os.path.basename(filename))[0]

        # resolution is the rendering resolution; background is the background color
        with Image(filename=filename, resolution=150, background=Color('White')) as img:

            # Number of pages
            length = len(img.sequence)

            # If there is more than one page, the page index is appended to each file name
            with img.convert('png') as converted:
                path = 'static/local_images/%s.png' % title
                converted.save(filename=path)

        image_list = []
        if length == 1:
            image_list.append('static/local_images/%s.png' % title)
        else:
            for i in range(length):
                image_list.append('static/local_images/%s-%d.png' % (title, i))

        jpg_list = []
        for png_path in image_list:
            # Make sure there is an alpha channel to use as the paste mask
            image = Image2.open(png_path).convert('RGBA')

            # Flatten onto an opaque white background, then drop the alpha channel
            background = Image2.new('RGBA', image.size, (255, 255, 255, 255))
            background.paste(image, (0, 0), image)
            image = background.convert('RGB')

            name = os.path.splitext(png_path)[0] + '.jpg'
            image.save(name)
            os.remove(png_path)
            jpg_list.append("%s/%s" % (static_host, name))

        return jpg_list
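
The file-naming rule used in the function above (one page gives title.png, several pages give title-0.png, title-1.png and so on) can be isolated into a small helper. This is a sketch only: expected_png_paths and the uploads/report.pdf path are hypothetical, and the default output directory is copied from the listing.

```python
import os


def expected_png_paths(pdf_path, page_count, out_dir='static/local_images'):
    """List the PNG paths the conversion is expected to produce."""
    # splitext strips only the final extension, so inner dots are kept
    title = os.path.splitext(os.path.basename(pdf_path))[0]
    if page_count == 1:
        return ['%s/%s.png' % (out_dir, title)]
    return ['%s/%s-%d.png' % (out_dir, title, i) for i in range(page_count)]


print(expected_png_paths('uploads/report.pdf', 1))
# ['static/local_images/report.png']
print(expected_png_paths('uploads/report.pdf', 2))
# ['static/local_images/report-0.png', 'static/local_images/report-1.png']
```

Because os.path.splitext removes only the last extension, a name such as my.report.v2.pdf keeps its inner dots, which splitting on the first '.' would not.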

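Word to PDF

Python has no built-in library that converts a Word document directly to PDF, so the approach here is to install libreoffice first and then invoke it as a system command through the os library.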

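    import os
    import shlex

    def convert_doc_to_pdf(filename):
        # libreoffice writes the PDF into the current working directory,
        # so the result is the base name with a .pdf extension
        name = os.path.splitext(os.path.basename(filename))[0] + '.pdf'

        # Quote the path so spaces or shell metacharacters do not break the command
        cmd = 'libreoffice --convert-to pdf %s' % shlex.quote(filename)
        os.system(cmd)
        return name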
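
As a variant of the function above, the same call can be made with subprocess and an argument list, which avoids the shell entirely and makes failures visible. This is a sketch under some assumptions: the function names are made up for this example, and it adds the --headless and --outdir options of libreoffice, which the original command does not use.

```python
import os
import subprocess


def build_convert_command(filename, outdir='.'):
    """Build the libreoffice argument list for converting one document to PDF."""
    return ['libreoffice', '--headless', '--convert-to', 'pdf',
            '--outdir', outdir, filename]


def convert_doc_to_pdf_subprocess(filename, outdir='.'):
    """Run the conversion and return the path where the PDF should be written."""
    # check=True raises CalledProcessError if libreoffice exits with a non-zero status
    subprocess.run(build_convert_command(filename, outdir), check=True)
    base = os.path.splitext(os.path.basename(filename))[0]
    return os.path.join(outdir, base + '.pdf')
```

Passing the arguments as a list means a file name containing spaces needs no quoting, and --outdir lets the caller decide where the PDF ends up instead of relying on the current working directory.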

  


Reposted from www.cnblogs.com/cityking/p/pdf_word_to_jpg.html