Automatically convert Word to PDF, and then PDF to images!

References:

https://blog.csdn.net/ynyn2013/article/details/49120731

https://www.jianshu.com/p/f57cc64b9f5e

 

First, convert doc to PDF

1. Install the dependency

pip install pywin32

 

2. Call the win32com interface directly to open the file and save it as PDF. The SaveAs argument 17 stands for the PDF format. When finished, close the document and quit Word.

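    def doc2pdf(self):
        w = None
        doc = None
        try:
            # Start Word and open the source document read-only
            w = Dispatch("Word.Application")
            doc = w.Documents.Open(self.docPath, ReadOnly=1)
            # 17 is wdFormatPDF
            doc.SaveAs(self.pdfPath, 17)
        except Exception:
            traceback.print_exc()
        finally:
            # Close only what was actually opened
            if doc is not None:
                doc.Close()
            if w is not None:
                w.Quit()
        self.checkFile(self.pdfPath, 'pdf')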
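
The "close the document, quit Word" cleanup can also be packaged so that it is hard to forget. The following is a minimal sketch, not the original author's code: `word_application`, `doc_to_pdf` and the `dispatch` parameter are hypothetical names introduced here, and `dispatch` exists only so the logic can be exercised without Word installed.

```python
import os
from contextlib import contextmanager

WD_FORMAT_PDF = 17  # wdFormatPDF


@contextmanager
def word_application(dispatch=None):
    """Start Word and guarantee Quit() runs, even if a conversion fails."""
    if dispatch is None:
        # Imported lazily so this module still loads where pywin32 is absent.
        from win32com.client import Dispatch as dispatch
    word = dispatch("Word.Application")
    try:
        yield word
    finally:
        word.Quit()


def doc_to_pdf(word, doc_path, pdf_path):
    """Save one document as PDF using an already running Word instance."""
    doc = word.Documents.Open(os.path.abspath(doc_path), ReadOnly=1)
    try:
        doc.SaveAs(os.path.abspath(pdf_path), WD_FORMAT_PDF)
    finally:
        # Runs whether or not SaveAs raised.
        doc.Close()
    return pdf_path
```

With this shape, several documents can share one Word instance: open `with word_application() as word:` once and call `doc_to_pdf(word, ...)` for each file inside the block.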

The file format constants and their values are listed below:

wdFormatDocument = 0
wdFormatDocument97 = 0
wdFormatDocumentDefault = 16
wdFormatDOSText = 4
wdFormatDOSTextLineBreaks = 5
wdFormatEncodedText = 7
wdFormatFilteredHTML = 10
wdFormatFlatXML = 19
wdFormatFlatXMLMacroEnabled = 20
wdFormatFlatXMLTemplate = 21
wdFormatFlatXMLTemplateMacroEnabled = 22
wdFormatHTML = 8
wdFormatPDF = 17
wdFormatRTF = 6
wdFormatTemplate = 1
wdFormatTemplate97 = 1
wdFormatText = 2
wdFormatTextLineBreaks = 3
wdFormatUnicodeText = 7
wdFormatWebArchive = 9
wdFormatXML = 11
wdFormatXMLDocument = 12
wdFormatXMLDocumentMacroEnabled = 13
wdFormatXMLTemplate = 14
wdFormatXMLTemplateMacroEnabled = 15
wdFormatXPS = 18
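
To avoid a bare 17 in the SaveAs call, the values from the table can be given names. The following is a minimal sketch: the `WdSaveFormat` enum copies a few rows of the table above, while the extension mapping and the `save_format_for` helper are hypothetical, introduced only for illustration.

```python
import os
from enum import IntEnum


class WdSaveFormat(IntEnum):
    """A subset of the Word save-format constants listed above."""
    wdFormatDocument = 0
    wdFormatText = 2
    wdFormatRTF = 6
    wdFormatHTML = 8
    wdFormatXMLDocument = 12
    wdFormatPDF = 17
    wdFormatXPS = 18


# Hypothetical mapping from an output extension to a constant. This is an
# assumption made for the example, not something Word itself defines.
_EXTENSION_TO_FORMAT = {
    ".doc": WdSaveFormat.wdFormatDocument,
    ".txt": WdSaveFormat.wdFormatText,
    ".rtf": WdSaveFormat.wdFormatRTF,
    ".html": WdSaveFormat.wdFormatHTML,
    ".docx": WdSaveFormat.wdFormatXMLDocument,
    ".pdf": WdSaveFormat.wdFormatPDF,
    ".xps": WdSaveFormat.wdFormatXPS,
}


def save_format_for(path):
    """Return the save-format constant matching the extension of `path`."""
    ext = os.path.splitext(path)[1].lower()
    try:
        return _EXTENSION_TO_FORMAT[ext]
    except KeyError:
        raise ValueError("no save format known for {!r}".format(ext))
```

When handing the value to Word, wrapping it as `int(save_format_for(self.pdfPath))` keeps the argument a plain integer.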

  

 

Second, convert the PDF to images

1. Install the dependencies

1.1. pip install pdf2image

1.2. Install and configure poppler on Windows
Windows users must install poppler for Windows ( http://blog.alivate.com.au/poppler-windows/ ), then add its bin/ folder to PATH (Start > type "env" > Edit the system environment variables > Environment Variables... > System variables > Path).
After installing poppler, you need to restart the system for the change to take effect.
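
Before running the conversion, it can help to confirm that the poppler command-line tools are actually reachable. The following is a minimal sketch: `missing_poppler_tools` is a hypothetical helper, and it assumes that the two executables pdf2image relies on are `pdftoppm` and `pdfinfo`.

```python
import shutil


def missing_poppler_tools(search_path=None):
    """Return the poppler executables that cannot be found.

    `search_path` uses the same format as the PATH variable; None means
    "use the current PATH".
    """
    tools = ("pdftoppm", "pdfinfo")
    return [tool for tool in tools if shutil.which(tool, path=search_path) is None]


if __name__ == "__main__":
    missing = missing_poppler_tools()
    if missing:
        print("not found on PATH: " + ", ".join(missing))
    else:
        print("poppler looks usable")
```

If editing PATH is not an option, `convert_from_path` also accepts a `poppler_path` argument that points at the bin/ folder directly.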
 
2. Convert the PDF to images
    def pdf2image(self):
        # Create the image folder
        self.imgFold = os.path.join(self.fileFold, self.fileName)
        if not os.path.exists(self.imgFold):
            os.mkdir(self.imgFold)

        # Save each page as an image
        pages = convert_from_path(self.pdfPath)
        imgPath = ''  # stays empty if the PDF has no pages
        for i, page in enumerate(pages):
            imgPath = os.path.join(self.imgFold, str(i)+'.jpg')
            page.save(imgPath, 'JPEG')
        self.checkFile(imgPath, 'last img')
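
One detail worth noting about the naming scheme above: files called 0.jpg, 1.jpg, ..., 10.jpg sort as 0, 1, 10, 2 in most file listings. The following is a minimal sketch of a zero-padded, 1-based alternative; `page_filename` and `pdf_to_images` are hypothetical names, and the `dpi` value is just an example setting passed through to `convert_from_path`.

```python
import os


def page_filename(index, total, ext="jpg"):
    """Build a zero-padded, 1-based file name such as '03.jpg'."""
    width = len(str(total))
    return "{0:0{1}d}.{2}".format(index + 1, width, ext)


def pdf_to_images(pdf_path, out_dir, dpi=200):
    """Render every page of `pdf_path` into `out_dir` as JPEG files."""
    # Imported lazily so page_filename works without pdf2image installed.
    from pdf2image import convert_from_path

    os.makedirs(out_dir, exist_ok=True)
    pages = convert_from_path(pdf_path, dpi=dpi)
    saved = []
    for i, page in enumerate(pages):
        img_path = os.path.join(out_dir, page_filename(i, len(pages)))
        page.save(img_path, "JPEG")
        saved.append(img_path)
    return saved
```

Returning the list of saved paths also gives the caller something to check, instead of inspecting only the last image.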

 

Third, convert Word directly to images

Method: combine steps 1 and 2.

The code is as follows:

import os
import traceback
from win32com.client import Dispatch
from pdf2image import convert_from_path

class Word2Pdf2Img():
    def __init__(self, docPath):
        # Initialize paths
        self.docPath = docPath
        self.fileName = os.path.basename(self.docPath).split('.')[0]
        self.fileFold = os.path.dirname(self.docPath)
        self.pdfPath = os.path.join(self.fileFold, self.fileName + '.pdf')

    @staticmethod
    def checkFile(filePath, fileType=''):
        # Report whether the expected output file was produced
        if os.path.isfile(filePath):
            print('file {} exists!'.format(fileType))
        else:
            print('file {} does not exist!'.format(fileType))

    def doc2pdf(self):
        w = None
        doc = None
        try:
            # Start Word and open the source document read-only
            w = Dispatch("Word.Application")
            doc = w.Documents.Open(self.docPath, ReadOnly=1)
            # 17 is wdFormatPDF
            doc.SaveAs(self.pdfPath, 17)
        except Exception:
            traceback.print_exc()
        finally:
            # Close only what was actually opened
            if doc is not None:
                doc.Close()
            if w is not None:
                w.Quit()
        self.checkFile(self.pdfPath, 'pdf')

    def pdf2image(self):
        # Create the image folder
        self.imgFold = os.path.join(self.fileFold, self.fileName)
        if not os.path.exists(self.imgFold):
            os.mkdir(self.imgFold)

        # Save each page as an image
        pages = convert_from_path(self.pdfPath)
        imgPath = ''  # stays empty if the PDF has no pages
        for i, page in enumerate(pages):
            imgPath = os.path.join(self.imgFold, str(i)+'.jpg')
            page.save(imgPath, 'JPEG')
        self.checkFile(imgPath, 'last img')

    def doc2image(self):
        # Word -> PDF -> images
        self.doc2pdf()
        self.pdf2image()
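
Two details of the path handling in `__init__` may be worth tightening. `os.path.basename(...).split('.')[0]` keeps only the text before the first dot, so a file named report.v2.docx would produce report.pdf, and Word, being a separate process, may not resolve a relative path against the script's working directory. The following is a minimal sketch of an alternative; `derive_paths` is a hypothetical helper, not part of the class above.

```python
import os


def derive_paths(doc_path):
    """Return (absolute doc path, PDF path, image folder) for a document."""
    doc_path = os.path.abspath(doc_path)
    # splitext removes only the final extension, so 'report.v2.docx'
    # becomes 'report.v2' rather than 'report'.
    stem = os.path.splitext(doc_path)[0]
    return doc_path, stem + ".pdf", stem
```

On a Windows machine with Word installed, the class itself would be used along the lines of `Word2Pdf2Img(r'C:\docs\report.docx').doc2image()`, where the path is only a placeholder.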

 

 

Origin www.cnblogs.com/Fosen/p/11835737.html