In daily work or study, we often run into frustrating moments like this:
"Xiao Ren, please type up the contents of this PDF and send it to me."
Ugh, just my luck: a 2 MB PDF, and there's no way I'll finish it by 12 o'clock!
When studying, you often find that the documents you need are PDFs, which are awkward to study from and reuse, so you want to convert them to Word files. You may have downloaded plenty of software from the Internet, only to find that it converts just the first five pages (WPS, for example) or charges a fee. Is there any free conversion software?
So here is a free, simple and fast method: we'll show you how to use Python to batch-process PDF files, extract the content you want, and save it in Word format.
Before implementing PDF-to-Word conversion, we need an environment for writing and running Python, plus the relevant dependency packages. For writing Python we recommend PyCharm; on a local machine, Anaconda makes installation and deployment very convenient.
The PDF-to-Word feature relies on the following classes from the pdfminer3k package:
- PDFParser (document parser)
- PDFDocument (document object)
- PDFResourceManager (resource manager)
- PDFPageInterpreter (page interpreter)
- PDFPageAggregator (aggregator)
- LAParams (layout analysis parameters)
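As a rough sketch of how these classes fit together, assuming pdfminer3k (which exposes the older pdfminer API, where PDFDocument lives in `pdfminer.pdfparser`), a text-extraction pipeline might look like the following. The function names `extract_text_boxes` and `clean_paragraphs` are hypothetical, and `LTTextBoxHorizontal`, which is not in the list above, is the layout class used here to pick out blocks of text. The pdfminer imports sit inside the function so the module can be loaded before pdfminer3k is installed.

```python
from typing import List


def clean_paragraphs(raw_texts: List[str]) -> List[str]:
    """Strip surrounding whitespace and drop empty text boxes."""
    return [t.strip() for t in raw_texts if t.strip()]


def extract_text_boxes(pdf_path: str) -> List[str]:
    """Return the text of every horizontal text box in the PDF, page by page.

    Hypothetical helper; assumes the pdfminer3k package (old pdfminer API).
    """
    # Imported lazily so this module loads even if pdfminer3k is missing.
    from pdfminer.pdfparser import PDFParser, PDFDocument
    from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
    from pdfminer.converter import PDFPageAggregator
    from pdfminer.layout import LAParams, LTTextBoxHorizontal

    texts = []
    with open(pdf_path, "rb") as fp:
        parser = PDFParser(fp)           # document parser
        doc = PDFDocument()              # document object
        parser.set_document(doc)         # link parser and document both ways
        doc.set_parser(parser)
        doc.initialize()                 # uses an empty password by default
        if not doc.is_extractable:
            raise ValueError("text extraction is not allowed for this PDF")

        rsrcmgr = PDFResourceManager()   # shared fonts and other resources
        device = PDFPageAggregator(rsrcmgr, laparams=LAParams())
        interpreter = PDFPageInterpreter(rsrcmgr, device)

        for page in doc.get_pages():
            interpreter.process_page(page)
            layout = device.get_result()  # page object with analysed layout
            for element in layout:
                if isinstance(element, LTTextBoxHorizontal):
                    texts.append(element.get_text())
    return clean_paragraphs(texts)
```

Calling `extract_text_boxes("example.pdf")` (a placeholder path) would return a list of text blocks that a later step could write into a Word document.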
Initial preparation
Note: this article uses Python 3.6 (the latest version at the time of writing) on Windows 7.
1. Install the pdfminer3k module
Once Anaconda is installed, pdfminer3k can be installed directly with pip: pip install pdfminer3k
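To confirm the installation worked, a small check like this one (the function name `pdfminer_installed` is hypothetical) looks up the `pdfminer` package without importing it. One caveat: pdfminer.six also installs a package named `pdfminer`, with a different API, so this only tells you that some pdfminer is importable.

```python
import importlib.util


def pdfminer_installed() -> bool:
    """Return True if a package named `pdfminer` can be found on the path."""
    # find_spec locates the package without importing it; None means not found.
    return importlib.util.find_spec("pdfminer") is not None


if __name__ == "__main__":
    print("pdfminer available:", pdfminer_installed())
```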