Step by step | Batch-convert PDF to Word with 20 lines of Python

In daily work or study, we often run into a frustrating situation like this:

"Xiao Ren, please type up the contents of this PDF and send it to me."

Just my luck: with a 2 MB PDF, there is no way I will finish before 12 o'clock!


When studying, you often find that documents come in PDF format, which is inconvenient for learning and reuse, so you need to convert them to Word. You may have downloaded plenty of software from the Internet, only to find that it converts just the first five pages (WPS, for example) or charges a fee. Is there any free way to do the conversion?

So here is a free, simple, and fast method: use Python to batch-process PDF files, extract the content you want, and save it as Word documents.
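As a minimal sketch of the "batch" part, the snippet below collects every PDF in a folder and pairs it with the Word file name it would be saved under. The function name `plan_conversions` and the choice of a `.docx` file next to each PDF are assumptions made for illustration, not something the article prescribes.

```python
from pathlib import Path


def plan_conversions(folder):
    """Pair every PDF in `folder` with a .docx path that has the same base name."""
    folder = Path(folder)
    # Match the extension case-insensitively so "report.PDF" is picked up too.
    pdfs = sorted(p for p in folder.iterdir() if p.suffix.lower() == ".pdf")
    return [(pdf, pdf.with_suffix(".docx")) for pdf in pdfs]


if __name__ == "__main__":
    # Scan the current directory and show what would be converted.
    for source, target in plan_conversions("."):
        print(source.name, "->", target.name)
```

Planning the input/output pairs first keeps the conversion loop itself simple: it only has to read one path and write the other.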

Before implementing the PDF-to-Word function, we need an environment for writing and running Python, with the required dependencies installed. We recommend PyCharm for writing the code; on a local machine, Anaconda makes installing and deploying Python very convenient.

The PDF-to-Word function relies on the following classes from the pdfminer3k package:

  • PDFParser (document parser)
  • PDFDocument (document object)
  • PDFResourceManager (resource manager)
  • PDFPageInterpreter (page interpreter)
  • PDFPageAggregator (aggregator)
  • LAParams (layout analysis parameters)
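The classes listed above can be wired together roughly as follows once pdfminer3k is installed. This is a sketch under assumptions rather than the article's own code: the function name `extract_pdf_text` is hypothetical, the import paths are the ones pdfminer3k uses as far as I know, and `LTTextBoxHorizontal` is an extra layout class not named in the list.

```python
def extract_pdf_text(pdf_path):
    """Return the text of each horizontal text box in a PDF, page by page."""
    # Imported inside the function so the module loads even without pdfminer3k.
    from pdfminer.pdfparser import PDFParser, PDFDocument
    from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
    from pdfminer.converter import PDFPageAggregator
    from pdfminer.layout import LAParams, LTTextBoxHorizontal

    chunks = []
    with open(pdf_path, "rb") as fp:
        # Connect the parser and the document object to each other.
        parser = PDFParser(fp)
        doc = PDFDocument()
        parser.set_document(doc)
        doc.set_parser(parser)
        doc.initialize()  # empty password by default

        if not doc.is_extractable:
            raise ValueError("text extraction is not allowed for " + str(pdf_path))

        # The resource manager caches shared resources such as fonts.
        rsrcmgr = PDFResourceManager()
        # The aggregator collects the layout objects of each processed page.
        device = PDFPageAggregator(rsrcmgr, laparams=LAParams())
        interpreter = PDFPageInterpreter(rsrcmgr, device)

        for page in doc.get_pages():
            interpreter.process_page(page)
            layout = device.get_result()
            for element in layout:
                # Keep only text boxes; skip images, lines, and figures.
                if isinstance(element, LTTextBoxHorizontal):
                    chunks.append(element.get_text())
    return chunks
```

Each returned string could then be written out as one paragraph of the target Word file.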

Initial preparation

Note: This article uses Python 3.6, the latest version at the time of writing, on Windows 7.

1. Install the pdfminer3k module

After installing Anaconda, the module can be installed directly with pip.
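The command is typically `pip install pdfminer3k`, run from the Anaconda prompt. To confirm that the installation worked, a small check like the one below may help. It is only a sketch: the helper name `is_installed` is made up for illustration, and it assumes the package is importable under the name `pdfminer`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("pdfminer"):
        print("pdfminer is available")
    else:
        print("pdfminer not found; try: pip install pdfminer3k")
```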
