Python | Converting PDF files to images: this one method is enough

1. Background

Sometimes you need to convert the pages of a PDF file to images, for example in PNG or JPEG format.
pdfplumber, an open-source Python library, provides a method for rendering PDF pages as images.
If you have not installed or used pdfplumber before, see the earlier article on installing it and its basic usage:
How to install, import and basic use of pdfplumber

2. Converting PDF files into images with pdfplumber in detail

The method provided by pdfplumber, called on a page object:
page.to_image(resolution=150) returns an instance of the PageImage class.
Parameter: resolution={integer} sets the resolution of the rendered image in dots per inch (DPI). It can be omitted; the default resolution is 72.
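To make the resolution parameter more concrete: PDF page sizes are measured in points (1/72 inch), so at the default resolution of 72 one point corresponds to one pixel, and at 150 the image is scaled by 150/72. The helper below is a hypothetical sketch and not part of pdfplumber. It only estimates the image size, and the rounding pdfplumber itself applies may differ by a pixel.

```python
def estimated_image_size(page_width_pt, page_height_pt, resolution=72):
    """Estimate the pixel size of a page rendered at `resolution` DPI.

    Hypothetical helper for illustration; page sizes are given in PDF
    points, where 72 points equal one inch.
    """
    width_px = round(page_width_pt * resolution / 72)
    height_px = round(page_height_pt * resolution / 72)
    return width_px, height_px


# A US Letter page is 612 x 792 points.
print(estimated_image_size(612, 792))       # (612, 792)
print(estimated_image_size(612, 792, 150))  # (1275, 1650)
```

With pdfplumber, page.width and page.height give the page size in points, so the same estimate can be made for any page before rendering it.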
Usage is shown below. Just change the paths in the code snippet to the PDF file you want to convert and the location where the images should be saved.

import pdfplumber

# Path of the PDF file to convert to images
file_path = r'E:\pytest\pdfp\test\test.pdf'
with pdfplumber.open(file_path) as pdf:
    # pdf.pages contains every page of the PDF by default
    # Slice it to convert only one or a few pages, e.g. the first 2 pages: pdf.pages[:2]
    for i, page in enumerate(pdf.pages[:2]):
        im = page.to_image(resolution=150)
        # Save the page image
        im.save(r'E:\pytest\pdfp\test\page-{}.png'.format(i + 1))
        print('---- separator, page %d ----' % (i + 1))

The code above converts the first 2 pages of test.pdf into images and saves them as page-1.png and page-2.png in the same folder.
Some basic methods of the PageImage class

Method: Definition
im.reset(): Resets the image, clearing everything that has been drawn on it.
im.copy(): Copies the image to a new PageImage instance.
im.save(path_or_fileobject, format=""): Saves the image. The path_or_fileobject parameter is required. When passing a path, give the full output path including the file name and an image extension such as .png or .jpeg. The format parameter is the image format and can be omitted.
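The three methods above can be combined as in the following minimal sketch. The function name and its parameters are hypothetical. It also uses draw_rects(), a PageImage drawing method that this article does not cover, so treat that part as an assumption about how you might want to annotate the image.

```python
def save_annotated_and_clean(im, rects, annotated_path, clean_path):
    """Save one annotated and one clean PNG from a PageImage.

    `im` is expected to be the object returned by page.to_image();
    `rects` is a list of objects with bounding boxes, for example the
    result of page.extract_words().
    """
    work = im.copy()                  # work on a copy, leaving `im` as it was
    work.draw_rects(rects)            # draw a rectangle around each object
    work.save(annotated_path, format="PNG")
    work.reset()                      # clear everything drawn so far
    work.save(clean_path, format="PNG")
    return work


# Possible usage (paths are placeholders):
# with pdfplumber.open("test.pdf") as pdf:
#     page = pdf.pages[0]
#     save_annotated_and_clean(page.to_image(resolution=150),
#                              page.extract_words(),
#                              "page-1-words.png", "page-1.png")
```

Drawing on a copy keeps the original PageImage reusable, and reset() makes it possible to save a clean version without rendering the page again.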

3. Common problems and solutions

If you run the code above as is, the call to pdfplumber's to_image() method may raise an error. There are two main error cases:

(1) ImageMagick is not installed locally

Key error message: You probably had not installed ImageMagick library.
Cause: ImageMagick is not installed locally. ImageMagick is free software for creating, editing, and compositing images.
Solution: Follow the link in the error message, download the ImageMagick installer, and install it.
The installer downloaded for this article is: ImageMagick-7.1.1-15-Q16-HDRI-x64-dll.exe
A simple ImageMagick installation tutorial
Double-click the downloaded installer and click [next] at each prompt to install it.
You can change the default installation path and options during installation if needed. Otherwise, just keep clicking [next]: the default installation is sufficient, and it already creates a desktop shortcut and, very importantly, configures the system environment variables.
[Screenshots: key steps of the ImageMagick installation process]
Once ImageMagick is installed, this error no longer occurs.
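One quick way to confirm that the installer configured the environment variables is to look for the ImageMagick command on PATH. This is a minimal sketch that assumes ImageMagick 7, whose command-line tool is named magick. The helper name is hypothetical, and finding the command is only an indicator rather than proof that to_image() will work.

```python
import shutil


def find_command(name):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


magick_path = find_command("magick")
if magick_path:
    print("ImageMagick command found at:", magick_path)
else:
    print("magick is not on PATH; check the installation or restart the terminal/IDE.")
```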

(2) gswin64c.exe cannot be found

Key error message: FailedToExecuteCommand “gswin64c.exe”.
Cause: gswin64c.exe, the Ghostscript command-line program that ImageMagick calls to read PDF files, was not installed along with ImageMagick.
Solution: Obtain a gs.exe file and use it in place of gswin64c.exe.
The detailed steps are as follows:
Step 1: Download the required files.
Download link:
https://mirrors.tuna.tsinghua.edu.cn/gnu/octave/windows/
The version downloaded for this article is:
octave-8.2.0-w64.zip. (Downloading directly from the mirror page above is recommended: the file is large, and network drives and other sources are very slow.)
It is used here together with the ImageMagick version installed in this article (ImageMagick-7.1.1-15-Q16-HDRI-x64).
Step 2: Unzip octave-8.2.0-w64.zip, then rename the file [gs.exe] in the \octave-8.2.0-w64\mingw64\bin folder to [gswin64c.exe].
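Step 2 can also be scripted. The sketch below copies gs.exe to gswin64c.exe instead of renaming it, which is a deliberate variation on the step above so that the original file stays in place. The function name is hypothetical, and the folder passed in is whatever location you unzipped Octave to.

```python
import shutil
from pathlib import Path


def add_gswin64c_copy(bin_dir):
    """Create gswin64c.exe next to gs.exe in `bin_dir` and return its path."""
    bin_dir = Path(bin_dir)
    source = bin_dir / "gs.exe"
    target = bin_dir / "gswin64c.exe"
    if not source.is_file():
        raise FileNotFoundError(f"gs.exe not found in {bin_dir}")
    if not target.exists():
        # Copy rather than rename so that gs.exe itself remains available.
        shutil.copy2(source, target)
    return target


# Example call, using the folder from this article:
# add_gswin64c_copy(r"E:\soft\ImageMagick\octave-8.2.0-w64\mingw64\bin")
```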
Step 3: Configure environment variables.
Open [Settings] - [System] - [About] - [Advanced System Settings] (on the far right) - [Environment Variables], find the [Path] variable under the system variables, and click [Edit]. Then click [New] and add the full path of \octave-8.2.0-w64\mingw64\bin (in this article: E:\soft\ImageMagick\octave-8.2.0-w64\mingw64\bin). Finally, click [OK] to confirm.
After these three steps, the error should be resolved. If the same error still appears, restart the Python environment (for example, close PyCharm and reopen it) so that it picks up the new environment variables.
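If you cannot edit the system environment variables, a possible alternative is to extend PATH from inside the Python script before calling to_image(). This is only a sketch under that assumption: the helper name is hypothetical, the change affects only the current Python process and the programs it starts, and it has not been verified against every ImageMagick setup.

```python
import os
import shutil


def prepend_to_path(directory):
    """Put `directory` at the front of PATH for the current process only."""
    parts = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        os.environ["PATH"] = os.pathsep.join([directory] + parts)
    return os.environ["PATH"]


# Folder used in this article; adjust it to where you unzipped Octave.
bin_dir = r"E:\soft\ImageMagick\octave-8.2.0-w64\mingw64\bin"
prepend_to_path(bin_dir)

# shutil.which returns the full path if gswin64c can be found, otherwise None.
print("gswin64c resolves to:", shutil.which("gswin64c"))
```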

4. Summary

Converting a PDF to images with pdfplumber is simple, and the code is short.
The more troublesome part is the local environment: ImageMagick and gswin64c.exe have to be installed and configured before the conversion works.

-end-

Origin blog.csdn.net/LHJCSDNYL/article/details/132520958