[Python • Image recognition] pytesseract quickly recognizes and extracts text in images


Tip: This article contains many images, so watch your data usage if you are reading on a mobile connection.


Foreword

There are many ways to recognize and extract the text in an image with Python. If you want something simple, you can use the Tesseract recognition engine: once it is set up, a single line of code extracts the text from an image.


1. Configure the environment

1. Install Python dependencies

This program uses two Python libraries, pytesseract and PIL (provided by the Pillow package), so install them first.

Run the following commands:

pip install Pillow
pip install pytesseract 

If importing both libraries in Python raises no error, they were installed successfully.
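The import check described above can also be scripted. The following is a minimal sketch rather than part of the original program: the helper name missing_modules is made up for illustration, and it only tests whether Python can locate each module.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that Python cannot locate as modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Pillow is imported under the module name PIL.
    missing = missing_modules(["PIL", "pytesseract"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("Both libraries are installed.")
```

If anything is listed as not installed, re-run the corresponding pip command.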

2. Install the recognition engine

After installing the two dependencies, you also need the recognition engine itself, because pytesseract only calls Tesseract and does not include it. Download the Tesseract installer.

We directly use the latest version, built on May 10.

Install the Tesseract recognition engine (can be skipped)

After the download is complete, run the installer. First select the language: choose English, then click OK.

Next, click Next, then click I Agree to accept the license agreement.
Choose to install for all users, then click Next.
Then select the Chinese language packs, which are needed to recognize Chinese. Scroll to the bottom of the list and select Chinese; I selected both horizontal and vertical Simplified Chinese. Click Next when done.
Select the installation path (installing to a drive other than C is recommended), then click Next.
Click Install to start the installation.

Wait for the installation to complete.
After the installation is complete, click Next, then click Finish.

Verify that the installation was successful

Add the folder you installed Tesseract to directly to the PATH environment variable, then run tesseract -v on the command line. If it prints the Tesseract version information, the installation succeeded.
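The same check can be done from Python. This is only a sketch under a few assumptions: the function name tesseract_version is made up for illustration, and it assumes the executable is called tesseract and is reachable through PATH.

```python
import shutil
import subprocess


def tesseract_version(executable="tesseract"):
    """Return the first line printed by `tesseract -v`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "-v"], capture_output=True, text=True)
    # Fall back to stderr in case a build writes the version there.
    lines = (result.stdout or result.stderr).splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("tesseract was not found on PATH")
    else:
        print(version)
```

If the function returns None even though the engine is installed, the PATH change may not have taken effect yet; opening a new terminal usually helps. pytesseract can also be pointed at the executable directly by assigning its full path to pytesseract.pytesseract.tesseract_cmd.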

2. Usage steps

1. Import library

from PIL import Image
import pytesseract

2. Extract image text

Wrap the one line of code that reads the image into a function:

def read_image(name):
    # lang='chi_sim' uses the Simplified Chinese language pack selected during installation
    print(pytesseract.image_to_string(Image.open(name), lang='chi_sim'))

Then just call it directly in the main function:

def main():
    read_image('1657158527412.jpg')
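If you want to use the recognized text rather than only print it, the function can return the string instead. The variant below is a sketch, not the article's code: the names extract_text and join_languages are hypothetical, and combining several language codes with + relies on Tesseract's LANG+LANG syntax for its language option.

```python
def join_languages(codes):
    """Join Tesseract language codes into the 'a+b' form accepted by the lang argument."""
    return "+".join(codes)


def extract_text(path, codes=("chi_sim",)):
    """Return the text recognized in the image at `path` instead of printing it."""
    # Imported here so join_languages stays usable without the OCR libraries installed.
    from PIL import Image
    import pytesseract

    # The context manager closes the image file once recognition is done.
    with Image.open(path) as image:
        return pytesseract.image_to_string(image, lang=join_languages(codes))
```

For example, extract_text('img.png', codes=('chi_sim', 'eng')) would try both Simplified Chinese and English, assuming the English data is installed as well; img.png is a placeholder file name.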

3. Operation effect

Take the following image as an example:
[Figure: sample input image]

The result of running the program is as follows:
[Figure: recognized text printed by the program]
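Recognition with lang='chi_sim' only works if the Simplified Chinese language pack was selected during installation. The sketch below is one way to check this, assuming tesseract is on PATH and that tesseract --list-langs prints a header line followed by one language code per line; the function names are made up for illustration.

```python
import shutil
import subprocess


def parse_languages(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    codes = []
    for line in output.splitlines():
        line = line.strip()
        # The header line contains spaces; language codes do not.
        if line and " " not in line:
            codes.append(line)
    return codes


def installed_languages(executable="tesseract"):
    """Return the installed language codes, or an empty list if Tesseract is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return []
    result = subprocess.run([path, "--list-langs"], capture_output=True, text=True)
    # Check both streams, in case a build writes the list to stderr.
    return parse_languages(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    if "chi_sim" in installed_languages():
        print("chi_sim is available")
    else:
        print("chi_sim was not found; re-run the installer and select the Chinese language packs")
```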


Summary

This article introduced pytesseract, the Python library for calling Tesseract. It only covers extracting text from images; the library's other features are not covered here. If you are interested, you can explore them in more depth, and you are welcome to discuss them with me.

Full code

from PIL import Image
import pytesseract


def read_image(name):
    print(pytesseract.image_to_string(Image.open(name), lang='chi_sim'))


def main():
    read_image('img.png')


if __name__ == '__main__':
    main()


Origin blog.csdn.net/weixin_47754149/article/details/125651707