The most complete Tesseract-OCR engine installation steps! Suitable for web crawlers that need to handle verification codes (CAPTCHAs)!

  1. Download

You need to install Tesseract-OCR manually. The installer can be downloaded here:
http://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-4.00.00dev.exe
Download it and save it under any path you like.

  2. Installation
    Next, run the installer:
    select the downloaded package,
    choose a suitable installation path (I personally recommend not installing on the C drive),
    and click Install and continue until it finishes.
    An error may be reported partway through. It means that the two language packs ticked earlier failed to download, but it did not affect my later use.
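
To confirm that the installer really left a working executable behind, before touching any environment variables, a small check like the one below could be used. It is only a sketch: the helper name `tesseract_version` is made up for illustration, and it assumes the executable is called tesseract.exe and sits directly in the installation folder.

```python
import subprocess
from pathlib import Path


def tesseract_version(install_dir):
    """Return the first line of `tesseract --version`, or None if the exe is missing."""
    exe = Path(install_dir) / "tesseract.exe"  # assumed executable name and location
    if not exe.is_file():
        return None
    result = subprocess.run([str(exe), "--version"], capture_output=True, text=True)
    # Depending on the build, the version banner may go to stdout or stderr.
    lines = (result.stdout or result.stderr).strip().splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    # B:\Tesseract-OCR is the example path from this post; use your own.
    print(tesseract_version(r"B:\Tesseract-OCR"))
```

If this prints None, the folder does not contain tesseract.exe and the installation path is probably wrong.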

  3. Environment configuration
    Find your installation path: B:\Tesseract-OCR (this is mine).
    Open the advanced system settings and configure the environment variables.
    User variables: add the installation path.
    System variables: add a variable whose name matches mine (shown in the original screenshot) and whose value is your own path.
    If you need more language packs, you can find them through this official account. It is someone else's; download them yourself if you need them, otherwise skip this step. Once installed, Tesseract can recognize verification codes made of digits and English letters, but Chinese does not seem to work!
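
A quick way to see whether the variables took effect is to inspect them from Python. The sketch below rests on two assumptions that the text above does not confirm: that the installation folder was added to PATH, and that the system variable is `TESSDATA_PREFIX`, the variable Tesseract itself reads to locate its tessdata folder. The helper name `check_environment` is hypothetical.

```python
import os
import shutil


def check_environment(env=None):
    """Report whether tesseract is reachable on PATH and where tessdata is expected."""
    env = os.environ if env is None else env
    return {
        # shutil.which searches the given PATH string for the executable
        "tesseract_on_path": shutil.which("tesseract", path=env.get("PATH", "")),
        # TESSDATA_PREFIX is an assumption; use whatever variable name you configured
        "tessdata_prefix": env.get("TESSDATA_PREFIX"),
    }


if __name__ == "__main__":
    print(check_environment())
```

Environment variable changes only reach programs started afterwards, so restart the terminal or Jupyter before running the check.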

  4. Install the packages needed by Python!
    Install the pytesseract library:

	pip install pytesseract

Install the PIL library (distributed as the pillow package):

	pip install pillow
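
To check that both installs landed in the Python environment you are actually running, something like the following could be used. It is a minimal sketch with a hypothetical helper name, `missing_packages`. Note that pillow is imported under the name `PIL`.

```python
from importlib import util


def missing_packages(modules=("pytesseract", "PIL")):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in modules if util.find_spec(name) is None]


if __name__ == "__main__":
    # An empty list means both packages are importable.
    print(missing_packages())
```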

After installing, go to the folder where the package was installed:
E:\Anaconda 2019.03\Lib\site-packages\pytesseract
(this is mine). Open pytesseract.py there and change the tesseract_cmd value to your own installation path followed by tesseract.exe, the executable in that folder, just as I did.
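
Instead of editing the installed file, the same setting can also be changed at runtime through `pytesseract.pytesseract.tesseract_cmd`. The sketch below shows one way to do that. The helper names `tesseract_exe` and `configure_pytesseract` are made up for illustration, and B:\Tesseract-OCR is just the example path from this post.

```python
from pathlib import PureWindowsPath


def tesseract_exe(install_dir):
    """Build the full Windows path to tesseract.exe inside install_dir."""
    return str(PureWindowsPath(install_dir) / "tesseract.exe")


def configure_pytesseract(install_dir):
    """Point pytesseract at the executable for the current Python session only."""
    import pytesseract  # imported here so tesseract_exe works without it

    pytesseract.pytesseract.tesseract_cmd = tesseract_exe(install_dir)
    return pytesseract.pytesseract.tesseract_cmd


if __name__ == "__main__":
    print(tesseract_exe(r"B:\Tesseract-OCR"))
```

The runtime setting has to be repeated in every script or notebook, but it survives reinstalling or upgrading pytesseract, which would overwrite a hand-edited file.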

  5. Test
    Run the following code in Jupyter. If it runs without errors, the configuration works. (Note: the verification code image "captcha.jpg" must be in the same folder as the Jupyter notebook.)

	import pytesseract
	from PIL import Image
	# Create an Image object from the captcha file
	image = Image.open('captcha.jpg')
	# Convert the text in the image to a string
	text = pytesseract.image_to_string(image)
	print(text)

After a successful run, the verification code in the image is extracted!
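
Raw OCR output usually carries stray whitespace or punctuation, which matters when a crawler has to submit the code. A slightly more defensive variant of the test might look like the sketch below. The helper names `clean_captcha_text` and `read_captcha` are hypothetical, and the choices made here (converting to grayscale, passing `--psm 7` so Tesseract treats the image as a single text line, keeping only letters and digits) are assumptions about typical captchas rather than part of the original steps.

```python
import re


def clean_captcha_text(raw):
    """Keep only ASCII letters and digits from raw OCR output."""
    return re.sub(r"[^A-Za-z0-9]", "", raw)


def read_captcha(path):
    """OCR a captcha image as a single line of text and clean the result."""
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        gray = image.convert("L")  # grayscale often helps on noisy captchas
        # --psm 7 asks Tesseract to treat the image as one text line
        raw = pytesseract.image_to_string(gray, config="--psm 7")
    return clean_captcha_text(raw)
```

Calling `read_captcha('captcha.jpg')` would then return just the characters of the code, provided Tesseract recognizes them.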

If you succeed, remember to give me a like!


Origin blog.csdn.net/qq_46295527/article/details/105799380