While crawling the site, a verification code (captcha) page appears.
The most fortunate case is the kind of captcha encountered here:
this type of verification code is the easiest to recognize.
Most descriptions online use pytesser, which relies on Tesseract, the open-source OCR project backed by Google.
I tried it for a long time and could not get it to work; I do not know why. Anyone interested can look into it on their own.
It requires downloading two files, which you can find by searching Baidu. Posts from many years ago all use this approach.
The method described below worked for me. It also makes use of Tesseract-OCR.
My environment: 64-bit Windows 10 + Python 2.7
Step 1. Install the PIL library
Because this involves image recognition, PIL (Python Imaging Library) must be installed to handle the image processing.
The recommended way to install it is:
pip install pillow
For instructions on using Pillow, see its documentation.
Step 2. Install Tesseract-OCR
Download it from https://github.com/UB-Mannheim/tesseract/wiki
I downloaded tesseract-ocr-w64-setup-v5.0.0-alpha.20191030.exe
Installing with the default options is fine.
Step 3. Configuration
3-1. Append the Tesseract-OCR installation path to the PATH environment variable.
If everything is working, running tesseract -v displays the following:
C:\>tesseract -v
tesseract v5.0.0-alpha.20191030
leptonica-1.78.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found AVX2
Found AVX
Found FMA
Found SSE
Found libarchive 3.3.2 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5
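The same check can be done from Python before any OCR is attempted. This is a minimal sketch, not part of the original setup; the helper name `tesseract_version` is hypothetical, and it only assumes that the `tesseract` command accepts `-v` as shown above.

```python
import subprocess

def tesseract_version(cmd="tesseract"):
    """Return the first line printed by `<cmd> -v`, or None if it cannot be run."""
    try:
        # Merge stderr into stdout, since the version banner may go to either.
        out = subprocess.check_output([cmd, "-v"], stderr=subprocess.STDOUT)
    except (OSError, subprocess.CalledProcessError):
        # OSError covers "command not found", i.e. PATH is not set up yet.
        return None
    lines = out.decode("utf-8", "replace").splitlines()
    return lines[0].strip() if lines else None

print(tesseract_version() or "tesseract was not found on PATH")
```

If this prints the "not found" message, the PATH change from 3-1 has probably not taken effect yet (a new console window is usually needed).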
3-2. Add an environment variable named TESSDATA_PREFIX whose value is the tessdata folder inside your Tesseract-OCR installation directory. This is the directory that contains the file eng.traineddata.
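To confirm that TESSDATA_PREFIX points at the right folder, a small check like the following may help. It is only a sketch: the function name `find_traineddata` and its `env` parameter are assumptions made for illustration, and it simply looks for the eng.traineddata file mentioned above.

```python
import os

def find_traineddata(lang="eng", env=None):
    """Return the path to <lang>.traineddata under TESSDATA_PREFIX, or None."""
    env = os.environ if env is None else env
    prefix = env.get("TESSDATA_PREFIX")
    if not prefix:
        return None  # the variable is not set at all
    path = os.path.join(prefix, lang + ".traineddata")
    # Only report the path if the language file is really there.
    return path if os.path.isfile(path) else None

print(find_traineddata() or "eng.traineddata not found via TESSDATA_PREFIX")
```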
3-3. Edit pytesseract.py, located in Lib\site-packages\pytesseract under your Python installation.
This file contains the following line:
tesseract_cmd = 'tesseract'
Change it to:
tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'
That is, the full path to tesseract.exe.
Note: write the path with forward slashes as shown, otherwise it will fail. Try it yourself if you do not believe it.
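The reason the slash direction matters is how Python string literals treat backslashes: in an ordinary string, `\t` is a tab character, so a path ending in `\tesseract.exe` gets corrupted. The sketch below illustrates this with the installation path used above; the variable names are only for illustration.

```python
# In a normal string literal, "\t" becomes a tab and silently breaks the path.
broken = "C:/Program Files/Tesseract-OCR\tesseract.exe"

# Forward slashes and raw strings both keep the path intact.
forward = "C:/Program Files/Tesseract-OCR/tesseract.exe"
raw = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

print("\t" in broken)   # True
print("\t" in forward)  # False
print("\t" in raw)      # False
```

pytesseract also exposes this setting as `pytesseract.pytesseract.tesseract_cmd`, so the path can be assigned at run time in your own script instead of editing the installed file.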
Step 4. Usage
Let's try recognizing the verification code.
Code
# coding: utf-8
from PIL import Image, ImageEnhance
import pytesseract

im = Image.open("yzm.aspx.jfif")
image = im.convert('L')  # convert to grayscale
im2 = ImageEnhance.Contrast(image)  # create a contrast enhancer
im3 = im2.enhance(2.0)  # double the contrast
text = pytesseract.image_to_string(im3)
print(text)
yzm.aspx.jfif is the file name under which the verification picture from the site was saved.
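The script converts the image to grayscale and boosts the contrast, but it does not apply a hard black-and-white threshold. For captchas with a noisy background, an explicit binarization step may help. The following is a minimal sketch under that assumption: the function name `preprocess` and the threshold value 128 are choices made for illustration, not values from the original post.

```python
from PIL import Image, ImageEnhance

def preprocess(im, contrast=2.0, threshold=128):
    """Grayscale, boost contrast, then binarize to pure black and white."""
    gray = im.convert("L")
    boosted = ImageEnhance.Contrast(gray).enhance(contrast)
    # Pixels brighter than the threshold become white, the rest black.
    return boosted.point(lambda p: 255 if p > threshold else 0)

# Usage (file name as in the script above):
# cleaned = preprocess(Image.open("yzm.aspx.jfif"))
# text = pytesseract.image_to_string(cleaned)
```

The right threshold depends on the captcha; it is worth saving the processed image and looking at it before passing it to Tesseract.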
Run Results
PFRD
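The raw string returned by the OCR step may contain spaces, newlines, or other stray characters around the recognized text. If the captcha is known to consist only of letters and digits (an assumption that holds for the examples here), a small cleanup helper such as the hypothetical `clean_captcha_text` below can normalize the result before it is submitted.

```python
import re

def clean_captcha_text(text):
    """Keep only ASCII letters and digits from raw OCR output."""
    return re.sub(r"[^0-9A-Za-z]", "", text)

print(clean_captcha_text("PF RD\n\x0c"))  # PFRD
```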
Supplementary
I then tested the verification code from another site (irregular gaps between characters, slanted characters).
The result was 6823, so it can be recognized as well.
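For a digits-only captcha like this one, it may help to tell Tesseract that the image is a single line of text and to restrict the allowed characters. The sketch below only builds the option string; `build_tesseract_config` is a hypothetical helper, while `--psm` and `tessedit_char_whitelist` are Tesseract options. Whether the whitelist is honored depends on the Tesseract version and engine, so treat this as something to try rather than a guaranteed improvement.

```python
def build_tesseract_config(psm=7, whitelist=None):
    """Build a Tesseract option string; psm 7 means 'a single text line'."""
    parts = ["--psm %d" % psm]
    if whitelist:
        # Restrict recognition to the given characters.
        parts.append("-c tessedit_char_whitelist=%s" % whitelist)
    return " ".join(parts)

config = build_tesseract_config(whitelist="0123456789")
print(config)  # --psm 7 -c tessedit_char_whitelist=0123456789

# Usage with the script above:
# text = pytesseract.image_to_string(im3, config=config)
```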