Cracking CAPTCHAs with Tesseract

1. Install tesseract


  OCR (Optical Character Recognition) is the process of scanning characters and translating them into electronic text based on their shapes. The characters in a graphical CAPTCHA look irregular, but they are really just ordinary characters that have been slightly distorted.

Tesseract download: https://digi.bib.uni-mannheim.de/tesseract/

The download page lists a variety of .exe installers; here you can choose a 3.0-series version.

File names containing "dev" are development builds, and those without "dev" are stable releases. Choose a build without "dev", for example tesseract-ocr-setup-3.05.02.exe.

After the download completes, double-click the installer to open the setup wizard.

 

At this point you can check the Additional language data (download) option to install OCR language packs, so that Tesseract can recognize multiple languages. If you only need to recognize English letters and digits, you can skip the download; choose according to your own needs.

Then keep clicking Next until the installation finishes.

Next, to call Tesseract from Python code, install pytesseract with pip:

pip install pytesseract

2. Configure the environment variable and verify the installation

  Add the Tesseract installation directory to the PATH environment variable, for example:

D:\Program Files\Tesseract-OCR

 

  Open a command prompt (DOS window) and run tesseract. If it prints its usage information, the installation was successful.
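The same check can be sketched from Python using only the standard library. The helper names find_tesseract and tesseract_version are hypothetical, and the extra directory passed in the usage lines is simply the example installation path from above; adjust it to wherever Tesseract is installed on your machine.

```python
import shutil
import subprocess


def find_tesseract(extra_dirs=()):
    """Return the full path of the tesseract executable, or None if not found."""
    # Look on PATH first, which is what the environment variable step sets up.
    found = shutil.which("tesseract")
    if found:
        return found
    # Fall back to directories supplied by the caller (assumed install locations).
    for directory in extra_dirs:
        found = shutil.which("tesseract", path=directory)
        if found:
            return found
    return None


def tesseract_version(executable):
    """Run `tesseract --version` and return the first line it prints."""
    result = subprocess.run([executable, "--version"],
                            capture_output=True, text=True)
    # Some releases write the version banner to stderr instead of stdout.
    output = result.stdout or result.stderr
    return output.splitlines()[0] if output else ""


if __name__ == "__main__":
    exe = find_tesseract([r"D:\Program Files\Tesseract-OCR"])
    if exe is None:
        print("tesseract not found; check the PATH setting")
    else:
        print(exe, "->", tesseract_version(exe))
```

If the executable is found only through the fallback directory, the PATH variable has probably not taken effect yet; reopening the command prompt is usually enough.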

  

3. Test and verify

  Find any CAPTCHA image. If the image is small, you may need to enlarge it first; otherwise Tesseract may fail to recognize it.
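One way to enlarge a small image is with Pillow, roughly as sketched here. The function names and the 60-pixel minimum height are assumptions chosen for illustration, not values from Tesseract's documentation; experiment to see what works for your images.

```python
def scaled_size(width, height, min_height=60):
    """Return (width, height) scaled by a whole factor so height >= min_height."""
    if height >= min_height:
        return width, height
    # Ceiling division gives the smallest integer factor that reaches min_height.
    factor = -(-min_height // height)
    return width * factor, height * factor


def upscale_image(src_path, dst_path, min_height=60):
    """Save an enlarged copy of src_path to dst_path and return the new size."""
    from PIL import Image  # imported here so scaled_size works without Pillow

    with Image.open(src_path) as img:
        new_size = scaled_size(img.width, img.height, min_height)
        img.resize(new_size, Image.LANCZOS).save(dst_path)
    return new_size
```

For example, upscale_image(r'D:\rand.jpg', r'D:\rand_big.jpg') would write an enlarged copy next to the original, where rand_big.jpg is just an illustrative file name.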

  3.1 Verify from the command line

    First put the CAPTCHA image on the D drive, then run the command: tesseract rand.jpg result

    

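    Here we invoke the tesseract command. The first argument is the image file name, and the second argument, result, is the base name of the file where the output is saved.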

    A new file, result.txt, appears on the D drive; it contains the recognized text.

    Verification succeeded!
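    The command-line check can also be scripted from Python with the standard library. This is only a sketch: build_command and run_tesseract are hypothetical helper names, and it assumes tesseract is on PATH and writes its output to the base name plus .txt, as described above.

```python
import subprocess
from pathlib import Path


def build_command(image_path, output_base):
    """Build the argument list for `tesseract <image> <output base name>`."""
    return ["tesseract", str(image_path), str(output_base)]


def run_tesseract(image_path, output_base):
    """Run tesseract, then read back the text file it writes."""
    # check=True raises CalledProcessError if tesseract exits with an error.
    subprocess.run(build_command(image_path, output_base), check=True)
    # Tesseract appends ".txt" to the output base name.
    return Path(f"{output_base}.txt").read_text(encoding="utf-8").strip()
```

    Calling run_tesseract('rand.jpg', 'result') from the directory containing the image should return the same text that ends up in result.txt.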

  3.2 Verify with Python code

     This uses the pytesseract library installed above.

from PIL import Image
import pytesseract

# Open the CAPTCHA image and let Tesseract recognize the text in it
text = pytesseract.image_to_string(Image.open(r'D:\rand.jpg'))
print(text)

  We first read the image file with Image.open, then call pytesseract's image_to_string() method, and finally print the recognition result.

  

  Perfect! It works, haha.
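  A slightly more reusable variant might look like the sketch below. The function names are hypothetical; lang='eng' assumes the English language data is installed, and stripping all whitespace assumes the CAPTCHA itself contains no spaces. Tesseract's raw output often carries a trailing newline or other whitespace, which is why the cleanup step is there.

```python
def clean_ocr_text(raw):
    """Remove every whitespace character from raw OCR output."""
    # str.split() with no arguments splits on any run of whitespace.
    return "".join(raw.split())


def read_captcha(image_path, lang="eng"):
    """Recognize the text in a CAPTCHA image and return it without whitespace."""
    # Imported here so clean_ocr_text can be used without these packages.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        raw = pytesseract.image_to_string(img, lang=lang)
    return clean_ocr_text(raw)
```

  With the image from the example above, print(read_captcha(r'D:\rand.jpg')) would print the recognized characters on a single line.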

Origin www.cnblogs.com/gcgc/p/11325581.html