Captcha Recognition 01: Recognizing Image Captchas


1. Preparations

  • Download and install Tesseract.
  • After the download finishes, double-click the installer. You can tick the "Additional language data (download)" option to install extra OCR language packs, so that Tesseract can recognize multiple languages.
  • Configure the environment variables for Tesseract.
  • Tell Tesseract where its language packs are: create a new system environment variable named TESSDATA_PREFIX. The language packs live in the tessdata folder, which is usually inside the Tesseract installation directory. Set TESSDATA_PREFIX to that installation directory, i.e. the parent directory of tessdata (this is what Tesseract 3.x expects; Tesseract 4.x and later expect the path of the tessdata folder itself).
  • Install tesserocr with pip install tesserocr. On Windows this installation almost always fails; instead, download from GitHub the tesserocr .whl file that matches the Tesseract version you installed, and install that file with pip.
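Before moving on, it can help to confirm that TESSDATA_PREFIX really points at the language packs. The sketch below is a small stdlib-only check, not part of tesserocr; find_traineddata is a hypothetical helper name. Because the expected layout differs between Tesseract versions, it looks both in the given folder and in a tessdata subfolder for *.traineddata files.

```python
import os
from pathlib import Path


def find_traineddata(prefix=None):
    """Return the language codes of the .traineddata files reachable from a TESSDATA_PREFIX value.

    Hypothetical helper. Looks in `prefix` itself and in `prefix/tessdata`,
    since Tesseract 3.x expects the parent of tessdata while 4.x+ expects tessdata itself.
    """
    if prefix is None:
        prefix = os.environ.get("TESSDATA_PREFIX", "")
    if not prefix:
        return []
    langs = set()
    for folder in (Path(prefix), Path(prefix) / "tessdata"):
        if folder.is_dir():
            # "eng.traineddata" -> "eng"
            langs.update(p.stem for p in folder.glob("*.traineddata"))
    return sorted(langs)


if __name__ == "__main__":
    found = find_traineddata()
    if found:
        print("Language packs found:", ", ".join(found))
    else:
        print("No .traineddata files found; check TESSDATA_PREFIX")
```

If the list is empty, the variable is either missing or pointing one level too high or too low for your Tesseract version.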

2. Get the captcha

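import os
import time
from uuid import uuid4

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Firefox()
browser.get('http://my.cnki.net/elibregister/commonRegister.aspx')
browser.implicitly_wait(2)  # wait up to 2 s for elements to appear when locating them
os.makedirs('picture', exist_ok=True)  # do not fail if the folder already exists
for i in range(5):
    image = browser.find_element(By.XPATH, '//*[@id="checkcode"]')  # the captcha <img> element
    image_url = image.get_attribute('src')
    image_content = requests.get(image_url).content  # download the captcha image
    image_path = os.path.join('picture', f'{uuid4()}.jpg')  # random file name avoids collisions
    with open(image_path, 'wb') as f:
        f.write(image_content)
    image.click()  # clicking the captcha makes the page load a new one
    time.sleep(2)  # implicitly_wait only sets a lookup timeout, so pause explicitly
browser.quit()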
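Note that requests.get downloads the image outside the browser, so the request carries none of the browser's cookies, and the server will typically hand out a different captcha from the one shown on screen. For collecting training samples this does not matter. If you do want the download to share the browser's session, one option is to forward the cookies Selenium reports via browser.get_cookies(). A minimal sketch; cookie_header is a hypothetical helper, and whether the site ties captchas to the session is an assumption:

```python
def cookie_header(selenium_cookies):
    """Join Selenium-style cookie dicts into one HTTP Cookie header value.

    Assumes the list returned by browser.get_cookies(): dicts with at least
    'name' and 'value' keys. Other keys (domain, path, expiry) are ignored.
    """
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


# Possible use inside the collection loop (assumes `browser`, `requests`, `image_url`):
# headers = {"Cookie": cookie_header(browser.get_cookies())}
# image_content = requests.get(image_url, headers=headers).content

if __name__ == "__main__":
    sample = [{"name": "session", "value": "abc123"}, {"name": "lang", "value": "en"}]
    print(cookie_header(sample))
```

Even with shared cookies, many servers generate a new image on every request, so the saved file may still differ from the one on screen.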

 

3. Recognition test

import tesserocr
from PIL import Image

image = Image.open('picture/1.jpg')
result = tesserocr.image_to_text(image)  # recognize the text in a PIL Image object
print(result)

print(tesserocr.file_to_text('picture/1.jpg'))  # recognize the text directly from an image file
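Both calls return Tesseract's raw output, which usually ends with whitespace or newlines and, on noisy captchas, may contain stray punctuation. A small post-processing step can strip that. This is a sketch assuming the captcha only uses ASCII letters and digits; clean_captcha_text is a hypothetical helper, and the character class should be adjusted for other sites.

```python
import re


def clean_captcha_text(raw):
    """Keep only ASCII letters and digits from raw OCR output.

    Assumes the captcha alphabet is [A-Za-z0-9]; spaces, newlines, form feeds
    and any misread punctuation are dropped.
    """
    return re.sub(r"[^A-Za-z0-9]", "", raw)


if __name__ == "__main__":
    print(clean_captcha_text(" 7a K3.\n\f"))
```

For example, print(clean_captcha_text(tesserocr.image_to_text(image))) prints the cleaned result of the recognition above.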

 

4. Preprocessing the captcha

  Convert the image to grayscale, then binarize it.

image = image.convert('L')  # convert the picture to a grayscale image
image.show()
image = image.convert('1')  # binarize the image (black and white only)
image.show()
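For reference, the Pillow documentation describes the grayscale conversion done by convert('L') on an RGB image as the ITU-R 601-2 luma transform, L = R*299/1000 + G*587/1000 + B*114/1000. The sketch below reproduces that formula in plain Python to show what a single gray value means. rgb_to_gray is a hypothetical helper; it truncates, so it may differ from Pillow's own rounding by one level on some pixels.

```python
def rgb_to_gray(r, g, b):
    """Approximate Pillow's RGB -> 'L' conversion (ITU-R 601-2 luma transform).

    Integer arithmetic with truncation; Pillow's internal fixed-point rounding
    can differ by at most one gray level.
    """
    return (r * 299 + g * 587 + b * 114) // 1000


if __name__ == "__main__":
    for rgb in [(255, 255, 255), (0, 0, 0), (255, 0, 0), (0, 128, 255)]:
        print(rgb, "->", rgb_to_gray(*rgb))
```

Green contributes most to the gray value and blue least, so a colored captcha character and its background can end up with quite different gray levels even if they look similar.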
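We can also choose the binarization threshold ourselves. Calling image.convert('1') directly does not let us set a cutoff: by default Pillow applies Floyd-Steinberg dithering, and only with dithering disabled does it use a fixed threshold of 127 (values above 127 become white, the rest black). So instead of converting the picture to mode '1' directly, we first convert the original image to grayscale and then binarize it with a threshold of our choosing: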
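import tesserocr
from PIL import Image

image = Image.open('picture/2.jpg')
image = image.convert('L')  # convert to grayscale first
threshold = 105  # the smaller the threshold, the fewer pixels turn black and the more blank space remains
table = []
for i in range(256):  # lookup table with one entry per gray level 0-255
    if i < threshold:
        table.append(0)  # darker than the threshold -> black
    else:
        table.append(1)  # at or above the threshold -> white
image = image.point(table, '1')  # map every gray level through the table into a bilevel image
image.show()
result = tesserocr.image_to_text(image)
print(result)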
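To see why a lower threshold leaves fewer black pixels, the lookup-table logic can be checked without Pillow or Tesseract. A sketch in plain Python: threshold_table builds the same table as the loop above, and binarize applies it to a flat list of gray values. Both helper names are hypothetical, and the pixel values are made up for illustration.

```python
def threshold_table(threshold):
    """Build the 256-entry table used with Image.point(table, '1').

    Gray levels below `threshold` map to 0 (black), all others to 1 (white).
    """
    return [0 if level < threshold else 1 for level in range(256)]


def binarize(gray_pixels, threshold):
    """Apply the same table to a flat list of 0-255 gray values."""
    table = threshold_table(threshold)
    return [table[p] for p in gray_pixels]


if __name__ == "__main__":
    pixels = [30, 90, 110, 127, 200, 250]  # made-up gray values
    for t in (80, 105, 127):
        out = binarize(pixels, t)
        print(f"threshold={t}: {out.count(0)} black, {out.count(1)} white")
```

Raising the threshold turns more mid-gray pixels (often noise lines or background speckles) black, while lowering it keeps only the darkest strokes; a good value depends on the particular captcha and is usually found by trying a few.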


Source: www.cnblogs.com/zhangjian0092/p/11248712.html