Today I will walk through a simple approach to recognizing verification codes (captchas). Let's get started.
First, here is the URL that serves the verification code image:
https://credit.wsjd.gov.cn/portal/captcha
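The URL above returns raw image bytes. If the server answers with an HTML error page instead (for example, when it blocks the request), or with a compressed body that was never decoded, Tesseract will fail in confusing ways. The sketch below checks the leading "magic" bytes before the data is saved. It assumes the captcha is served as PNG, JPEG, or GIF. `sniff_image_type` is a hypothetical helper name, not part of any library.

```python
def sniff_image_type(data: bytes) -> str:
    """Guess the format of downloaded captcha bytes from their signature."""
    if data.startswith(b'\x89PNG\r\n\x1a\n'):
        return 'png'
    if data.startswith(b'\xff\xd8\xff'):
        return 'jpeg'
    if data[:6] in (b'GIF87a', b'GIF89a'):
        return 'gif'
    if data.startswith(b'\x1f\x8b'):
        # gzip-compressed body that was never decompressed
        return 'gzip'
    # Anything else, e.g. an HTML error page
    return 'unknown'


if __name__ == '__main__':
    print(sniff_image_type(b'\x89PNG\r\n\x1a\n' + b'\x00' * 8))  # png
    print(sniff_image_type(b'<html>blocked</html>'))  # unknown
```

Only `png`, `jpeg`, or `gif` should be passed on to OCR. Any other result is a signal to request a new captcha.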
Next, we request this URL, save the returned image, and use pytesseract and PIL to recognize the code. Without further ado, here is the code:
```python
import os
from urllib.request import Request, urlopen

import pytesseract
from PIL import Image

# Request headers copied from a desktop Chrome session.
# 'Accept-Encoding' is left out on purpose: urllib does not decompress
# gzip/br responses, so requesting them could corrupt the saved image.
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Connection': 'keep-alive',
    'Host': 'credit.wsjd.gov.cn',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36',
}

url = 'https://credit.wsjd.gov.cn/portal/captcha'
captcha_file = 'yishi/static/images/credit_captcha.png'  # where the downloaded captcha image is stored

try:
    # Download the captcha image bytes
    request = Request(url, headers=headers)
    res = urlopen(request).read()

    # Save the image to disk, creating the folder if it does not exist yet
    os.makedirs(os.path.dirname(captcha_file), exist_ok=True)
    with open(captcha_file, 'wb') as f:
        f.write(res)

    # Identify the verification code with Tesseract OCR
    image = Image.open(captcha_file)
    captcha_value = pytesseract.image_to_string(image).strip()
    print('The verification code is: ' + captcha_value)
except OSError as e:
    # Download or file access failed (URLError is a subclass of OSError);
    # request a new verification code and try again
    print('Failed to get verification code:', e)
```
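The `except` branch above says a failed attempt should be followed by a fresh request. Raw Tesseract output also often carries trailing whitespace or a form-feed character. The sketch below combines both points: it cleans the OCR text and keeps requesting new captchas until the result looks plausible. `fetch_and_ocr` is a hypothetical callable, for example a function wrapping the script above. The assumptions that the captcha is made of ASCII letters and digits and is 4 characters long are guesses about this site, not facts from the source.

```python
import re
import time
from typing import Callable, Optional


def clean_captcha_text(raw: str) -> str:
    """Keep only ASCII letters and digits from raw OCR output (assumed alphabet)."""
    return re.sub(r'[^A-Za-z0-9]', '', raw)


def recognize_with_retry(
    fetch_and_ocr: Callable[[], str],
    expected_length: int = 4,  # assumption: the site uses 4-character codes
    max_attempts: int = 5,
    delay_seconds: float = 1.0,
) -> Optional[str]:
    """Request fresh captchas until OCR yields a plausible answer, else None."""
    for attempt in range(1, max_attempts + 1):
        try:
            text = clean_captcha_text(fetch_and_ocr())
        except OSError as e:
            # Network or file error: report it and fall through to a new request
            print(f'Attempt {attempt}: failed to get verification code ({e})')
        else:
            if len(text) == expected_length:
                return text
            print(f'Attempt {attempt}: implausible result {text!r}, retrying')
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # pause briefly before asking for a new captcha
    return None


if __name__ == '__main__':
    # Fake OCR results standing in for real downloads
    answers = iter(['a1 \n', 'X7k9\n\x0c'])
    print(recognize_with_retry(lambda: next(answers), delay_seconds=0))
```

A length check is a cheap sanity filter. It cannot tell a correct reading from a wrong one of the same length, so the server's response to the submitted code remains the real test.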