Today I'll show a simple way to solve the captcha recognition problem. Let's get started:
First, here is the captcha URL:
https://credit.wsjd.gov.cn/portal/captcha
Next, we request this URL to download the captcha image, then use the pytesseract and PIL libraries to recognize it. Without further ado, here's the code:
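# Requires Python 3, Pillow, pytesseract, and a local Tesseract OCR install
import os
import urllib.request

import pytesseract
from PIL import Image

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    # No 'Accept-Encoding' header: urllib does not decompress gzip/br bodies,
    # so a compressed response would be saved as a broken image file
    'Connection': 'keep-alive',
    'Host': 'credit.wsjd.gov.cn',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36',
}
url = 'https://credit.wsjd.gov.cn/portal/captcha'
captchaFile = 'yishi/static/images/credit_captcha.png'  # file that stores the downloaded captcha

try:
    # Download the captcha image and save it to disk
    request = urllib.request.Request(url, headers=headers)
    res = urllib.request.urlopen(request).read()
    os.makedirs(os.path.dirname(captchaFile), exist_ok=True)
    with open(captchaFile, 'wb') as f:
        f.write(res)

    # Recognize the captcha (strip the trailing newline Tesseract appends)
    image = Image.open(captchaFile)
    captcha_value = pytesseract.image_to_string(image).strip()
    print('Captcha: ' + captcha_value)
except OSError as e:
    # Network errors (URLError) and unreadable images both land here;
    # this is the place to request the captcha again
    print('Failed to get the captcha')
    print(e)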
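A common pitfall in this kind of script is that urllib does not decompress HTTP responses on its own. Suppose a request advertises `Accept-Encoding: gzip, deflate, br` and the server honours it. The bytes saved to the .png file are then compressed data rather than an image, and `Image.open` fails. Below is a minimal sketch of a defensive check. It assumes the server uses only gzip or deflate (Brotli needs a third-party package, so it is rejected here). It also assumes the captcha is a PNG, JPEG, or GIF image. `decode_captcha_body` is a hypothetical helper, not part of any library.

```python
import gzip
import zlib

# Magic bytes of common captcha image formats (PNG, JPEG, GIF)
IMAGE_SIGNATURES = (b'\x89PNG\r\n\x1a\n', b'\xff\xd8\xff', b'GIF8')


def decode_captcha_body(body, content_encoding=None):
    """Undo HTTP content coding and check that the result looks like an image.

    Hypothetical helper: handles only 'gzip' and 'deflate'; any other coding
    (e.g. 'br') raises ValueError instead of silently saving garbage.
    """
    encoding = (content_encoding or '').strip().lower()
    if encoding == 'gzip':
        body = gzip.decompress(body)
    elif encoding == 'deflate':
        try:
            body = zlib.decompress(body)  # zlib-wrapped deflate (RFC 1950)
        except zlib.error:
            body = zlib.decompress(body, -zlib.MAX_WBITS)  # raw deflate stream
    elif encoding not in ('', 'identity'):
        raise ValueError('unsupported Content-Encoding: %s' % encoding)
    if not body.startswith(IMAGE_SIGNATURES):
        # The server may have sent an HTML error page instead of an image
        raise ValueError('response body is not a PNG, JPEG or GIF image')
    return body
```

With Python 3's urllib.request, call it right after the download. First get `resp = urllib.request.urlopen(request)`, then `res = decode_captcha_body(resp.read(), resp.headers.get('Content-Encoding'))`, and only then write `res` to the file.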
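The except branch in the script is marked as the place to request the captcha again, but it only prints the error. Below is a minimal retry sketch. It assumes one attempt (download plus OCR) is wrapped in a zero-argument callable. `fetch_with_retry`, `attempts`, and `delay` are hypothetical names. The sketch also treats an empty OCR result as a failure, because Tesseract can return an empty string when it recognizes nothing.

```python
import time


def fetch_with_retry(fetch, attempts=3, delay=1.0):
    """Call fetch() until it yields a non-empty captcha string.

    `fetch` is a hypothetical zero-argument callable that downloads one
    captcha and returns its OCR text. OSError covers network failures
    (urllib.error.URLError subclasses it) and unreadable image files.
    """
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            value = fetch().strip()
            if value:
                return value
            last_error = ValueError('OCR returned an empty string')
        except OSError as e:
            last_error = e
        print('Captcha attempt %d/%d failed: %s' % (attempt, attempts, last_error))
        if attempt < attempts:
            time.sleep(delay)  # brief pause before requesting a new captcha
    raise last_error
```

The callable should download a fresh captcha on every call. Re-running OCR on the same saved file will usually give the same result.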