How to Get High-Quality Captcha Recognition Without Knowing Deep Learning

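Preface:
Many automated crawler projects eventually run into captchas. One solution is to type them in by hand, but that is slow. Is there a way to recognize them automatically? Of course there is: a captcha-solving platform. Here I recommend the Damagou (打码狗) platform, which has a high recognition rate and is cheap. Still, I hope that everyone, my future self included, will eventually build their own recognition system with deep learning. Enough talk, let's get started!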

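If you don't want to read the part-by-part walkthrough, scroll to the bottom, change the configuration in the full code, and it is ready to use.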

A brief introduction to the Damagou platform (not sponsored)

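Have a look at the platform's introduction, then register an account and top up some points. It is very cheap: as I recall, 1 yuan of credit lasted me several months and still wasn't used up.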

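Python implementation

Here I created a captchaRecognize class; I will walk through each part of the class in turn.

Initialization

Because the captcha-solving platform has no anti-crawler measures, I simply set a User-Agent header. I also added a check for whether a valid session was passed in; if not, a new one is created.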

    def __init__(self, s=None):
        # Reuse the caller's session if one is given, otherwise create a new one
        if s is None:
            self.s = requests.session()
        else:
            self.s = s
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3754.400 QQBrowser/10.5.4034.400',
        }

Getting the platform UserKey

The UserKey uniquely identifies your account.

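    # Fetch the UserKey from the captcha-solving platform
    def get_userKey(self):
        get_url = "http://www.damagou.top/apiv1/login.html"
        # Let requests URL-encode the credentials
        params = {"username": username, "password": password}
        try:
            r = requests.get(get_url, params=params, headers=self.headers)
            r.raise_for_status()
            r.encoding = r.apparent_encoding
            print("Damagou login request succeeded")
            return r.text
        except requests.RequestException as e:
            # r may not exist if the request itself failed, so report the exception
            print("Can't get UserKey:", e)
            return None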
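
One detail worth illustrating: interpolating the username and password straight into the URL breaks as soon as either contains characters such as & or #. The following is a minimal sketch using only the standard library. build_login_url is a hypothetical helper name, not part of the platform's API, and the credentials are made up.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

LOGIN_URL = "http://www.damagou.top/apiv1/login.html"

def build_login_url(username, password):
    """Hypothetical helper: return the login URL with safely encoded credentials."""
    query = urlencode({"username": username, "password": password})
    return f"{LOGIN_URL}?{query}"

# Made-up credentials containing characters that are special in URLs
url = build_login_url("alice", "p&ss=w#rd")
# Parsing the query back recovers the original values unchanged
recovered = parse_qs(urlsplit(url).query)
print(url)
print(recovered["password"][0])
```

With requests, passing a dict as params to requests.get performs the same encoding.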

Fetching the captcha

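    def get_captcha_pic(self):
        # Use the shared session so the captcha stays tied to the same cookies
        get_url = url_captcha
        headers_for_captcha = headers
        try:
            r = self.s.get(get_url, headers=headers_for_captcha)
            r.raise_for_status()
            return r.content  # raw image bytes
        except requests.RequestException as e:
            print("Can't fetch captcha image:", e)
            return None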
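
get_captcha_pic returns whatever bytes the server sends, which may be an HTML error page rather than an image. As a rough sketch, the check below inspects the leading magic bytes of a few common image formats before the data is sent on for recognition. looks_like_image is a hypothetical helper, and the list of formats is an assumption about what a captcha endpoint might serve.

```python
# Leading "magic" bytes of a few common image formats
IMAGE_SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
    "gif": b"GIF8",
    "bmp": b"BM",
}

def looks_like_image(data):
    """Hypothetical helper: return the detected format name, or None."""
    if not data:
        return None
    for name, signature in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None

# Made-up byte strings: a PNG-style header and an HTML error page
print(looks_like_image(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
print(looks_like_image(b"<html>blocked</html>"))
```

Skipping recognition when the check returns None avoids spending platform points on responses that are not images.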

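Recognizing the captcha

The type parameter below selects the captcha type. Damagou supports several types, so pick the one that matches your captcha.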

    def get_english_captcha(self, captcha, userkey):
        # Base64-encode the raw image bytes before posting them
        base64_data = base64.b64encode(captcha)

        postUrl = 'http://www.damagou.top/apiv1/recognize.html'
        postData = {
            "image": base64_data,
            "userkey": userkey,
            "type": "1001",
        }
        try:
            r = requests.post(postUrl, data=postData, headers=self.headers)
            r.raise_for_status()
            r.encoding = r.apparent_encoding
            # print("Captcha recognized")
            return r.text
        except requests.RequestException:
            # print("Captcha recognition failed")
            return None
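
Note that base64.b64encode returns bytes rather than str. The code above posts those bytes directly as a form field; if the value ever needs to go into JSON or a log message, it has to be decoded first. A minimal sketch of the round trip, using made-up bytes in place of a real captcha image; encode_image is a hypothetical helper.

```python
import base64

def encode_image(image_bytes):
    """Hypothetical helper: base64-encode raw image bytes into an ASCII str."""
    return base64.b64encode(image_bytes).decode("ascii")

fake_image = b"\x89PNG\r\n\x1a\n fake captcha bytes"  # made-up data, not a real image
encoded = encode_image(fake_image)
print(type(encoded).__name__)
# Decoding gives back exactly the bytes that were encoded
assert base64.b64decode(encoded) == fake_image
```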

Full code

Here is the complete code.

import base64
import requests

url_captcha = "URL of the captcha image goes here"
username = 'your Damagou account'
password = 'your Damagou password'
# Headers for the request to the captcha URL above; anyone familiar with crawlers will know how to fill these in
headers = {
    'Host': 'xxxx (fill in yourself)',
    'Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3754.400 QQBrowser/10.5.4034.400'
}


class captchaRecognize:
    def __init__(self, s=None):
        # Reuse the caller's session if one is given, otherwise create a new one
        if s is None:
            self.s = requests.session()
        else:
            self.s = s
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.25 Safari/537.36 Core/1.70.3754.400 QQBrowser/10.5.4034.400',
        }

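    def get_captcha_pic(self):
        # Use the shared session so the captcha stays tied to the same cookies
        get_url = url_captcha
        headers_for_captcha = headers
        try:
            r = self.s.get(get_url, headers=headers_for_captcha)
            r.raise_for_status()
            return r.content  # raw image bytes
        except requests.RequestException as e:
            print("Can't fetch captcha image:", e)
            return None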

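    # Fetch the UserKey from the captcha-solving platform
    def get_userKey(self):
        get_url = "http://www.damagou.top/apiv1/login.html"
        # Let requests URL-encode the credentials
        params = {"username": username, "password": password}
        try:
            r = requests.get(get_url, params=params, headers=self.headers)
            r.raise_for_status()
            r.encoding = r.apparent_encoding
            print("Damagou login request succeeded")
            return r.text
        except requests.RequestException as e:
            # r may not exist if the request itself failed, so report the exception
            print("Can't get UserKey:", e)
            return None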

    def get_english_captcha(self, captcha, userkey):
        # Base64-encode the raw image bytes before posting them
        base64_data = base64.b64encode(captcha)

        postUrl = 'http://www.damagou.top/apiv1/recognize.html'
        postData = {
            "image": base64_data,
            "userkey": userkey,
            "type": "1001",
        }
        try:
            r = requests.post(postUrl, data=postData, headers=self.headers)
            r.raise_for_status()
            r.encoding = r.apparent_encoding
            print("Captcha recognized")
            return r.text
        except requests.RequestException as e:
            print("Captcha recognition failed:", e)
            return None

    def __call__(self):
        captcha = self.get_captcha_pic()
        userKey = self.get_userKey()
        return self.get_english_captcha(captcha, userKey)


if __name__ == '__main__':
    session = requests.session()
    # Use a distinct name so the instance does not shadow the class
    recognizer = captchaRecognize(session)
    print(recognizer())
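
The methods above are meant to signal failure by returning None, so a caller may want to retry a few times before giving up. The following is a minimal sketch of such a wrapper under that assumption. retry_until_result, attempts, and delay are hypothetical names, and flaky_recognizer is a stand-in for a captchaRecognize instance so the example runs without network access.

```python
import time

def retry_until_result(func, attempts=3, delay=0.0):
    """Hypothetical helper: call func until it returns a non-None value."""
    for attempt in range(1, attempts + 1):
        result = func()
        if result is not None:
            return result
        if attempt < attempts:
            time.sleep(delay)  # brief pause before the next try
    return None

# Stand-in for a recognizer: fails twice, then returns made-up text
calls = {"count": 0}

def flaky_recognizer():
    calls["count"] += 1
    return "ab12" if calls["count"] >= 3 else None

print(retry_until_result(flaky_recognizer))
```

In real use the callable would be the recognizer instance itself, for example retry_until_result(recognizer, attempts=3, delay=1.0). Keep in mind that every attempt that reaches the platform may cost points.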


Reposted from blog.csdn.net/solitudi/article/details/106976483