Captcha Recognition in Practice, End to End: Cracking the Hardest AI Captcha in History!

This article covers the history and development of captchas, the history and development of captcha cracking, and a complete hands-on walkthrough of cracking a captcha.

History and Development of Captchas

CAPTCHA stands for "Completely Automated Public Turing test to tell Computers and Humans Apart": a fully automated Turing test that distinguishes computers from humans. Captchas appeared as early as the 1990s to stop malicious bot behavior such as email bombing and brute-force password cracking.

The earliest captchas were simple text: users only had to type a string of distorted letters and digits. Captchas then progressed to image captchas, which ask users to identify which pictures contain a certain object (such as a cat, a dog, or a car). As technology developed, more complex types emerged, such as logic captchas (for example, 3 + 4 = ?), audio captchas (users listen to a recording and type the characters they hear), and 3D captchas (users must interpret a 3D object or scene).

Some newer captcha designs aim to improve the user experience while keeping websites secure, asking users to perform more natural, human-like actions. For example, a slide captcha has users drag a slider to unlock, a click captcha has users tap a specific image or character, and a rotate captcha has users turn an image to its correct orientation.

Some large companies have also developed their own captcha systems. Google's reCAPTCHA v2 introduces harder image recognition tasks, asking users to select pictures containing specific objects (such as cars or traffic lights), while reCAPTCHA v3 drops user interaction altogether and instead analyzes behavior patterns to decide whether the visitor is a human or a machine. Third-party services such as GeeTest CAPTCHA and hCaptcha likewise provide verification for websites, helping them block automated abuse.
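To make the reCAPTCHA v3 idea concrete: instead of a pass/fail puzzle, v3 gives the site a score between 0.0 and 1.0, and the site's own server decides what to do with it. The sketch below shows that server-side decision. It assumes `verify_response` is the JSON already returned by Google's siteverify endpoint (fields `success`, `score`, `action`); the function name `is_probably_human`, the expected action, and the 0.5 threshold are illustrative choices, not fixed values.

```python
# Minimal sketch: interpret a reCAPTCHA v3 siteverify response on the server.
# Assumption: verify_response is the parsed JSON from Google's siteverify
# endpoint; the threshold and action name are illustrative, site-specific choices.


def is_probably_human(verify_response: dict, expected_action: str,
                      threshold: float = 0.5) -> bool:
    """Return True if the token is valid, matches the action, and scores high enough."""
    if not verify_response.get("success", False):
        return False  # invalid, expired, or already-used token
    if verify_response.get("action") != expected_action:
        return False  # token was issued for a different action on the page
    # v3 scores range from 0.0 (very likely a bot) to 1.0 (very likely a human)
    return verify_response.get("score", 0.0) >= threshold


if __name__ == "__main__":
    sample = {"success": True, "score": 0.9, "action": "login"}
    print(is_probably_human(sample, "login"))  # True
```

Sites usually tune the threshold per action, for example being stricter on login than on page views.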


History and Development of Captcha Cracking

The history of captcha cracking is closely tied to the development of captchas. Early cracking relied mainly on OCR (Optical Character Recognition), a technology that converts text in images into machine-readable characters and works well on simple text captchas.

However, as captchas grew more complex, cracking them required more sophisticated techniques. Image captchas, for example, may call for image processing to deal with noise and distortion, including steps such as grayscaling (converting the image to shades of gray), binarization (further reducing it to just two colors, black and white), and edge detection (finding the boundaries of shapes in the image).
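A minimal sketch of these three preprocessing steps, written in plain Python so it runs without an imaging library. The image representation (a list of rows of `(R, G, B)` tuples), the ITU-R BT.601 luminance weights, the fixed threshold of 128, and the neighbour-difference edge detector are all simplifying assumptions; real pipelines typically use a library such as Pillow or OpenCV and adaptive thresholds.

```python
# Minimal sketch of classic captcha preprocessing on a tiny RGB image.
# Assumption: an image is a list of rows, each row a list of (R, G, B) tuples.


def to_grayscale(rgb_image):
    """Convert each pixel to one luminance value (0-255) using BT.601 weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def binarize(gray_image, threshold=128):
    """Map each gray pixel to 0 (black) or 255 (white) around a fixed threshold."""
    return [[255 if v >= threshold else 0 for v in row] for row in gray_image]


def horizontal_edges(binary_image):
    """Mark 1 where a pixel differs from its right-hand neighbour (a crude edge detector)."""
    return [[1 if row[x] != row[x + 1] else 0 for x in range(len(row) - 1)]
            for row in binary_image]


if __name__ == "__main__":
    image = [
        [(250, 250, 250), (20, 20, 20), (240, 240, 240)],
        [(255, 255, 255), (10, 10, 10), (230, 230, 230)],
    ]
    gray = to_grayscale(image)
    bw = binarize(gray)
    print(gray)                  # [[250, 20, 240], [255, 10, 230]]
    print(bw)                    # [[255, 0, 255], [255, 0, 255]]
    print(horizontal_edges(bw))  # [[1, 1], [1, 1]]
```

Binarizing before edge detection means the edges only fire at real black/white transitions, which is why the steps are usually applied in this order.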

More complex captchas, such as click captchas and rotate captchas, may require more sophisticated computer vision techniques: feature extraction (identifying the important features in an image), object recognition (recognizing specific objects or shapes), or even deep learning (training models to recognize complex patterns).
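Feature extraction and object recognition can start as simply as template matching: slide a known pattern over the image and keep the position where it fits best. The sketch below scores each position by the sum of squared differences (SSD) between pixel values. SSD is only one of several matching criteria (normalized cross-correlation is a common, more lighting-robust alternative), and the grayscale-grid input format is an assumption made to keep the example dependency-free.

```python
# Minimal sketch of template matching by sum of squared differences (SSD).
# Assumption: images are lists of lists of grayscale ints, and the template
# is no larger than the image in either dimension.


def match_template(image, template):
    """Return (row, col, ssd) for the best (lowest-SSD) placement of template."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            # Compare the template with the window whose top-left corner is (y, x)
            ssd = sum((image[y + dy][x + dx] - template[dy][dx]) ** 2
                      for dy in range(th) for dx in range(tw))
            if best is None or ssd < best[2]:
                best = (y, x, ssd)
    return best


if __name__ == "__main__":
    image = [
        [0, 0, 0, 0],
        [0, 9, 8, 0],
        [0, 7, 6, 0],
    ]
    template = [[9, 8], [7, 6]]
    print(match_template(image, template))  # (1, 1, 0)
```

A perfect match scores 0; the exhaustive scan is O(image area × template area), which is why real libraries use faster FFT-based or pyramid implementations.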

In recent years, advances in artificial intelligence have brought machine learning and deep learning to captcha cracking. Convolutional neural networks (CNNs), for example, have been used to recognize complex image captchas, while recurrent neural networks (RNNs) can be used to recognize audio captchas. By training on large amounts of data, these models learn the complex patterns in captchas, greatly improving the accuracy and efficiency of cracking.
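The building block a CNN stacks many times is a small kernel slid across the image, followed by a nonlinearity. The sketch below computes one such layer (a "valid" 2D cross-correlation, which deep learning frameworks call convolution, plus ReLU) on a toy grid. The kernel is hand-picked for illustration; in a trained CNN the kernel values are learned from data rather than written by hand.

```python
# Minimal sketch of a single CNN-style layer: valid 2D cross-correlation + ReLU.
# Assumption: image and kernel are lists of lists of numbers; no padding, stride 1.


def conv2d_relu(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            # Weighted sum of the window under the kernel
            acc = sum(image[y + i][x + j] * kernel[i][j]
                      for i in range(kh) for j in range(kw))
            row.append(max(0, acc))  # ReLU: keep positive responses only
        output.append(row)
    return output


if __name__ == "__main__":
    image = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
    ]
    vertical_edge = [[-1, 1], [-1, 1]]  # responds to dark-to-bright transitions
    print(conv2d_relu(image, vertical_edge))  # [[0, 2, 0], [0, 2, 0]]
```

The output lights up only where the dark-to-bright boundary sits; a CNN learns many such kernels and stacks layers so later ones respond to whole characters or objects.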


Human-Powered Services for High-Accuracy Captcha Recognition

Captcha recognition services solve captchas using artificial intelligence, human workers, or both. When machines alone cannot recognize a complex captcha, such a service offers a relatively efficient and accurate alternative.

2Captcha

2Captcha is a human-powered captcha recognition service. It provides an API through which developers submit captchas their programs cannot recognize; 2Captcha's workers then solve them manually and return the results. The service handles complex captchas such as image captchas, text captchas, click captchas, GeeTest, reCAPTCHA, and FunCaptcha with high accuracy, and it provides API documentation for multiple programming languages: Python, PHP, Java, Go, Ruby, C++, and C#. Its main strengths are high accuracy and a flexible API that developers can easily integrate into different environments.
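Under the hood, a human-powered service like this works as an asynchronous job queue: the client submits a captcha, receives a task ID, and polls until a worker has returned an answer. The sketch below models that flow on 2Captcha's legacy `in.php`/`res.php` HTTP API (parameters `method=userrecaptcha`, `googlekey`, `pageurl`, and `action=get`, as we understand that API; verify against the current 2Captcha documentation before relying on it). The `http_get` callable is a hypothetical injected helper so the flow can be exercised offline with canned replies; the 5-second poll interval and 24-poll limit are arbitrary assumptions.

```python
# Minimal sketch of the submit-then-poll flow of a human-powered solving service,
# modelled on 2Captcha's legacy in.php / res.php API (assumption: check current docs).
import time

BASE_URL = "https://2captcha.com"


def solve_recaptcha_v2(http_get, api_key, sitekey, page_url,
                       poll_interval=5.0, max_polls=24, sleep=time.sleep):
    """Submit a reCAPTCHA v2 task, then poll until the token is ready.

    http_get(url, params) -> dict is a hypothetical helper that performs an
    HTTP GET and returns the parsed JSON body.
    """
    # Step 1: submit the task; with json=1 the reply is {"status": 1, "request": <task id>}
    submitted = http_get(f"{BASE_URL}/in.php", {
        "key": api_key,
        "method": "userrecaptcha",
        "googlekey": sitekey,
        "pageurl": page_url,
        "json": 1,
    })
    if submitted.get("status") != 1:
        raise RuntimeError(f"submit failed: {submitted.get('request')}")
    task_id = submitted["request"]

    # Step 2: poll for the answer while a worker solves the captcha
    for _ in range(max_polls):
        sleep(poll_interval)
        answer = http_get(f"{BASE_URL}/res.php", {
            "key": api_key,
            "action": "get",
            "id": task_id,
            "json": 1,
        })
        if answer.get("status") == 1:
            return answer["request"]  # the g-recaptcha-response token
        # "CAPCHA_NOT_READY" (the API's own spelling) means keep waiting
        if answer.get("request") != "CAPCHA_NOT_READY":
            raise RuntimeError(f"solve failed: {answer.get('request')}")
    raise TimeoutError("captcha was not solved within the polling limit")


if __name__ == "__main__":
    # Offline demo: canned replies stand in for the real service
    replies = iter([
        {"status": 1, "request": "12345"},
        {"status": 0, "request": "CAPCHA_NOT_READY"},
        {"status": 1, "request": "EXAMPLE_TOKEN"},
    ])
    token = solve_recaptcha_v2(lambda url, params: next(replies),
                               "API_KEY", "SITE_KEY", "https://example.com",
                               sleep=lambda seconds: None)
    print(token)  # EXAMPLE_TOKEN
```

Injecting `http_get` and `sleep` keeps the protocol logic separate from networking; in a real script `http_get` could wrap `requests.get(url, params=params).json()`. Client libraries such as 2captcha-python perform this same loop for you.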

Cloud Code

Cloud Code offers captcha recognition based on image recognition technology with human assistance, covering ordinary image captchas, slider captchas, click captchas, Google captchas, hCaptcha, and arithmetic questions. It works well on image captchas, especially their many variants. For complex captchas, however, accuracy drops, recognition takes longer, and support for new captcha types is added more slowly.

Bingtuo

Bingtuo recognizes a wide range of common image captchas in a dual mode of AI recognition plus human recognition, and efficiently handles image types such as coordinate-selection questions, arithmetic questions, character questions, slider questions, and jigsaw questions. Its API can be called from Python, Java, PHP, and JavaScript, and it supports integration with the Button Wizard automation tool. It has its own processing methods and offers customized services for slider, jigsaw, rotation, and coordinate captchas, but it does not support Google's captchas.

Super Eagle

Super Eagle is a professional human-powered captcha-solving platform that classifies image data accurately and quickly and returns the results in real time. It supports many image captcha types, such as English letters and digits, Chinese characters, coordinate selection, and calculations, and offers customized recognition services. It recognizes general-purpose and traditional captchas well, but does not yet offer much support for complex captchas.


Captcha Cracking in Practice

Example: using 2Captcha to crack reCAPTCHA v2

1. Register a 2Captcha account at https://cn.2captcha.com/ (Alipay top-ups are supported).

2. Identify the target: the reCAPTCHA v2 on https://www.scrapebay.com/spam

3. Get your 2Captcha API key (API_KEY)

4. Get the Google reCAPTCHA site key (sitekey) from the page source

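Rather than copying the site key from the page source by hand, it can be read from the HTML. This sketch assumes the page embeds the standard reCAPTCHA v2 widget markup, `<div class="g-recaptcha" data-sitekey="...">`; pages that render the widget from JavaScript may not expose the key in an attribute. It uses only the standard library's `html.parser`; `SiteKeyParser` and `find_site_key` are hypothetical names introduced for this example.

```python
# Minimal sketch: extract the reCAPTCHA v2 site key from page HTML.
# Assumption: the page uses the standard <div class="g-recaptcha" data-sitekey="..."> markup.
from html.parser import HTMLParser


class SiteKeyParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.site_key = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; keep the first data-sitekey seen
        attributes = dict(attrs)
        if self.site_key is None and "data-sitekey" in attributes:
            self.site_key = attributes["data-sitekey"]


def find_site_key(html):
    """Return the first data-sitekey value in the HTML, or None if absent."""
    parser = SiteKeyParser()
    parser.feed(html)
    return parser.site_key


if __name__ == "__main__":
    page = '<form><div class="g-recaptcha" data-sitekey="EXAMPLE_KEY"></div></form>'
    print(find_site_key(page))  # EXAMPLE_KEY
```

In the step 5 script, the `response.text` already fetched in `get_csrf_cookie` could be passed to `find_site_key` instead of hard-coding `site_key`.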

5. Crack the captcha

Install 2captcha-python, along with the other libraries the script imports:

pip3 install 2captcha-python requests beautifulsoup4 lxml

Solve the captcha:

# Import BeautifulSoup, TwoCaptcha, and requests
import sys

import requests
from bs4 import BeautifulSoup
from twocaptcha import TwoCaptcha

# API key for the 2Captcha service; replace it with your own
API_KEY = 'xxxxxxxxxxxxxx'
# Initialize a TwoCaptcha solver with the API key; it can solve reCAPTCHA challenges
solver = TwoCaptcha(API_KEY)
# URL of the page to scrape
url = "https://www.scrapebay.com/spam"
# The reCAPTCHA site key, found in the page source
site_key = '6LfGNEoeAAAAALUsU1OWRJnNsF1xUvoai0tV090n'

# Fetch the page with requests.get(), locate the CSRF token with BeautifulSoup,
# and return the CSRF token together with the response cookies
def get_csrf_cookie(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "lxml")
    csrf_el = soup.select_one('[name=csrfmiddlewaretoken]')
    csrf = csrf_el['value']
    cookies = response.cookies
    return csrf, cookies

# Solve the reCAPTCHA with the solver's recaptcha() method;
# if anything goes wrong, print the error and exit
def solve(url, sitekey):
    try:
        result = solver.recaptcha(sitekey=sitekey, url=url)
    except Exception as e:
        print(e)
        sys.exit(1)
    return result

# Get the CSRF token and cookies with get_csrf_cookie(url), then solve the
# reCAPTCHA with solve(url, site_key) and print the solver's result
def main():
    csrf, cookies = get_csrf_cookie(url)
    print("csrf:", csrf)
    print("cookies:", cookies)
    result = solve(url, site_key)
    print("captcha:", result)


if __name__ == "__main__":
    main()

[Screenshot: script output]

6. Fetch the page data after solving the captcha

The complete code, including the captcha-solving step, is as follows:

# Import BeautifulSoup, TwoCaptcha, and requests
import sys

import requests
from bs4 import BeautifulSoup
from twocaptcha import TwoCaptcha

# API key for the 2Captcha service; replace it with your own
API_KEY = 'xxxxxxxxxxxxxx'
# Initialize a TwoCaptcha solver with the API key; it can solve reCAPTCHA challenges
solver = TwoCaptcha(API_KEY)
# URL of the page to scrape
url = "https://www.scrapebay.com/spam"
# The reCAPTCHA site key, found in the page source
site_key = '6LfGNEoeAAAAALUsU1OWRJnNsF1xUvoai0tV090n'

# Fetch the page with requests.get(), locate the CSRF token with BeautifulSoup,
# and return the CSRF token together with the response cookies
def get_csrf_cookie(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, "lxml")
    csrf_el = soup.select_one('[name=csrfmiddlewaretoken]')
    csrf = csrf_el['value']
    cookies = response.cookies
    return csrf, cookies

# Solve the reCAPTCHA with the solver's recaptcha() method;
# if anything goes wrong, print the error and exit
def solve(url, sitekey):
    try:
        result = solver.recaptcha(sitekey=sitekey, url=url)
    except Exception as e:
        print(e)
        sys.exit(1)
    return result

# Submit the form with the solved reCAPTCHA via requests.post(), then return
# the text of the first cell in the last column of the resulting table
def post_page(url, csrf, cookies, result):
    payload = {
        'csrfmiddlewaretoken': csrf,
        # solver.recaptcha() returns a dict; the token itself is under 'code'
        'g-recaptcha-response': result['code'],
    }
    # Passing a dict as data lets requests URL-encode the fields and set
    # Content-Type: application/x-www-form-urlencoded automatically
    headers = {'Referer': 'https://www.scrapebay.com/spam'}
    response = requests.post(url, data=payload, headers=headers, cookies=cookies)
    soup = BeautifulSoup(response.text, "lxml")
    el = soup.select_one('td:last-child')
    if el is None:
        return None  # no table cell found, e.g. the verification was rejected
    return el.get_text()

# Get the CSRF token and cookies, solve the reCAPTCHA,
# then submit the page with post_page() and print the result
def main():
    csrf, cookies = get_csrf_cookie(url)
    print("csrf:", csrf)
    print("cookies:", cookies)
    result = solve(url, site_key)
    print("captcha:", result)
    data = post_page(url, csrf, cookies, result)
    print("result:", data)

if __name__ == "__main__":
    main()

[Screenshot: the page after website verification]

[Screenshot: script output]


7. Wrap-up

We have now cracked reCAPTCHA v2 with the 2Captcha service and obtained the content we wanted to crawl. 2Captcha supports many captcha types; each can be handled with the same process, adapting the details to the specific captcha.

Origin blog.csdn.net/magicyangjay111/article/details/131964309