Lecture 23: Using available resources, learn to handle verification codes with a coding platform

In the previous lesson, we introduced several kinds of verification codes, including graphic text, simulated clicking, and drag-and-slide. In the final analysis, all of them require a person to make a judgment about what is shown and then submit the result. If the submitted result is correct and the check passes, the verification code has been cracked.

So, since a verification code is designed for people to recognize, what can a machine do? If we don't know any recognition algorithm, how can we solve these verification codes? It would be ideal to have a tool or platform that identifies the verification code for us and returns the result, so that all we have to do is submit it.

Is there such a tool or platform? There is: dedicated coding platforms (CAPTCHA-solving services) exist for exactly this purpose. Such a platform combines algorithms with human workers and can recognize many kinds of verification codes around the clock, including graphic text, coordinate points, and slider gaps. It returns the corresponding text or coordinates, which is just what we need.

In this lesson, we will introduce the process of using the coding platform to identify the verification code.

Learning objectives

In this lesson, we will use one verification code as an example to explain how to use the coding platform. The link is: https://captcha3.scrape.cuiqingcai.com/ . This website pops up a verification code every time you log in, as shown below.
[Figure: the verification code displayed on the sample website]
Several Chinese characters are shown at the top of the verification code, and Chinese characters also appear in the picture. We need to click the positions of these characters in the picture in order, and then confirm the submission to complete the verification.

Without some background in image recognition, this kind of verification code is hard to solve ourselves, so here we use the coding platform to locate the Chinese characters for us.

Preparation

The Python library we use is Selenium, and the browser we use is Chrome.

Before starting this lesson, please make sure that the Selenium library, the Chrome browser, and ChromeDriver are properly installed. For the installation process, refer to the earlier lesson that introduced Selenium.

In addition, the coding platform used in this lesson is Super Eagle (Chaojiying); the link is: https://www.chaojiying.com/ . Before using it, please register your own account and obtain some points for testing. You can also look through the site to see which types of verification code the platform can recognize.

Coding platform

Coding platforms generally offer a wide range of services and can recognize many types of verification codes, including click-based ones.

The Super Eagle platform also supports simple graphic verification codes. It provides the following services.

  • English letters and digits: mixed recognition of up to 20 alphanumeric characters;
  • Chinese characters: recognition of up to 7 Chinese characters;
  • Pure English: recognition of up to 12 English letters;
  • Pure numbers: recognition of up to 11 digits;
  • Arbitrary special characters: recognition of variable-length mixes of Chinese characters, English letters and digits, pinyin initials, calculation questions, idioms, container numbers, and similar content;
  • Coordinate selection: recognition that returns one or more coordinates, for example complex calculation questions, multiple-choice questions (one of four), question-and-answer prompts, and clicking on matching characters, objects, or animals.

For the latest details, please refer to the official website: https://www.chaojiying.com/price.html .

What we need here is the coordinate selection type. We first submit the verification code image to the platform, the platform returns the coordinates of the recognized characters within the image, and we then parse those coordinates and simulate the clicks.

Below we implement this in code.

Get API

Download the Python API from the official website; the link is: https://www.chaojiying.com/api-14.html. The API is written for Python 2 and is implemented with the requests library. Only a few small changes are needed to make it run on Python 3.

The revised API is as follows:

import requests
from hashlib import md5


class Chaojiying(object):

    def __init__(self, username, password, soft_id):
        self.username = username
        # Store the MD5 hex digest of the password rather than the plain text
        self.password = md5(password.encode('utf-8')).hexdigest()
        self.soft_id = soft_id
        # Parameters sent with every request
        self.base_params = {
            'user': self.username,
            'pass2': self.password,
            'softid': self.soft_id,
        }
        self.headers = {
            'Connection': 'Keep-Alive',
            'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)',
        }

    def post_pic(self, im, codetype):
        """
        im: image bytes
        codetype: question type code, see http://www.chaojiying.com/price.html
        """
        params = {
            'codetype': codetype,
        }
        params.update(self.base_params)
        files = {'userfile': ('ccc.jpg', im)}
        r = requests.post('http://upload.chaojiying.net/Upload/Processing.php', data=params, files=files,
                          headers=self.headers)
        return r.json()

    def report_error(self, im_id):
        """
        im_id: picture ID of the wrongly recognized question
        """
        params = {
            'id': im_id,
        }
        params.update(self.base_params)
        r = requests.post('http://upload.chaojiying.net/Upload/ReportError.php', data=params, headers=self.headers)
        return r.json()

A Chaojiying class is defined here. Its constructor takes three parameters: the Super Eagle username, password, and software ID. They are saved for later use, with the password stored as its MD5 hex digest.

The most important method is post_pic, which takes the image bytes and the code of the verification code type. It sends the image and the related information to the Super Eagle backend for recognition and returns the JSON response as a dict.

The other method is report_error, which is called when a recognition result turns out to be wrong. Calling it with the picture ID reports the error so that the points charged for that picture are returned.
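One way to combine post_pic and report_error is a small retry loop: submit the image, try the answer on the website, and report the picture ID when the answer is rejected. The following is only a sketch under stated assumptions. The names solve_with_retry, verify, and max_attempts are hypothetical and not part of the Super Eagle API, and client is assumed to be any object that offers the two methods of the Chaojiying class above.

```python
def solve_with_retry(client, image_bytes, codetype, verify, max_attempts=3):
    """Submit image_bytes until verify(result) succeeds or attempts run out.

    client: assumed to expose post_pic(im, codetype) and report_error(im_id),
            like the Chaojiying class above.
    verify: hypothetical callback that returns True if the website
            accepted the recognized answer.
    """
    for _ in range(max_attempts):
        result = client.post_pic(image_bytes, codetype)
        if result.get('err_no') != 0:
            # The platform itself reported a failure, so there is no answer to try.
            continue
        if verify(result):
            return result
        # The answer was wrong: report the picture ID to the platform.
        client.report_error(result.get('pic_id'))
    return None
```

In practice verify would perform the clicks and check the page, and a fresh captcha image would usually be needed for each attempt.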

Next, we take https://captcha3.scrape.cuiqingcai.com/ as an example to demonstrate the identification process.

Initialization

First, we import the necessary packages and then initialize some variables, such as the WebDriver and the Chaojiying object (the import below assumes the API above is saved as chaojiying.py). The code is as follows:

import time
from io import BytesIO

from PIL import Image
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from chaojiying import Chaojiying

# Credentials for the sample website
USERNAME = 'admin'
PASSWORD = 'admin'
# Credentials for the Super Eagle platform (fill in your own)
CHAOJIYING_USERNAME = ''
CHAOJIYING_PASSWORD = ''
CHAOJIYING_SOFT_ID = 893590
# Verification code type passed to post_pic
CHAOJIYING_KIND = 9102

if not CHAOJIYING_USERNAME or not CHAOJIYING_PASSWORD:
    print('Please set the Super Eagle username and password')
    raise SystemExit(1)


class CrackCaptcha:
    def __init__(self):
        self.url = 'https://captcha3.scrape.cuiqingcai.com/'
        self.browser = webdriver.Chrome()
        self.wait = WebDriverWait(self.browser, 20)
        self.username = USERNAME
        self.password = PASSWORD
        self.chaojiying = Chaojiying(CHAOJIYING_USERNAME, CHAOJIYING_PASSWORD, CHAOJIYING_SOFT_ID)

USERNAME and PASSWORD here are the username and password of the sample website; both can be set to admin. CHAOJIYING_USERNAME and CHAOJIYING_PASSWORD are the username and password of the Super Eagle platform; set them to your own.

A CrackCaptcha class is also defined here. Its constructor initializes the browser object and the client object for the coding platform.
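Instead of writing the platform credentials into the script, they could be read from environment variables. The following is a minimal sketch of that idea and not part of the original code. The function name load_chaojiying_credentials is hypothetical, and the variable names CHAOJIYING_USERNAME and CHAOJIYING_PASSWORD are simply assumed to match the constants above.

```python
import os


def load_chaojiying_credentials(env=None):
    """Return (username, password) read from environment variables.

    CHAOJIYING_USERNAME and CHAOJIYING_PASSWORD are assumed variable names.
    env can be any mapping; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    username = env.get('CHAOJIYING_USERNAME', '')
    password = env.get('CHAOJIYING_PASSWORD', '')
    if not username or not password:
        raise RuntimeError('Set CHAOJIYING_USERNAME and CHAOJIYING_PASSWORD first')
    return username, password
```

The returned pair could then be passed to the Chaojiying constructor in place of the module-level constants.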

Next, we use Selenium to bring up the verification code and start the verification.

Getting the verification code

The next step is to fill in the form and click the button that brings up the verification code. The code is as follows:

def open(self):
    """
    Open the page and enter the username and password
    :return: None
    """
    self.browser.get(self.url)
    # Fill in the username and password
    username = self.wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'input[type="text"]')))
    password = self.wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'input[type="password"]')))
    username.send_keys(self.username)
    password.send_keys(self.password)

def get_captcha_button(self):
    """
    Get the button that starts the verification
    :return: the button element
    """
    button = self.wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'button[type="button"]')))
    return button

Here the open method loads the page and fills in the form, and the get_captcha_button method locates the verification button. After the button is clicked, for example with self.get_captcha_button().click(), the page displays the captcha.

With the captcha on the page, the next step is to get the captcha image itself and send it to the coding platform for recognition.

How do we get the captcha image? We can first obtain the position and size of the captcha element, and then crop the corresponding area out of a screenshot of the web page. The code is as follows:

def get_captcha_element(self):
    """
    Get the captcha element
    :return: the captcha element
    """
    # Wait until the captcha image has loaded
    self.wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'img.geetest_item_img')))
    # The node that contains the whole captcha
    element = self.wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'geetest_panel_box')))
    print('Captcha node located')
    return element

def get_captcha_position(self):
    """
    Get the position of the captcha
    :return: tuple (top, bottom, left, right)
    """
    element = self.get_captcha_element()
    time.sleep(2)
    location = element.location
    size = element.size
    top, bottom = location['y'], location['y'] + size['height']
    left, right = location['x'], location['x'] + size['width']
    return (top, bottom, left, right)

def get_screenshot(self):
    """
    Take a screenshot of the page
    :return: the screenshot as a PIL Image
    """
    screenshot = self.browser.get_screenshot_as_png()
    screenshot = Image.open(BytesIO(screenshot))
    screenshot.save('screenshot.png')
    return screenshot

def get_captcha_image(self, name='captcha.png'):
    """
    Crop the captcha image out of the page screenshot
    :return: the captcha as a PIL Image
    """
    top, bottom, left, right = self.get_captcha_position()
    print('Captcha position', top, bottom, left, right)
    screenshot = self.get_screenshot()
    # PIL expects the crop box as (left, top, right, bottom)
    captcha = screenshot.crop((left, top, right, bottom))
    captcha.save(name)
    return captcha

Here, the get_captcha_image method crops the captcha image out of the page screenshot, using the position returned by the get_captcha_position method. In other words, the captcha image is obtained by taking a screenshot and then cropping it.

Note: On a high-DPI screen such as a Mac Retina display, the screenshot may contain more pixels than the page coordinates reported by Selenium suggest, so you may need to adjust the screen resolution or multiply the captcha position by the scale factor.
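The scaling mentioned in the note can be sketched as a small helper that turns an element's position and size into a crop box for the screenshot. This is an illustration only. The function name captcha_crop_box is hypothetical, and the pixel ratio is assumed to be read from the browser, for example with browser.execute_script('return window.devicePixelRatio').

```python
def captcha_crop_box(location, size, pixel_ratio=1.0):
    """Return (left, top, right, bottom) in screenshot pixels.

    location, size: dicts shaped like Selenium's element.location and
    element.size, which are expressed in page (CSS) pixels.
    pixel_ratio: assumed scale between page pixels and screenshot pixels,
    e.g. 2 on many Retina displays.
    """
    left = location['x'] * pixel_ratio
    top = location['y'] * pixel_ratio
    right = (location['x'] + size['width']) * pixel_ratio
    bottom = (location['y'] + size['height']) * pixel_ratio
    # PIL's Image.crop expects the box in this order.
    return tuple(int(round(v)) for v in (left, top, right, bottom))
```

If the crop is scaled this way, the coordinates returned by the platform refer to screenshot pixels as well, so they would need to be divided by the same ratio before clicking.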

In the end, the captcha we get is an Image object, as shown in the figure.
[Figure: the cropped captcha image]

Recognizing the verification code

Now that we have the captcha image, the next step is to send it to the coding platform.

We call the post_pic method of the Chaojiying object to send the image to the Super Eagle backend. The image is sent as bytes. The code is as follows:

image = self.get_captcha_image()
# Serialize the PIL image to PNG bytes
bytes_array = BytesIO()
image.save(bytes_array, format='PNG')
# Send the bytes to the platform for recognition
result = self.chaojiying.post_pic(bytes_array.getvalue(), CHAOJIYING_KIND)
print(result)

After running, the result variable holds the recognition result from the Super Eagle backend. The call may take a few seconds; post_pic parses the JSON response and returns it as a Python dict.

If the recognition is successful, the typical return result is as follows:

{'err_no': 0, 'err_str': 'OK', 'pic_id': '6002001380949200001', 'pic_str': '132,127|56,77', 'md5': '1f8e1d4bef8b11484cb1f1f34299865b'}

Here, pic_str holds the coordinates of the recognized characters as a string, with the points separated by |. All we need to do is parse it and then simulate the clicks. The code is as follows:

def get_points(self, captcha_result):
    """
    Parse the recognition result
    :param captcha_result: recognition result
    :return: list of [x, y] coordinates
    """
    groups = captcha_result.get('pic_str').split('|')
    locations = [[int(number) for number in group.split(',')] for group in groups]
    return locations

def touch_click_words(self, locations):
    """
    Click the captcha image at each location
    :param locations: click positions
    :return: None
    """
    for location in locations:
        ActionChains(self.browser).move_to_element_with_offset(
            self.get_captcha_element(), location[0], location[1]).click().perform()
        time.sleep(1)

Here, the get_points method converts the recognition result into a list of coordinates. The touch_click_words method then moves to each parsed coordinate in turn with move_to_element_with_offset and clicks.
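The parsing step assumes that recognition succeeded. A slightly more defensive variant could check err_no first and skip empty groups. The sketch below is an illustration only: parse_click_points is a hypothetical standalone function, and the result dict is assumed to have the shape of the sample response shown earlier.

```python
def parse_click_points(captcha_result):
    """Turn a result dict into a list of (x, y) tuples.

    Expects a dict like {'err_no': 0, 'pic_str': '132,127|56,77'} and
    raises ValueError when the result signals an error or has no points.
    """
    if captcha_result.get('err_no') != 0:
        raise ValueError('recognition failed: %s' % captcha_result.get('err_str'))
    pic_str = captcha_result.get('pic_str') or ''
    points = []
    for group in pic_str.split('|'):
        if not group:
            continue  # skip empty pieces such as a trailing separator
        x, y = group.split(',')
        points.append((int(x), int(y)))
    if not points:
        raise ValueError('no coordinates in result')
    return points


sample = {'err_no': 0, 'err_str': 'OK', 'pic_str': '132,127|56,77'}
print(parse_click_points(sample))  # [(132, 127), (56, 77)]
```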

In this way, we simulate the clicks at the returned coordinates. The result is shown below.
[Figure: the captcha after the simulated clicks]
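The coordinates returned by the platform are measured from the top-left corner of the submitted image. Older Selenium releases interpret the offsets of move_to_element_with_offset the same way, but the documentation of recent Selenium 4 releases describes them as relative to the center of the element. If the clicks land in the wrong place, a conversion like the one below may help. It is only a sketch: to_center_offset is a hypothetical helper, and whether it is needed depends on the installed Selenium version.

```python
def to_center_offset(x, y, width, height):
    """Convert an offset measured from an element's top-left corner
    into an offset measured from the element's center.

    width, height: the element size, e.g. from Selenium's element.size.
    """
    return int(round(x - width / 2)), int(round(y - height / 2))


# A point at (132, 127) inside a 300 x 400 element
print(to_center_offset(132, 127, 300, 400))  # (-18, -73)
```

The converted pair would then be passed to move_to_element_with_offset in place of location[0] and location[1].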
Finally, simulate a click on the button that submits the verification. Once the verification passes, the site logs in automatically. That part of the implementation is not repeated here.

How do we judge whether the login succeeded? We can again use Selenium's expected conditions. For example, if a certain piece of text appears on the page, the login succeeded. The code is as follows:

# Check whether the login succeeded ('登录成功' is the page's "login successful" text)
success = self.wait.until(EC.text_to_be_present_in_element((By.TAG_NAME, 'h2'), '登录成功'))

Here we check whether, after the confirm button is clicked, the page jumps to the success page. That page contains an h2 node with the text 登录成功 ("login successful"); if it appears, the login succeeded.

In this way, we have solved the click verification code with the help of an online coding platform. The approach is general: the same method can be used for other kinds of verification codes, such as graphic text, digits, and arithmetic questions.

Conclusion

In this lesson, we used an online coding platform to help recognize a verification code. The approach is powerful and covers most common types of verification code. When you run into a verification code you cannot solve yourself, a coding platform is a good option.

Origin blog.csdn.net/weixin_38819889/article/details/107907234