[Python Automation] Selenium verification code recognition

This column is purpose-oriented: it aims to simplify or automate everyday work tasks, using Python in practice to solve real problems and, in doing so, to stimulate readers' interest in learning this scripting language. Before starting on Python automation, it is recommended that you first learn the basics of the Python language itself and of Python crawlers. Related columns on both topics are available on this blog.


Verification code recognition is an unavoidable topic in both web crawling and automation. The author previously gave a preliminary introduction to it in the article "[Python Crawler] 9. Tesseract for Machine Vision and Machine Image Recognition", and demonstrated the recognition of graphic verification codes in practice in the article "Educational Affairs Management System: Grades, Class Schedule Query Interface Design and Realization of Class Grabbing and Monitoring Functions". This article gives a more systematic overview and summary of verification code recognition.

Generally speaking, when a verification code comes up in automation work, there are three ways to handle it: first, solving it manually; second, paying a captcha-solving platform; third, recognizing it in code.

For the first method, take the login step in the introductory article "[Python Automation] Selenium's online learning automation": to keep the learning cost low, the verification code there is read and typed in by a person instead of being recognized in code.


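# Assumes `browser` (a Selenium WebDriver), `username` and `pwd` are defined elsewhere
from selenium.webdriver.common.by import By

def Login():
    print('***************Loading required elements, please wait***************')
    browser.get('https://www.******.cn/home/')
    # find_element(By.XPATH, ...) replaces find_element_by_xpath, which Selenium 4 removed
    browser.find_element(By.XPATH, '//*[@id="accountFrom"]/label[1]/input').send_keys(username)
    browser.find_element(By.XPATH, '//*[@id="accountFrom"]/label[3]/input').send_keys(pwd)
    # Pause so the user can type the verification code in the browser and log in
    input("Enter the verification code, log in, then press Enter to confirm: ")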

The second method solves the problem with "money power": given enough budget, it is simple and efficient, so it is not covered further here. This article mainly describes how to recognize verification codes in code. Because verification codes now come in many forms, the two most common types, character recognition and the sliding puzzle, are chosen for introduction.


The basic steps of verification code recognition:
  1. Locate the element to be recognized
  2. Get the (full) picture to be recognized
  3. Write the recognition method
  4. Pass verification using the recognition result

1. Character verification code recognition


  1. Get and save verification code:

The basic logic of this step is: visit the webpage, then save the verification code image. Note, however, that the verification code changes every time the page is refreshed, so the cookie sent when requesting the verification code image must be the same as the cookie set when visiting the web page. Otherwise the saved image will not belong to the session that later logs in.

(1) Visit the webpage
import requests

# '<redacted>' marks values that the author masked out by hand
session = requests.Session()
url1 = '<redacted>/Login.aspx'
def get_cookie():
    headers1 = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
        "Accept-Encoding": "gzip, deflate",
        "Content-Type": "application/x-www-form-urlencoded",
        "Origin": "<redacted>",
        "Connection": "keep-alive",
        "Referer": "<redacted>",
        "Upgrade-Insecure-Requests": "1"
    }
    # Visit the login page; the response headers carry the Set-Cookie value
    main = session.get(url1, headers=headers1)
    gb_headers = main.headers
    return gb_headers
(2) Store the verification code
test = get_cookie()
url2 = '<redacted>/Image.aspx'
def get_pic():
    # Request headers for the verification code image
    headers2 = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0",
        # Reuse the cookie from the page visit; the ';' separates the two cookies
        "cookie": "varPartNewsManage.aspx=10;" + test["Set-Cookie"]
    }

    re_pic = requests.get(url2, headers=headers2)
    response = re_pic.content

    file = "C:\\Users\\john\\Desktop\\1\\" + ".png"
    # Write the image bytes to disk; the file is closed automatically
    with open(file, 'wb') as playFile:
        playFile.write(response)
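The cookie consistency described above can also be obtained without copying `Set-Cookie` by hand. The following is a minimal sketch, assuming a `requests.Session` is used for both requests; the function name `fetch_captcha` and its parameters are hypothetical and not part of the original code.

```python
from pathlib import Path


def fetch_captcha(session, page_url, image_url, out_path, timeout=10):
    """Visit the page, then download the captcha with the same session.

    `session` is expected to behave like requests.Session: cookies set by
    the first response are sent back automatically on the second request.
    """
    page = session.get(page_url, timeout=timeout)
    page.raise_for_status()
    image = session.get(image_url, timeout=timeout)
    image.raise_for_status()
    # Save the raw image bytes for the recognition step
    Path(out_path).write_bytes(image.content)
    return out_path
```

With `requests` installed, a call would look like `fetch_captcha(requests.Session(), url1, url2, "captcha.png")`. Passing the session in as a parameter also makes it easy to reuse the same session for the later login request.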
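  2. Recognize the verification code:
from PIL import Image
import pytesseract

def recognize_captcha(img_path):
    im = Image.open(img_path)
    # image_to_string returns surrounding whitespace, so strip it
    num = pytesseract.image_to_string(im).strip()
    return num

get_pic()
pic_res = recognize_captcha("C:\\Users\\john\\Desktop\\1\\" + ".png")
#print(pic_res)  # Verification code recognition result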
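OCR output is rarely clean: the string returned by `pytesseract.image_to_string` usually carries trailing whitespace and may contain stray punctuation. A minimal sketch of a sanity check before submitting the result is shown here. The helper name `clean_ocr_text` and the default length of 4 characters are assumptions for illustration; adjust them to the target site.

```python
import re


def clean_ocr_text(raw, expected_length=4):
    """Keep only letters and digits; return None if the length looks wrong."""
    text = re.sub(r"[^0-9A-Za-z]", "", raw)
    return text if len(text) == expected_length else None


print(clean_ocr_text(" 7k 9P\n"))  # usable result
print(clean_ocr_text("ab\n"))      # too short: fetch a new captcha and retry
```

When the function returns `None`, it is usually cheaper to request a fresh verification code and try again than to submit a result that is known to be malformed.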
  3. Log in
def post_login():
    headers3 = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
        "Accept-Language": "zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2",
        "Accept-Encoding": "gzip, deflate",
        "Content-Type": "application/x-www-form-urlencoded",
        "Origin": "<redacted>",
        "Connection": "keep-alive",
        "Referer": "<redacted>",
        "Upgrade-Insecure-Requests": "1",
        "cookie": "varPartNewsManage.aspx=10;" + test["Set-Cookie"]
    }
    data = {
        "Flag": "Login",
        "username": "<redacted>",
        "password": "<redacted>",
        "ddlUserClass": "1",
        "code1": pic_res,
        "ImageButton2.x": "64",
        "ImageButton2.y": "10"
    }
    # The original posted to an undefined `url`; the login page url1 is assumed here
    res = session.post(url=url1, data=data, headers=headers3)
    #print(res.request.headers)  # Check that the cookie was actually sent
    #print(res.text)

post_login()

2. Sliding Puzzle Verification Code Recognition


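  1. Get the location of the verification button and verification code
# The functions below are methods of a crawler class; self.browser is the
# WebDriver and self.wait is assumed to be a WebDriverWait bound to it
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

def get_geetest_button(self):
    """
    Get the initial verification button
    :return: the button element
    """
    # Verification button
    button = self.wait.until(EC.element_to_be_clickable((By.CLASS_NAME,'geetest_radar_tip')))
    return button

def get_position(self):
    """
    Get the position of the verification code
    :return: tuple with the verification code position
    """
    img = self.wait.until(EC.presence_of_element_located((By.CLASS_NAME, 'geetest_canvas_img')))
    print('img')
    location = img.location
    size = img.size
    top, bottom, left, right = location['y'], location['y'] + size['height'], location['x'], location['x'] + size['width']
    # The original omitted this return, although get_geetest_image unpacks the result
    return top, bottom, left, right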

Open the verification page and obtain the positions of the verification button and the verification code image.

  2. Get the full picture
from io import BytesIO
from PIL import Image

def get_screenshot(self):
    """
    Take a screenshot of the web page
    :return: screenshot object
    """
    screenshot = self.browser.get_screenshot_as_png()
    screenshot = Image.open(BytesIO(screenshot))
    return screenshot

def get_geetest_image(self, name='captcha.png'):
    """
    Get the verification code picture
    :return: picture object
    """
    top, bottom, left, right = self.get_position()
    print('Verification code position', top, bottom, left, right)
    screenshot = self.get_screenshot()
    captcha = screenshot.crop((left, top, right, bottom))
    captcha.save(name)
    return captcha

def delete_style(self):
    '''
    Run a JS script to get the picture without the slider piece
    :return None
    '''
    js = 'document.querySelectorAll("canvas")[2].style=""'
    self.browser.execute_script(js)

def get_gap(self, image1, image2):
    """
    Get the offset of the gap
    :param image1: picture with the gap
    :param image2: picture without the gap
    :return: x coordinate of the first column where the pictures differ
    """
    left = 60
    print(image1.size[0])
    print(image1.size[1])
    for i in range(left, image1.size[0]):
        for j in range(image1.size[1]):
            if not self.is_pixel_equal(image1, image2, i, j):
                left = i
                return left
    return left

def is_pixel_equal(self, image1, image2, x, y):
    """
    Check whether two pixels are the same
    :param image1: picture 1
    :param image2: picture 2
    :param x: x position
    :param y: y position
    :return: whether the pixels are the same
    """
    # Take the pixel at (x, y) from both pictures
    pixel1 = image1.load()[x, y]
    pixel2 = image2.load()[x, y]
    threshold = 60
    # The pixels count as equal when every RGB channel differs by less than the threshold
    return (abs(pixel1[0] - pixel2[0]) < threshold
            and abs(pixel1[1] - pixel2[1]) < threshold
            and abs(pixel1[2] - pixel2[2]) < threshold)

Get the complete picture by adjusting the CSS style. Also get the picture with the gap, then compare the two pictures pixel by pixel (using PIL) to obtain the offset of the gap!
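The comparison can be tried out without a browser. The sketch below is a pure-Python illustration of the same column-by-column scan, run on two synthetic "pictures" stored as nested lists of RGB tuples. The names `pixels_match` and `find_gap_left_edge` and the synthetic data are hypothetical; the start column of 60 and the threshold of 60 are taken from `get_gap` and `is_pixel_equal`, where the start column presumably skips the slider piece at the left edge.

```python
def pixels_match(p1, p2, threshold=60):
    """True when every RGB channel differs by less than `threshold`."""
    return all(abs(a - b) < threshold for a, b in zip(p1[:3], p2[:3]))


def find_gap_left_edge(with_gap, without_gap, start=60, threshold=60):
    """Return the first x (scanning left to right) where the images differ.

    Images are lists of rows, so image[y][x] is an (r, g, b) tuple.
    Returns None when no differing pixel is found.
    """
    height = len(with_gap)
    width = len(with_gap[0])
    for x in range(start, width):
        for y in range(height):
            if not pixels_match(with_gap[y][x], without_gap[y][x], threshold):
                return x
    return None


# Synthetic 120x30 pictures: light background, dark 20x10 gap starting at x=85
WIDTH, HEIGHT, GAP_X = 120, 30, 85
full = [[(200, 200, 200)] * WIDTH for _ in range(HEIGHT)]
gapped = [row[:] for row in full]
for y in range(10, 20):
    for x in range(GAP_X, GAP_X + 20):
        gapped[y][x] = (40, 40, 40)

offset = find_gap_left_edge(gapped, full)
print(offset)  # 85
```

Unlike `get_gap`, this sketch returns `None` instead of the start column when the pictures are identical, which makes a failed comparison easier to detect.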

  3. Verify

Compute a movement track from the offset, then drag the slider along it to fill the gap!

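def get_track(self, distance):
    """
    Compute the movement track from the offset
    :param distance: offset
    :return: movement track
    """
    # Movement track
    track = []
    # Current displacement
    current = 0
    # Point at which deceleration starts
    mid = distance * 4 / 5
    # Time interval of each step
    t = 0.2
    # Initial velocity
    v = 0
    while current < distance:
        if current < mid:
            # Acceleration of +2
            a = 2
        else:
            # Acceleration of -1 (decelerate)
            a = -1
        # Initial velocity v0
        v0 = v
        # Current velocity v = v0 + at
        v = v0 + a * t
        # Distance moved x = v0t + 1/2 * a * t^2
        move = v0 * t + 1 / 2 * a * t * t
        # Current displacement
        current += move
        # Append to the track
        track.append(round(move))
    return track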

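import time
from selenium.webdriver import ActionChains

def move_to_gap(self, slider, track):
    """
    Drag the slider to the gap
    :param slider: slider element
    :param track: movement track
    :return:
    """
    # Press and hold the slider
    ActionChains(self.browser).click_and_hold(slider).perform()
    # Move step by step along the track
    for x in track:
        ActionChains(self.browser).move_by_offset(xoffset=x, yoffset=0).perform()
    time.sleep(0.5)
    # Release the mouse button
    ActionChains(self.browser).release().perform()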

It is worth noting that, because of how this kind of verification code works, the slider cannot simply be moved at a constant speed: a human cannot do that, and Geetest checks for it! So the movement imitates a person, accelerating first and then decelerating.
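One detail of the accelerate-then-decelerate track is worth checking: `get_track` rounds every step on its own, so the rounded steps can add up to a total that differs from the target offset. The following is a minimal sketch of one possible variant that rounds the running total instead, so the integer steps always sum to the requested distance. The name `build_track` and its parameters are hypothetical; the defaults mirror the values used in `get_track`.

```python
def build_track(distance, t=0.2, accel=2.0, decel=-1.0):
    """Return integer x offsets that sum exactly to `distance` (an integer).

    Accelerates over the first 4/5 of the distance, then decelerates.
    Assumes the default physics values, for which every move is positive.
    """
    track = []
    current = 0.0   # exact displacement so far
    emitted = 0     # sum of the integer steps already appended
    v = 0.0
    mid = distance * 4 / 5
    while current < distance:
        a = accel if current < mid else decel
        v0 = v
        v = v0 + a * t
        move = v0 * t + 0.5 * a * t * t
        # Never travel past the target
        current = min(current + move, float(distance))
        # Emit the difference between rounded totals, not the rounded move
        step = round(current) - emitted
        track.append(step)
        emitted += step
    return track


track = build_track(100)
print(sum(track))  # 100
```

The resulting list can be passed to `move_to_gap` in place of the output of `get_track`. The first few steps are 0 because the slider starts from rest, which `move_by_offset` handles as a no-op move.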

In addition, the display scaling in the computer's settings and the zoom in the browser's settings must both be set to 100%. Otherwise the picture will be cropped at the wrong position and the offset will be calculated incorrectly.
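If forcing every zoom setting to 100% is not practical, one possible workaround (an assumption of this sketch, not part of the original code) is to scale the coordinates instead. Element `location` and `size` are reported in CSS pixels, while the screenshot is typically captured in device pixels, and the ratio between the two can be read in the browser with `browser.execute_script("return window.devicePixelRatio")`. The helper names `scale_box` and `to_css_pixels` are hypothetical.

```python
def scale_box(left, top, right, bottom, ratio):
    """Convert a CSS-pixel box into screenshot (device) pixels for cropping."""
    return tuple(round(v * ratio) for v in (left, top, right, bottom))


def to_css_pixels(offset_px, ratio):
    """Convert an offset measured on the screenshot back to CSS pixels,
    which is the unit used when dragging the slider."""
    return offset_px / ratio


# Example with 125% scaling; in practice `ratio` would come from
# browser.execute_script("return window.devicePixelRatio")
ratio = 1.25
box = scale_box(100, 200, 360, 360, ratio)
print(box)                        # (125, 250, 450, 450)
print(to_css_pixels(150, ratio))  # 120.0
```

The scaled box would be passed to `screenshot.crop(...)`, and the gap offset found on the cropped picture would be converted back with `to_css_pixels` before building the movement track.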


This brings the article to a close. It grew out of a little experience gained during development, and its main purpose is to raise readers' interest in learning Python and in solving practical problems through practice, as a reference for readers who are interested in this field. I hope it can serve as a starting point for further discussion, and criticism and exchange are welcome.


If you have any questions or suggestions, I look forward to your messages and comments!


Origin blog.csdn.net/deng_xj/article/details/119912564