Slider CAPTCHA verification

Pitfalls encountered with Selenium + Chrome + Firefox + WebDriver

Pitfall 1: error when starting WebDriver on Linux

Test code:

    #!/usr/bin/python
    # -*- coding: utf-8 -*-

    from selenium import webdriver

    driver = webdriver.Firefox()
    driver.get("https://www.baidu.com")

Running it produces the following error:

    Traceback (most recent call last):
      File "maimai_web.py", line 14, in <module>
        driver = webdriver.Firefox()
      File "/usr/local/python3.6/lib/python3.6/site-packages/selenium/webdriver/firefox/webdriver.py", line 152, in __init__
        keep_alive=True)
      File "/usr/local/python3.6/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 98, in __init__
        self.start_session(desired_capabilities, browser_profile)
      File "/usr/local/python3.6/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 188, in start_session
        response = self.execute(Command.NEW_SESSION, parameters)
      File "/usr/local/python3.6/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 256, in execute
        self.error_handler.check_response(response)
      File "/usr/local/python3.6/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 194, in check_response
        raise exception_class(message, screen, stacktrace)
    selenium.common.exceptions.WebDriverException: Message: Process unexpectedly closed with status 1

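Fix: start a virtual display with pyvirtualdisplay before creating the driver (on a Linux server without a graphical display, Firefox typically exits immediately on startup):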

    #!/usr/bin/python
    # -*- coding: utf-8 -*-

    from pyvirtualdisplay import Display
    from selenium import webdriver

    # Start a virtual display so that Firefox has a screen to render to
    display = Display(visible=0, size=(1920, 1080))
    display.start()
    driver = webdriver.Firefox()
    driver.get("https://www.baidu.com")

Result:

The script now runs without errors.

Pitfall 2: error when instantiating WebDriver

When WebDriver is created from multiple threads, this error occasionally appears: selenium.common.exceptions.WebDriverException: Message: connection refused

    Exception in thread Thread-2:
    Traceback (most recent call last):
      File "/usr/local/python3.6/lib/python3.6/threading.py", line 916, in _bootstrap_inner
        self.run()
      File "/usr/local/python3.6/lib/python3.6/threading.py", line 864, in run
        self._target(*self._args, **self._kwargs)
      File "maimai_tran_account_driver.py", line 591, in debug
        t = TrainAccount(count,lock)
      File "maimai_tran_account_driver.py", line 32, in __init__
        self.chrome = webdriver.Firefox()
      File "/usr/local/python3.6/lib/python3.6/site-packages/selenium/webdriver/firefox/webdriver.py", line 152, in __init__
        keep_alive=True)
      File "/usr/local/python3.6/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 98, in __init__
        self.start_session(desired_capabilities, browser_profile)
      File "/usr/local/python3.6/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 188, in start_session
        response = self.execute(Command.NEW_SESSION, parameters)
      File "/usr/local/python3.6/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 256, in execute
        self.error_handler.check_response(response)
      File "/usr/local/python3.6/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 194, in check_response
        raise exception_class(message, screen, stacktrace)
    selenium.common.exceptions.WebDriverException: Message: connection refused

Check geckodriver.log for the specific error details.
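The post does not say how this error was finally resolved. One possible mitigation, assuming the failure is a transient startup problem caused by several threads launching browsers at the same moment, is to serialize driver creation behind a shared lock and retry a few times. The helper name `create_with_retry` and its parameters are hypothetical and not part of Selenium; this is a sketch, not a confirmed fix.

```python
import threading
import time


def create_with_retry(factory, lock, retries=3, delay=1.0):
    """Call factory() while holding lock, retrying when it raises.

    factory: zero-argument callable that returns a driver
             (for example webdriver.Firefox; hypothetical usage).
    lock:    threading.Lock shared by all worker threads.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            # Only one thread starts a browser at a time.
            with lock:
                return factory()
        except Exception as exc:  # narrow this to WebDriverException in real code
            last_error = exc
            if attempt < retries:
                # Wait a little longer after each failed attempt.
                time.sleep(delay * attempt)
    raise last_error
```

In the threaded code from the traceback, the call `webdriver.Firefox()` would become something like `create_with_retry(webdriver.Firefox, lock)`, reusing the lock that is already passed to each worker.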

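Pitfall 3: the automated browser is caught by anti-scraping checks

The cause: pages loaded through WebDriver run JavaScript that checks for a webdriver field, and when that field is detected the client is treated as a crawler. The countermeasure is as follows.

Tool: use mitmproxy as a proxy and replace webdriver with another field name in the JavaScript returned by the server

Partial code: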

    if "/_next/static/js/common_pdd" in flow.request.url:
        flow.response.text = flow.response.text.replace("webdriver", "userAgent")
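For context, the snippet above would normally live inside a mitmproxy script. A minimal sketch of such a script follows, assuming it is loaded with `mitmdump -s` and that the browser is configured to use the proxy and to trust the mitmproxy certificate. `response` is the hook name mitmproxy calls for each server response; the constant `TARGET_PATH` is introduced here only for illustration.

```python
# Path fragment of the script to patch, taken from the snippet above.
TARGET_PATH = "/_next/static/js/common_pdd"


def response(flow):
    """mitmproxy hook, called once for each server response."""
    # Leave every other response untouched.
    if TARGET_PATH in flow.request.url:
        # Rename the field that the detection script looks for.
        flow.response.text = flow.response.text.replace("webdriver", "userAgent")
```

Saved under a file name of your choice (for example `rewrite_webdriver.py`, a hypothetical name), it can be started with `mitmdump -s rewrite_webdriver.py`.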

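Pitfall 4: slider CAPTCHA verification fails

With identical code, the CAPTCHA passed under chromedriver, but under Firefox verification failed even though the slider was dragged to the correct position. The cause turned out to be that Firefox moved the slider too slowly, so the motion was recognized as automated. The fix is to increase the sliding speed. Part of the slider verification code follows: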

    import random
    import time

    from PIL import Image
    from selenium.webdriver import ActionChains

    # Written for Selenium 3.x: find_element_by_xpath was removed in Selenium 4.3
    # in favor of find_element(By.XPATH, ...).


    # Method of the crawler class (the class definition is not shown in this excerpt)
    def crack_geetest(self, max_retry=10):
        driver = self.driver
        l = self.logger
        l.info("process handle geetest captcha...")

        def get_position():
            """
            Get the position of the CAPTCHA image.
            :return: tuple (top, bottom, left, right)
            """
            img = driver.find_element_by_xpath('//div[@class="geetest_canvas_img geetest_absolute"]')
            time.sleep(2)
            location = img.location
            size = img.size
            top = location['y']
            bottom = location['y'] + size['height']
            left = location['x']
            right = location['x'] + size['width']
            return (top, bottom, left, right)

        def get_geetest_image(name):
            """
            Get the CAPTCHA image.
            :return: image object and the path it was saved to
            """
            full_img_path = './zhilian_screenshot_{}.png'.format(self.account['user_id'])
            driver.save_screenshot(filename=full_img_path)
            image = Image.open(fp=full_img_path, mode='r')
            top, bottom, left, right = get_position()
            print('CAPTCHA position: ({},{},{},{})'.format(left, top, right, bottom))
            # Subtract the vertical scroll offset so the coordinates match the screenshot
            t = driver.execute_script('var q=document.documentElement.scrollTop; return q;')
            print('CAPTCHA position: ({},{},{},{})'.format(left, top - int(t), right, bottom - int(t)))
            print('p--->>>', t)
            captcha = image.crop((left, top - int(t), right, bottom - int(t)))
            captcha_file_name = './zhilian_captcha_{}_{}.png'.format(self.account['user_id'], name)
            captcha.save(captcha_file_name)
            return captcha, captcha_file_name

        def get_slider():
            """
            Get the slider.
            :return: slider element
            """
            slider = driver.find_element_by_xpath('//div[@class="geetest_slider_button"]')
            return slider

        def get_gap(captcha_file_name):
            """
            Get the offset of the gap.
            :param captcha_file_name: path of the CAPTCHA image that contains the gap
            :return: x offset of the gap
            """
            res = self.dama2.decode_captcha(6137, captcha_file_name)
            print(res)
            # Example result: ('b800b4f6-0d9a-40e2-a972-d87c91582b46', [(176, 101)])
            return int(res[1][0][0])

        def calculate_tracks(distance):
            def generate_rand(n, sum_v):  # randomly generate a list of n values that total sum_v
                Vector = [random.randint(1, 3) for _ in range(n)]
                Vector = [int(i / sum(Vector) * sum_v) for i in Vector]
                if sum(Vector) < sum_v:
                    res = sum_v - sum(Vector)
                    for i in range(res):
                        Vector[random.randint(0, n - 1)] += 1
                # Negate each value so the offsets move the slider back
                return [0 - i for i in Vector]

            back_dis = random.randint(16, 26)
            distance += back_dis  # overshoot a little first, then slide back at the end
            v = 0
            t = 0.2
            forward_tracks = []

            current = 0
            mid = distance * 3 / 5
            while current < distance:
                # Accelerate over the first 3/5 of the distance, then decelerate
                if current < mid:
                    a = 2
                else:
                    a = -3

                s = v * t + 0.5 * a * (t ** 2)
                v = v + a * t
                current += s
                forward_tracks.append(round(s))

            # Slide back to the exact position
            back_tracks = generate_rand(15, back_dis)  # negative offsets totaling -back_dis
            return {'forward_tracks': forward_tracks, 'back_tracks': back_tracks}

        def move_to_gap(slider, tracks):
            """
            Drag the slider to the gap.
            :param slider: slider element
            :param tracks: dict with 'forward_tracks' and 'back_tracks'
            :return:
            """
            ActionChains(driver).click_and_hold(slider).perform()

            # Move forward, past the gap
            for i in tracks['forward_tracks']:
                ActionChains(driver).move_by_offset(i, 0).perform()

            # Move back
            time.sleep(0.5)
            for i in tracks['back_tracks']:
                ActionChains(driver).move_by_offset(i, 0).perform()

            # Jitter slightly around the final position
            # time.sleep(0.3)
            random_sc = random.randint(3, 8)
            ActionChains(driver).move_by_offset(0 - random_sc, 0).perform()
            time.sleep(0.5)
            ActionChains(driver).move_by_offset(random_sc, 0).perform()

            # Release
            time.sleep(0.5)
            ActionChains(driver).release().perform()

        def crack(retry=0):
            # Enter the username and password
            # Click the verification button
            # Get the CAPTCHA image
            print('get_geetest_image')
            captcha_obj, captcha_file_name = get_geetest_image('2')
            gap = get_gap(captcha_file_name)
            l.info('gap position: {}'.format(gap))
            print('gap position: {}'.format(gap))
            # Subtract the slider's starting offset
            BORDER = 29
            gap -= BORDER
            # Compute the movement track
            track = calculate_tracks(gap)
            l.info('slide track: {}'.format(track))
            print('slide track: {}'.format(track))
            # Drag the slider
            slider = get_slider()
            move_to_gap(slider, track)
            driver.save_screenshot('./zhilian_capresult_{}_{}.png'.format(self.account['user_id'], retry))
            time.sleep(3)
            result = driver.find_element_by_xpath('//div[@class="geetest_result_title"]').get_attribute('textContent')
            l.info(result)
            print(result)
            return result

        retry = 1
        while True:
            l.info(f'{retry}/{max_retry} crack geetest.')
            if retry == max_retry:
                l.info("max retry reached, return False")
                return False
            success = crack(retry)
            # '秒的速度超过' is part of the widget's success message ("... seconds, faster than ...")
            if '秒的速度超过' in success or 'passport.lagou.com/login/login' not in driver.current_url:
                l.info("crack succeeded!")
                print("crack succeeded!")
                return True
            # '拖动滑块将悬浮图像正确拼合' is the failure prompt
            # ("drag the slider to fit the floating image into place")
            elif '拖动滑块将悬浮图像正确拼合' in success:
                retry += 1
                l.info("crack failed, retry:{}/{}".format(retry, max_retry))
                driver.find_element_by_xpath('//a[@class="geetest_refresh_1"]').click()
                time.sleep(5)
                continue
            else:
                time.sleep(5)
                retry += 1
                l.info("crack failed, retry:{}/{}".format(retry, max_retry))
                continue
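The post says the fix was to slide faster but does not show the exact change. One way to do that, assuming the slowness comes from the number of separate `move_by_offset` calls, is to cover the same distance in fewer, larger steps. The sketch below uses a simple ease-out curve instead of the acceleration model in the code above; `fast_tracks` and `steps` are hypothetical names, and the right number of steps would have to be found by experiment.

```python
def fast_tracks(distance, steps=8):
    """Split distance into `steps` integer offsets that start large and shrink.

    Positions follow an ease-out curve, 1 - (1 - p)^2. Each offset is the
    difference between consecutive rounded positions, so the offsets always
    add up to exactly `distance` with no rounding drift.
    """
    tracks = []
    previous = 0
    for i in range(1, steps + 1):
        p = i / steps
        position = round(distance * (1 - (1 - p) ** 2))
        tracks.append(position - previous)
        previous = position
    return tracks
```

The returned list could be used in place of `forward_tracks`. Because the offsets are derived from cumulative positions, the slider ends exactly at the requested distance, whereas rounding each step on its own can accumulate an error.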

Source: https://blog.csdn.net/wenq_yang/article/details/81258932

Reposted from www.cnblogs.com/alex-13/p/12019764.html