Automatically filling in Questionnaire Star (wjx.cn) surveys with Python

Use Python to fill in a questionnaire automatically, including the smart verification and slider verification steps.

1. Download browser driver

Filling in questionnaires automatically with Python relies on a browser driver. Google Chrome is used here, so you need to download chromedriver, and the driver version must match the browser version.

First open Google Chrome, click "Help" > "About Google Chrome", and check the browser version shown on that page.

After checking the version, open the CNPM Binaries Mirror and download the chromedriver build that matches your browser version and operating system.

I am using Ubuntu here, so I downloaded the Linux build (the same steps apply to other operating systems).

After the download is complete, unzip the archive and make the driver executable:

cd chromedriver_linux64
chmod +x chromedriver

Next, move it to the /usr/bin directory:

sudo mv chromedriver /usr/bin/
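
Because the driver and browser versions have to match, it can help to compare them before going any further. The following is a minimal sketch of one way to do that, not part of the original setup. The helper names `major_version` and `installed_major` are made up for illustration, and the binary names `google-chrome` and `chromedriver` are assumptions (on some systems the browser binary is called `chromium` or `google-chrome-stable`).

```python
import re
import shutil
import subprocess


def major_version(version_output):
    """Return the major number of the first dotted version found, or None."""
    match = re.search(r"(\d+)\.\d+(?:\.\d+)*", version_output)
    return int(match.group(1)) if match else None


def installed_major(binary):
    """Run `<binary> --version` and return its major version, or None if unavailable."""
    if shutil.which(binary) is None:
        return None
    result = subprocess.run([binary, "--version"], capture_output=True, text=True)
    return major_version(result.stdout)


if __name__ == "__main__":
    # Binary names are assumptions; adjust them for your system.
    chrome = installed_major("google-chrome")
    driver = installed_major("chromedriver")
    if chrome is None or driver is None:
        print("Could not detect both versions:", chrome, driver)
    elif chrome == driver:
        print("Major versions match:", chrome)
    else:
        print("Version mismatch: Chrome", chrome, "vs chromedriver", driver)
```

Only the major version is compared here, which is a simplification; if the two still fail to work together, download the driver build whose full version is closest to the browser's.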

Test
Next, run a quick test to confirm that the driver works.

# coding=utf-8
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_opt = Options()  # Create the options object.
chrome_opt.add_argument('--headless')  # Run without a visible window.
chrome_opt.add_argument('--disable-gpu')  # Commonly paired with headless mode.
chrome_opt.add_argument('--window-size=1366,768')  # Window size; it can affect page layout.
chrome_opt.add_argument("--no-sandbox")  # Disable the Chrome sandbox.
# Create the Chrome object and pass in the options.
# Selenium 4 expects `options=`; the older `chrome_options=` keyword is deprecated or removed.
browser = webdriver.Chrome(options=chrome_opt)
url = "https://www.baidu.com/"
browser.get(url)
print(browser.page_source)
browser.quit()

If the HTML source of Baidu's homepage is printed to the terminal, the setup works.

2. Basic configuration of selenium

Selenium works by driving a real browser, so it simulates browser operations just as a real user would perform them. Its main uses are browser compatibility testing (checking that your application works well on different browsers and operating systems) and functional testing (creating regression tests that verify software functionality and user requirements).

The selenium library can be installed with the terminal command pip install selenium.

The following is a general basic configuration for selenium:

import random          # for generating random numbers
import time            # for delays
from selenium.webdriver.common.by import By      # By is used to locate elements
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Instantiate an options object for the browser launch arguments
chrome_options = Options()

# Add launch arguments
chrome_options.add_argument(
    'user-agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36"')  # set the User-Agent request header
chrome_options.add_argument('--disable-blink-features=AutomationControlled')

# Avoid being identified as an automated browser
chrome_options.add_experimental_option('excludeSwitches', ['enable-automation'])     # drop the enable-automation switch

chrome_options.add_experimental_option('useAutomationExtension', False)    # do not use the automation extension

# browser.maximize_window()      # maximize the window (call this on the driver once it has been created)

# chrome_options.add_argument('--headless')    # run the browser without a visible window

The user-agent above needs to be changed to match your own operating system; mine is Linux.

You can find the value as follows:

Open Google Chrome and load the Baidu homepage (any other page works too). Open the developer tools (press F12), click a link on the page, then go back to the Baidu page. In the developer tools, switch to the Network tab, select a request, and scroll to the bottom of its headers; the user-agent value is shown there.

3. Answer code

Set up the driver

browser = webdriver.Chrome(options=chrome_options)     # Create the driver and start the browser with the options above
browser.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument',
                        {'source': 'Object.defineProperty(navigator, "webdriver", {get: () => undefined})'})       # Run a Chrome DevTools Protocol command

Get the questionnaire content

browser.get('https://www.***.**/**/*****.aspx')        # Load the questionnaire (put the questionnaire link here)

Single-choice questions

# Question 1 (gender)
randomId = random.randint(1, 2)       # Randomly pick the first or the second option

# JS approach
js = "document.getElementById(\"q1_" + str(randomId) + "\").checked = true"
browser.execute_script(js)         # Mark the option as checked by running JavaScript in the page
js = "document.getElementById(\"q1_" + str(randomId) + "\").click()"
browser.execute_script(js)         # Trigger the click by running JavaScript in the page

# Delay; answering too fast can be detected as a script
time.sleep(1)


# Question 2 (age)
randomId = random.randint(2, 4)   # Randomly pick one of options 2 to 4 (the question has 5 options)
# JS approach
js = "document.getElementById(\"q2_" + str(randomId) + "\").checked = true"
browser.execute_script(js)
js = "document.getElementById(\"q2_" + str(randomId) + "\").click()"    # Build the id by string concatenation, then click the element
browser.execute_script(js)
# Delay
time.sleep(0.1)
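
The same two JavaScript statements are rebuilt by hand for every question, so a small helper can cut down on typos in the escaped quotes. This is a sketch only; `option_click_js` is a hypothetical name, and it assumes the page uses element ids of the form q<question>_<option>, as in the snippets above.

```python
def option_click_js(question, option):
    """Build the JavaScript that checks and then clicks one option.

    Assumes element ids of the form q<question>_<option>.
    """
    element = 'document.getElementById("q{}_{}")'.format(question, option)
    return "{0}.checked = true; {0}.click();".format(element)


print(option_click_js(1, 2))
```

It would be used as browser.execute_script(option_click_js(1, randomId)), which runs both statements in a single call.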

Multiple-choice questions

# Question 5
randomId = random.randint(1, 3)       # Random number of picks (1 to 3)

for i in range(1, randomId + 1):       # Loop to select several options
    randomId1 = random.randint(1, 6)   # Randomly pick one of options 1 to 6

    # Two JS steps: set checked, then click
    js = "document.getElementById(\"q5_" + str(randomId1) + "\").checked = true"
    browser.execute_script(js)
    js = "document.getElementById(\"q5_" + str(randomId1) + "\").click()"
    browser.execute_script(js)

# Delay
time.sleep(1)
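
Independent draws with random.randint can return the same option more than once, so a loop of this kind may end up selecting fewer options than intended. If every selected option should be distinct, one alternative is random.sample, which draws without replacement. The sketch below is an illustration, not the original author's method; `pick_distinct_options` and its parameters are hypothetical names.

```python
import random


def pick_distinct_options(option_count, max_picks, rng=random):
    """Return a sorted list of distinct 1-based option numbers."""
    how_many = rng.randint(1, min(max_picks, option_count))
    return sorted(rng.sample(range(1, option_count + 1), how_many))


picks = pick_distinct_options(option_count=6, max_picks=3)
print(picks)  # between one and three different numbers from 1 to 6
```

Each number in picks can then be used in place of randomId1 when building the JavaScript for the question.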

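Fill-in-the-blank questions

# Question 25

# Candidate answers to fill in
block = ["Text for blank 1", "Text for blank 2", "Text for blank 3", "Text for blank 4", "Text for blank 5", "Text for blank 6", "None"]

# Randomly choose one of the answers above
randomId = random.randint(0, 5)          # list indices start at 0; 0 to 5 covers the six custom answers

# Type the chosen answer into the question's input box
browser.find_element(By.ID, "q25").send_keys(block[randomId])

# Delay
time.sleep(0.1)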

4. Submit + smart verification + slider verification

Submit

First inspect the source code of the questionnaire page, find the submit button in the HTML, and copy the XPath of the "submit" button.

# Click submit
submit = browser.find_element(By.XPATH, "//*[@id='ctlNext']")    # XPath copied from the page source
submit.click()      # Click

# Delay; submitting too fast can be detected as a script
time.sleep(0.5)

Smart verification

In the same way, find the "Confirm" (确认) button and the smart verification prompt box (智能验证提示框) in the HTML, and copy the XPath of each. (The XPath may change; if the code below does not work, use the method above to copy the new XPath and replace the one in the code.)

# Simulate clicking the smart verification button
# First click "Confirm"
browser.find_element(By.XPATH, '//*[@id="layui-layer1"]/div[3]/a').click()
time.sleep(1)
# Then click the smart verification prompt box to start the verification
browser.find_element(By.XPATH, "//div[@id='captcha']").click()

Slider verification

from selenium.webdriver import ActionChains
def get_track(distance):  # distance is the total distance to move
    # Movement track
    track = []
    # Current displacement
    current = 0
    # Time interval per step
    t = 0.2
    # Initial velocity
    v = 0
    while current < distance:
        # Acceleration
        a = 100 + current*random.random()
        v0 = v
        # Current velocity
        v = v0 + a * t
        # Distance moved in this step
        # move = v0 * t + 1 / 2 * a * t * t
        move = v0 * t + a * t
        # Current displacement
        current += move
        # Append to the track
        track.append(round(move))
    return track  # A list of per-step offsets along the slider, used to simulate a gradual mouse drag

def move_to_gap(driver, slider, tracks):  # slider is the element to drag; tracks is the list of offsets
    ActionChains(driver).click_and_hold(slider).perform()
    for x in tracks:
        ActionChains(driver).move_by_offset(xoffset=x, yoffset=0).perform()
    time.sleep(0.1)
    ActionChains(driver).release().perform()

The get_track function above returns the list track, which holds the movement trajectory. It starts with the current displacement current = 0, the time interval t = 0.2, and the initial velocity v = 0. While the current displacement is less than the total distance passed in, it computes the next step from the acceleration a and the velocity v using basic kinematics, and adds that step to current.
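
To see what the track looks like in practice, the function can be run on its own without a browser. The sketch below copies the loop from get_track and prints the result; the fixed seed is an addition for repeatability only. For reference, the textbook constant-acceleration displacement is s = v0*t + a*t*t/2, whereas the article's code deliberately uses the larger step v0*t + a*t.

```python
import random


def get_track(distance):
    """Return the list of per-step pixel offsets, as in the function above."""
    track = []
    current = 0.0   # displacement so far
    t = 0.2         # time interval per step
    v = 0.0         # initial velocity
    while current < distance:
        a = 100 + current * random.random()  # acceleration for this step
        v0 = v
        v = v0 + a * t                       # velocity after this step
        move = v0 * t + a * t                # step size used in the article's code
        current += move
        track.append(round(move))
    return track


random.seed(0)  # fixed seed only so that repeated runs print the same list
track = get_track(328)
print(len(track), "steps:", track)
print("total offset:", sum(track))
```

The first step is always 20, because current is 0 at that point, so a = 100 and move = 100 * 0.2. The loop only stops once current has reached the distance, so the total offset usually ends up somewhat larger than the distance passed in.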

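I modified the original author's code slightly so that the acceleration changes irregularly with current. I found that a fixed acceleration caused the slider verification to refresh and ask for verification again; a random acceleration makes each slide a little different, which helps avoid detection.

move_to_gap function

① In ActionChains(driver).click_and_hold(slider).perform():

click_and_hold(slider)---presses the left mouse button and holds it down; slider is the slider element to be moved
(for example, huakuai = driver.find_element(By.CSS_SELECTOR, '#nc_1_n1z'))
perform()---executes the action

② Next, iterate over tracks. In ActionChains(driver).move_by_offset(xoffset=x, yoffset=0).perform():

move_by_offset(xoffset=x, yoffset=0)---moves the mouse x px to the right

③ In ActionChains(driver).release().perform():

release()---releases the mouse button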

Call these functions in the main script, after the smart verification step, and close the browser at the end:

try:
    huakuai = browser.find_element(By.CSS_SELECTOR, '#nc_1_n1z')
    move_to_gap(browser, huakuai, get_track(328))
    time.sleep(2)
except Exception:
    # No slider appeared (or the drag failed); continue without it
    pass
finally:
    browser.quit()  # Close the browser

5. Complete code example

The code below must be adapted to each specific questionnaire.

import random          # for generating random numbers
import time            # for delays
from selenium.webdriver.common.by import By      # By is used to locate elements
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver import ActionChains
# Instantiate an options object for the browser launch arguments
chrome_options = Options()

# Add launch arguments
chrome_options.add_argument(
    'user-agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36"')  # set the User-Agent request header
chrome_options.add_argument('--disable-blink-features=AutomationControlled')

# Avoid being identified as an automated browser
chrome_options.add_experimental_option('excludeSwitches', ['enable-automation'])     # drop the enable-automation switch
chrome_options.add_experimental_option('useAutomationExtension', False)    # do not use the automation extension

# browser.maximize_window()      # maximize the window (call this on the driver once it has been created)

# chrome_options.add_argument('--headless')    # run the browser without a visible window

def get_track(distance):  # distance is the total distance to move
    # Movement track
    track = []
    # Current displacement
    current = 0
    # Time interval per step
    t = 0.2
    # Initial velocity
    v = 0
    while current < distance:
        # Acceleration
        a = 100 + current*random.random()
        v0 = v
        # Current velocity
        v = v0 + a * t
        # Distance moved in this step
        # move = v0 * t + 1 / 2 * a * t * t
        move = v0 * t + a * t
        # Current displacement
        current += move
        # Append to the track
        track.append(round(move))
    return track  # A list of per-step offsets along the slider, used to simulate a gradual mouse drag

def move_to_gap(driver, slider, tracks):  # slider is the element to drag; tracks is the list of offsets
    ActionChains(driver).click_and_hold(slider).perform()
    for x in tracks:
        ActionChains(driver).move_by_offset(xoffset=x, yoffset=0).perform()
    time.sleep(0.1)
    ActionChains(driver).release().perform()

num = 120                 # total number of submissions
man = int(num*0.06)       # remaining quota for option 1 of question 1
woman = num - man         # remaining quota for option 2 of question 1
for epoch in range(num):
    browser = webdriver.Chrome(options=chrome_options)     # Create the driver and start the browser with the options above
    browser.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument',
                        {'source': 'Object.defineProperty(navigator, "webdriver", {get: () => undefined})'})       # Run a Chrome DevTools Protocol command
    browser.get('https://www.wjx.cn/xx/xxx.aspx')        # Load the questionnaire (put the questionnaire link here)
    # Question 1 (gender)
    if man>0 and woman>0:
        sex = random.randint(1, 2)       # Randomly pick the first or the second option
        if sex == 1:
            man -= 1
        else:
            woman -= 1
    elif man>0 and woman==0:
        sex = 1
        man -= 1
    elif man==0 and woman>0:
        sex = 2
        woman -= 1
    else:
        break

    # JS approach
    js = "document.getElementById(\"q1_" + str(sex) + "\").checked = true"
    browser.execute_script(js)         # Mark the option as checked by running JavaScript in the page
    js = "document.getElementById(\"q1_" + str(sex) + "\").click()"
    browser.execute_script(js)         # Trigger the click by running JavaScript in the page

    # Delay; answering too fast can be detected as a script
    time.sleep(1)

    # Question 2 (age)
    if sex == 1:
        seq = [2, 3]
        weights = [0.7,0.3]
    elif sex == 2:
        seq = [1, 2, 3, 4, 5]
        weights = [0.35,0.45,0.1,0.09,0.01]
    age = random.choices(seq,weights)[0]
    # JS approach
    js = "document.getElementById(\"q2_" + str(age) + "\").checked = true"
    browser.execute_script(js)
    js = "document.getElementById(\"q2_" + str(age) + "\").click()"    # Build the id by string concatenation, then click the element
    browser.execute_script(js)
    # Delay
    time.sleep(0.1)

    # Question 3 (years of teaching)
    if age == 1:
        teachyear = 1
    elif age == 2:
        teachyear = random.randint(1,2)
    elif age == 3:
        teachyear = random.randint(2,4)
    elif age == 4:
        teachyear = random.randint(3,4)
    elif age == 5:
        teachyear = 4
    # JS approach
    js = "document.getElementById(\"q3_" + str(teachyear) + "\").checked = true"
    browser.execute_script(js)
    js = "document.getElementById(\"q3_" + str(teachyear) + "\").click()"    # Build the id by string concatenation, then click the element
    browser.execute_script(js)
    # Delay
    time.sleep(0.1)

    # Question 4 (education)
    # Note: `age == 1 or 2` is always true in Python, so membership tests are used here
    if age in (1, 2):
        seq = [2,3,4]
        weights = [0.4,0.5,0.1]
        edu = random.choices(seq,weights)[0]
    elif age in (3, 4):
        seq = [3, 4]
        weights = [0.2,0.8]
        edu = random.choices(seq,weights)[0]
    elif age == 5:
        edu = 4
    # JS approach
    js = "document.getElementById(\"q4_" + str(edu) + "\").checked = true"
    browser.execute_script(js)
    js = "document.getElementById(\"q4_" + str(edu) + "\").click()"    # Build the id by string concatenation, then click the element
    browser.execute_script(js)
    # Delay
    time.sleep(0.1)

    # Question 5 (marital and parental status)
    if age == 1:
        seq = [2,3]
        weights = [0.15,0.85]
        marry = random.choices(seq,weights)[0]
    elif age == 2:
        seq = [1,2,3]
        weights = [0.4,0.35,0.25]
        marry = random.choices(seq,weights)[0]
    elif age in (3, 4):
        seq = [1,2,3]
        weights = [0.9,0.09,0.01]
        marry = random.choices(seq,weights)[0]
    elif age == 5:
        marry = 1
    # JS approach
    js = "document.getElementById(\"q5_" + str(marry) + "\").checked = true"
    browser.execute_script(js)
    js = "document.getElementById(\"q5_" + str(marry) + "\").click()"    # Build the id by string concatenation, then click the element
    browser.execute_script(js)
    # Delay
    time.sleep(0.1)

    # Questions 6 to 31: the weights depend on education (question 8) and years of teaching
    for i in range(6,32):
        if i == 8:
            if edu == 2:
                seq = [2,3,4]
                weights = [0.4,0.35,0.15]
            else:
                seq = [2,3,4,5]
                weights = [0.1,0.25,0.4,0.25]
        elif i == 21:
            if teachyear in [3,4]:
                seq = [2,3,4,5]
                weights = [0.1,0.2,0.3,0.4]
            elif teachyear == 2:
                seq = [1,2,3,4,5]
                weights = [0.1,0.2,0.5,0.2,0.1]
            else:
                seq = [1,2,3,4,5]
                weights = [0.4,0.3,0.2,0.07,0.03]
        elif teachyear in [3,4]:
            seq = [1,2,3,4]
            weights = [0.1,0.45,0.35,0.1]
        elif teachyear == 2:
            seq = [2,3,4,5]
            weights = [0.1,0.35,0.45,0.1]
        elif teachyear == 1:
            seq = [3,4,5]
            weights = [0.25,0.4,0.35]

        pressure = random.choices(seq,weights)[0]

        # JS approach
        js = "document.getElementById(\"q"+str(i)+"_" + str(pressure) + "\").checked = true"
        browser.execute_script(js)
        js = "document.getElementById(\"q"+str(i)+"_" + str(pressure) + "\").click()"    # Build the id by string concatenation, then click the element
        browser.execute_script(js)
        # Delay
        time.sleep(0.1)

    # Questions 32 to 41: one fixed set of weights
    for i in range(32,42):
        seq = [2,3,4,5]
        weights = [0.05,0.15,0.5,0.3]
        pressure = random.choices(seq,weights)[0]
        # JS approach
        js = "document.getElementById(\"q"+str(i)+"_" + str(pressure) + "\").checked = true"
        browser.execute_script(js)
        js = "document.getElementById(\"q"+str(i)+"_" + str(pressure) + "\").click()"    # Build the id by string concatenation, then click the element
        browser.execute_script(js)
        # Delay
        time.sleep(0.1)

    # Click submit
    submit = browser.find_element(By.XPATH, "//*[@id='ctlNext']")    # XPath copied from the page source
    submit.click()      # Click

    # Delay; submitting too fast can be detected as a script
    time.sleep(0.5)

    # Simulate clicking the smart verification button
    # First click "Confirm"
    browser.find_element(By.XPATH, '//*[@id="layui-layer1"]/div[3]/a').click()
    time.sleep(1)
    # Then click the smart verification prompt box to start the verification
    browser.find_element(By.XPATH, "//div[@id='captcha']").click()
    time.sleep(4)
    try:
        huakuai = browser.find_element(By.CSS_SELECTOR, '#nc_1_n1z')
        move_to_gap(browser, huakuai, get_track(328))
        time.sleep(2)
    except Exception:
        # No slider appeared (or the drag failed); continue without it
        pass
    finally:
        print("No.{} Finished".format(epoch))
        print("man has {}, woman has {}".format(man,woman))
        browser.quit()
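
Two Python details matter for branching of this kind. A condition written as age == 1 or 2 is parsed as (age == 1) or 2, and the literal 2 is truthy, so the branch is always taken; a membership test such as age in (1, 2) is what is usually meant. And random.choices treats its weights as relative, so they do not have to sum to 1. The sketch below demonstrates both without a browser; `weighted_pick` is a hypothetical helper name.

```python
import random

age = 5

# `age == 1 or 2` parses as `(age == 1) or 2`; the literal 2 is truthy,
# so the whole expression is truthy for every value of age.
print(bool(age == 1 or 2))   # True
print(age in (1, 2))         # False


def weighted_pick(options, weights):
    """Return one element of `options`, drawn with the given relative weights."""
    return random.choices(options, weights=weights, k=1)[0]


counts = {2: 0, 3: 0}
for _ in range(10000):
    counts[weighted_pick([2, 3], [0.7, 0.3])] += 1
print(counts)  # roughly 70% / 30%; the exact numbers vary per run
```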