Using Selenium crawlers with the Firefox and Edge browsers

In Chrome, getting a Selenium crawler past a site's anti-crawling (automation detection) checks is not difficult:

    # Change this to the path of your own chromedriver.exe.
    # Note: executable_path is the older Selenium form; recent Selenium 4 releases expect a Service object instead.
    driver = webdriver.Chrome(executable_path=r"C:\ProgramData\Anaconda3\chromedriver.exe")

    # Get past the site's automation check. Without this call the slider CAPTCHA
    # fails on username/password login; QR-code login is not affected.
    driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument",
        {"source": """Object.defineProperty(navigator, 'webdriver',
        {get: () => undefined})"""})

That single statement is usually enough. You can also override the user agent:

    user_agent = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36'
    # execute_cdp_cmd takes the CDP command name and a dict of its parameters
    driver.execute_cdp_cmd('Network.setUserAgentOverride', {'userAgent': user_agent})

With the Firefox and Edge browsers, however, things are different:

    driver = webdriver.Firefox(executable_path=r"C:\ProgramData\Anaconda3\geckodriver.exe")

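The Firefox driver has no execute_cdp_cmd method, and depending on the Selenium version the Edge driver may not have one either (Selenium 4's Chromium-based Edge driver does provide it). Where the method is missing, the patch can only be applied through execute_script(), which is merely equivalent to running JavaScript code in the browser console.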

execute_cdp_cmd is the method Selenium WebDriver provides for interacting with the browser's Chrome DevTools Protocol (CDP). CDP is a set of APIs for inspecting and manipulating the state and behavior of the Chrome browser.

With execute_cdp_cmd, developers using Selenium can send CDP commands directly from Python and get each command's result back. This makes it possible to control browser behavior from Python code, for example taking screenshots or changing the browser's network settings.
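
As a small illustration of the screenshot case, here is a minimal sketch. It assumes the driver exposes execute_cdp_cmd and that the CDP command Page.captureScreenshot returns its image as base64 text in a "data" field. The helper name cdp_screenshot_bytes and the FakeCdpDriver stub are made up for this example so that it runs without a browser.

```python
import base64


def cdp_screenshot_bytes(driver):
    """Ask the browser for a screenshot over CDP and return the raw image bytes."""
    # The command result is a dict; the image is base64 text under "data".
    result = driver.execute_cdp_cmd("Page.captureScreenshot", {"format": "png"})
    return base64.b64decode(result["data"])


class FakeCdpDriver:
    """Hypothetical stand-in for a Chrome driver (not part of Selenium)."""

    def __init__(self):
        self.calls = []

    def execute_cdp_cmd(self, cmd, cmd_args):
        # Record the call and answer in the same shape a CDP result would have.
        self.calls.append((cmd, cmd_args))
        return {"data": base64.b64encode(b"fake-png-bytes").decode("ascii")}
```

With a real Chrome driver you would pass the driver itself instead of the stub and write the returned bytes to a .png file.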

The difference from execute_script():

execute_script and execute_cdp_cmd are both methods that Selenium WebDriver provides for interacting with the browser, but they differ in several ways:

  1. They work differently: execute_script injects JavaScript into the current page, where the browser's JavaScript engine runs it. execute_cdp_cmd, by contrast, sends a command through the driver to the browser's Chrome DevTools Protocol interface and returns the response.

  2. They do different things: execute_script can run any valid JavaScript, including operations on page elements, DOM manipulation, event listeners, and so on. execute_cdp_cmd is mainly used for low-level interaction with the browser, such as taking screenshots or simulating network conditions.

  3. They suit different scenarios: because execute_cdp_cmd targets low-level browser operations, in many cases it is preferable to use a more specialized CDP library rather than calling execute_cdp_cmd directly. execute_script is a powerful tool for working with JavaScript inside the page, so it is the usual choice for accessing and manipulating page elements.

In short, execute_script and execute_cdp_cmd are both commonly used Selenium WebDriver tools, but they serve different purposes and scenarios. Choose whichever fits the situation at hand.
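
The choice between the two methods can be sketched as a capability check. This is only an illustration under stated assumptions: the helper name hide_webdriver_flag and both Fake* classes are hypothetical, and the check simply asks whether the driver object has an execute_cdp_cmd attribute.

```python
WEBDRIVER_PATCH = "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"


def hide_webdriver_flag(driver):
    """Apply the navigator.webdriver patch with the best method the driver offers.

    Returns "cdp" if the script was registered for every new document,
    or "script" if it was only run once in the current page.
    """
    if hasattr(driver, "execute_cdp_cmd"):
        driver.execute_cdp_cmd(
            "Page.addScriptToEvaluateOnNewDocument", {"source": WEBDRIVER_PATCH}
        )
        return "cdp"
    driver.execute_script(WEBDRIVER_PATCH)
    return "script"


class FakeChromeDriver:
    """Hypothetical stub with both methods, like a Chrome driver."""

    def __init__(self):
        self.cdp_calls = []
        self.scripts = []

    def execute_cdp_cmd(self, cmd, cmd_args):
        self.cdp_calls.append((cmd, cmd_args))
        return {}

    def execute_script(self, script):
        self.scripts.append(script)


class FakeFirefoxDriver:
    """Hypothetical stub with execute_script only, like a Firefox driver."""

    def __init__(self):
        self.scripts = []

    def execute_script(self, script):
        self.scripts.append(script)
```

When the function returns "script", the patch only covers the page that is currently loaded, so the caller has to apply it again after each navigation.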

The real question is how these two calls differ:

    driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {"source": """Object.defineProperty(navigator, 'webdriver', { get: () => undefined})"""})

and

    driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")

Page.addScriptToEvaluateOnNewDocument registers the script to run in every new document before the page's own scripts, so with execute_cdp_cmd you write it once and the patch holds for the whole session. execute_script only runs the code once, in the page that is currently loaded, so navigator.webdriver has to be set to undefined again and again.

And execute_cdp_cmd is only available for Chrome-family drivers. Where it is unavailable, as with Firefox and Edge in this article, achieving the same effect means setting webdriver to undefined repeatedly.
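
The need to re-apply the patch can be sketched as a thin wrapper around navigation. The names get_and_patch and FakeNavigatingDriver are hypothetical; the stub only models the assumption that loading a new page discards whatever an earlier execute_script call changed.

```python
WEBDRIVER_PATCH = "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"


def get_and_patch(driver, url):
    """Load `url`, then re-apply the navigator.webdriver patch in the new page."""
    driver.get(url)
    driver.execute_script(WEBDRIVER_PATCH)


class FakeNavigatingDriver:
    """Hypothetical stub: every navigation resets the in-page patch."""

    def __init__(self):
        self.current_url = None
        self.patched = False
        self.script_runs = 0

    def get(self, url):
        # A fresh document starts without any earlier in-page changes.
        self.current_url = url
        self.patched = False

    def execute_script(self, script):
        self.script_runs += 1
        self.patched = script == WEBDRIVER_PATCH
```

Note that a patch applied this way only takes effect after the page has loaded, so detection code that runs during loading may already have read the original value. That is consistent with the advice in the example that follows to apply the patch right before the CAPTCHA step.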



Here is an example of getting past anti-crawling checks with the Firefox and Edge drivers:
 

            driver.maximize_window()
            driver.get(url)
            # Implicit wait: element lookups keep retrying for up to 5 seconds
            driver.implicitly_wait(5)


            # Find and click the login button in the upper-right corner
            login = driver.find_element(by=By.ID, value='J-btn-login')
            login.click()
            driver.implicitly_wait(5)

            # driver.execute_script(
            #     '''
            #         navigator.userAgent = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36'
            #     ''')
            # driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined}); \
            #                        window.navigator.chrome = {runtime: {}, loadTimes: function() {}}; \
            #                        Object.defineProperty(navigator, 'languages', {get: () => ['en-US', 'en']}); \
            #                        Object.defineProperty(navigator, 'plugins', {get: () => [1, 2, 3, 4, 5]});")
            # Log in with username and password
            username_tag = driver.find_element(by=By.ID, value='J-userName')
            username_tag.send_keys(conf.username)
            password_tag = driver.find_element(by=By.ID, value='J-password')
            password_tag.send_keys(conf.password)
            login_now = driver.find_element(by=By.ID, value='J-login')
            login_now.click()
            time.sleep(2)

            # To pass the CAPTCHA, the webdriver patch has to go here. It does not work right after get(url); it must be placed close to the CAPTCHA step.
            driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})") 


            # Solve the slider CAPTCHA
            # while True:
            picture_start = driver.find_element(by=By.ID, value='nc_1_n1z')
            # Move to the slider handle, hold the left mouse button and drag to the right
            ActionChains(driver).move_to_element(picture_start).click_and_hold(picture_start).move_by_offset(300, 0).release().perform()
            # If verification fails and a refresh is required, try once more
            try:
                time.sleep(1.5)
                driver.find_element(by=By.XPATH, value='//*[@id="nc_1_refresh1"]').click()  # click refresh
                time.sleep(1)
                picture_start = driver.find_element(by=By.ID, value='nc_1_n1z')
                # Move to the slider handle, hold the left mouse button and drag to the right
                ActionChains(driver).move_to_element(picture_start).click_and_hold(picture_start).move_by_offset(300, 0).release().perform()
            except Exception:
                # No refresh button was found, so no retry is needed
                pass

            # Apply the webdriver patch again here, close to the CAPTCHA step; setting it only right after get(url) is not enough.
            driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})") 

The Firefox and Edge drivers are created as follows:
 

    driver = webdriver.Firefox(executable_path=r"C:\ProgramData\Anaconda3\geckodriver.exe")

    driver = webdriver.Edge(executable_path=r"C:\ProgramData\Anaconda3\msedgedriver.exe")

Origin blog.csdn.net/conquer_galaxy/article/details/130668319