Software Testing | Playwright, a Powerful Tool for Web Automation Testing: Tutorial (39)

Preface

In our daily work, we sometimes need to crawl data from websites. However, the anti-crawling mechanisms of some websites check whether the browser was opened by WebDriver. Once a site determines that it was, we can no longer capture the data we want, or we cannot log in to the site with that browser. Playwright provides a way to get around this anti-crawling mechanism.

The window.navigator.webdriver Property

In most cases, websites use this property to determine whether the browser was opened by WebDriver. If we open the browser manually, the property is false. In a browser opened by WebDriver, it is true.

Normally, we only need to change this property to false in the browser opened by WebDriver to bypass this anti-crawling detection.
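Before changing anything, it can help to confirm what value a page actually sees. The following is a minimal sketch, assuming Playwright for Python and its Chromium build are installed. The names read_webdriver_flag, WEBDRIVER_EXPRESSION and main are hypothetical and used only for illustration, and https://example.com is a placeholder URL.

```python
import asyncio

# JavaScript expression evaluated inside the page; it reads the flag that sites inspect.
WEBDRIVER_EXPRESSION = "navigator.webdriver"


async def read_webdriver_flag(url: str):
    """Open `url` in Chromium and return the page's navigator.webdriver value."""
    # Imported here so this module can still be loaded where Playwright is absent.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            page = await browser.new_page()
            await page.goto(url)
            # Evaluate the expression in the page and return its result to Python.
            return await page.evaluate(WEBDRIVER_EXPRESSION)
        finally:
            await browser.close()


def main() -> None:
    # Hypothetical entry point: prints the flag for a placeholder URL.
    print(asyncio.run(read_webdriver_flag("https://example.com")))
```

Calling main() prints the value that scripts on the page would read, which makes it easy to compare the result before and after applying an override.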

Setting window.navigator.webdriver with Playwright

Let's look at how to set the window.navigator.webdriver property. Suppose we want to visit a website. The code is as follows:

import asyncio
from playwright.async_api import async_playwright

async def set_navigator_property():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()

        # Make window.navigator.webdriver report false. add_init_script runs
        # this before the page's own scripts on every navigation; an override
        # made with page.evaluate before page.goto is lost when the page navigates.
        await page.add_init_script('''
            Object.defineProperty(navigator, 'webdriver', {
                get: () => false
            });
        ''')

        await page.goto('https://example.com')  # Replace with the target website's URL
        await asyncio.sleep(10)  # Optional pause so late-loading content can finish

        # Perform your crawling operations here

        await browser.close()

if __name__ == "__main__":
    asyncio.run(set_navigator_property())

In this example, we first import Playwright and then use async_playwright to launch a browser instance. On the new page, we use the page.add_init_script method to register a script that sets window.navigator.webdriver to false before any of the site's own scripts run. Next, we open the target website with the page.goto method and perform our crawling operations once the page has loaded.
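As a variation on the example above, the override can also be registered on a browser context instead of a single page, so that every page opened from that context receives it. This is only a sketch under the same assumptions (Playwright for Python with Chromium installed). The helpers webdriver_override_script, open_patched_page and main are hypothetical names, and https://example.com is a placeholder.

```python
import asyncio
import json


def webdriver_override_script(value: bool = False) -> str:
    """Return JavaScript that makes navigator.webdriver report `value`."""
    # json.dumps turns the Python bool into the JavaScript literal true/false.
    return (
        "Object.defineProperty(navigator, 'webdriver', "
        "{ get: () => " + json.dumps(value) + " });"
    )


async def open_patched_page(url: str):
    """Visit `url` with the override installed and return navigator.webdriver."""
    # Imported here so the helper above can be used without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            context = await browser.new_context()
            # Registered on the context, so every page created from it runs
            # the script before the site's own scripts.
            await context.add_init_script(webdriver_override_script())
            page = await context.new_page()
            await page.goto(url)
            # Read the value back to check that the override took effect.
            return await page.evaluate("navigator.webdriver")
        finally:
            await browser.close()


def main() -> None:
    # Hypothetical entry point: prints the value the page sees after the override.
    print(asyncio.run(open_patched_page("https://example.com")))
```

Building the script in a separate function keeps the JavaScript in one place, and reading the property back after page.goto is a simple way to verify the override on the page that was actually loaded.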

Note: Setting the window.navigator.webdriver property to false may bypass the anti-crawling detection of some websites, but not all websites rely on this property to detect automated programs.

Summary

Playwright is a powerful tool that can help you get past this kind of anti-crawling detection and automate website crawling. But please be sure to use it with caution and comply with the law and each website's rules.

Origin blog.csdn.net/Tester_muller/article/details/133083626