Preface
In our daily work, we sometimes need to crawl data from a website, but the anti-crawling mechanisms of some websites check whether the browser was opened with webdriver. Once a site determines that it was, we may be unable to capture the data we want, or unable to log in to the site with that browser. Playwright provides a way to get around this anti-crawling mechanism.
The window.navigator.webdriver property
In most cases, websites use this property to determine whether the browser was opened with webdriver. If we open the browser manually, the property is false. In a browser opened with webdriver, it is true.
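To see the value a website would see, we can read the property back from a page driven by Playwright. The following is a minimal sketch, assuming Playwright for Python and its Chromium build are installed; the names WEBDRIVER_CHECK_JS and read_webdriver_flag are illustrative and not part of Playwright.

```python
import asyncio

# JavaScript evaluated inside the page; it returns the flag exactly as
# a script on the website would read it.
WEBDRIVER_CHECK_JS = "() => navigator.webdriver"


async def read_webdriver_flag(page):
    """Return navigator.webdriver as seen by scripts running on the page."""
    return await page.evaluate(WEBDRIVER_CHECK_JS)


async def main():
    # Imported here so the helper above can be reused without Playwright loaded.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("https://example.com")  # placeholder URL
        print("navigator.webdriver =", await read_webdriver_flag(page))
        await browser.close()
```

Running asyncio.run(main()) prints the value the page reports for an unmodified, automated browser.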
Normally, we only need to change the webdriver property of the opened browser to false to bypass this anti-crawling detection.
Setting window.navigator.webdriver with Playwright
Let's look at how to set the window.navigator.webdriver property. Suppose we want to visit a website. The code is as follows:
import asyncio
from playwright.async_api import async_playwright

async def set_navigator_property():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        # Make window.navigator.webdriver return false. The override is
        # registered as an init script so that it runs before the site's own
        # scripts on every navigation; a one-off page.evaluate call made
        # before goto() would be lost when the new document loads.
        await page.add_init_script('''
            Object.defineProperty(navigator, 'webdriver', {
                get: () => false
            });
        ''')
        await page.goto('https://example.com')  # replace with the target website's URL
        await asyncio.sleep(10)  # crude pause so the page can finish loading
        # Perform your crawling operations here
        await browser.close()

if __name__ == "__main__":
    asyncio.run(set_navigator_property())

In this example, we first import Playwright and then use async_playwright to launch a browser and open a page. On that page, we use the page.add_init_script method to redefine the window.navigator.webdriver property so that it returns false. Next, we open the target website with the page.goto method and perform our crawling operation after the page is loaded.
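One detail is worth checking in practice: an override defined with page.evaluate only affects the document loaded at that moment, while an init script runs before the site's own scripts on every navigation. The sketch below registers the override on the browser context, so every page opened from it inherits the override, and then reads the value back to confirm it. It assumes Playwright for Python is installed; build_webdriver_override and open_patched_page are hypothetical helper names, not Playwright APIs.

```python
import asyncio


def build_webdriver_override(value: bool = False) -> str:
    """Return JavaScript that makes navigator.webdriver report a fixed value."""
    js_value = "true" if value else "false"
    return (
        "Object.defineProperty(navigator, 'webdriver', "
        f"{{ get: () => {js_value} }});"
    )


async def open_patched_page(browser, url: str):
    """Open url in a new context whose pages all run the override first."""
    context = await browser.new_context()
    # The init script is evaluated in every page of this context, before
    # any script that belongs to the page itself.
    await context.add_init_script(build_webdriver_override(False))
    page = await context.new_page()
    await page.goto(url)
    return page


async def main():
    # Imported here so the helpers above can be reused without Playwright loaded.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await open_patched_page(browser, "https://example.com")
        # Read the property back after navigation to confirm the override.
        print("navigator.webdriver =", await page.evaluate("() => navigator.webdriver"))
        await browser.close()
```

Running asyncio.run(main()) shows whether the override survived the navigation. Reading the value back like this is a quick sanity check before pointing the crawler at a real site.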
Note: Setting the window.navigator.webdriver property to false may bypass the anti-crawling detection of some websites, but not every website relies on this property alone to detect automated programs.
Summary
Playwright is a powerful tool that can help you get past some websites' anti-crawling detection and automate website crawling. But please be sure to use it with caution and comply with applicable laws and each website's rules.