C# crawler with Selenium: hiding the webdriver flag that websites use to block crawlers

Background
When you crawl a website with Selenium + Chromedriver, you might think that this keeps you from being discovered by the website's anti-crawler mechanism. In fact, many parameters still differ from those of a real browser, and a website that checks them can easily tell that you are using Selenium + Chromedriver to simulate a browser. Among these parameters,

window.navigator.webdriver
is a very important one.

Investigating the problem
In a normal browser, window.navigator.webdriver evaluates to undefined.

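A browser driven by Selenium reports true for the same property. The driver used in this article is created as follows; the final call already registers a script that overrides the property on every new page: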

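// Requires: using System; using OpenQA.Selenium; using OpenQA.Selenium.Chrome; using OpenQA.Selenium.DevTools;
ChromeOptions options = null;
IWebDriver driver = null;
try
{
    options = new ChromeOptions();
    options.AddArguments("--ignore-certificate-errors");
    options.AddArguments("--ignore-ssl-errors");

    // options.AddExcludedArgument("enable-automation");
    // options.AddAdditionalCapability("useAutomationExtension", false);

    // CookieHelp is the author's own helper class (not shown in this article).
    var listCookie = CookieHelp.GetCookie();
    if (listCookie != null)
    {
        // options.AddArgument("headless");
    }

    // string ss = @"{ ""source"": ""Object.defineProperty(navigator, 'webdriver', { get: () => undefined})""}";
    // options.AddUserProfilePreference("Page.addScriptToEvaluateOnNewDocument", new ssss() { source = " Object.defineProperty(navigator, 'webdriver', {   get: () => undefined  }) " });

    ChromeDriverService service = ChromeDriverService.CreateDefaultService(System.Environment.CurrentDirectory);
    service.HideCommandPromptWindow = true;
    driver = new ChromeDriver(service, options, TimeSpan.FromSeconds(120));

    // The original snippet used `session` without declaring it. It is assumed here to be
    // a DevTools session obtained through the Selenium 4 alpha API.
    IDevTools devTools = driver as IDevTools;
    DevToolsSession session = devTools.CreateDevToolsSession();

    // Register a script that runs before each page's own scripts and hides navigator.webdriver.
    session.Page.AddScriptToEvaluateOnNewDocument(new OpenQA.Selenium.DevTools.Page.AddScriptToEvaluateOnNewDocumentCommandSettings()
    {
        Source = @"Object.defineProperty(navigator, 'webdriver', { get: () => undefined })"
    });
}
catch (Exception)
{
    // Close the browser if setup fails, then let the caller handle the error.
    driver?.Quit();
    throw;
}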


Therefore, if a website reads this property through JavaScript, a return value of undefined indicates a normal browser (newer Chrome versions report false instead), while a return value of true indicates a browser driven by Selenium.
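
As a rough illustration of the check described above, a site-side script might look like the sketch below. The function name looksAutomated and the two stand-in objects are hypothetical and only imitate window.navigator so the snippet can run outside a browser; real anti-crawler scripts usually combine this with other signals.

```javascript
// Hypothetical site-side check (illustrative name, not from the article).
// `nav` stands in for window.navigator.
function looksAutomated(nav) {
  // A Selenium-driven browser reports true; a normal browser reports
  // undefined (or false), so only a strict true is treated as automation.
  return nav.webdriver === true;
}

// Stand-ins for window.navigator in the two situations.
const normalNavigator = {};
const seleniumNavigator = { webdriver: true };

console.log(looksAutomated(normalNavigator));   // false
console.log(looksAutomated(seleniumNavigator)); // true
```

In a real page the call would simply be looksAutomated(window.navigator).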

Solution
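So how do you keep this property from telling the website that you are simulating a browser? Execute JavaScript that overrides its value. Note that a script executed this way only affects the page that is already loaded, so it has to be repeated after every navigation; the AddScriptToEvaluateOnNewDocument call shown earlier avoids this by running before each page's own scripts.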

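 // Override navigator.webdriver on the currently loaded page.
 // The script returns nothing, so the result of ExecuteScript is not stored.
 IJavaScriptExecutor js = (IJavaScriptExecutor)driver;
 js.ExecuteScript("Object.defineProperties(navigator, {webdriver:{get:()=>undefined}});");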

Result
After the script runs, window.navigator.webdriver returns undefined.
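
To see what the injected script does, here is a minimal sketch that applies the same Object.defineProperties call to a stand-in object. The names navigatorProto, fakeNavigator and hideWebdriver are hypothetical; the stand-in imitates a browser where the webdriver getter lives on the prototype, which is an assumption about the browser's layout rather than something stated in the article.

```javascript
// Stand-in for the prototype that exposes the real getter (assumed layout).
const navigatorProto = {
  get webdriver() {
    return true; // what a Selenium-driven browser would report
  },
};
const fakeNavigator = Object.create(navigatorProto);

// Same override as the script passed to ExecuteScript in the article:
// an own getter on the object shadows the getter on the prototype.
function hideWebdriver(nav) {
  Object.defineProperties(nav, { webdriver: { get: () => undefined } });
}

console.log(fakeNavigator.webdriver); // true
hideWebdriver(fakeNavigator);
console.log(fakeNavigator.webdriver); // undefined
```

The override only changes what the property returns. It leaves an own webdriver property on the object, which a thorough detection script could still notice.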


Origin blog.csdn.net/chinaherolts2008/article/details/112852791