Refactoring the underlying browser driver of SeleniumDownloader


1. Fixing the bug: Selenium with PhantomJS, and rebuilding the underlying browser driver of SeleniumDownloader

0. Small background:

I want to crawl Steam data, but the official Steam website is hosted overseas. Steam also has anti-crawling measures and loads its data asynchronously as JSON. As a result, if I request the site's AJAX interface URL directly, I get back a pile of garbled, useless data. Solution: use Selenium to simulate a user's browser (so the page is rendered by JavaScript), then parse and process the page that the Selenium downloader returns.
At first, however, Selenium in this project used PhantomJS as its underlying driver (browser), and the following series of errors appeared:

1. Bug screenshots

  • Variable xxx not found

  • Error message:
[ERROR - 2023-03-07T12:00:29.232Z] Session [258893b0-bcdf-11ed-9fc2-3f99c08d4ed8] - page.onError - msg: ReferenceError: Can't find variable: InitMiniprofileHovers

  phantomjs://platform/console++.js:263 in error
[ERROR - 2023-03-07T12:00:29.233Z] Session [258893b0-bcdf-11ed-9fc2-3f99c08d4ed8] - page.onError - stack:
  global code (https://store.steampowered.com/charts/topselling/SG:671)

  phantomjs://platform/console++.js:263 in error
[ERROR - 2023-03-07T12:00:30.743Z] Session [258893b0-bcdf-11ed-9fc2-3f99c08d4ed8] - page.onError - msg: ReferenceError: Can't find variable: WebStorage

  phantomjs://platform/console++.js:263 in error
[ERROR - 2023-03-07T12:00:30.744Z] Session [258893b0-bcdf-11ed-9fc2-3f99c08d4ed8] - page.onError - stack:
  (anonymous function) (https://store.st.dl.eccdnx.com/public/shared/javascript/shared_responsive_adapter.js?v=TNYlyRmh1mUl&l=schinese&_cdn=china_eccdnx:43)
  l (https://store.st.dl.eccdnx.com/public/shared/javascript/jquery-1.8.3.min.js?v=.TZ2NKhB-nliU&_cdn=china_eccdnx:2)
  fireWith (https://store.st.dl.eccdnx.com/public/shared/javascript/jquery-1.8.3.min.js?v=.TZ2NKhB-nliU&_cdn=china_eccdnx:2)
  ready (https://store.st.dl.eccdnx.com/public/shared/javascript/jquery-1.8.3.min.js?v=.TZ2NKhB-nliU&_cdn=china_eccdnx:2)
  A (https://store.st.dl.eccdnx.com/public/shared/javascript/jquery-1.8.3.min.js?v=.TZ2NKhB-nliU&_cdn=china_eccdnx:2)

  phantomjs://platform/console++.js:263 in error
[ERROR - 2023-03-07T12:00:30.746Z] Session [258893b0-bcdf-11ed-9fc2-3f99c08d4ed8] - page.onError - msg: ReferenceError: Can't find variable: GetNavCookie

2. The page being crawled does define these variables: InitMiniprofileHovers, GetNavCookie
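To see at a glance which variables PhantomJS could not resolve, the log can be scanned with a regular expression. This is only a minimal sketch: the class and method names are made up for illustration, and the pattern assumes the message format shown in the log above.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of WebMagic or Selenium.
class PhantomLogScan {
    // Assumes the "ReferenceError: Can't find variable: NAME" format from the log above.
    private static final Pattern MISSING_VAR =
            Pattern.compile("ReferenceError: Can't find variable: (\\w+)");

    /** Returns the distinct variable names reported as missing, in log order. */
    static Set<String> missingVariables(String log) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = MISSING_VAR.matcher(log);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String log = String.join("\n",
                "page.onError - msg: ReferenceError: Can't find variable: InitMiniprofileHovers",
                "page.onError - msg: ReferenceError: Can't find variable: WebStorage",
                "page.onError - msg: ReferenceError: Can't find variable: GetNavCookie");
        // prints [InitMiniprofileHovers, WebStorage, GetNavCookie]
        System.out.println(missingVariables(log));
    }
}
```

Comparing this list with the scripts the page actually loads makes it clear that the variables exist on the page and that the failure lies in how PhantomJS executes those scripts.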

3. Debugging: core steps

  • Breakpoint entry
Page page = downloader.download(request, this); // the crawler task's downloader starts downloading the page
  • SeleniumDownloader
// get a web driver from the pool
webDriver = webDriverPool.get();
// the driver loads the page
webDriver.get(request.getUrl()); // the error occurs here

▪ State of the webDriver variable:

  • RemoteWebDriver
 this.execute("get", ImmutableMap.of("url", url)); // execute the "get" command

response = this.executor.execute(command); // the response, i.e. the result of executing the command
// command is just an object wrapping the sessionId, the driverCommand ("get"), and the url request parameter
  • PhantomJSCommandExecutor
Response var2 = super.execute(command);


4. Analyze the cause of the error:

Reason for the error: PhantomJS's design is not reasonable enough. When a DOM element cannot be found on the page, a reasonable design would return null instead of throwing an exception.

Other users online report a different cause, the encryption protocol: PhantomJS uses SSLv3, while some websites use TLS.

Workaround for encryption issues: --ignore-ssl-errors=true and --ssl-protocol=any

▷ The web driver/browser in my own project (the encryption cause is ruled out):

5. Tips:

PhantomJS has inherent problems supporting ES6. Running it against websites whose front end uses ES6 is not recommended.

6. Solution: Use Chrome instead of PhantomJS

7. New problem: Chrome is unstable when loading overseas websites.

  • Solution: a VPN
  • The idea now is to route Chrome through a VPN when Selenium launches it, because the browsers Selenium launches by default are plain, unconfigured browsers.

I found that Microsoft's Edge browser opens the official Steam website smoothly without a VPN. However, if the Steam link contains a geographic location, such as Hong Kong, it cannot be opened. Solution: a VPN



2. Rewriting Selenium's browser setup in order to add a proxy

1. Basic idea: first clarify the business logic

I found that after the project calls the crawler framework's scheduler, the downloader starts to work.

case CHROME:
    if (isWindows) {
        System.setProperty("selenuim_config", "C:\\data\\config\\config-chrome.ini");
        SeleniumDownloader seleniumDownloader = new SeleniumDownloader("C:\\data\\config\\chromedriver.exe");
        // Start crawling data 10 s after the browser opens
        seleniumDownloader.setSleepTime(10 * 1000);
        autoSpider.setDownloader(seleniumDownloader);
    }

The business code creates a SeleniumDownloader to download pages, but the underlying browser it launches is a plain, unconfigured browser.

I noticed that when the business code creates the SeleniumDownloader, it passes in a configuration file, config-chrome.ini.


2. Personal solution 1: inject the proxy options through this configuration file.

But I found that this configuration file is a startup file, and it has no options attribute that can be configured.

So this cannot be done through the startup file.


3. Personal solution 2: look at WebDriverPool, the browser driver pool underneath SeleniumDownloader

Does it expose any attributes that can be used to configure options? After reading the source code, I found that it exposes only one: the startup configuration file config-chrome.ini.

public void configure() throws IOException {
		// Read config file
		sConfig = new Properties();
		String configFile = DEFAULT_CONFIG_FILE;
		if (System.getProperty("selenuim_config")!=null){
			configFile = System.getProperty("selenuim_config");
		}
		sConfig.load(new FileReader(configFile));

		// Prepare capabilities
		sCaps = new DesiredCapabilities();
		sCaps.setJavascriptEnabled(true);
		sCaps.setCapability("takesScreenshot", false);

		String driver = sConfig.getProperty("driver", DRIVER_PHANTOMJS);

		// Fetch PhantomJS-specific configuration parameters
		......
}
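The reason the startup file cannot carry proxy options is visible in how it is read: java.util.Properties yields only flat string keys, and configure() looks up a fixed set of them. The sketch below reproduces just the driver lookup with the standard library; the class name and the inline config text are assumptions for illustration, while the key "driver" and the default "phantomjs" come from the listing above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical demo class; only the "driver" key and its default mirror configure().
class DriverConfigDemo {
    static final String DRIVER_PHANTOMJS = "phantomjs";

    /** Parses config text and returns the driver name, falling back to PhantomJS. */
    static String driverFrom(String configText) throws IOException {
        Properties config = new Properties();
        config.load(new StringReader(configText));
        return config.getProperty("driver", DRIVER_PHANTOMJS);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(driverFrom("driver=chrome\n"));   // prints chrome
        System.out.println(driverFrom("# no driver key\n")); // prints phantomjs
    }
}
```

Any extra key added to the file is simply ignored unless configure() itself is changed to read it, which is what leads to rewriting the pool.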

4. Personal solution 3: rewrite the underlying browser driver pool WebDriverPool, then write a downloader that uses the new pool

Both the downloader and the driver pool are modified from the official source code;

SeleniumDownloader2: based on SeleniumDownloader, it adds a proxy enumeration attribute, proxyEnum, and uses my rewritten browser driver pool, WebDriverPool2.

WebDriverPool2: rewrites the constructor of WebDriverPool and the configure method that initializes the WebDriver instance (the purpose is to add options such as a proxy).

  • A polling method, incrForLoop, is also added to obtain the next index into the proxy list.

■ WebDriverPool2:

  • An ellipsis marks code that is exactly the same as the official source.

  • Detail: the proxy given to ChromeOptions needs its SSL proxy set as well (the demo on the official website does not do this, which is why my VPN was not used, and there was no error message...)

    Analysis and solution: HTTPS = HTTP + SSL/TLS, and the browser loads these pages over HTTPS. A proxy set only for HTTP is not used for them, so the SSL proxy must also be configured in the code.

public class WebDriverPool2 {
	......
    /** Proxy mode (enum parameter) */
    private final ProxyEnum proxyEnum;
    /** Proxy list */
    private List<String> proxies;
    /** Index into the IP proxy list */
    private final AtomicInteger pointer = new AtomicInteger(-1);
	......

    /**
     * Initialize a WebDriver instance
     * @throws IOException on configuration errors
     */
    public void configure() throws IOException {
       ......
        if (isUrl(driver)) {
            sCaps.setBrowserName("phantomjs");
            mDriver = new RemoteWebDriver(new URL(driver), sCaps);
        } else if (driver.equals(DRIVER_FIREFOX)) {
            mDriver = new FirefoxDriver(sCaps);
        } else if (driver.equals(DRIVER_CHROME)) {
            if (proxyEnum == ProxyEnum.VPN_ENABLE || proxyEnum == ProxyEnum.PROXY_ENABLE) {
                // Add options such as an IP proxy or a VPN to Chrome
                ChromeOptions options = new ChromeOptions();
                // Do not load images
                options.addArguments("blink-settings=imagesEnabled=false");
                Proxy proxy = new Proxy();
                String httpProxy = proxies.get(incrForLoop());
                // The SSL proxy must be set as well
                proxy.setHttpProxy(httpProxy).setSslProxy(httpProxy);
                options.setCapability("proxy", proxy);
                sCaps.setCapability(ChromeOptions.CAPABILITY, options);
                logger.info("chrome webDriver proxy is : " + proxy);
            }
            mDriver = new ChromeDriver(sCaps);
        } else if (driver.equals(DRIVER_PHANTOMJS)) {
            mDriver = new PhantomJSDriver(sCaps);
        }
    }

    /**
     * Polling: pick the index of the next proxy in the proxy list
     * @return the index
     */
    private int incrForLoop() {
        int p = pointer.incrementAndGet();
        int size = proxies.size();
        if (p < size) {
            return p;
        }
        while (!pointer.compareAndSet(p, p % size)) {
            p = pointer.get();
        }
        return p % size;
    }

    public WebDriverPool2(int capacity, ProxyEnum proxyEnum, MasterWebservice masterWebservice) {
        this.capacity = capacity;
        // Which kind of proxy to use
        this.proxyEnum = proxyEnum;
        if (proxyEnum == ProxyEnum.VPN_ENABLE) {
            // VPN case
            this.proxies = masterWebservice.getVpn();
        } else if (proxyEnum == ProxyEnum.PROXY_ENABLE) {
            // IP proxy case:
            // get the dynamically generated IP list, with ports; example format: 42.177.155.5:75114
            this.proxies = masterWebservice.getProxyIps();
        }
    }

}
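The polling in incrForLoop can be tried in isolation. The sketch below copies that method into a small standalone class so the rotation order is easy to observe; the class name ProxyRotation and the proxy addresses are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical standalone wrapper around the incrForLoop logic shown above.
class ProxyRotation {
    private final List<String> proxies;
    private final AtomicInteger pointer = new AtomicInteger(-1);

    ProxyRotation(List<String> proxies) {
        if (proxies.isEmpty()) {
            throw new IllegalArgumentException("proxy list must not be empty");
        }
        this.proxies = proxies;
    }

    /** Advances the shared pointer and wraps it back into the list's range. */
    private int incrForLoop() {
        int p = pointer.incrementAndGet();
        int size = proxies.size();
        if (p < size) {
            return p;
        }
        // Past the end: try to reset the pointer to the wrapped value.
        while (!pointer.compareAndSet(p, p % size)) {
            p = pointer.get();
        }
        return p % size;
    }

    /** Returns the next proxy in round-robin order. */
    String next() {
        return proxies.get(incrForLoop());
    }

    public static void main(String[] args) {
        // Placeholder addresses, not real proxies.
        ProxyRotation rotation = new ProxyRotation(
                Arrays.asList("10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"));
        for (int i = 0; i < 4; i++) {
            System.out.println(rotation.next()); // the fourth call wraps back to 10.0.0.1:8080
        }
    }
}
```

Because configure() runs once per new driver, each driver created by the pool picks up the next proxy in the list. A shorter alternative with the same rotation would be Math.floorMod(counter.getAndIncrement(), size).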

■ SeleniumDownloader2:

/**
 * Adds the proxy enumeration attribute proxyEnum on top of SeleniumDownloader.
 * The methods of the official SeleniumDownloader that use WebDriverPool must be brought over
 * (so that they actually use our rewritten WebDriverPool2). Fields those methods need that are
 * private in the parent class must be declared again (just copy them from the parent to the subclass).
 */
public class SeleniumDownloader2 extends SeleniumDownloader {

    private volatile WebDriverPool2 webDriverPool;

    /** Proxy mode (enum parameter) */
    private ProxyEnum proxyEnum;
    /** Used to fetch the remote, dynamic IP list */
    private MasterWebservice masterWebservice;

    public SeleniumDownloader2(String chromeDriverPath, ProxyEnum proxyEnum, MasterWebservice masterWebservice) {
        System.getProperties().setProperty("webdriver.chrome.driver",
                chromeDriverPath);
        this.proxyEnum = proxyEnum;
        this.masterWebservice = masterWebservice;
    }

    ......
}
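Why the methods and fields have to be copied into the subclass comes down to Java visibility rules: a private field or method of the parent is invisible to the subclass, so every parent method that touches it keeps using the parent's version. The toy classes below are hypothetical stand-ins (they are not WebMagic code) that show the effect with plain strings in place of driver pools.

```java
// Stand-in for the official downloader: its pool and checkInit() are private.
class BaseDownloader {
    private String pool;

    String download(String url) {
        checkInit();
        return pool + " fetched " + url;
    }

    private void checkInit() {
        if (pool == null) {
            pool = "WebDriverPool";
        }
    }
}

// Stand-in for the rewritten downloader: it must redeclare the field and
// both methods, because the parent's private members cannot be overridden.
class ProxyDownloader extends BaseDownloader {
    private String pool;

    @Override
    String download(String url) {
        checkInit();
        return pool + " fetched " + url;
    }

    private void checkInit() {
        if (pool == null) {
            pool = "WebDriverPool2";
        }
    }
}

class ShadowingDemo {
    public static void main(String[] args) {
        BaseDownloader downloader = new ProxyDownloader();
        // prints WebDriverPool2 fetched https://example.com
        System.out.println(downloader.download("https://example.com"));
    }
}
```

If ProxyDownloader only redeclared the field and did not override download(), the inherited method would still initialize and use the parent's pool.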

■ The downloaders and browsers under the selenium package are as follows:

■ WebDriverPool provided by the official website:

package us.codecraft.webmagic.downloader.selenium;

import org.apache.log4j.Logger;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.phantomjs.PhantomJSDriver;
import org.openqa.selenium.phantomjs.PhantomJSDriverService;
import org.openqa.selenium.remote.DesiredCapabilities;
import org.openqa.selenium.remote.RemoteWebDriver;

import java.io.FileReader;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * @author [email protected] <br>
 *         Date: 13-7-26 <br>
 *         Time: 1:41 PM <br>
 */
class WebDriverPool {
    
    
	private Logger logger = Logger.getLogger(getClass());

	private final static int DEFAULT_CAPACITY = 5;

	private final int capacity;

	private final static int STAT_RUNNING = 1;

	private final static int STAT_CLODED = 2;

	private AtomicInteger stat = new AtomicInteger(STAT_RUNNING);

	/*
	 * new fields for configuring phantomJS
	 */
	private WebDriver mDriver = null;
	private boolean mAutoQuitDriver = true;

	private static final String DEFAULT_CONFIG_FILE = "/data/webmagic/webmagic-selenium/config.ini";
	private static final String DRIVER_FIREFOX = "firefox";
	private static final String DRIVER_CHROME = "chrome";
	private static final String DRIVER_PHANTOMJS = "phantomjs";

	protected static Properties sConfig;
	protected static DesiredCapabilities sCaps;

	/**
	 * Configure the GhostDriver, and initialize a WebDriver instance. This part
	 * of code comes from GhostDriver.
	 * https://github.com/detro/ghostdriver/tree/master/test/java/src/test/java/ghostdriver
	 * 
	 * @author [email protected]
	 * @throws IOException
	 */
	public void configure() throws IOException {
    
    
		// Read config file
		sConfig = new Properties();
		String configFile = DEFAULT_CONFIG_FILE;
		if (System.getProperty("selenuim_config")!=null){
    
    
			configFile = System.getProperty("selenuim_config");
		}
		sConfig.load(new FileReader(configFile));

		// Prepare capabilities
		sCaps = new DesiredCapabilities();
		sCaps.setJavascriptEnabled(true);
		sCaps.setCapability("takesScreenshot", false);

		String driver = sConfig.getProperty("driver", DRIVER_PHANTOMJS);

		// Fetch PhantomJS-specific configuration parameters
		if (driver.equals(DRIVER_PHANTOMJS)) {
    
    
			// "phantomjs_exec_path"
			if (sConfig.getProperty("phantomjs_exec_path") != null) {
    
    
				sCaps.setCapability(
						PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY,
						sConfig.getProperty("phantomjs_exec_path"));
			} else {
    
    
				throw new IOException(
						String.format(
								"Property '%s' not set!",
								PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY));
			}
			// "phantomjs_driver_path"
			if (sConfig.getProperty("phantomjs_driver_path") != null) {
    
    
				System.out.println("Test will use an external GhostDriver");
				sCaps.setCapability(
						PhantomJSDriverService.PHANTOMJS_GHOSTDRIVER_PATH_PROPERTY,
						sConfig.getProperty("phantomjs_driver_path"));
			} else {
    
    
				System.out
						.println("Test will use PhantomJS internal GhostDriver");
			}
		}

		// Disable "web-security", enable all possible "ssl-protocols" and
		// "ignore-ssl-errors" for PhantomJSDriver
		// sCaps.setCapability(PhantomJSDriverService.PHANTOMJS_CLI_ARGS, new
		// String[] {
    
    
		// "--web-security=false",
		// "--ssl-protocol=any",
		// "--ignore-ssl-errors=true"
		// });

		ArrayList<String> cliArgsCap = new ArrayList<String>();
		cliArgsCap.add("--web-security=false");
		cliArgsCap.add("--ssl-protocol=any");
		cliArgsCap.add("--ignore-ssl-errors=true");
		sCaps.setCapability(PhantomJSDriverService.PHANTOMJS_CLI_ARGS,
				cliArgsCap);

		// Control LogLevel for GhostDriver, via CLI arguments
		sCaps.setCapability(
				PhantomJSDriverService.PHANTOMJS_GHOSTDRIVER_CLI_ARGS,
				new String[] {
    
     "--logLevel="
						+ (sConfig.getProperty("phantomjs_driver_loglevel") != null ? sConfig
								.getProperty("phantomjs_driver_loglevel")
								: "INFO") });

		// String driver = sConfig.getProperty("driver", DRIVER_PHANTOMJS);

		// Start appropriate Driver
		if (isUrl(driver)) {
    
    
			sCaps.setBrowserName("phantomjs");
			mDriver = new RemoteWebDriver(new URL(driver), sCaps);
		} else if (driver.equals(DRIVER_FIREFOX)) {
    
    
			mDriver = new FirefoxDriver(sCaps);
		} else if (driver.equals(DRIVER_CHROME)) {
    
    
			mDriver = new ChromeDriver(sCaps);
		} else if (driver.equals(DRIVER_PHANTOMJS)) {
    
    
			mDriver = new PhantomJSDriver(sCaps);
		}
	}

	/**
	 * check whether input is a valid URL
	 * 
	 * @author [email protected]
	 * @param urlString urlString
	 * @return true means yes, otherwise no.
	 */
	private boolean isUrl(String urlString) {
    
    
		try {
    
    
			new URL(urlString);
			return true;
		} catch (MalformedURLException mue) {
    
    
			return false;
		}
	}

	/**
	 * store webDrivers created
	 */
	private List<WebDriver> webDriverList = Collections
			.synchronizedList(new ArrayList<WebDriver>());

	/**
	 * store webDrivers available
	 */
	private BlockingDeque<WebDriver> innerQueue = new LinkedBlockingDeque<WebDriver>();

	public WebDriverPool(int capacity) {
    
    
		this.capacity = capacity;
	}

	public WebDriverPool() {
    
    
		this(DEFAULT_CAPACITY);
	}

	/**
	 * 
	 * @return
	 * @throws InterruptedException
	 */
	public WebDriver get() throws InterruptedException {
    
    
		checkRunning();
		WebDriver poll = innerQueue.poll();
		if (poll != null) {
    
    
			return poll;
		}
		if (webDriverList.size() < capacity) {
    
    
			synchronized (webDriverList) {
    
    
				if (webDriverList.size() < capacity) {
    
    

					// add new WebDriver instance into pool
					try {
    
    
						configure();
						innerQueue.add(mDriver);
						webDriverList.add(mDriver);
					} catch (IOException e) {
    
    
						e.printStackTrace();
					}

					// ChromeDriver e = new ChromeDriver();
					// WebDriver e = getWebDriver();
					// innerQueue.add(e);
					// webDriverList.add(e);
				}
			}

		}
		return innerQueue.take();
	}

	public void returnToPool(WebDriver webDriver) {
    
    
		checkRunning();
		innerQueue.add(webDriver);
	}

	protected void checkRunning() {
    
    
		if (!stat.compareAndSet(STAT_RUNNING, STAT_RUNNING)) {
    
    
			throw new IllegalStateException("Already closed!");
		}
	}

	public void closeAll() {
    
    
		boolean b = stat.compareAndSet(STAT_RUNNING, STAT_CLODED);
		if (!b) {
    
    
			throw new IllegalStateException("Already closed!");
		}
		for (WebDriver webDriver : webDriverList) {
    
    
			logger.info("Quit webDriver" + webDriver);
			webDriver.quit();
			webDriver = null;
		}
	}

}
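Stripped of the Selenium details, the pool above follows a common pattern: reuse an idle instance if there is one, create a new one while under capacity, otherwise block until one is returned. The following is a generic sketch of that pattern using only the standard library. BoundedPool and its method names are hypothetical, and it differs from the listing in one respect: in the listing, an IOException from configure() is only printed, after which take() waits for a driver that was never added, whereas this sketch lets a factory failure propagate.

```java
import java.util.concurrent.BlockingDeque;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical generic pool; T would be WebDriver in the downloader.
class BoundedPool<T> {
    private final int capacity;
    private final Supplier<T> factory;
    private final AtomicInteger created = new AtomicInteger();
    private final BlockingDeque<T> idle = new LinkedBlockingDeque<>();

    BoundedPool(int capacity, Supplier<T> factory) {
        this.capacity = capacity;
        this.factory = factory;
    }

    /** Reuses an idle instance, creates one while under capacity, otherwise waits. */
    T get() throws InterruptedException {
        T item = idle.poll();
        if (item != null) {
            return item;
        }
        while (true) {
            int n = created.get();
            if (n >= capacity) {
                break;
            }
            // Reserve a slot atomically so at most `capacity` instances exist.
            if (created.compareAndSet(n, n + 1)) {
                try {
                    return factory.get();
                } catch (RuntimeException e) {
                    created.decrementAndGet(); // give the slot back on failure
                    throw e;
                }
            }
        }
        return idle.take(); // pool is full: block until someone returns an instance
    }

    void returnToPool(T item) {
        idle.add(item);
    }

    int createdCount() {
        return created.get();
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedPool<StringBuilder> pool = new BoundedPool<>(1, StringBuilder::new);
        StringBuilder first = pool.get();
        pool.returnToPool(first);
        StringBuilder second = pool.get();
        System.out.println(first == second);     // prints true: the instance was reused
        System.out.println(pool.createdCount()); // prints 1
    }
}
```

Callers of such a pool usually wrap the work in try/finally so the instance is returned even when page loading throws.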

■ SeleniumDownloader provided by the official website:

package us.codecraft.webmagic.downloader.selenium;

import org.apache.log4j.Logger;
import org.openqa.selenium.By;
import org.openqa.selenium.Cookie;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import us.codecraft.webmagic.Page;
import us.codecraft.webmagic.Request;
import us.codecraft.webmagic.Site;
import us.codecraft.webmagic.Task;
import us.codecraft.webmagic.downloader.Downloader;
import us.codecraft.webmagic.selector.Html;
import us.codecraft.webmagic.selector.PlainText;

import java.io.Closeable;
import java.io.IOException;
import java.util.Map;

/**
 * Uses Selenium to drive a browser for rendering. Currently only Chrome is supported.<br>
 * A Selenium driver must be downloaded for this to work.<br>
 *
 * @author [email protected] <br>
 *         Date: 13-7-26 <br>
 *         Time: 1:37 PM <br>
 */
public class SeleniumDownloader implements Downloader, Closeable {
    
    

	private volatile WebDriverPool webDriverPool;

	private Logger logger = Logger.getLogger(getClass());

	private int sleepTime = 0;

	private int poolSize = 1;

	private static final String DRIVER_PHANTOMJS = "phantomjs";

	/**
	 * Create a new downloader
	 *
	 * @param chromeDriverPath chromeDriverPath
	 */
	public SeleniumDownloader(String chromeDriverPath) {
    
    
		System.getProperties().setProperty("webdriver.chrome.driver",
				chromeDriverPath);
	}

	/**
	 * Constructor without any filed. Construct PhantomJS browser
	 * 
	 * @author [email protected]
	 */
	public SeleniumDownloader() {
    
    
		// System.setProperty("phantomjs.binary.path",
		// "/Users/Bingo/Downloads/phantomjs-1.9.7-macosx/bin/phantomjs");
	}

	/**
	 * set sleep time to wait until load success
	 *
	 * @param sleepTime sleepTime
	 * @return this
	 */
	public SeleniumDownloader setSleepTime(int sleepTime) {
    
    
		this.sleepTime = sleepTime;
		return this;
	}

	@Override
	public Page download(Request request, Task task) {
    
    
		checkInit();
		WebDriver webDriver;
		try {
    
    
			webDriver = webDriverPool.get();
		} catch (InterruptedException e) {
    
    
			logger.warn("interrupted", e);
			return null;
		}
		logger.info("downloading page " + request.getUrl());
		webDriver.get(request.getUrl());
		try {
    
    
			Thread.sleep(sleepTime);
		} catch (InterruptedException e) {
    
    
			e.printStackTrace();
		}
		WebDriver.Options manage = webDriver.manage();
		Site site = task.getSite();
		if (site.getCookies() != null) {
    
    
			for (Map.Entry<String, String> cookieEntry : site.getCookies()
					.entrySet()) {
    
    
				Cookie cookie = new Cookie(cookieEntry.getKey(),
						cookieEntry.getValue());
				manage.addCookie(cookie);
			}
		}

		/*
		 * TODO You can add mouse event or other processes
		 * 
		 * @author: [email protected]
		 */

		WebElement webElement = webDriver.findElement(By.xpath("/html"));
		String content = webElement.getAttribute("outerHTML");
		Page page = new Page();
		page.setRawText(content);
		page.setHtml(new Html(content, request.getUrl()));
		page.setUrl(new PlainText(request.getUrl()));
		page.setRequest(request);
		webDriverPool.returnToPool(webDriver);
		return page;
	}

	private void checkInit() {
    
    
		if (webDriverPool == null) {
    
    
			synchronized (this) {
    
    
				webDriverPool = new WebDriverPool(poolSize);
			}
		}
	}

	@Override
	public void setThread(int thread) {
    
    
		this.poolSize = thread;
	}

	@Override
	public void close() throws IOException {
    
    
		webDriverPool.closeAll();
	}
}



3. Introduction to Selenium

0. Official website reference materials:

  • ChromeDriver:https://sites.google.com/a/chromium.org/chromedriver/capabilities
  • Selenium:https://www.selenium.dev/documentation/

1. What is Selenium?

Selenium is an automated testing tool for the Web that can simulate user interaction with a browser to access websites.

Selenium is a large-scale browser automation project.

It provides extensions for simulating user interaction with the browser, a distribution server for scaling browser allocation, and the infrastructure for implementations of the W3C WebDriver specification, which lets you write interchangeable code for all major web browsers. The core of Selenium is WebDriver, an interface for writing instruction sets that can run interchangeably in many browsers.

2. Selenium functions:

Automated testing: Automated testing tools can simulate user interaction with browsers to access websites.

Crawler: Because Selenium can control the browser to send requests and obtain web page data, it can be applied to the crawler field.

Selenium can let the browser automatically load the page according to our instructions, obtain the required data, and even take screenshots of the page, or determine whether certain actions on the website have occurred.

3. Selenium in practice

Selenium is a web automated testing tool, originally developed for automated website testing. Selenium drives the browser directly and supports all major browsers.

Selenium does not include a browser of its own and provides no browser functionality itself. It must be used together with a third-party browser.

■ Mainstream browser drivers (WebDriver): PhantomJS, chromedriver

▪ PhantomJS:

PhantomJS is a "headless" browser based on WebKit. It loads the website into memory and executes the JavaScript on the page. Because it does not display a graphical interface, it runs more efficiently than a full browser.

If we combine Selenium and PhantomJS, we can run a very powerful web crawler that can handle JavaScript, cookies, headers, and anything else a real user would do.

▪ chromedriver:

Note: The version of chromedriver must correspond to the version of chrome you are using!

chromedriver version	  Supported Chrome versions
v2.46				v71-73
v2.45				v70-72
v2.44				v69-71
v2.43				v69-71
v2.42				v68-70
v2.41				v67-69
v2.40				v66-68
v2.39				v66-68
v2.38				v65-67
v2.37				v64-66
v2.36				v63-65
v2.35				v62-64
v2.34				v61-63
v2.33				v60-62
v2.32				v59-61
v2.31				v58-60
v2.30				v58-60
v2.29				v56-58
v2.28				v55-57
v2.27				v54-56
v2.26				v53-55
v2.25				v53-55
v2.24				v52-54
v2.23				v51-53
v2.22				v49-52
v2.21				v46-50
v2.20				v43-48
v2.19				v43-47
v2.18				v43-46
v2.17				v42-43
v2.16				v42-45
v2.15				v40-43
v2.14				v39-42
v2.13				v38-41
v2.12				v36-40
v2.11				v36-40
v2.10				v33-36
v2.9				v31-34
v2.8				v30-33
v2.7				v30-33
v2.6				v29-32
v2.5				v29-32
v2.4				v29-32
  • Chromedriver version download link 1: http://chromedriver.storage.googleapis.com/index.html
  • Chromedriver version download link 2: https://registry.npmmirror.com/binary.html?path=chromedriver/
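The matching rule in the table can be written as a small lookup. The sketch below copies only the first four rows of the table above; the class and method names are hypothetical, and a real check would need the full table for the Chrome version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the ranges are the first four rows of the table above.
class ChromedriverCompat {
    // chromedriver version -> {lowest, highest} supported Chrome major version
    private static final Map<String, int[]> SUPPORTED = new LinkedHashMap<>();
    static {
        SUPPORTED.put("2.46", new int[] {71, 73});
        SUPPORTED.put("2.45", new int[] {70, 72});
        SUPPORTED.put("2.44", new int[] {69, 71});
        SUPPORTED.put("2.43", new int[] {69, 71});
    }

    /** Returns the first listed chromedriver version that supports the given Chrome major version, or null. */
    static String driverFor(int chromeMajor) {
        for (Map.Entry<String, int[]> row : SUPPORTED.entrySet()) {
            int[] range = row.getValue();
            if (chromeMajor >= range[0] && chromeMajor <= range[1]) {
                return row.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(driverFor(72)); // prints 2.46
        System.out.println(driverFor(69)); // prints 2.44
        System.out.println(driverFor(80)); // prints null: not covered by these rows
    }
}
```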

4. Use of Selenium+chromedriver:

(1) Preparation:

Selenium: add the Selenium dependency to the project

chromedriver: check the version of Google Chrome on your computer and download the matching chromedriver package

(2) Use:

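// Imports assume JUnit 5 and a Selenium release that provides every call used below.
// VpnServerUtils is assumed to be a helper class from the project itself.
import org.junit.jupiter.api.Test;
import org.openqa.selenium.By;
import org.openqa.selenium.Proxy;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.remote.DesiredCapabilities;

import java.time.Duration;
import java.util.HashMap;

import static org.junit.jupiter.api.Assertions.assertEquals;

public class FirstScriptTest {

    @Test
    public void eightComponents() {
        // DesiredCapabilities and options let you configure the driver, for example
        // with a proxy, with image loading disabled, or with the UI turned off
        // Reference: ChromeDriver: https://sites.google.com/a/chromium.org/chromedriver/capabilities
        String downloadsPath = "d:\\data\\downloads";
        HashMap<String, Object> chromePrefs = new HashMap<String, Object>();
        chromePrefs.put("download.default_directory", downloadsPath);
        ChromeOptions options = new ChromeOptions();
        Proxy proxy = new Proxy();
        // The SSL proxy must be set in addition to the HTTP proxy
        proxy.setHttpProxy(VpnServerUtils.getVpnServer()).setSslProxy(VpnServerUtils.getVpnServer());
//		proxy.setHttpProxy(VpnServerUtils.getVpnServer());
        options.setCapability("proxy", proxy);
        System.out.println("~~~~~~~~~~~~~~~~~proxy: " + proxy.getHttpProxy());
        options.setExperimentalOption("prefs", chromePrefs);
        DesiredCapabilities caps = new DesiredCapabilities();
        caps.setCapability(ChromeOptions.CAPABILITY, options);

        WebDriver driver = new ChromeDriver(caps);
        // Ask the browser driver to load the page
        driver.get("https://www.selenium.dev/selenium/web/web-form.html");

        // Read the title, then find elements
        String title = driver.getTitle();
        assertEquals("Web form", title);

        driver.manage().timeouts().implicitlyWait(Duration.ofMillis(500));

        WebElement textBox = driver.findElement(By.name("my-text"));
        WebElement submitButton = driver.findElement(By.cssSelector("button"));

        textBox.sendKeys("Selenium");
        submitButton.click(); // click event

        WebElement message = driver.findElement(By.id("message"));
        String value = message.getText();
        assertEquals("Received!", value);
        // End the session
        driver.quit();
    }
}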





Origin blog.csdn.net/weixin_45630258/article/details/129444198