Setting up a headless (no-GUI) chrome runtime for selenium with Java on linux

Foreword

In the article "Using selenium and the chrome browser in Java to download dynamic pages", we demonstrated how to download dynamic web pages with selenium and chrome in a Windows environment. Our crawlers, however, generally run on linux servers, which usually have no GUI, so chrome cannot open a window there. In the past, crawler systems used PhantomJS, a browser without an interface, for this purpose. Now that FireFox and chrome support headless mode, PhantomJS is no longer updated, so the headless modes of FireFox and chrome are the recommended replacement. Headless mode means running the browser without an interface, which is exactly what a linux server without a GUI needs.

To drive chrome through selenium on linux, both the chrome browser and the chrome webdriver (chromedriver) must be installed. The following shows how to do this on centos 7.

Install google chrome

First download the offline installation package from https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm. Then run the following command to install the dependencies that chrome requires

yum install libX11 libXcursor libXdamage libXext libXcomposite libXi libXrandr gtk3 libappindicator-gtk3 xdg-utils libXScrnSaver liberation-fonts

Then run the following command to install chrome

rpm -ivh google-chrome-stable_current_x86_64.rpm

Once the installation is complete, run the following command to check the version

[root@localhost ~]# google-chrome --version
Google Chrome 70.0.3538.110 

The installed version is 70.
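Since the driver has to match the browser's major version, it can help to extract that number programmatically. The following is a minimal sketch, assuming the version string has the same shape as the output shown above; the class name ChromeVersion and the method majorVersion are hypothetical and not part of the original program.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChromeVersion {
    // Matches the first four-part dotted version number, e.g. "70.0.3538.110".
    private static final Pattern VERSION = Pattern.compile("(\\d+)\\.\\d+\\.\\d+\\.\\d+");

    /** Returns the major version found in the text, or -1 if there is none. */
    public static int majorVersion(String versionOutput) {
        if (versionOutput == null) {
            return -1;
        }
        Matcher m = VERSION.matcher(versionOutput);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        // Sample input copied from the `google-chrome --version` output above.
        System.out.println(majorVersion("Google Chrome 70.0.3538.110 "));
    }
}
```

Run on the sample string, this prints 70, which is the number to look for when choosing a chromedriver release.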

Install chrome webdriver

As in the article "Using selenium and the chrome browser in Java to download dynamic pages", find the chromedriver download that supports chrome version 70 and download the linux build, chromedriver_linux64.zip.

 
(Figure: web driver download packages for the different platforms)

Sample program calling chrome in headless mode through selenium

We again start from the program in "Using selenium and the chrome browser in Java to download dynamic pages" and convert it to headless mode

        WebDriver webDriver = null;
        try {
            String url = "https://www.jianshu.com/p/675ea919230e";
            ChromeOptions chromeOptions = new ChromeOptions();
            // Enable chrome's headless mode
            chromeOptions.setHeadless(Boolean.TRUE);
            // Start a chrome instance
            webDriver = new ChromeDriver(chromeOptions);
            // Open the URL
            webDriver.get(url);
            // Parse the rendered page source with Jsoup
            Document document = Jsoup.parse(webDriver.getPageSource());
            Element titleElement = document.selectFirst("div.article h1.title");
            Element authorElement = document.selectFirst("div.article div.author span.name");
            Element timeElement = document.selectFirst("div.article span.publish-time");
            Element wordCountElement = document.selectFirst("div.article span.wordage");
            Element viewCountElement = document.selectFirst("div.article span.views-count");
            Element commentCountElement = document.selectFirst("div.article span.comments-count");
            Element likeCountElement = document.selectFirst("div.article span.likes-count");
            Element contentElement = document.selectFirst("div.article div.show-content");
            // Print each field that was found (labels: title, author, publish time, content length)
            if (titleElement != null) {
                System.out.println("标题:" + titleElement.text());
            }
            if (authorElement != null) {
                System.out.println("作者:" + authorElement.text());
            }
            if (timeElement != null) {
                System.out.println("发布时间:" + timeElement.text());
            }
            if (wordCountElement != null) {
                System.out.println(wordCountElement.text());
            }
            if (viewCountElement != null) {
                System.out.println(viewCountElement.text());
            }
            if (commentCountElement != null) {
                System.out.println(commentCountElement.text());
            }
            if (likeCountElement != null) {
                System.out.println(likeCountElement.text());
            }
            if (contentElement != null && contentElement.text() != null) {
                System.out.println("正文长度:" + contentElement.text().length());
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (webDriver != null) {
                // Quit chrome
                webDriver.quit();
            }
        }

Compared with the code in the previous article, only the following part is different

            ChromeOptions chromeOptions = new ChromeOptions();
            // Enable chrome's headless mode
            chromeOptions.setHeadless(Boolean.TRUE);
            // Start a chrome instance
            webDriver = new ChromeDriver(chromeOptions);

The setHeadless option determines whether chrome starts in headless mode.
Package the program, upload it to the linux server, and run the command

java -jar -Dwebdriver.chrome.driver=/data/deploy/chromedriver spider_demo-0.0.1-SNAPSHOT.jar 
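The webdriver.chrome.driver system property in the command above tells selenium where the chromedriver executable is. A minimal sketch of a startup check, assuming the program wants to fail early with a readable message when that path is missing or not executable; the class name DriverPathCheck and the method isUsable are hypothetical and not part of the original program.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DriverPathCheck {
    /** Returns true if the path points to an existing, executable regular file. */
    public static boolean isUsable(String driverPath) {
        if (driverPath == null || driverPath.trim().isEmpty()) {
            return false;
        }
        Path p = Paths.get(driverPath);
        return Files.isRegularFile(p) && Files.isExecutable(p);
    }

    public static void main(String[] args) {
        // Same property that is passed with -Dwebdriver.chrome.driver=... on the command line.
        String driverPath = System.getProperty("webdriver.chrome.driver");
        if (isUsable(driverPath)) {
            System.out.println("chromedriver found at: " + driverPath);
        } else {
            System.err.println("chromedriver not usable at: " + driverPath
                    + " (check the path and the execute permission)");
        }
    }
}
```

A check like this could run before the ChromeDriver instance is created, so that a wrong path or a missing execute permission on the unzipped file is reported directly.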

The console will print out the following content

标题:是什么支撑了淘宝双十一,没错就是它java编程语言。
作者:Java帮帮
发布时间:2018.08.29 14:49
字数 561
阅读 632
评论 0
喜欢 4
正文长度:655

This shows that the Java program successfully used selenium to call chrome on linux and visit the page. If the headless mode option above is not set, the program fails at run time with the following error

org.openqa.selenium.WebDriverException: unknown error: Chrome failed to start: exited abnormally
  (unknown error: DevToolsActivePort file doesn't exist)
  (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
  (Driver info: chromedriver=2.44.609551 (5d576e9a44fe4c5b6a07e568f1ebc753f1214634),platform=Linux 3.10.0-514.26.2.el7.x86_64 x86_64) (WARNING: The server did not provide any stacktrace information)
Command duration or timeout: 399 milliseconds
Build info: version: 'unknown', revision: 'unknown', time: 'unknown'
System info: host: 'iz2ze9kvzy03hms75m3jzlz', ip: '172.17.251.3', os.name: 'Linux', os.arch: 'amd64', os.version: '3.10.0-514.26.2.el7.x86_64', java.version: '1.8.0_171'
Driver info: driver.version: ChromeDriver
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.openqa.selenium.remote.ErrorHandler.createThrowable(ErrorHandler.java:214)
    at org.openqa.selenium.remote.ErrorHandler.throwIfResponseFailed(ErrorHandler.java:166)
    at org.openqa.selenium.remote.JsonWireProtocolResponse.lambda$new$0(JsonWireProtocolResponse.java:53)
    at org.openqa.selenium.remote.JsonWireProtocolResponse.lambda$getResponseFunction$2(JsonWireProtocolResponse.java:91)
    at org.openqa.selenium.remote.ProtocolHandshake.lambda$createSession$0(ProtocolHandshake.java:122)
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
    at java.util.Spliterators$ArraySpliterator.tryAdvance(Spliterators.java:958)
    at java.util.stream.ReferencePipeline.forEachWithCancel(ReferencePipeline.java:126)
    at java.util.stream.AbstractPipeline.copyIntoWithCancel(AbstractPipeline.java:498)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:485)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.FindOps$FindOp.evaluateSequential(FindOps.java:152)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.findFirst(ReferencePipeline.java:464)
    at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:125)
    at org.openqa.selenium.remote.ProtocolHandshake.createSession(ProtocolHandshake.java:73)
    at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:136)
    at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:83)
    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:548)
    at org.openqa.selenium.remote.RemoteWebDriver.startSession(RemoteWebDriver.java:212)
    at org.openqa.selenium.remote.RemoteWebDriver.<init>(RemoteWebDriver.java:130)
    at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:181)
    at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:168)
    at org.openqa.selenium.chrome.ChromeDriver.<init>(ChromeDriver.java:123)
    at com.yanggaochao.spider.SpiderDemoApplication.run(SpiderDemoApplication.java:34)
    at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:813)
    at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:797)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:324)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1260)
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1248)
    at com.yanggaochao.spider.SpiderDemoApplication.main(SpiderDemoApplication.java:21)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48)
    at org.springframework.boot.loader.Launcher.launch(Launcher.java:87)
    at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
    at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:51)
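One way to avoid this failure when the same jar runs both on a desktop and on a server is to decide at startup whether headless mode is needed. The following is only a sketch under a stated assumption: that an unset or empty DISPLAY environment variable means there is no GUI session, which is typical for X11 but not guaranteed everywhere. The class name HeadlessArgs and both method names are hypothetical; only the --headless switch itself is a real chrome command-line flag.

```java
import java.util.ArrayList;
import java.util.List;

public class HeadlessArgs {
    /** Assumption: no X11 DISPLAY value means chrome cannot open a window. */
    public static boolean needsHeadless(String display) {
        return display == null || display.trim().isEmpty();
    }

    /** Builds the chrome command-line arguments for the given DISPLAY value. */
    public static List<String> chromeArgs(String display) {
        List<String> args = new ArrayList<>();
        if (needsHeadless(display)) {
            // Real chrome switch that starts the browser without a window.
            args.add("--headless");
        }
        return args;
    }

    public static void main(String[] args) {
        // Prints the argument list chosen for the current environment.
        System.out.println(chromeArgs(System.getenv("DISPLAY")));
    }
}
```

The resulting list could be handed to selenium with chromeOptions.addArguments(...), which accepts a list of strings; that is an alternative to calling setHeadless, not what the original program does.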

In this way, our crawler system can download pages through a real browser and get what-you-see-is-what-you-get results, so we no longer need to worry that dynamically rendered web content cannot be downloaded.



Author: Tuu not my
Link: https://www.jianshu.com/p/b2609ed57f07
Source: Jianshu
Copyright of works on Jianshu belongs to the author. For reproduction in any form, please contact the author for authorization and indicate the source.

Origin www.cnblogs.com/ppp1314520818/p/11300348.html