Environment setup
Maven dependency
<dependency>
    <groupId>net.sourceforge.htmlunit</groupId>
    <artifactId>htmlunit</artifactId>
    <version>2.15</version>
</dependency>
1. Basic use
final WebClient webClient = new WebClient(); // create the client
final HtmlPage page = webClient.getPage("https://www.baidu.com"); // fetch the page
System.out.println(page.asText()); // asText(), as the name implies, returns all of the page's text
webClient.closeAllWindows(); // close all windows
// Print the href attribute of every anchor (link) on the page
List<HtmlAnchor> achList = page.getAnchors();
for (HtmlAnchor ach : achList) {
    System.out.println(ach.getHrefAttribute());
}
1. HtmlUnit's support for JavaScript is not very good.
2. HtmlUnit's support for CSS is not very good either, so let's turn both off:
final WebClient webClient = new WebClient();
webClient.getOptions().setCssEnabled(false); // disable CSS
webClient.getOptions().setJavaScriptEnabled(false); // disable JavaScript
final HtmlPage page = webClient.getPage("https://www.baidu.com");
System.out.println(page.asText());
webClient.closeAllWindows();
1.1 Emulate a specific browser
// Emulate the Chrome browser; for other browsers, use a different BrowserVersion.xxx constant
WebClient webClient = new WebClient(BrowserVersion.CHROME);
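Putting the pieces together, here is a minimal sketch of a complete class, assuming HtmlUnit 2.15 is on the classpath and the URL is reachable. The class name HtmlUnitSketch and the helper newClient() are made up for illustration and are not part of HtmlUnit. The try/finally is a suggestion so that the windows are closed even if getPage throws.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitSketch {

    // Hypothetical helper (not an HtmlUnit API): builds a client that
    // emulates Chrome and has CSS and JavaScript processing turned off.
    static WebClient newClient() {
        WebClient webClient = new WebClient(BrowserVersion.CHROME);
        webClient.getOptions().setCssEnabled(false);
        webClient.getOptions().setJavaScriptEnabled(false);
        return webClient;
    }

    public static void main(String[] args) throws Exception {
        WebClient webClient = newClient();
        try {
            // Fetch the page and print its text
            HtmlPage page = webClient.getPage("https://www.baidu.com");
            System.out.println(page.asText());

            // Print the href attribute of every anchor on the page
            for (HtmlAnchor anchor : page.getAnchors()) {
                System.out.println(anchor.getHrefAttribute());
            }
        } finally {
            // Close the windows even if fetching the page failed
            webClient.closeAllWindows();
        }
    }
}
```

Keeping the configuration in one helper means every page fetch in a program starts from the same browser version and options.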