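Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/qq_15144655/article/details/53419788
Add the required open-source library to your Maven project. I use version 2.23 here: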
<dependency>
<groupId>net.sourceforge.htmlunit</groupId>
<artifactId>htmlunit</artifactId>
<version>2.23</version>
</dependency>
Basic HtmlUnit setup, used here to run a Baidu advanced search:
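import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlHiddenInput;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSelect;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

public static String Baidu(String keyword) throws Exception {
    // WebClient is AutoCloseable, so try-with-resources closes it even if a step throws
    try (WebClient webclient = new WebClient()) {
        // Accept insecure SSL certificates (uncomment if needed)
        //webclient.getOptions().setUseInsecureSSL(true);
        // Some pages contain sloppy JavaScript that makes HtmlUnit throw;
        // suppress those errors so the run completes (the result is not affected here)
        webclient.getOptions().setThrowExceptionOnScriptError(false);
        // Do not throw on failing HTTP status codes either
        webclient.getOptions().setThrowExceptionOnFailingStatusCode(false);
        // Do not load CSS
        webclient.getOptions().setCssEnabled(false);
        // The page is dynamic, so JavaScript must be loaded and executed
        webclient.getOptions().setJavaScriptEnabled(true);
        // Open the Baidu advanced search page
        HtmlPage htmlpage = webclient.getPage("http://www.baidu.com/gaoji/advanced.html");
        // Get the page's form (f1 is the form's name)
        HtmlForm form = htmlpage.getFormByName("f1");
        HtmlSubmitInput button = form.getInputByValue("百度一下");
        // Fill the keyword into the q1 text field
        HtmlTextInput textField = form.getInputByName("q1");
        textField.setValueAttribute(keyword);
        // Choose the option with value 10 in the rn drop-down
        final HtmlSelect htmlSelect = form.getSelectByName("rn");
        htmlSelect.setDefaultValue("10");
        // Hidden field
        final HtmlHiddenInput hiddenInputtn = form.getInputByName("tn");
        hiddenInputtn.setDefaultValue("baiduadv");
        // Submit the request (equivalent to clicking the 百度一下 button) and get the returned page
        final HtmlPage page = button.click();
        // Get the page's text content
        String result = page.asText();
        // Or get the page source instead
        //String result = page.asXml();
        //System.out.println(result);
        return result;
    }
}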
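To make it clearer what clicking the button sends, here is a minimal sketch of how the three fields set above (q1, rn, tn) would be joined into a form-encoded parameter string. It uses only the JDK, not HtmlUnit. The class name FormQuerySketch and the method buildQuery are hypothetical, and the UTF-8 encoding is an assumption: the real page may submit in a different charset and with additional fields, so treat this as an illustration of the idea rather than the exact request Baidu receives.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
class FormQuerySketch {

    // Joins name/value pairs into an application/x-www-form-urlencoded string,
    // keeping the iteration order of the map.
    static String buildQuery(Map<String, String> fields) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (query.length() > 0) {
                query.append('&');
            }
            query.append(encode(field.getKey()))
                 .append('=')
                 .append(encode(field.getValue()));
        }
        return query.toString();
    }

    // Assumption: UTF-8; the real form may use another charset.
    private static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        // Same field names and values as in the Baidu(...) method above
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("q1", "html unit");
        fields.put("rn", "10");
        fields.put("tn", "baiduadv");
        System.out.println(buildQuery(fields)); // prints q1=html+unit&rn=10&tn=baiduadv
    }
}
```

A LinkedHashMap is used so the parameters come out in the order they were added, which makes the output easy to compare with what you see in the browser's address bar or network panel.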
The page source that the program works against: