Crawling data with HttpClient: POST requests
1. How the crawler works
A web crawler follows the same routine as a person browsing with a web browser, in four steps plus cleanup:
- Open the browser: create an HttpClient object.
- Enter the URL: create an HttpPost object to initiate a POST request.
- Press Enter: send the request with the httpClient and receive a response.
- Parse the response: check whether the status code is 200; if it is 200, the request succeeded.
- Finally, close the response and the httpClient.
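For comparison, the same steps can be sketched with the JDK 11 built-in java.net.http.HttpClient. This is a different library from the Apache HttpClient used below, and the URL is only illustrative; sending is shown in comments so the sketch runs without network access:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;

public class JdkClientSketch {
    public static void main(String[] args) throws Exception {
        // 1. "Open the browser": create the client
        HttpClient client = HttpClient.newHttpClient();
        // 2. "Enter the URL": build a POST request (no body in this sketch)
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://www.itcast.cn"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        System.out.println(request.method());
        // 3./4. "Press Enter" and parse the response would be:
        // HttpResponse<String> resp =
        //         client.send(request, HttpResponse.BodyHandlers.ofString());
        // if (resp.statusCode() == 200) { System.out.println(resp.body().length()); }
    }
}
```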
2. Project directory structure
For how to create the project, see: https://blog.csdn.net/weixin_44588495/article/details/90580722
3. POST request without parameters
package com.crawler;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import java.io.IOException;

public class HttpClientPostTest {
    public static void main(String[] args) throws Exception {
        // 1. "Open the browser": create the HttpClient object
        CloseableHttpClient httpClient = HttpClients.createDefault();
        // 2. "Enter the URL": create an HttpPost object for the POST request
        HttpPost httpPost = new HttpPost("http://www.itcast.cn");
        CloseableHttpResponse response = null;
        try {
            // 3. "Press Enter": send the request with httpClient and receive the response
            response = httpClient.execute(httpPost);
            // 4. Parse the response: check that the status code is 200
            if (response.getStatusLine().getStatusCode() == 200) {
                HttpEntity httpEntity = response.getEntity();
                String content = EntityUtils.toString(httpEntity, "utf-8");
                System.out.println(content.length());
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // 5. Close the response and the client (response is null if execute failed)
            if (response != null) {
                response.close();
            }
            httpClient.close();
        }
    }
}
Output: the program prints the length of the fetched page content.
4. POST request with parameters
Unlike a GET request, a POST request does not carry its parameters in the address bar. Instead, declare a List of name/value pairs to hold the parameters, wrap the list in a form entity object, and set that entity on the HttpPost object.
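Under the hood, the form entity serializes each name/value pair as name=value, percent-encodes both sides, and joins pairs with "&". A minimal sketch of that encoding using the JDK's URLEncoder (the "keys" parameter mirrors the example below; the value with a space is just illustrative):

```java
import java.net.URLEncoder;

public class FormBodyDemo {
    public static void main(String[] args) throws Exception {
        // Percent-encode the name and the value, then join them as name=value;
        // multiple pairs would be joined with '&' to form the POST body
        String body = URLEncoder.encode("keys", "utf-8") + "="
                + URLEncoder.encode("java web", "utf-8");
        System.out.println(body); // keys=java+web
    }
}
```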
package com.crawler;

import org.apache.http.HttpEntity;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class HttpClientPostParamTest {
    public static void main(String[] args) throws Exception {
        // 1. "Open the browser": create the HttpClient object
        CloseableHttpClient httpClient = HttpClients.createDefault();
        // 2. "Enter the URL": create an HttpPost object for the POST request
        HttpPost httpPost = new HttpPost("http://yun.itheima.com/course");
        // 3. Declare a List to hold the form parameters
        List<NameValuePair> params = new ArrayList<NameValuePair>();
        // Equivalent to requesting http://yun.itheima.com/course?keys=java with GET
        params.add(new BasicNameValuePair("keys", "java"));
        // Create the form entity: the first argument is the parameter list,
        // the second is the character encoding
        UrlEncodedFormEntity formEntity = new UrlEncodedFormEntity(params, "utf-8");
        // Set the form entity on the POST request
        httpPost.setEntity(formEntity);
        CloseableHttpResponse response = null;
        try {
            // 4. "Press Enter": send the request with httpClient and receive the response
            response = httpClient.execute(httpPost);
            // 5. Parse the response: check that the status code is 200
            if (response.getStatusLine().getStatusCode() == 200) {
                HttpEntity httpEntity = response.getEntity();
                String content = EntityUtils.toString(httpEntity, "utf-8");
                System.out.println(content.length());
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // 6. Close the response and the client (response is null if execute failed)
            if (response != null) {
                response.close();
            }
            httpClient.close();
        }
    }
}
Output: again the length of the fetched page content.
Tip: unlike the crawler in the previous article, the output here is the length of the crawled data, not the data itself.