Java web crawlers: HttpClient POST

Crawling page data with an HttpClient POST request

1. Overview of the crawler workflow

A web crawler follows the same routine as a person browsing the web, broken into the following steps:

  1. Open a browser: create an HttpClient object.
  2. Enter the URL: create an HttpPost object representing the POST request.
  3. Press Enter: send the request with the httpClient and receive a response.
  4. Parse the response: check whether the status code is 200; if it is, the request succeeded.
  5. Finally, close the response and the httpClient.

2. Project directory structure

For how to create the project, see: https://blog.csdn.net/weixin_44588495/article/details/90580722

3. Crawling without parameters

package com.crawler;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

import java.io.IOException;

public class HttpClientPostTest {
    public static void main(String[] args) throws Exception {
        // 1. "Open the browser": create the HttpClient object
        CloseableHttpClient httpClient = HttpClients.createDefault();
        // 2. "Enter the URL": create an HttpPost object for the POST request
        HttpPost httpPost = new HttpPost("http://www.itcast.cn");

        CloseableHttpResponse response = null;
        try {
            // 3. "Press Enter": send the request with httpClient and receive the response
            response = httpClient.execute(httpPost);
            // 4. Parse the response: check whether the status code is 200
            if (response.getStatusLine().getStatusCode() == 200) {
                HttpEntity httpEntity = response.getEntity();
                String content = EntityUtils.toString(httpEntity, "utf-8");
                System.out.println(content.length());
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // 5. Close the response and the client even if the request failed
            if (response != null) {
                response.close();
            }
            httpClient.close();
        }
    }
}

Running the program prints the length of the fetched page content.
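The same five steps can also be written with the JDK's built-in `java.net.http.HttpClient` (Java 11+), which needs no Apache dependency and no explicit close. This is a minimal sketch, assuming the same demo URL as above:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class JdkHttpClientPostSketch {
    public static void main(String[] args) throws Exception {
        // 1. "Open the browser": create the client
        HttpClient client = HttpClient.newHttpClient();
        // 2. "Enter the URL": build a POST request with an empty body
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://www.itcast.cn"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        // 3. "Press Enter": send the request and receive the response as a String
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // 4. Check the status code before using the body
        if (response.statusCode() == 200) {
            System.out.println(response.body().length());
        }
        // 5. No explicit close: the JDK client manages its own connections
    }
}
```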

4. Crawling with parameters

Unlike a GET request, a POST request does not carry its parameters in the address bar, so the parameters must be collected in a List. We then create a form entity object, put the parameters into it, and set the form entity on the HttpPost object.

package com.crawler;

import org.apache.http.HttpEntity;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class HttpClientPostParamTest {
    public static void main(String[] args) throws Exception {
        // 1. "Open the browser": create the HttpClient object
        CloseableHttpClient httpClient = HttpClients.createDefault();
        // 2. "Enter the URL": create an HttpPost object for the POST request
        HttpPost httpPost = new HttpPost("http://yun.itheima.com/course");
        // 3. Declare a List collection to hold the form parameters
        List<NameValuePair> params = new ArrayList<NameValuePair>();
        // the request is equivalent to: http://yun.itheima.com/course?keys=java
        params.add(new BasicNameValuePair("keys", "java"));
        // 4. Create the form entity: the first argument is the form data, the second is the encoding
        UrlEncodedFormEntity formEntity = new UrlEncodedFormEntity(params, "utf-8");
        // ...and set the form entity on the POST request
        httpPost.setEntity(formEntity);

        CloseableHttpResponse response = null;
        try {
            // 5. "Press Enter": send the request with httpClient and receive the response
            response = httpClient.execute(httpPost);
            // 6. Parse the response: check whether the status code is 200
            if (response.getStatusLine().getStatusCode() == 200) {
                HttpEntity httpEntity = response.getEntity();
                String content = EntityUtils.toString(httpEntity, "utf-8");
                System.out.println(content.length());
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // 7. Close the response and the client even if the request failed
            if (response != null) {
                response.close();
            }
            httpClient.close();
        }
    }
}

The program again prints the length of the crawled data.
Tip: the output differs from the previous article's because a different page is crawled; what is printed is still the length of the crawled data.
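Note that `UrlEncodedFormEntity` URL-encodes the `keys=java` pair into the request body, not into the address bar. The encoding is plain `application/x-www-form-urlencoded`, so a sketch of the body it produces can be built with the JDK's own `URLEncoder` (this is an illustration of the format, not Apache HttpClient's internal code):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FormEncodingSketch {
    public static void main(String[] args) {
        // The same key/value pair the BasicNameValuePair carries
        String body = URLEncoder.encode("keys", StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode("java", StandardCharsets.UTF_8);
        // This string is what ends up in the POST body
        System.out.println(body); // prints keys=java
    }
}
```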


Origin: blog.csdn.net/weixin_44588495/article/details/90581101