How can you tell whether a crawler proxy is working?

Many people using a proxy IP for the first time have the same question: after configuring the proxy in a crawler or data collection tool, how can you tell whether the proxy IP is actually being used? Apocalypse IP explains.

You can check by sending a request through the proxy IP to an IP lookup site, for example by searching for "IP" on Baidu in a browser or by visiting https://www.ip138.com. Sites of this kind read the source IP of the HTTP request and return it in the response body. If the proxy is forwarding successfully, the address returned by the target website is the proxy IP rather than your own. The following situations are common:

1. Direct forwarding by proxy

Compare the IP address returned by Baidu's IP lookup or https://www.ip138.com with the crawler server's own address. If the returned address is the crawler server's IP, proxy forwarding failed; otherwise, proxy forwarding succeeded.

2. Multi-hop forwarding by proxy

The IP lookup site returns an address that is neither the crawler server's IP nor the proxy IP configured directly in the crawler or collection software. Instead, it is the address of the last proxy in the chain: the request was forwarded through several proxy IPs before it finally reached the website.

3. Automatic forwarding by proxy

Some proxy IP products automatically assign a different proxy IP to each HTTP request from the crawler or collection software, so every lookup returns a different address. This lets data collection get past the target website's IP restrictions. One complication is worth noting: some IP lookup websites cache the returned content based on cookies and other information. Even if the proxy forwards each request through a different IP, such a website keeps returning the same address, which creates the false impression that automatic forwarding has failed. The following demo sends a request through the proxy:

    package main

    import (
        "fmt"
        "io"
        "net/http"
        "net/url"
    )

    // Proxy server (product website: www.16yun.cn)
    const ProxyServer = "t.16yun.cn:31111"

    // ProxyAuth holds the credentials for the proxy server.
    type ProxyAuth struct {
        Username string
        Password string
    }

    // ProxyClient returns an HTTP client that routes requests through ProxyServer.
    func (p ProxyAuth) ProxyClient() http.Client {
        proxyURL := &url.URL{Scheme: "http", Host: ProxyServer}
        if p.Username != "" && p.Password != "" {
            // url.UserPassword escapes special characters in the credentials.
            proxyURL.User = url.UserPassword(p.Username, p.Password)
        }
        return http.Client{Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)}}
    }

    func main() {
        targetURI := "https://httpbin.org/ip"

        // Initialize the proxy HTTP client
        client := ProxyAuth{"username", "password"}.ProxyClient()

        // A GET request carries no body, so pass nil.
        request, err := http.NewRequest("GET", targetURI, nil)
        if err != nil {
            panic("failed to build request: " + err.Error())
        }

        // Set Proxy-Tunnel
        // rand.Seed(time.Now().UnixNano())
        // tunnel := rand.Intn(10000)
        // request.Header.Set("Proxy-Tunnel", strconv.Itoa(tunnel))

        response, err := client.Do(request)
        if err != nil {
            panic("failed to connect: " + err.Error())
        }
        // Close the body on every return path, including read errors.
        defer response.Body.Close()

        bodyByte, err := io.ReadAll(response.Body)
        if err != nil {
            fmt.Println("error reading body:", err)
            return
        }

        fmt.Println("Response Status:", response.Status)
        fmt.Println("Response Header:", response.Header)
        fmt.Println("Response Body:\n", string(bodyByte))
    }


Origin blog.csdn.net/tianqiIP/article/details/112981173