Application of Tunnel HTTP in Data Capture

Data capture usually uses the HTTP protocol for data transmission. HTTP is an application layer protocol used to transfer data between web browsers and web servers. It is a client-server protocol: the client sends a request to the server, and the server returns a response to the client. HTTP uses TCP as its transport protocol and is stateless, meaning each request-response pair is independent of every other.
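The request-response cycle can be sketched with a minimal example: a throwaway local HTTP server (so the sketch is self-contained, with no external dependency) and the `requests` library as the client. The handler class and the `"hello"` payload are illustrative assumptions, not part of any real service.

```python
import http.server
import threading

import requests


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Tiny server-side handler: every GET gets a plain-text 'hello'."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example's output quiet


# Start the server on an ephemeral port in a background thread.
server = http.server.HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: one independent request, one independent response.
resp = requests.get(f"http://127.0.0.1:{server.server_port}/")
print(resp.status_code)  # 200
print(resp.text)         # hello

server.shutdown()
```

Because HTTP is stateless, sending this request a second time would go through exactly the same cycle; the server retains nothing between calls unless the application adds state (for example via cookies).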

Data capture can use the GET and POST methods of the HTTP protocol to obtain data. The GET method retrieves data from the server, while the POST method submits data to the server. With GET, the data is passed through the URL; with POST, the data is passed in the HTTP request body.
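The difference in where the data travels can be seen by preparing (but not sending) the two kinds of request with the `requests` library; the URL and the `q=python` parameter below are placeholders for illustration.

```python
import requests

# GET: the parameters are encoded into the URL's query string.
get_req = requests.Request(
    "GET", "http://example.com/search", params={"q": "python"}
).prepare()

# POST: the same data travels in the request body instead.
post_req = requests.Request(
    "POST", "http://example.com/search", data={"q": "python"}
).prepare()

print(get_req.url)    # http://example.com/search?q=python
print(get_req.body)   # None
print(post_req.url)   # http://example.com/search
print(post_req.body)  # q=python
```

In everyday scraping code the same distinction is simply `requests.get(url, params=...)` versus `requests.post(url, data=...)`.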

Data capture can also use HTTP headers to transfer information. For example, the User-Agent header can be used to simulate different browsers or devices, the Referer header can be used to indicate the source of the request, and the Cookie header can be used to transfer session information.
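Setting these headers with `requests` is a matter of passing a dictionary; the header values below (the browser string, referer URL, and session cookie) are made-up placeholders, not values any real site expects.

```python
import requests

headers = {
    # Simulate a desktop browser rather than the default python-requests agent.
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    # Indicate which page the request supposedly came from.
    "Referer": "http://example.com/",
    # Carry session information (placeholder value).
    "Cookie": "sessionid=abc123",
}

# Prepare the request to inspect the headers that would be sent.
req = requests.Request("GET", "http://example.com/data", headers=headers).prepare()

print(req.headers["User-Agent"])
print(req.headers["Referer"])
```

To actually send it, `requests.get("http://example.com/data", headers=headers)` applies the same headers to a live request.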

In summary, HTTP is one of the most commonly used protocols in data scraping: it provides a simple and efficient way to fetch and transfer data.

The following example sends a request through an authenticated tunnel proxy with the requests library (Python 3):

    # -*- coding: utf-8 -*-

    import requests

    # Target page to fetch
    targetUrl = "http://ip.hahado.cn/ip"

    # Proxy server
    proxyHost = "ip.hahado.cn"
    proxyPort = "39010"

    # Tunnel proxy credentials
    proxyUser = "username"
    proxyPass = "password"

    # Build the proxy URL with embedded credentials
    proxyMeta = "http://%(user)s:%(pass)s@%(host)s:%(port)s" % {
        "host": proxyHost,
        "port": proxyPort,
        "user": proxyUser,
        "pass": proxyPass,
    }

    # Route both HTTP and HTTPS traffic through the tunnel
    proxies = {
        "http": proxyMeta,
        "https": proxyMeta,
    }

    resp = requests.get(targetUrl, proxies=proxies)

    print(resp.status_code)
    print(resp.text)

Origin blog.csdn.net/weixin_73725158/article/details/130613712