Fetching HTTP data with Python

Python can fetch data over HTTP with the built-in urllib package or with the third-party requests library.

Sample code for fetching HTTP data with urllib:

```python
import urllib.request

url = 'http://example.com'

# urlopen returns an http.client.HTTPResponse object
response = urllib.request.urlopen(url)
html = response.read()  # raw response body, as bytes
print(html)
```
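Note that `read()` returns raw bytes, so printing it shows a `b'...'` literal. Decoding with the charset the server declares (falling back to UTF-8) gives a proper string. A minimal sketch — `decode_body` is a helper name introduced here for illustration, not part of urllib:

```python
import urllib.request

def decode_body(raw, charset=None):
    """Decode a raw response body, falling back to UTF-8 with replacement
    characters if the declared charset is unknown or wrong (hypothetical helper)."""
    try:
        return raw.decode(charset or 'utf-8')
    except (LookupError, UnicodeDecodeError):
        return raw.decode('utf-8', errors='replace')

# Usage with urllib (network access assumed):
# response = urllib.request.urlopen('http://example.com')
# charset = response.headers.get_content_charset()  # charset from Content-Type, or None
# html = decode_body(response.read(), charset)
```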

Sample code for fetching HTTP data with requests:

```python
import requests

url = 'http://example.com'

response = requests.get(url)
html = response.text  # body decoded to text using the encoding requests detects
print(html)
```

Note that when fetching HTTP data you should check the site's robots.txt file and follow its crawling rules, both to stay within the law and to avoid getting your IP banned. In addition, some sites deploy anti-crawler measures against automated clients, and extra techniques (such as custom request headers, proxies, or rate limiting) may be needed to work around them.
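The standard library ships urllib.robotparser for exactly this robots.txt check. A minimal sketch; the rules below are fed in directly (made up for illustration) so the example needs no network access:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# Against a live site you would instead call:
#   rp.set_url('http://example.com/robots.txt'); rp.read()
rp.parse([
    'User-agent: *',
    'Disallow: /private/',
])

print(rp.can_fetch('*', 'http://example.com/index.html'))    # True: path is allowed
print(rp.can_fetch('*', 'http://example.com/private/data'))  # False: path is disallowed
```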

Sample code for fetching a page through an authenticated HTTP proxy with requests:

```python
# -*- coding: utf-8 -*-

import requests

# Target page to fetch
targetUrl = "http://ip.hahado.cn/ip"

# Proxy server
proxyHost = "ip.hahado.cn"
proxyPort = "39010"

# Proxy tunnel credentials
proxyUser = "username"
proxyPass = "password"

# Build a proxy URL of the form http://user:pass@host:port
proxyMeta = "http://%(user)s:%(pass)s@%(host)s:%(port)s" % {
    "host": proxyHost,
    "port": proxyPort,
    "user": proxyUser,
    "pass": proxyPass,
}

# Route both http and https traffic through the proxy
proxies = {
    "http": proxyMeta,
    "https": proxyMeta,
}

resp = requests.get(targetUrl, proxies=proxies)

print(resp.status_code)
print(resp.text)
```


Origin blog.csdn.net/weixin_73725158/article/details/130696853