Python crawler learning 24
Contents
4. Using httpx
4.1 Introduction
So far we have covered the urllib library, the requests library, and regular expressions. Here we need to point out a limitation shared by urllib and requests: both support only HTTP/1.1, not HTTP/2.0. If we run into a site that only accepts HTTP/2.0, we are stuck.
import requests

# This site forces the use of HTTP/2.0
url = 'https://spa16.scrape.center/'
resp = requests.get(url)
print(resp.text)
Output:
To get out of this bind, we need a tool that supports HTTP/2.0: the httpx library.
4.2 Installing httpx
As usual, install it first:
pip install httpx              # install the httpx library itself
pip install "httpx[http2]"     # then add httpx's optional HTTP/2.0 support (quotes keep some shells from mangling the brackets)
4.3 Basic use
The httpx API is very similar to that of requests:
import httpx
url = 'https://www.baidu.com'
resp = httpx.get(url)
print(resp.status_code)
print(resp.headers)
print(resp.text)
Output:
Great, works nicely! Now let's try the site we couldn't reach before:
import httpx
url = 'https://spa16.scrape.center/'
resp = httpx.get(url)
print(resp.status_code)
print(resp.headers)
print(resp.text)
Output:
What happened? Why doesn't it work?
Don't worry: httpx does not enable HTTP/2.0 support by default, so we have to turn it on manually:
import httpx

url = 'https://spa16.scrape.center/'
# Pass http2=True to enable HTTP/2.0 support on the client
client = httpx.Client(http2=True)
resp = client.get(url)
print(resp.http_version)  # shows which protocol was actually negotiated
print(resp.text)
Output:
As mentioned earlier, httpx is very similar to requests, so it likewise provides post(), put(), delete(), patch(), and other methods. Try them yourself.
I'm definitely not being lazy!
Ends today, continues tomorrow