Python crawler learning 24
Contents
4. Using httpx
4.1 Introduction
So far we have covered the urllib library, the requests library, and regular expressions. Here we need to point out a limitation shared by urllib and requests: both support only HTTP/1.1, not HTTP/2.0. If we run into a site that only accepts HTTP/2.0, we are stuck.
import requests

# This site forces the use of HTTP/2.0
url = 'https://spa16.scrape.center/'
resp = requests.get(url)
print(resp.text)
Output:
To get out of this bind, we need a tool that supports HTTP/2.0: the httpx library.
4.2 Installing httpx
As usual, install it first:
pip install httpx              # install the httpx library itself
pip install "httpx[http2]"     # then add httpx's optional HTTP/2.0 support (quotes keep some shells from mangling the brackets)
4.3 Basic use
The httpx API is very similar to that of requests:
import httpx
url = 'https://www.baidu.com'
resp = httpx.get(url)
print(resp.status_code)
print(resp.headers)
print(resp.text)
Output:
Great, works nicely! Now let's try the site we couldn't reach before:
import httpx
url = 'https://spa16.scrape.center/'
resp = httpx.get(url)
print(resp.status_code)
print(resp.headers)
print(resp.text)
Output:
What happened? Why doesn't it work?
Don't worry: httpx does not enable HTTP/2.0 support by default, so we have to turn it on manually:
import httpx

url = 'https://spa16.scrape.center/'
# Pass http2=True to enable HTTP/2.0 support on the client
client = httpx.Client(http2=True)
resp = client.get(url)
print(resp.http_version)  # shows which protocol was actually negotiated
print(resp.text)
Output:
As mentioned earlier, httpx is very similar to requests, so it likewise provides post(), put(), delete(), patch(), and other methods. Try them yourself.
I'm definitely not being lazy!
Ends today, continues tomorrow