Table of contents
1.5 Advanced Users (Cookies, Proxies)
1 Introduction to request
11 SSL certificate authentication-verify parameter
1 Matching rules for commonly used regular expressions
Regular expression - only keep Chinese/Hanzi characters (filter non-Chinese characters)
4 Support asynchronous requests
5. Detailed explanation of usage and parameters of logging.basicConfig of Python logging module
1.1. Introduction to logging module
2 logging.basicConfig(**kwargs)
1.4 Use file (filename) to save log files
1.5 Set the time format in the log
1.7 Method of reading json files (json.load)
3 Common methods of os library
2.2 Commonly used attributes of sys
9. Examples of crawling static web pages
9 Attribute multi-value matching
3 Basic uses of Beautiful Soup
5.6 Parent nodes and ancestor nodes
8.1 add_class and remove_class methods
1. urllib
1. Introduction to request
request is the most basic HTTP request module in urllib. It simulates sending a request, much like typing a URL into the browser and pressing Enter. Passing the URL and any additional parameters to the library's functions is all that is needed to send the request.
1.1 urlopen
The urllib.request module can simulate the way a browser initiates a request, and it also handles authentication, redirection, and browser cookies.
The basic usage is as follows:
import urllib.request
response = urllib.request.urlopen("https://www.python.org/")
print(response.read().decode('utf-8'))
This sends a GET request. Use the built-in type function to check the type of the response:
print(type(response))
Output: <class 'http.client.HTTPResponse'>
So the response is an object of type HTTPResponse.
Use its attributes and methods to output the response status code and response headers:
print(response.status)  # status code of the response
print(response.getheaders())  # all response headers as a list of (name, value) tuples
print(response.getheader("Server"))  # value of the Server response header
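The same attributes can be tried without internet access. The following is only a sketch: HelloHandler and inspect_response are hypothetical names introduced for illustration, and the throwaway local server exists solely so that the request has something to talk to. The empty ProxyHandler is an assumption made so that proxy environment variables do not interfere with the local request.

```python
import http.server
import threading
import urllib.request


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a short text body."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def inspect_response(url):
    """Return (type name, status, Content-Type header, body text) for a GET request."""
    # An empty ProxyHandler makes the opener ignore proxy environment variables.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    # The with statement closes the response even if reading fails.
    with opener.open(url, timeout=5) as response:
        return (
            type(response).__name__,
            response.status,
            response.getheader("Content-Type"),
            response.read().decode("utf-8"),
        )


# Port 0 asks the operating system for any free port.
server = http.server.HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    result = inspect_response(f"http://127.0.0.1:{server.server_port}/")
    print(result)
finally:
    server.shutdown()
    server.server_close()
```

Opening the response in a with statement is a design choice rather than a requirement: it releases the connection as soon as the block ends.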
API usage of urlopen:
response = urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
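As a rough illustration of the timeout parameter in this signature, the sketch below wraps urlopen in a helper that gives up after a set number of seconds. fetch_text is a hypothetical name, and the 5-second default is an arbitrary assumption, not a recommended value.

```python
import socket
import urllib.error
import urllib.request


def fetch_text(url, timeout=5.0):
    """Return the decoded body of url, or None if the request fails or times out."""
    try:
        # timeout is given in seconds and applies to blocking operations such as connecting.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.URLError as exc:
        # Covers unreachable hosts, unknown URL schemes, and connect timeouts.
        print(f"request failed: {exc.reason}")
    except socket.timeout:
        # A timeout while reading the body is raised directly.
        print("request timed out")
    return None
```

A call such as fetch_text("https://www.python.org/", timeout=3) would then return the page text, or None if the server does not answer in time.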
1.2 The data parameter
The data parameter is optional. When it is supplied, it must first be converted to the bytes type, for example with the bytes function. If the data parameter is passed, the request method is POST instead of GET.
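The effect of data on the request method can be checked without sending anything. The sketch below uses urllib.request.Request, which only builds a request object, and its get_method method; the variable names are chosen for illustration.

```python
import urllib.parse
import urllib.request

# Form fields encoded the same way as in the text.
form = urllib.parse.urlencode({"name": "germey"})
data = form.encode("utf-8")  # same bytes as bytes(form, encoding="utf-8")

# Building a Request object sends nothing over the network.
url = "https://www.httpbin.org/post"
without_data = urllib.request.Request(url)
with_data = urllib.request.Request(url, data=data)

print(without_data.get_method())  # GET
print(with_data.get_method())     # POST
print(data)                       # b'name=germey'
```

The encoded body is 11 bytes long, which is the value a server would see in the Content-Length header of the resulting POST request.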
Example:
import urllib.request
import urllib.parse
data = bytes(urllib.parse.urlencode({'name':'germey'}), encoding = 'utf-8')
response = urllib.request.urlopen('https://www.httpbin.org/post', data = data)
print(response.read().decode('utf-8'))
Output:
{ "args": {}, "data": "", "files": {}, "form": { "name": "germey" }, "headers": { "Accept-Encoding": "identity", "Content-Length": "11", "Content-Type": "application/x-www-form-urlencoded", "Host": "www.httpbin.org", "User-Agent": "Python-urllib/3.11", "X-Amzn-Trace-Id": "Root=1-64997bdd-011711375cc64ba54dba4056" }, "json": null, "origin": "1.202.187.118", "ur