HTTP protocol requests

1. Seven common HTTP request methods:

No.  Method   Description
1    GET      Requests the specified resource; the response carries the entity body.
2    HEAD     Like GET, except that the response has no body; used to fetch only the headers.
3    POST     Submits data to the specified resource for processing (for example, submitting a form or uploading a file). The data is carried in the request body. A POST request may create new resources and/or modify existing ones.
4    PUT      Replaces the content of the specified resource with the data sent from the client to the server.
5    DELETE   Asks the server to delete the specified resource.
6    CONNECT  Reserved in HTTP/1.1 for proxy servers that can switch the connection into a tunnel.
7    OPTIONS  Lets the client query which communication options (such as supported methods) the server offers.
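
To make the table a little more concrete, here is a minimal sketch of how a method can be chosen with `urllib.request`. The URL and the `build_request` helper are assumptions made up for illustration, and nothing is actually sent over the network: the code only builds the request objects and prints which method each one would use.

```python
import urllib.request

# Hypothetical target URL, used only for illustration; nothing is sent.
URL = "http://www.example.com/"


def build_request(method, body=None):
    """Build (but do not send) a request that uses the given HTTP method."""
    return urllib.request.Request(URL, data=body, method=method)


head_req = build_request("HEAD")
put_req = build_request("PUT", body=b"new content")

# Without an explicit method, urllib picks GET when there is no body
# and POST when a body is supplied.
default_req = urllib.request.Request(URL)
form_req = urllib.request.Request(URL, data=b"a=1")

for req in (head_req, put_req, default_req, form_req):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would send it; whether the server accepts methods such as PUT or DELETE depends on the site.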

2. GET request:

Here the crawler automatically queries Baidu for the keyword haha and saves the result page to a local file:

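import urllib.request

keywd = "haha"  # search keyword

# Baidu takes the search keyword in the "wd" query parameter
url = "http://www.baidu.com/s?wd=" + keywd
req = urllib.request.Request(url)
data = urllib.request.urlopen(req).read()  # response body as bytes

# Save the raw page; "with" closes the file even if the write fails
with open("D:\\pythoncode\\pachong\\sizhou\\httpdemo.html", "wb") as fhandle:
    fhandle.write(data)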

One thing to note: if the query keyword is Chinese (or contains other characters that are not allowed in a URL), it must be percent-encoded before it is appended to the URL:

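import urllib.parse
import urllib.request

# keywd = "haha"
keywd = "Guo Chang"
# Percent-encode the keyword so it is safe to place in the URL
keywd_code = urllib.parse.quote(keywd)

url = "http://www.baidu.com/s?wd=" + keywd_code
req = urllib.request.Request(url)
data = urllib.request.urlopen(req).read()  # response body as bytes

# Save the raw page; "with" closes the file even if the write fails
with open("D:\\pythoncode\\pachong\\sizhou\\httpdemo.html", "wb") as fhandle:
    fhandle.write(data)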
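
To see what the encoding step actually produces, here is a small sketch that runs without any network access. The keyword `你好` is just an example chosen for illustration; `urllib.parse.urlencode` is shown as an alternative that builds the whole `wd=...` query string in one call.

```python
import urllib.parse

keyword = "你好"  # example Chinese keyword, chosen for illustration

# quote() turns each UTF-8 byte of a non-ASCII character into %XX
encoded = urllib.parse.quote(keyword)
print(encoded)  # %E4%BD%A0%E5%A5%BD

# urlencode() builds "name=value" pairs and encodes the values itself
query = urllib.parse.urlencode({"wd": keyword})
url = "http://www.baidu.com/s?" + query
print(url)

# unquote() reverses the encoding
decoded = urllib.parse.unquote(encoded)
print(decoded)
```

Using `urlencode` is probably the more convenient choice once a query has several parameters, since it also inserts the `&` separators.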

3. POST request:

I will cover this in detail later, when I study automated crawler login.
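
Until then, here is a rough sketch of the general shape of a POST request with `urllib`. It is not a working login: the URL and the form field names are hypothetical placeholders, and the request is only built, not sent. The key point is that the form data is urlencoded, converted to bytes, and passed as `data`, which places it in the request body.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint and form fields, for illustration only.
LOGIN_URL = "http://www.example.com/login"
form = {"name": "demo_user", "pass": "demo_password"}

# The body must be bytes: urlencode the fields, then encode to UTF-8.
postdata = urllib.parse.urlencode(form).encode("utf-8")

# Supplying data makes urllib use POST instead of GET.
req = urllib.request.Request(LOGIN_URL, data=postdata)
req.add_header("Content-Type", "application/x-www-form-urlencoded")

print(req.get_method())
print(req.data)

# Sending would look like this (commented out because the URL is made up):
# data = urllib.request.urlopen(req).read()
```

A real site will have its own URL and field names, which can usually be found by inspecting the login form in the page source.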

