1. The 7 HTTP request methods:
Serial number | Method | Description |
---|---|---|
1 | GET | Requests the specified page and returns the entity body. |
2 | HEAD | Like a GET request, except the returned response has no body; used to retrieve only the headers. |
3 | POST | Submits data to the specified resource for processing (e.g., submitting a form or uploading a file); the data is carried in the request body. A POST request may result in the creation of new resources and/or the modification of existing ones. |
4 | PUT | Uploads data from the client to the server to replace the contents of the specified resource. |
5 | DELETE | Asks the server to delete the specified resource. |
6 | CONNECT | Reserved in HTTP/1.1 for proxy servers that can change the connection into a tunnel. |
7 | OPTIONS | Lets the client query the server's capabilities, such as which methods it supports. |
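As a quick illustration of how these methods look from Python (the URL below is just a placeholder; the `method` parameter of `urllib.request.Request` is available since Python 3.3), `Request` defaults to GET, switches to POST when `data` is supplied, and `method=` overrides that explicitly:

```python
import urllib.request

# Build Request objects for several HTTP methods without sending them.
get_req = urllib.request.Request("http://www.example.com/")
head_req = urllib.request.Request("http://www.example.com/", method="HEAD")
post_req = urllib.request.Request("http://www.example.com/", data=b"a=1")

print(get_req.get_method())   # GET
print(head_req.get_method())  # HEAD
print(post_req.get_method())  # POST
```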
2. GET requests:
Here we implement a crawler that automatically queries Baidu for a keyword and saves the result page:
```python
import urllib.request

keywd = "haha"
# Baidu's search endpoint takes the keyword in the wd query parameter
url = "http://www.baidu.com/s?wd=" + keywd
req = urllib.request.Request(url)
data = urllib.request.urlopen(req).read()
# save the result page as a local HTML file
fhandle = open("D:\\pythoncode\\pachong\\sizhou\\httpdemo.html", 'wb')
fhandle.write(data)
fhandle.close()
```
One thing to note: if the keyword contains Chinese characters, it must be percent-encoded first:
```python
import urllib.request

# keywd = "haha"
keywd = "Guo Chang"  # a Chinese keyword in the original post
# quote() percent-encodes the keyword so it is safe to put in a URL
keywd_code = urllib.request.quote(keywd)
url = "http://www.baidu.com/s?wd=" + keywd_code
req = urllib.request.Request(url)
data = urllib.request.urlopen(req).read()
fhandle = open("D:\\pythoncode\\pachong\\sizhou\\httpdemo.html", 'wb')
fhandle.write(data)
fhandle.close()
```
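An alternative sketch: `urllib.parse.urlencode` builds and percent-encodes the whole query string in one step, which scales better when there are several parameters. The keyword "百度" (Baidu) below is just a stand-in Chinese string, not the one from the original example:

```python
import urllib.parse

# urlencode percent-encodes each value, so non-ASCII keywords are safe
params = urllib.parse.urlencode({"wd": "百度"})
url = "http://www.baidu.com/s?" + params
print(url)  # http://www.baidu.com/s?wd=%E7%99%BE%E5%BA%A6
```

The resulting `url` can be passed to `urllib.request.urlopen` exactly as in the example above.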
3. POST requests:
Automatic crawler login uses POST; it will be covered in detail later!
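As a small preview (a sketch only; the login URL and field names are hypothetical, and nothing is actually sent here), a POST with urllib carries the form data in the request body rather than in the URL:

```python
import urllib.parse
import urllib.request

# Form fields are urlencoded and passed as bytes in the body;
# supplying data= makes urllib issue a POST instead of a GET.
form = urllib.parse.urlencode({"user": "demo", "pwd": "secret"}).encode("utf-8")
req = urllib.request.Request("http://www.example.com/login", data=form)

print(req.get_method())  # POST
print(req.data)          # b'user=demo&pwd=secret'
```

To actually send the request you would call `urllib.request.urlopen(req)`, just as in the GET examples.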