Main content: how GET and POST affect the status of a crawled web page, and common web page status codes
Distinguishing GET and POST requests
GET requests the specified page and returns the entity body.
POST submits data to the specified resource for processing (for example, submitting a form or uploading a file). The data is carried in the request body.
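As a rough illustration of the difference, the sketch below builds one GET and one POST request with Python's standard `urllib` and sends nothing over the network. The helper names `build_get` and `build_post` and the `example.com` URLs are placeholders invented for this example, not part of any real site or of the original article.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_get(url, params):
    """GET: parameters travel in the URL query string and there is no body."""
    return Request(f"{url}?{urlencode(params)}", method="GET")


def build_post(url, form):
    """POST: the form data is encoded and placed in the request body."""
    body = urlencode(form).encode("utf-8")
    return Request(url, data=body, method="POST")


# Hypothetical URLs and fields, used only to show where the data ends up.
get_req = build_get("https://example.com/search", {"q": "crawler"})
post_req = build_post("https://example.com/login", {"user": "alice"})

print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

The GET request carries its parameters in the URL and has no body, while the POST request keeps the URL clean and carries the same kind of data in `data`.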
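Features of the GET method
GET responses can be cached by browsers and proxies. The method is safe (it should not change server state) and idempotent (repeating the request has the same effect as making it once).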
Features of the POST method
POST is neither safe nor idempotent, and its responses are not cached by default.
Of course, for our crawlers these concepts matter less. What matters is:
Depending on the request method, information is delivered in a different way, especially cookies.
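One way to picture this is shown below. It is a minimal sketch, assuming cookies are sent by hand in the `Cookie` request header while form data goes in the body only for POST. The helper `with_cookies`, the cookie names, and the URLs are all hypothetical, and nothing is sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request


def with_cookies(url, cookies, form=None):
    """Build a request that carries cookies in the Cookie header.

    If `form` is given, it is encoded into the body and the request
    becomes a POST; otherwise the request is a plain GET.
    """
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    headers = {"Cookie": cookie_header}
    body = urlencode(form).encode("utf-8") if form is not None else None
    return Request(url, data=body, headers=headers)


# Placeholder cookie values, for illustration only.
cookies = {"sessionid": "abc123", "theme": "dark"}
get_req = with_cookies("https://example.com/profile", cookies)
post_req = with_cookies("https://example.com/comment", cookies, form={"text": "hi"})

print(get_req.get_method(), get_req.get_header("Cookie"))
print(post_req.get_method(), post_req.get_header("Cookie"), post_req.data)
```

In this sketch the cookie header is identical for both requests; only the location of the submitted data changes with the method.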
Web page return status code
200
A web page that loads normally returns 200.
Non-200
Status codes other than 200 can also be returned, so pay attention. A page with a non-200 status may still be accessible, but most non-200 statuses indicate a problem, such as your request being rejected.
Status code | Accessible? | Meaning |
200 | Accessible | (OK) The server has successfully processed the request. Usually, this means that the server served the requested web page. |
202 | Accessible | (accepted) The server has accepted the request but has not yet processed it. |
203 | Accessible | (non-authoritative information) The server successfully processed the request, but the information returned may have come from another source. |
300 | Inaccessible | (multiple choices) In response to a request, the server can perform various operations. The server can choose an operation according to the requester (user agent), or provide a list of operations for the requester to choose from. |
301 | Inaccessible | (moved permanently) The requested web page has been permanently moved to a new location. When the server returns this response (to a GET or HEAD request), it automatically forwards the requester to the new location. |
302 | Inaccessible | (moved temporarily) The server is currently responding to the request with a page from a different location, but the requester should continue to use the original location for future requests. |
400 | Inaccessible | (bad request) The server did not understand the syntax of the request. |
401 | Inaccessible | (unauthorized) The request requires authentication. The server might return this response for web pages that require a login. |
403 | Inaccessible | (forbidden) The server rejected the request. |
404 | Inaccessible | (not found) The server could not find the requested web page. |
406 | Inaccessible | (not acceptable) The server cannot respond with the requested web page using the content characteristics that were requested. |
500 | Inaccessible | (internal server error) The server encountered an error and could not complete the request. |
502 | Inaccessible | (bad gateway) The server, acting as a gateway or proxy, received an invalid response from the upstream server. |
503 | Inaccessible | (service unavailable) The server is currently unavailable (because of overload or maintenance downtime). This is usually a temporary state. |
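The table can be condensed into a small helper for a crawler. The sketch below assumes that "accessible" simply means a 2xx status, which matches the rows above but is a simplification. The function names `is_accessible` and `describe` are made up for this example; the reason phrases come from Python's standard `http.HTTPStatus`.

```python
from http import HTTPStatus


def is_accessible(status):
    """Assumption: treat any 2xx status as accessible, everything else as not."""
    return 200 <= status < 300


def describe(status):
    """Return the code with its standard reason phrase, e.g. '404 Not Found'."""
    try:
        return f"{status} {HTTPStatus(status).phrase}"
    except ValueError:
        # Codes that the standard library does not know about.
        return f"{status} Unknown"


for code in (200, 203, 302, 403, 503):
    label = "accessible" if is_accessible(code) else "inaccessible"
    print(describe(code), "->", label)
```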
When you run into statuses such as 401 or 403, check your headers or cookies.
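A crawler can apply this advice automatically. The following is a minimal sketch using only the standard library; `fetch_status` and `advice_for` are hypothetical helper names, and the 10-second timeout is an arbitrary choice. Note that `urllib.request.urlopen` raises `HTTPError` for 4xx and 5xx responses, so the status code is read from the exception in that case.

```python
import urllib.error
import urllib.request


def fetch_status(url, headers=None):
    """Request `url` and return the HTTP status code, including error codes."""
    req = urllib.request.Request(url, headers=headers or {})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions; the code is on the error.
        return err.code


def advice_for(status):
    """Return a hint for 401 and 403, or None for any other status."""
    if status in (401, 403):
        return "Check the request headers (for example User-Agent) and cookies."
    return None


# Usage (not executed here because it needs network access):
# status = fetch_status("https://example.com/", headers={"User-Agent": "Mozilla/5.0"})
# print(status, advice_for(status))
```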
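For the role of headers and cookies in a request, how to set headers or cookies, when cookies can be left out, and the difference between GET and POST requests, see the previous article:
Original work takes effort. Please credit the source when reposting.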