Difference between GET and POST methods in HTTP protocol

The difference between the GET and POST methods in HTTP is a well-worn topic and a popular interview question. I only had a superficial understanding of it before, so I will work through it here.

Common understanding

w3schools' answer to this question, "HTTP Methods: GET vs. POST", lists the commonly cited differences:

| Aspect | GET | POST |
| --- | --- | --- |
| Back button/refresh | Harmless | The data will be resubmitted (the browser should warn the user that the data is about to be resubmitted) |
| Bookmarking | Can be bookmarked | Cannot be bookmarked |
| Caching | Can be cached | Not cached |
| Encoding type | application/x-www-form-urlencoded | application/x-www-form-urlencoded or multipart/form-data. Use multipart encoding for binary data |
| History | Parameters remain in the browser history | Parameters are not saved in the browser history |
| Restrictions on data length | Yes. When sending data, the GET method adds the data to the URL, and the length of a URL is limited (the maximum URL length is 2048 characters) | No restrictions |
| Restrictions on data type | Only ASCII characters are allowed | No restrictions. Binary data is also allowed |
| Security | GET is less secure than POST because the data sent is part of the URL. Never use GET when sending passwords or other sensitive information! | POST is more secure than GET because the parameters are not saved in the browser history or in web server logs |
| Visibility | The data is visible to everyone in the URL | The data is not displayed in the URL |
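The "encoding type" and "visibility" rows can be made concrete with a small sketch. It only builds request objects with Python's standard library and sends nothing; the endpoint `http://example.com/search` and the form fields are hypothetical, chosen purely for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form fields and endpoint, used only for illustration.
params = {"q": "http methods", "page": "2"}
base_url = "http://example.com/search"

# Both methods can use the application/x-www-form-urlencoded encoding.
encoded = urlencode(params)

# GET: the encoded data becomes the query string, so it is visible in the URL.
get_request = Request(base_url + "?" + encoded)

# POST: the same encoded data travels in the request body instead.
post_request = Request(base_url, data=encoded.encode("ascii"))

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url, post_request.data)
```

The data is encoded identically in both cases; only where it is carried differs, which is what the bookmarking, history, and visibility rows are really describing.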

Later, a classmate pointed out that the URL length limit stated here is wrong: the HTTP protocol itself does not limit the length of a URI, and the actual limit is imposed by the browser and the system.

This comparison only lists observable differences without explaining why they exist, and an understanding of the question should not stop at this level.

Got it wrong?

There is an article titled "99% of people misunderstand the difference between GET and POST in HTTP" that rejects the answer above ("Unfortunately, this is not the answer we want!"). Its author says:

GET and POST are both essentially TCP connections, and there is no difference between them. However, because of HTTP rules and browser/server limitations, they behave differently in practice. There is another important difference between GET and POST. Simply put: GET generates one TCP packet; POST generates two TCP packets.
For a GET request, the browser sends the HTTP headers and data together, and the server responds with 200 (returning data); for POST, the browser sends the headers first, the server responds with 100 Continue, the browser then sends the data, and the server responds with 200 OK (returning data).

The explanation goes all the way down to TCP, and nothing about it seemed wrong. At least, I believed it when I first read the article.

A reversal??

But then, while browsing Zhihu, I came across this article: "I heard that '99% of people misunderstand the difference between GET and POST in HTTP'??", which points out two errors in the previous article:

100 Continue is only meaningful when the request carries an Expect: 100-continue header.
When the request contains an Expect header field that includes a 100-continue expectation, the 100 response indicates that the server wishes to receive the request payload body, as described in Section 5.1.1. The client ought to continue sending the request and discard the 100 response. If the request did not contain an Expect header field containing the 100-continue expectation, the client can simply discard this interim response.

When we talk about GET vs. POST, we are really talking about the specification, not an implementation. What is the specification? Put bluntly, it is the relevant RFC. An implementation is any code, library, or product that implements what the specification describes, such as curl, Python's requests library, or Chrome.
How a POST request gets sent is not something this RFC discusses at all. The RFC only describes the relationship between 100 Continue and the Expect header. For example, if you want to include a body in a GET request, you can send Expect: 100-continue and wait for 100 Continue, and that is consistent with the standard.
In other words, "XHR sends two TCP packets" is knowledge about an implementation, not about the specification. You cannot say that "Chrome sends two TCP packets for an AJAX POST, while GET only sends one" is the difference between GET and POST, just as you cannot take what one particular factory emits and call it the national standard for industrial exhaust emissions.
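The point that the two-step exchange is tied to the Expect header rather than to the method can be sketched as follows. This is a simplified model, not how any particular client is implemented: `build_head` and `planned_writes` are hypothetical helpers, the host and path are placeholders, and the number of writes a client makes is not the same thing as the number of TCP packets on the wire.

```python
def build_head(method, path, host, body=b"", expect_continue=False):
    """Serialize the request line and headers of an HTTP/1.1 request."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if body:
        lines.append(f"Content-Length: {len(body)}")
    if expect_continue:
        lines.append("Expect: 100-continue")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def planned_writes(method, path, host, body=b"", expect_continue=False):
    """Return the chunks a client would write, in order (simplified model)."""
    head = build_head(method, path, host, body, expect_continue)
    if expect_continue and body:
        # The client pauses here until the server answers "100 Continue".
        return [head, body]
    return [head + body]


for method in ("GET", "POST"):
    for expect in (False, True):
        chunks = planned_writes(method, "/items", "example.com", b"a=1", expect)
        label = "with Expect" if expect else "without Expect"
        print(method, label, len(chunks), "write(s)")
```

In this model GET and POST behave identically: the split into two writes appears for either method once Expect: 100-continue is present, and for neither method otherwise.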

This seems more reasonable, and it brings in heavyweight terms such as RFC, specification, and implementation. As a mere onlooker I could not sit still any longer, so I decided to study the question myself.

Exploring the RFC

First of all, what is an RFC? The definition on Wikipedia is:

Request for Comments (RFC) is a series of memoranda issued by the Internet Engineering Task Force (IETF). The documents collect information about the Internet, as well as software documentation for the UNIX and Internet communities, and are arranged by number. RFC documents are currently issued under the auspices of the Internet Society (ISOC).

Put simply, RFCs are the specifications of the Internet. What we usually call a "protocol" is published in the form of RFCs, and the RFCs that make up the HTTP/1.1 specification at the time of writing are: RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, and RFC 7235. Of these, Section 4, "Request Methods", of RFC 7231 covers the HTTP methods, so I read that section carefully.

The request method token is the primary source of request semantics; it indicates the purpose for which the client has made this request and what is expected by the client as a successful result.

A very important word appears here: "semantics". So what is semantics? An article on the difference between syntax and semantics explains it as follows:

A language is a collection of legal sentences. What makes a sentence legal? It can be judged from two aspects: syntax and semantics. Syntax concerns the grammatical structure, whereas semantics concerns the meaning of the word symbols combined according to that structure. A valid syntactic structure does not imply valid semantics. For example, "I go to college" conforms to both the syntactic and the semantic rules. But "college goes to me", although it conforms to the syntactic rules, has no meaning, so it is not semantically valid.

For HTTP, syntax refers to the format of request and response messages. For example, the first line of a request must have the form "method URI protocol/version". For details, see my earlier reading notes on "Illustrated HTTP". Any request that conforms to this format is syntactically legal.
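A minimal sketch of such a syntax check, assuming a deliberately simplified pattern (uppercase method names, a target without spaces, and a single-digit version), might look like this. The function name `parse_request_line` and the sample paths are hypothetical.

```python
import re

# Simplified pattern: METHOD, a space, the request target, a space, HTTP/x.y.
# Real method tokens allow more characters; uppercase-only is an assumption here.
REQUEST_LINE = re.compile(r"([A-Z]+) (\S+) HTTP/(\d)\.(\d)")


def parse_request_line(line):
    """Split a request line into (method, target, (major, minor))."""
    match = REQUEST_LINE.fullmatch(line)
    if match is None:
        raise ValueError(f"malformed request line: {line!r}")
    method, target, major, minor = match.groups()
    return method, target, (int(major), int(minor))


print(parse_request_line("GET /index.html HTTP/1.1"))
# Syntactically fine, even though the path hints at a dubious use of GET.
print(parse_request_line("GET /users/42/delete HTTP/1.1"))
```

The parser accepts both lines because it only checks the form of the request. Whether GET is an appropriate method for the second one is a question of semantics, which no syntax check can answer.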

Semantics defines the nature of each type of request. For example, the semantics of GET is "retrieving a resource", and the semantics of POST is "processing a resource", so an implementation of these two methods must take their semantics into account and behave accordingly.

Of course, it is also possible to implement behavior that violates the semantics while still conforming to the syntax. For example, if GET is used to modify user information and POST is used to fetch a resource list, the request is syntactically "legal" but does not respect the semantics. Writing this, I am reminded of two concepts in XML, Well Formed and Valid, which seem to correspond to syntax and semantics.

As quoted above, the method is the primary source of request semantics, which implies that there are secondary sources too. Some request headers can further modify the semantics of a request. For example, a GET request with a Range header becomes a partial request.

RFC 7231 defines several properties of HTTP methods:

  1. Safe
    "Safe" here does not mean "secure" in the everyday sense. A method is safe if its semantics are essentially read-only: a client that uses a safe method on a server resource does not request, and should not cause, any state change on the server, so the request is harmless. The RFC defines GET, HEAD, OPTIONS and TRACE as safe.
    However, this definition is only a specification and does not guarantee that an implementation of the method is safe. A server implementation may not respect the semantics of the method, as in the earlier example of using GET to modify user information.
    The purpose of the safe/unsafe distinction is to let web crawlers and caches do their work without fear of unintended consequences from calling or prefetching unsafe methods. The user agent (browser) should distinguish between safe and unsafe methods and warn the user accordingly.
  2. Idempotent
    Idempotent means that executing the same request multiple times has exactly the same effect as executing it once. According to the RFC, PUT, DELETE and the safe methods are all idempotent. Again, this is only a specification, and there is no guarantee that a server implementation is idempotent.
    Idempotency matters mainly when the same request may be sent more than once, for example when the connection is lost before the response arrives. If the method is idempotent, the request can safely be resent. This is also why the browser prompts the user when going back to or refreshing a page produced by a POST: POST is not idempotent, and repeating the request may have unintended consequences.
  3. Cacheable
    As the name suggests, this is whether the response to a method may be cached. In this RFC, GET, HEAD and, in some cases, POST are cacheable, but most cache implementations only support GET and HEAD. For more information on caching, see RFC 7234.
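The three properties above can be collected into a small lookup table. The values follow the method definitions described in this section (POST is marked cacheable because the RFC allows it in some cases); the names `MethodProperties`, `METHODS`, and `can_retry_automatically` are hypothetical, and the table describes the specification, not what any given server actually does.

```python
from typing import NamedTuple


class MethodProperties(NamedTuple):
    safe: bool
    idempotent: bool
    cacheable: bool


# Properties as defined by the specification, not by any implementation.
METHODS = {
    "GET": MethodProperties(safe=True, idempotent=True, cacheable=True),
    "HEAD": MethodProperties(safe=True, idempotent=True, cacheable=True),
    # POST responses are cacheable only in some cases; most caches skip them.
    "POST": MethodProperties(safe=False, idempotent=False, cacheable=True),
    "PUT": MethodProperties(safe=False, idempotent=True, cacheable=False),
    "DELETE": MethodProperties(safe=False, idempotent=True, cacheable=False),
    "OPTIONS": MethodProperties(safe=True, idempotent=True, cacheable=False),
    "TRACE": MethodProperties(safe=True, idempotent=True, cacheable=False),
}


def can_retry_automatically(method):
    """A client may resend a request without asking only if it is idempotent."""
    return METHODS[method].idempotent


for name, props in METHODS.items():
    print(f"{name:8} safe={props.safe} idempotent={props.idempotent} "
          f"cacheable={props.cacheable}")
```

A check such as `can_retry_automatically("POST")` returning False is the same reasoning a browser applies when it asks for confirmation before resubmitting a form.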

All three properties come with the same caveat: the protocol is not the implementation. A method that the protocol defines as safe is not necessarily safe in an implementation, one defined as idempotent is not necessarily idempotent in an implementation, and one defined as cacheable is not necessarily cacheable in an implementation. This is exactly the relationship between specification and implementation described by the Zhihu author above.

The battle of semantics

At this point I understand the difference between the two methods: it is essentially a comparison of semantics rather than syntax, and of specification rather than implementation.

As for the semantics of the two methods, the original text of RFC 7231 puts it very well:

The GET method requests transfer of a current selected representation for the target resource. GET is the primary mechanism of information retrieval and the focus of almost all performance optimizations. Hence, when people speak of retrieving some identifiable information via HTTP, they are generally referring to making a GET request.
A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request.

The POST method requests that the target resource process the representation enclosed in the request according to the resource's own specific semantics.

Here is my rough paraphrase, with some of my own understanding added:

The semantics of GET is to request the specified resource. The GET method is safe, idempotent, and cacheable (unless constrained by a Cache-Control header), and the body of a GET request has no defined semantics.

The semantics of POST is to process the specified resource according to the request payload (message body), and the specific processing varies with the type of resource. POST is not safe, not idempotent, and (in most implementations) not cacheable. There are a number of optimization techniques that address its lack of cacheability, and I will study them when I get the chance (consider that a promise).

Let's take a common example. On Weibo, GET semantics apply to scenarios such as "show the latest 20 posts on my timeline", while POST semantics apply to scenarios such as publishing a post, commenting, and liking.
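The timeline example can be sketched with a toy in-memory resource. The `Timeline` class and its `get` and `post` methods are hypothetical stand-ins for server-side handlers, assumed to respect the semantics of their methods; nothing here touches the network.

```python
class Timeline:
    """Toy in-memory resource standing in for a user's timeline."""

    def __init__(self):
        self.posts = []

    def get(self, count=20):
        # GET semantics: return a representation and change nothing.
        start = max(0, len(self.posts) - count)
        return self.posts[start:]

    def post(self, text):
        # POST semantics: process the payload; here that means adding a post.
        self.posts.append(text)
        return len(self.posts)


timeline = Timeline()
timeline.post("hello")

# Repeating GET is harmless: the state is the same after each call.
first = timeline.get()
second = timeline.get()
print(first == second, len(timeline.posts))

# Repeating POST is not: the same payload sent twice creates two posts.
timeline.post("duplicate?")
timeline.post("duplicate?")
print(timeline.get())
```

This is why an accidental resend matters for "publishing a post" but not for "show the latest 20 posts".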

Summary

This article started from the common understanding, went through one challenge and then a counter-challenge, and after these twists and turns finally deepened my understanding of the HTTP methods by reading the RFC itself. It was a pleasant journey of exploration. My biggest takeaway: don't simply repeat what others say. Think independently, don't settle for second-hand knowledge, and trace things back to the source.


Origin http://43.154.161.224:23101/article/api/json?id=325012988&siteId=291194637