The Difference Between GET and POST (1)

This question looks elementary, but it actually touches many aspects of HTTP, which is one reason interviewers love to ask it.

HTTP was first used as the communication protocol between browsers and servers, for HTML pages and forms; it was later extended into a general-purpose format for defining interfaces. So before discussing the difference between GET and POST, first decide which scenario you mean: GET/POST as used by the browser, or GET/POST in interfaces (APIs) that use HTTP as the transport protocol.

GET and POST in the Browser

This refers specifically to the browser's non-Ajax HTTP requests, that is, the GET/POST usage that HTML and browsers have had since HTTP was born. The browser uses GET to fetch a page's HTML, images, CSS, JS and other resources, and uses POST to submit a <form> and get a result page back.

In the browser, GET and POST are defined as follows:

GET

"Read" a resource. For example, GET an HTML file. Reading the same resource repeatedly should have no side effects on the data being accessed. For example, "a GET that places an order for the user and returns 'order accepted'" is unacceptable. Having no side effects is called being "idempotent". (Strictly speaking, HTTP calls a method without side effects "safe"; a safe method is also idempotent.)

Because GET is a read, GET responses can be cached. The cache can live in the browser itself (avoiding the request entirely), on a proxy (e.g. Nginx), or on the server side (with an ETag, which at least reduces bandwidth consumption).

POST

A <form> tag in the page defines a form. Clicking its submit element issues a POST request (when the form uses method="post") that asks the server to do something. That something often has side effects and is not idempotent.

Not idempotent means the request cannot be executed an arbitrary number of times, and therefore it cannot be cached. For example, placing an order with a POST makes the server create a new order and return an "order placed successfully" page. That page cannot be cached. Imagine if the browser cached POST requests: the next order request would never reach the server and would simply return the locally cached "order placed successfully" page, while no order was actually created on the server. How absurd would that be.

Because a POST may have side effects, browsers do not let you save a POST request as a bookmark. Think about it: if clicking a bookmark placed an order, wouldn't that be terrible?

Also, if you try to re-execute a POST request (for example by refreshing the page), the browser pops up a dialog warning that the refresh may cause side effects and asks whether to continue.

(Figure: the dialog Chrome shows when you try to resubmit a form.)

Of course, server developers can implement GET with side effects, or POST without side effects. But this does not match what the browser expects. Implementing GET with side effects is a terrible thing. I vaguely remember that, long ago, Baidu Tieba had a security vulnerability caused by administrator permissions being modifiable through a GET request. Conversely, a POST without side effects still triggers the browser's resubmit dialog, so it does little to improve the user experience.

As we will see later, when POST is used for HTTP interfaces there is no such dialog. There, implementing POST idempotently has real practical value. An idempotent POST makes many front-end/back-end interactions smoother and guards against duplicate submissions caused by front-end bugs or accidental double taps. To make an operation with side effects idempotent, you must be able to define, from the business side, what counts as a "duplicate". You can add a dedupKey to the submitted data that is valid within one transaction session, or use a field of the submitted data that naturally serves as a dedupKey. Then, if a user forces a resubmit, the server can protect itself.
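
The dedupKey idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the OrderService class, the place_order method, the key format and the in-memory dict are all hypothetical names invented here; a real server would keep the keys in a database or cache and expire them with the session.

```python
from dataclasses import dataclass, field


@dataclass
class OrderService:
    """Hypothetical in-memory stand-in for an order endpoint."""
    orders_by_key: dict = field(default_factory=dict)
    next_id: int = 1

    def place_order(self, dedup_key: str, item: str) -> dict:
        # A repeated dedup_key means "the same submission again":
        # return the stored result instead of creating a second order.
        if dedup_key in self.orders_by_key:
            return self.orders_by_key[dedup_key]
        order = {"orderId": self.next_id, "item": item}
        self.next_id += 1
        self.orders_by_key[dedup_key] = order
        return order


service = OrderService()
first = service.place_order("session-42:cart-7", "book")
retry = service.place_order("session-42:cart-7", "book")  # duplicate submit
other = service.place_order("session-42:cart-8", "pen")   # a different order
```

Here the retry returns the order created by the first call, so a double tap on the submit button creates one order rather than two.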

GET and POST also carry data differently. When the browser sends a GET request, it means that the user either typed a URL into the address bar or clicked an <a> tag with an href in the HTML. So it is not that GET can only use a URL; rather, a GET issued directly by the browser can only be triggered by a URL. Such a GET has no way to carry parameters outside the URL and can only attach them as the URL's querystring. The HTTP protocol itself has no such restriction.

A browser POST request comes from a form submission. On each submission, the browser encodes the form data into the body of the HTTP request. The body of a browser-issued POST comes in two main formats. One is application/x-www-form-urlencoded, used for simple data, roughly in the form "key1=value1&key2=value2". The other is multipart/form-data, used for file transfer, because application/x-www-form-urlencoded is very inefficient at encoding binary data such as files.
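
As a rough illustration of the first format, Python's standard urllib.parse module can produce and parse a form-encoded body. The field names and values are made up for the example; the last line hints at why this encoding is a poor fit for binary data, since most bytes expand to three characters.

```python
from urllib.parse import parse_qs, quote_from_bytes, urlencode

# Hypothetical form fields; the second value contains a space and an "&".
form_fields = {"key1": "value1", "key2": "a b&c"}

body = urlencode(form_fields)   # "key1=value1&key2=a+b%26c"
decoded = parse_qs(body)        # each value comes back as a list of strings

# Percent-encoding all 256 possible byte values shows the size blow-up
# that makes this format inefficient for file contents.
encoded_binary = quote_from_bytes(bytes(range(256)))
```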

When the browser POSTs a form, it can also carry URL parameters: just put a querystring in the url of <form action="url">. Only the data generated by user input through <input> tags and the like goes into the body.

So we usually say, in general terms, "a GET request has no body, only a URL, and its data goes in the URL's querystring; a POST request's data goes in the body." But this holds only for requests issued by the browser.

GET and POST in Interfaces

This refers to GET and POST requests issued through the browser's Ajax API, through HTTP clients in iOS/Android apps, through Java's commons-httpclient or okhttp, or through tools like curl and Postman. Here GET/POST can be used not only between front end and back end, but also for calls between back-end services (that is, as a kind of RPC protocol). There are many RPC protocols, such as Thrift and gRPC, but HTTP has a large number of ready-made tools, is human-friendly, and is easy to debug. Using HTTP between microservices is quite common.

When HTTP requests are sent from an interface client, the browser's restrictions no longer apply; anything that conforms to the HTTP format can be sent. An HTTP request is roughly a string like the following (for readability, I break the line after every \r\n):

<METHOD> <URL> HTTP/1.1\r\n
<Header1>: <HeaderValue1>\r\n
<Header2>: <HeaderValue2>\r\n
...
<HeaderN>: <HeaderValueN>\r\n
\r\n
<Body Data....>

Here "<METHOD>" can be GET, POST, or another HTTP method such as PUT, DELETE, OPTIONS, and so on. Looking at the protocol itself, nothing says a GET must not have a body, or that a POST cannot put parameters in the querystring of <URL>. So the format can be used much more freely. For example, Elasticsearch's _search API uses GET with a body; you can also design your own interface so that half of a POST's parameters go in the URL's querystring and the other half in the body; you can even put all the parameters in headers. All kinds of customization are possible, as long as the client and the server agree.
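
To make the point concrete, here is a minimal Python sketch that serializes a request in the layout shown above. The build_request helper and the example paths are hypothetical, and the code only builds bytes; it does not open a connection. It shows that, at the format level, nothing stops a GET from carrying a body or a POST from carrying a querystring.

```python
def build_request(method: str, url: str, headers: dict, body: bytes = b"") -> bytes:
    """Serialize an HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {url} HTTP/1.1"]
    if body:
        # Tell the receiver how many body bytes follow the blank line.
        headers = {**headers, "Content-Length": str(len(body))}
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


# A GET that carries a JSON body (hypothetical path and payload).
get_with_body = build_request(
    "GET",
    "/books/_search",
    {"Host": "foo.com", "Content-Type": "application/json"},
    b'{"query": "http"}',
)

# A POST that also carries a parameter in the querystring.
post_with_query = build_request(
    "POST", "/books?source=import", {"Host": "foo.com"}, b"title=x"
)
```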

Of course, too much freedom brings another problem: developers have to discuss, every time, whether each parameter goes in the URL path, the querystring, the body, or a header, which is far too inefficient. So a series of interface specifications and styles appeared, of which REST is undoubtedly the best known. REST makes full use of GET, POST, PUT and DELETE, agreeing that these four methods respectively retrieve, create, replace and delete a "resource". REST best practice also recommends JSON in the request body. So just by looking at the HTTP method you can understand what an interface means, and the parsing format is unified.

Compared with x-www-form-urlencoded, JSON has two advantages: 1) it can express nested structures; 2) it supports richer data types. With a framework, JSON can be mapped directly to business entities in server code, which is very convenient. But for an interface that supports file upload, multipart/form-data is still more appropriate.

In REST, GET and POST are not used arbitrarily. [GET] + [resource locator] is dedicated to retrieving a resource or a list of resources, for example:

GET http://foo.com/books          Get the list of books
GET http://foo.com/books/:bookId  Get one specific book by bookId

As in the browser scenario, a REST GET should have no side effects, so it can be called repeatedly without a second thought. Browsers (including browser Ajax requests) can cache such GETs, if the server explicitly indicates that caching is allowed; with a non-browser client, whether there is a cache depends on how the client is implemented. Of course, from the perspective of the whole app, it can bypass the browser's caching mechanism entirely and implement its own business-specific caching framework.

(Figure: the cache control class in okhttp.)

[POST] + [REST resource locator] is used to "create a resource", such as:

POST http://foo.com/books
{
  "title": "大宽宽的碎碎念",
  "author": "大宽宽",
  ...
}

Notice here that POST as used by the browser for form submission and POST as used in REST to create a resource differ in semantics.

Incidentally, a word on the difference between POST and PUT in REST. Some APIs use the PUT method to create a resource. The difference between PUT and POST is that the actual semantics of PUT is "replace". The REST convention says that a PUT request body should be the complete resource, including its id. For example, the API above for creating a book could also be defined as:
PUT http://foo.com/books
{
  "id": "BOOK:affe001bbe0556a",
  "title": "大宽宽的碎碎念",
  "author": "大宽宽",
  ...
}
The server should first look up the id in the request. If a resource with that id exists, the server replaces the existing resource as a whole with the request data; if not, it "replaces the resource for that id from [empty] to [request data]". Intuitively, that amounts to "creating" it.

Compared with PUT, POST is more like a "factory" that creates a complete resource from a set of necessary data.
As for whether to create a resource with PUT or with POST, it depends on whether all of the resource's data (especially the id) is known in advance, and whether the operation is a complete replacement. For an object storage service such as AWS S3, when you upload a new resource its id, the "ObjectName", is known ahead of time; and the API always replaces the entire resource. PUT semantics fit that API better; where the id is generated automatically by the server, POST is more appropriate.
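
The contrast can be sketched with an in-memory store in Python. Everything here is hypothetical illustration: the books dict stands in for real storage, and post_book and put_book are invented names for the handlers behind POST /books and PUT. The point is only that PUT with a client-supplied id replaces or creates the same resource each time, while POST lets the server mint a new id on every call.

```python
import uuid

books: dict = {}  # hypothetical storage, keyed by resource id


def post_book(data: dict) -> dict:
    """POST: the server generates the id, so every call creates a new resource."""
    book = {**data, "id": f"BOOK:{uuid.uuid4().hex}"}
    books[book["id"]] = book
    return book


def put_book(book: dict) -> dict:
    """PUT: the body is the complete resource, id included; replace what is stored."""
    books[book["id"]] = dict(book)
    return books[book["id"]]


put_book({"id": "BOOK:affe001bbe0556a", "title": "draft", "author": "a"})  # creates
put_book({"id": "BOOK:affe001bbe0556a", "title": "final", "author": "a"})  # replaces
created_1 = post_book({"title": "final", "author": "a"})
created_2 = post_book({"title": "final", "author": "a"})  # a second, distinct resource
```

Repeating the PUT leaves one resource behind, which is why PUT is naturally idempotent, whereas repeating the POST leaves two.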

That was a bit of a digression; let's stop there.

(Figure: the AWS S3 API description for creating an Object.)

Back to the topic of interfaces. The above is only a rough description of REST. In practice there are always variants of REST, and non-REST protocols (such as JSON-RPC or SOAP) may be used as well; GET and POST are used differently in each case.

About Security

We often hear that POST is more secure than GET, because POST transmits data in the body while GET transmits it in the URL, where it is easier to see. But from an attacker's point of view, neither GET nor POST is secure enough, because HTTP itself is a plaintext protocol. Every byte of every HTTP request and response travels over the network in plaintext, whether it is in the URL, a header, or the body. This is not a matter of "being easy to see in the browser address bar".

To keep data from being stolen in transit, it must be encrypted end to end, from client to server. The common industry practice is HTTPS, that is, encrypting the plaintext HTTP data with a key negotiated through SSL/TLS. This encryption layer is independent of the HTTP protocol itself. If you are developing a site or app that uses HTTP over the public network, HTTPS is the most basic security requirement.

Of course, end-to-end encryption does not have to mean HTTPS. For example, the financial sector in China uses private networks, and there is also the SM series of encryption algorithms in the GB national standards. But outside special settings such as the military and finance, there seems to be no need to invent another SSL-like protocol.

Back to HTTP itself: GET requests do tend to put parameters in the URL, so there are more chances of leaking them. For instance, a URL carrying private information is shown in the address bar and can be shared with third parties, which is very unsafe. Furthermore, between client and server there are many intermediate nodes, including gateways and proxies. Their access logs usually record the full URL; nginx's default access log does. Sensitive data carried in the URL will be recorded. Note, however, that private data in the body can be recorded too, so if the request travels through a public network you do not trust, the only way to avoid leaks is HTTPS. The point about "avoiding leaks through access logs" only concerns potential safety problems caused by the default behavior of HTTP proxies inside the trusted zone. For example, you would hardly want your company's operations staff to see users' passwords in the logs of the company's main gateway.

Furthermore, as mentioned above, when HTTP is used as an interface, a GET may actually carry a body and a POST may carry data in the URL. So how to transmit private data really depends on the specific scenario. That said, in the vast majority of scenarios, putting private data in a POST body is a reasonable choice. A typical example is "login":

POST http://foo.com/user/login
{
  "username": "dakuankuan",
  "password": "12345678"
}

Security is a huge topic, a complete system made up of many details, such as masking confidential data in responses, XSS, CSRF, cross-origin security, front-end encryption, phishing, salting, and so on. POST versus GET plays only a small role in it. So discussing in isolation which of POST and GET is more secure does not make much sense. Just remember that, in general, private data should be sent with POST + body.

About Encoding

There is a common claim that GET parameters can only be ASCII, while POST supports arbitrary binary data, including Chinese. But as seen above, both GET and POST can use both the URL and the body. So the real questions about encoding are: what encoding does an HTTP URL use, and what encoding does the body use?

First, the URL. The claim that a URL can only contain ASCII dates back to RFC 1738:

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.

In fact this specifies only a subset of ASCII, [a-zA-Z0-9$-_.+!*'(),], whose characters may appear "unencoded" in a URL. For example, although the space is an ASCII character, it cannot be used directly in a URL.

So what is this "encoding"? What about special symbols and Chinese? An encoding method called percent-encoding is used for this:

https://en.wikipedia.org/wiki/Percent-encoding

That is why we occasionally see long runs of % followed by hexadecimal digits in URLs.

With percent-encoding, even binary data can be encoded into a URL.

Note in particular, however, that this encoding only converts characters into ones usable in a URL; it says nothing about the character set encoding (for example, whether Chinese is encoded as UTF-8 or GBK). In the early days this was quite chaotic, with no unified standard. Sometimes the encoding followed the web page, sometimes the operating system. Worst of all, the browser's address bar is outside the developer's control. So for the same URL containing Chinese, some browsers would use GBK (such as the old IE8) and some UTF-8 (such as Chrome), and the back end could not tell which. The common workaround was to keep users from typing such Chinese URLs: if the request needs that input, have the user enter it in the page UI and send it via Ajax, since the developer has 100% control over the encoding of what Ajax sends.

But by now things have basically unified on UTF-8. Unless national regulations require a GB-series encoding, developers today will rarely run into this problem.
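
Both points, percent-encoding itself and the character-set ambiguity behind it, can be checked with Python's standard urllib.parse module. This is a small sketch; the sample strings are arbitrary.

```python
from urllib.parse import quote, quote_from_bytes, unquote

# A space is ASCII but still has to be encoded inside a URL.
space = quote("a b")                      # "a%20b"

# Arbitrary binary data can be percent-encoded as well.
binary = quote_from_bytes(b"\x00\xff")    # "%00%FF"

# The same Chinese character yields different URLs under different charsets.
utf8_url = quote("中", encoding="utf-8")   # "%E4%B8%AD"
gbk_url = quote("中", encoding="gbk")      # "%D6%D0"

# A back end that guesses the wrong charset cannot recover the text.
wrong = unquote(gbk_url, encoding="utf-8", errors="replace")
right = unquote(gbk_url, encoding="gbk")
```

Percent-encoding only describes how bytes are written into the URL; which bytes a character turns into is decided separately by the character set, which is exactly where old browsers disagreed.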

Ruan Yifeng has an article that explains URL encoding in more detail:

About URL encoding - Ruan Yifeng's blog, www.ruanyifeng.com

Incidentally, although you can see Chinese in the browser's address bar, when the request is actually sent the browser converts the Chinese into a real URL, using a character encoding plus percent-encoding, before sending it to the server. Showing Chinese in the address bar is only for a better user experience.

Next, the body. The HTTP body is in better shape, because Content-Type defines things more precisely. For example:

POST xxxxxx HTTP/1.1
...
Content-Type: application/x-www-form-urlencoded ; charset=UTF-8

The Content-Type here defines both the format of the request body (application/x-www-form-urlencoded) and its character encoding (UTF-8).

So both the body and the URL can deliver Chinese data to the back end, but POST is better standardized and less error-prone, which gives developers peace of mind. As for GET + URL, as long as it does not involve typing URLs into the address bar of an old browser, it is not much of a problem either.

Back to POST: a POST issued directly by the browser is a form submission, and forms submit only two formats: application/x-www-form-urlencoded, for simple key-value data, and multipart/form-data, for file submission or for mixing files with key-value data.

If Ajax or another HTTP client issues the POST, the body format is very free. JSON, XML, plain text and CSV are common, and you can even invent your own format, as long as front end and back end agree.

Does a Browser POST Need to Send Two Requests?

The "HTTP format" above clearly shows that an HTTP request can be roughly divided into two parts, the "request header" and the "request body". By convention, all "control" information goes in the request header and the actual data goes in the body. So when parsing, the server always fully parses the request header first. That way the server learns the control information first and can decide how to proceed: reject the request, invoke the data parser that matches the Content-Type, or forward it directly with zero copy.

For example, when writing request-handling code for a Java service, you always get parameters, headers and the URL from HttpServletRequest (getParameter, getHeader, and so on). The framework parses this information in advance. For the request body, it provides only an InputStream; developers who need more read and parse the body themselves. This reflects how the server handles the request header and the request body differently.

As a practical example, consider a file upload service where the request URL contains the file name and the request body is a compressed binary stream several megabytes in size. After receiving the request, the server can first look at the request header to check whether the user has permission to upload and whether the file name conforms to the rules. If not, it does not process the request body and simply discards it, instead of waiting until the whole body has been received before rejecting the request.

To optimize further, the client can use HTTP's Continue mechanism (the Expect: 100-continue request header): the client first sends only the request headers to the server for checking. If they pass, the server replies "100 Continue" and the client then sends the rest of the data. If the request is rejected, the server returns an error such as 400 and the interaction ends. This avoids wasting bandwidth on transferring the request body, at the cost of one more round trip. If the request body is small, sending everything at once may actually be better.

Based on this, clients can optimize: for example, when the data in a POST exceeds 1KB, send only the "request header" first; otherwise send everything in one go. A client can even use adaptive strategies, such as tracking the success rate and always sending everything at once when it is high. Different browsers and different clients (curl, Postman) can each have their own approach. Whatever they do, the optimization is always a trade-off between improving data throughput and reducing wasted bandwidth.
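
A client-side policy of this kind might look like the following Python sketch. It is an assumption-laden illustration, not how any particular client is implemented: plan_post and EXPECT_THRESHOLD_BYTES are hypothetical names, the 1KB cutoff is simply the figure used in the paragraph above, and the function only decides what to send without performing any network I/O.

```python
EXPECT_THRESHOLD_BYTES = 1024  # the 1KB cutoff from the text; an illustrative choice


def plan_post(body: bytes, threshold: int = EXPECT_THRESHOLD_BYTES) -> dict:
    """Decide whether to send headers first or headers and body together."""
    headers = {"Content-Length": str(len(body))}
    if len(body) > threshold:
        # Large body: ask the server to approve the headers before we upload.
        headers["Expect"] = "100-continue"
        return {"headers": headers, "send_body_immediately": False}
    # Small body: an extra round trip would cost more than it saves.
    return {"headers": headers, "send_body_immediately": True}


small = plan_post(b"x" * 100)
large = plan_post(b"x" * 4096)
```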

So whether to send once or N times is something the client can decide flexibly. Since every such approach conforms to the HTTP protocol, this optimization should be treated as an implementation detail, not as a difference between GET and POST themselves, and certainly not as some discovery of the century.

What Exactly Counts as the Request Body

After reading the above, readers may be puzzled about what the "request body" really is. For example, does an x-www-form-urlencoded body count as the "request body"?

From the perspective of the HTTP protocol, the "request header" is Method + URL (including querystring) + Headers; what comes after that is the request body.

But from a business perspective, think of a request as a single function call. For example, the request above

POST http://foo.com/books
{
  "title": "大宽宽的碎碎念",
  "author": "大宽宽",
  ...
}

is roughly equivalent to this in Java:

createBook("大宽宽的碎碎念", "大宽宽");

A function name followed by two arguments can be regarded as one request, with no distinction between header and body, even though over HTTP the title and author are encoded into the HTTP request body. Java's HttpServletRequest exposes x-www-form-urlencoded data through the getParameter method, whose meaning is exactly the "parameters" of the "request".

HTTP, however, needs to distinguish [header] from [body]; both HTTP requests and HTTP responses are divided this way. HTTP does so mainly for these purposes:

  • For HTTP proxies
    • Supporting forwarding rules: nginx, for example, first parses the request header and decides what to do based on the URL and headers (forward with proxy_pass, redirect, rewrite, ...)
    • Logging: the request header information needs to be recorded. The request body can be recorded too, but generally only part of the request header is logged.
    • If the proxy rules do not involve the request body, the body does not need to be copied from the kernel-mode page cache to user mode and can be forwarded directly with zero copy. This is extremely effective for file uploads.
    • ……
  • For HTTP servers
    • Access control (ACL) can be done from the request header, for example checking the Authorization header first to see whether the credentials allow the request
    • Some requests can be blocked early: if Content-Length is too large, the Content-Type is unsupported, or the format required by Accept cannot be produced, the server can fail the request right away.
    • For a large body, using a stream API makes it easy to process the data piece by piece, instead of reading it all into memory at once.
    • ……
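
The server-side checks in the list above can be sketched as a function that looks only at headers and never touches the body. This is a hypothetical illustration: precheck, MAX_BODY_BYTES and SUPPORTED_TYPES are invented names and limits, and a return value of 100 is used here merely to mean "go on and read the body".

```python
MAX_BODY_BYTES = 10 * 1024 * 1024  # hypothetical upload limit
SUPPORTED_TYPES = {"application/json", "multipart/form-data"}  # hypothetical


def precheck(headers: dict) -> int:
    """Return an HTTP status decided from the headers alone."""
    if "Authorization" not in headers:
        return 401  # no credentials: reject before reading any body bytes
    if int(headers.get("Content-Length", "0")) > MAX_BODY_BYTES:
        return 413  # declared body is too large
    # Drop parameters such as "; charset=UTF-8" before comparing.
    content_type = headers.get("Content-Type", "").split(";")[0].strip().lower()
    if content_type not in SUPPORTED_TYPES:
        return 415  # body format we cannot parse
    return 100      # headers look fine; continue with the body


ok = precheck({
    "Authorization": "Bearer token",
    "Content-Length": "2048",
    "Content-Type": "application/json; charset=UTF-8",
})
```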

But from a high-level business perspective, what we really care about is [request] and [response]. When we say "request header", we might actually mean the [request]. A [request] implemented over HTTP may use only the [HTTP request header] (as most GET requests do), or the [HTTP request header] + [HTTP request body] (as when placing an order with POST).

In short, do not mix up the two layers.

About the length of the URL

As mentioned above, both GET and POST can use the URL to transfer data, so the common saying "GET data has a size limit" actually refers to a "URL length limit".

The HTTP protocol itself places no limit on URL length. The actual limits are set by clients (including browsers) and servers.

Let's start with browsers, which differ from one another. The often-quoted limit of 2048 characters is in fact IE8's limit, and the original document actually says "the maximum length of a URL is 2083 characters, and the path portion is at most 2048 characters." See https://support.microsoft.com/en-us/help/208427/maximum-url-length-is-2-083-characters-in-internet-explorer . I have not found clear documentation of the URL limit for IE versions after IE8, but some sources say the IE 11 address bar accepts only 2047 characters of input while still letting users click long URLs in HTML. I have not tested this; those who are interested can try.

Chrome's URL limit is 2MB, see https://chromium.googlesource.com/chromium/src/+/master/docs/security/url_display_guidelines/url_display_guidelines.md

Safari, Firefox and other browsers have their own limits too, all much larger than IE's; I won't list them one by one.

However, Microsoft's newer browser (Edge) has switched to Chrome's engine, which means the claim "browser URLs are limited to 2048 characters" will gradually become history.

Other clients, such as most HTTP clients for Java and JS, do not limit the maximum URL length either.

Besides browsers, servers have limits as well, such as Apache's LimitRequestLine directive.

Apache in effect limits the length of the first line of the HTTP request, the "Request Line", i.e. the <METHOD> <URL> <VERSION> line.

As another example, nginx uses the large_client_header_buffers directive to allocate buffers for long request header data. These buffers are used for the URL, header values, and so on.

Tomcat's limit is the maxHttpHeaderSize attribute (set on the Connector in server.xml), which controls the total length of the entire "request header".

Why have a limit at all? If you have written string-parsing code you will understand: parsing requires allocating memory. To parse a byte stream, a buffer must be allocated to hold the data being processed. And something like a URL must be viewed as a whole rather than processed piece by piece, so handling a request means allocating a block of memory large enough for it. If URLs are too long and concurrency is high, the server's memory is easily exhausted. At the same time, long URLs bring little benefit; I have only used them when dealing with old systems, where nobody dared touch the original URL-based logic and more data had to be added.

For developers, using long URLs is digging your own grave: you have to take into account the configuration of the front end, the back end, and every intermediate proxy. In addition, long URLs affect search engine crawlers; some crawlers cannot even handle URLs over 2000 bytes, which means the URL cannot be indexed. What a trap.

In fact there is little need to pin down the exact maximum URL length. My personal rule of thumb is: if the URL of a resource or API under development could plausibly exceed 2000 bytes, transmit the data in the body, unless there are special circumstances. Whether that is GET + body or POST + body can be decided case by case.

Note that one Chinese character, after UTF-8 encoding plus percent-encoding, becomes 9 bytes. Do not miscalculate.
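
The arithmetic is easy to verify in Python: a common Chinese character takes 3 bytes in UTF-8, and percent-encoding turns each byte into 3 characters. The encoded_length helper below is a hypothetical convenience for estimating how long a value becomes once it is placed in a URL.

```python
from urllib.parse import quote

char = "中"
utf8_len = len(char.encode("utf-8"))  # 3 bytes in UTF-8
encoded = quote(char)                 # "%E4%B8%AD"
url_len = len(encoded)                # 3 bytes x 3 characters each = 9


def encoded_length(text: str) -> int:
    """Length of text after UTF-8 plus percent-encoding (nothing left unescaped)."""
    return len(quote(text, safe=""))


two_chars = encoded_length("书名")     # 18
ascii_only = encoded_length("abc")    # 3, plain letters are not expanded
```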

Summary

We have covered a lot above, in the hope that readers will not memorize the difference between GET and POST by rote, but will look at and think about this question at a broader level.

Finally, protocols are made by people. As long as client and server understand each other, things work. Under normal circumstances, implementing a system in a standards-compliant way saves a lot of work: everyone has already agreed, so there is no need to reinvent things. However, there will always be cases where the conventional norms do not fit the requirements. Then your thinking should not be boxed in by the specification; do not cling to the letter of the RFC. The specifications may not address the particular problem you face. For example:

  • Elasticsearch's _search interface uses GET but expresses the query in the body: the query is complex and awkward to express as a querystring, while JSON in the request body is comfortable to use and avoids wrestling with percent-encoding.
  • You may also want idempotency when writing an order-placing interface that uses POST, because the front-end "place order" button may have a bug that makes one click fire N requests. You cannot dismiss the issue on the grounds that POST, by design, is not required to be idempotent.

Protocols are dead; people are alive. Use the tools at hand flexibly to solve the practical problems you actually meet, and that is enough.

Origin blog.csdn.net/Alen_xiaoxin/article/details/105160175