HTTP request/response header structure

HTTP request

An HTTP request consists of four parts: the request line, the request headers, a blank line, and the request data.
- Request line
The request line consists of three fields: the request method, the URL, and the HTTP protocol version, separated by spaces. For example: GET /data/info.html HTTP/1.1
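
The three fields of the request line can be separated by splitting on spaces. The following is a minimal sketch, not a full parser; the function name parse_request_line is hypothetical and only the example line from the text is used.

```python
def parse_request_line(line: str) -> tuple[str, str, str]:
    """Split a request line into (method, target, version)."""
    parts = line.strip().split(" ")
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {line!r}")
    method, target, version = parts
    if not version.startswith("HTTP/"):
        raise ValueError(f"unexpected protocol version: {version!r}")
    return method, target, version


print(parse_request_line("GET /data/info.html HTTP/1.1"))
```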

The method field names the HTTP request method, most commonly GET or POST.

Two protocol versions are discussed here, HTTP/1.0 and HTTP/1.1. They differ as follows:

By default, HTTP/1.0 carries a single request and response per connection, after which the connection is closed, and it does not require a Host header. HTTP/1.1 keeps the connection open by default, so several requests and responses can share one connection, and requests can be pipelined (sent without waiting for earlier responses). HTTP/1.1 requests must also include the Host header.

- The request header
An HTTP client (such as a browser) must specify the request method (usually GET or POST) when sending a request to the server. The client may also send other request headers if necessary. Most request headers are optional. The main exceptions are Host, which HTTP/1.1 requires, and Content-Length, which a request with a body (such as POST) needs unless it uses chunked transfer encoding.

Common request header field meanings:

Accept: The MIME types accepted by the browser.

Accept-Charset: The character set accepted by the browser.

Accept-Encoding: The data encoding method that the browser can decode, such as gzip. Servlets can return gzip-encoded HTML pages to browsers that support gzip. In many cases this can reduce download times by a factor of 5 to 10.

Accept-Language: The type of language the browser expects, used when the server can provide more than one language version.

Authorization: The client's credentials, usually sent in reply to a WWW-Authenticate header from the server.

Content-Length: Indicates the length of the request message body.

Host: The client uses this header to tell the server which hostname it wants to access. The Host header field specifies the Internet host and port number of the requested resource, and must indicate the location of the origin server or gateway given by the request URL. HTTP/1.1 requests must include the Host header field; otherwise the server responds with a 400 (Bad Request) status code.

If-Modified-Since: The client uses this header to tell the server the time of its cached copy of the resource. The server returns the content only if it has been modified after the specified time; otherwise it returns a 304 "Not Modified" response.

Referer: The client uses this header to tell the server which page it came from, that is, the URL of the page from which the user reached the currently requested page. Servers often use it for hotlink protection (anti-leech).

User-Agent: Identifies the client software making the request, such as the browser type. It is useful if the content returned by the servlet depends on the browser type.

Cookie: The client uses this header to send previously stored cookie data back to the server. It is one of the most important request headers.

Pragma: Specifying the "no-cache" value indicates that the server must return a refreshed document, even if it is a proxy server and already has a local copy of the page.

From: The email address of the sender of the request, used by some special Web client programs, and not used by browsers.

Connection: Whether to disconnect or keep the connection after processing this request. If the servlet sees the value here as "Keep-Alive", or sees that the request is using HTTP 1.1 (HTTP 1.1 defaults to persistent connections), it can take advantage of persistent connections when the page contains multiple elements (e.g. Applet, picture), significantly reducing the time required for downloading. To achieve this, the servlet needs to send a Content-Length header in the response. The easiest way to do this is to first write the content to a ByteArrayOutputStream, and then calculate its size before actually writing the content out.

Range: The Range header field can request one or more subranges of the entity. For example:

Indicates the first 500 bytes: bytes=0-499

Indicates the second 500 bytes: bytes=500-999

Indicates the last 500 bytes: bytes=-500

Indicates everything from byte offset 500 to the end: bytes=500-

First and last bytes: bytes=0-0,-1

Specify several ranges at the same time: bytes=500-600,601-999

The server MAY ignore this header. If the server honors a Range header on an unconditional GET, the response is returned with status code 206 (Partial Content) instead of 200 (OK).
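
The range forms listed above can be resolved into concrete byte offsets once the entity size is known. The sketch below is illustrative only: resolve_ranges is a hypothetical helper, it handles only the bytes unit, and it silently skips ranges that cannot be satisfied instead of producing an error response.

```python
def resolve_ranges(header: str, size: int) -> list[tuple[int, int]]:
    """Turn a Range header value into inclusive (first, last) byte offsets."""
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes":
        raise ValueError("only the 'bytes' unit is handled in this sketch")
    result = []
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        if first == "":
            # Suffix form such as "-500": the last N bytes.
            start, end = max(size - int(last), 0), size - 1
        elif last == "":
            # Open-ended form such as "500-": from the offset to the end.
            start, end = int(first), size - 1
        else:
            start, end = int(first), min(int(last), size - 1)
        if start <= end:
            result.append((start, end))
    return result


# Assumed entity size of 1234 bytes, chosen only for illustration.
for value in ("bytes=0-499", "bytes=-500", "bytes=500-", "bytes=0-0,-1"):
    print(value, resolve_ranges(value, 1234))
```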

UA-Pixels, UA-Color, UA-OS, UA-CPU: Non-standard request headers sent by some versions of Internet Explorer to indicate screen size, color depth, operating system, and CPU type.

- Empty line
The empty line tells the server where the request headers end.
- Request data
If the method is GET, this part is empty and carries no data.

If the method is POST, the data to be submitted is usually placed here.

For example, when a form is submitted with POST and the user field contains "admin" and the password field contains 123456, the request data is user=admin&password=123456, with the fields joined by &.
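
The form body from this example can be produced and read back with Python's standard urllib.parse module. This is a small sketch for illustration; the field names and values are the ones used in the text.

```python
from urllib.parse import parse_qs, urlencode

# Join the form fields as name=value pairs separated by "&".
body = urlencode({"user": "admin", "password": "123456"})
print(body)

# The receiving side can split the body back into fields.
fields = parse_qs(body)
print(fields)
```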

In general, the format of the HTTP request message is as shown in the following figure

[Figure: format of an HTTP request message]

The figure above shows a POST request. With POST, the URL in the request line generally carries no parameters; they are placed in the message body. With GET, the parameters are placed directly in the request-line URL and the message body is empty.
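
The four parts of a POST request message (request line, headers, empty line, data) can be assembled as plain text. The sketch below assumes a hypothetical host example.com and path /login, and the helper name build_post_request is invented for illustration.

```python
def build_post_request(host: str, path: str, body: str) -> bytes:
    """Assemble request line, headers, empty line, and request data."""
    payload = body.encode("ascii")
    lines = [
        f"POST {path} HTTP/1.1",                 # request line
        f"Host: {host}",                         # request headers
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(payload)}",
        "",                                      # empty line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii") + payload


message = build_post_request("example.com", "/login", "user=admin&password=123456")
print(message.decode("ascii"))
```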

HTTP response

The response message consists of four parts: the response line, the response headers, a blank line, and the response body.
- Response line
The response line (status line) generally consists of the protocol version, the status code, and its description, for example HTTP/1.1 200 OK

In that example, HTTP/1.1 is the protocol version (it could also be HTTP/1.0), 200 is the status code, and OK is its description.

Common status codes:

100~199: Indicates that the request was received and that the client must continue with further action to complete processing.

200~299: Indicates that the request was received and processed successfully. The most common is 200.

300~399: Indicates that the client must take further action to complete the request. For example, the requested resource has moved to a new address. Common codes are 302 (roughly "you asked me, but go ask someone else"), 307, and 304 ("I will not send this resource again; use your cached copy").

400~499: Indicates an error in the client's request. Common codes are 404 (the requested resource does not exist on the web server) and 403 (the server refuses access because of insufficient permissions).

500~599: Indicates an error on the server side. The most common is 500.
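
The status line and the five status code ranges above can be handled with a few lines of code. This is a minimal sketch; parse_status_line and status_class are hypothetical helper names, and the class labels are informal descriptions rather than official terms taken from the text.

```python
def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split a status line into (version, status code, description)."""
    parts = line.strip().split(" ", 2)
    if len(parts) < 2:
        raise ValueError(f"malformed status line: {line!r}")
    description = parts[2] if len(parts) > 2 else ""
    return parts[0], int(parts[1]), description


def status_class(code: int) -> str:
    """Map a status code to its range, using the first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


version, code, description = parse_status_line("HTTP/1.1 200 OK")
print(version, code, description, status_class(code))
```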

- Response header
The response headers describe basic information about the server and about the data being returned. Through this description of the data, the server can tell the client how to process the data it is about to send back.

Setting HTTP response headers is often combined with status codes. For example, several status codes that indicate "the document location has changed" are accompanied by a Location header, while 401 (Unauthorized) status codes must be accompanied by a WWW-Authenticate header. However, it is useful to specify a response header even when no status code with special meaning is set. Response headers can be used to accomplish: setting cookies, specifying modification dates, instructing the browser to refresh the page at specified intervals, declaring the length of the document to utilize persistent HTTP connections, ... and many other tasks.

Common response header field meanings:

Allow: Which request methods (such as GET, POST, etc.) are supported by the server.

Content-Encoding: The encoding applied to the document. The content type given by the Content-Type header is only obtained after decoding. Compressing documents with gzip can significantly reduce the download time of HTML documents. Java's GZIPOutputStream makes gzip compression easy. When this advice was first written, only Netscape on Unix and IE4 and IE5 on Windows could decode gzip. Virtually all current browsers can, but a servlet should still check the Accept-Encoding header (i.e., request.getHeader("Accept-Encoding")) and return the gzip-compressed HTML page only to clients that advertise gzip support, returning the normal page to the others.
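
The check described here (inspect Accept-Encoding, then compress only when gzip is advertised) can be sketched outside the Servlet API. The example below uses Python's standard gzip module rather than GZIPOutputStream; accepts_gzip and encode_body are hypothetical names, and the parsing is simplified: it ignores the "*" wildcard and only looks at a q parameter placed directly after the coding name.

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if the Accept-Encoding value lists gzip with q > 0."""
    for item in accept_encoding.split(","):
        name, _, params = item.strip().partition(";")
        if name.strip().lower() != "gzip":
            continue
        params = params.strip().lower()
        if params.startswith("q="):
            try:
                return float(params[2:]) > 0
            except ValueError:
                return False
        return True
    return False


def encode_body(body: bytes, accept_encoding: str) -> tuple[bytes, dict[str, str]]:
    """Compress the body only for clients that advertise gzip support."""
    if accepts_gzip(accept_encoding):
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}


page = b"<p>hello</p>" * 200
compressed, headers = encode_body(page, "gzip, deflate")
print(len(page), len(compressed), headers)
```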

Content-Length: Indicates the content length. This data is only required if the browser uses persistent HTTP connections. If you want to take advantage of persistent connections, you can write the output document to a ByteArrayOutputStream, check its size when you're done, put that value in the Content-Length header, and finally send the content via byteArrayStream.writeTo(response.getOutputStream()).
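
The buffer-then-measure approach is not specific to Java. As a rough analogue, the sketch below uses Python's io.BytesIO in place of ByteArrayOutputStream; render_with_length is a hypothetical helper. Note that the length is counted in bytes after encoding, not in characters.

```python
import io


def render_with_length(chunks: list[str]) -> tuple[dict[str, str], bytes]:
    """Buffer the whole body first so its byte length is known up front."""
    buffer = io.BytesIO()
    for chunk in chunks:
        buffer.write(chunk.encode("utf-8"))
    body = buffer.getvalue()
    headers = {"Content-Length": str(len(body))}
    return headers, body


# "h\u00e9llo" has 5 characters but 6 bytes in UTF-8.
length_headers, length_body = render_with_length(["<p>", "h\u00e9llo", "</p>"])
print(length_headers, length_body)
```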

Content-Type: Indicates what MIME type the following document belongs to. Servlet defaults to text/plain, but usually needs to be explicitly specified as text/html. Since Content-Type is often set, HttpServletResponse provides a dedicated method setContentType.

Date: The current GMT time, for example, Date: Mon, 31 Dec 2001 04:25:57 GMT. The time given by Date is world standard time (GMT); to convert it to local time, you need to know the time zone where the user is located. You can set this header with setDateHeader to avoid the trouble of converting the time format.
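
For comparison with setDateHeader, the same GMT date string can be produced and parsed with Python's standard email.utils module. This is only an illustration of the format, using the example timestamp from the text.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# The example timestamp, expressed as a timezone-aware UTC datetime.
moment = datetime(2001, 12, 31, 4, 25, 57, tzinfo=timezone.utc)

# usegmt=True emits the "GMT" suffix that HTTP date headers use.
date_value = format_datetime(moment, usegmt=True)
print("Date: " + date_value)

# Parsing the header value gives back the same instant.
parsed = parsedate_to_datetime(date_value)
print(parsed == moment)
```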

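Expires: The date and time after which the returned resource is considered stale, which tells the browser how long it may cache it. An invalid value such as -1 or 0 means the resource is already expired and should not be cached.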

Last-Modified: The last modification time of the document. The client can provide a date through the If-Modified-Since request header, the request will be treated as a conditional GET, and only documents whose modification time is later than the specified time will be returned, otherwise a 304 (Not Modified) status will be returned. Last-Modified can also be set using the setDateHeader method.
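
The conditional GET decision can be sketched as a comparison between the document's modification time and the date sent by the client. This is a simplified illustration, not a complete implementation of HTTP validation; conditional_status is a hypothetical helper, and an unparseable date is simply ignored.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the document is unchanged since the client's date, else 200."""
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: treat the request as unconditional
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


modified = datetime(2001, 12, 31, 4, 25, 57, tzinfo=timezone.utc)
print(conditional_status(modified, "Mon, 31 Dec 2001 04:25:57 GMT"))
print(conditional_status(modified, "Sun, 30 Dec 2001 00:00:00 GMT"))
```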

Location: This header is used with the 302 status code to redirect the recipient to a new URI address. Indicates where the client should go to retrieve the document. Location is usually not set directly, but through the sendRedirect method of HttpServletResponse, which also sets the status code to 302.

Refresh: Tell the browser how often to refresh, in seconds.

Server: The server uses this header to tell the browser what kind of server it is. The Server response header contains software information about the origin server that handled the request. This field can contain multiple product IDs and comments, and product IDs are generally sorted by importance. A servlet generally does not set this value; the Web server sets it itself.

Set-Cookie: Set a cookie associated with the page. Servlets should not use response.setHeader("Set-Cookie", ...), but should use the dedicated method addCookie provided by HttpServletResponse.

Transfer-Encoding: Tells the browser the transfer format of the data, for example chunked.

WWW-Authenticate: Tells the client what type of authorization information it should provide in the Authorization header. This header is required in responses that contain a 401 (Unauthorized) status line. For example, response.setHeader("WWW-Authenticate", "BASIC realm=\"executives\""). Note that servlets generally do not handle this, but instead let the Web server's specialized mechanisms control access to password-protected pages.

Note: The most common method for setting the response header is setHeader of HttpServletResponse, which has two parameters, which represent the name and value of the response header respectively. Similar to setting status codes, setting response headers should be done before sending any document content.

The setDateHeader method and the setIntHeader method are specially used to set the response header containing date and integer value. The former avoids the trouble of converting Java time to GMT time string, and the latter avoids the trouble of converting integer to string.

HttpServletResponse also provides several convenience methods:

setContentType: Set the Content-Type header. Most servlets use this method.

setContentLength: Set the Content-Length header. This function is useful for browsers that support persistent HTTP connections.

addCookie: Sets a cookie (there is no setCookie method in the Servlet API, because a response often contains multiple Set-Cookie headers).
- Empty line
As in a request, an empty line marks the end of the response headers.
- Response body
The response body is the message body of the response. If the requested resource is plain data, plain data is returned; if it is an HTML page, the HTML code is returned; if it is JS, the JS code is returned; and so on.

The format of the HTTP response message is shown in the following figure
[Figure: format of an HTTP response message]

HTTP header fields

HTTP header fields fall into four groups: general headers, request headers, response headers, and entity headers. The request headers and response headers were explained earlier; this part covers the general headers and entity headers (some of them repeat headers already introduced above).
- General header
General header fields are those supported by both request and response messages. They include Cache-Control, Connection, Date, Pragma, Transfer-Encoding, Upgrade, and Via. Extensions to the general header fields require support from both parties; an unsupported general header field is generally treated as an entity header field. A few common general header fields are briefly introduced below.

Common general header meanings:

Cache-Control: Specifies the caching directives that requests and responses must follow. A directive set in a request or a response does not change the caching behavior applied to the other message. Request directives include no-cache, no-store, max-age, max-stale, min-fresh, and only-if-cached. Response directives include public, private, no-cache, no-store, no-transform, must-revalidate, proxy-revalidate, and max-age. The main directives mean the following:

public: The response may be stored by any cache.

private: All or part of the response is intended for a single user and must not be stored by a shared cache. This lets the server mark parts of a response as belonging to one user only, so they are not reused for other users' requests.

no-cache: A cached copy must not be used without first revalidating it with the origin server.

no-store: Prevents sensitive information from being retained unintentionally. Caches must not store any part of the request or its response.

max-age: The client accepts a response whose age is no greater than the specified number of seconds.

min-fresh: The client accepts a response that will remain fresh for at least the specified number of seconds.

max-stale: The client accepts a response that has exceeded its freshness lifetime. If a value is given, the response may be stale by no more than that many seconds.
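
A Cache-Control value is a comma-separated list of directives, some of which take an argument after "=". The sketch below splits such a value into a dictionary; parse_cache_control is a hypothetical helper, and it deliberately does not handle quoted arguments that themselves contain commas.

```python
from typing import Optional


def parse_cache_control(value: str) -> dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: dict[str, Optional[str]] = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, arg = item.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


print(parse_cache_control("public, max-age=3600"))
print(parse_cache_control("no-cache, no-store, must-revalidate"))
```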

Date: Indicates the time when the message was sent. The time format is defined by RFC 822. For example, Date: Mon, 31 Dec 2001 04:25:57 GMT. The time given by Date is world standard time (GMT); to convert it to local time, you need to know the time zone where the user is located.

Pragma: Carries implementation-specific directives; the most commonly used is Pragma: no-cache. In the HTTP/1.1 protocol, its meaning is the same as Cache-Control: no-cache.
- Entity header
Both the request message and the response message can contain entity information, which generally consists of entity header fields and the entity itself. The entity header fields carry meta-information about the entity. They include Allow, Content-Base, Content-Encoding, Content-Language, Content-Length, Content-Location, Content-MD5, Content-Range, Content-Type, ETag, Expires, Last-Modified, and extension-header. extension-header allows clients to define new entity headers, but the recipient may not recognize these fields. An entity can be an encoded byte stream whose encoding is defined by Content-Encoding or Content-Type and whose length is defined by Content-Length or Content-Range.

Common entity header meanings:

Content-Encoding: The server tells the browser the compression format of the data through this header.

Content-Length: The server tells the browser the length of the returned data through this header.

Content-Disposition: Tells the browser to offer the data as a download (an attachment) rather than display it inline.

Content-Type: The server uses this header to tell the browser the type of the data sent back. The Content-Type entity header indicates the media type of the entity sent to the recipient or, in the case of the HEAD method, the media type that would have been sent had the request been a GET.

Content-Range: Specifies where a partial body belongs within the entire entity, and also indicates the length of the entire entity. When the server returns a partial response to the client, it MUST describe the range covered by the response and the entire entity length. General format:

Content-Range: bytes-unit SP first-byte-pos "-" last-byte-pos "/" entity-length

For example, a response carrying the first 500 bytes of a 1234-byte entity uses Content-Range: bytes 0-499/1234. When an HTTP message contains this header (for example, a response to a range request or to overlapping requests for a series of ranges), Content-Range indicates the range transmitted, and Content-Length indicates the actual number of bytes transmitted.
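
The relationship between Content-Range and Content-Length in a partial response can be shown with a short sketch. partial_response is a hypothetical helper that takes inclusive byte positions; the 1234-byte entity matches the example in the text.

```python
def partial_response(data: bytes, first: int, last: int) -> tuple[int, dict[str, str], bytes]:
    """Build status, headers, and body for one inclusive byte range."""
    last = min(last, len(data) - 1)
    chunk = data[first:last + 1]
    headers = {
        # Range covered by this response and the length of the whole entity.
        "Content-Range": f"bytes {first}-{last}/{len(data)}",
        # Number of bytes actually transmitted in this response.
        "Content-Length": str(len(chunk)),
    }
    return 206, headers, chunk


entity = bytes(1234)
status, range_headers, chunk = partial_response(entity, 0, 499)
print(status, range_headers)
```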

Last-Modified: Specifies the last revision time of the content saved on the server.

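ETag: An identifier (entity tag) for a specific version of the resource, used for cache validation.

Expires: The date and time after which the returned resource is considered stale, which tells the browser how long it may cache it. A value of -1 or 0 means the resource is not cached.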

Three of these header fields prevent browser caching:

Expires: -1 or 0

Cache-Control: no-cache

Pragma: no-cache

POST/GET difference

HTTP defines different methods for interacting with the server. There are four basic methods: GET, POST, PUT, and DELETE.

In HTTP, GET, POST, PUT, and DELETE correspond to the four operations of querying, modifying, adding, and deleting URL resources. So: GET is generally used to obtain/query resource information, while POST is generally used to update resource information.

The main differences between GET and POST are as follows:
- Form of submitted data
The data of a GET request is appended to the URL (that is, it travels in the request line rather than in the message body) and is displayed directly in the address bar. The URL and the data are separated by ?, and the parameters are joined with &, for example: login.action?name=hyddd&password=idontknow&verify=%E4%BD%A0%E5%A5%BD.

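If the data consists of English letters or digits, it is sent as is. A space is converted to +. Chinese and other characters are percent-encoded (URL-encoded): the string is converted to bytes, and each byte is written as %XX. This is an encoding, not encryption, and it is not BASE64.

The result looks like this: %E4%BD%A0%E5%A5%BD, where XX in %XX is the value of one byte of the encoded character in hexadecimal (here, the UTF-8 bytes of 你好).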
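
The encoding of the verify parameter can be reproduced with Python's standard urllib.parse module. This is a small illustration; the two characters are written as Unicode escapes for 你好, and UTF-8 is the default encoding used by quote.

```python
from urllib.parse import quote, quote_plus, unquote

text = "\u4f60\u597d"  # the two characters 你好

# Each UTF-8 byte of a non-ASCII character becomes %XX.
encoded = quote(text)
print(encoded)

# In form data, a space is written as "+".
print(quote_plus("hello world"))

# Decoding reverses the process.
print(unquote(encoded) == text)
```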

The POST method puts the data in the request data field, with the fields separated by &. The request line contains no data parameters, and nothing extra appears in the address bar.
- Size of submitted data
With GET, the size of the submitted data directly affects the length of the URL. The HTTP specification itself does not limit URL length; the practical limits come from differences in client and server support. For example, IE limits URLs to 2083 characters (2K+35). Other browsers, such as Netscape and Firefox, have no limit in theory, and the practical limit depends on operating system support.

The HTTP specification does not limit the size of POST data either; the limit is the processing capacity of the server-side handler.

Therefore, the size limit is still affected by the different configuration of each web server.
- Security of submitted data
POST is relatively more secure than GET, although neither method encrypts the data; that requires HTTPS.

When data is submitted through GET, the username and password appear in the URL in clear text. GET is weaker than POST here for several reasons:

(1) The login page, including its URL, may be cached by the browser.

(2) Anyone who views the browser history can read the account and password.

(3) Under cross-site attacks, the exposure is even worse.
