[Computer Networks] Application layer: the HTTP protocol in detail

A protocol is a kind of "agreement." There are two common ways to agree on how data is exchanged: strings can be sent directly through sockets, and structured data can be sent by serializing and deserializing it. An application-layer protocol builds on this so that application data can be used to request resources. In this post I mainly explain the HTTP protocol.

HTTP message format


  • The first line of a request and of a response
    Request line: request method + URL + version number
    Status line: version number + status code + reason phrase
  • Header fields
    Header fields are one of the elements that make up an HTTP message. In HTTP communication, both requests and responses use header fields to carry important information other than the data itself.
  • Separating and delimiting HTTP messages
    The header is separated from the payload by a blank line, and Content-Length gives the payload's length, so the receiver can tell where one message ends without any extra splitting logic.
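The framing rule above (blank line, then Content-Length bytes of payload) can be sketched roughly as follows. This is only a minimal illustration: the function name parse_http_message and the sample request are made up for this post, and a real parser would also have to handle chunked bodies, repeated headers, and malformed input.

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP/1.x message into start line, headers, body, and leftover bytes."""
    # The first blank line (CRLF CRLF) ends the header section.
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Content-Length says how many bytes of payload belong to this message.
    length = int(headers.get("content-length", 0))
    body = rest[:length]
    leftover = rest[length:]  # bytes that already belong to the next message
    return start_line, headers, body, leftover


# Hypothetical request followed by the start of another one on the same connection.
raw = (b"POST /login HTTP/1.1\r\n"
       b"Host: example.com\r\n"
       b"Content-Length: 9\r\n"
       b"\r\n"
       b"user=demoGET /next")
start_line, headers, body, leftover = parse_http_message(raw)
method, url, version = start_line.split(" ")
```

Because the payload length is known, the bytes after it can be handed to the next call unchanged, which is why no additional splitting logic is needed.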

Common HTTP header fields

1. General header field

General refers to the fields used in both the request and response headers.

Cache-Control: Cache control

Before introducing this field, we first need to understand proxy servers. The basic job of a proxy server is to receive requests from clients and forward them to other servers. When a response passes back through the proxy, the proxy can save a copy of the resource. When it later receives a request for the same resource, it does not need to fetch it from the origin server again and can return the cached copy as the response.
We can control how the cache works by putting directives in the Cache-Control header field; multiple directives are separated by ",". Let's take a look at the directives:

  • public: the response may be stored by any cache, so a proxy server can serve the cached copy to other users as well
  • private: the response is intended for a single user; a shared cache server must not serve it to anyone else
  • no-cache: in a request, the client will not accept a cached response unless the cache has revalidated it with the origin server; in a response, a cache must revalidate the resource with the origin server before reusing it
  • no-store: implies that the request or response contains confidential information; no part of it may be stored in a cache
  • max-age: takes a value in seconds. In a request, the client accepts a cached resource whose age does not exceed this value; in a response, the value is the longest time the cached resource counts as fresh. In HTTP/1.1, if the Expires field is also present, max-age takes precedence and Expires is ignored. An HTTP/1.0 cache does the opposite: it uses Expires and ignores max-age
  • min-fresh: sent by the client only; the cache may return a stored resource only if it will remain fresh for at least the given number of seconds
  • max-stale: the client will accept a resource that has expired, as long as it has been expired for no longer than the max-stale value. If no value is given, the client accepts the resource no matter how long ago it expired
  • only-if-cached: the client wants a response only if the cache server already holds the target resource; the cache must not contact the origin server
  • must-revalidate: once the cached resource has expired, the proxy must check with the origin server that it is still valid before returning it; a max-stale directive in the request is ignored
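As a rough sketch of how a cache might read these directives, the snippet below splits a Cache-Control value on "," and applies a simplified freshness check. The function names parse_cache_control and is_fresh are made up for illustration, and the check covers only no-store, no-cache, and max-age; quoted values that themselves contain commas are not handled.

```python
def parse_cache_control(value: str) -> dict:
    """Turn 'public, max-age=3600' into {'public': True, 'max-age': 3600}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        name = name.strip().lower()
        arg = arg.strip().strip('"')
        if not sep:
            directives[name] = True      # directive without a value
        elif arg.isdigit():
            directives[name] = int(arg)  # delta-seconds such as max-age
        else:
            directives[name] = arg
    return directives


def is_fresh(directives: dict, age_seconds: int) -> bool:
    """Simplified rule: may a stored response be reused without asking the origin?"""
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    if isinstance(max_age, int):
        return age_seconds <= max_age
    return False  # no explicit lifetime: be conservative in this sketch


response_directives = parse_cache_control("public, max-age=3600")
```

With these directives, is_fresh(response_directives, 120) is true and is_fresh(response_directives, 4000) is false.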

Connection: persistent connection

  • Controlling header fields that are not forwarded past a proxy: if this field is set to Upgrade, the proxy deletes the Upgrade field before forwarding the message
  • Managing persistent connections: in HTTP/1.1, connections are persistent by default. If the server wants to close the connection explicitly, it sets the Connection header field to close. Before HTTP/1.1, connections were non-persistent by default; to keep a connection open, set the Connection field to Keep-Alive
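Both uses of Connection can be sketched in a few lines. The helper names strip_connection_headers and is_persistent are made up for this example, and the headers are assumed to be a plain dict whose Connection key is spelled exactly that way.

```python
def connection_tokens(headers: dict) -> set:
    """Lower-cased tokens listed in the Connection header field."""
    value = headers.get("Connection", "")
    return {t.strip().lower() for t in value.split(",") if t.strip()}


def strip_connection_headers(headers: dict) -> dict:
    """What a proxy forwards: drop Connection and every field it names."""
    listed = connection_tokens(headers)
    return {k: v for k, v in headers.items()
            if k.lower() != "connection" and k.lower() not in listed}


def is_persistent(version: str, headers: dict) -> bool:
    """Decide whether the connection stays open after this message."""
    tokens = connection_tokens(headers)
    if "close" in tokens:
        return False
    if version == "HTTP/1.1":
        return True                  # persistent by default
    return "keep-alive" in tokens    # HTTP/1.0 must ask for it explicitly


forwarded = strip_connection_headers(
    {"Host": "example.com", "Connection": "Upgrade", "Upgrade": "websocket"})
```

Here forwarded keeps only the Host field, since Upgrade was named in Connection.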

Upgrade: switching to a different protocol or version

Used to check whether communication can switch to a higher version of HTTP or to another protocol; the field value can name a completely different communication protocol. Upgrade only applies between the client and the adjacent server, so it is sent together with Connection: Upgrade. If the server agrees to switch, it returns a response with status code 101 Switching Protocols.

Date: the date and time at which the HTTP message was created

Pragma: a legacy field that exists only for backward compatibility with HTTP/1.0. Pragma: no-cache has the same meaning as Cache-Control: no-cache, and clients often send both so that older caches also understand the request

Trailer: declares in advance which header fields are recorded after the message body

Transfer-Encoding: specifies the transfer coding applied to the message body

Via: tracks the path that request and response messages take between the client and the server. Each proxy adds its own information to the Via field as it forwards a message. It is often used together with Max-Forwards: when the Max-Forwards value reaches 0, the proxy server stops forwarding the request, adds its own information to the Via field, and returns the response itself.

Warning: evolved from the HTTP/1.0 response header Retry-After; this field generally carries cache-related warnings.

2. Request header field

The request header field is a field used in the request message sent from the client to the server, and is used to add supplementary request information, client information, etc.

Host: the only header field that the HTTP/1.1 specification requires in every request

The Host field is closely tied to virtual hosting, in which several domain names are served from a single server; that is why it must be present. Before the request is sent, the host name is resolved to an IP address, and the connection is made to that address. If several domain names are deployed on the same IP address, the server cannot tell which domain the request is for. The Host field therefore carries the requested host name. If the target has no host name, send the Host field with an empty value.
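To make the virtual-host idea concrete, here is a small sketch of a server choosing a site by the Host field. The domain names, directories, and the function select_site are all hypothetical, and IPv6 literals in Host are not handled.

```python
# Hypothetical mapping: several domain names served from one IP address.
SITES = {
    "www.example.com": "/srv/www",
    "blog.example.com": "/srv/blog",
}
DEFAULT_SITE = "/srv/default"


def select_site(headers: dict):
    """Pick the document root for a request, or None if Host is missing."""
    host = headers.get("Host")
    if host is None:
        # An HTTP/1.1 server answers 400 Bad Request in this case.
        return None
    hostname = host.split(":")[0].strip().lower()  # drop an optional ":port"
    return SITES.get(hostname, DEFAULT_SITE)


blog_root = select_site({"Host": "blog.example.com:8080"})
```

Both requests arrive at the same IP address; only the Host value tells the server that blog_root should be "/srv/blog" rather than "/srv/www".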

Accept: tells the server which media types the user agent can handle and the relative priority of each media type

Accept-Charset: The character set that the user agent can support and the priority of the character set. Multiple character sets can be specified at once

Accept-Encoding: The content encoding supported by the user agent and the priority order of encoding

Accept-Language: The natural language set supported by the user agent and the priority order of the natural language set
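The priority in the Accept family of fields is written as a quality value, q, between 0 and 1, with 1 assumed when q is omitted. The sketch below orders the values of such a field by q; the function name parse_accept is made up, and the tie-breaking by specificity that real servers apply is left out.

```python
def parse_accept(value: str) -> list:
    """Return the listed items ordered from most to least preferred."""
    items = []
    for part in value.split(","):
        fields = [f.strip() for f in part.split(";")]
        name = fields[0]
        if not name:
            continue
        q = 1.0  # default priority when no q parameter is given
        for param in fields[1:]:
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    pass  # malformed q: keep the default in this sketch
        items.append((name, q))
    # sorted() is stable, so items with equal q keep their original order.
    ranked = sorted(items, key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked]


media_types = parse_accept("text/html, application/xml;q=0.9, */*;q=0.8")
languages = parse_accept("en;q=0.5, zh-CN")
```

For the language example, zh-CN comes first because its implicit q of 1 beats en with q=0.5.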

Authorization: authentication information of the user agent

Expect: tells the server that the client expects a certain behavior. If the server cannot understand or meet that expectation, it returns status code 417

From: the email address of the user of the user agent. Automated user agents such as crawlers should include the From header field whenever possible

If-Match: carries an entity tag (ETag) value, a fixed value associated with a specific resource. The server accepts the request only when the value in this field matches the resource's ETag

If-Modified-Since: the server processes the request only if the resource has been updated after the date given in this field; otherwise it returns 304 Not Modified

If-None-Match: the request is processed only when the given value does not match the resource's ETag. It is generally used with GET or HEAD to fetch the latest version of a resource

If-Range: if the value of this field matches the ETag of the requested resource, the request is handled as a range request; otherwise the full resource is returned

If-Unmodified-Since: the request is processed only if the resource has not been updated after the date given in this field; otherwise the server returns 412 Precondition Failed

Max-Forwards: the sender sets a value before sending the request, and each proxy server that forwards the request decreases it by one. A server that receives the value 0 does not forward the request and returns a response itself.

So why is this field needed? When communicating over HTTP, a request may pass through several servers such as proxies. If forwarding fails somewhere along the way, the client never receives the server's response and has no way of knowing where things went wrong. In some cases a message can also be forwarded back and forth between two servers in an endless loop. Because forwarding stops once Max-Forwards reaches 0, the request cannot loop forever, and the response shows how far it got.
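The countdown can be simulated with a short sketch. The functions handle_at_proxy and trace_route and the proxy names are invented for this illustration; a real proxy would also add itself to the Via field.

```python
def handle_at_proxy(headers: dict):
    """Return ("respond", headers) if this proxy must answer itself, else ("forward", headers)."""
    out = dict(headers)
    if "Max-Forwards" in out:
        remaining = int(out["Max-Forwards"])
        if remaining == 0:
            return "respond", out          # do not forward any further
        out["Max-Forwards"] = str(remaining - 1)
    return "forward", out


def trace_route(max_forwards: int, proxies: list) -> str:
    """Name of the hop that ends up answering the request."""
    headers = {"Max-Forwards": str(max_forwards)}
    for name in proxies:
        action, headers = handle_at_proxy(headers)
        if action == "respond":
            return name
    return "origin server"


answered_by = trace_route(2, ["proxy-a", "proxy-b", "proxy-c", "proxy-d"])
```

With a starting value of 2, proxy-a and proxy-b each decrement and forward, and proxy-c receives 0 and answers, so a request caught in a loop would stop after a bounded number of hops.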

Range: asks the server for only the specified range of the resource. If the server processes the range request successfully, it returns a 206 response with that part; otherwise it returns 200 OK with the entire resource

Referer: tells the server the URL of the page from which the request originated. For security reasons this field is sometimes not sent

User-Agent: passes the name of the browser or other user agent that created the request to the server. A proxy that forwards the request may also add its own information to this field
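A minimal sketch of the server side of a range request follows. The function serve_range is made up for illustration and understands only a single "bytes=start-end" or "bytes=start-" range; suffix ranges and multiple ranges fall back to a full 200 response, and a start position beyond the end of the resource gets 416 Range Not Satisfiable.

```python
import re


def serve_range(resource: bytes, range_header):
    """Return (status, body, extra_headers) for an optional Range header value."""
    total = len(resource)
    match = re.fullmatch(r"bytes=(\d+)-(\d*)", range_header or "")
    if not match:
        return 200, resource, {}           # no usable range: send everything
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else total - 1
    if end < start:
        return 200, resource, {}           # malformed range is ignored
    if start >= total:
        return 416, b"", {"Content-Range": f"bytes */{total}"}
    end = min(end, total - 1)              # clamp to the last byte
    body = resource[start:end + 1]         # both ends are inclusive
    return 206, body, {"Content-Range": f"bytes {start}-{end}/{total}"}


data = b"0123456789"
status, body, extra = serve_range(data, "bytes=2-5")
```

Note that the range is inclusive at both ends, so bytes=2-5 selects four bytes.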

3. Response header field

Response header fields are used in the response message that the server returns to the client. They supply additional information about the response, information about the server, and additional requirements for the client.

Accept-Ranges: tells the client whether the server can handle range requests for part of a resource. The value is bytes if it can, and none if it cannot

Age: how long ago the origin server created the response, in seconds. A proxy that creates the response must add the Age header field

ETag: identifies a resource uniquely in the form of a string; the server assigns an ETag value to each resource

When a resource is cached, it is assigned a unique identifier. For example, the same URL may return the Chinese version of a resource to a Chinese-language browser and the English version to an English-language browser. The URL alone cannot tell these two resources apart, so the ETag is used to identify each one.

ETags are further divided into strong and weak values. A strong ETag changes whenever the entity changes, no matter how slightly. A weak ETag only indicates whether two resources are equivalent; it changes only when the resource changes in a fundamental way. A weak ETag is marked by W/ at the beginning of the field value.
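The sketch below shows one way ETags could be compared and used with If-None-Match. Deriving the tag from a hash of the body is purely an assumption for this example (servers are free to generate ETags however they like), and the function names are made up.

```python
import hashlib


def make_etag(body: bytes, weak: bool = False) -> str:
    """Hypothetical ETag: a quoted, shortened hash of the body."""
    tag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    return "W/" + tag if weak else tag


def opaque_part(etag: str) -> str:
    """The quoted part of an ETag, without the weak marker."""
    return etag[2:] if etag.startswith("W/") else etag


def weak_equal(a: str, b: str) -> bool:
    """Weak comparison ignores the W/ marker."""
    return opaque_part(a) == opaque_part(b)


def strong_equal(a: str, b: str) -> bool:
    """Strong comparison requires that neither tag is weak."""
    return not a.startswith("W/") and not b.startswith("W/") and a == b


def conditional_get(body: bytes, if_none_match):
    """Return (status, body, etag); 304 when the client's copy is still current."""
    etag = make_etag(body)
    if if_none_match is not None:
        candidates = [t.strip() for t in if_none_match.split(",")]
        if "*" in candidates or any(weak_equal(etag, c) for c in candidates):
            return 304, b"", etag
    return 200, body, etag


first_status, first_body, etag = conditional_get(b"hello", None)
second_status, second_body, _ = conditional_get(b"hello", etag)
```

The second request sends back the ETag it received, so the server can answer 304 with an empty body instead of resending the resource.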

Location: directs the recipient of the response to a resource at a URI different from the one requested. Almost all browsers try to access the redirected resource as soon as they receive a response containing the Location field.

Retry-After: tells the client how long to wait before sending the request again. It is mainly used with status code 503 or with 3xx responses

Server: tells the client about the HTTP server application installed on the server. It may include the version number and the options enabled at installation

4. Entity header field

Entity header fields are used in the entity part of a message. They supply information about the entity, such as when its content was last updated.

Content-Encoding: tells the client which content encoding the server applied to the entity body. Content encoding here means compression that loses none of the entity's information

Content-Language: tells the client the natural language used by the entity body

Content-Length: the size of the entity body, in bytes

Content-Type: the media type of the object in the entity body

Content-Location: the URL of the resource that is actually returned. The object you request and the object returned are sometimes different, which is why this field is needed

Last-Modified: the time when the resource was last modified

Expires: tells the client when the resource expires. A cache server that receives the Expires header field answers requests from its cache until the specified time has passed, and after that requests the resource from the origin server again.

HTTP methods


Detailed explanation of important methods
  • GET method: Get the resource identified by URL

First look at the format of the URL:
[Figure: the format of a URL]

When the method is GET, the server parses the path out of the URL in the request line, looks that path up under its own resource directory, and returns the corresponding file. In the figure above, the client is requesting the index.htm file in the dir directory.
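The lookup described above can be sketched with Python's standard urllib.parse module. The example URL, the DOC_ROOT directory, and the function resolve_path are assumptions made for illustration; only the dir/index.htm path comes from the text.

```python
import posixpath
from urllib.parse import urlsplit

DOC_ROOT = "/var/www"  # hypothetical resource directory of the server


def resolve_path(url: str) -> str:
    """Map the path component of a URL to a file under DOC_ROOT."""
    path = urlsplit(url).path or "/"
    # Normalising an absolute path collapses ".." so it cannot climb above "/".
    normalized = posixpath.normpath("/" + path)
    return posixpath.join(DOC_ROOT, normalized.lstrip("/"))


# Hypothetical URL with the usual components.
parts = urlsplit("http://www.example.com:80/dir/index.htm?uid=1#ch1")
components = (parts.scheme, parts.hostname, parts.port,
              parts.path, parts.query, parts.fragment)
file_path = resolve_path("http://www.example.com:80/dir/index.htm?uid=1#ch1")
```

Only the path part decides which file is served; the query string and fragment are not part of the file lookup.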

  • POST method: used to transfer the body of the entity

GET can also send data to the server by putting it in the URL, but because URL length is limited in practice, POST is generally used instead. Conversely, POST can also fetch resources, but its main purpose is not to obtain the body of the response.

  • PUT: transfer files to the specified path

The PUT method is standardized in HTTP/1.1. Because the HTTP/1.1 PUT method has no verification mechanism of its own, anyone could upload files, which is a security problem, so ordinary websites generally do not use it.

  • DELETE: delete the specified resource

This method is even less safe than PUT. You have probably never seen a website that lets visitors delete its resources directly!

For the differences between the GET, POST, and PUT methods, follow the link to the detailed explanation in the next blog post:
Interviewers often ask: the difference between the GET, POST, and PUT methods

HTTP status code

The HTTP status code is responsible for indicating the return result of the client's HTTP request, marking whether the processing on the server side is normal, and notifying errors.

Detailed explanation of some status codes
  • Informational status codes beginning with 1

100 Continue: indicates that the client should continue sending the request.
This interim response tells the client that the first part of its request has been received and has not been rejected by the server. It typically appears when POST is used to send data: after the TCP three-way handshake, the client first sends only the request headers (with Expect: 100-continue) to check with the server, and sends the data once the server returns 100 Continue.

  • Success status codes beginning with 2

200 OK: the request was processed normally by the server, and the requested resource is returned.
204 No Content: the request was processed successfully, but the response contains no entity body. It is generally used when the client sends information to the server and the server has no new content to send back.

  • Redirect status codes beginning with 3

301 Moved Permanently: the requested resource has been assigned a new URL, and future requests should use the new URL. Also known as a permanent redirect.
302 Found: the requested resource has temporarily been assigned a different URL, but future requests will not necessarily use it. Called a temporary redirect.
303 See Other: a temporary redirect that keeps the way 302 is handled in practice: a POST is changed to GET, and then the redirect is followed.
307 Temporary Redirect: a temporary redirect that follows the 302 specification strictly: the request is repeated with the same method, so a POST is not changed to GET.
303 and 307 were introduced in HTTP/1.1 by splitting 302 into these two strictly defined cases.
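The difference between these codes shows up in which method the client uses for the follow-up request. The function below is a rough sketch of common client behavior, not a complete implementation of the redirect rules; its name is made up for this post.

```python
def method_after_redirect(status: int, method: str) -> str:
    """Method a client typically uses when following a redirect."""
    if status == 303:
        # 303 always turns the follow-up into a retrieval request.
        return "HEAD" if method == "HEAD" else "GET"
    if status in (301, 302) and method == "POST":
        # Most browsers rewrite POST to GET here, for historical reasons.
        return "GET"
    # 307 (and everything else) repeats the request with the same method.
    return method


after_302 = method_after_redirect(302, "POST")
after_307 = method_after_redirect(307, "POST")
```

So a form submitted with POST is fetched again with GET after a 302, but is posted again after a 307.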

  • Client error status codes starting with 4

400 Bad Request: there is a syntax error in the request message.
401 Unauthorized: the request must pass HTTP authentication.
403 Forbidden: the server refused the client's request to access the resource. A 403 may appear, for example, when you access the server from an IP address that is not authorized.
404 Not Found: the requested resource does not exist on the server, similar to an out-of-bounds access.
413 Request Entity Too Large: the entity data submitted with the request is larger than the server is willing or able to handle.

  • Server error status codes starting with 5
    500 Internal Server Error: an error occurred on the server side while executing the request, or an error occurred in the web application.
    503 Service Unavailable: the server is temporarily overloaded or is down for maintenance, and cannot process the request right now.
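Since the first digit decides the category, a status code can be classified with integer division. The sketch below uses Python's standard http.HTTPStatus enum for the reason phrase; the function names status_class and describe are made up for this example.

```python
from http import HTTPStatus

CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Category of a status code, taken from its first digit."""
    return CLASSES.get(code // 100, "unknown")


def describe(code: int) -> str:
    """Human-readable line such as '404 Not Found (client error)'."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"  # code not registered in the standard library
    return f"{code} {phrase} ({status_class(code)})"


summary = [describe(code) for code in (200, 404, 503)]
```

A client that does not recognize a particular code can still react sensibly based on its class alone.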

HTTP protocol format example

HTTP is easy to observe in daily life, because we use it every time we use a browser. Double-click the browser's address bar to reveal the hidden http/https prefix. Here I refer to an article by someone else, which is also very clear:
HTTP request format and response format

My explanation of application-layer HTTP is still quite basic. I will keep adding new content as I learn more. Everyone is welcome to leave a comment to discuss and make progress together~


Origin blog.csdn.net/ly_6699/article/details/100084132