[Computer Network Notes 5] Application Layer (2) HTTP Message

HTTP message format

The structures of request messages and response messages of the HTTP protocol are basically the same and consist of four parts:

  • ① Start line: describes the basic information of the request or response;
  • ② Header fields (header): describe the message in more detail as key-value pairs;
  • ③ Blank line: a bare CRLF (carriage return and line feed) that marks the end of the header;
  • ④ Message body (body): the data actually transmitted. It is not necessarily plain text; it can be binary data such as pictures and videos.

HTTP/1.x is a "plain text" protocol: the start line and the header fields are ASCII text and can be read directly with the naked eye.

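Request message format:

    request line (request method + space + request target + space + version number) + CRLF
    header field (name: value) + CRLF, one line per field
    CRLF (blank line)
    message body (optional)

Response message format:

    status line (version number + space + status code + space + reason) + CRLF
    header field (name: value) + CRLF, one line per field
    CRLF (blank line)
    message body (optional)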

Note: According to RFC 2616 (HTTP/1.1), the field value in a header field may be preceded by one or more optional spaces, but no space is allowed inside the field name or between the field name and the ":" that follows it. This is why the HTTP messages shown by browser developer tools and some packet capture tools are the more readable form with a space after the colon.
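The four-part layout and the whitespace rule above can be sketched in a few lines of Python. This is only a minimal illustration, not a full parser: the function name split_http_message and the sample message are made up for this example, and the sketch assumes a well-formed HTTP/1.x message with no repeated header names.

```python
def split_http_message(raw: bytes):
    """Split a raw HTTP/1.x message into (start_line, headers, body)."""
    # The first blank line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Optional spaces in front of the value are not part of the value.
        headers[name] = value.strip()
    return start_line, headers, body


# Hypothetical message, made up for illustration.
raw_request = (
    b"POST /submit HTTP/1.1\r\n"
    b"Host: www.example.com\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)
start_line, headers, body = split_http_message(raw_request)
```

The body is kept as bytes on purpose, because it may be binary data rather than text.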

Request line of request message

The request line briefly describes how the client wants to operate on the server-side resource.

The request line consists of three parts: request method + space + URI + space + version number + CRLF (carriage return and line feed)

  • ① Request method: a verb, such as GET or POST, indicating the operation to perform on the resource;
  • ② Request target: the path of the request target, usually a URI, identifying the resource that the request method operates on;
  • ③ Version number: the HTTP protocol version used by the message, such as HTTP/1.1.

These three parts are separated by single spaces, and the line ends with CRLF.

URI: Uniform Resource Identifier
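As a rough sketch of the request-line layout described above, the following Python splits a request line into its three parts. The names RequestLine and parse_request_line and the sample line are illustrative assumptions, not part of any library.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str
    target: str
    version: str


def parse_request_line(line: str) -> RequestLine:
    # Drop the trailing CRLF, then split on the two separating spaces.
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {line!r}")
    return RequestLine(*parts)


# Hypothetical request line, made up for illustration.
request_line = parse_request_line("GET /index.html HTTP/1.1\r\n")
```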

Status line of response message

The first line of a response is called the "status line" rather than the "response line", because it reports the status of the server's response.

The status line consists of three parts: version number + space + status code + space + reason + CRLF (carriage return and line feed)

  • ① Version number: the HTTP protocol version used by the message, such as HTTP/1.1;
  • ② Status code: a 3-digit number indicating the result of processing, such as 200 (success), 404 (resource does not exist), or 500 (server error);
  • ③ Reason description: a short explanatory text that supplements the numeric status code and helps people understand the reason.
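The status line can be taken apart in the same spirit. The sketch below is only an illustration under the assumption that the line is well formed; parse_status_line is a hypothetical helper name. Because the reason text may itself contain spaces, the line is split at most twice.

```python
def parse_status_line(line: str):
    """Return (version, status_code, reason) from a status line."""
    # Split at most twice: the reason description may contain spaces.
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    if not (len(code) == 3 and code.isdigit()):
        raise ValueError(f"malformed status code: {code!r}")
    return version, int(code), reason


status = parse_status_line("HTTP/1.1 404 Not Found\r\n")
```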

header field

The HTTP protocol specifies a large number of header fields to implement various functions, but they can basically be divided into four categories:

  • ① General fields: can appear in both request headers and response headers, for example Date and Connection.

  • ② Request fields: can only appear in the request header, further describing the request or adding extra conditions, such as Host and Accept.

  • ③ Response fields: can only appear in the response header, supplementing the information in the response message, for example Server.

  • ④ Entity fields: actually a kind of general field, but specifically describing additional information about the body, for example Content-Length.

The parsing and processing of HTTP messages is actually mainly about the processing of header fields. Understanding the header fields also means understanding the HTTP messages.

Commonly used header fields - Host field

  • Host is a request field and can only appear in the request header.

  • Host is also the only field that the HTTP/1.1 specification requires to be present. A request without a Host field is an invalid message.

  • The Host field tells the server which host should process the request.

Commonly used header fields - User-Agent field

  • User-Agent is a request field that only appears in the request header.

  • It uses a string to describe the client that initiated the HTTP request, and the server can use it to return the page best suited to that browser.

Commonly used header fields - Date field

  • The Date field is a general field, but it usually appears in the response header, indicating the time when the HTTP message was created. The client can combine this time with other fields to decide its caching strategy.

Commonly used header fields - Server field

  • The Server field is a response field and can only appear in the response header. It tells the client the name and version number of the software currently providing the Web service.

  • The Server field is optional, because it exposes part of the server's information to the outside world. If that version happens to have a known bug, attackers may exploit it to compromise the server. For this reason, some websites either omit the field from the response header or fill it with a completely unrelated description.

Commonly used header fields - Content-Length field

  • Content-Length is one of the entity fields. It gives the length of the message body in bytes, that is, the amount of data that follows the blank line after the request header or response header.

  • When the receiver sees this field, it knows how much data will follow and can read it directly. If the field is absent, the body length is not known in advance, and the body is typically transmitted in pieces using the chunked method.
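One detail that is easy to get wrong is that Content-Length counts bytes, not characters. The sketch below builds a small response to show this; build_response and the sample text are made up for illustration, and UTF-8 is an assumed encoding.

```python
def build_response(body_text: str) -> bytes:
    body = body_text.encode("utf-8")
    # Content-Length is the number of bytes in the encoded body,
    # which can be larger than the number of characters.
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/plain; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


# "héllo" has 5 characters but 6 bytes in UTF-8.
response = build_response("héllo")
```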

body related header fields


  • Accept: Indicates the data types that can be parsed by the client. Multiple types can be listed separated by commas.

  • Accept-Encoding: Indicates the compression format supported by the client, which can be omitted (no compression)

  • Accept-Language: Language supported by the client

  • Accept-Charset: Usually not sent, browsers support multiple character sets

  • Content-Type: The actual type of the entity body; it can appear in both request headers and response headers

  • Content-Encoding: The compression algorithm the server applied to the response entity; it can be omitted (no compression)

  • Content-Language: The language used by the entity. The server usually does not send it, because the language can generally be inferred from the character set in Content-Type, such as Content-Type: text/html; charset=utf-8

  • Transfer-Encoding: chunked: indicates chunked transfer. Each chunk contains two parts: length header and data block. The length of the last chunk is 0 to indicate the end, that is, "0\r\n\r\n"

    Transfer-Encoding: chunked and Content-Length fields are mutually exclusive, and these two fields cannot appear at the same time in the response message.

body data type

HTTP commonly uses four major categories of data: text, image, audio/video, and application.

Each major category is subdivided into subtypes, written as a string of the form "type/subtype".

  • text: Readable data in text format, such as text/html (hypertext), text/plain (plain text), text/css (style sheets), etc.

  • image: Image files, such as image/gif, image/jpeg, image/png, etc.

  • audio/video: Audio and video data, such as audio/mpeg, video/mp4, etc.

  • application: The format is not fixed and can be text or binary. It must be interpreted by the upper-layer application. Common ones include application/json, application/javascript, application/pdf, etc.

    If you do not know what type the data is, use application/octet-stream, which means opaque binary data.
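To make the "type/subtype" form concrete, here is a small sketch that splits a Content-Type value into its major type, subtype, and parameters such as charset. The helper name parse_content_type is an assumption for this example, and the parsing is deliberately simplified (it does not handle quoted parameter values containing ";").

```python
def parse_content_type(value: str):
    """Split a Content-Type value into (type, subtype, parameters)."""
    media_type, *param_parts = [part.strip() for part in value.split(";")]
    main_type, _, subtype = media_type.partition("/")
    params = {}
    for part in param_parts:
        if not part:
            continue  # tolerate a trailing ";"
        key, _, val = part.partition("=")
        params[key.strip().lower()] = val.strip().strip('"')
    return main_type.lower(), subtype.lower(), params


parsed = parse_content_type("text/html; charset=utf-8")
```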

body data compression

Compression algorithm:

  • gzip: GNU zip compression format, the most popular compression algorithm on the Internet
  • deflate: zlib compression format
  • br: Brotli, a newer compression algorithm specially optimized for HTTP

Header fields related to compression:

  • The Accept field marks the data types that the client can parse. Multiple types can be listed with "," as the separator, giving the server more options.

    For example, "Accept: text/html,application/json,image/webp,image/png" tells the server: "I can understand HTML and JSON text, and webp and png images. Please give me data in one of these four formats."

  • The Accept-Encoding field marks the compression formats supported by the client.

  • Content-Encoding: The server can choose one of the compression formats listed in the Accept-Encoding field to compress the data. The format actually used is placed in the Content-Encoding field of the response header.

Both Accept-Encoding and Content-Encoding can be omitted, that is, no compression is performed.
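The negotiation between Accept-Encoding and Content-Encoding might look roughly like the sketch below on the server side. It uses Python's standard gzip module; the function name encode_body is hypothetical, only gzip is attempted, and any q weights in the header are ignored for simplicity.

```python
import gzip


def encode_body(body: bytes, accept_encoding: str):
    """Return (extra_headers, payload) for a response body."""
    # Names the client listed; any ";q=..." weight is ignored in this sketch.
    offered = {item.split(";")[0].strip() for item in accept_encoding.split(",")}
    if "gzip" in offered:
        return {"Content-Encoding": "gzip"}, gzip.compress(body)
    # Nothing in common: send the body as-is and omit Content-Encoding.
    return {}, body


headers, payload = encode_body(b"hello " * 100, "gzip, deflate, br")
```

The client reverses the step by looking at Content-Encoding and, for gzip, calling gzip.decompress on the payload.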

body language type/character set encoding

Corresponding header fields: Accept-Language and Accept-Charset in the request, Content-Language and the charset parameter of Content-Type in the response.

However, current browsers support many character sets and usually do not send Accept-Charset, and servers usually do not send Content-Language, because the language can be inferred from the character set. As a result, the request header generally contains only the Accept-Language field, and the response header only the Content-Type field.

body content negotiated quality value

When using Accept, Accept-Encoding, Accept-Language and other request header fields for content negotiation in the HTTP protocol, you can also attach a special "q" parameter that represents a weight and sets the priority. Here "q" stands for "quality factor".

The maximum weight is 1, the minimum is 0.001 (up to three decimal places are allowed), and the default is 1. A value of 0 means rejection. To use it, add a ";" after the data type or language code, followed by "q=value".

    Accept: text/html,application/xml;q=0.9,*/*;q=0.8

This means that the browser prefers HTML files most, with a weight of 1, followed by XML files, with a weight of 0.9, and finally any data type */*, with a weight of 0.8.

After the server receives the request header, it calculates the weights and, depending on what it actually has available, returns HTML or XML in preference.
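The weight calculation can be sketched as follows. This is a simplified illustration, assuming a well-formed header; parse_accept is a made-up helper name, and real servers also take the specificity of each media range into account.

```python
def parse_accept(value: str):
    """Return (media_range, q) pairs sorted by descending weight."""
    items = []
    for part in value.split(","):
        media_range, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # default weight when no q parameter is given
        for param in params:
            name, _, val = param.partition("=")
            if name.strip() == "q":
                q = float(val)
        items.append((media_range, q))
    # sorted() is stable, so entries with equal weight keep their order.
    return sorted(items, key=lambda item: item[1], reverse=True)


preferences = parse_accept("text/html,application/xml;q=0.9,*/*;q=0.8")
```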

Chunked transfer

"Transfer-Encoding: chunked" means that the body of the message is not sent all at once, but is divided into many chunks that are sent one by one.

Note: The two fields "Transfer-Encoding: chunked" and "Content-Length" are mutually exclusive and cannot appear together in a response message, because the length of a response body is either known (Content-Length) or unknown (chunked).
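A minimal sketch of the chunk layout follows, assuming the HTTP/1.1 convention that the length header is written in hexadecimal and that each part ends with CRLF. The function names are illustrative, and optional chunk extensions and trailer fields are not handled.

```python
def encode_chunked(chunks):
    """Encode an iterable of byte strings as a chunked body."""
    out = b""
    for chunk in chunks:
        if chunk:  # a zero-length chunk would end the body early
            # Length header in hexadecimal, CRLF, the data block, CRLF.
            out += f"{len(chunk):X}".encode("ascii") + b"\r\n" + chunk + b"\r\n"
    return out + b"0\r\n\r\n"  # final chunk of length 0 marks the end


def decode_chunked(data: bytes) -> bytes:
    body, pos = b"", 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end].decode("ascii"), 16)
        if size == 0:
            return body
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the data block


encoded = encode_chunked([b"Hello, ", b"chunked world!"])
```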

Get data by range

  • Accept-Ranges: bytes appears in the response message, indicating that the server supports fetching ranges of data by bytes.
  • Range: bytes=<start>-<end> appears in the request message, indicating which piece of data to fetch.
  • Content-Range: bytes <start>-<end>/<total> appears in the response message, indicating which piece of data is being sent.

Main uses: resuming interrupted downloads and multi-threaded downloads.
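The range exchange can be sketched as below. This is an illustration only: parse_range and content_range are hypothetical helpers, the 100-byte resource is made up, and only a single "start-end" or open-ended "start-" range is handled. Note that both ends of the range are inclusive.

```python
def parse_range(value: str, total: int):
    """Parse 'bytes=<start>-<end>' into an inclusive (start, end) pair."""
    unit, _, spec = value.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single bytes range is handled in this sketch")
    first, _, last = spec.strip().partition("-")
    start = int(first)
    # An omitted end means "to the end of the resource".
    end = min(int(last), total - 1) if last else total - 1
    if start > end:
        raise ValueError("range not satisfiable")  # would map to status 416
    return start, end


def content_range(start: int, end: int, total: int) -> str:
    return f"bytes {start}-{end}/{total}"


data = bytes(range(100))  # hypothetical 100-byte resource
start, end = parse_range("bytes=10-19", len(data))
part = data[start:end + 1]  # the body of a 206 Partial Content response
```

A download that was interrupted after 90 bytes could resume with "Range: bytes=90-", and a multi-threaded downloader could ask for several disjoint ranges in parallel requests.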

HTTP request method

Currently, HTTP/1.1 stipulates eight request methods, and all words must be in uppercase:

  • GET: obtain a resource, which can be understood as reading or downloading data;
  • HEAD: get the meta-information of a resource;
  • POST: submit data to a resource, which is equivalent to writing or uploading data;
  • PUT: similar to POST;
  • DELETE: delete a resource;
  • CONNECT: establish a special connection tunnel;
  • OPTIONS: list the methods that can be performed on the resource;
  • TRACE: trace the request-response transmission path.

GET and HEAD

  • GET requests a resource from the server. The resource can be static text, a page, a picture or a video, or a page or other data generated dynamically by PHP, Java and so on.

  • The HEAD method is similar to GET. It also requests a resource from the server and is processed in the same way, but the server does not return the entity data. It returns only the response header, which is the "meta-information" of the resource.

For example, to check whether a file exists you only need to send a HEAD request; there is no need to fetch the whole file with GET. Likewise, to check whether a file has a newer version you can use HEAD, and the server will return the file's modification time in the response header.

POST and PUT

  • POST is also used very often, probably second only to GET, and has many application scenarios. Whenever data is sent to the server, POST is usually the method used.

  • PUT works much like POST and can also submit data to the server, but there is a subtle difference: POST usually means "new" or "create", while PUT means "modify" or "update".

In practice, PUT is used much less often. Because its semantics and function are so close to POST, some servers even disable the PUT method and accept uploads only through POST.

Response status code

The status code currently specified in the RFC standard is three digits, so the value range is [000~999].

The RFC standard divides status codes into five categories, identified by the first digit, so the codes 0~99 are not used. The usable range of status codes therefore shrinks from [000~999] to [100~599].

The specific meanings of these five categories are:

  • 1xx: Informational, indicating that protocol processing is in an intermediate state and further action is required;
  • 2xx: Success, the message has been received and processed correctly;
  • 3xx: Redirection, the resource location has changed and the client needs to resend the request;
  • 4xx: Client error, the request message is incorrect and the server cannot process it;
  • 5xx: Server error, an internal error occurred while the server was processing the request.
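Since the category is carried by the first digit, classifying a status code is a single integer division. The sketch below is illustrative; the names STATUS_CLASSES and status_class are not from any library.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    # Only 100..599 are usable; the first digit selects the category.
    if not 100 <= code <= 599:
        raise ValueError(f"status code out of range: {code}")
    return STATUS_CLASSES[code // 100]
```

This is also why a client that meets an unknown code such as 499 can still fall back to treating it as a generic client error.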

Status code details

Status code | Status information | Meaning
100 | Continue | The initial request has been accepted and the client should continue sending the remainder of the request. (New in HTTP 1.1)
101 | Switching Protocols | The server will comply with the client's request and switch to another protocol. (New in HTTP 1.1)
102 | Processing | A status code extended by WebDAV (RFC 2518), indicating that processing will continue.
200 | OK | Everything is fine; the response document for a GET or POST request follows.
201 | Created | The server has created the document, and its URL is given in the Location header.
202 | Accepted | The request has been accepted, but processing has not yet been completed.
203 | Non-Authoritative Information | The document has been returned normally, but some response headers may be incorrect because a copy of the document is used. (New in HTTP 1.1)
204 | No Content | There is no new document; the browser should keep displaying the current one. This is useful when the user refreshes the page periodically and the servlet can determine that the user's document is already up to date.
205 | Reset Content | There is no new content, but the browser should reset what it displays. Used to force the browser to clear form input. (New in HTTP 1.1)
206 | Partial Content | The client sent a GET request with a Range header and the server fulfilled it. (New in HTTP 1.1)
207 | Multi-Status | A status code extended by WebDAV (RFC 2518). The message body that follows is an XML message and may contain a series of independent response codes, depending on the number of sub-requests.
300 | Multiple Choices | The document requested by the client can be found in multiple locations, which are listed in the returned document. If the server wants to propose a preference, it should indicate it in the Location response header.
301 | Moved Permanently | The document requested by the client is elsewhere, the new URL is given in the Location header, and the browser should automatically access the new URL.
302 | Found | Similar to 301, but the new URL should be considered a temporary replacement rather than a permanent one. The corresponding status information in HTTP 1.0 is "Moved Temporarily". When this status code occurs, the browser can automatically access the new URL, so it is a useful status code. It is sometimes used interchangeably with 301: for example, if the browser mistakenly requests http://host/~user (missing the trailing slash), some servers return 301 and some return 302. Strictly speaking, the browser can only be assumed to redirect automatically if the original request was a GET. See 307.
303 | See Other | Similar to 301/302, except that if the original request was a POST, the redirect target given in the Location header should be fetched via GET. (New in HTTP 1.1)
304 | Not Modified | The client has a cached document and makes a conditional request (usually by sending an If-Modified-Since header to say that it only wants documents newer than the specified date). The server tells the client that the cached document can still be used.
305 | Use Proxy | The document requested by the client should be retrieved through the proxy server specified by the Location header. (New in HTTP 1.1)
306 | Switch Proxy | No longer used in the latest version of the specification.
307 | Temporary Redirect | Like 302 (Found), but the client must repeat the request with the same method. Many browsers wrongly turn a POST into a GET when they follow a 302, which is strictly only correct for a 303. For this reason HTTP 1.1 added 307 to distinguish the cases more clearly: on a 303 response the browser fetches the new URL with GET whatever the original method was; on a 307 response the method and body must not change. (New in HTTP 1.1)
400 | Bad Request | The request contains a syntax error.
401 | Unauthorized | The client attempted to access a password-protected page without authorization. The response contains a WWW-Authenticate header; the browser displays a username/password dialog accordingly and then repeats the request with the appropriate Authorization header.
402 | Payment Required | Reserved for possible future needs.
403 | Forbidden | The resource is unavailable. The server understands the client's request but refuses to process it. Usually caused by the permission settings of files or directories on the server.
404 | Not Found | The resource at the specified location cannot be found. This is also a common response.
405 | Method Not Allowed | The request method (GET, POST, HEAD, DELETE, PUT, TRACE, etc.) is not applicable to the specified resource. (New in HTTP 1.1)
406 | Not Acceptable | The specified resource was found, but its MIME type is incompatible with the one specified by the client in the Accept header. (New in HTTP 1.1)
407 | Proxy Authentication Required | Similar to 401; the client must first be authorized by the proxy server. (New in HTTP 1.1)
408 | Request Timeout | The client did not issue a request within the time the server was prepared to wait. The client can repeat the same request later. (New in HTTP 1.1)
409 | Conflict | Usually related to PUT requests. The request cannot succeed because it conflicts with the current state of the resource. (New in HTTP 1.1)
410 | Gone | The requested document is no longer available, and the server does not know which address to redirect to. The difference from 404 is that 410 means the document has permanently left the specified location, while 404 means the document is unavailable for unknown reasons. (New in HTTP 1.1)
411 | Length Required | The server cannot process the request unless the client sends a Content-Length header. (New in HTTP 1.1)
412 | Precondition Failed | Some preconditions specified in the request header failed. (New in HTTP 1.1)
413 | Request Entity Too Large | The target document is larger than the server is currently willing to handle. If the server thinks it can handle the request later, it should provide a Retry-After header. (New in HTTP 1.1)
414 | Request URI Too Long | The URI is too long. (New in HTTP 1.1)
415 | Unsupported Media Type | For the current request method and resource, the entity submitted in the request is not in a format supported by the server, so the request is rejected.
416 | Requested Range Not Satisfiable | The server cannot satisfy the Range header specified by the client in the request. (New in HTTP 1.1)
417 | Expectation Failed | The expectation given in the Expect request header cannot be met by the server, or the server is a proxy and has clear evidence that the next node on the route cannot meet it.
421 | There are too many connections from your internet address | The number of connections from the client's IP address to the server exceeds the maximum the server allows. The IP address here usually means the client address as seen by the server (for example, the user's gateway or proxy server address), so the count may involve more than one end user. (This is a non-standard usage; the current HTTP specification assigns 421 to Misdirected Request.)
422 | Unprocessable Entity | The request is well formed but cannot be processed because it contains semantic errors. (RFC 4918 WebDAV)
423 | Locked | The resource is currently locked. (RFC 4918 WebDAV)
424 | Failed Dependency | The current request failed because of an error in an earlier request, for example PROPPATCH. (RFC 4918 WebDAV)
425 | Unordered Collection | Defined in the WebDAV Advanced Collections draft, but not present in the WebDAV Ordered Collections Protocol (RFC 3648).
426 | Upgrade Required | The client should switch to TLS/1.0. (RFC 2817)
449 | Retry With | A Microsoft extension, indicating that the request should be retried after the appropriate action has been performed.
500 | Internal Server Error | The server encountered an unexpected condition and cannot complete the client's request.
501 | Not Implemented | The server does not support the functionality needed to fulfil the request. For example, the client sent a PUT request that the server does not support.
502 | Bad Gateway | The server, acting as a gateway or proxy, accessed the next server to complete the request, but that server returned an invalid response.
503 | Service Unavailable | The server cannot respond because of maintenance or overload. For example, a servlet may return 503 when the database connection pool is full. The server can provide a Retry-After header when it returns 503.
504 | Gateway Timeout | Used by a server acting as a proxy or gateway to indicate that it could not get a timely response from the remote server. (New in HTTP 1.1)
505 | HTTP Version Not Supported | The server does not support the HTTP version specified in the request. (New in HTTP 1.1)
506 | Variant Also Negotiates | An extension from Transparent Content Negotiation (RFC 2295), indicating an internal configuration error on the server: the chosen variant resource is configured to engage in transparent content negotiation itself, so it is not a proper end point in the negotiation process.
507 | Insufficient Storage | The server cannot store the content needed to complete the request. This condition is considered temporary. (RFC 4918 WebDAV)
509 | Bandwidth Limit Exceeded | The server has reached its bandwidth limit. This is not an official status code, but it is still widely used.
510 | Not Extended | The policy required for accessing the resource has not been met. (RFC 2774)

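HTTP redirection

When a client accesses a URL, the server may for some reason tell the client to access a different URL instead. This is redirection.

The most common redirect status codes are 301 and 302. 301 is commonly called "permanent redirect" (Moved Permanently), and 302 is commonly called "temporary redirect" ("Moved Temporarily").

The "Location" field is a response field and can only appear in response messages. It is meaningful mainly together with a redirect status code such as 301 or 302, and it marks the URL that the server wants the client to be redirected to.

301 permanent redirect

Many clients remember the original URL, but the server no longer uses it. A request to it gets a 301 back, and when the browser sees 301 it knows that the original URL is "out of date" and makes suitable optimizations, such as refreshing its history and updating bookmarks. Next time it may go straight to the new URL and save the cost of another redirect.

302 temporary redirect

302 is commonly called "temporary redirect" ("Moved Temporarily"). It means that the original URL is in "temporary maintenance" and the new URL is a "temp worker" standing in for it.

When the browser sees 302, it considers the original URL still valid but temporarily unavailable. It simply jumps to the new page, does not record the new URL, takes no other action, and uses the original URL again next time.

For example, when a URL on the server is under temporary maintenance, the server can return a 302 status code.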
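The difference between the two codes can be sketched from the client's side. This is a toy model, not how any particular browser is implemented: the function next_url, the remembered dictionary, and the example URLs are all assumptions made for illustration.

```python
def next_url(status: int, headers: dict, original_url: str, remembered: dict) -> str:
    """Return the URL to request next; 'remembered' maps old URLs to new ones."""
    location = headers.get("Location")
    if status not in (301, 302) or location is None:
        return original_url
    if status == 301:
        # Permanent: remember the new URL so later visits skip the extra hop.
        remembered[original_url] = location
    # Temporary (302): jump once, but keep using the original URL next time.
    return location


remembered = {}
after_301 = next_url(301, {"Location": "https://new.example.com/"},
                     "http://old.example.com/", remembered)
after_302 = next_url(302, {"Location": "https://backup.example.com/"},
                     "http://shop.example.com/", remembered)
```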

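URL format

URI is the Uniform Resource Identifier and URL is the Uniform Resource Locator. A URL is one concrete form of URI.

URL format: scheme://host:port/path

  • scheme: the protocol name, indicating which protocol should be used to access the resource, such as http or https
  • After the scheme there must be the three specific characters ://, which separate the scheme from the rest
  • host:port: after :// comes the host where the resource resides, written as host name plus port number
  • path: the location of the resource on the host
  • With the protocol name, the host address and the port number, plus the path that marks where the resource is, the browser can connect to the server and access the resource.

Query parameters: scheme://host:port/path?key=value&key=value

  • The path is followed by a "?" separator, and what comes after the "?" is the query part, which consists of multiple key=value pairs joined with "&"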

When fetching an image, for example, you may want to request a specific size. "Protocol name + host name + path" alone cannot express this, so the URI has a "query" part after the path. It begins after a "?" (the "?" itself is not part of the query) and expresses additional requirements on the resource.

Fragment identifier: scheme://host:port/path?query#fragment

It is an "anchor" or "tag" inside the resource located by the URI. After obtaining the resource, the browser can jump directly to the position it indicates.

But fragment identifiers can only be used by clients such as browsers and are never seen by the server. In other words, the browser never sends a URI with a "#fragment" to the server, and the server never handles resource fragments in this way.
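The URL parts described above can be inspected with Python's standard urllib.parse module. The URL below is made up for illustration; the last step rebuilds what a client would actually put in the request line, which leaves the fragment out.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL, made up for illustration.
url = "https://www.example.com:8443/images/logo.png?width=200&height=100#top"
parts = urlsplit(url)

# The query is a list of key=value pairs joined with "&".
query = parse_qs(parts.query)

# The fragment stays on the client; only path and query go to the server.
request_target = parts.path + ("?" + parts.query if parts.query else "")
```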

URL encoding

Only ASCII codes can be used in the URI, but what if you want to use Chinese, Japanese and other languages ​​other than English in the URI?

Also, in some special URIs, characters that act as delimiters, such as "@", "&" and "?", appear inside the path or the query, which would cause URI parsing errors. What should we do in this case?

URI introduces an encoding mechanism that escapes non-ASCII characters and special characters, converting them into a form that does not conflict with the URI semantics.

The rules of URI escaping are a bit "simple and crude". They convert each non-ASCII or special character into its hexadecimal byte values and put a "%" in front of each byte.

For example, a space is escaped as "%20" and "?" is escaped as "%3F". Chinese, Japanese, etc. are usually encoded in UTF-8 first and then escaped. For example, "银河" (galaxy) is escaped as "%E9%93%B6%E6%B2%B3".
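These escapes can be reproduced with quote and unquote from Python's standard urllib.parse module. The manual version at the end is only there to show the mechanism (UTF-8 bytes, each written as "%" plus two hexadecimal digits); variable names are illustrative.

```python
from urllib.parse import quote, unquote

escaped_space = quote(" ")
escaped_question = quote("?")
escaped_cjk = quote("银河")  # encoded as UTF-8 first, then escaped byte by byte

# The same result built by hand: one "%XX" per UTF-8 byte.
manual = "".join(f"%{byte:02X}" for byte in "银河".encode("utf-8"))

# unquote reverses the escaping.
restored = unquote(escaped_cjk)
```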

Origin blog.csdn.net/lyabc123456/article/details/133254162