Computer Networks - HTTP

1. Introduction to HTTP

HTTP (Hypertext Transfer Protocol) is an application layer protocol used to transmit data between a client and a server. It is one of the most widely used protocols on the Internet and is most often used for communication between web browsers and web servers.

HTTP is a "convention and specification" for "transmitting" text, pictures, audio, video and other "hypertext" data between "two points" in the computer world.

HTTP adopts the client-server model: the client sends an HTTP request to the server, and the server receives and processes the request and returns an HTTP response. An HTTP request consists of three parts: the request line, the request headers, and the request body. An HTTP response likewise consists of three parts: the status line, the response headers, and the response body.

The HTTP protocol defines a large number of request methods and status codes. Common request methods include GET, POST, PUT, DELETE, etc. Common status codes include 200 OK, 301 Moved Permanently, 404 Not Found, 500 Internal Server Error, etc.

HTTP is built on the TCP/IP protocol suite. HTTP/1.x and HTTP/2 use TCP as the transport layer protocol to ensure reliable data transmission (HTTP/3 instead runs over QUIC, which is built on UDP). HTTP also has security risks, such as plaintext transmission and man-in-the-middle attacks, so corresponding security measures need to be taken to ensure network security.

In recent years, as web applications have developed and Internet use has spread, newer versions of the protocol such as HTTP/2 and HTTP/3 have gradually been adopted to improve network transmission efficiency and security.

2. HTTP protocol format

2.1 HTTP request format

An HTTP request consists of three parts: the request line, the request headers, and the request body:

Method Request-URL HTTP-Version
Header
...
(blank line)
Body

  • Request method (Method) : such as GET, POST, PUT, DELETE, etc.

  • Request-URL : The Uniform Resource Locator (URL) of the requested resource, specifying the resource path to be accessed.

  • HTTP version (HTTP-Version) : usually "HTTP/1.1" (HTTP/2 does not use this textual request line; it encodes the same information in binary frames).

  • Request headers (Headers) : Contain various header information of the request, such as User-Agent, Host, Accept, etc., expressed in the form of key-value pairs.

  • Request body (Body) : A POST request may carry an entity body, which is used to transmit data. A GET request normally has no request body.

Description: The three fields of the request line are separated by spaces, and the request line and every header line end with a line break (CRLF, that is, carriage return + line feed). Because the number of headers is not fixed, a blank line after the last header marks the end of the headers.
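
As a minimal sketch of the layout described above, the following Python assembles a raw HTTP/1.1 request message as a string. The helper name `build_request` and the sample host and path are assumptions chosen for illustration; they do not come from any library.

```python
CRLF = "\r\n"


def build_request(method, url, headers, body=""):
    """Assemble a raw HTTP/1.1 request message (hypothetical helper)."""
    # Request line: three fields separated by single spaces.
    request_line = f"{method} {url} HTTP/1.1"
    # Each header is "Name: value" on its own line.
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; one extra CRLF (a blank line) ends the headers.
    head = CRLF.join([request_line, *header_lines]) + CRLF + CRLF
    return head + body


message = build_request("GET", "/index.html", {"Host": "www.example.com"})
print(repr(message))
```

Printing with `repr` makes the CRLF sequences visible, including the blank line that separates the headers from the (here empty) body.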

2.2 HTTP response format

An HTTP response consists of three parts: the status line, the response headers, and the response body:

HTTP-Version Status-Code Reason-Phrase
Header
...
(blank line)
Body

  • HTTP version (HTTP-Version) : The HTTP protocol version of the response, normally the same as that in the request message.

  • Status-Code : Indicates the result status code of the request processing, for example, 200 means success, 404 means resource not found, 500 means internal server error, etc.

  • Status Code Description (Reason-Phrase) : A brief description of the status code.

  • Response headers (Headers) : Contains various header information of the response, such as Server, Content-Type, Content-Length, etc., expressed in the form of key-value pairs.

  • Response body (Body) : The entity body of the response message, used to transmit the response data. For example, for HTML pages or JSON data, etc.

Note: The three fields of the status line are separated by spaces, and the status line and every header line end with a line break (CRLF, that is, carriage return + line feed). Because the number of headers is not fixed, a blank line after the last header marks the end of the headers.

Example:

#####Request message#####
GET /index.html HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36
Accept: text/html,application/xhtml+xml

#####Response message#####
HTTP/1.1 200 OK
Server: Apache/2.4.41 (Unix)
Content-Type: text/html; charset=utf-8
Content-Length: 1270

<!DOCTYPE html>
<html>
<head>
    <title>Welcome to Example.com</title>
</head>
<body>
    <h1>Hello, World!</h1>
</body>
</html>
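
To make the structure of the response message concrete, here is a small Python sketch that splits a raw response into its status line, headers, and body. The function name `parse_response` is hypothetical, and the sketch assumes the message is already a decoded string with CRLF line endings; a real client would work on bytes and handle many more edge cases.

```python
def parse_response(raw):
    """Split a raw HTTP/1.x response into its parts (hypothetical helper)."""
    # The first blank line separates the head (status line + headers) from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # Status line: version, status code, reason phrase (the phrase may contain spaces).
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return version, int(code), reason, headers, body


raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "\r\n"
    "<h1>Hello, World!</h1>"
)
version, code, reason, headers, body = parse_response(raw)
print(version, code, reason, headers["Content-Type"])
```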

3. HTTP header

An HTTP header is the part of an HTTP request or response message that carries additional metadata. The header section consists of multiple fields; each field consists of a name and a value, separated by a colon and a space. HTTP headers can be divided into two types: request headers and response headers.

3.1 Request header

The request header contains additional information about the HTTP request sent by the client to the server. Common request header fields include:

  • User-Agent : The client's user agent, identifying the browser or other client type.

  • Accept : Indicates the response content types that the client is able to handle.

  • Host : The host name or IP address of the requested target server.

  • Referer : Indicates from which URL the request is initiated, often used to track the source of the user.

  • Content-Type : Indicates the MIME type of the request body.

  • Authorization : Contains the credentials used for authentication, such as username and password.

3.2 Response header

The response header contains additional information about the HTTP response sent by the server to the client. Common response header fields include:

  • Content-Type : Indicates the MIME type of the response body.

  • Content-Length : Indicates the length of the response body in bytes.

  • Cache-Control : Instructs the client how to cache the response body.

  • Set-Cookie : Instructs to store a cookie on the client.

  • Expires : Indicates when the response expires.

  • Last-Modified : Indicates when the response body was last modified.

4. HTTP status code

HTTP status codes are divided into five categories, namely:

  • 1xx : Informational status code, indicating that the server has received the request and is processing it.

  • 2xx : Success status code, indicating that the server has successfully received the request and processed it.

  • 3xx : Redirection status code, indicating that further operations by the client are required to complete the request.

  • 4xx : Client error status code, indicating that the client's request has an error or cannot be completed.

  • 5xx : Server error status code, indicating that the server encountered an error while processing the request.
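
Since the category is determined by the first digit of the code, the classification can be sketched in a few lines of Python. The `CATEGORIES` table and the `status_category` function are names made up for this illustration; `http.HTTPStatus` is the standard library enumeration and is used here only to look up the reason phrase.

```python
from http import HTTPStatus

CATEGORIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_category(code):
    """Return the category of a status code based on its first digit."""
    return CATEGORIES.get(code // 100, "unknown")


for code in (200, 301, 404, 503):
    # HTTPStatus(code).phrase is the standard reason phrase for that code.
    print(code, HTTPStatus(code).phrase, "->", status_category(code))
```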

The following are the meanings of common HTTP status codes:

  • 200 OK : The request succeeded; the server processed it and returned a response.

  • 201 Created : The request succeeded; the server created the resource and returned a response.

  • 204 No Content : The request succeeded and the server processed it, but the response carries no body.

  • 206 Partial Content : Used for range requests, such as segmented downloads or resumable transfers. It indicates that the body returned in the response is only part of the resource rather than all of it; it is also a status of successful processing by the server.

  • 301 Moved Permanently : The requested resource has been permanently moved to a new URL.

  • 302 Found : The requested resource has been temporarily moved to a new URL.

  • 304 Not Modified : Does not mean a jump to another URL. It indicates that the resource has not been modified and that the client can continue to use its cached copy. This is sometimes called cache redirection and is used for cache control.

  • 400 Bad Request : The client request has an error and the server cannot understand the request.

  • 401 Unauthorized : The request requires authentication, but the client did not provide valid authentication information.

  • 403 Forbidden : The server refuses the request because the client does not have permission to access the resource.

  • 404 Not Found : The requested resource does not exist on the server.

  • 500 Internal Server Error : The server encountered an error while processing the request.

  • 501 Not Implemented : Indicates that the function requested by the client is not yet supported, similar to "opening soon, please look forward to it".

  • 502 Bad Gateway : It is usually an error code returned by the server as a gateway or proxy, indicating that the server itself is working normally, and an error occurred when accessing the back-end server.

  • 503 Service Unavailable : It means that the server is currently busy and cannot respond to the client temporarily, similar to the meaning of "the network service is busy, please try again later".

5. HTTP GET and POST requests

5.1 General

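  • For a GET request, the browser sends the HTTP headers and data together, and the server responds with 200 (returning the data).

  • For a POST request, some clients send the headers first, wait for the server to respond with 100 Continue, and then send the data, after which the server responds with 200 OK (returning the data). This two-step exchange depends on the client (it is tied to the Expect: 100-continue request header) and is not required by HTTP itself.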

5.2 Differences

  1. Transfer method:

    • GET: Data is transferred through the URL, so the parameters and their values are visible in the URL. The data is appended after the URL, separated from it by a question mark (?), with parameters joined by &. Because the data is exposed in the URL, GET is not suitable for transferring sensitive information or large amounts of data.

    • POST: Data is transmitted in the message body of the HTTP request and is not exposed in the URL, so POST is more suitable for transmitting sensitive information and large amounts of data.

  2. Request length limit:

    • GET: Since the data is appended to the URL, and browsers and servers limit the length of a URL (usually to several thousand characters), the data length of a GET request is also limited in practice.

    • POST: Since the data is transmitted in the message body, a large amount of data can be sent, and generally there is no length limit.

  3. Security:

    • GET: Because the data is exposed in the URL, it is not suitable for transmitting sensitive information such as passwords. Since the parameters and values are visible, they may easily be seen or intercepted by others.

    • POST: Since the data is in the message body rather than the URL, it is less exposed than in a GET request and more suitable for transmitting sensitive information. Over plain HTTP, however, the body is still sent in clear text.

  4. Request idempotence:

    • GET: GET requests are idempotent, that is, repeating the same request has the same effect on the server as sending it once, and a GET should not modify server data.

    • POST: POST requests are not idempotent. Multiple POST requests to the same URL may produce different results, because operations such as creating resources or submitting data may be involved.

  5. Request semantics:

    • GET: It is generally used to obtain data and resources, and should not be used for operations that affect the server.

    • POST: Generally used to submit data, create resources, or perform modification operations on the server.
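
The difference in where the data travels can be sketched with Python's standard `urllib`. The URL and the form fields below are hypothetical, and the requests are only constructed, never sent, so no network access is involved.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"user": "alice", "page": "2"}  # hypothetical form fields
encoded = urlencode(params)  # "user=alice&page=2"

# GET: parameters are appended to the URL after "?".
get_req = Request("http://www.example.com/search?" + encoded)

# POST: the same parameters travel in the request body instead.
post_req = Request("http://www.example.com/search", data=encoded.encode("ascii"))

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

`urllib.request.Request` chooses POST by default whenever `data` is supplied, which is why no explicit method is passed here.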

6. HTTP cache

HTTP caching refers to the technique of storing a copy of a requested resource in a web browser or proxy server and using that copy the next time the same resource is requested to avoid repeated requests and speed up page loading.

6.1 HTTP caching process

The workflow of HTTP caching is as follows:

  1. When a browser or proxy server requests a resource for the first time, the server will return the response of the resource and add cache-related fields in the response header, such as Cache-Control, Expires, Last-Modified, ETag, etc.

  2. The browser or proxy server will store the response in a local cache and generate a unique identifier (such as a URL or ETag) for the resource so that it can be matched the next time it is requested.

  3. When the browser or proxy server requests the same resource next time, it will first check whether there is a copy of the resource in the local cache, and judge whether the copy is still valid according to the fields related to the cache. If valid, read the copy directly from the local cache and return; otherwise, re-request the resource from the server.
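
The three steps above can be sketched as a tiny in-memory cache in Python. Everything here is an assumption made for illustration: the `CacheEntry` class, the `fetch` function, and the `origin` callable that stands in for the server and returns a body plus a validity period in seconds. Time is passed in explicitly so the behaviour is easy to follow.

```python
class CacheEntry:
    """A stored response plus the time it was stored (hypothetical structure)."""

    def __init__(self, body, max_age, stored_at):
        self.body = body
        self.max_age = max_age      # validity period in seconds
        self.stored_at = stored_at  # time at which the response was cached

    def is_fresh(self, now):
        # The copy is valid while its age is below the validity period.
        return (now - self.stored_at) < self.max_age


cache = {}


def fetch(url, now, origin):
    """Return (body, source); `origin` is a callable standing in for the server."""
    entry = cache.get(url)
    if entry is not None and entry.is_fresh(now):
        return entry.body, "cache"
    # No copy, or the copy has expired: ask the server and store the result.
    body, max_age = origin(url)
    cache[url] = CacheEntry(body, max_age, now)
    return body, "server"


def origin(url):
    # Pretend server: every resource is valid for 60 seconds.
    return "<html>...</html>", 60


print(fetch("/index.html", 0, origin))   # first request goes to the server
print(fetch("/index.html", 30, origin))  # still valid: served from the cache
print(fetch("/index.html", 90, origin))  # expired: requested from the server again
```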

6.2 HTTP cache method

Common HTTP caching strategies are as follows:

  1. Mandatory caching : When a browser or proxy server requests a resource for the first time, the server adds a Cache-Control or Expires field to the response header to specify how long the resource stays valid in the client cache. During the validity period, the browser or proxy server reads the resource directly from the local cache and does not send a request to the server.

  2. Negotiated caching : When the mandatory cache has expired, the browser or proxy server sends a conditional request to the server to check whether the resource has been updated. In its responses the server includes a Last-Modified or ETag field to indicate the version of the resource. The client stores these values locally and sends them back on the next request (in the If-Modified-Since or If-None-Match request header, respectively). If the server finds that the resource has not been updated, it returns a 304 Not Modified response and the client uses the copy in its local cache; otherwise it returns the updated resource.
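
Negotiated caching with an ETag might look roughly like the following Python sketch of the server side. The functions `make_etag` and `handle_get` are hypothetical, deriving the ETag from a hash of the body is just one possible scheme, and the comparison is a simple exact match that ignores weak validators and lists of tags.

```python
import hashlib


def make_etag(body):
    """Derive a validator from the body (one possible scheme, assumed here)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_get(body, request_headers):
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # Resource unchanged: no body is sent, the client reuses its copy.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


resource = b"<h1>Hello, World!</h1>"

# First request: the client has no validator yet, so the full body comes back.
status, headers, _ = handle_get(resource, {})
print(status)  # 200

# Later request: the client sends back the ETag it stored.
status, _, _ = handle_get(resource, {"If-None-Match": headers["ETag"]})
print(status)  # 304

# After the resource changes, the old ETag no longer matches.
status, _, _ = handle_get(b"<h1>Updated</h1>", {"If-None-Match": headers["ETag"]})
print(status)  # 200
```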

7. Problems with HTTP

The HTTP protocol has the following problems:

  1. Security issues : The data transmitted by the HTTP protocol is in clear text, so a third party can easily intercept and read it, and the confidentiality of data transmission cannot be guaranteed. In addition, HTTP by itself does not verify the identity of the communicating parties, so an attacker can easily impersonate one of them to carry out network attacks.

  2. Data integrity problem : The data transmitted by the HTTP protocol does not have integrity protection and is easily tampered and modified by a third party, so the integrity of data transmission cannot be guaranteed.

  3. Performance issues : In HTTP/1.x the headers are sent as uncompressed text, so messages carry a relatively large amount of data and transmission efficiency is relatively low. At the same time, dynamic content is often hard to cache, which can slow down the website's response.

  4. Unable to handle a large number of concurrent requests : The HTTP/1.0 protocol adopts the short connection mode (HTTP/1.1 defaults to the long connection mode) , each request needs to establish a connection, and the processing efficiency for a large number of concurrent requests is low.

To sum up, the HTTP protocol has problems with security, data integrity, performance, and concurrent request processing, and these problems are particularly prominent in modern Internet application scenarios. To address the security and integrity problems, modern Internet applications generally use the HTTPS protocol to protect network communication.

Origin blog.csdn.net/m0_62573214/article/details/132090058