Introduction to HTTP: Learn what HTTP is in one article

Table of contents

What is the HTTP protocol

HTTP workflow

HTTP request message

HTTP response message

HTTP status code

Advantages of HTTP based on TCP protocol

Persistent connection and non-persistent connection

Statelessness and state management

Summary


HTTP (Hypertext Transfer Protocol) is one of the most widely used network protocols on the Internet. It defines how a client and a server exchange requests and responses over a connection, and specifies the format and process of transmitting data. This article explains HTTP in detail: its working principle, characteristics, and practical applications.

What is the HTTP protocol

HTTP is an application layer protocol: it serves a specific type of application, and its functionality is implemented by application programs running in user space. HTTP itself is only a protocol specification recorded in documents (the RFCs); what actually communicates over HTTP is a program that implements it, such as a browser or a web server.

HTTP workflow

The basic flow of HTTP requests and responses is as follows:

  1. Establish a TCP connection: The client establishes a TCP connection with the web server and communicates through a TCP socket.
  2. Send the HTTP request: The client sends an HTTP request message to the web server, consisting of the request line, request headers, a blank line, and request data.
  3. The server accepts the request and returns an HTTP response: The web server parses the request, locates the requested resource, and writes a copy of the resource to the TCP socket for the client to read. The HTTP response message consists of the status line, response headers, a blank line, and response data.
  4. Release the TCP connection: HTTP/1.0 uses short connections by default, that is, a new TCP connection is established for each HTTP operation and closed when the operation ends. HTTP/1.1 uses long (persistent) connections by default: the TCP connection is kept open so that multiple HTTP operations can share it.
  5. The client parses the content: After the client receives the HTTP response, the browser interprets the response data according to the Content-Type header in the response. For a web page this is usually HTML.
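
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `socket.socketpair()` stands in for the real TCP connection (so no network is needed), and `serve_one`, `run_exchange`, and the host name `example.test` are hypothetical names chosen for this example, not part of any real server.

```python
import socket


def serve_one(conn):
    """Toy server side: read one request and answer with a small HTML page."""
    request = conn.recv(4096).decode("ascii")
    # The request line is the first line: METHOD SP TARGET SP VERSION
    method, path, version = request.split("\r\n")[0].split(" ")
    body = f"<h1>Hello from {path}</h1>".encode("utf-8")
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    conn.sendall(head + body)   # step 3: write the response to the socket
    conn.close()                # step 4: short connection, closed after one exchange


def run_exchange():
    # Step 1: a connected pair of sockets stands in for the TCP connection.
    client, server = socket.socketpair()
    # Step 2: the client sends the request message.
    client.sendall(
        b"GET /index.html HTTP/1.1\r\n"
        b"Host: example.test\r\n"
        b"Connection: close\r\n"
        b"\r\n"
    )
    serve_one(server)
    # Step 5: the client reads until the server closes the connection.
    chunks = []
    while True:
        data = client.recv(4096)
        if not data:
            break
        chunks.append(data)
    client.close()
    return b"".join(chunks).decode("utf-8")


if __name__ == "__main__":
    print(run_exchange())
```

A real client would open the connection with `socket.create_connection((host, 80))` instead of a socket pair; the messages on the wire have the same shape.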

HTTP request message

An HTTP request message consists of a request line, request headers, a blank line, and request data.

  1. Request line: consists of the request method, the request URL, and the HTTP version. The request method represents the operation that the client wants the server to perform, such as GET or POST. The request URL is the path of the resource being requested, such as /index.html. The HTTP version is the protocol version used by the client, such as HTTP/1.1.
  2. Request headers: provide additional information about the request as key-value pairs. Common request headers include Content-Type, User-Agent, and Accept. For example, the Content-Type header indicates the MIME type of the request data, and the User-Agent header identifies the user agent that initiated the request, such as the name and version of the browser.
  3. Blank line: separates the request headers from the request data.
  4. Request data: contains the actual data to be sent to the server. For example, in a POST request, the request data contains the form data being submitted.

In short, the HTTP request message contains the client's requirements for the server to perform specific operations, as well as related additional information.
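
As a rough illustration of this layout, the following sketch splits a raw request into its four parts. `parse_request` is a hypothetical helper written for this article, not a library function, and it assumes a well-formed message with CRLF line endings; real parsers handle many more edge cases.

```python
def parse_request(raw: bytes) -> dict:
    """Split a raw HTTP/1.x request into request line, headers, and data."""
    # The blank line (CRLF CRLF) separates the headers from the request data.
    head, _, data = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Request line: METHOD SP URL SP VERSION
    method, url, version = lines[0].split(" ", 2)

    # Request headers: one "Name: value" pair per line; names are case-insensitive.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return {
        "method": method,
        "url": url,
        "version": version,
        "headers": headers,
        "data": data,
    }


form = b"user=alice&lang=en"
raw_request = (
    b"POST /login HTTP/1.1\r\n"
    b"Host: example.test\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: " + str(len(form)).encode("ascii") + b"\r\n"
    b"\r\n" + form
)
parsed = parse_request(raw_request)
```

Here `parsed["method"]` is the request method, `parsed["headers"]["content-type"]` the MIME type of the form data, and `parsed["data"]` the form data itself.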

HTTP response message

An HTTP response message consists of a status line, response headers, a blank line, and response data.

  1. Status line: consists of the HTTP version, the status code, and the status message. The HTTP version indicates the protocol version used by the server, and the status code is a three-digit code indicating the result of processing the request. Common status codes include 200 (success) and 404 (not found). The status message is a short human-readable description of the status code.
  2. Response headers: provide additional information about the response as key-value pairs. Common response headers include Content-Type and Content-Length. The Content-Type header indicates the MIME type of the response data, and the Content-Length header indicates the size of the response data in bytes.
  3. Blank line: separates the response headers from the response data.
  4. Response data: contains the actual data sent by the server to the client. For example, when a web page is requested, the response data is the HTML code that the client browser parses and displays.

In short, the HTTP response message contains the server's response to the client's request, as well as related additional information.
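
The same four-part structure can be assembled by hand. The sketch below is a minimal illustration, assuming HTTP/1.1 and only the two headers discussed above; `build_response` is a hypothetical helper name.

```python
def build_response(status_code: int, status_message: str,
                   data: bytes, content_type: str) -> bytes:
    """Assemble status line, response headers, blank line, and response data."""
    status_line = f"HTTP/1.1 {status_code} {status_message}\r\n"
    headers = (
        f"Content-Type: {content_type}\r\n"
        # Content-Length counts bytes, not characters.
        f"Content-Length: {len(data)}\r\n"
    )
    blank_line = "\r\n"
    return (status_line + headers + blank_line).encode("ascii") + data


page = "<h1>It works</h1>".encode("utf-8")
raw_response = build_response(200, "OK", page, "text/html; charset=utf-8")
```

Computing Content-Length from the encoded bytes matters as soon as the data contains non-ASCII characters, because one character may occupy several bytes.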

HTTP status code

An HTTP status code is a three-digit code that indicates how the web server handled a request. Status codes were defined by RFC 2616 and extended by RFC 2518, RFC 2817, RFC 2295, RFC 2774, and RFC 4918; RFC 2616 has since been superseded, and the current definitions are in RFC 9110.

The first digit of the status code identifies one of five response classes.

  1. 1xx: Informational status codes, indicating that the request has been received and processing continues.
  • 100: Continue, the server has received the initial part of the request and the client should continue sending the rest.
  • 101: Switching Protocols, the server agrees to switch to the protocol the client asked for in the Upgrade header.
  2. 2xx: Success status codes, indicating that the request has been successfully received, understood, and processed by the server.
  • 200: OK, the request succeeded, and the headers or data expected by the request are returned with this response.
  • 201: Created, the request succeeded and a new resource has been created.
  • 202: Accepted, the request has been accepted, but processing has not been completed.
  • 203: Non-Authoritative Information, the request succeeded, but the returned content was modified by an intermediary such as a transforming proxy rather than coming unchanged from the origin server.
  • 204: No Content, the request succeeded, but the server returns no content in the response.
  3. 3xx: Redirection status codes, indicating that further action must be taken to complete the request.
  • 300: Multiple Choices, the resource has several representations, and the server asks the client to choose one according to its preference.
  • 301: Moved Permanently, the requested resource has been permanently moved to a new URL.
  • 302: Found, the requested resource temporarily resides at a different URL.
  • 303: See Other, the response to the request can be found at another URL and should be retrieved with GET.
  • 304: Not Modified, the requested resource has not been modified, and the client can use its cached copy.
  • 305: Use Proxy, the requested resource must be accessed through a proxy (this code is deprecated).
  4. 4xx: Client error status codes, indicating that there is an error in the request sent by the client.
  • 400: Bad Request, the server cannot understand the request because it is malformed.
  • 401: Unauthorized, the request requires authentication.
  • 403: Forbidden, the server understood the request but refuses to fulfill it.
  • 404: Not Found, the server could not find the requested resource.
  • 405: Method Not Allowed, the requested resource does not support the HTTP method used in the request.
  • 406: Not Acceptable, the server cannot produce a response that matches the content negotiation headers (such as Accept) sent by the client (see RFC 7231 Section 5.3).
  • 408: Request Timeout, the server timed out while waiting for the client to send the request.
  • 418: I'm a teapot, a joke status code from RFC 2324, the April Fools' "Hyper Text Coffee Pot Control Protocol".
  5. 5xx: Server error status codes, indicating that an error occurred while the server was processing the request.
  • 500: Internal Server Error, the server encountered an unexpected error and could not complete the request.
  • 501: Not Implemented, the server does not support the functionality required to fulfill the request.
  • 502: Bad Gateway, the server, acting as a gateway or proxy, received an invalid response from the upstream server.
  • 503: Service Unavailable, the server is temporarily unable to handle the request, for example due to maintenance or heavy load.
  • 504: Gateway Timeout, the server, acting as a gateway or proxy, did not receive a response from the upstream server in time.

The first digit of all status codes defines the category of the response, and subsequent digits further refine the specific status of that category. These status codes are open standards and accepted worldwide.
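
Because the class is encoded in the first digit, a client can classify any code with integer division, even one it has never seen. The sketch below uses Python's standard `http.HTTPStatus` enum for the registered status messages; `STATUS_CLASSES` and `describe` are hypothetical names for this example.

```python
from http import HTTPStatus

# Response class by first digit, as listed above.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Return 'code message (class)' for a three-digit status code."""
    status_class = STATUS_CLASSES.get(code // 100, "unknown")
    try:
        message = HTTPStatus(code).phrase
    except ValueError:
        # Not in Python's table; the class is still known from the first digit.
        message = "unregistered"
    return f"{code} {message} ({status_class})"


for code in (200, 301, 404, 503, 299):
    print(describe(code))
```

Falling back to the class for unknown codes mirrors how HTTP clients are expected to behave: an unrecognized 2xx code is still treated as a success.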

Advantages of HTTP based on TCP protocol

The advantages of HTTP based on TCP protocol include the following aspects:

  1. Stable and reliable: TCP is a reliable transport protocol that delivers data complete and in order. A connection is established between the sender and the receiver, and reliable transmission is ensured through the handshake process, the acknowledgement mechanism, and the retransmission mechanism.
  2. Efficient data transmission: TCP uses techniques such as sliding windows, flow control, and congestion control to adapt the sending rate to the receiver and to the network. UDP provides none of these, so an application built directly on UDP would have to implement them itself.
  3. Support for cross-network transmission: TCP runs on top of IP, which routes data between different networks. This lets HTTP traffic cross network boundaries, and requests can be forwarded through gateways and reverse proxies such as nginx.
  4. Ease of implementation: Operating systems expose TCP through the socket interface, so an HTTP implementation does not have to deal with loss, ordering, or retransmission itself. This makes HTTP on top of TCP relatively easy to implement.
  5. Support for full-duplex communication: TCP is full-duplex, that is, the client and the server can both send data on the same connection at the same time. HTTP uses this two-way channel to carry requests in one direction and responses in the other.

To sum up, the advantages of building HTTP on TCP include reliability, efficient data transmission, support for cross-network transmission, ease of implementation, and full-duplex communication. These advantages have helped make HTTP one of the most widely used network protocols.

Persistent connection and non-persistent connection

Persistent and non-persistent connections differ in whether the TCP connection between the client and the server is kept open after a request (Request) and its response (Response) have been exchanged.

With a non-persistent connection, the connection is closed after each request and response. That is, the client sends a request, the server returns the response and then closes the connection, and the next request has to establish a new connection. The advantage is that resources are released promptly; the disadvantage is that re-establishing the connection for every request costs time and reduces efficiency.

With a persistent connection, multiple requests and responses are exchanged over a single TCP connection. The client can send several requests and the server answers each of them, until one side explicitly asks to close the connection or a preset idle timeout is reached. The advantage is that fewer connections have to be established, which improves efficiency; the disadvantage is that open connections occupy server resources for longer.

In general, the main difference between persistent and non-persistent connections is whether the TCP connection is kept open and reused across requests.
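
Connection reuse can be observed with Python's standard library. The sketch below is an illustration under assumptions: it starts a throwaway server on the loopback interface (the `EchoHandler` class and `fetch_twice_on_one_connection` function are hypothetical names), sends two requests through one `http.client.HTTPConnection`, and records the client's local port after each request. An unchanged port indicates that both requests travelled over the same TCP connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    # Speaking HTTP/1.1 lets the standard handler keep the connection open.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = f"you asked for {self.path}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        # A persistent connection needs Content-Length so the client
        # knows where one response ends and the next begins.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_twice_on_one_connection():
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    try:
        results = []
        for path in ("/a", "/b"):
            conn.request("GET", path)
            response = conn.getresponse()
            body = response.read().decode("utf-8")
            local_port = conn.sock.getsockname()[1]
            results.append((response.status, body, local_port))
        return results
    finally:
        conn.close()
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for status, body, port in fetch_twice_on_one_connection():
        print(status, body, "client port:", port)
```

Sending a `Connection: close` header with the request would turn this back into the non-persistent behaviour: the server would close the connection after the first response.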

Statelessness and state management

Statelessness and state management are important concepts in network communication. Stateless means that the protocol has no memory of earlier transactions: it does not keep the information submitted by the client, so once the server has returned a response, everything about that transaction is forgotten. If the user sends a new request, the server has no way of knowing from the protocol alone whether it is related to the previous one.

HTTP (Hypertext Transfer Protocol) is an example of a stateless protocol. It runs on top of the connection-oriented TCP protocol in the TCP/IP stack and works through requests from the client and responses from the server. HTTP itself does not retain any association between one request or response and the next; every request and response is independent.

The opposite of this is state management, which refers to maintaining information about the state of an application or system over a period of time. State management is necessary in many situations. For example, in processing user login, shopping cart, session management and other scenarios, it is necessary to save the user's state information.

Implementing state management on top of a stateless protocol is a challenge. A common method is session tracking (Session Tracking): the server saves the state information on its side and associates different requests with it through an identifier such as a session ID. In this way, even though each request is independent and stateless, the server can still manage the user's session through the state it has saved.

Another way is to use cookies. The server sends a cookie to the client in a Set-Cookie response header, and the client stores it and includes it in the Cookie header of later requests. The cookie carries state information, or an identifier such as the session ID, that the server can recognize. Each time the client sends the cookie again, the server can identify the user and restore the previous state information.

In general, statelessness means that the protocol does not keep a history of transaction processing, while state management refers to how to maintain the state information of the application or system over a period of time. Implementing state management in a stateless protocol requires the help of other technologies, such as session tracking or cookies.
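
The two techniques are often combined: the state stays on the server, and a cookie carries only the session ID. The sketch below shows that idea without any networking. It is a simplified illustration: `SESSIONS`, `handle_request`, and the cookie name `session_id` are assumptions made for this example, and a real application would also expire sessions and secure the cookie.

```python
import secrets
from http.cookies import SimpleCookie

# Server-side session store: session ID -> state for that user.
SESSIONS = {}


def handle_request(cookie_header: str = ""):
    """Process one request, given the value of its Cookie header.

    Returns (visit_count, set_cookie), where set_cookie is the value of a
    Set-Cookie header to send back, or None if the client is already known.
    """
    cookie = SimpleCookie(cookie_header)
    morsel = cookie.get("session_id")
    session_id = morsel.value if morsel is not None else None

    set_cookie = None
    if session_id not in SESSIONS:
        # Unknown client: create a session and hand out its ID as a cookie.
        session_id = secrets.token_hex(16)
        SESSIONS[session_id] = {"visits": 0}
        set_cookie = f"session_id={session_id}; HttpOnly; Path=/"

    SESSIONS[session_id]["visits"] += 1
    return SESSIONS[session_id]["visits"], set_cookie


# First request: no cookie yet, so the server issues one.
visits, set_cookie = handle_request()
# The client stores the cookie and sends its name=value part back.
visits_again, _ = handle_request(set_cookie.split(";")[0])
```

Each call is as independent as an HTTP request; the only link between them is the session ID that the client returns, which is what lets the second call see the visit count from the first.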

Summary

In this article, we introduced the basic concepts, characteristics, and workflow of HTTP. By looking at the structure and content of HTTP request and response messages, the meaning of status codes, the advantages of building on TCP, persistent and non-persistent connections, and statelessness and state management, we gained a deeper understanding of how HTTP works in practice. In short, HTTP is one of the most important protocols on the Internet, and it is of great significance to the fields of Web development, network management, and security.


Origin blog.csdn.net/wq2008best/article/details/132710740