Demystifying the HTTP Protocol: The Code Behind the Internet and How Data Is Transmitted

HTTP (Hypertext Transfer Protocol) is a protocol for transmitting data on the Web and one of the most important application-layer protocols on the Internet. Since its inception, HTTP has served as the communication bridge connecting the world and has played a major part in the growth and popularization of the Internet. This article takes an in-depth look at the origins of HTTP, how it works, its main characteristics, and its impact on the modern Web.

1. Origin and development

HTTP was first proposed by Tim Berners-Lee in 1989 as the communication protocol for linking documents on the World Wide Web. Its original goal was to transfer hypertext (HTML) documents, connected by hyperlinks, between clients and servers. As the Internet grew, HTTP kept evolving and improving.

In 1991, the first documented version of HTTP, later named HTTP/0.9, appeared. It was very simple and could only transfer HTML documents. In 1996, HTTP/1.0 was released. It added features to support resources in multiple formats, such as images, style sheets, and script files. However, as web applications and the Internet kept expanding, the performance of HTTP/1.0 became a bottleneck. To address this, HTTP/1.1 was released in 1997. It introduced persistent connections, pipelined requests, and the Host header field, which significantly improved page load speed and user experience.

HTTP continues to develop today. The most recent versions are HTTP/2 (released in 2015) and HTTP/3 (released in 2022), which further improve performance, security, and parallelism and are gradually becoming the mainstream versions of the protocol.

2. Working principle

HTTP is a stateless protocol: each request is independent, and the server does not retain state from previous requests. HTTP uses a client-server model in which the client sends a request and the server processes it and returns a response.

An HTTP exchange follows these steps:

  1. Establish a connection: The client (usually a web browser) opens a TCP connection to the server, which gives both sides a two-way communication channel.
  2. Send the request: The client sends an HTTP request to the server. The request consists of the request line (request method, URL, and HTTP version), the request headers, and a request body (for requests that carry content, such as POST).
  3. Process the request: The server receives and parses the request and acts on its content, which may involve reading a database, running business logic, and so on.
  4. Send the response: The server packages the result as an HTTP response, which consists of a status line (HTTP version, status code, and status text), the response headers, and a response body.
  5. Close the connection: In HTTP/1.0, or when Connection: close is sent, the TCP connection is closed once the response has been sent, and the request-response cycle is complete. With HTTP/1.1 persistent connections, the connection may instead stay open for further requests.
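
The five steps above can be sketched in Python using only the standard library. This is a minimal illustration, not a production client: HelloHandler and fetch_once are hypothetical names invented for this example, and the server is a throwaway one started on 127.0.0.1. Python's http.server speaks HTTP/1.0 by default, so it closes the connection after the response, which matches step 5.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler used only for this demonstration."""

    def do_GET(self):
        body = b"Hello, World!"
        self.send_response(200)                       # step 4: status line
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()                            # blank line after the headers
        self.wfile.write(body)                        # response body

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_once(path="/"):
    """Run one request-response cycle against a throwaway local server."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: the OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        # Step 1: establish the TCP connection.
        with socket.create_connection((host, port), timeout=5) as sock:
            # Step 2: request line, headers, then the terminating blank line.
            request = (
                f"GET {path} HTTP/1.1\r\n"
                f"Host: {host}:{port}\r\n"
                "Connection: close\r\n"
                "\r\n"
            )
            sock.sendall(request.encode("ascii"))
            # Steps 3 and 4 happen on the server. Step 5: read until it closes.
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks).decode("iso-8859-1")
    finally:
        server.shutdown()
        server.server_close()
```

Calling print(fetch_once()) shows the raw status line, headers, blank line, and body exactly as they travel over the socket.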

3. Characteristics and influence

The HTTP protocol has the following characteristics, which have had a profound impact on modern web applications and the Internet:

  1. Connectionless and stateless: HTTP is often described as connectionless because, in its original design, each request-response pair used its own connection, which was closed afterwards. HTTP itself runs over connection-oriented TCP, and HTTP/1.1 reuses connections by default. It is also stateless: the server does not keep client state between requests, and each request is independent of the others, which helps scalability and flexibility.
  2. Flexibility: HTTP can transfer many types of resources, such as text, images, and video, which lets web applications present rich and varied content.
  3. Based on the request-response model: The client requests data or an operation from the server and receives the server's response. This model keeps interaction between clients and servers simple.
  4. Layered architecture: HTTP's layered design allows intermediaries such as proxy servers and caches to optimize network transfer, improving performance and response time.
  5. Status codes: HTTP status codes describe the result of processing a request. For example, 200 means success, 404 means not found, and 500 means a server error. They are useful for diagnosing and debugging web applications.

4. Composition of the protocol

The HTTP specification is easiest to understand through sample messages. The samples below show the format of an HTTP request and an HTTP response.

Sample HTTP request:

GET /hanko HTTP/1.1
Host: www.hanko.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36
Accept: text/html,application/xhtml+xml
Accept-Language: en-US,en;q=0.5
Connection: keep-alive
  • The first line is the request line, which contains the request method (GET), the path of the requested resource (/hanko), and the HTTP version (HTTP/1.1).
  • Next come the request headers, which carry additional information about the request, such as Host (the target host), User-Agent (the client's browser information), and Accept (the response content types the client accepts).
An Alibaba interview once asked: what does it mean when the HTTP Connection header is keep-alive?
When the Connection field of an HTTP request header is set to keep-alive, the client is asking for a persistent connection with the server. A persistent connection lets multiple HTTP requests and responses travel over the same TCP connection instead of opening a new one for each request, which reduces connection setup and teardown overhead and improves performance. In HTTP/1.1, connections are persistent by default.
Besides keep-alive, the Connection field can carry other values:
  • close: The client or server wants to close the TCP connection after the current request and response, that is, no persistent connection is used.
  • Upgrade: Used for protocol upgrades. When Connection contains Upgrade, the client wants to switch to another protocol, such as WebSocket, named in the accompanying Upgrade header.
  • connection-token: Not a literal value but a placeholder for any other connection option, typically the name of a header field that applies only to the current connection and that proxies must not forward.

Although it is named "keep-alive", an HTTP persistent connection is not held open forever. It simply lets several requests and responses share one TCP connection, and either side can still close it, for example after an idle timeout.
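
To make connection reuse visible, here is a small sketch built on Python's http.client and http.server. The names PortEchoHandler and client_ports_seen are hypothetical. The handler replies with the client's TCP source port, so requests that share one connection report the same port, while requests that each open a new connection will typically report different ones.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class PortEchoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: replies with the client's TCP source port."""

    protocol_version = "HTTP/1.1"  # lets http.server keep connections open

    def do_GET(self):
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        # Content-Length tells the client where the body ends, so the
        # connection can be reused for the next request.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def client_ports_seen(reuse_connection, requests=3):
    """Send several GETs and return the source port the server saw for each."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), PortEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    ports = []
    try:
        if reuse_connection:
            # One TCP connection carries every request (persistent connection).
            conn = http.client.HTTPConnection(host, port, timeout=5)
            for _ in range(requests):
                conn.request("GET", "/")
                ports.append(conn.getresponse().read().decode("ascii"))
            conn.close()
        else:
            # A new TCP connection per request, closed after each response.
            for _ in range(requests):
                conn = http.client.HTTPConnection(host, port, timeout=5)
                conn.request("GET", "/", headers={"Connection": "close"})
                ports.append(conn.getresponse().read().decode("ascii"))
                conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return ports
```

With reuse_connection=True the returned list contains the same port three times. With reuse_connection=False the operating system usually hands out a fresh port for each connection.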

Sample HTTP response:

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1234
Server: Apache/2.4.41 (Ubuntu)
Date: Thu, 20 Jul 2023 12:34:56 GMT

<!DOCTYPE html>
<html>
<head>
    <title>Example Page</title>
</head>
<body>
    <h1>Hello, World!</h1>
    <p>This is an example page.</p>
</body>
</html>
  • The first line is the status line, which contains the HTTP version (HTTP/1.1), the status code (200), and the status text (OK).
  • Next come the response headers, which carry additional information about the response, such as Content-Type (the type of the response content), Content-Length (the length of the response content), and Server (server information).
  • A blank line separates the response headers from the response body and marks the end of the headers. The body that follows is the content actually returned to the client. In this example, the response body is a simple HTML page.
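
The structure described above (status line, headers, blank line, body) can be pulled apart with a few lines of Python. parse_response is a hypothetical helper written for this article. It is a simplified sketch that ignores repeated headers and chunked bodies, which a real parser would have to handle.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.x response into its parts (simplified sketch)."""
    # The first blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: HTTP version, status code, status text.
    version, status_code, reason = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lower case.
        headers[name.strip().lower()] = value.strip()

    return version, int(status_code), reason, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 22\r\n"
    b"\r\n"
    b"<h1>Hello, World!</h1>"
)
version, code, reason, headers, body = parse_response(raw)
```

Here Content-Length is 22 because the sample body is exactly 22 bytes long, so int(headers["content-length"]) equals len(body).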

1. Request header

Common request headers:

  • Host: Specifies the host name and, optionally, the port number of the target server.
  • User-Agent: Identifies the user agent sending the request, usually with a browser identification string.
  • Accept: Specifies the response media types the client can accept, such as text/html, image/png, or application/json.
  • Accept-Language: Specifies the natural languages the client prefers, which is used for internationalization.
  • Authorization: Carries the credentials used for authentication.
  • Cookie: Sends cookies previously stored by the client back to the server, typically to carry session information.
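
Headers such as Accept and Accept-Language carry weighted preferences, as in the Accept-Language: en-US,en;q=0.5 line of the sample request. The sketch below shows one way a server might read such a value. parse_accept_language is a hypothetical helper, and it deliberately skips edge cases such as wildcard matching.

```python
def parse_accept_language(header: str):
    """Return (language tag, weight) pairs, most preferred first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0]
        if not tag:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable weight as "not wanted"
        prefs.append((tag, q))
    # sorted() is stable, so tags with equal weight keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)
```

For the sample header, parse_accept_language("en-US,en;q=0.5") returns [("en-US", 1.0), ("en", 0.5)], meaning en-US is preferred and plain en is an acceptable fallback.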

2. Response header

Common response headers:

  • Content-Type: Specifies the media type of the response content, such as text/html or application/json.
  • Content-Length: Specifies the length of the response content in bytes.
  • Location: Used for redirection; specifies the new URL.
  • Set-Cookie: Sets a cookie on the client, typically to maintain session state.
  • Cache-Control: Specifies the caching policy for the response, such as no-cache or max-age.
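
As an illustration of how a client might read one of these headers, the sketch below splits a Cache-Control value into its directives. parse_cache_control is a hypothetical helper, and it assumes no directive value contains a comma.

```python
def parse_cache_control(header: str):
    """Turn 'public, max-age=3600' into {'public': True, 'max-age': '3600'}."""
    directives = {}
    for part in header.split(","):
        name, sep, value = part.strip().partition("=")
        if not name:
            continue
        # Directives without '=' (such as no-cache) are simple flags.
        directives[name.lower()] = value.strip('"') if sep else True
    return directives
```

A cache could then check for "no-cache" in the result, or convert directives["max-age"] to an integer number of seconds.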

3. Status code

An HTTP status code is a three-digit number that tells the client how the server handled the request. Status codes fall into five classes, identified by their first digit, and each class has a specific meaning. Common status codes and their meanings are listed below:

1xx (Informational): The request has been received and processing continues.

  • 100 Continue: The server has received the headers of the request, and the client should continue sending the rest of the request.

2xx (Successful): Indicates that the request has been successfully received, understood and processed by the server.

  • 200 OK: The request was successful, and the server has successfully processed the request.
  • 201 Created: The request was successful and a new resource was created on the server.
  • 204 No Content: The request succeeded, but the response has no body. It is commonly used for successful DELETE requests.

3xx (Redirection): Indicates that further action is required to complete the request.

  • 301 Moved Permanently: The requested resource has been permanently moved to a new URL, and the client should use the new URL for future requests.
  • 302 Found: The requested resource is temporarily available at a different URL, and the client should keep using the original URL for future requests.
  • 304 Not Modified: In response to a conditional request, the resource has not been modified, so the client can use its cached copy directly.

4xx (Client Error): Indicates that an error occurred on the client side and the request could not be completed.

  • 400 Bad Request: The request is malformed and the server cannot understand it.
  • 401 Unauthorized: The request requires authentication, and the client needs to provide valid authentication information.
  • 403 Forbidden: The server understands the request, but refuses to execute it.
  • 404 Not Found: The requested resource does not exist, and the server did not find the requested URL.

5xx (Server Error): Indicates that an error occurred on the server and the request could not be completed.

  • 500 Internal Server Error: The server has an internal error and cannot complete the request.
  • 502 Bad Gateway: The server, acting as a gateway or proxy, received an invalid response from the upstream server.
  • 503 Service Unavailable: The server is temporarily unavailable, usually due to overload or maintenance.
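
Because the class of a status code is determined by its first digit, classifying a code takes only an integer division. The sketch below pairs that with the standard phrases from Python's http.HTTPStatus. STATUS_CLASSES and describe_status are hypothetical names chosen for this example.

```python
from http import HTTPStatus

# The first digit of the code selects one of the five classes.
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe_status(code: int) -> str:
    """Return a readable description such as '404 Not Found: Client Error'."""
    category = STATUS_CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes that are not registered in HTTPStatus have no standard phrase.
        phrase = "(no standard phrase)"
    return f"{code} {phrase}: {category}"
```

For example, describe_status(404) returns "404 Not Found: Client Error" and describe_status(503) returns "503 Service Unavailable: Server Error".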

4. Request method

  • GET: Retrieves the specified resource.
  • POST: Submits data to the server, such as form data.
  • PUT: Creates or replaces the specified resource with the content of the request.
  • DELETE: Deletes the specified resource.
  • HEAD: Like GET, but returns only the response headers, not the response body.
  • OPTIONS: Asks which communication options the target resource supports.
  • PATCH: Applies a partial update to a resource.

5. HTTP version

HTTP has gone through several rounds of evolution and improvement. The major versions are:

  1. HTTP/0.9: The earliest version of HTTP, released in 1991. It is a very simple protocol that only supports the transfer of HTML text, with no request or response headers and no status codes. It was mainly used to transfer hypertext (HTML) documents and their hyperlinks.
  2. HTTP/1.0: Released in 1996. Compared with HTTP/0.9, it introduced request and response headers, which allow various types of resources to be transferred, such as images, style sheets, and script files. In HTTP/1.0 each request needs its own connection, which is inefficient.
  3. HTTP/1.1: Released in 1997, it is an important version of HTTP. It introduced persistent connections, pipelined requests, and the Host header field, which significantly improved page load speed and user experience. Persistent connections allow multiple requests and responses to be sent over the same connection, avoiding the overhead of re-establishing a connection for each request.
  4. HTTP/2: Released in 2015. HTTP/2 further improves performance, security, and parallelism. It introduces binary framing and multiplexing, so multiple requests and responses can be in flight on the same connection at the same time, which removes request blocking at the HTTP level (although not at the TCP level). HTTP/2 also supports header compression and server push, further reducing transfer overhead.
  5. HTTP/3: The latest version, standardized as RFC 9114 on June 6, 2022. It drops TCP and carries application-layer data over QUIC, which runs on top of UDP.

6. Cookie

An HTTP cookie (cookie for short) is a mechanism in HTTP for passing session information and state data between the client (usually a web browser) and the server. Cookies mainly record user state so that the server can identify the user or maintain the user's session in subsequent requests.

Working principle:

1. The server sets the cookie: The server puts the cookie in the Set-Cookie header of the HTTP response and sends it to the client. For example:

Set-Cookie: session_id=1234567890; path=/; domain=example.com; expires=Sun, 20-Jul-2025 12:00:00 GMT; secure; HttpOnly

2. The client saves the cookie: After the client (web browser) receives the cookie set by the server, it stores the cookie locally.

3. The client sends the cookie: On subsequent HTTP requests that match the cookie's domain and path, the client includes the stored cookie in the Cookie field of the request header.

4. The server reads the cookie: After the server receives the request, it reads the cookie from the Cookie field of the request header and uses the data in it to identify the user or maintain the session state.
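
The four steps above can be traced with Python's http.cookies module. This is a simplified sketch: store_cookies and cookie_request_header are hypothetical helpers, and a real browser would also check the domain, path, expiry, and secure attributes before sending a cookie back.

```python
from http.cookies import SimpleCookie

# Step 1: the value of the Set-Cookie header sent by the server.
SET_COOKIE = (
    "session_id=1234567890; path=/; domain=example.com; "
    "expires=Sun, 20-Jul-2025 12:00:00 GMT; secure; HttpOnly"
)


def store_cookies(set_cookie_value: str) -> SimpleCookie:
    """Step 2: the client parses and stores the cookie."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    return jar


def cookie_request_header(jar: SimpleCookie) -> str:
    """Step 3: build the value of the Cookie request header (name=value pairs only)."""
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


jar = store_cookies(SET_COOKIE)
header = cookie_request_header(jar)

# Step 4: the server parses the Cookie header it receives.
received = SimpleCookie(header)
session_id = received["session_id"].value
```

Note that only the name and value travel back to the server. The attributes (path, domain, expires, and so on) stay on the client and control when the cookie is sent.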

Cookie attributes:

A cookie can carry several attributes that control its behavior and lifetime. Common cookie attributes include:

  • Name and Value: The name of the cookie and its corresponding value.
  • Domain: Specifies the scope of the cookie, that is, which domain names the cookie is sent to.
  • Path: Specifies the path scope of the cookie, that is, which URL paths the cookie is sent to.
  • Expires and Max-Age: Set the cookie's lifetime. Expires gives a specific expiration time, and Max-Age gives the number of seconds until expiry, counted from when the cookie is received.
  • Secure: The cookie is sent only over HTTPS connections, which protects it in transit.
  • HttpOnly: The cookie is sent only in HTTP or HTTPS requests and cannot be read by scripts such as JavaScript, which limits cookie theft through XSS attacks.
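
On the server side, these attributes can be attached with Python's http.cookies module. The sketch below is one possible way to do it. build_set_cookie is a hypothetical helper, and the domain example.com is taken from the sample header earlier in this section.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name: str, value: str, max_age_seconds: int) -> str:
    """Return the value for a Set-Cookie header with common attributes set."""
    cookie = SimpleCookie()
    cookie[name] = value               # Name and Value
    morsel = cookie[name]
    morsel["domain"] = "example.com"   # Domain: which hosts receive the cookie
    morsel["path"] = "/"               # Path: which URL paths receive the cookie
    morsel["max-age"] = max_age_seconds  # Max-Age: lifetime in seconds
    morsel["secure"] = True            # Secure: HTTPS only
    morsel["httponly"] = True          # HttpOnly: hidden from JavaScript
    return morsel.OutputString()


set_cookie_value = build_set_cookie("session_id", "1234567890", 3600)
```

The returned string starts with session_id=1234567890 and is followed by the attributes, ready to be sent as the value of a Set-Cookie response header.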

Uses:

Cookies serve many purposes in web applications. Common ones include:

  • Recording the user's login status to implement user authentication and session management.
  • Storing the user's personalization settings and preferences.
  • Tracking the user's browsing behavior in order to analyze it and recommend relevant content.
  • Saving shopping cart contents and transaction state on shopping sites.

7. URL length

URL length limits are set by browsers, not by the HTTP protocol, and different browsers impose different limits.

You might object that, as a developer, you do not go through a browser, so the browser's limit does not apply to you. I tried Apache HttpClient, and it also ran into a length limit: if the URL is too long, the request comes back with a 400 error.



Origin blog.csdn.net/citywu123/article/details/131978919