The HTTP protocol explained in detail, so web development is no longer confusing

1. Overview

HTTP (Hypertext Transfer Protocol) is an application layer protocol for transferring hypertext data between clients and servers. It is one of the most important protocols in the modern Internet and is often used for communication between browsers and web servers.
HTTP lets a client initiate a request and receive the server's response; its main goal is to enable communication and data exchange between the two. HTTP is a stateless protocol: each request-response cycle is independent of the others, and the protocol itself does not require the server to retain any information from previous requests.

2. Basic concepts

Request-response model

The request-response model is the basic pattern of HTTP communication and describes how the client and the server interact. In this model, the client sends an HTTP request to the server; the server receives and processes the request and returns an HTTP response to the client.

  1. Client sends request: The client (usually a browser) sends an HTTP request to the server. A request consists of the following parts:

    • Request Method: Defines the operation that the client wants the server to perform, such as GET, POST, etc.
    • Request Target: Identifies the requested resource. It is usually the absolute path of the URI (Uniform Resource Identifier) plus an optional query string, such as /index.html?lang=en; when talking to a proxy, it can be the complete absolute URI.
    • Protocol Version: Specifies the version of the HTTP protocol used, such as HTTP/1.1.
    • Header field (Headers): Contains a series of fields in the form of key-value pairs, which are used to pass additional request information.
    • Message Body: optional, used to carry some additional data, such as form data in a POST request.
  2. The server receives the request: After the server receives the HTTP request sent by the client, it starts to process the request. The server will parse the request message and extract the request method, target, header field and message body.

  3. Server processing request: The server performs corresponding operations according to the method and target of the request. For example, if it is a GET request, the server may return the requested resource; if it is a POST request, the server may process the submitted form data, etc.

  4. The server generates a response: After the server processes the request, it generates an HTTP response for replying to the client. The response consists of the following parts:

    • Status Line: Describes the basic information of the response, including the protocol version, status code, and status text.
    • Header field (Headers): Contains a series of fields in the form of key-value pairs, which are used to pass additional response information.
    • Message Body: optional, used to carry the specific data of the response, such as document content, pictures, etc.
  5. Server sends response: The server sends the generated HTTP response to the client. The response is transmitted over the network to the client.

  6. The client receives the response: After the client receives the HTTP response sent by the server, it starts to parse the response message. The client extracts content such as the status line, header fields, and message body.

  7. Client processing response: The client performs corresponding processing according to the status code and header fields of the response. For example, display the content of the response, and parse header fields for additional information.

  8. Request-response completion: Once the response has been received, the request-response cycle is complete. The client can then send new requests to continue interacting with the server.

The request-response model is the basic pattern of HTTP communication and is often used in web development. Through this model, the client can request the server to obtain resources or perform specific operations, and the server will process the request and return the corresponding result to the client. The advantage of this model is that it is flexible, simple, and easy to expand and implement.
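The cycle described above can be tried out locally. The following is a minimal sketch using only Python's standard library; the handler name `HelloHandler`, the helper `run_one_exchange`, and the response text are assumptions made for illustration, not part of any real application.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a short text body."""

    def do_GET(self):
        body = f"You requested {self.path}".encode("utf-8")
        self.send_response(200)  # status line
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()  # blank line that ends the header section
        self.wfile.write(body)  # message body

    def log_message(self, format, *args):
        pass  # keep the example quiet


def run_one_exchange(path="/index.html"):
    """Start a local server, send one GET, and return (status, reason, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    try:
        conn.request("GET", path)  # steps 1-3: send, receive, process
        response = conn.getresponse()  # steps 4-6: response comes back
        return response.status, response.reason, response.read().decode("utf-8")
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
```

Calling `run_one_exchange("/index.html")` performs one complete request-response cycle against the throwaway server and returns the parsed status code, reason phrase, and body.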

What are URI, URL, and URN

  1. URI (Uniform Resource Identifier): A URI is a string that identifies a resource. It can represent any type of resource, including documents, images, videos, APIs, etc. URLs and URNs are two kinds of URI.

  2. URL (Uniform Resource Locator): A URL is a URI that also says where a resource is and how to access it on the Internet. It contains the following parts:

    • Scheme (Protocol): Specifies the protocol used to access the resource, such as http, https, ftp, etc.
    • Host name (Host): Specifies the host name or domain name where the resource is located to locate the server.
    • Port number (Port): Optional, specifies the specific port number used by the resource.
    • Path: Specifies the path or location of the resource on the server.
    • Query Parameters (Query Parameters): Optional, used to pass additional parameters to the server.
    • Anchor (Fragment): Optional, specifies a specific location or fragment in the resource.

    For example, http://www.example.com/index.html is a URL.

  3. URN (Uniform Resource Name): A URN is another kind of URI, used to assign a persistent and unique name to a resource. A URN is independent of where a resource is located or how it is accessed. A URN always starts with the prefix urn:, followed by a namespace identifier and a namespace-specific resource identifier.

    For example, urn:isbn:9781234567890 indicates a resource that uses an ISBN as a namespace.

Differences and uses:

  • URI is the general concept: a string that identifies a resource.
  • URL is a kind of URI that not only identifies a resource but also specifies where it is and how to access it.
  • URN is another kind of URI that assigns a unique name to a resource, regardless of its location and access method.
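To make the URL parts listed above concrete, here is a small sketch that splits a URI into its components with Python's standard `urllib.parse` module. The helper name `describe_url` and the sample URL with its port, query, and fragment are assumptions chosen for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def describe_url(url):
    """Split a URI into the components described in the text."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,  # protocol, e.g. "https"
        "host": parts.hostname,  # host name or domain name
        "port": parts.port,  # None when the URL does not state a port
        "path": parts.path,  # location of the resource on the server
        "query": parse_qs(parts.query),  # query parameters as a dict of lists
        "fragment": parts.fragment,  # anchor inside the resource
    }
```

For a URL such as `https://www.example.com:8443/docs/index.html?lang=en&page=2#intro`, every field is filled in. For the URN `urn:isbn:9781234567890`, only the scheme (`urn`) and the path (`isbn:9781234567890`) are present, which reflects that a URN names a resource without saying where it lives.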

HTTP message structure

An HTTP message is the basic unit of data transmission in HTTP communication. There are two types: request messages and response messages.

  1. HTTP request message structure:

    • Request Line: Contains the request method, request target (URI) and protocol version.
    • Header field (Headers): Contains a series of fields in the form of key-value pairs, which are used to pass additional request information.
    • Blank Line: used to separate header fields and message body.
    • Message Body: optional, used to carry some additional data, such as form data in a POST request.
  2. HTTP response message structure:

    • Status Line: Contains the protocol version, status code and status text.
    • Header field (Headers): Contains a series of fields in the form of key-value pairs, which are used to pass additional response information.
    • Blank Line: used to separate header fields and message body.
    • Message Body: optional, used to carry the specific data of the response, such as document content, pictures, etc.

The specific format of the request line and status line is as follows:

  • Request line format: Method Request-URI HTTP-Version
  • Status line format: HTTP-Version Status-Code Reason-Phrase

The header section consists of multiple fields, each written as a name and a value separated by a colon, for example:

Content-Type: application/json
Content-Length: 1024

The structure and components of an HTTP message have the following characteristics and uses:

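  • Request line or status line: The request line contains the HTTP method (GET, POST, etc.), the request target (URI), and the protocol version (HTTP/1.1, etc.); the status line contains the protocol version, the status code, and the reason phrase. They describe the basic information of the request or response.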
  • Header field: Used to pass various types of metadata information, such as content type, length, cache control, etc. Header fields can be customized according to specific needs, and some common fields are widely used.
  • Empty line: as a separator between the header field and the message body, used to mark the end of the header field.
  • Message body: optional, used to carry the specific data of the request or response. In a GET request, the message body is usually empty; in a POST request, the message body may contain form data, JSON data, etc.

The structure and components of the HTTP message define the specification of the HTTP communication, enabling the client and the server to perform effective data exchange and resource transmission. By parsing and processing HTTP messages, the client and server can understand each other and process requests and responses correctly.
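The message layout described above (start line, header fields, blank line, body) can be illustrated with a short parser. This is a simplified sketch, not a complete HTTP parser: the function name `parse_request` is an assumption, and it ignores details such as repeated header fields and chunked bodies.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP/1.x request into request line, header fields, and body."""
    # The first blank line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Request line format: Method Request-URI HTTP-Version
    method, target, version = lines[0].split(" ", 2)

    # Each header field is "Name: value"; field names are case-insensitive.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return {
        "method": method,
        "target": target,
        "version": version,
        "headers": headers,
        "body": body,
    }
```

Feeding it the bytes of a POST request yields the method, target, and version from the request line, a dictionary of header fields keyed by lowercase name, and the untouched body bytes.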

Related examples
  1. Example of HTTP request message:
GET /index.html HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36
Accept: text/html,application/xhtml+xml

The above example represents a GET request for index.html, a file located on the server. The request contains header fields such as Host, User-Agent, and Accept.

  2. Example of an HTTP response message:
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1274

<!DOCTYPE html>
<html>
<head>
    <title>Example Domain</title>
</head>
<body>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents.</p>
</body>
</html>

The above example is an HTTP response message. The status line gives the protocol version and the status code of the response (200 means success). The response contains header fields such as Content-Type and Content-Length, and the message body contains HTML content. In a real response, Content-Length must equal the exact size of the body in bytes; the value 1274 here is illustrative.
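As a companion to the response example, the sketch below assembles a response message by hand and derives Content-Length from the encoded body instead of hard-coding it. The function name `build_response` and its default content type are assumptions for illustration.

```python
def build_response(status_code, reason, body_text, content_type="text/html; charset=utf-8"):
    """Serialize a minimal HTTP/1.1 response as bytes."""
    body = body_text.encode("utf-8")
    head = (
        f"HTTP/1.1 {status_code} {reason}\r\n"  # status line
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"  # size in bytes, not characters
        "\r\n"  # blank line ends the header section
    )
    return head.encode("ascii") + body
```

Because the length is measured after encoding, a body containing non-ASCII characters still gets a correct byte count.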

3. HTTP request method

  1. GET request method:

    • Features: The GET method is used to obtain a representation of a resource from the server. It is a safe and idempotent method and should not modify the server state. Parameters for GET requests are usually appended to the URL's query string.
    • Purpose: Used to get the content of a specific resource. Common usages include fetching web pages, images, API data, etc. through URLs.
    • Relevant header fields:
      • Accept: Specifies the response content types acceptable to the client.
      • If-None-Match: Specify an ETag value. If it matches the ETag of the resource on the server, it means that the client already has the latest copy of the resource, and the server can return 304 Not Modified.
      • If-Modified-Since: Specify a date and time. If the resource has not changed after the specified date and time, the server can return 304 Not Modified.
  2. POST request method:

    • Features: The POST method is used to submit data to the server and ask the server to process it according to the request. It may cause a change in the state of the server.
    • Purpose: Used to send data to the server, such as submitting forms, uploading files, etc.
    • Relevant header fields:
      • Content-Type: Specifies the type of data sent. Common types are application/x-www-form-urlencoded (default form data type) and multipart/form-data (for file upload).
      • Content-Length: Specifies the size of the request body.
  3. PUT request method:

    • Features: The PUT method is used to store the representation in the request at a specified location on the server. If a resource already exists at the location, it is replaced; if not, a new resource is created.
    • Purpose: Used to update or create a resource, usually an entire resource replacement.
    • Relevant header fields:
      • Content-Type: Specifies the type of data sent.
      • Content-Length: Specifies the size of the request body.
  4. DELETE request method:

    • Features: The DELETE method is used to request the server to delete the resource at the specified location.
    • Purpose: Used to delete resources on the server.
    • Related header fields: No specific header fields.
  5. PATCH request method:

    • Features: The PATCH method is used to partially update the resources on the server, and only part of the content of the resources is modified.
    • Purpose: It is used to update a part of the content of the resource, not to replace the entire resource.
    • Relevant header fields:
      • Content-Type: Specifies the type of data sent.
      • Content-Length: Specifies the size of the request body.
  6. HEAD request method:

    • Features: The HEAD method is used to obtain the response header information of the resource without obtaining the actual content.
    • Purpose: Used to check the status and metadata of resources, such as last modification time, content length, etc.
    • Related header fields: No specific header fields.
  7. OPTIONS request method:

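    • Features: The OPTIONS method asks the server which communication options, such as the supported HTTP request methods, are available for the target resource.
    • Purpose: Used by the client to discover what the server supports before sending the actual request; browsers also use it for CORS preflight requests.
    • Relevant header fields:
      • Allow: Returned in the response to list the methods supported by the resource.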
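The text notes that GET is safe and idempotent while POST may change server state. The sketch below extends that classification to the other methods in this section, following the definitions in the HTTP semantics specification (RFC 9110). The table name `METHOD_PROPERTIES` and the helper `can_retry_automatically` are assumptions introduced here.

```python
# "safe": the method is read-only from the client's point of view.
# "idempotent": repeating the same request has the same intended effect as sending it once.
METHOD_PROPERTIES = {
    "GET": {"safe": True, "idempotent": True},
    "HEAD": {"safe": True, "idempotent": True},
    "OPTIONS": {"safe": True, "idempotent": True},
    "PUT": {"safe": False, "idempotent": True},
    "DELETE": {"safe": False, "idempotent": True},
    "POST": {"safe": False, "idempotent": False},
    "PATCH": {"safe": False, "idempotent": False},
}


def can_retry_automatically(method):
    """Return True when a client may resend the request after a network failure."""
    return METHOD_PROPERTIES[method.upper()]["idempotent"]
```

One practical consequence: a client library can transparently retry a failed PUT or DELETE, but retrying a POST risks performing the action twice, so that decision is usually left to the application.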

4. HTTP status code

HTTP status codes are standardized three-digit codes used to indicate various conditions that occur during HTTP communication. They are part of the server's response to a client request and describe the outcome of processing that request. HTTP status codes fall into five classes, identified by the first digit, each with a different meaning and purpose.

1xx (informational status code): Indicates that the request was received and is being processed. These status codes are temporary and instruct the client to continue sending the request or the server to switch the transfer process to the next stage.

  • 100 Continue: Continue. The server has received the request headers and has not rejected them, so the client should go on to send the request body. It is typically the reply to a request that carried an Expect: 100-continue header.
  • 101 Switching Protocols: Switching protocols. The server has understood and accepted the client's request to change protocols (sent in an Upgrade header) and will switch to the new protocol for communication.

2xx (success status code): Indicates that the server successfully received, understood, and successfully processed the client's request.

  • 200 OK: The request was successful. The server has successfully processed the request.
  • 201 Created: Created. The request succeeded and a new resource was created on the server.
  • 204 No Content: No content. The server successfully processed the request, and the response has no message body.
  • 206 Partial Content: Partial content. The server is returning only the part of the resource that the client asked for in a Range header.

3xx (redirection status code): Indicates that the client needs to take additional action to complete the request. These status codes are used to inform the client that the resource requested has been moved to another location, or that a different URI needs to be used to access the resource.

  • 301 Moved Permanently: Permanent redirection. The requested resource has been permanently moved to a new URL. For historical reasons, clients may change a POST into a GET when following it.
  • 302 Found: Temporary redirection. The requested resource is temporarily available at a different URL. As with 301, clients may change a POST into a GET when following it.
  • 304 Not Modified: Not modified. The client has a cached copy and the requested resource has not been modified.
  • 307 Temporary Redirect: Temporary redirection, like 302, but the client must repeat the request with the same method and body.
  • 308 Permanent Redirect: Permanent redirection, like 301, but the client must repeat the request with the same method and body.

4xx (client error status code): Indicates that the request sent by the client has an error, and the server cannot process or refuses to respond. These status codes indicate that the client needs to take action to correct the error.

  • 400 Bad Request: Bad request. The server could not understand the request sent by the client.
  • 401 Unauthorized: Unauthorized. The request requires user authentication.
  • 403 Forbidden: Forbidden access. The server is refusing to fulfill the request.
  • 404 Not Found: Not found. The server cannot find the requested resource.
  • 405 Method Not Allowed: The method is not allowed. A method not allowed by the server was used in the request.
  • 408 Request Timeout: The request timed out. The server did not receive a complete request from the client within the time it was prepared to wait.
  • 413 Payload Too Large: The request entity is too large. The request body or uploaded file exceeds the server limit.
  • 415 Unsupported Media Type: Unsupported media type. The server refuses the request because the format of the request body is not supported.

5xx (server error status code): Indicates that the server encountered an error while processing the request and could not complete the request. This type of status code indicates that the client cannot solve the problem and needs to be repaired by the server.

  • 500 Internal Server Error: Internal server error. The server encountered an unexpected condition and was unable to complete the request.
  • 502 Bad Gateway: Bad gateway. The server acting as a proxy or gateway received an invalid response from an upstream server.
  • 503 Service Unavailable: The service is unavailable. The server is temporarily unable to handle the request, usually due to overload or maintenance.
  • 504 Gateway Timeout: Gateway timeout. The server acting as a proxy or gateway timed out waiting for a response from the upstream server.
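Since the class of a status code is determined by its first digit, a client can classify any code, even one it does not recognize. The sketch below shows this using the standard `http.HTTPStatus` enum for the reason phrases; the helper name `describe_status` and the table `STATUS_CLASSES` are assumptions for illustration.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code):
    """Return a readable description such as '404 Not Found (client error)'."""
    category = STATUS_CLASSES[code // 100]  # the first digit selects the class
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"  # not a registered code, but the class still applies
    return f"{code} {phrase} ({category})"
```

An unregistered code such as 299 is still reported as a success, which mirrors how clients are expected to treat unknown codes as the generic member of their class.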

5. HTTP header fields

Commonly used request header fields

  1. Accept: Specifies the media type of the response content that the client can accept. For example, "Accept: text/html, application/json" indicates that the client can accept HTML text and JSON data formats.

  2. Accept-Language: Specifies the natural language preferred by the client. Servers can use this field to return response content appropriate to the client's language settings. For example, "Accept-Language: en-US, zh-CN" indicates that the client prefers English (United States) and Chinese (China).

  3. Authorization: Used to send credentials to the server when authenticating. Typically used with Bearer tokens or Basic authentication.

  4. Content-Type: Specifies the media type of the request body. It tells the server the format of the data sent in the request so that the server can parse the request body correctly. Common values are "application/json", "application/x-www-form-urlencoded", etc.

  5. User-Agent: Contains information about the user agent (browser, application, etc.) that initiated the request. This field can be used by the server to understand the origin device and platform of the request.

  6. Cookie: Send the cookie information set by the server before. Servers can use this field for state tracking and authentication of users.

  7. Referer: Indicates the URL of the source of the current request. The server can judge the context of the request according to the Referer field and process it accordingly.

  8. If-Modified-Since: For conditional requests. The client can send the Last-Modified header field value in the last received response to determine whether the resource has been modified since the last request, so as to determine whether it needs to be reacquired.

  9. Cache-Control: Specifies the behavior of the caching mechanism. For example, "Cache-Control: no-cache" in a request indicates that caches must not answer from a stored copy without first revalidating it with the origin server.

  10. Content-Length: Specifies the length of the request body in bytes. This is useful for server processing of request bodies.
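Header fields such as Accept and Accept-Language can carry a list of alternatives, optionally weighted with a quality value (`q=`) between 0 and 1. The sketch below parses an Accept-Language value into an ordered preference list. It is a simplified illustration: the function name `parse_accept_language` is an assumption, and a missing `q` is treated as 1.0 as the HTTP specification defines.

```python
def parse_accept_language(header_value):
    """Return (language_tag, quality) pairs, most preferred first."""
    preferences = []
    for item in header_value.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # default weight when no q parameter is given
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        preferences.append((tag.strip(), quality))
    # The sort is stable, so entries with equal weight keep their original order.
    preferences.sort(key=lambda pair: pair[1], reverse=True)
    return preferences
```

A server can walk the returned list and pick the first language for which it has content.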

Commonly used response header fields

  1. Content-Type: Specifies the media type of the response. It tells the client the format of the response content so that the client can parse and process the response body correctly. Common values are "text/html", "application/json", etc.

  2. Content-Length: Specifies the length of the response body in bytes. Clients can use this field to determine the full response body size.

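  3. Cache-Control: Specifies the behavior of the caching mechanism. For example, "Cache-Control: no-cache" indicates that the response may be stored but must be revalidated with the server before each reuse, while "Cache-Control: no-store" indicates that clients and intermediate proxies should not cache the response at all.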

  4. Expires: Specifies the date and time when the response expires. The client can judge the validity of the response based on this field to prevent invalid responses from being reused.

  5. Last-Modified: Indicates when the resource was last modified. The client can send this value back in the If-Modified-Since header field of a later request so that the server can determine whether the resource has been modified.

  6. ETag: The entity tag of the resource, an identifier for a specific version of it. The client can send this value back in the If-None-Match header field of a later request so that the server can determine whether the resource has been modified.

  7. Location: Used in redirection responses. The server can send this field to inform the client of the new location to visit.

  8. Set-Cookie: Used to set cookies in the response. The server can use this field to send authentication information or other state information to the client.

  9. Access-Control-Allow-Origin: A response header field for Cross-Origin Resource Sharing (CORS). The server uses this value to specify the origin domain names that are allowed to access this resource.

  10. X-Powered-By: Indicates the technology or framework used by the server. Although this field is not a standard HTTP header field, it is often used to disclose server information.
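The interplay between the ETag response header and the If-None-Match request header can be sketched as follows. This is a simplified model of server-side logic, not a real framework API: the names `make_etag` and `respond` are assumptions, and deriving the tag from a SHA-256 hash of the body is just one possible choice.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive an entity tag from the content (one possible strategy)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(request_headers, body: bytes):
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # The client's cached copy is current: send no body at all.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Length": str(len(body))}, body
```

On the first request the client receives 200 together with the ETag. When it repeats the request with that tag in If-None-Match and the content is unchanged, the server answers 304 with an empty body, saving the transfer.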

6. HTTP persistent connections and pipelining

Persistent connection

A persistent connection, also known as HTTP Keep-Alive, is a technique for sending multiple HTTP requests and responses on a single TCP connection. In HTTP/1.0, by default each HTTP request establishes a new TCP connection, which incurs a high overhead of opening and closing connections when requests are frequent. Persistent connections reduce this overhead by sending multiple requests and responses over a single TCP connection.

The following is the rationale and workflow of persistent connections:

  1. Establish a connection: The client establishes a connection with the server using a TCP three-way handshake.

  2. Send request: The client sends an HTTP request to the server.

  3. Receive response: After receiving the request, the server generates and sends the corresponding HTTP response to the client.

  4. Keep connected: In a persistent connection, the server does not close the connection after sending the response, but keeps the connection open.

  5. Continue request: The client can use the established connection to send the next HTTP request.

  6. Receive response: After the server receives the next request, it generates and sends the corresponding HTTP response.

  7. Repeat steps 5 and 6: the client and server can alternately send requests and responses to complete subsequent communication.

  8. Close the connection: When the client or server decides to close the connection, it signals this with a Connection: close header field, or simply closes the connection through the TCP four-way termination handshake.

The benefits of persistent connections include:

  1. Reduce connection establishment overhead: Avoid the overhead of establishing and closing TCP connections for each request, improving performance and efficiency.

  2. Reduced network latency: Sending multiple requests and responses on the same TCP connection reduces network round trip time and speeds up data transfers.

  3. Saving bandwidth resources: By reducing the number of connection establishment and closing, the protocol overhead is reduced, thereby saving bandwidth resources.

It should be noted that persistent connections are not kept open indefinitely. The server can limit the idle timeout of a persistent connection or the maximum number of requests it serves, so as to avoid wasting resources. In addition, persistent connections are the default in HTTP/1.1; in HTTP/1.0 they are not part of the base protocol and are only available when both sides opt in with the Connection: keep-alive extension.

To sum up, persistent connections send multiple HTTP requests and responses on a single TCP connection, which reduces the overhead of establishing and closing connections and improves performance and efficiency.
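Connection reuse can be observed with a local experiment. In the sketch below, a throwaway server reports the client's TCP source port in each response; if two requests arrive from the same port, they travelled over the same connection. The names `KeepAliveHandler` and `fetch_twice_on_one_connection` are assumptions for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        # Echo the client's source port so connection reuse becomes visible.
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # needed for keep-alive
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_twice_on_one_connection():
    """Send two GETs through one HTTPConnection and return the ports the server saw."""
    server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    try:
        ports = []
        for path in ("/first", "/second"):
            conn.request("GET", path)
            ports.append(conn.getresponse().read().decode("ascii"))
        return ports
    finally:
        conn.close()  # close the client side first so the handler can finish
        server.shutdown()
        server.server_close()
```

Both entries of the returned list are equal, which indicates that the second request reused the TCP connection opened for the first one. Sending an explicit Content-Length matters here: without a known body length, the end of the response can only be signalled by closing the connection.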

Pipelining

Pipelining is an optimization introduced in HTTP/1.1 that aims to reduce communication delays and improve network performance by sending multiple requests back to back on a persistent connection. It allows a client to send several requests without waiting for the response to each one before sending the next.

The following is the rationale and workflow of pipelining:

  1. Establish a connection: The client establishes a connection with the server using a TCP three-way handshake.

  2. Sending requests: The client sends multiple HTTP requests to the server in sequence without waiting for the response of each request.

  3. Receive response: After receiving the request, the server generates and sends the corresponding HTTP response in the order of the request.

  4. Return response: The client receives the corresponding HTTP responses in the order requested.

The benefits of pipelining include:

  1. Reduced communication latency: Without pipelining, each request on a connection has to wait for the response to the previous request before it can be sent, resulting in high communication latency. With pipelining, multiple requests can be sent without waiting, thereby reducing waiting time and communication delay.

  2. Improve network performance: Since multiple requests are sent on a persistent connection at the same time, network bandwidth resources and server processing capabilities are effectively used, and network performance is improved.

It should be noted that although pipelining can significantly reduce communication latency, it also has some limitations and considerations:

  1. Not applicable in all cases: Server support for pipelining is optional, and not all servers fully support pipelining. Some servers may choose not to handle pipelined requests or only partially support them, so compatibility needs to be considered in practical applications.

  2. Response order is consistent with request order: The HTTP protocol requires the server to send responses in the same order in which it received the requests. The client receives the responses in that order and matches each one to its request by position.

  3. Susceptible to blocking: If one request is slow to process, every response queued behind it must wait, because responses have to be returned in order. This is known as head-of-line blocking, and it delays all subsequent requests in the pipeline.

To sum up, pipelining is a technique for reducing communication delay and improving network performance by sending multiple requests simultaneously on a persistent connection. However, due to server support and some potential issues, we need to use pipelining with caution and conduct compatibility tests and performance evaluations in real applications.
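Pipelining can be demonstrated with a raw socket, since typical high-level HTTP clients do not pipeline. The sketch below writes two requests in a single send before reading anything, then collects the responses. The names `EchoPathHandler` and `pipeline_two_requests` are assumptions, and the local `http.server` instance simply processes the queued requests one after another.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoPathHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep the connection open between requests

    def do_GET(self):
        body = self.path.encode("ascii")  # echo the request path as the body
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def pipeline_two_requests():
    """Send two GET requests back to back and return all response bytes."""
    server = HTTPServer(("127.0.0.1", 0), EchoPathHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    requests = (
        b"GET /first HTTP/1.1\r\nHost: 127.0.0.1\r\n\r\n"
        # "Connection: close" on the last request makes the server end the stream.
        b"GET /second HTTP/1.1\r\nHost: 127.0.0.1\r\nConnection: close\r\n\r\n"
    )
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
            sock.sendall(requests)  # both requests leave before any response is read
            received = b""
            while True:
                chunk = sock.recv(4096)
                if not chunk:  # server closed the connection
                    break
                received += chunk
    finally:
        server.shutdown()
        server.server_close()
    return received
```

The returned bytes contain two complete responses, and the one for `/first` always precedes the one for `/second`, matching the ordering rule described above.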

7. Security and Authentication

HTTP security issues

  1. Confidentiality of transmitted data: In HTTP, data is transmitted in clear text, so it is easy to eavesdrop on and tamper with. For example, when users send sensitive information (such as usernames, passwords, or credit card numbers) over HTTP, attackers can intercept and obtain this information. To solve this problem, encryption protocols such as HTTPS can be used to protect the confidentiality of communication.

  2. Integrity: In the HTTP communication process, data integrity is also an important issue. Attackers can tamper with HTTP requests or responses, causing information to be modified or destroyed. To ensure integrity, the sender can attach a keyed message authentication code (for example HMAC with SHA-256) to the request or response, and the receiver can verify it. A plain digest such as MD5 or SHA alone is not enough, because an attacker who modifies the message can simply recompute it, and MD5 is no longer considered collision-resistant. In practice, HTTPS provides this protection for data in transit.

  3. Cross-site scripting (XSS): An XSS attack is when an attacker inserts malicious script code into a web page, and when other users visit the page, the code will be executed on the victim's browser. Attackers can use XSS vulnerabilities to steal sensitive information of users, use user identities to send malicious requests, and so on. Methods to prevent XSS attacks include filtering and escaping user input reasonably, and using the Content Security Policy (CSP) of the HTTP header to restrict the source of executable scripts.

  4. Cross-Site Request Forgery (CSRF): A CSRF attack is when an attacker takes advantage of an authenticated user to perform malicious actions without their knowledge. Attackers trigger the attack by convincing the victim to visit a specific webpage or click on a malicious link. To prevent CSRF attacks, a randomly generated token (CSRF Token) can be used for verification to ensure the legitimacy of the request source.

  5. Clickjacking: Clickjacking means that an attacker overlays a transparent, malicious page on a seemingly harmless page. When a user clicks on an element on the page, it actually triggers a hidden malicious operation. In order to prevent clickjacking, you can use the X-Frame-Options of the HTTP header to set whether the page is allowed to be embedded in an iframe.

  6. HTTP hijacking: HTTP hijacking refers to the attacker controlling a node in the network, intercepting and tampering with network traffic. Attackers can modify HTTP requests or responses, perform malicious actions or steal information. In order to prevent HTTP hijacking, HTTPS can be used to encrypt communication, and digital certificates can be used to verify the identity of the server.
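As an illustration of the integrity check mentioned in item 2, the sketch below signs a message body with a keyed digest and verifies it on receipt. Note the assumption: it uses HMAC with SHA-256 from the standard library rather than a bare MD5 or SHA digest, because a keyed construction is what stops an attacker from recomputing the value after tampering. The function names and the shared key are hypothetical.

```python
import hashlib
import hmac


def sign(message: bytes, key: bytes) -> str:
    """Compute a keyed digest (HMAC-SHA-256) of the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(message: bytes, key: bytes, signature: str) -> bool:
    """Check the signature using a constant-time comparison."""
    return hmac.compare_digest(sign(message, key), signature)
```

The sender would transmit the signature alongside the body, for example in a custom header field, and the receiver would reject any message for which `verify` returns False. This only protects integrity; it does nothing for confidentiality, which is where HTTPS comes in.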

HTTPS

HTTPS (Hypertext Transfer Protocol Secure) is an encrypted communication protocol based on TLS/SSL (Transport Layer Security/Secure Sockets Layer) protocol, which is used to ensure the security of network communication and protect the confidentiality, integrity and authentication of data.

The main difference between HTTPS and HTTP is that an encryption mechanism is used during data transmission, so it is more secure.

  1. Encrypted handshake: When the client establishes a connection with the server, the TLS handshake process is performed first. The client sends a list of cipher suites, including supported encryption algorithms and key exchange methods. The server chooses a cipher suite from this list and sends the digital certificate to the client.

  2. Digital certificate verification: After the client receives the server's digital certificate, it will verify the legality and validity of the certificate. This includes checking the trustworthiness of the certificate authority, whether the certificate has expired, whether the server domain name matches the certificate, etc.

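  3. Key exchange: If the digital certificate verification is successful, the two sides agree on a key for symmetric encryption. In the classic RSA key exchange, the client generates the key material, encrypts it with the server's public key, and sends it to the server, which decrypts it with its own private key. Newer TLS versions (TLS 1.3 in particular) instead derive the key with an ephemeral Diffie-Hellman exchange, which the server authenticates using its certificate.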

  4. Encrypted communication: After an encrypted connection is established, the communication between the client and the server is encrypted using a symmetric encryption algorithm to ensure the confidentiality and integrity of data during transmission.

The security brought by HTTPS is mainly reflected in the following aspects:

  1. Confidentiality: HTTPS uses a symmetric encryption algorithm to encrypt the transmitted data, preventing eavesdroppers from reading the communication and stealing sensitive information.

  2. Integrity: HTTPS protects every record with a message authentication code (or an authenticated encryption mode) to ensure that data has not been tampered with or damaged during transmission.

  3. Authentication: Through digital certificates, HTTPS can verify the identity of the server. The client can confirm whether the currently connected server is legitimate and trustworthy, which guards against man-in-the-middle attacks.

  4. Compatibility: HTTPS is ordinary HTTP carried over a TLS connection. The requests, responses, methods, status codes, and header fields are unchanged, so it has good compatibility with existing HTTP applications.

To sum up, HTTPS provides a higher level of network communication security through mechanisms such as encrypted communication, digital certificate verification, and key exchange. It can protect the confidentiality, integrity and authentication of data, making users more secure when conducting online transactions, logging into personal accounts and transmitting sensitive information. Adopting the HTTPS protocol is an important security measure when developing and deploying websites.
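On the client side, most of the checks described above are switched on by the TLS library rather than written by hand. The sketch below shows how Python's standard `ssl` module configures a client context; the helper name `make_client_context` and the choice of TLS 1.2 as the minimum version are assumptions for illustration.

```python
import ssl


def make_client_context():
    """Create a TLS client context with certificate and host name checks enabled."""
    # create_default_context() loads the system's trusted CA certificates and
    # turns on certificate verification and host name checking.
    context = ssl.create_default_context()
    # Refuse older protocol versions (a policy choice assumed for this sketch).
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

A client would then wrap its TCP socket with `context.wrap_socket(sock, server_hostname=host)`, at which point the handshake, certificate verification, and key exchange take place before any HTTP data is sent.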

HTTP authentication

HTTP authentication is a standard mechanism for authenticating a client to a server. It allows servers to restrict access to specific resources and require users to provide valid credentials to verify their identity.

HTTP authentication works as follows:

  1. The client requests a protected resource: When the client sends an HTTP request to access a resource that requires authentication, the server returns a status code of 401 Unauthorized, indicating that authentication is required.

  2. The server requires authentication: The server includes an authentication scheme, such as Basic Authentication or Digest Authentication, in the WWW-Authenticate field of the response header.

  3. The client provides credentials: After receiving the 401 response, a browser typically shows a dialog box asking the user to enter a user name and password. A non-interactive client can instead add the encoded credentials to the Authorization field of the request header directly.

  4. The server verifies the credentials: After receiving the request carrying the credentials, the server parses the credentials and compares them with the stored user credentials. If the credentials match, the server returns a status code of 200 OK, indicating that the authorization was successful and the client can access the protected resource.
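
The four steps above can be sketched from the server's point of view. This is a minimal illustration under stated assumptions, not a production implementation: it uses the Basic scheme, and the names `USERS`, `REALM`, and `handle_request` are hypothetical. A real server would store salted password hashes rather than plain passwords.

```python
import base64
import hmac

# Hypothetical credential store, for illustration only.
USERS = {"alice": "secret"}
REALM = "example"


def handle_request(headers: dict) -> tuple:
    """Return (status_code, response_headers) for a request to a protected resource."""
    challenge = {"WWW-Authenticate": f'Basic realm="{REALM}"'}
    scheme, _, encoded = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "basic" or not encoded:
        # Steps 1-2: no usable credentials, so answer 401 and name the scheme.
        return 401, challenge
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:  # covers malformed Base64 and invalid UTF-8
        return 401, challenge
    username, _, password = decoded.partition(":")
    expected = USERS.get(username)
    # Step 4: compare_digest avoids leaking information through timing.
    if expected is not None and hmac.compare_digest(password.encode(), expected.encode()):
        return 200, {}
    return 401, challenge
```

A first call with no headers yields the 401 challenge; repeating the request with a valid `Authorization: Basic ...` header yields 200.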

The common HTTP authentication schemes are as follows:

  1. Basic Authentication: This scheme is the simplest HTTP authentication method. The client sends the username and password, Base64-encoded, in the Authorization field of the request header. It is less secure because Base64 is an encoding, not encryption: anyone who intercepts the request can decode the credentials, so the scheme should only be used over HTTPS.

  2. Digest Authentication: Digest authentication is a challenge-response scheme. The server sends a one-time value (nonce), and the client replies with a hash computed from the user name, password, nonce, request method, and URI. The password itself is never sent over the network, which improves on Basic authentication, but the request and response bodies are still not encrypted.

  3. Bearer Token Authentication: This scheme authenticates by means of an issued token. The client carries a valid token in the Authorization field of the request header, and the server verifies the token and authorizes access to protected resources. This method is suitable for stateless applications and is commonly used with OAuth 2.0.
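
To make the difference between the three schemes concrete, here is a small sketch of what the client computes for each. The function names are hypothetical, and the Digest calculation shows only the original MD5 form without the `qop` extension. Real clients should rely on an HTTP library for this.

```python
import base64
import hashlib


def basic_header(username: str, password: str) -> str:
    """Basic: Base64 of "username:password". Encoding only, trivially reversible."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def _md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str) -> str:
    """Digest (MD5, no qop): a hash that proves knowledge of the password."""
    ha1 = _md5_hex(f"{username}:{realm}:{password}")
    ha2 = _md5_hex(f"{method}:{uri}")
    # The server's nonce is mixed in, so the response differs per challenge.
    return _md5_hex(f"{ha1}:{nonce}:{ha2}")


def bearer_header(token: str) -> str:
    """Bearer: the token is opaque to the client and sent as-is."""
    return f"Bearer {token}"
```

Decoding the Basic token with `base64.b64decode` recovers the password directly, which is why that scheme depends on HTTPS for confidentiality.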

HTTP authentication is a simple and effective authentication mechanism that can be used to protect access to sensitive resources. However, basic authentication and digest authentication have some limitations in security, so in practical applications, it is recommended to use HTTPS in combination to provide a more secure communication environment, and consider using stronger authentication mechanisms such as OAuth or OpenID Connect.

8. Cookie and Session

Cookie

A cookie is a small piece of data used for state management between a client (usually a web browser) and a server. It is transmitted in HTTP response and request headers.

How cookies work:

  1. The server creates a cookie: When a user visits a website through a browser, the server can add a Set-Cookie header field to the HTTP response, and encode the data to be stored and other relevant information into a cookie.

  2. The browser receives the cookie: After the browser receives the response from the server, it will save the cookie in the cookie storage space of the client. Each browser has its own cookie storage mechanism.

  3. The browser sends Cookie: When the user visits the website again or visits other pages of the website, the browser will carry the corresponding Cookie data in the Cookie header of the HTTP request and send it to the server.

  4. Server processing Cookie: After the server receives the HTTP request containing the Cookie, it will parse the Cookie and use the data in it. The server can judge the identity of the user, store the user's preference settings, and implement functions such as user tracking based on the information in the cookie.
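
The server's side of this exchange (steps 1 and 4) can be sketched with Python's standard `http.cookies` module. The helper names `build_set_cookie` and `parse_cookie_header` are assumptions made for this example.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name: str, value: str) -> str:
    """Step 1: produce the value of a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie[name] = value
    return cookie[name].OutputString()


def parse_cookie_header(header_value: str) -> dict:
    """Step 4: parse the Cookie request header into a name -> value dict."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}
```

For example, `parse_cookie_header("session_id=abc123; theme=dark")` gives a dictionary with the two cookies the browser sent back.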

Cookies have several important properties:

  1. Name (name): the name of the cookie, used to identify and distinguish different cookies.

  2. Value (value): The data value associated with the cookie.

  3. Domain (domain): Specifies the domain name that can access the cookie. Only requests matching this domain name will carry the corresponding cookie.

  4. Path (path): Specify the path that can access the cookie. Cookies will only be sent to the server for requests that match this path.

  5. Expiration (expiration time): Specifies the validity period of the cookie, either as a specific date and time (the Expires attribute) or as a number of seconds (the Max-Age attribute). A cookie with neither attribute is a session cookie and is deleted when the browser is closed.

  6. Secure (safe flag): When this flag is present, the cookie is only sent over HTTPS connections.

  7. HttpOnly (restrict script access): When this flag is present, scripts such as JavaScript cannot read the cookie, which limits what a cross-site scripting (XSS) attack can steal.
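
As a sketch of how these attributes end up in a Set-Cookie header, the example below sets each of them with the standard `http.cookies` module. The cookie name `session_id`, the domain `example.com`, and the one-hour lifetime are placeholder values.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id: str) -> str:
    """Build a Set-Cookie value that uses the attributes listed above."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id       # name and value
    morsel = cookie["session_id"]
    morsel["domain"] = "example.com"        # which hosts receive the cookie
    morsel["path"] = "/"                    # which paths receive the cookie
    morsel["max-age"] = 3600                # lifetime in seconds
    morsel["secure"] = True                 # HTTPS only
    morsel["httponly"] = True               # hidden from JavaScript
    return morsel.OutputString()


if __name__ == "__main__":
    print("Set-Cookie:", session_cookie_header("abc123"))
```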

The main functions of cookies are as follows:

  1. Identity recognition and state management: By storing user identity information and session state in cookies, websites can realize user authentication and maintain user login status.

  2. Personalized settings and preferences: By storing user preferences, the website can provide a personalized user experience, such as language selection, theme style, etc.

  3. Tracking and statistical analysis: By tracking user behavior and click streams in cookies, websites can conduct data analysis, understand users' visiting habits, and improve website performance and user experience.

Session

Session (session) is a mechanism for state management on the server side, used to track user activities on the website and store related data. Unlike Cookie, Session data is stored on the server, not the client (browser).

How Session works:

  1. Server creates Session: When a user visits a website that uses Session, the server will create a unique Session identifier for the user, usually a string (Session ID). The server will send the Session ID to the client, and set an HTTP header named "Set-Cookie" in the response to store the Session ID in the cookie.

  2. The client sends the Session ID: the client will carry the Session ID in the Cookie header and send it to the server in each subsequent request.

  3. The server processes the Session: After receiving the request containing the Session ID, the server finds the corresponding Session data through the Session ID. The server can identify the user according to the session data, and store the user's state information and temporary data.

  4. Storage and management of session data: Session data is usually stored in the server's memory or in a database. The server allocates a block of memory or a database entry for each Session to store data related to the user.

  5. Session expiration and destruction: To keep stale sessions from consuming server resources, each Session has an expiration time. Once a session has been idle for longer than that time, the server marks it as expired and destroys it. The server cannot detect that the user closed the browser; in that case the browser simply discards the session cookie, and the orphaned Session times out on the server. Either way, a new Session is created the next time the user visits the website.
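
The lifecycle above can be sketched as a small in-memory store. This is an illustration under stated assumptions: the class name `SessionStore`, the 30-minute default timeout, and the sliding-expiration policy are choices made for the example, and a real deployment would more likely use a framework's session support or a shared store such as a database.

```python
import secrets
import time


class SessionStore:
    """In-memory session store keyed by a random session ID."""

    def __init__(self, ttl_seconds: float = 1800.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so expiry can be tested
        self._sessions = {}            # session_id -> (expires_at, data)

    def create(self) -> str:
        """Step 1: create a session and return the ID to send in Set-Cookie."""
        session_id = secrets.token_urlsafe(32)   # unguessable identifier
        self._sessions[session_id] = (self._clock() + self._ttl, {})
        return session_id

    def get(self, session_id: str):
        """Step 3: return the session's data dict, or None if unknown or expired."""
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        expires_at, data = entry
        if self._clock() >= expires_at:
            del self._sessions[session_id]       # step 5: destroy expired session
            return None
        # Sliding expiration: each access extends the session's lifetime.
        self._sessions[session_id] = (self._clock() + self._ttl, data)
        return data
```

A request handler would call `create()` on first contact, send the ID in a cookie, and call `get()` with the ID from the Cookie header on later requests.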

Session has the following characteristics:

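  1. Security: Compared with Cookie, Session does not store sensitive data on the client side. Only the Session ID is saved in the cookie, which reduces what an attacker can steal from the client. The Session ID itself must still be protected, because whoever holds it can impersonate the user.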

  2. Scalability: Session data is stored on the server side, which can store more data and complex structures. At the same time, the server can expand storage capacity as needed.

  3. Automatic management: Session creation, storage and destruction are often handled automatically by the server without our manual management.

  4. Cross-platform: Session has nothing to do with a specific client and can be used for cross-platform Web development.

It should be noted that the use of Session requires server support and corresponding configuration. We need to write code on the server side to manage Session, including creating, storing and destroying Session, and reading and updating data in Session.

9. Caching and control

caching mechanism

The HTTP caching mechanism is a way to improve performance, reduce network traffic, and reduce server load by storing and reusing response data between the client (browser) and server. HTTP caching works on multiple levels: client (browser) side caching and intermediate proxy server side caching.

How HTTP caching works:

  1. Client request: When the client initiates an HTTP request, the request headers may contain cache-related fields such as "If-Modified-Since" and "Cache-Control". These fields tell the server and any intermediate caches which copy the client already holds and whether a cached response is acceptable.

  2. Server response: When the server responds to the request, it will add some cache-related identifiers and information to the response header. These flags tell the client how to cache the response and how long the response is valid.

  3. Client cache: After receiving the response, the client stores the response data in the local cache according to the cache identifier and information in the response header. Next time if there is the same request, the client can directly use the data in the cache without sending a request to the server again.

  4. Validate cache: When a cached response is no longer fresh, the client sends a conditional request, and the server checks whether the cached data is still valid. If the data is valid, the server returns a special response code (304 Not Modified) with no body, telling the client that the data in the cache can be used; if the data is invalid, the server returns new response data.
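
The client's decision in steps 3 and 4 can be sketched as follows. The names `CacheEntry` and `plan_request` are hypothetical, and the freshness rule is reduced to a single max-age value; real browser caches apply many more rules.

```python
class CacheEntry:
    """A stored response plus the metadata needed to judge freshness."""

    def __init__(self, body: bytes, max_age: int, stored_at: float, etag: str = ""):
        self.body = body
        self.max_age = max_age        # seconds the response stays fresh
        self.stored_at = stored_at    # time at which it was cached
        self.etag = etag              # validator from the ETag response header

    def is_fresh(self, now: float) -> bool:
        return (now - self.stored_at) < self.max_age


def plan_request(cache: dict, url: str, now: float):
    """Return ("hit", body), ("revalidate", headers) or ("fetch", {})."""
    entry = cache.get(url)
    if entry is None:
        return "fetch", {}                      # nothing cached: normal request
    if entry.is_fresh(now):
        return "hit", entry.body                # step 3: reuse without contacting the server
    if entry.etag:
        # Step 4: stale, so ask the server whether the copy is still valid.
        return "revalidate", {"If-None-Match": entry.etag}
    return "fetch", {}
```

If the server answers a revalidation with 304 Not Modified, the client keeps the stored body and resets the entry's age.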

HTTP caching has various policies and flags to control caching behavior:

  1. Cache-Control: By setting this response header field, the server can control how the response is cached. For example, "public" means that the response can be stored by any cache, "private" means that the response can only be stored in a single user's private cache, and "no-cache" means that a stored response must be revalidated with the server before it is reused.

  2. ETag: The server can add a unique identifier (ETag) in the response header to indicate the version of the response data. The client sends this identifier back in the "If-None-Match" header of a later request, and the server compares it with the current version to determine whether the cached data is still valid.

  3. Last-Modified / If-Modified-Since: The server can add the last modification time (Last-Modified) in the response header. In the next request, the client sends this time back to the server in the "If-Modified-Since" header field, and the server judges the validity of the cached data according to the last modification time.

  4. Max-Age: Through the "max-age" directive of Cache-Control, the server can set the maximum validity period of the response, in seconds. This is the longest time a cache may reuse the data without revalidating it.

The HTTP caching mechanism brings the following advantages:

  1. Reduce network traffic: By reusing cached response data, the client can avoid sending the same request to the server, reducing network traffic and improving performance.

  2. Improve the response speed: the client can directly obtain the response data from the local cache without waiting for the server's response, which speeds up the page loading speed.

  3. Reduce server load: Due to the reuse of cached response data, the server can send fewer responses, reducing the load pressure on the server.

How to Control Caching

The HTTP protocol provides a series of mechanisms and header fields for controlling the behavior of caches.

  1. Cache-Control: Cache-Control is the most commonly used header field for controlling cache behavior. Multiple directives can be combined for finer control. Common directives are:

    • public: Indicates that the response can be stored by any cache, including client browsers and intermediate proxy servers.
    • private: Indicates that the response can only be stored in a single user's private cache and cannot be stored in a public cache.
    • no-cache: Indicates that the cache needs to be verified with the server first to confirm the freshness, and the cached data cannot be used directly.
    • no-store: Indicates that the response data is not allowed to be cached, and the data needs to be retrieved for each request.
    • max-age: Specifies the maximum validity period of the cache, in seconds.

  2. Expires: The Expires header field specifies a date and time at which the response expires. Caches may reuse the response data until that time. Expires is the HTTP/1.0 mechanism; its disadvantage is that it depends on the server and client clocks agreeing. When a response carries both, Cache-Control: max-age takes precedence over Expires.

  3. ETag: ETag is a unique identifier generated by the server to identify the version of the response data. When the resource changes, its ETag changes as well. The client sends the cached ETag in the If-None-Match header field; if it no longer matches, the server returns the new data, otherwise it returns 304 Not Modified.

  4. Last-Modified / If-Modified-Since: Last-Modified is a timestamp set by the server, indicating the last modification time of the resource. The client can send this timestamp to the server through the If-Modified-Since header field in the next request, and the server can judge whether the resource has changed based on this timestamp.
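
To show how the Cache-Control directives above might be interpreted by a cache, here is a simplified parser. The function names `parse_cache_control` and `may_store` are hypothetical, and the parser assumes directive arguments never contain commas, which is a simplification of the real header grammar.

```python
def parse_cache_control(header_value: str) -> dict:
    """Parse a Cache-Control value into {directive: argument or True}."""
    directives = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, argument = part.partition("=")
        # Directives without an argument (public, no-store, ...) map to True.
        directives[name.strip().lower()] = argument.strip().strip('"') if sep else True
    return directives


def may_store(directives: dict, shared_cache: bool) -> bool:
    """Whether a cache is allowed to store the response at all."""
    if "no-store" in directives:
        return False                  # nobody may keep a copy
    if shared_cache and "private" in directives:
        return False                  # proxies must not store private responses
    return True
```

For instance, `parse_cache_control("public, max-age=3600")` yields the `public` flag and a `max-age` of `"3600"`, which a cache would then convert to a number of seconds.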

These header fields can be used in combination for finer-grained cache control. For example, you can use the max-age directive of Cache-Control to set the maximum validity period of the cache, and combine it with the ETag and If-None-Match header fields so that an expired cache entry is revalidated rather than downloaded again.
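
That combination can be sketched on the server side as follows. The names `make_etag` and `respond` are assumptions for this example, the ETag is derived from a hash of the body (one common choice among several), and weak validators (the `W/` prefix) are not handled.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive an ETag from the body; any content change yields a new tag."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(request_headers: dict, body: bytes, max_age: int = 60):
    """Return (status, headers, body); 304 when the client's copy is current."""
    etag = make_etag(body)
    headers = {"Cache-Control": f"max-age={max_age}", "ETag": etag}
    # If-None-Match may list several tags separated by commas.
    client_tags = [t.strip() for t in request_headers.get("If-None-Match", "").split(",")]
    if etag in client_tags:
        return 304, headers, b""      # cached copy is still valid: send no body
    return 200, headers, body
```

The first response tells the client to reuse the body for `max_age` seconds; after that, a conditional request costs only a header exchange unless the content has actually changed.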

It should be noted that the behavior of the cache depends not only on the header fields returned by the server, but also on the configuration of the client (browser) and intermediate proxy servers. The client and the proxy server can decide whether to cache the response and the caching strategy according to the instruction of the header field.

10. RESTful Architecture

REST (Representational State Transfer) is an architectural style for designing and building web-based applications. It is a lightweight, flexible and extensible architecture widely used for building web services and APIs. RESTful architecture follows a set of principles and constraints to achieve scalability, reliability and maintainability of the system.

The main features and principles of RESTful architecture:

  1. Resources (Resources): REST abstracts data and functions into the concept of resources. Each resource has a unique identifier (URI), and the client can operate on the resource through HTTP methods (such as GET, POST, PUT, DELETE). A resource can be an entity object, a database record, or any other meaningful data.

  2. State Transfer: The RESTful architecture emphasizes the transfer of state between the client and the server. The client changes the state of a resource by sending a request to the server, and the server returns the result in the response. The server does not keep client session state between requests; each request is stateless and carries everything needed to process it.

  3. Uniform Interface: The RESTful architecture provides a unified interface that enables clients to interact with different resources in the same way. This interface consists of URIs that identify resources, HTTP methods that operate on resources, media types that describe the format of requests and responses, and hyperlinks that represent relationships between resources.

  4. Layered System: REST allows intermediaries such as proxy servers to be inserted between clients and servers to provide caching, load balancing, security, and other functions. The client does not need to know whether the request is sent directly to the server or forwarded by an intermediary.

  5. Caching: The RESTful architecture supports a caching mechanism. The server labels each response as cacheable or not through response header fields, and clients can reuse cacheable responses, reducing requests to the server and improving performance and efficiency.
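
As a rough illustration of the resource and uniform-interface principles, the sketch below maps HTTP methods and URIs onto operations on an in-memory collection. The class `UserAPI`, the `/users` URI layout, and the choice of status codes are assumptions for this example rather than anything REST itself prescribes.

```python
import json


class UserAPI:
    """Toy REST-style handler for a /users collection held in memory."""

    def __init__(self):
        self._users = {}      # id -> representation (a dict)
        self._next_id = 1

    def handle(self, method: str, path: str, body: str = ""):
        """Return (status_code, response_body) for a method and URI."""
        parts = [p for p in path.split("/") if p]
        if not parts or parts[0] != "users" or len(parts) > 2:
            return 404, ""
        if len(parts) == 1:
            return self._collection(method, body)
        try:
            user_id = int(parts[1])
        except ValueError:
            return 404, ""
        return self._item(method, user_id, body)

    def _collection(self, method: str, body: str):
        if method == "GET":       # list all resources
            return 200, json.dumps(list(self._users.values()))
        if method == "POST":      # create a new resource
            user = {**json.loads(body), "id": self._next_id}
            self._users[self._next_id] = user
            self._next_id += 1
            return 201, json.dumps(user)
        return 405, ""

    def _item(self, method: str, user_id: int, body: str):
        if user_id not in self._users:
            return 404, ""
        if method == "GET":       # read one resource
            return 200, json.dumps(self._users[user_id])
        if method == "PUT":       # replace the resource's state
            self._users[user_id] = {**json.loads(body), "id": user_id}
            return 200, json.dumps(self._users[user_id])
        if method == "DELETE":    # remove the resource
            del self._users[user_id]
            return 204, ""
        return 405, ""
```

Every request names the resource by URI and the operation by method, and the handler keeps no per-client state, so the same call sequence works from any client.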

The design principles of the RESTful architecture enable the system to have the following advantages:

  • Simplicity: RESTful architecture uses common HTTP protocol and standard URI, which is easy to understand and learn.
  • Scalability: RESTful architecture supports multiple data formats and media types, making it easy to add new resources and functions.
  • Loose coupling: The client and server are decoupled, and each can evolve and expand independently.
  • Portability: Due to the use of the HTTP protocol as the communication protocol, the RESTful API can interact between different platforms and languages.
  • Testability: RESTful APIs can be tested and debugged with simple HTTP requests.

Origin blog.csdn.net/u012581020/article/details/132456527