[Computer Network Notes 6] Application Layer (3): HTTP Cookies, Cache Control, Proxy Services, Short and Long Connections

HTTP Cookies

HTTP's Cookie mechanism uses two fields: the response header field Set-Cookie and the request header field Cookie .

A cookie can carry multiple key-value pairs: the response header may contain multiple Set-Cookie fields, and the request's single Cookie field may carry multiple key-value pairs separated by semicolons.
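As an illustration, Python's standard http.cookies module can build and parse these headers (a minimal sketch; the session_id and theme cookie names are made up):

```python
from http.cookies import SimpleCookie

# Server side: build Set-Cookie response headers, one line per cookie.
response = SimpleCookie()
response["session_id"] = "abc123"          # hypothetical cookie name/value
response["session_id"]["max-age"] = 3600
response["theme"] = "dark"
print(response.output())
# Each cookie is emitted as its own "Set-Cookie: ..." line.

# Client side: parse a request's Cookie header, where pairs are
# separated by semicolons.
request = SimpleCookie("session_id=abc123; theme=dark")
print(sorted(m.key for m in request.values()))
```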

Expires and Max-Age

The validity period of a cookie can be set using the Expires and Max-Age properties.

  • Expires is an absolute time, i.e. a deadline, commonly called the "expiration time".
  • Max-Age is a relative time in seconds. The browser adds Max-Age to the time it received the message to get the absolute expiration time.

Expires and Max-Age may appear together, and their implied expiration times need not agree; when both are present, the browser gives Max-Age priority when computing the expiration date.
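A small sketch of that priority rule (the cookie_expiry helper is hypothetical, not a browser API):

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def cookie_expiry(attrs, received_at):
    """Compute a cookie's expiration time; Max-Age wins over Expires."""
    if "Max-Age" in attrs:
        # Relative: seconds counted from when the message was received.
        return received_at + timedelta(seconds=int(attrs["Max-Age"]))
    if "Expires" in attrs:
        # Absolute: an HTTP-date deadline.
        return parsedate_to_datetime(attrs["Expires"])
    return None  # neither attribute: a session cookie

received = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
attrs = {"Max-Age": "60", "Expires": "Mon, 01 Jan 2024 12:00:00 GMT"}
# Both are present and disagree; Max-Age (60 s after receipt) is used.
print(cookie_expiry(attrs, received))
```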

Domain and Path

We need to set the scope of the cookie so that the browser can only send it to a specific server and URL to avoid being stolen by other websites.

" Domain " and " Path " specify the domain name and path to which the cookie belongs . Before sending the cookie , the browser will extract the host and path parts from the URI and compare the cookie attributes. If the conditions are not met, the cookie will not be sent in the request header .

HttpOnly

The attribute "HttpOnly" tells the browser that this cookie may only be transmitted via the HTTP protocol and must not be accessed any other way. The browser's JS engine disables document.cookie and related APIs, preventing script-based attacks such as cross-site scripting (XSS).

SameSite

Another attribute, "SameSite", helps prevent "cross-site request forgery" (XSRF) attacks. Setting "SameSite=Strict" strictly forbids the cookie from being sent on cross-site navigations, while "SameSite=Lax" is slightly looser: it allows safe methods such as GET/HEAD across sites but forbids cross-site POST.
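Both attributes appear as flags on the Set-Cookie line; for illustration, Python's http.cookies module can emit them (a sketch; sid is a made-up cookie name):

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["sid"] = "xyz789"               # hypothetical session cookie
cookie["sid"]["httponly"] = True       # forbid document.cookie access from JS
cookie["sid"]["samesite"] = "Strict"   # never send on cross-site requests
header = cookie.output()
print(header)
```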

HTTP cache control

Generally speaking, after the browser receives a response, it caches the response content in a disk file.

Whether the browser directly uses the content in the local disk cache depends on the caching policy. The browser will also have a cache cleaning policy to regularly clear cached pages on the disk.

Caching strategy: no-store

no-store forbids caching entirely: the response must not be stored at all. It is used for data that changes very frequently, such as flash-sale pages.

Cache strategy: max-age

The validity period is counted from the creation time of the response message, not from the time the client receives it, which means max-age includes the time spent at every node along the transmission path.

The header field the server uses to mark a resource's validity period is "Cache-Control", and a value such as "max-age=30" gives the resource's lifetime in seconds. It tells the browser: "This page can only be cached for 30 seconds; after that it is considered expired and must not be used." (Since max-age includes the time spent in transit, the real usable time is max-age minus the transit time.)
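The freshness check can be sketched like this (a simplification of the age calculation that real caches perform per the HTTP caching rules; is_fresh is a made-up helper):

```python
def is_fresh(max_age, age, resident_time):
    """max_age: server-declared lifetime in seconds.
    age: seconds already spent in transit/upstream caches (the Age header).
    resident_time: seconds the response has sat in the local cache."""
    current_age = age + resident_time
    return current_age < max_age

# max-age=30, but 10 s were spent on the road: only 20 s of real use remain.
print(is_fresh(30, 10, 5))    # 15 s old overall: still fresh
print(is_fresh(30, 10, 25))   # 35 s old overall: stale
```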

Caching strategy: server and client cooperate

The client sends Cache-Control: max-age=0 to tell the server it will not use any cached copy (the browser still uses the cache for back/forward navigation). When the server sees max-age=0, it sends back a freshly generated message.

Cache policy: no-cache

"Cache-Control: no-cache" sent by the client means basically the same as "Cache-Control: max-age=0"; the exact interpretation depends on the origin server, but in practice the two usually have the same effect.

When the server sends "Cache-Control: no-cache", it does not forbid caching: the response may be cached, but before the cache is used it must be revalidated with the server to check whether it has expired and whether a newer version exists.

How to verify whether the cache has expired

One approach: the client sends a HEAD request, and the server returns the resource's modification time via Last-Modified.

The client then compares this with its cached copy: if nothing changed, it uses the cache and saves network traffic; otherwise it sends a second GET request to fetch the latest version. But making two requests like this doubles the network cost.

Conditional requests

The server compares the value of If-Modified-Since with the resource's Last-Modified value. If they are equal, the resource has not been modified, and the server responds with 304.

If the value of If-Modified-Since does not equal Last-Modified, the resource has been modified; the server responds with status code 200 and returns the data.
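Server-side, the comparison can be sketched as follows (a hypothetical handler, not any specific framework's API; the body and timestamp are made up):

```python
from email.utils import parsedate_to_datetime

BODY = b"<html>hello</html>"
LAST_MODIFIED = "Wed, 21 Oct 2015 07:28:00 GMT"

def handle_get(headers):
    """Return (status, body) for a possibly-conditional GET."""
    ims = headers.get("If-Modified-Since")
    if ims and parsedate_to_datetime(ims) >= parsedate_to_datetime(LAST_MODIFIED):
        return 304, b""   # not modified: empty body, client reuses its cache
    return 200, BODY      # modified (or unconditional): send the full body

print(handle_get({"If-Modified-Since": LAST_MODIFIED})[0])  # 304
print(handle_get({})[0])                                    # 200
```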

Disadvantages of using If-Modified-Since and Last-Modified:

  1. The minimum time unit is seconds. If the resource updates faster than once per second, the change cannot be detected and the cache cannot be used correctly.

  2. If the file is dynamically generated by the server, its modification time is always the generation time, even though the content may not have changed, so the cache is never used.

Solution: ETag & If-None-Match

ETag & If-None-Match

ETag is the abbreviation of "Entity Tag" and represents a unique identifier of a resource.

The server includes an ETag field in the response header. The client saves the value and, on its next request, sends it back in the If-None-Match request header. The server compares that value with the resource's current ETag; if they are equal, the resource has not changed, the server returns status code 304, and the client uses the file from its local disk cache.

If the server determines that the value does not match the ETag, the resource has changed; the server returns status code 200 with the new resource content in the body.
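A sketch of the server-side ETag comparison, using an MD5 hash of the body as the tag (one common choice, not mandated by HTTP; the handler is hypothetical):

```python
import hashlib

def make_etag(body: bytes) -> str:
    # One common choice: a hash of the content, quoted per ETag syntax.
    return '"%s"' % hashlib.md5(body).hexdigest()

def handle_get(headers, body: bytes):
    etag = make_etag(body)
    if headers.get("If-None-Match") == etag:
        return 304, b"", etag    # unchanged: client reuses its cached copy
    return 200, body, etag       # changed or first request: full body

body = b"v1 of the resource"
status, _, etag = handle_get({}, body)                       # first fetch
status2, _, _ = handle_get({"If-None-Match": etag}, body)    # revalidation
print(status, status2)  # 200 304
```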

Detailed flow chart of server-side cache control strategy and client-side cache control strategy: https://www.processon.com/view/link/62bec42a6376897b9fcfdb47

HTTP proxy service

A proxy sits between the requester and the responder in HTTP. As a "transit station", it forwards both the client's requests and the server's responses.

Forward proxy

Forward proxy: Close to the client and send requests to the server on behalf of the client.

How proxy servers perform client access control:

Reverse proxy

Reverse proxy: Close to the server, responding to client requests on behalf of the server.

Reverse proxy load balancing:

One of the most basic functions of a reverse proxy is load balancing. Because the origin servers are hidden from the client, the client sees only the proxy server; it does not know how many origin servers there are or what their IP addresses are.

Therefore, the proxy server can have the "power" of request distribution and decide which subsequent server will respond to the request.
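That distribution decision can be as simple as round-robin, sketched below (the backend addresses are made up for illustration):

```python
import itertools

class RoundRobinProxy:
    """Pick origin servers in turn; the client only ever sees the proxy."""
    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick_backend(self):
        # Each call returns the next backend, wrapping around at the end.
        return next(self._cycle)

proxy = RoundRobinProxy(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
print([proxy.pick_backend() for _ in range(4)])
# Requests are spread evenly; the 4th pick wraps back to the first backend.
```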

Other features of reverse proxy:

  • ① Health check: monitor the back-end servers with "heartbeat" or similar mechanisms, and promptly "kick" a failed server out of the cluster to keep the service highly available;
  • ② Security protection: shield the proxied back-end servers, restrict access by IP address or rate-limit traffic, and resist network attacks and overload;
  • ③ Encryption offloading: use SSL/TLS encrypted communication for the external network but plain communication on the trusted intranet, eliminating encryption/decryption costs;
  • ④ Data filtering: intercept upstream and downstream data and modify requests or responses according to arbitrary policies;
  • ⑤ Content caching: store and reuse server responses.

Proxy related header fields

The proxy server uses the field "Via" to indicate its identity; each proxy appends its host name (or domain name) to Via.

The Via field only lets the client and the origin server determine whether a proxy is present; it does not tell either side the other's real information.

  • " _ _ _ _
  • " _ _

HTTP caching proxy

To reduce the pressure on the origin server, the server side also has its own cache. HTTP's server-side caching is mainly implemented by reverse proxy servers (i.e., caching proxies).

CDN

CDN stands for Content Delivery Network (also Content Distribution Network). It is a network application service whose main job is to solve slow access caused by "long-distance" network paths.

  • ① Edge nodes serve as caching agents
  • ② Visit nearby

A CDN mainly caches static resources, such as images and audio. "Dynamic resources" are those whose content changes with every access because it is computed by a back-end service, such as a product's inventory count or a Weibo account's follower count.

The principle of CDN:

  • ① When a user clicks on the content URL on the website page, after being parsed by the local DNS system, the DNS system will eventually hand over the domain name resolution rights to the CDN dedicated DNS server pointed to by the CNAME.

  • ② The CDN's DNS server returns the CDN's global load balancing device IP address to the local domain name server.

  • ③ The local domain name server returns the IP address of the CDN global load balancing device to the client

  • ④ The CDN global load balancing device selects a regional load balancing device in the user's region based on the user's IP address and the content URL requested by the user, and tells the user to initiate a request to this device.

  • ⑤ The regional load balancing device selects a suitable cache server to serve the user. The selection is based on:
    which server is closest to the user, judged from the user's IP address;
    which server has the content the user wants, judged from the content name carried in the requested URL;
    which servers still have spare capacity, judged from each server's current load.

  • ⑥ After comprehensive analysis of the above conditions, the regional load balancing device will return the IP address of a cache server to the global load balancing device.

  • ⑦ The global load balancing device returns the server's IP address to the user.

  • ⑧ The user sends the request to the cache server, which responds and delivers the content to the user's terminal. If the cache server does not have the requested content (but the regional balancer assigned it anyway), it requests the content from its upper-level cache server, tracing back as far as the website's origin server if necessary, and pulls the content to the local node.

To summarize briefly: the CDN's load balancing devices return the IP address of the cache server closest to the user. If that cache server does not have the content the user needs, it requests it from its upper-level cache server, tracing back to the origin server if necessary, and pulls the content to the local node.

HTTP short connections and long connections

Short connections

Because the entire interaction between client and server is very brief and no long-term connection state is kept with the server, this is called a short connection.

The shortcomings of short connections are quite serious, because in the TCP protocol, establishing and closing connections are very "expensive" operations.

TCP requires a "three-way handshake" to establish a connection: 3 packets and 1 RTT. Closing a connection requires a "four-way close": 4 packets and 2 RTTs.
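Using those figures, the overhead of short versus long connections can be compared with a quick calculation (a simplification: it ignores any overlap between the close handshake and application traffic):

```python
def total_rtts(n_requests, keep_alive):
    handshake = 1  # three-way handshake: 1 RTT before data can flow
    close = 2      # four-way close: 2 RTTs
    request = 1    # one request-response round trip each
    if keep_alive:
        # One connection amortizes setup/teardown over all requests.
        return handshake + n_requests * request + close
    # A short connection pays setup and teardown for every single request.
    return n_requests * (handshake + request + close)

print(total_rtts(10, keep_alive=False))  # 40 RTTs
print(total_rtts(10, keep_alive=True))   # 13 RTTs
```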

Long connections

The client and server maintain a long-term connection, so it is called a long connection .

Cost amortization: since establishing and closing TCP connections is very time-consuming, a long connection amortizes that time cost over many "request-response" exchanges instead of paying it for each one.

One way to keep a long connection alive is the heartbeat: at fixed intervals, send a tiny, meaningless message over the TCP connection so that gateways do not classify it as an "idle connection" and close it.

Since long connections significantly improve performance, HTTP/1.1 enables them by default on all connections.

Connection related header fields

We can also explicitly require the use of a long connection mechanism in the request header. The field used is Connection and the value is " keep-alive ".

However, regardless of whether the client explicitly asks for a long connection, if the server supports long connections it will always put a "Connection: keep-alive" field in the response message, telling the client: "I support long connections; keep using this TCP connection to send and receive data."

Disadvantages of long connections

If a TCP connection stays open for a long time, the server must keep its state in memory, which consumes server resources. A large number of idle but still-open long connections can quickly exhaust those resources, leaving the server unable to serve the users who actually need it.

Therefore, long connections also need to be closed at the appropriate time, and the connection to the server cannot be maintained forever. Both the client and the server can close the connection in appropriate ways.

The client actively closes the long connection

On the client side, you can add the " Connection:close " field to the request header to tell the server: "Close the connection after this communication."

When the server sees this field, it knows the client wants to close the connection, so it adds the same field to its response message and closes the TCP connection after sending it.

The server actively closes long connections

The server usually does not actively close the connection, but some strategies can also be used. Take Nginx as an example. It has two methods:

  • ① The " keepalive_timeout " directive sets the timeout for long connections. If no data is sent or received on a connection for that long, Nginx actively closes it, preventing idle connections from occupying system resources.

  • ② The " keepalive_requests " directive sets the maximum number of requests one long connection may carry. For example, with a value of 1000, Nginx actively closes the connection after it has handled 1000 requests on it.
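Put together, an Nginx configuration using both directives might look like this sketch (the values are illustrative, not recommendations):

```nginx
http {
    server {
        listen 80;

        # Close a connection that has been idle for more than 60 seconds.
        keepalive_timeout 60s;

        # Close a connection after it has carried 1000 requests.
        keepalive_requests 1000;
    }
}
```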

Head-of-line blocking

If the request at the head of the queue is delayed because it is processed too slowly, all requests behind it in the queue must wait as well. This problem is head-of-line blocking.

How to solve the head-of-line blocking problem?
  • Concurrent connections + domain name sharding

  • "Concurrent connections": That is, a browser process initiates multiple long connections to a domain name at the same time.

    Note: the HTTP protocol and browsers limit the number of concurrent connections to a single domain to 6 ~ 8.

  • Domain name sharding: the server points several different domain names at the same IP address, so the browser's per-domain connection limit applies to each name separately and more concurrent connections become available.

    Generally, companies use multiple different domain names to point to the same IP. Each domain name can support 6 ~ 8 long connections, so 3 domain names can support 18 ~ 24 long connections.

Some differences between HTTP 1.1 and HTTP 1.0

HTTP 1.0 came into use on the web in 1996, when it served relatively simple pages and requests. HTTP 1.1 only began to be widely used in major browsers' network requests in 1999, and it remains the most widely used version of HTTP today. The main differences are:

  1. Cache handling. HTTP 1.0 mainly used If-Modified-Since and Expires in the headers as the criteria for cache decisions. HTTP 1.1 introduced more cache-control strategies, such as entity tags (ETag), If-Unmodified-Since, If-Match, If-None-Match, and other optional cache headers.
  2. Bandwidth optimization and use of network connections. HTTP 1.0 wasted bandwidth in some cases: for example, when the client needed only part of an object, the server still sent the whole object, and resumable downloads were not supported. HTTP 1.1 introduced the Range request header, which allows requesting only part of a resource; the server replies with 206 (Partial Content). This lets developers make full use of bandwidth and connections.
  3. Error-status management. HTTP 1.1 added 24 new error status codes. For example, 409 (Conflict) indicates that the request conflicts with the current state of the resource, and 410 (Gone) indicates that a resource on the server has been permanently deleted.
  4. The Host header is required. HTTP 1.0 assumed each server was bound to a unique IP address, so the URL in the request message did not carry a hostname. With the development of virtual hosting, however, multiple virtual hosts (multi-homed web servers) can exist on one physical server and share one IP address. HTTP 1.1 requires both request and response messages to support the Host header field; a request message without it is rejected with an error (400 Bad Request).
  5. Persistent connections by default. HTTP 1.1 supports persistent connections and request pipelining, allowing multiple HTTP requests and responses over one TCP connection, which reduces the cost and latency of repeatedly opening and closing connections. Connection: keep-alive is enabled by default in HTTP 1.1, which to some extent makes up for HTTP 1.0's need to create a new connection for every request.

The biggest difference between HTTP 1.0 and HTTP 1.1 is that HTTP 1.1 supports long connections, reducing the number of TCP setups and teardowns and allowing multiple HTTP requests over a single TCP connection.
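As an illustration of point 2 above, a 206 response's Content-Range header can be parsed like this (a sketch; parse_content_range is a made-up helper, and the byte values are illustrative):

```python
def parse_content_range(header: str):
    """Parse a Content-Range value like 'bytes 0-499/1234'
    into (start, end, total)."""
    unit, _, rest = header.partition(" ")
    assert unit == "bytes"
    span, _, total = rest.partition("/")
    start, _, end = span.partition("-")
    return int(start), int(end), int(total)

# A client sending "Range: bytes=0-499" gets back 206 (Partial Content)
# with a Content-Range header describing the slice and the full size.
print(parse_content_range("bytes 0-499/1234"))  # (0, 499, 1234)
```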

Origin blog.csdn.net/lyabc123456/article/details/133311855