Web caching (strong caching, negotiation caching, CDN caching)


For the contents of localStorage, sessionStorage, and cookies in the local cache, please refer to the link

1. HTTP cache

HTTP caching means that when a client requests a resource from the server, the request first reaches the browser cache. If the browser holds a copy of the requested resource, it can take the resource directly from its cache instead of fetching it from the origin server.

HTTP caching takes effect from the second request onward. On the first request, the server returns the resource together with its caching parameters in the response headers. On the second request, the browser checks those parameters: if the strong cache is hit, it uses the cached copy directly (shown as status 200); otherwise it sends the cache identifiers to the server in the request headers to check the negotiation cache. If the negotiation cache is hit, the server returns 304; otherwise it returns the new resource.

In general, HTTP caches only store responses to GET requests.

 

1.1 Strong cache

When the browser requests a resource for the first time, a server that wants the resource cached adds a Cache-Control header to the response, for example with max-age set, and the browser saves the file in its local cache.
When the browser requests the same file again, it checks whether max-age has expired. If not, it reads the resource straight from the local cache without contacting the server, which speeds up page loading. If max-age has expired, the browser sends a request to the server as it did the first time.
This gives the fastest page loads, but if the resource changes on the server during that period, the page will not pick up the change, because no request reaches the server. This is common in development: you modify a style, refresh the page, and nothing changes because the strong cache is in use; a forced refresh with Ctrl + F5 fixes it.

The label "from memory cache" means the copy came from memory, and "from disk cache" means it came from the hard disk. The browser reads the caches in the order memory, then disk. Typically, files such as JS and images are kept in the memory cache after being parsed and executed, so a page refresh reads them directly from memory (from memory cache), while CSS files are usually stored on disk, so they are read from the hard disk each time the page is rendered (from disk cache). The exact placement depends on the browser.
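The lookup order described above can be sketched as a toy two-tier cache. This is only an illustration, not how any particular browser is implemented; the class name `TwoTierCache` and the sample paths are made up for the example.

```python
class TwoTierCache:
    """Toy model of a browser cache with a memory tier and a disk tier."""

    def __init__(self):
        self.memory = {}  # fast tier, lost when the page or process goes away
        self.disk = {}    # slower tier, survives restarts

    def lookup(self, url):
        # Memory is consulted first, then disk, then the network.
        if url in self.memory:
            return "from memory cache", self.memory[url]
        if url in self.disk:
            return "from disk cache", self.disk[url]
        return "network", None


cache = TwoTierCache()
cache.memory["/app.js"] = b"console.log(1)"
cache.disk["/app.js"] = b"console.log(1)"
cache.disk["/site.css"] = b"body{}"

print(cache.lookup("/app.js")[0])    # from memory cache
print(cache.lookup("/site.css")[0])  # from disk cache
print(cache.lookup("/new.png")[0])   # network
```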

1. Strong cache header fields (Pragma/Cache-Control/Expires)

When the browser requests a resource, it first reads the header information stored with the cached copy and checks whether the strong cache is hit (using Cache-Control and Expires). On a hit, it takes the resource, including its cached headers, from the cache, and this request does not communicate with the server at all.

  • Expires : Defined in HTTP/1.0. Its value is an absolute time string in GMT format, such as Wed, 10 Jun 2015 21:31:12 GMT. If the request is sent before that time, the local cache is valid; otherwise a request is sent to the server to get the resource.
  • Cache-Control : Introduced in HTTP/1.1. Freshness is judged mainly by its max-age value, which is relative: the expiration time is the time of the first response plus the max-age lifetime, and it is compared with the time of the current request. If the request comes before the expiration time, the cache is hit; otherwise it is not. Whether Cache-Control is set is decided on the server side; the front end does not need to do anything.

Common options for cache-control

  • max-age=100 : The response may be cached locally and expires after 100 seconds.
  • no-cache : The response may be stored, but the local copy must not be used without checking first. The negotiation cache applies: the browser confirms with the server whether the response has changed (using the ETag from the previous response, if there was one), and re-downloading is avoided if the resource is unchanged.
  • no-store : Nothing is cached; neither the strong cache nor the negotiation cache is used. Every request goes to the server, which returns the full resource.
  • public : May be cached by any cache, including clients and proxies.
  • private : May be cached only by the client; shared caches such as proxies and CDNs must not store it.
  • s-maxage : Overrides max-age, with the same effect, but only for shared caches such as proxy servers.
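To make the directives concrete, here is a minimal sketch of how a cache might read a Cache-Control value and decide what to do. The function names `parse_cache_control` and `storage_decision` are invented for this example, and the parsing is simplified (it ignores quoting rules and most other directives).

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip()] = arg.strip().strip('"') or None
    return directives


def storage_decision(value, shared_cache=False):
    """Return (may_store, max_age_seconds, must_revalidate_first)."""
    d = parse_cache_control(value)
    if "no-store" in d:
        return False, None, True
    if shared_cache and "private" in d:
        return False, None, True
    # s-maxage overrides max-age, but only for shared caches (proxy, CDN).
    if shared_cache and "s-maxage" in d:
        age = d["s-maxage"]
    else:
        age = d.get("max-age")
    max_age = int(age) if age is not None else None
    return True, max_age, "no-cache" in d


print(storage_decision("max-age=100"))
print(storage_decision("no-cache"))
print(storage_decision("public, max-age=60, s-maxage=600", shared_cache=True))
```

Passing `shared_cache=True` models a proxy or CDN node; the default models the browser's own cache.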

Note: If Cache-Control (max-age) and Expires are both present, Cache-Control takes priority over Expires.
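The priority rule can be illustrated with a small freshness check. This is a sketch under simplifying assumptions: headers are passed as a plain dict, `response_time` is when the response was received, and the helper names `expiry_time` and `is_fresh` are made up for the example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def expiry_time(headers, response_time):
    """Return when a stored response stops being fresh, or None if unknown."""
    cache_control = headers.get("Cache-Control", "")
    for part in cache_control.split(","):
        name, _, arg = part.strip().partition("=")
        if name.lower() == "max-age" and arg.isdigit():
            # Relative lifetime, counted from when the response was received.
            return response_time + timedelta(seconds=int(arg))
    if "Expires" in headers:
        # Absolute time, consulted only when max-age is absent.
        return parsedate_to_datetime(headers["Expires"])
    return None


def is_fresh(headers, response_time, now):
    expires_at = expiry_time(headers, response_time)
    return expires_at is not None and now < expires_at


received = datetime(2015, 6, 10, 21, 0, 0, tzinfo=timezone.utc)
both = {"Cache-Control": "max-age=100",
        "Expires": "Wed, 10 Jun 2015 21:31:12 GMT"}
later = received + timedelta(seconds=101)

print(is_fresh(both, received, later))                           # False: max-age wins
print(is_fresh({"Expires": both["Expires"]}, received, later))   # True: only Expires
```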


1.2 Negotiation cache (comparison cache)

Negotiation caching is the process in which, after the strong cache has expired, the browser sends a request carrying the cache identifier to the server, and the server decides from that identifier whether the cached copy can be used.

When the browser makes its first request, a server using the negotiation caching strategy returns the resource together with a resource identifier, and the browser stores the resource in its local cache.

When the browser requests the resource again, it sends the resource identifier along with the request, and the server checks whether the browser's cached version matches the latest version on the server:

  • If the versions match, the server returns a 304 status code with no body, telling the browser to use the copy in its local cache;
  • If the versions differ, the server returns a 200 status code with the latest resource and a new resource identifier, and the browser updates its local cache.

Resource identifiers:

  • Last-Modified/If-Modified-Since: the time when the resource was last modified

  • ETag/If-None-Match: a unique string corresponding to the resource

    The server generates an identification string for each resource; whenever the file content differs, the ETag differs. ETag exists because Last-Modified has a weakness: if a file is edited and saved without its content changing, a server that judges by modification time treats it as changed and the resource is requested again. Computing ETags does cost the server some performance. Last-Modified and ETag can be used together, in which case the server gives priority to ETag. Under the HTTP specification, If-Modified-Since is ignored when If-None-Match is present, although some servers also compare Last-Modified before deciding whether to return 304.

(1) Last-Modified

On the first request, the server returns the resource along with the Last-Modified identifier in the response headers.
On subsequent requests, the browser adds If-Modified-Since to the request headers, set to the Last-Modified value returned by the previous response. The server compares If-Modified-Since with the resource's current Last-Modified value to determine whether the browser's copy is still the latest.
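A server-side sketch of this exchange might look like the following. It is a simplified illustration rather than a real server: `respond_last_modified` is a hypothetical helper, headers are plain dicts, and the modification time must be a timezone-aware UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond_last_modified(request_headers, resource_mtime, body):
    """Return (status, headers, body) for a GET validated by Last-Modified."""
    # HTTP dates have one-second resolution, so drop the microseconds.
    mtime = resource_mtime.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(mtime, usegmt=True)}
    since = request_headers.get("If-Modified-Since")
    if since is not None and mtime <= parsedate_to_datetime(since):
        return 304, headers, b""   # unchanged: the browser reuses its copy
    return 200, headers, body      # first request, or the resource changed


mtime = datetime(2022, 2, 21, 11, 43, 38, tzinfo=timezone.utc)

# First request: no validator yet, so the full resource is returned.
status, headers, _ = respond_last_modified({}, mtime, b"v1")

# Second request: the browser echoes Last-Modified as If-Modified-Since.
revalidation = {"If-Modified-Since": headers["Last-Modified"]}
status_again, _, _ = respond_last_modified(revalidation, mtime, b"v1")

print(status, status_again)  # 200 304
```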

 

(2) ETag

The process is the same as for Last-Modified, except that the server returns an ETag response header and the browser sends that value back in If-None-Match.
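The ETag variant can be sketched the same way. Deriving the ETag from a content hash is one common choice and an assumption of this example, not something the HTTP specification requires; `make_etag` and `respond_etag` are hypothetical names, and weak validators (the `W/` prefix) are not handled.

```python
import hashlib


def make_etag(body):
    """Derive an ETag from the content; identical content gives an identical tag."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond_etag(request_headers, body):
    """Return (status, headers, body) for a GET validated by ETag."""
    etag = make_etag(body)
    header = request_headers.get("If-None-Match", "")
    candidates = [tag.strip() for tag in header.split(",")]
    if etag in candidates or "*" in candidates:
        return 304, {"ETag": etag}, b""   # content unchanged
    return 200, {"ETag": etag}, body


status, headers, _ = respond_etag({}, b"body { color: red; }")
revalidation = {"If-None-Match": headers["ETag"]}

print(status)                                                # 200
print(respond_etag(revalidation, b"body { color: red; }")[0])   # 304
print(respond_etag(revalidation, b"body { color: blue; }")[0])  # 200
```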
 

(3) Comparison of Last-Modified and ETag


ETag is preferred for the following reasons:

  1. Last-Modified is only accurate to the second, so changes made within the same second go undetected.
  2. If a file is regenerated at regular intervals with identical content, or is modified several times and ends up identical to its previous version, Last-Modified still changes. With Last-Modified alone, the full resource is returned every time even though the content is the same. With ETag, the server can tell that the content is unchanged and answer 304, so the browser keeps using its cache instead of downloading the resource again.
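The second point is easy to reproduce. The sketch below rewrites a temporary file with identical content at a later timestamp and compares the two validators; using a SHA-256 content hash as the ETag is an assumption of the example, and the timestamps are arbitrary.

```python
import hashlib
import os
import tempfile


def validators(path):
    """Return (mtime_in_whole_seconds, content_hash) for a file."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return int(os.path.getmtime(path)), digest


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "style.css")
    with open(path, "wb") as f:
        f.write(b"body { color: red; }")
    os.utime(path, (1_000_000_000, 1_000_000_000))
    before = validators(path)

    # Regenerate the file with identical content 100 seconds later.
    with open(path, "wb") as f:
        f.write(b"body { color: red; }")
    os.utime(path, (1_000_000_100, 1_000_000_100))
    after = validators(path)

mtime_changed = before[0] != after[0]
etag_changed = before[1] != after[1]
print(mtime_changed, etag_changed)  # True False
```

A Last-Modified check would treat the file as changed, while a content-based ETag would still match.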

 

1.3 The whole process of HTTP cache

[Figure: flowchart of the whole HTTP caching process]
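The whole process (strong cache first, then negotiation cache, then a full download) can be sketched with a simulated browser and server. Everything here is a toy model: `Origin` and `BrowserCache` are made-up classes, time is a plain number of seconds, there is a single resource, and the server's max-age is stored as an integer rather than parsed from a real header.

```python
import hashlib


class Origin:
    """Stand-in server: one resource, ETag validation, max-age of 60 seconds."""

    def __init__(self, body):
        self.body = body
        self.hits = 0

    def get(self, if_none_match=None):
        self.hits += 1
        etag = hashlib.sha256(self.body).hexdigest()[:16]
        headers = {"ETag": etag, "max-age": 60}
        if if_none_match == etag:
            return 304, headers, None
        return 200, headers, self.body


class BrowserCache:
    def __init__(self, origin):
        self.origin = origin
        self.entry = None  # dict with body, etag, expires_at

    def fetch(self, now):
        e = self.entry
        if e and now < e["expires_at"]:
            return "200 (from cache)", e["body"]   # strong cache hit, no request
        status, headers, body = self.origin.get(e["etag"] if e else None)
        if status == 304:
            body = e["body"]                        # negotiation cache hit
        self.entry = {"body": body, "etag": headers["ETag"],
                      "expires_at": now + headers["max-age"]}
        return str(status), body


origin = Origin(b"v1")
browser = BrowserCache(origin)
print(browser.fetch(now=0))    # ('200', b'v1')               first request
print(browser.fetch(now=30))   # ('200 (from cache)', b'v1')  still fresh
print(browser.fetch(now=90))   # ('304', b'v1')               expired, unchanged
origin.body = b"v2"
print(browser.fetch(now=200))  # ('200', b'v2')               expired, changed
print(origin.hits)             # 3
```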

 

2. CDN (Content Distribution Network)

When we visit a website, its server may be thousands of miles away. The greater the distance, the more nodes the traffic passes through, and congestion or packet loss can occur between them. If the page takes too long to open, we simply close it. The server cannot know where its users will come from and has to be ready for visitors from anywhere in the world. A retailer in this position would open more branches, and servers follow the same strategy: copies are placed in many parts of the world. This requires premises, networks, and staff to maintain, so the servers around the world are joined into a network: a content distribution network.

Key point: speed, speed, speed! Deliver the content users want, faster.

CDN: A Content Delivery Network is a distributed network overlaid on the underlying bearer network, made up of groups of edge servers in different regions. By adding a new layer of network architecture to the existing Internet, it publishes a website's content to the network "edge" (the edge server) closest to the user, so that users can fetch content from a nearby node and the site responds faster.

2.1 What is distributed

The distributed content can be divided into static content and dynamic content

  • Static content: long-term fixed content
  • Dynamic Content: Content that changes frequently

Static content is not always stored in the CDN. When the origin server sends files to the CDN, it can set Cache-Control in the HTTP headers; through this caching mechanism the CDN knows which resources it may store and which it may not.

2.2 CDN distribution process

static content

The CDN does not hold the website's original content, so the origin server can copy static content to the CDN in advance. This is called push. When users around the world visit the page, the nearest CDN server provides the static content, with no need to go back to the origin server each time.


If the origin server has not copied the static content to the CDN in advance, then when a user visits the page the CDN has to request that content from the origin server. This is called pull. The CDN keeps a copy and then delivers the content to the user; because of that copy, other users who make the same request afterwards get the content immediately.
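Push and pull can be contrasted in a few lines. This is a toy model with invented class names (`OriginServer`, `EdgeNode`); a real CDN node would also honor Cache-Control lifetimes and evict old entries.

```python
class OriginServer:
    def __init__(self, files):
        self.files = files
        self.requests = 0  # how many times the origin had to answer

    def read(self, path):
        self.requests += 1
        return self.files[path]


class EdgeNode:
    def __init__(self, origin):
        self.origin = origin
        self.store = {}

    def push(self, path, body):
        """Push: content is uploaded before any user asks for it."""
        self.store[path] = body

    def serve(self, path):
        """Pull: on a miss, fetch from the origin once and keep a copy."""
        if path not in self.store:
            self.store[path] = self.origin.read(path)
        return self.store[path]


origin = OriginServer({"/logo.png": b"PNG", "/app.js": b"JS"})
edge = EdgeNode(origin)

edge.push("/logo.png", b"PNG")   # pushed in advance
edge.serve("/logo.png")          # served from the edge, origin untouched
edge.serve("/app.js")            # miss: pulled from the origin
edge.serve("/app.js")            # hit: served from the edge copy
print(origin.requests)           # 1
```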


dynamic content

Distributing dynamic content through a CDN is much harder. Dynamic content varies by user or by time period, so the origin server can hardly predict each user's content and push it to the CDN in advance. And if the CDN waits for the user's request and then fetches the dynamic content from the origin server, that is not much different from requesting the origin directly; the CDN provides little acceleration, so there is little point.

Some CDNs can still help, though. Take fetching the current time: some CDNs offer interfaces that run on the CDN itself, so the site uses those interfaces instead of code on its own origin server, and users get the time directly from the CDN.


Deploying a CDN effectively places a wall between the origin server and the user. Users no longer reach the origin server directly but communicate through the CDN, which helps shield the origin from malicious DDoS attacks.


2.3 Working principle of CDN

A CDN adds a cache layer between the user and the server. It works mainly by taking over DNS resolution and directing the user's request to a cache node that holds the origin server's data, which reduces network access time.

1. Access process of traditional uncached service

The traditional network access process is as follows:

  1. The user enters the domain name www.a.com, and the operating system asks LocalDns for the IP address of that domain name;
  2. LocalDns checks whether it has a cached record for www.a.com. If so, it returns the record directly to the end user; if not, it asks ROOT DNS for the domain's authoritative server;
  3. ROOT DNS returns the record of the domain's authoritative DNS server to LocalDns;
  4. With that record, LocalDns asks the domain's authoritative DNS server for the IP address of the domain name;
  5. The authoritative DNS server looks up the domain name record and responds to LocalDns;
  6. LocalDns returns the IP address it obtained to the client;
  7. With the IP address, the user accesses the site server;
  8. The site server answers the request and returns the content to the client.
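The lookup part of these steps (1 to 6) can be sketched with dictionaries standing in for the DNS servers. The tables, the `ns.a.com` server name, and the IP address (taken from a documentation-only range) are all made up for the example, and real resolution involves more levels than this.

```python
# Hypothetical data: which server is authoritative for a domain,
# and which records that server holds.
ROOT = {"a.com": "ns.a.com"}
AUTHORITATIVE = {"ns.a.com": {"www.a.com": "203.0.113.10"}}


class LocalDns:
    def __init__(self):
        self.cache = {}
        self.upstream_queries = 0

    def resolve(self, name):
        if name in self.cache:                 # step 2: answer from the cache
            return self.cache[name]
        domain = ".".join(name.split(".")[-2:])
        self.upstream_queries += 1
        ns = ROOT[domain]                      # steps 2-3: ask ROOT DNS
        self.upstream_queries += 1
        ip = AUTHORITATIVE[ns][name]           # steps 4-5: ask the authoritative server
        self.cache[name] = ip
        return ip                              # step 6: answer the client


dns = LocalDns()
print(dns.resolve("www.a.com"))   # 203.0.113.10
print(dns.resolve("www.a.com"))   # 203.0.113.10, answered from the cache
print(dns.upstream_queries)       # 2
```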

2. The access process of the website after using CDN cache

The network access process after adopting a CDN cache is as follows:

  1. The user enters the domain name www.a.com, and the operating system asks LocalDns for the IP address of that domain name;
  2. LocalDns checks whether it has a cached record for www.a.com. If so, it returns the record directly to the end user; if not, it asks ROOT DNS for the domain's authoritative server;
  3. ROOT DNS returns the record of the domain's authoritative DNS server to LocalDns;
  4. With that record, LocalDns asks the domain's authoritative DNS server for the IP address of the domain name;
  5. The authoritative DNS server looks up the domain name record (usually a CNAME) and responds to LocalDns;
  6. With the domain name record, LocalDns asks the intelligent scheduling DNS system for the IP address of the domain name;
  7. The intelligent scheduling DNS responds to LocalDns with the IP address of the most suitable CDN node, chosen according to certain algorithms and policies (such as static topology and capacity);
  8. LocalDns returns the IP address it obtained to the client;
  9. With the IP address, the user accesses the CDN node at that address.

This example shows that:
(1) CDN-accelerated resources are bound to domain names.
(2) When a resource is accessed by domain name, DNS first finds the IP address of the CDN node (edge server) closest to the user.
(3) When the resource is then requested from that IP address, a CDN node without a cached copy goes back to the origin site for the resource and caches it, so that the next visit is served from the node's cache.
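The CNAME hand-off and the scheduling step can be sketched as follows. The node table, the region ordering, the load threshold, and the `cdn.example` name are all hypothetical; real scheduling systems use far richer signals than this.

```python
# The site's authoritative DNS hands the name over to the CDN with a CNAME.
AUTHORITATIVE = {"www.a.com": ("CNAME", "www.a.com.cdn.example")}

# Hypothetical edge nodes: region -> (ip, current load between 0 and 1).
EDGE_NODES = {
    "eu": ("198.51.100.10", 0.40),
    "asia": ("198.51.100.20", 0.95),
    "us": ("198.51.100.30", 0.20),
}
# Hypothetical ordering of regions by network distance from each client region.
NEAREST_FIRST = {"asia": ["asia", "eu", "us"], "eu": ["eu", "us", "asia"]}


def schedule(client_region, max_load=0.9):
    """Pick the nearest edge node whose load is below the threshold."""
    for region in NEAREST_FIRST[client_region]:
        ip, load = EDGE_NODES[region]
        if load < max_load:
            return ip
    raise RuntimeError("no edge node available")


def resolve(name, client_region):
    record_type, value = AUTHORITATIVE[name]
    if record_type == "CNAME":           # step 5: the answer is a CNAME
        return schedule(client_region)   # steps 6-7: scheduling DNS picks a node
    return value


print(resolve("www.a.com", "eu"))    # 198.51.100.10, the nearest node
print(resolve("www.a.com", "asia"))  # 198.51.100.10, because the asia node is overloaded
```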

 

2.4 Security and Reliability of CDN

The existence of a CDN means attackers may target the CDN itself. What happens if a CDN server goes down?

CDN servers are deployed in many locations and their load is monitored. If a server is overloaded or down, user requests are redirected to a server that is not overloaded, spreading network traffic evenly. This is load balancing.
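A minimal sketch of that selection rule, assuming each server reports a load figure and a health flag. The `Server` type, the threshold, and the least-loaded policy are choices made for this example, not a description of any specific CDN.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    load: float          # fraction of capacity in use, 0.0 to 1.0
    healthy: bool = True


def pick_server(servers, overload_threshold=0.8):
    """Route to the least-loaded server that is up and not overloaded."""
    usable = [s for s in servers if s.healthy and s.load < overload_threshold]
    if not usable:
        raise RuntimeError("all servers are down or overloaded")
    return min(usable, key=lambda s: s.load)


servers = [
    Server("edge-1", load=0.10, healthy=False),  # down
    Server("edge-2", load=0.95),                 # overloaded
    Server("edge-3", load=0.50),
    Server("edge-4", load=0.30),
]
print(pick_server(servers).name)  # edge-4
```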


The CDN redirects this traffic in much the same way as the DNS root servers do, using anycast. With anycast, the servers share the same IP address externally; when a request arrives for that address, it is answered by the server closest to the user.

CDNs also use TLS/SSL certificates to protect websites.

 

2.5 Applications

CDN is widely used and supports content acceleration in various industries and scenarios, such as small picture files, large file downloads, video and audio on demand, live streaming media, site-wide acceleration, and security acceleration.

  • Use third-party CDN services: for example, CDN acceleration services for front-end open source projects.
  • Store static resources on the CDN: "static resources" are resources such as JS, CSS, and images that the business server does not need to compute. "Dynamic resources", as the name suggests, must be generated by the back end in real time; common examples are JSP, ASP, or server-side rendered HTML pages. Build scripts combined with Webpack can upload the static resources directly to the CDN, giving one-click automated deployment of the whole project.
  • Put static resources and business servers under different domain names: it is best to use a separate domain name for the static server, so that cookies are not sent with every request for a static resource.
  • Live streaming: live broadcasts are transmitted as streaming media, which CDNs also support, so live streaming can use a CDN to improve access speed. A CDN handles streaming media differently from ordinary static files. If an ordinary file is not found on the edge node, the node looks for it at the upper layer; but streaming media involves very large volumes of data, and fetching it back this way would inevitably cause performance problems, so streaming media generally uses active push.
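The point about separate domain names comes down to cookie domain matching: a cookie is attached only to requests whose host matches the cookie's domain. The check below is a simplified version of that rule (it ignores paths, the Secure flag, and public-suffix handling), and the host names are made up.

```python
def cookie_is_sent(cookie_domain, request_host):
    """Simplified domain matching: the exact host or any subdomain of it."""
    cookie_domain = cookie_domain.lstrip(".").lower()
    request_host = request_host.lower()
    return (request_host == cookie_domain
            or request_host.endswith("." + cookie_domain))


# A session cookie set for a.com travels with every request under a.com...
print(cookie_is_sent("a.com", "www.a.com"))             # True
print(cookie_is_sent("a.com", "static.a.com"))          # True
# ...but not with requests to a separate static domain.
print(cookie_is_sent("a.com", "static.a-cdn.example"))  # False
```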


Origin blog.csdn.net/weixin_45950819/article/details/123087074