HTTP caching strategies: strong cache and negotiated cache

Foreword:

On the web, a lot of content rarely changes. If every request fetches data that will not change for some time, bandwidth is wasted.

When the network is poor, fetching that content can also make the page slow to open.

The browser's caching mechanism, in cooperation with the server, lets the browser cache resources that do not change frequently, which reduces both traffic and response time.

1. What is the HTTP cache

HTTP cache means that when the client requests a resource from the server, it first checks the browser cache. If the browser holds a copy of the requested resource, it can take the copy directly from the cache instead of requesting the resource from the server again.

Note that browsers generally cache only responses to GET requests, so the cached requests discussed below are GET requests.

HTTP cache classification: depending on whether a request has to be sent to the server, HTTP caching is divided into two categories, strong cache and negotiated cache.

HTTP caching takes effect from the second request onward:

  • On the first request, the client requests the resource from the server, and the server returns the resource together with its cache parameters in the response headers;
  • On the second request, the browser checks these cache parameters. If the strong cache is hit, the status is 200 and the resource is read from the local cache (for example the disk cache) without contacting the server. Otherwise, the browser adds the validation parameters to the request headers and sends them to the server to see whether the negotiated cache is hit. On a hit the server returns 304 and the cached resource is used; on a miss the server returns the new resource.
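
The flow in the two steps above can be sketched as a small decision function. This is only an illustration: the cache entry shape (storedAt, maxAgeSeconds, etag, lastModified) and the function name are assumptions made for this example, not part of any browser API.

```javascript
// Hypothetical cache entry shape (an assumption for illustration):
// { storedAt: ms timestamp, maxAgeSeconds: number, etag?: string, lastModified?: string }
function decideCacheAction(entry, nowMs) {
  if (!entry) {
    // First request: nothing is cached, so fetch from the server.
    return { action: "fetch", headers: {} };
  }
  const ageSeconds = (nowMs - entry.storedAt) / 1000;
  if (ageSeconds < entry.maxAgeSeconds) {
    // Strong cache hit: use the local copy, no request is sent.
    return { action: "use-cache", headers: {} };
  }
  // The copy is stale: collect the validators for a negotiated-cache request.
  const headers = {};
  if (entry.etag) headers["If-None-Match"] = entry.etag;
  if (entry.lastModified) headers["If-Modified-Since"] = entry.lastModified;
  if (Object.keys(headers).length === 0) {
    // Without a validator there is nothing to negotiate, so fetch again.
    return { action: "fetch", headers };
  }
  return { action: "revalidate", headers };
}
```

With an entry stored at time 0 and a max-age of 120 seconds, a call at 60 seconds gives "use-cache", while a call at 180 seconds gives "revalidate" with the stored validators attached.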

2. Strong cache

Strong caching works by setting an expiration time (Expires) or a validity period (max-age) for the cached resource. Within the validity period the cache stays valid and the browser reads the resource directly from its cache. The resource is requested from the server only when it is not in the cache or when the cached copy has expired.

Request and response headers related to strong caching:

  • Expires

This response header gives the expiration time of the resource. Because the server clock and the client clock may differ, the cache may be judged fresh or expired incorrectly. Expires is also a product of HTTP/1.0, so Cache-Control is now used instead in most cases.

  • Cache-Control (priority higher than Expires)

A request/response header that controls the caching strategy precisely. Cache-Control has many directives, each with a different meaning:

  1. private: only the client (browser) may cache the response
  2. public: both the client and proxy servers may cache the response
  3. max-age=x: the cached content expires after x seconds
  4. no-cache: the cached copy must be validated with the server (negotiated cache) before each use
  5. no-store: nothing is cached
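  • Pragma

Pragma is a legacy HTTP/1.0 header. The only value defined for it is no-cache, which has the same meaning as Cache-Control: no-cache; no-store is a Cache-Control directive, not a Pragma value. The order pragma -> cache-control -> expires is often quoted for the case where all three appear together, but the HTTP specification says that, in requests, Pragma is ignored when Cache-Control is present and understood, so Pragma is best treated as a fallback for HTTP/1.0 caches. Between the other two, Cache-Control takes priority over Expires.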
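
To make the priority of Cache-Control over Expires concrete, here is a minimal sketch that computes how long a response stays fresh. The function names and the lowercase-keyed headers object are assumptions for this example. It simplifies real browser behavior: Pragma is left out, Expires is compared with the current time rather than with the Date header, and a response without either header is treated as not fresh.

```javascript
// Parse a Cache-Control value such as "public, max-age=120" into an object.
function parseCacheControl(value) {
  const directives = {};
  for (const part of (value || "").split(",")) {
    const [rawName, rawArg] = part.trim().split("=");
    if (!rawName) continue;
    directives[rawName.toLowerCase()] =
      rawArg === undefined ? true : rawArg.replace(/"/g, "");
  }
  return directives;
}

// Seconds the response may be served from the cache without asking the server.
// headers: plain object with lowercase header names (an assumption of this sketch).
function freshnessSeconds(headers, nowMs) {
  const cc = parseCacheControl(headers["cache-control"]);
  if (cc["no-store"] || cc["no-cache"]) return 0;
  if (cc["max-age"] !== undefined) {
    // max-age wins over Expires when both are present.
    const seconds = Number.parseInt(cc["max-age"], 10);
    return Number.isNaN(seconds) ? 0 : Math.max(0, seconds);
  }
  if (headers["expires"]) {
    const expiresMs = Date.parse(headers["expires"]);
    if (Number.isNaN(expiresMs)) return 0;
    return Math.max(0, Math.floor((expiresMs - nowMs) / 1000));
  }
  return 0;
}
```

For a response that carries both "public,max-age=120" and an Expires date 5 seconds in the future, this sketch returns 120, because max-age is checked first.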

3. Negotiated cache

With the negotiated cache, the server checks whether the resource has been modified to decide whether the cached copy can be used. If nothing has changed, it returns status code 304 and the browser uses the cached data directly. Otherwise, the server returns the new resource.

So the key question for the negotiated cache is how to determine whether the resource has changed. There are currently two main strategies: Last-Modified and ETag.

  • Last-Modified

When the server responds to the request, it tells the browser the last modification time of the resource, in GMT format, through the Last-Modified response header.

On later requests, the request headers contain the If-Modified-Since field, whose value is the last modification time stored in the cache. When the server receives the request, it compares If-Modified-Since with the last modification time of the requested resource. If the resource has not been modified since then, the server returns 304 with response headers only, and the browser reads the resource from its cache.

  • If the resource has been modified: the full response is sent, and the server returns 200 OK
  • If the resource has not been modified: only the response headers are sent, and the server returns 304 Not Modified
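
The server-side comparison can be sketched as follows. This is a rough illustration rather than a complete implementation: the function name, the lowercase-keyed requestHeaders object and the mtimeMs parameter (the resource's modification time in milliseconds, assumed to be known already) are all made up for this example.

```javascript
// Decide between 200 and 304 based on If-Modified-Since.
// mtimeMs: last modification time of the resource in milliseconds (assumed known).
function checkIfModifiedSince(requestHeaders, mtimeMs) {
  // HTTP dates have one-second resolution, so drop the milliseconds first.
  const lastModifiedMs = Math.floor(mtimeMs / 1000) * 1000;
  const headers = { "Last-Modified": new Date(lastModifiedMs).toUTCString() };
  // Date.parse returns NaN when the header is missing or malformed.
  const since = Date.parse(requestHeaders["if-modified-since"] || "");
  if (!Number.isNaN(since) && lastModifiedMs <= since) {
    // Not modified: send headers only.
    return { status: 304, headers, sendBody: false };
  }
  // Modified, or no usable validator: send the full response.
  return { status: 200, headers, sendBody: true };
}
```

A first call without If-Modified-Since yields 200 plus a Last-Modified value. Passing that value back as If-Modified-Since yields 304 as long as the modification time has not moved forward.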

Last-Modified is not always accurate in practice:

  1. Last-Modified is only accurate to the second. If a file is modified several times within one second, the modification time cannot mark each change
  2. If a file is rewritten without its content changing, Last-Modified still changes, so the cached copy can no longer be used
  3. The server may not obtain the file modification time accurately, or its clock may be inconsistent with that of a proxy server
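
The first two limitations can be reproduced with a few lines. The timestamps and the before/after objects are invented for this illustration.

```javascript
// Format a millisecond timestamp the way Last-Modified is sent.
const toHttpDate = (ms) => new Date(ms).toUTCString();

// Limitation 1: two writes 800 ms apart, inside the same second,
// end up with the same Last-Modified value.
const firstWrite = Date.UTC(2020, 3, 7, 12, 0, 0, 100);
const secondWrite = Date.UTC(2020, 3, 7, 12, 0, 0, 900);
const sameSecond = toHttpDate(firstWrite) === toHttpDate(secondWrite);

// Limitation 2: saving identical content a minute later
// changes the timestamp even though the body is the same.
const before = { body: "hello", mtime: firstWrite };
const after = { body: "hello", mtime: firstWrite + 60000 };
const contentUnchanged = before.body === after.body;
const validatorChanged = toHttpDate(before.mtime) !== toHttpDate(after.mtime);

console.log({ sameSecond, contentUnchanged, validatorChanged });
```

All three flags come out true: the two quick writes cannot be told apart, and the unchanged file looks modified.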

Therefore, HTTP/1.1 introduced ETag to address these problems.

  • ETag (priority higher than Last-Modified)

When the server responds to the request, it uses this field to tell the browser a unique identifier for the current version of the resource. The identifier is generated by the server, and the generation rule is up to the server.

On later requests, the browser's request headers contain the If-None-Match field, whose value is the identifier stored in the cache. After receiving the request, the server compares the value of If-None-Match with the current identifier of the requested resource.

  • If they differ, the resource has changed: status code 200 is returned together with the new resource.
  • If they are the same, the resource has not been modified: status code 304 is returned, and the browser reads the resource from its cache.
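
A minimal sketch of this comparison is shown below. Since the generation rule is up to the server, the SHA-1 content hash used here is just one possible choice, and the function names and the lowercase-keyed requestHeaders object are assumptions for the example.

```javascript
const { createHash } = require("node:crypto");

// One possible generation rule (an assumption): a quoted SHA-1 hash of the body.
function makeEtag(body) {
  const hash = createHash("sha1").update(body).digest("base64");
  return `"${hash}"`;
}

// Decide between 200 and 304 based on If-None-Match.
function checkIfNoneMatch(requestHeaders, body) {
  const etag = makeEtag(body);
  const header = requestHeaders["if-none-match"] || "";
  // The header may carry several comma-separated tags, or "*".
  // A leading W/ (weak tag marker) is ignored for this comparison.
  const candidates = header
    .split(",")
    .map((tag) => tag.trim().replace(/^W\//, ""));
  const matches = header.trim() === "*" || candidates.includes(etag);
  return matches
    ? { status: 304, headers: { ETag: etag }, body: null }
    : { status: 200, headers: { ETag: etag }, body };
}
```

Because the tag here is derived from the content, saving a file again with identical content keeps the same ETag, which is exactly the case where Last-Modified would force a needless download.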

However, ETag also has a disadvantage: generating the identifier adds overhead on the server. So the choice between Last-Modified and ETag needs to be weighed against specific needs.

4. Hands-on

OK, the theory part is finished, so let's try it out.

I used Node.js to start a server that answers a GET request and sets the HTTP cache headers. The web page simply loads an image. I am a Node.js beginner and the interface is far from perfect; it is for reference only, to show how the cache is used.

Strong cache:

The code for the GET request in Node.js:

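const express = require("express"); // assumes Express is installed (npm install express)
const app = express();

app.get("/api/getPic", (req, res) => {
  // max-age is set to 2 minutes
  res.setHeader("Cache-Control", "public,max-age=120");
  // Expires is set to 5 minutes from now (Date.now() is in milliseconds)
  let date = new Date(Date.now() + 5 * 60 * 1000).toUTCString();
  res.setHeader("Expires", date);
  let data = JSON.stringify({
    msg: "Request succeeded",
    result: [
      {
        url:
          "https://ss3.bdstatic.com/70cFv8Sh_Q1YnxGkpoWK1HF6hhy/it/u=350525183,1430160676&fm=11&gp=0.jpg"
      }
    ]
  });
  res.send(data);
});

app.listen(3000); // port 3000 is an assumption; the original does not state one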

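Send a request and you can see that the strong cache headers have been set. The response also carries an ETag for the negotiated cache by default. The code looks like Express, and Express adds an ETag to responses sent with res.send unless its etag setting is turned off: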

Immediately after that I refreshed the page. The result looks like this: the status code is 200, followed by "from disk cache", and the image resource is read from the disk cache without requesting the server:

Two minutes later I refreshed the page again. This time the status code is 304: the server was contacted, the requested resource had not changed, and the negotiated cache was hit:

Since the ETag negotiated cache is enabled by default, I did not write the negotiated cache by hand. Interested readers can try setting the Last-Modified and ETag values themselves.

Supplement:

The Size column of a network request shows one of three things:

  1. from memory cache
  2. from disk cache
  3. the resource size

They mean the following:

  1. A resource size (a number): the status code is 200 and the latest resource was downloaded from the server
  2. from memory cache: no network request was made and the resource was in memory; scripts, fonts and images are typically kept in memory
  3. from disk cache: no network request was made and the resource was on disk; non-script resources such as CSS are typically kept on disk

5. Different web page refresh operations

We divide access and refresh into the following three situations:

  • Opening the page in a tab, or typing the URL and pressing Enter: the specified caching strategy is followed
  • Clicking the refresh button, pressing F5, or right-clicking the page and choosing "Reload": the strong cache is skipped and the negotiated cache is checked directly
  • Ctrl + F5 forced refresh: all caches are bypassed and the data is requested from the server again
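
The same three behaviors can be approximated from script with the cache option of the Fetch API. This is only an analogy: the action names and the helper function are made up for this sketch, and the cache modes are meaningful in a browser, where an HTTP cache sits behind fetch.

```javascript
// Hypothetical names for the three kinds of access described above,
// mapped to standard Fetch API cache modes.
const CACHE_MODE_BY_ACTION = {
  navigate: "default",   // follow the normal caching rules
  refresh: "no-cache",   // always revalidate with the server before reusing a copy
  hardRefresh: "reload", // ignore cached copies and download again
};

// Request a URL the way the given action would (approximately).
async function loadWithAction(url, action) {
  const mode = CACHE_MODE_BY_ACTION[action];
  if (!mode) throw new Error(`unknown action: ${action}`);
  return fetch(url, { cache: mode });
}
```

For example, loadWithAction("/api/getPic", "refresh") sends a conditional request even while the strong cache is still valid, which should produce a 304 when the resource is unchanged.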

Origin blog.csdn.net/DZY_12/article/details/105378730