How to understand the browser cache

Foreword

Caching is one of the simplest and most effective ways to optimize performance. A good caching strategy shortens the distance a request has to travel to reach a resource, reduces latency, and, because cached files can be reused, also cuts bandwidth usage and network load.

A data request can be divided into three steps: sending the network request, back-end processing, and the browser handling the response. Browser caching helps optimize the first and third steps. For example, if the cache is used directly, no request is sent at all; or a request is sent, but the data on the back end matches what the front end already holds, so the data does not need to be sent back again, which shrinks the response.

Below we explore the browser caching mechanism from three angles: cache locations, caching strategies, and applying caching strategies in real scenarios.

Cache location

In terms of location, there are four kinds of cache, each with its own priority. They are searched in order, and the request goes to the network only when none of them hits.

  • Service Worker
  • Memory Cache
  • Disk Cache
  • Push Cache

1、Service Worker

A Service Worker is an independent thread running behind the browser and is commonly used to implement caching. To use a Service Worker, the transport protocol must be HTTPS: because a Service Worker can intercept requests, HTTPS is required to ensure security. The Service Worker cache differs from the browser's other built-in caching mechanisms in that it lets us freely control which files are cached, how the cache is matched, and how it is read, and the cache is persistent.

Implementing caching with a Service Worker generally takes three steps: first register the Service Worker; then listen for the install event and cache the files you need; then, on the user's next visit, intercept requests and check whether a cached copy exists. If it does, the cached file is read directly; otherwise the data is requested.

When the Service Worker does not hit its cache, we need to call the fetch function to get the data. In other words, if the Service Worker cache misses, the data is looked up according to the cache lookup priority. But whether the data then comes from the Memory Cache or from a network request, the browser shows it as obtained from the Service Worker.

2、Memory Cache

The Memory Cache is the cache held in memory. It mainly contains resources the current page has already fetched, such as downloaded styles, scripts, and images. Reading from memory is certainly faster than reading from disk, but although the memory cache is efficient to read, it is short-lived: it is released along with the process. Once we close the tab, the in-memory cache is released.

Since the memory cache is so efficient, could we keep all data in memory? No. A computer's memory is far smaller than its hard disk, and the operating system has to budget memory use carefully, so the memory available to us is necessarily limited.

When we refresh a page we have already visited, we can see that a lot of the data comes from the memory cache.

An important class of resources in the memory cache is resources downloaded by preloader-related directives (for example <link rel="prefetch">). Preloader directives are a well-known and common page optimization technique: the browser can parse JS/CSS files while requesting the next resource over the network.

Note that when caching resources, the memory cache does not care about the value of the Cache-Control header returned with the resource, and when matching resources it does not match on the URL alone; it may also check Content-Type, CORS, and other characteristics.

3、Disk Cache

The Disk Cache is stored on the hard disk. It is slower to read, but it can store anything, and it beats the Memory Cache on both capacity and how long entries persist.

Of all the browser caches, the Disk Cache has the widest coverage. Based on the fields in the HTTP headers, it decides which resources need to be cached, which resources can be used directly without a request, and which resources have expired and must be requested again. Even across sites, once a resource at the same address has been cached on disk, it is not requested again (browsers that partition their cache by site are an exception). Most cache hits come from the Disk Cache; the cache-related HTTP header fields are covered in detail below.

Which files does the browser put in memory, and which on disk? Opinions online differ, but the following views are fairly reliable:

  • Large files are most likely not stored in memory; small files are stored in memory preferentially
  • When current system memory usage is high, files are preferentially stored on the hard disk

4、Push Cache

Push Cache is part of HTTP/2 and is used only when none of the three caches above hits. It exists only within a session and is released once the session ends. Its lifetime is also short, only about five minutes in Chrome, and it does not strictly follow the caching directives in HTTP headers.

Little material on Push Cache is available in China, partly because HTTP/2 is not yet widespread there. Jake Archibald's article "HTTP/2 push is tougher than I thought" is recommended reading. Its conclusions:

  • All resources can be pushed and cached, but support in Edge and Safari is relatively poor
  • no-cache and no-store resources can be pushed
  • Once the connection is closed, the Push Cache is released
  • Multiple pages can use the same HTTP/2 connection and can therefore share the same Push Cache. This depends on the browser's implementation: for performance reasons, some browsers use the same HTTP connection for the same domain across different tabs.
  • An entry in the Push Cache can be used only once
  • The browser may refuse a pushed resource that it already has
  • You can push resources for other domains

If none of the four caches above hits, the only option is to send a request to get the resource.
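
The lookup order described in this section can be modeled with a small toy sketch. This is only an illustration, not how any browser is actually implemented: the `lookup` function, the dictionary-based layers, and the sample URLs are all hypothetical.

```python
from typing import Callable, Optional

# Hypothetical stand-ins for the four cache layers, in lookup order.
# Real browsers implement these internally; plain dicts are an assumption.
LAYER_ORDER = ["service worker", "memory cache", "disk cache", "push cache"]


def lookup(url: str,
           layers: dict[str, dict[str, bytes]],
           fetch_from_network: Callable[[str], bytes]) -> tuple[str, bytes]:
    """Return (source, body), checking each layer in priority order."""
    for name in LAYER_ORDER:
        body: Optional[bytes] = layers.get(name, {}).get(url)
        if body is not None:
            return name, body
    # No layer hit, so the request goes to the network.
    return "network", fetch_from_network(url)


layers = {
    "memory cache": {"/app.css": b"body{}"},
    "disk cache": {"/app.css": b"old copy", "/logo.png": b"PNG"},
}
print(lookup("/app.css", layers, lambda u: b"fresh"))   # memory cache is checked first
print(lookup("/logo.png", layers, lambda u: b"fresh"))  # falls through to the disk cache
print(lookup("/new.js", layers, lambda u: b"fresh"))    # nothing cached, goes to network
```

The sketch ignores expiry entirely; whether a stored entry may actually be reused is decided by the header rules discussed in the rest of the article.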

So for the sake of performance, most interfaces should choose a good caching strategy. Browser caching strategies are usually divided into two kinds, strong caching and negotiated caching, and both are implemented by setting HTTP headers.

Cache Process Analysis

The browser communicates with the server in a request-response fashion: the browser sends an HTTP request and the server responds to it. So how does the browser decide whether a resource should be cached, and how to cache it? When the browser first sends a request to the server and gets the result, it stores the result and its cache identifier in the browser cache; how the browser handles caching is determined by the response headers returned with that first request. The process is as follows:

From this process we can see:

  • Every time the browser sends a request, it first looks in the browser cache for the result of that request and its cache identifier
  • Every time the browser receives a response, it stores the result and its cache identifier in the browser cache

These two conclusions are the key to the browser caching mechanism: they ensure that the cache is written and read on every request. Once we understand the browser's rules for using the cache, everything else follows, and the rest of this article analyzes those rules in detail. To make this easier to follow, we divide the caching process into two parts according to whether an HTTP request needs to be sent to the server again: strong caching and negotiated caching.

Strong Cache

Strong cache: no request is sent to the server and the resource is read directly from the cache. In the Network panel of Chrome DevTools, the request shows status code 200, and the Size column shows "from disk cache" or "from memory cache". Strong caching is implemented by setting two HTTP headers: Expires and Cache-Control.

1、Expires

Expires is the cache expiration time: it specifies the point in time, set on the server side, at which the resource expires. That is, Expires = max-age + request time, and it is used together with Last-Modified. Expires is a response header field sent by the web server; it tells the browser that, until the expiration time, it can take the data directly from the browser cache without requesting it again.

Expires is a product of HTTP/1.0 and depends on local time: if the local clock is changed, the cache may be invalidated incorrectly.

Expires: Mon, 22 Oct 2018 08:41:00 GMT means the resource expires after Mon, 22 Oct 2018 08:41:00 GMT and must then be requested again.
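
As a minimal sketch of the Expires check, the following compares an Expires value with a clock reading. The helper name `is_fresh_by_expires` is hypothetical, and the clock is passed in explicitly to make visible that the outcome depends on the client's local time, which is the weakness noted above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh_by_expires(expires_header: str, now: datetime) -> bool:
    """True while `now` is earlier than the time given in Expires."""
    expires_at = parsedate_to_datetime(expires_header)
    return now < expires_at


# 22 Oct 2018 fell on a Monday.
header = "Mon, 22 Oct 2018 08:41:00 GMT"
before = datetime(2018, 10, 22, 8, 40, 0, tzinfo=timezone.utc)
after = datetime(2018, 10, 22, 8, 42, 0, tzinfo=timezone.utc)

print(is_fresh_by_expires(header, before))  # one minute before expiry
print(is_fresh_by_expires(header, after))   # one minute after expiry
```

If the client clock is set back by an hour, the same stored response would be treated as fresh for an hour longer than the server intended.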

2、Cache-Control

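In HTTP/1.1, Cache-Control is the most important rule and is mainly used to control web page caching. For example, Cache-Control: max-age=300 means that if the resource is loaded again within 5 minutes of the time the request was correctly returned (which the browser also records), the strong cache is hit.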

Cache-Control can be set in request headers or response headers, and multiple directives can be combined:

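public: all content will be cached (by both clients and proxy servers). Specifically, the response can be cached by any intermediate node, as in Browser <-- proxy1 <-- proxy2 <-- Server: the proxies in the middle can cache the resource, so the next time the same resource is requested, proxy1 hands its cached copy straight to the Browser instead of asking proxy2 for it.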

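private: all content can be cached only by the client; this is the default value of Cache-Control. Specifically, intermediate nodes are not allowed to cache: in Browser <-- proxy1 <-- proxy2 <-- Server, each proxy dutifully passes along the data returned by the Server without caching any of it. When the Browser requests again, the proxies forward the request rather than answering on their own initiative with cached data.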

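no-cache: the client caches the content, but whether the cached copy is used must be decided through negotiated-cache validation. It means the Cache-Control freshness check is not used up front; instead, ETag or Last-Modified controls the cache. Note that the name no-cache is a little misleading. Setting no-cache does not mean the browser stops caching data; it only means the browser must first confirm that the data is still consistent with the server before using the cached copy.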

no-store: nothing is cached; neither strong caching nor negotiated caching is used

max-age: max-age=xxx (xxx is a number) means the cached content expires after xxx seconds

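s-maxage (in seconds): works like max-age but takes effect only in proxy servers (such as a CDN cache). For example, with s-maxage=60, the CDN keeps serving its cached copy for those 60 seconds without making a new request, even if the content has been updated. max-age applies to ordinary caches, while s-maxage applies to proxy caches. s-maxage has higher priority than max-age: if s-maxage is present, it overrides max-age and the Expires header in those caches.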

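max-stale: the maximum staleness that can be tolerated. The max-stale directive indicates that the client is willing to accept a response that has already expired. If a value is given for max-stale, the maximum tolerated staleness is that number of seconds. If no value is given, the browser is willing to accept a response of any age (age is the difference between the current time and the time the response was generated or validated by the origin server).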

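min-fresh: the minimum freshness that can be tolerated. min-fresh indicates that the client is unwilling to accept a response whose freshness lifetime is not greater than its current age plus the min-fresh value.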

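Multiple directives can be used together to serve several purposes at once. For example, we may want a resource to be cacheable by both clients and proxy servers, and also to set its expiration time, and so on.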
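
To make the combination of directives concrete, here is a rough sketch of parsing a Cache-Control value and deriving a freshness lifetime from it. The function names `parse_cache_control` and `freshness_lifetime` are hypothetical, and the handling is deliberately simplified: quoted arguments containing commas and the request-only directives max-stale and min-fresh are not handled.

```python
from typing import Optional


def parse_cache_control(value: str) -> dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument-or-None}."""
    directives: dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def freshness_lifetime(directives: dict[str, Optional[str]],
                       shared_cache: bool) -> Optional[int]:
    """Seconds the response may be reused without revalidation.

    Returns None when the header gives no explicit lifetime.
    Simplification: no-store and no-cache are both reported as 0 here.
    """
    if "no-store" in directives or "no-cache" in directives:
        return 0
    s_maxage = directives.get("s-maxage")
    if shared_cache and s_maxage is not None:
        return int(s_maxage)  # only proxies/CDNs honor s-maxage
    max_age = directives.get("max-age")
    if max_age is not None:
        return int(max_age)
    return None


cc = parse_cache_control("public, max-age=300, s-maxage=60")
print(cc)
print(freshness_lifetime(cc, shared_cache=False))  # browser cache uses max-age
print(freshness_lifetime(cc, shared_cache=True))   # proxy cache uses s-maxage
```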

3、Expires vs. Cache-Control

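In fact the two differ little. The difference is that Expires is a product of HTTP/1.0 and Cache-Control is a product of HTTP/1.1; when both are present, Cache-Control takes priority over Expires. In environments that do not support HTTP/1.1, Expires comes into play. So Expires is really an outdated mechanism, and today it exists only for compatibility. Strong caching decides whether to use the cache based on whether a certain time or time span has been exceeded, without caring whether the file on the server has been updated, so the loaded file may not be the latest content on the server. How, then, do we find out whether the content on the server has changed? This is where the negotiated caching strategy comes in.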

Negotiated Cache

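Negotiated caching is the process in which, after the strong cache has expired, the browser sends a request to the server carrying the cache identifier, and the server decides based on that identifier whether the cache should be used. There are two main cases: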

The negotiated cache is valid: the server returns 304 Not Modified.

The negotiated cache is invalid: the server returns 200 and the requested result.

Negotiated caching can be implemented by setting two kinds of HTTP headers: Last-Modified and ETag.

1、Last-Modified and If-Modified-Since

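When the browser first accesses a resource, the server returns the resource and adds a Last-Modified header to the response headers, whose value is the time the resource was last modified on the server. The browser caches the file and the header after receiving them: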

Last-Modified: Fri, 22 Jul 2016 01:47:00 GMT

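The next time the browser requests this resource, it detects the Last-Modified header and adds an If-Modified-Since header whose value is the Last-Modified value. When the server receives the request, it compares the If-Modified-Since value with the resource's last modification time on the server. If nothing has changed, it returns 304 with an empty response body, and the resource is read directly from the cache. If the If-Modified-Since time is earlier than the resource's last modification time on the server, the file has been updated, so the server returns the new resource file with 200.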
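
The server side of this exchange can be sketched roughly as follows. The `respond` function is a hypothetical stand-in for a request handler, not the API of any real web server, and it assumes the modification time is a timezone-aware UTC datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def respond(last_modified: datetime, body: bytes,
            if_modified_since: Optional[str]) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body) for a GET with optional If-Modified-Since."""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        # HTTP dates have one-second resolution, so drop microseconds.
        if last_modified.replace(microsecond=0) <= since:
            return 304, headers, b""  # unchanged: empty body, client uses its cache
    return 200, headers, body


mtime = datetime(2016, 7, 22, 1, 47, 0, tzinfo=timezone.utc)

# First visit: no conditional header, full response with Last-Modified.
status, headers, _ = respond(mtime, b"<html>", None)
print(status, headers["Last-Modified"])

# Next visit: the browser echoes the value back in If-Modified-Since.
status, _, payload = respond(mtime, b"<html>", headers["Last-Modified"])
print(status, payload)
```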

However, Last-Modified has some drawbacks:

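  • If the cached file is opened locally, Last-Modified is changed even though the file was not modified, so the server cannot hit the cache and sends the same resource again
  • Because Last-Modified has only one-second resolution, if a file is modified within an imperceptibly short time, the server considers the resource still a hit and does not return the correct resource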

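Since deciding whether to use the cache based on file modification time has shortcomings, can the caching strategy be decided directly by whether the file content has changed? For this, HTTP/1.1 introduced ETag and If-None-Match.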

2、ETag and If-None-Match

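An ETag is a unique identifier for the current resource file, generated by the server and returned in its response; whenever the resource changes, the ETag is regenerated. The next time the browser loads the resource, it puts the ETag value from the previous response into the If-None-Match request header. The server only needs to compare the If-None-Match value sent by the client with the ETag of the resource on the server to tell whether the resource has changed relative to the client's copy. If the ETag does not match, the server sends the new resource (including the new ETag) to the client as a regular GET 200 response; if the ETag matches, it returns 304 to tell the client to use its local cache.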
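
A minimal sketch of ETag validation follows, assuming the ETag is derived from a hash of the content. Real servers choose their own scheme, so the truncated SHA-256 and the names `make_etag` and `respond_with_etag` are illustrative assumptions; weak validators (the W/ prefix) are not handled.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the content (hash choice is an assumption)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond_with_etag(body: bytes,
                      if_none_match: Optional[str]) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body) for a GET with optional If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # If-None-Match may carry several comma-separated tags, or "*".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # match: client keeps using its copy
    return 200, headers, body


status, headers, _ = respond_with_etag(b"hello", None)
print(status, headers["ETag"])

# Same content on the server: the echoed ETag matches.
print(respond_with_etag(b"hello", headers["ETag"])[0])

# Content changed on the server: the old ETag no longer matches.
print(respond_with_etag(b"hello, world", headers["ETag"])[0])
```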

3、Comparing the two:

  • First, in accuracy, ETag is better than Last-Modified.

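Last-Modified has a resolution of one second. If a file changes several times within one second, Last-Modified does not reflect those changes, whereas the ETag changes every time, which ensures accuracy. On load-balanced servers, the Last-Modified values generated by different servers may also be inconsistent.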

  • Second, in performance, ETag is worse than Last-Modified: Last-Modified only needs to record a time, whereas ETag requires the server to compute a hash value with an algorithm.

  • Third, in priority, server validation considers ETag first.

Caching Mechanism

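Strong caching takes precedence over negotiated caching. If the strong cache (Expires and Cache-Control) is valid, the cache is used directly. If not, negotiated caching (Last-Modified / If-Modified-Since and ETag / If-None-Match) takes place, and the server decides whether the cache is used. If the negotiated cache is invalid, the cache for that request is invalid: the server returns 200 with the resource and a new cache identifier, which are stored in the browser cache again. If it is valid, the server returns 304 and the cache continues to be used.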
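
The client side of this flow can be sketched as a small decision function. `CachedEntry` and `plan_request` are hypothetical names, and only max-age is considered for the strong-cache check; this is a simplified model rather than the logic of any particular browser.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedEntry:
    stored_at: float                     # epoch seconds when the response was stored
    max_age: Optional[int]               # from Cache-Control: max-age, if present
    etag: Optional[str] = None           # from the ETag response header
    last_modified: Optional[str] = None  # from the Last-Modified response header


def plan_request(entry: Optional[CachedEntry],
                 now: float) -> tuple[str, dict[str, str]]:
    """Decide what to do for a URL: use the cache, revalidate, or fetch."""
    if entry is None:
        return "fetch", {}
    # Strong cache first: still fresh means no request at all.
    if entry.max_age is not None and now - entry.stored_at < entry.max_age:
        return "use-cache", {}
    # Otherwise negotiate: send whatever validators were stored.
    headers: dict[str, str] = {}
    if entry.etag:
        headers["If-None-Match"] = entry.etag
    if entry.last_modified:
        headers["If-Modified-Since"] = entry.last_modified
    return ("revalidate", headers) if headers else ("fetch", {})


entry = CachedEntry(stored_at=1000.0, max_age=300, etag='"abc"',
                    last_modified="Fri, 22 Jul 2016 01:47:00 GMT")
print(plan_request(entry, now=1100.0))  # within 300 s: strong cache hit
print(plan_request(entry, now=1400.0))  # expired: conditional request
print(plan_request(None, now=1400.0))   # nothing stored: plain request
```

A 304 response to the "revalidate" case means the stored body is reused; a 200 response replaces the stored entry and its identifiers.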

At this point you may be wondering: if no caching strategy is set at all, what does the browser do?

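In that case, the browser uses a heuristic algorithm, usually taking 10% of the difference between the Date and Last-Modified values in the response headers as the cache lifetime.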
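
As a worked example of this heuristic, the sketch below computes 10% of the interval between Date and Last-Modified. The function name `heuristic_lifetime` and the sample Date value are assumptions made for illustration, and actual browsers may apply additional limits.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def heuristic_lifetime(date_header: str, last_modified_header: str) -> timedelta:
    """10% of (Date - Last-Modified), never negative."""
    date = parsedate_to_datetime(date_header)
    last_modified = parsedate_to_datetime(last_modified_header)
    return max(date - last_modified, timedelta(0)) / 10


# Resource last changed 10 days before the response was generated.
lifetime = heuristic_lifetime("Mon, 01 Aug 2016 01:47:00 GMT",
                              "Fri, 22 Jul 2016 01:47:00 GMT")
print(lifetime)  # 1 day, 0:00:00
```

A resource that has not changed for a long time is therefore cached for longer than one that changed recently.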

Applying Caching Strategies in Real Scenarios

1、Frequently changing resources

Cache-Control: no-cache

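For frequently changing resources, first use Cache-Control: no-cache so that the browser asks the server every time, then rely on ETag or Last-Modified to validate whether the resource is still current. This does not reduce the number of requests, but it can significantly reduce the size of the response data.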

2、Rarely changing resources

Cache-Control: max-age=31536000

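Such resources are usually given a very large Cache-Control value, max-age=31536000 (one year), so that later requests from the browser for the same URL hit the strong cache. To handle updates, a dynamic string such as a hash or version number is added to the file name (or path); changing that string changes the referenced URL, which makes the previous strong cache entry obsolete (it does not expire immediately, it is simply no longer used). Libraries served online (such as jquery-3.3.1.min.js and lodash.min.js) all follow this pattern.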
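
One way to produce such file names is to embed a short content hash, sketched below. The helper `hashed_name`, the 8-character length, and the sample path are assumptions for illustration; build tools typically do this automatically with their own naming schemes.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(path: str, content: bytes, length: int = 8) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


v1 = hashed_name("static/app.js", b"console.log('v1');")
v2 = hashed_name("static/app.js", b"console.log('v2');")
print(v1)
print(v2)
print(v1 != v2)  # new content gives a new URL, so the old cache entry is bypassed
```

Because the URL changes only when the content does, each such file can safely be served with max-age=31536000.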

How User Behavior Affects the Browser Cache

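The effect of user behavior on the browser cache refers to which caching strategy is triggered by what the user does in the browser. There are three main cases: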

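  • Opening a page by entering the address in the address bar: the disk cache is checked for a match. If there is one, it is used; if not, a network request is sent.
  • Normal refresh (F5): because the tab has not been closed, the memory cache is available and is used first (if it matches). The disk cache comes second.
  • Forced refresh (Ctrl + F5): the browser does not use the cache, so the requests it sends all carry Cache-Control: no-cache (plus Pragma: no-cache for compatibility), and the server returns 200 with the latest content.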

Origin www.cnblogs.com/ShuiNian/p/12079178.html