Detailed explanation of browser HTTP protocol caching mechanism

Reprinted from: https://my.oschina.net/leejun2005/blog/369148

 

1. Classification of cache

Caches fall into two groups: server-side caches (such as Nginx or Apache) and client-side caches (such as a web browser).

Server-side caches are further divided into proxy server caches and reverse proxy server caches (also called gateway caches, such as an Nginx reverse proxy or Squid). The widely used CDN is also a server-side cache. Its purpose is to let user requests take a "shortcut" by caching static resources such as images and files.

Client-side caching generally means the browser cache, whose purpose is to speed up access to static resources. On today's large websites a single page can issue one or two hundred requests, and daily page views (PV) can be on the order of 100 million. Without caching, the user experience would degrade sharply, and both server load and network bandwidth would come under severe strain.

2. Detailed explanation of browser caching mechanism

Browser caching can be controlled in two ways: HTML Meta tags and HTTP headers.

2.1  HTML Meta tag controls cache

The browser caching mechanism is essentially the caching mechanism defined by the HTTP protocol (headers such as Expires and Cache-Control). There is also a mechanism that HTTP does not define: the HTML Meta tag. A web developer can add a <meta> tag to the <head> node of an HTML page, as follows:

<META HTTP-EQUIV="Pragma" CONTENT="no-cache">

This tells the browser not to cache the current page, so every visit fetches it from the server again. It is simple to use, but only some browsers support it, and caching proxy servers do not honor it because a proxy does not parse the HTML content itself. In practice, HTTP headers are what is widely used to control caching, so the rest of this article focuses on the caching mechanism defined by the HTTP protocol.

2.2  HTTP header information controls the cache

2.2.1 Browser request process

  • Browser first request flow chart:

  • When the browser requests again:

2.2.2 Explanation of several important concepts

  • Expires strategy: Expires is a response header set by the web server. It tells the browser that, until the stated expiration time, it may read the resource directly from the browser cache without sending another request. Expires comes from HTTP 1.0; browsers now use HTTP 1.1 by default, where Cache-Control takes priority, so Expires mostly serves as a fallback. Its weakness is that the expiration time is an absolute time taken from the server's clock. If the client's clock differs significantly from the server's (for example, the clocks are out of sync or a time zone is misconfigured), the computed expiry can be badly wrong. For that reason, HTTP 1.1 introduced Cache-Control: max-age=seconds as a replacement.

  • Cache-Control strategy (the focus): Cache-Control serves the same purpose as Expires. Both state how long the current resource stays valid, which determines whether the browser reads it from its cache or sends a new request to the server. Cache-Control simply offers more options and finer-grained settings. When both are set, Cache-Control takes priority over Expires.

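Its value can be public, private, no-cache, no-store, no-transform, must-revalidate, proxy-revalidate, or max-age.
The directives have the following meanings:
public: the response may be stored by any cache.
private: all or part of the response is intended for a single user and must not be stored by a shared cache. This lets the server mark a response as specific to the current user, so that it is not reused for other users' requests.
no-cache: a cached copy must not be used without first revalidating it with the server. Despite its name, this directive does not mean "do not cache", which is a common misreading.
no-store: prevents important information from being released unintentionally. When sent in a request, neither the request nor its response may be stored by a cache at all.
max-age: the client is willing to accept a response whose age is no greater than the specified time (in seconds). In a response, it states how many seconds the resource stays fresh.
min-fresh: the client wants a response that will remain fresh for at least the specified number of seconds from now.
max-stale: the client is willing to accept a response that has passed its expiration time. If a value is given, the response may be stale by no more than that number of seconds.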
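The interplay between Expires and Cache-Control described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete HTTP cache: the function names (parse_cache_control, freshness_lifetime, can_use_without_request) are made up for this example, headers are assumed to arrive as a plain dict with canonical capitalization, and the dates are sample values.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def parse_cache_control(value):
    """Split a Cache-Control header value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') or None
    return directives


def freshness_lifetime(headers):
    """Seconds the response may be reused without contacting the server.

    Returns None when the response declares no lifetime.
    """
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if cc.get("max-age") is not None:
        # max-age is a relative duration and takes priority over Expires.
        return int(cc["max-age"])
    if "Expires" in headers and "Date" in headers:
        # Expires is an absolute server-side time; measure it against the
        # server's own Date header instead of the client clock.
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return int((expires - date).total_seconds())
    return None


def can_use_without_request(headers, age_seconds):
    """True if a stored copy of this age can be used without asking the server."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if "no-store" in cc or "no-cache" in cc:
        return False  # no-store: never stored; no-cache: revalidate first
    lifetime = freshness_lifetime(headers)
    return lifetime is not None and age_seconds < lifetime


served_at = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)  # sample date
response_headers = {
    "Date": format_datetime(served_at, usegmt=True),
    "Expires": format_datetime(served_at + timedelta(hours=1), usegmt=True),
    "Cache-Control": "public, max-age=600",
}
print(freshness_lifetime(response_headers))            # 600 (max-age wins)
print(can_use_without_request(response_headers, 300))  # True
print(can_use_without_request(response_headers, 900))  # False
```

Even though Expires would allow an hour, the copy counts as fresh for only 600 seconds, because max-age takes priority when both headers are present.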
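  • Last-Modified/If-Modified-Since: Last-Modified/If-Modified-Since works together with Cache-Control.

Last-Modified: marks the time the resource was last modified. The web server sends it in the response to tell the browser when the resource last changed.
If-Modified-Since: when the cached resource has expired (according to the max-age set in Cache-Control) and it carries a Last-Modified value, the browser adds the header If-Modified-Since, set to that Last-Modified value, to its next request to the web server. On receiving the header, the server compares it with the resource's current last-modified time. If the resource's last-modified time is newer, the resource has changed, and the server responds with HTTP 200 and the full resource in the response body. If it is not newer, the resource is unchanged, and the server responds with HTTP 304 (no body, which saves bandwidth), telling the browser to keep using its cached copy.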
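The server-side comparison described above might look roughly like this. It is a sketch under stated assumptions: respond_last_modified is a hypothetical helper rather than part of any framework, request headers are a plain dict, and last_modified is assumed to be a timezone-aware UTC datetime.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond_last_modified(request_headers, last_modified, body):
    """Return (status, headers, body) using Last-Modified validation."""
    # HTTP dates carry whole seconds only, so drop sub-second precision.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        try:
            since_time = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            since_time = None  # unparseable date: send the full response
        if since_time is not None:
            if since_time.tzinfo is None:
                since_time = since_time.replace(tzinfo=timezone.utc)
            if last_modified <= since_time:
                return 304, headers, b""  # unchanged: no body needed
    return 200, headers, body


modified = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)  # sample date
status, headers, _ = respond_last_modified({}, modified, b"<html>v1</html>")
print(status)  # 200 on the first request

revalidate = {"If-Modified-Since": headers["Last-Modified"]}
print(respond_last_modified(revalidate, modified, b"<html>v1</html>")[0])  # 304
later = modified + timedelta(minutes=5)
print(respond_last_modified(revalidate, later, b"<html>v2</html>")[0])     # 200
```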
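  • Etag/If-None-Match: Etag/If-None-Match should also be used with Cache-Control.

Etag: sent by the web server in the response to tell the browser the resource's current unique identifier on the server (the server decides how it is generated). In Apache, the default ETag value is derived from the file's inode number (INode), size (Size), and last-modified time (MTime).
If-None-Match: when the cached resource has expired (according to the max-age set in Cache-Control) and it carries an Etag value, the browser adds the header If-None-Match (with the Etag value) to its next request to the web server. On receiving the header, the server compares it with the resource's current validator and decides whether to return 200 or 304.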
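As a rough sketch of the Etag round trip, the example below derives the tag from a hash of the content. That generation rule is an assumption chosen for illustration (the server is free to pick any rule), and make_etag and respond_etag are hypothetical names.

```python
import hashlib


def make_etag(body):
    """Build an ETag from a hash of the content (one possible rule)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond_etag(request_headers, body):
    """Return (status, headers, body) using ETag validation."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # The header may list several tags; a W/ prefix marks a weak
        # validator, which still counts as a match for a GET.
        tags = [t.strip() for t in if_none_match.split(",")]
        tags = [t[2:] if t.startswith("W/") else t for t in tags]
        if "*" in tags or etag in tags:
            return 304, headers, b""
    return 200, headers, body


page = b"<html>hello</html>"
status, headers, _ = respond_etag({}, page)
print(status)  # 200 on the first request

revalidate = {"If-None-Match": headers["ETag"]}
print(respond_etag(revalidate, page)[0])                     # 304
print(respond_etag(revalidate, b"<html>changed</html>")[0])  # 200
```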
  • Why was Etag needed when Last-Modified already existed? You might think Last-Modified is enough for the browser to know whether its locally cached copy is still fresh, so why add an Etag (entity tag)? Etag was introduced in HTTP 1.1 mainly to solve several problems that Last-Modified handles poorly:

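Last-Modified is only accurate to the second. If a file is modified several times within one second, Last-Modified cannot reflect those changes.
Some files are regenerated periodically. Even when the content has not changed, Last-Modified changes, so the cached copy cannot be reused.
The server may not obtain the file's modification time accurately, or its clock may disagree with that of a proxy server.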

Etag is a unique identifier for the resource on the server side, generated either automatically by the server or by the developer, and it allows more precise cache control. When Last-Modified and ETag are used together, the server validates the ETag first.
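The precedence rule, and the "regenerated file" problem listed above, can be illustrated with a small decision function. This is a simplified sketch: not_modified is a hypothetical name, the tag value is invented, and dates are assumed to be timezone-aware UTC.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def not_modified(request_headers, current_etag, current_mtime):
    """Decide whether a conditional GET can be answered with 304."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # The ETag is checked first; If-Modified-Since is then ignored.
        tags = [t.strip() for t in if_none_match.split(",")]
        return current_etag in tags
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        return current_mtime.replace(microsecond=0) <= since
    return False


old_mtime = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
new_mtime = old_mtime + timedelta(hours=1)  # file regenerated, same content
etag = '"abc123"'                           # hypothetical tag, unchanged

old_date = format_datetime(old_mtime, usegmt=True)
both = {"If-None-Match": etag, "If-Modified-Since": old_date}
time_only = {"If-Modified-Since": old_date}

print(not_modified(both, etag, new_mtime))       # True  -> 304, cache reused
print(not_modified(time_only, etag, new_mtime))  # False -> 200, full download
```

With only Last-Modified, the regenerated but unchanged file is downloaded again; with an Etag that tracks the content, the cached copy is still accepted.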

  • Yahoo's YSlow rules recommend configuring Etag carefully: in a distributed system, Last-Modified must be identical for the same file on every machine; otherwise, when the load balancer sends requests to different machines, validation fails. Yahoo recommends turning Etag off in distributed systems where possible, because each machine generates a different Etag: besides Last-Modified, the inode is hard to keep consistent across machines.
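A toy model shows why metadata-based tags break behind a load balancer. Everything here is illustrative: metadata_etag uses a made-up format (not Apache's exact one), and the two "servers" are plain dicts with invented inode numbers and a sample timestamp.

```python
def metadata_etag(inode, size, mtime):
    """Tag built from file metadata (illustrative format only)."""
    return '"%x-%x-%x"' % (inode, size, mtime)


def revalidate(if_none_match, server):
    """Status a server returns for a conditional GET carrying this tag."""
    return 304 if if_none_match == metadata_etag(**server) else 200


# Two hypothetical servers hold the same file; only the inode differs.
server_a = {"inode": 1001, "size": 2048, "mtime": 1704110400}
server_b = {"inode": 2002, "size": 2048, "mtime": 1704110400}

tag_from_a = metadata_etag(**server_a)  # browser cached the file from A
print(revalidate(tag_from_a, server_a))  # 304: same machine, tag matches
print(revalidate(tag_from_a, server_b))  # 200: other machine, full resend
```

The file is byte-for-byte identical on both machines, yet the second request transfers it again because the inode leaks into the tag.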

  • Pragma: the Pragma: no-cache header exists for compatibility with HTTP 1.0 and has the same effect as Cache-Control: no-cache.

  • Finally, a summary of the differences between the relevant status codes:

3. User behavior and caching

Browser caching behavior also depends on what the user does. If you remember what a forced refresh (Ctrl + F5) does, you will see immediately what this means.

User action                 Expires/Cache-Control                        Last-Modified/Etag
Enter in the address bar    Valid                                        Valid
Page link jump              Valid                                        Valid
New window                  Valid                                        Valid
Forward, backward           Valid                                        Valid
F5/refresh button           Invalid (browser resets max-age=0)           Valid
Ctrl+F5 refresh             Invalid (resets Cache-Control: no-cache)     Invalid (request headers omit these fields)

For details, see reference [6] at the end of the article.
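The table above can be read as a rule for which request headers the browser adds when it contacts the server. The sketch below encodes that reading. The action names and the function request_headers_for are invented for this example, and real browsers may behave differently (see reference [15] on Chrome's refresh behavior).

```python
NAVIGATION = {"address_bar", "link", "new_window", "back_forward"}


def request_headers_for(action, validators):
    """Cache-related headers the browser adds to a request, per the table.

    validators: the cached response's "ETag" and/or "Last-Modified" values.
    """
    conditional = {}
    if "ETag" in validators:
        conditional["If-None-Match"] = validators["ETag"]
    if "Last-Modified" in validators:
        conditional["If-Modified-Since"] = validators["Last-Modified"]

    if action in NAVIGATION:
        # Sent only once the cached copy has expired; a fresh copy is
        # used without any request at all.
        return conditional
    if action == "f5":
        # Freshness is ignored, but validators still allow a 304.
        return {"Cache-Control": "max-age=0", **conditional}
    if action == "ctrl_f5":
        # Freshness and validators are both dropped: full download.
        return {"Cache-Control": "no-cache"}
    raise ValueError("unknown action: " + action)


cached = {"ETag": '"abc123"', "Last-Modified": "Mon, 01 Jan 2024 12:00:00 GMT"}
for action in ("link", "f5", "ctrl_f5"):
    print(action, request_headers_for(action, cached))
```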

4. References

[1] Browser caching mechanism

http://www.cnblogs.com/skynet/archive/2012/11/28/2792503.html

[2] Web caching knowledge for web developers

http://www.oschina.net/news/41397/web-cache-knowledge

[3] Browser cache details: expires, cache-control, last-modified, etag details

http://blog.csdn.net/eroswang/article/details/8302191

[4] The difference between pressing Enter, F5, Ctrl+F5 in the browser address bar to refresh the webpage

http://cloudbbs.org/forum.php?mod=viewthread&tid=15790

http://blog.csdn.net/yui/article/details/6584401

[5] Cache Control and ETag

https://blog.othree.net/log/2012/12/22/cache-control-and-etag/

[6] Cached Stories

http://segmentfault.com/blog/animabear/1190000000375344

[7] Google's PageSpeed website optimization theory mentions that using Etag can reduce server load

https://developers.google.com/speed/docs/pss/AddEtags

[8] yahoo's Yslow rule suggests to set Etag carefully

http://developer.yahoo.com/performance/rules.html#etags

[9] H5 Cache Mechanism Analysis on Mobile Web Loading Performance Optimization

http://segmentfault.com/a/1190000004132566

[10] Web Page Performance: Cache Efficiency Practice

http://www.w3ctech.com/topic/1648

[11] View HTTP cache through browser

http://www.cnblogs.com/skylar/p/browser-http-caching.html

[12] Summary and application of browser cache knowledge

http://web.jobbole.com/84888/

[13] How to develop and deploy front-end code in a large company?

http://zhihu.com/question/20790576/answer/32602154?utm_campaign=webshare&utm_source=weibo&utm_medium=zhihu

[14] Detailed explanation of browser caching mechanism

https://mangguo.org/browser-cache-mechanism-detailed/

[15] About cache and Chrome's "New Refresh"

http://www.cnblogs.com/ziyunfei/p/6308652.html

 
