Extreme static resource caching and updating (a very detailed, worthwhile article)

Original text: http://blog.csdn.net/zhangjs712/article/details/51166748

notes:

1. Specify version numbers on static resources

2. Use the file digest as the version number

3. When static resources are deployed on a CDN and the HTML and the static resources change in the same release, the deployment order becomes a problem. If the resources are deployed first and the pages second, users without a cache get the old page loading the new resources, and errors occur. If the pages are deployed first and the resources second, the new page loads the old resources, and errors occur as well. The solution is to put the resource version number in the URL path itself rather than after the "?". New and old versions of a resource can then exist on the CDN at the same time, so the resources are deployed first and the pages afterwards.



Reprinted April 16, 2016 10:49:32

This is a very interesting, non-mainstream area of front-end work. It explores how to use engineering methods to solve the combined problem of front-end development and deployment optimization. I have been studying and practicing it ever since I entered the industry.

As I recall, Facebook is the pioneer in this area. Readers who are interested and have a VPN can look at the source code of Facebook's pages to see what engineering really means.

Next I want to explain things starting from the principles. There are many pictures and the article is long; I hope you have the patience to read it through.



Let's go back to basics and start with front-end development at its most primitive. The picture above shows a "cute" index.html page and its style file a.css. You write the code in a text editor with no compilation step, preview it locally, confirm it is OK, throw it onto the server, and wait for users to visit. Front-end work is that simple and that much fun, and the barrier to entry is so low that you can learn it in minutes!




Then we visit the page, see the result, and check the network request: 200! Great, too™ perfect! So development is done... or is it?

Wait, it's not over yet! At a big company, the extreme traffic and performance targets make front-end work not "fun" at all.

Look at the a.css request. If it has to be loaded every time a user visits the page, doesn't that hurt performance and waste bandwidth? Ideally it would look like this:




Use 304 so that the browser uses its local cache. But is that enough? No! 304 is negotiated caching, which still requires one round trip to the server. We are optimizing at the extreme level, so this request must be eliminated entirely, like this:
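To make the negotiated-cache round trip concrete, here is a minimal sketch of how a server might decide between 200 and 304. The function names and the choice of an MD5-based ETag are assumptions for illustration only; the article does not say how the validator is generated.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a validator from the response body (hypothetical scheme)."""
    return '"' + hashlib.md5(body).hexdigest() + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a request for this resource.

    if_none_match is the value of the browser's If-None-Match header,
    or None when the browser has nothing cached.
    """
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match == etag:
        # The cached copy is still valid: send no payload, only the status.
        return 304, headers, b""
    return 200, headers, body
```

Even in the 304 case the browser has still sent a request and waited for the answer, which is exactly the cost the next step removes.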




Force the browser to use its local cache (cache-control/expires) without contacting the server. Good, request optimization has now reached the extreme level. But here comes the question: if the browser no longer sends requests for the resource, how does the cache ever get updated?
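As a rough sketch of the forced local cache, the response headers could be built like this. The one-year lifetime and the helper name long_cache_headers are assumptions, not values taken from the article.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Dict

ONE_YEAR = 365 * 24 * 60 * 60  # assumed lifetime, in seconds


def long_cache_headers(now: datetime, max_age: int = ONE_YEAR) -> Dict[str, str]:
    """Build headers that let the browser reuse its copy without asking.

    `now` must be a timezone-aware UTC datetime.
    """
    expires = now + timedelta(seconds=max_age)
    return {
        # Cache-Control takes precedence in HTTP/1.1 clients.
        "Cache-Control": f"max-age={max_age}",
        # Expires is the older HTTP/1.0 equivalent, an absolute date.
        "Expires": format_datetime(expires, usegmt=True),
    }
```

For example, long_cache_headers(datetime.now(timezone.utc)) returns a header set that keeps the resource fresh for a year from the current moment.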

Good, I am sure someone has already thought of a way: update the resource path referenced in the page, so that the browser abandons its cache and loads the new resource. Like this:
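One common way to change the referenced path is a version query parameter. The sketch below assumes a parameter named v and a release-wide version string; both are illustrative choices rather than details confirmed by the article.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_version(url: str, version: str) -> str:
    """Return url with its v query parameter set to the given version."""
    parts = urlsplit(url)
    # Drop any existing v parameter, keep everything else.
    query = [(k, val) for k, val in parse_qsl(parts.query) if k != "v"]
    query.append(("v", version))
    return urlunsplit(parts._replace(query=urlencode(query)))


# A release-wide version touches every link on the page at once.
links = [with_version(name, "1.0.1") for name in ("a.css", "b.css", "c.css")]
```

Because the URL differs from the cached one, the browser treats it as a new resource and requests it again.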




On the next release, change the link address to the new version and the resource gets updated, right? OK, problem solved?! Of course not! The big companies' extreme demands strike again. Consider this situation:




The page references three CSS files, and a given release changes only a.css. If every link gets the new version, the caches for b.css and c.css are invalidated too. Isn't that wasteful?!

Switching extreme mode back on, it is easy to see that solving this requires tying URL changes to file content. In other words, a URL changes only when the content of its file changes, which gives precise cache control at the file level.

What is tied to file content? A message digest algorithm comes to mind naturally: compute a digest of each file. In practice the digest corresponds one-to-one with the file content, which gives a basis for cache control that is precise down to a single file. OK, let's change the URLs to carry the digest:
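A minimal sketch of a digest-based URL follows. The article does not name the algorithm or the digest length; MD5 truncated to seven hex characters and the v parameter are assumptions made here.

```python
import hashlib


def digest_url(name: str, content: bytes, length: int = 7) -> str:
    """Return a URL whose version is a short digest of the file content."""
    digest = hashlib.md5(content).hexdigest()[:length]
    return f"{name}?v={digest}"


# Only the file whose bytes changed gets a different URL.
before = {n: digest_url(n, c) for n, c in {"a.css": b"old", "b.css": b"same"}.items()}
after = {n: digest_url(n, c) for n, c in {"a.css": b"new", "b.css": b"same"}.items()}
```

Here the URL for a.css differs between before and after, while the URL for b.css is identical, so its cached copy stays valid.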




This time, when a file is modified, only that file's URL is updated, which seems perfect. Think that's enough? The big companies tell you: too young, too simple!

Sigh... let me catch my breath.

To further improve site performance, modern Internet companies deploy static resources and dynamic pages on separate clusters. Static resources go to CDN nodes, and the resources referenced in pages become the corresponding deployment paths:




OK, when I update a static resource, I also update the reference to it in the HTML, like this:




This release changes both the page structure and the styles, and it also updates the URLs of the static resources. Now the code is about to go live. Dear front-end engineers, tell me: do we release the page first, or the static resources first?

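Deploy the page first, then the resources: if a user visits during the window between the two deployments, the new page structure loads the old resource and caches that old resource as if it were the new version. The result is a page with broken styling, and unless the user refreshes manually, the page stays broken until the cached resource expires.

Deploy the resources first, then the page: during the deployment window, users who have the old resource in their local cache request the old page, whose resource references are unchanged, so the browser uses the local cache directly and the page displays normally. Users with no local cache, or with an expired one, get the old page loading the new resource, and the page breaks. Once the page deployment finishes, these users see a normal page again on their next visit.

In short, the analysis above says: neither order works! Both cause broken pages during deployment. For a low-traffic project, the developers can suffer a little: wait until the middle of the night, quietly release the static resources first and then deploy the page, which seems to cause fewer problems.

But big companies are extreme. They have no such "absolute off-peak period", only a "relative off-peak period". So, for the sake of a stable service, we have to keep pushing further!

This odd problem stems from overwriting releases of resources: whenever the resources being released overwrite the resources already released, the problem appears. The fix is simple too: implement non-overwriting releases.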



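As the figure above shows, each resource file is renamed using its digest, so the digest becomes part of the file's release path. A resource whose content has changed is then published as a new file and does not overwrite the existing one. During a release, deploy all static resources in full first, then roll the pages out gradually (gray release), and the whole problem is solved fairly cleanly.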
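A non-overwriting release can be sketched as follows. The name_digest.ext naming pattern, the seven-character MD5 prefix, and a local directory standing in for the CDN are all assumptions for illustration.

```python
import hashlib
from pathlib import Path


def hashed_name(name: str, content: bytes, length: int = 7) -> str:
    """Put a short content digest into the file name, e.g. a_<digest>.css."""
    digest = hashlib.md5(content).hexdigest()[:length]
    path = Path(name)
    return f"{path.stem}_{digest}{path.suffix}"


def publish(release_dir: Path, name: str, content: bytes) -> str:
    """Write the resource under its hashed name and return that name.

    Existing files are never overwritten: changed content produces a new
    name, and unchanged content maps to a file that is already there.
    """
    target = release_dir / hashed_name(name, content)
    if not target.exists():
        target.write_bytes(content)
    return target.name
```

After two releases of a.css with different content, both files sit side by side in the release directory, so old pages keep loading the old file while new pages reference the new one.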

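So a big company's static resource optimization scheme basically has to implement the following:

1. Configure a very long local cache lifetime: saves bandwidth and improves performance

2. Use the content digest as the basis for cache updates: precise cache control

3. Deploy static resources on a CDN: optimizes network requests

4. Change the resource release path to achieve non-overwriting releases: smooth upgrades

Do all of this and you have a relatively complete static resource cache control scheme. Note also that it must be applied at every place where the front end loads a static resource. Yes, every place! js and css go without saying, but this also includes the resource paths referenced inside js and css files. Because digests are involved, a change in a referenced resource's digest changes the content of the referencing file itself, which produces a cascade of digest changes. A rough diagram: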
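The cascade can be sketched with a small build step that rewrites references before hashing. This is a simplified illustration: it only recognizes unquoted url(...) references, and it assumes every referenced name is present in the input, every name has an extension, and there are no circular references.

```python
import hashlib
import re
from typing import Dict

URL_REF = re.compile(r"url\(([^)]+)\)")


def build(sources: Dict[str, str]) -> Dict[str, str]:
    """Map each original file name to its digest-bearing release name."""
    hashed: Dict[str, str] = {}

    def resolve(name: str) -> str:
        if name in hashed:
            return hashed[name]
        # Rewrite references to their hashed names first, so that a change
        # in a referenced file also changes the content of this file.
        content = URL_REF.sub(lambda m: f"url({resolve(m.group(1))})", sources[name])
        digest = hashlib.md5(content.encode("utf-8")).hexdigest()[:7]
        stem, dot, ext = name.rpartition(".")
        hashed[name] = f"{stem}_{digest}{dot}{ext}"
        return hashed[name]

    for name in sources:
        resolve(name)
    return hashed


v1 = build({"a.css": "body{background:url(bg.png)}", "b.css": "p{}", "bg.png": "pixels-v1"})
v2 = build({"a.css": "body{background:url(bg.png)}", "b.css": "p{}", "bg.png": "pixels-v2"})
```

Changing only bg.png gives it a new name, which changes the text of a.css and therefore its name too, while b.css keeps its name and its cache.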







