Front-end engineering - incremental update and caching during the build process

Table of contents

(1) Overview

(2) HTTP caching strategy

(3) Overlay update and incremental update


(1) Overview

Using caches wisely is an essential part of web performance optimization. What front-end engineers mostly deal with is the caching strategy of the client browser. Client-side caching falls into the following two types.

[1] Use local storage, such as LocalStorage, SessionStorage, etc.

[2] Utilize the HTTP caching strategy, which is divided into mandatory caching and negotiation caching.

Among them, using local storage is an optimization at the code-architecture level and does not fall within the scope of the front-end engineering system. HTTP caching requires cooperation from the server: server software such as Apache and Nginx can set different HTTP caching policies for different resources. Incremental update is the cache update solution adopted by most teams today. Combined with the mandatory HTTP caching strategy, it ensures that users get the latest resources immediately while also reducing network consumption and improving the execution speed of the web application. The front-end engineering system plays the following roles in this.

[1] Construct the hash fingerprint of the output file, which is a necessary condition for incremental update.

[2] During the build, update the reference URLs of the other static resources inside the HTML file.

(2) HTTP caching strategy

The browser's caching of static resources is essentially the HTTP protocol's caching strategy, which can be divided into mandatory caching and negotiation caching. Both strategies cache resources locally. With mandatory caching, the browser decides whether to use the local copy or request a new resource based on the expiration time; with negotiation caching, a request is sent every time, and the server decides after comparison whether the local copy can still be used or a new resource must be returned. Which caching strategy applies is determined by the header (Headers) information of the HTTP protocol.

Expires and max-age

Expires and max-age are the key pieces of information for the mandatory caching strategy; both are response headers. Expires was introduced in HTTP 1.0. It specifies an explicit point in time as the expiration time of the cached resource; before that time, the client answers requests with the locally cached file and sends no actual request to the server (in the browser's debug panel such requests appear with a status code of 200). The advantage of Expires is that it reduces the client's HTTP requests while the cache is valid, which saves client processing time and improves the execution speed of the web application, and also reduces server load and the client's network consumption. A typical Expires header looks as follows:
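
        Expires: Wed, 21 Oct 2026 07:28:00 GMT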

Expires has a fatal flaw: the time point it specifies is based on the server's clock, but the client compares its own local time against this point when judging expiration. If the client's and server's clocks differ, the cache controlled by Expires misbehaves: for example, a fast client clock makes the resource expire early and triggers an unnecessary request to the server. This is obviously unreasonable.

Cache-Control

To address this problem, HTTP 1.1 added the Cache-Control header for more precise cache control. The commonly used Cache-Control directives are as follows.

no-cache and no-store: "no-cache" does not prohibit caching; it requires the client to confirm with the server whether the cached response is still current before using it. If the resource has not changed, the cached copy can be used and the download is avoided. "no-store" truly prohibits caching: it forbids the browser and all intermediate caches from storing any version of the returned response, so every request goes to the server and downloads the complete response.

public and private: "public" means the response may be stored by any cache, browsers and intermediate caches alike. This directive is rarely needed; the conventional approach is to specify a precise cache lifetime with max-age. "private" means the response may be cached by the user's browser, but no intermediate cache is allowed to store it. For example, a user's browser may cache an HTML page containing that user's private information, but a CDN may not.

max-age: specifies the maximum time (in seconds) for which the cached copy of the response is valid, counted from the time of the request. For example, "max-age=3600" tells the browser to use its local copy of the response for the next hour without sending any actual request to the server.

max-age specifies a time span for the cache rather than an expiration time point, so it is unaffected by clock differences between the client and the server. Compared with Expires, max-age therefore controls the cache more precisely, and it takes priority over Expires. The cache decision process under the mandatory caching policy (i.e. when neither no-cache nor no-store is specified in Cache-Control) is shown in the figure.
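
For example, typical header values for a long-lived static asset, a browser-only page, and a page that must always be revalidated look like this (illustrative values):

        Cache-Control: public, max-age=2592000     (any cache, 30 days)
        Cache-Control: private, max-age=600        (browser only, 10 minutes)
        Cache-Control: no-cache                    (always revalidate with the server)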

ETag and If-None-Match

ETag is a unique string identifier that the server assigns to a resource and returns to the browser as a response header. When Cache-Control specifies no-cache, or when max-age and Expires have expired, the browser sends the ETag value back to the server in the If-None-Match request header. On receiving the request, the server checks whether the ETag of the requested resource has changed. If it has not changed, the server returns 304 Not Modified and assigns fresh Cache-Control information according to the established caching strategy; if the resource has changed, the server returns the latest resource together with a newly assigned ETag value. The overall process is shown in the figure.
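
The exchange looks roughly like this (illustrative ETag value): the first response carries the ETag, the revalidation request echoes it back, and the server answers 304 if the resource is unchanged:

        HTTP/1.1 200 OK
        ETag: "5d8c72a5edda8d6a"

        GET /main.home.css HTTP/1.1
        If-None-Match: "5d8c72a5edda8d6a"

        HTTP/1.1 304 Not Modified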

To force the browser to use the negotiation caching strategy, set the Cache-Control header to no-cache; the max-age and Expires expiration checks are then skipped, and every resource request is compared by the server. Negotiation caching is not a "lower-grade" strategy than mandatory caching; for some special application scenarios and resources, negotiation caching is the better fit.

Consider the HTML document of the non-server-side-rendered projects discussed earlier. Because it is the entry that references all other static resources, every request for it must return the latest version; at the same time, to keep the site's URL unique and easy for the server to resolve, a hash fingerprint cannot be applied to the HTML file name. Only negotiation caching works in this scenario.
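
To make the two policies concrete, here is a minimal sketch of a static server using only Node's built-in modules (the file layout and port are assumptions, not a production setup): the HTML entry gets the negotiation caching policy (no-cache plus an ETag), while fingerprinted assets get a 30-day max-age.

    // cache-server.js - a minimal sketch, not a production server.
    const http = require('http');
    const fs = require('fs');
    const path = require('path');
    const crypto = require('crypto');

    http.createServer((req, res) => {
      const file = req.url === '/' ? 'index.html' : '.' + req.url;
      let body;
      try {
        body = fs.readFileSync(path.join(__dirname, file));
      } catch (e) {
        res.writeHead(404);
        return res.end('Not Found');
      }

      if (file.endsWith('.html')) {
        // Negotiation caching: the browser must revalidate every time.
        const etag = '"' + crypto.createHash('md5').update(body).digest('hex') + '"';
        if (req.headers['if-none-match'] === etag) {
          res.writeHead(304, { 'ETag': etag });
          return res.end();
        }
        res.writeHead(200, { 'Cache-Control': 'no-cache', 'ETag': etag });
      } else {
        // Mandatory caching: a fingerprinted file never changes under the
        // same name, so it can be cached for 30 days without revalidation.
        res.writeHead(200, { 'Cache-Control': 'max-age=2592000' });
      }
      res.end(body);
    }).listen(8080);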

(3) Overlay update and incremental update

Both overlay update and incremental update presuppose that the browser's mandatory caching policy is enabled. Incremental update is the front-end static resource update strategy most widely used in the industry today; the common implementation is to add a hash fingerprint to the file name. Overlay update has serious defects with no good workaround, so it has been gradually abandoned. Next, a concrete scenario illustrates the difference between the two and the advantages of the incremental update scheme.

Suppose the project contains one CSS file and one JS file, referenced by index.html:

        <head>
          <link rel="stylesheet" href="main.home.css">
        </head>
        <body>
          <script type="text/javascript" src="main.home.js"></script>
        </body>

To improve the loading performance of the page, we enable the mandatory caching strategy: both main.home.css and main.home.js are cached locally with max-age set to 30 days. If the project must be iterated while the cache is still valid, then to ensure users get the latest resources immediately, the browser has to abandon the previously cached files and send actual requests to download the new resources. The overlay update strategy achieves this by appending request parameters to the URLs of the referenced resources, for example a version (or timestamp) parameter:

        <head>
          <link rel="stylesheet" href="main.home.css?v=1.0.0">
        </head>
        <body>
          <script type="text/javascript" src="main.home.js?v=1.0.0"></script>
        </body>

The browser treats URLs with different parameters as brand-new URLs, so the change above guarantees that the browser requests and downloads the latest resources from the server. But a problem follows: to make the best use of the cache, only changed resources should be updated, while unmodified resources keep using the cache. Assuming only main.home.js was changed and main.home.css stayed the same, only the URL of main.home.js should be updated, as follows:

        <head>
          <link rel="stylesheet" href="main.home.css?v=1.0.0">
        </head>
        <body>
          <script type="text/javascript" src="main.home.js?v=1.0.1"></script>
        </body>

Modifying the parameters selectively is not conceptually difficult, since the people doing the development know which files changed and which did not. But doing it by hand is tedious, and human error cannot be ruled out. A better way is to let a tool do the work. A tool, however, has no memory: for it to recognize the changed files and modify only their parameters, you must either hand it the list of changed files, or let it fetch the pre-change contents of the files and compare them one by one. Either way is costly and cumbersome.

To solve this problem, first consider what the v parameter after a static resource URL means. Its only function is to make the browser update the resource. If the parameter's value corresponded one-to-one with the file's content, targeted modification would come for free. That is exactly what a hash fingerprint is: the file's hash value, computed with a fixed digest algorithm (MD5 is widely used today). Using the hash fingerprint as a URL parameter looks as follows:

        <head>
          <link rel="stylesheet" href="main.home.css?v=858d5483">
        </head>
        <body>
          <script type="text/javascript" src="main.home.js?v=bbcdaf73"></script>
        </body>
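
For reference, a minimal sketch of how a build tool might compute such a fingerprint with Node's built-in crypto module (truncating to 8 characters, as in the examples above, is just a convention):

    // fingerprint.js - compute a short md5 fingerprint of a file's content
    const crypto = require('crypto');
    const fs = require('fs');

    function hashFingerprint(filePath) {
      return crypto.createHash('md5')
        .update(fs.readFileSync(filePath)) // hash the file's bytes
        .digest('hex')
        .slice(0, 8);                      // keep a short 8-char prefix
    }

    console.log(hashFingerprint('main.home.js')); // e.g. "bbcdaf73"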

However, using the hash fingerprint as a URL parameter value to implement overlay update has the following two fatal flaws.

First: the HTML file and the changed static files must be updated at exactly the same time, otherwise resources get out of sync. In a project without server-side rendering, HTML files are treated as static resources and deployed to the same server as the other static resources (JS/CSS/images, etc.); in that scenario all resources can be updated synchronously, and the overlay update flaw does not bite. But this deployment model does not suit every project. For projects that rely on server-side rendering, most teams today deploy the site's entry HTML and its static resources separately, for example HTML and server code to the server behind www.app.com, and JS/CSS/images to the server behind static.app.com. Deploying two sets of resources separately necessarily imposes an order, which means there is a time gap between the two going live. Whichever is deployed first, the correctness of pages accessed within that gap cannot be guaranteed. Even if the gap is small, for a site with heavy traffic like Taobao it still affects a considerable number of users. This is one reason many teams release new versions in the middle of the night or early morning, when site traffic is low.

Second: it hinders version rollback. Because each iteration of an overlay update overwrites the old files on the server, rollbacks become painful: operations staff must either rely on the server's own caching mechanism, or dig out the old versions of the files and deploy them over the top again.

The incremental update strategy solves both of these defects, and its implementation is very simple: the hash fingerprint that used to be the parameter value becomes part of the resource file name, and the URL parameter used for updating is dropped. The code above, switched to the incremental update strategy, takes the following form:

        <head>
          <link rel="stylesheet" href="main.home.858d5483.css">
        </head>
        <body>
          <script type="text/javascript" src="main.home.bbcdaf73.js"></script>
        </body>

With incremental update in place for static resources, the static resources can be deployed before the dynamic HTML. At that moment the newly deployed static resources have no referencing entry and therefore cannot affect the live environment; once the dynamic HTML is deployed, it immediately references the latest static resources. This solves the deployment synchronization problem of overlay update. In addition, incremental update changes the resource file's name and never overwrites the existing old files, so for a rollback the operations staff only need to roll back the HTML. This not only simplifies version control but also supports multiple versions coexisting.
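
A minimal sketch of this build step in Node (the file names follow this article's examples; real build tools such as webpack achieve the same thing with placeholders like [contenthash] in the output file name):

    // rename-assets.js - fingerprint each asset, write it under a hashed
    // name, and rewrite the references in index.html accordingly.
    const crypto = require('crypto');
    const fs = require('fs');

    let html = fs.readFileSync('index.html', 'utf8');

    for (const name of ['main.home.css', 'main.home.js']) {
      const content = fs.readFileSync(name);
      const hash = crypto.createHash('md5').update(content)
        .digest('hex').slice(0, 8);
      const hashedName = name.replace(/\.(css|js)$/, `.${hash}.$1`);
      fs.copyFileSync(name, hashedName);        // old versions are never overwritten
      html = html.split(name).join(hashedName); // point the HTML at the new name
    }

    fs.writeFileSync('index.dist.html', html);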

On-demand loading and incremental updates in multi-module architecture scenarios

The multi-module architecture refers to the existence of multiple non-interfering module systems. These module systems may exist in the same page or in two independent pages. For on-demand loading requirements and incremental updates in multi-module architecture scenarios, the following issues need to be considered.

[1] The impact of the modification of the synchronization module on the hash fingerprint of the asynchronous file and the main file.

[2] The impact of the modification of the asynchronous module on the hash fingerprint of the main file.

The impact of the modification of the synchronization module on the hash fingerprint of the asynchronous file and the main file

Suppose the module structure of a single-page project is as shown in the figure

[1] The main module main.app.js.

[2] The synchronization module module.sync.js is built and merged with the main module into the main file main.app.[hash].js, which is loaded synchronously.

[3] The asynchronous module module.async.js is built separately as an asynchronous file app.async.[hash].js, which is loaded on demand.

The [hash] value of a build output file is computed with md5, and any change to the content of the modules involved in the computation necessarily changes the result. The content of the synchronization module module.sync.js participates as an input in the hash fingerprint calculation of the main file, but not in that of the asynchronous file. It follows that modifying the synchronization module changes the hash fingerprint of the main file but has no effect on the asynchronous file.

The impact of the modification of the asynchronous module on the hash fingerprint of the main module

Does the content of the asynchronous module affect only the hash fingerprint of the asynchronous file? Before answering, let's look at how asynchronous files are loaded. The following code is a common pattern for loading an asynchronous file:

    window.onload = function(){
      var script = document.createElement('script');
      script.src = 'https://static.app.com/async.js'; // URL of the asynchronous file
      document.head.append(script);
    };

The URL of the asynchronous file is hard-coded into the main file responsible for loading it. With hash fingerprints applied, the code above looks as follows after the build:

    window.onload = function(){
      var script = document.createElement('script');
      script.src = 'https://static.app.com/async.2483fae1.js'; // URL of the asynchronous file
      document.head.append(script);
    };

Suppose the main file at this point is main.home.bbcdaf73.js and all resources of the current version are cached locally in the user's browser. In the next iteration only the content of the asynchronous module changes, and after the build the asynchronous file's hash fingerprint is updated, giving async.6203b33c.js. Does the hash fingerprint of the main file change?

Assume first that the main file's hash fingerprint does not change. After the new version is released, the main file's reference URL in the HTML document is unchanged, so the browser keeps using its cached copy of main.home.bbcdaf73.js, and the asynchronous file URL inside it is still "https://static.app.com/async.2483fae1.js". In other words, even though the asynchronous file's hash fingerprint was updated, the browser never requests the new resource, which is obviously wrong. Therefore, modifying the asynchronous module affects not only the hash fingerprint of the corresponding asynchronous file: the main file's hash fingerprint must be updated in step with it, so that users are guaranteed to get the latest asynchronous file.
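
A tiny sketch of that dependency chain (illustrative sources and values; bundlers with code splitting handle this cascade automatically):

    // hash-cascade.js - the async file's hashed URL is embedded in the
    // main file's source, so changing the async module changes the main
    // file's content, and therefore its own hash as well.
    const crypto = require('crypto');
    const md5 = (s) => crypto.createHash('md5').update(s).digest('hex').slice(0, 8);

    const asyncSource = "console.log('async module v2');";
    const asyncHash = md5(asyncSource);

    // The build injects the async file's final URL into the main source...
    const mainSource =
      "var url = 'https://static.app.com/async." + asyncHash + ".js';";

    // ...and only then computes the main file's fingerprint, so any change
    // to asyncSource above alters both printed values.
    console.log(asyncHash, md5(mainSource));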
