[Repost] Thoroughly understand the mechanism and principles of HTTP caching

Foreword

As an important means of web performance optimization, the HTTP caching mechanism should be part of the basic knowledge of anyone engaged in web development, and it is a must-have skill for anyone who wants to become a front-end architect.
However, many front-end developers only know that the browser caches the static files it requests; they are not clear on why those files are cached or how the cache takes effect.
Here, I will try to introduce the HTTP caching mechanism systematically in simple and clear words, hoping to help you understand front-end caching correctly.



Before introducing HTTP caching, let's briefly introduce HTTP messages as background knowledge.

HTTP messages are the blocks of data exchanged when the browser and the server communicate.
When the browser requests data from the server, it sends a request message; when the server returns data to the browser, it sends a response message.
A message is mainly divided into two parts:
1. The header, which contains attributes: additional information (cookies, cache information, etc.). All cache-related rule information is carried in the header.
2. The body, which contains the data: the part that the HTTP request actually wants to transfer.
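The header/body split described above can be illustrated with a minimal sketch. The raw response and the helper name `split_message` are assumptions made up for illustration; real HTTP parsing has more edge cases (repeated headers, chunked bodies) than this handles.

```python
# Hypothetical raw response; the header values are illustrative only.
RAW_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Cache-Control: max-age=60\r\n"
    b"\r\n"
    b"hello"
)


def split_message(raw: bytes):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # The header section ends at the first empty line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


start_line, headers, body = split_message(RAW_RESPONSE)
print(start_line)                # the status line
print(headers["cache-control"])  # cache rule carried in the header
print(body)                      # the payload carried in the body
```

The cache rule lives entirely in `headers`; `body` is only the data being transferred.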



Cache rule parsing

To make this easier to understand, let's assume that the browser has a cache database for storing cached information.
When the client requests data for the first time, the cache database has no corresponding cached data, so the client must request it from the server. After the server returns the data, it is stored in the cache database.

 

 

HTTP caching has various rules. Classifying them by whether a request must be sent to the server again, I divide them into two categories: forced cache and comparison cache.
Before introducing these two rules in detail, let's use sequence diagrams to give everyone a simple understanding of them.


When cached data already exists and only the forced cache is used, the process of requesting data is as follows:


When cached data already exists and only the comparison cache is used, the process of requesting data is as follows:

Readers who don't know much about the caching mechanism may ask: in the comparison cache process, a request is sent to the server whether or not the cache is used, so why use the cache at all?
Let's put this question aside for now. The answer will come later, when each caching rule is introduced in detail.

We can see the difference between the two types of caching rules: if the forced cache is valid, there is no need to interact with the server, whereas the comparison cache must interact with the server whether or not the cached data turns out to be valid.
The two types of caching rules can exist at the same time, and the forced cache has a higher priority than the comparison cache. That is, when the forced cache rule is applied and the cache is valid, the cache is used directly and the comparison cache rule is not executed.
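The priority rule can be written down as a small decision function. This is only a sketch of the order in which the rules are consulted; the function name `choose_action` and its three boolean inputs are hypothetical simplifications, not browser APIs.

```python
def choose_action(has_cached_copy: bool,
                  forced_cache_valid: bool,
                  has_identifier: bool) -> str:
    """Decide what the client does for a request, in priority order."""
    if not has_cached_copy:
        # Nothing stored yet: a normal request is the only option.
        return "request server"
    if forced_cache_valid:
        # The forced cache wins: no interaction with the server at all.
        return "use cache"
    if has_identifier:
        # The forced cache has expired: fall back to the comparison cache,
        # which always sends a (conditional) request to the server.
        return "revalidate"
    return "request server"


print(choose_action(True, True, True))
print(choose_action(True, False, True))
```

The comparison cache is consulted only in the branch where the forced cache is no longer valid, which is what "higher priority" means here.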



Forced cache

From the above, we know that with the forced cache, cached data can be used directly as long as it has not expired. So how does the browser judge whether the cached data has expired?
We know that when there is no cached data and the browser requests data from the server, the server returns the data together with the cache rules, and the cache rule information is included in the response header.

For the forced cache, two fields in the response header indicate the expiration rules: Expires and Cache-Control.
Using Chrome's developer tools, you can clearly see what a network request looks like when the forced cache is in effect.



Expires
The value of Expires is the expiration time returned by the server. On the next request, if the request time is earlier than that expiration time, the cached data is used directly.
However, Expires dates from HTTP 1.0. Browsers now use HTTP 1.1 by default, and when a response also carries Cache-Control: max-age, that value takes precedence, so the role of Expires is largely superseded.
Another problem is that the expiration time is generated by the server but checked against the client's clock; if the two clocks differ, the cache hit decision can be wrong.
So HTTP 1.1 uses Cache-Control instead.
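As a rough illustration of the Expires check, the sketch below compares a request time with the header value. The function name `is_fresh_by_expires` and the sample date are assumptions for illustration; `now` is passed in explicitly to make the dependence on the client's clock visible.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh_by_expires(expires_header: str, now: datetime) -> bool:
    """Return True if `now` is earlier than the Expires time."""
    try:
        expires_at = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        # An unparseable date (for example "0") is treated as already expired.
        return False
    if expires_at.tzinfo is None:
        # Assumption: treat a date without zone information as UTC.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return now < expires_at


expires = "Wed, 21 Oct 2015 07:28:00 GMT"  # illustrative value
before = datetime(2015, 10, 21, 7, 0, 0, tzinfo=timezone.utc)
after = datetime(2015, 10, 21, 8, 0, 0, tzinfo=timezone.utc)
print(is_fresh_by_expires(expires, before))
print(is_fresh_by_expires(expires, after))
```

If the client's clock ran an hour slow, it would pass the `before` value while the real time was already `after`, and the stale copy would still be used. This is the clock problem described above.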

Cache-Control
Cache-Control is the most important rule. Common values are private, public, no-cache, max-age, and no-store. Strictly speaking, the HTTP specification does not define a default among them, but for the browser's own cache a response marked neither public nor private behaves like private.
private:              only the client (the browser) can cache the response
public:               both the client and proxy servers can cache the response (front-end developers can treat public and private as the same)
max-age=xxx:    the cached content expires after xxx seconds
no-cache:           the cached data must be validated with the comparison cache before it is used (described later)
no-store:            nothing is cached; neither the forced cache nor the comparison cache is triggered (for front-end development, the more caching the better, so we can basically say goodbye to this one)

For example:

In the figure, Cache-Control only specifies max-age, so from the browser's point of view the response is treated as private, and the cache time is 31536000 seconds (365 days).
That is to say, if this data is requested again within 365 days, it is read directly from the cache database and used.
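The max-age rule can be sketched as follows. The helper names `parse_cache_control` and `is_fresh_by_max_age` are hypothetical, and the parser is deliberately simplified: it splits on commas and does not handle commas inside quoted values.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def is_fresh_by_max_age(cache_control: str, age_seconds: float) -> bool:
    """Return True if a copy stored `age_seconds` ago may be used directly."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives or "no-cache" in directives:
        # no-store: nothing is kept; no-cache: must be validated first.
        return False
    max_age = directives.get("max-age")
    if max_age is None or not max_age.isdigit():
        return False
    return age_seconds < int(max_age)


one_day = 24 * 60 * 60
print(parse_cache_control("max-age=31536000"))
print(is_fresh_by_max_age("max-age=31536000", one_day))
print(is_fresh_by_max_age("no-cache, max-age=60", 1))
```

Because max-age is a duration counted from when the response was stored, it does not depend on the server and client clocks agreeing, unlike Expires.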


Comparison cache

The comparison cache, as the name implies, requires a comparison to determine whether the cache can be used.
When the browser requests data for the first time, the server returns a cache identifier together with the data, and the client stores both in the cache database.
When requesting the data again, the client sends the stored cache identifier to the server, and the server makes a judgment based on it. If the check succeeds, the server returns a 304 status code to tell the client that the comparison succeeded and the cached data can be used.


First visit:

Revisit:

Comparing the two figures, we can clearly see that when the comparison cache takes effect, the status code is 304, and both the message size and the request time are greatly reduced.
The reason is that after comparing the identifiers, the server returns only the header part and uses the status code to tell the client to use its cache; the body of the message no longer needs to be sent to the client.

For the comparison cache, the key thing to understand is how the cache identifier is transmitted. It travels in the request header and the response header.
There are two types of identifier, which are introduced separately below.


Last-Modified / If-Modified-Since
Last-Modified:
The server tells the browser the last modification time of the resource when responding to the request.

If-Modified-Since:
When requesting the server again, the browser uses this field to send back the last modification time that the server returned with the previous response.
When the server receives the request and finds the If-Modified-Since header, it compares the value with the last modification time of the requested resource.
If the last modification time of the resource is later than If-Modified-Since, the resource has been modified again, so the server responds with the whole resource content and status code 200;
if the last modification time of the resource is earlier than or equal to If-Modified-Since, the resource has not been modified, so the server responds with HTTP 304, telling the browser to keep using the saved cache.
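The server-side comparison might look roughly like the sketch below. The function name `respond_if_modified_since` and the `(status, headers, body)` return shape are assumptions for illustration, not any particular framework's API.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond_if_modified_since(last_modified: datetime, if_modified_since, body: bytes):
    """Return (status, headers, body) for a GET on one resource.

    `last_modified` must be a timezone-aware UTC datetime.
    """
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # ignore a header that cannot be parsed
        # HTTP dates have one-second resolution, so drop microseconds.
        if (since is not None and since.tzinfo is not None
                and last_modified.replace(microsecond=0) <= since):
            return 304, headers, b""  # not modified: header only, empty body
    return 200, headers, body


modified = datetime(2015, 10, 21, 7, 28, 0, tzinfo=timezone.utc)
status, headers, _ = respond_if_modified_since(modified, None, b"data")
print(status)  # first visit: full response
status, _, payload = respond_if_modified_since(modified, headers["Last-Modified"], b"data")
print(status, payload)  # revisit with the stored time
```

On the revisit the client echoes back the Last-Modified value it stored, and the server answers without resending the body.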



ETag / If-None-Match (priority is higher than Last-Modified / If-Modified-Since)
ETag:
When the server responds to the request, it tells the browser the unique identifier of the current resource on the server (the generation rule is determined by the server).


If-None-Match:
When requesting the server again, the browser uses this field to send the server the unique identifier of the data cached on the client side.
When the server receives the request and finds the If-None-Match header, it compares the value with the unique identifier of the requested resource.
If they differ, the resource has been changed, so the server responds with the whole resource content and status code 200;
if they are the same, the resource has not been modified, so the server responds with HTTP 304, telling the browser to keep using the saved cache.
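A minimal sketch of the identifier comparison follows. Since the generation rule is up to the server, the content hash used in `make_etag` is just one assumed choice; both function names are hypothetical.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """One possible generation rule: a quoted, truncated content hash."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond_if_none_match(body: bytes, if_none_match):
    """Return (status, headers, body) for a GET on one resource."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        candidates = []
        for tag in if_none_match.split(","):
            tag = tag.strip()
            # If-None-Match uses weak comparison, so ignore a W/ prefix.
            candidates.append(tag[2:] if tag.startswith("W/") else tag)
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # identifiers match: header only
    return 200, headers, body


status, headers, _ = respond_if_none_match(b"version 1", None)
print(status)  # first visit
status, _, payload = respond_if_none_match(b"version 1", headers["ETag"])
print(status, payload)  # unchanged resource
status, _, payload = respond_if_none_match(b"version 2", headers["ETag"])
print(status, payload)  # changed resource
```

Unlike a modification time, a content-derived identifier changes only when the content itself changes.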


Summary
For the forced cache, the server tells the browser a cache time. Within that time, the next request uses the cache directly; once that time has passed, the comparison cache policy is applied.
For the comparison cache, the ETag and Last-Modified values from the cache information are sent to the server with the request, and the server verifies them. When a 304 status code is returned, the browser uses the cache directly.
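To tie the two rules together, here is a toy end-to-end simulation. `FakeServer` and `CachingClient` are hypothetical classes invented for this sketch, time is passed in as plain seconds, and only ETag validation with max-age is modelled; a real browser cache is far more involved.

```python
import hashlib


class FakeServer:
    """Hypothetical origin server holding a single resource."""

    def __init__(self, body: bytes, max_age: int):
        self.body, self.max_age, self.hits = body, max_age, 0

    def get(self, if_none_match=None):
        self.hits += 1  # count every request that reaches the server
        etag = '"' + hashlib.sha256(self.body).hexdigest()[:16] + '"'
        headers = {"ETag": etag, "Cache-Control": f"max-age={self.max_age}"}
        if if_none_match == etag:
            return 304, headers, b""
        return 200, headers, self.body


class CachingClient:
    """Hypothetical client with a one-entry cache database."""

    def __init__(self, server: FakeServer):
        self.server = server
        self.entry = None  # (body, etag, max_age, stored_at)

    def fetch(self, now: float):
        if self.entry is not None:
            body, etag, max_age, stored_at = self.entry
            if now - stored_at < max_age:
                return "from cache", body  # forced cache: no request sent
            # Forced cache expired: fall back to the comparison cache.
            status, headers, new_body = self.server.get(if_none_match=etag)
            if status == 304:
                self.entry = (body, etag, max_age, now)  # fresh again
                return "304 revalidated", body
        else:
            status, headers, new_body = self.server.get()
        max_age = int(headers["Cache-Control"].split("=")[1])
        self.entry = (new_body, headers["ETag"], max_age, now)
        return "200 from server", new_body


server = FakeServer(b"v1", max_age=60)
client = CachingClient(server)
for t in (0, 30, 100):
    print(t, client.fetch(now=t), "server hits:", server.hits)
```

In this run the request at second 30 never reaches the server, while the one at second 100 does, but it receives only headers back.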


When the browser makes its first request:


When the browser requests again:


If there are errors in the text, I hope readers will bear with me and point them out so that I can correct them.

