web cache

A web cache is an HTTP device that automatically keeps copies of popular documents. When a web request arrives at the cache and a locally cached copy exists, the document is served from local storage instead of from the origin server.

1. Why do we need caching?

  • redundant data transmission

When many clients request a popular page from an origin server, the server transmits the same document many times, and the same bytes travel across the network over and over. These redundant transfers use up expensive network bandwidth. With caching, a copy of the first server response is kept, and subsequent requests are served from the cached copy.

  • bandwidth bottleneck

Caching can also alleviate network bottlenecks. Many networks provide more bandwidth to local network clients than to remote servers, and a client accesses a server at the speed of the slowest network on the path. If the client gets a copy from a cache on a fast local area network, caching can improve performance - especially for larger files.

  • Instant congestion

Caching is especially important for absorbing sudden bursts of traffic. A sudden event (such as breaking news, a bulk e-mail announcement, or a celebrity event) causes instantaneous congestion when many people access a web document at about the same time. The resulting traffic spike can overwhelm networks and crash web servers.

  • distance delay

Even if bandwidth is not an issue, distance can be. Every network router adds latency to Internet traffic, and even if there aren't many routers between the client and the server, the speed of light itself imposes a delay. Placing a cache in a nearby machine room can reduce the transfer distance from thousands of miles to tens of meters.

2. Cache hits and misses

But the cache cannot hold a copy of every document in the world, so there are two cases:

  • Some requests that arrive at the cache can be served from an existing copy. This is called a cache hit.

  • Other requests that arrive at the cache are forwarded to the origin server because no copy is available. This is called a cache miss.
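
The two cases can be sketched with a toy in-memory cache. This is only an illustration: the `SimpleCache` class and the `fetch_from_origin` callback are hypothetical names, and a real HTTP cache also tracks headers, freshness and storage limits.

```python
from typing import Callable, Dict


class SimpleCache:
    """Toy in-memory cache keyed by URL (hypothetical, for illustration only)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._store: Dict[str, bytes] = {}
        self._fetch = fetch_from_origin
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._store:
            self.hits += 1           # cache hit: serve the local copy
            return self._store[url]
        self.misses += 1             # cache miss: forward to the origin server
        body = self._fetch(url)
        self._store[url] = body      # keep a copy for later requests
        return body
```

The first request for a URL is a miss and reaches the origin; repeated requests for the same URL are hits and never leave the cache.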

 

 

3. Freshness detection rules

HTTP lets a cache keep a copy of a server document for a period of time. During this time the document is considered fresh, and the cache can serve it directly without contacting the server. We call this a strong cache hit; in this case the browser reports a 200 status code (from cache).

However, once the cached copy has been kept too long and exceeds the document's freshness limit, the document is considered expired.

Before serving the document again, the cache needs to revalidate with the server to check whether the document has changed. We call this the negotiation cache.

 

 

  • Revalidation hit: if the server object has not been modified, the server sends a small HTTP 304 Not Modified response to the client

  • Revalidation miss: if the server object is different from the cached copy, the server sends a normal HTTP 200 OK response with the full content to the client

  • Object deleted: If the server object has been deleted, the server returns a 404 Not Found response and the cache deletes its copy
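
As a rough sketch of how a cache might act on these three outcomes, assuming the cache is just a dictionary from URL to body (the `apply_revalidation` helper is a made-up name, not part of any library):

```python
from typing import Dict, Optional


def apply_revalidation(store: Dict[str, bytes], url: str, status: int,
                       body: Optional[bytes] = None) -> Optional[bytes]:
    """Update `store` from the origin's answer and return what to serve."""
    if status == 304:            # revalidation hit: cached copy is still good
        return store[url]
    if status == 200:            # revalidation miss: replace with the new body
        store[url] = body if body is not None else b""
        return store[url]
    if status == 404:            # object deleted: drop the cached copy
        store.pop(url, None)
        return None
    raise ValueError(f"unhandled status {status}")
```

Other status codes are deliberately left unhandled here; a real cache has rules for them as well.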

4. The principle of strong cache

HTTP lets origin servers attach an expiration date to each document through the special HTTP Cache-Control and Expires headers, which specify how long the content can be considered fresh.

When the browser requests the same resource a second time, it compares the expiration time with the current time. If the expiration time has not yet passed, the strong cache hits. If the cached document has expired, the cache must check with the server to ask whether the document has been modified. If it has, the cache gets a fresh copy (with a new expiration date).

4.1 Strong cache header
  • Cache-Control: max-age:

The max-age value defines the maximum lifetime of the document: the maximum legal time (in seconds) from when the document was first generated until it is no longer fresh and can no longer be served without revalidation

  • Expires:

Specifies an absolute expiration date. If the expiration date has passed, the document is no longer fresh. However, because the date is compared against the client's clock, changing the client's time changes the result of the cache check. Therefore Cache-Control is preferred; when both headers are present, max-age takes precedence over Expires
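
A minimal sketch of the freshness check, assuming the response headers are available as a plain mapping with exactly these header names and that the cache recorded when it stored the response. The `is_fresh` function is hypothetical; it ignores the Age header and the other adjustments a full implementation would make.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def is_fresh(headers: Mapping[str, str], stored_at: datetime,
             now: Optional[datetime] = None) -> bool:
    """Return True if a response stored at `stored_at` is still fresh at `now`."""
    now = now or datetime.now(timezone.utc)
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    if match:
        # max-age is a relative lifetime and is checked before Expires
        return now - stored_at < timedelta(seconds=int(match.group(1)))
    expires = headers.get("Expires")
    if expires:
        # Expires is an absolute HTTP date compared against the local clock
        return now < parsedate_to_datetime(expires)
    return False  # assumption: no explicit lifetime means "revalidate"
```

Both `stored_at` and `now` are expected to be timezone-aware datetimes, since the parsed Expires date is timezone-aware.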

Cache-Control directives:

  • no-cache and no-store:

no-cache indicates that the returned response must be validated with the server before it can be used to satisfy subsequent requests for the same URL. So if a suitable validation token (ETag) is present, no-cache causes a round trip to validate the cached response, but the download is avoided if the resource has not changed

no-store indicates that the browser and all intermediate caches are prohibited from storing any version of the returned response, for example a response containing personal privacy data or banking data. Every time the user requests the asset, a request is sent to the server and the full response is downloaded

  • public and private:

When public is present in the response header, the response can be cached even if it has associated HTTP authentication, and even if the response status code is not normally cacheable. In most cases public is not necessary, since explicit caching information (for example max-age) already indicates that the response is cacheable

By contrast, browsers can cache private responses. However, these responses are usually intended for a single user, so intermediate caches are not allowed to cache them. For example, the user's browser can cache an HTML page containing the user's private information, but a CDN cannot.

  • max-age:

This directive specifies the maximum time that the fetched response may be reused, counted from the time the response was generated. For example, max-age=60 means the response can be cached and reused for the next 60 seconds

  • must-revalidate:

must-revalidate tells the cache that it cannot serve a stale copy of the object without first revalidating with the origin server; the cache can still serve fresh copies freely. If the origin server is unavailable when the cache performs a must-revalidate freshness check, the cache must return a 504 Gateway Timeout error
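
To make the directives concrete, here is a small sketch that splits a Cache-Control value into its directives and applies the storage rules described above. Both function names are hypothetical, and the comma-splitting is a simplification that does not handle quoted arguments containing commas.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header value into {directive: argument-or-None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def may_store(directives: Dict[str, Optional[str]], shared_cache: bool) -> bool:
    """no-store forbids storing; private forbids storing in shared caches (e.g. a CDN)."""
    if "no-store" in directives:
        return False
    if shared_cache and "private" in directives:
        return False
    return True
```

For example, `parse_cache_control("private, max-age=60")` yields the directives `private` (no argument) and `max-age` (argument `"60"`), which a browser cache may store but a CDN may not.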

[Figure: best Cache-Control strategy]


5. Negotiation cache principle

Just because a cached copy has expired doesn't mean it actually differs from the document currently live on the origin server; it just means that it's time to check. This situation is called the negotiation cache: the cache needs to ask the origin server whether the document has changed.

  • If revalidation shows that the content has changed, the cache gets a new copy of the document, stores it in place of the old one, and sends the document to the client.

  • If revalidation shows that the content has not changed, the cache only obtains new headers, including a new expiration date, and updates the headers in the cache.

5.1 Revalidation with Conditional Methods

HTTP's conditional methods make revalidation efficient. HTTP allows the cache to send a conditional GET to the origin server, asking the server to send back the object body only if the document differs from the copy in the cache. The two conditional headers most useful for cache revalidation are:

  • If-Modified-Since: <date>:

Execute the requested method only if the document has been modified since the specified date. Used together with the Last-Modified server response header, it fetches the content only if it has been modified and differs from the cached version

  • If-None-Match: <tags>:

Instead of matching on the last-modified date, the server can give the document special tags (ETag), which act like serial numbers. The If-None-Match header executes the requested method only if the cached tags differ from the tag of the server's document
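
On the cache side, a conditional GET simply carries back whatever validators were stored with the cached copy. A minimal sketch, assuming the cache kept the ETag and Last-Modified values from the original response (`conditional_headers` is a hypothetical helper):

```python
from typing import Dict, Optional


def conditional_headers(etag: Optional[str],
                        last_modified: Optional[str]) -> Dict[str, str]:
    """Build revalidation request headers from the validators of a cached copy."""
    headers: Dict[str, str] = {}
    if etag:
        headers["If-None-Match"] = etag               # value of the stored ETag
    if last_modified:
        headers["If-Modified-Since"] = last_modified  # value of the stored Last-Modified
    return headers
```

If the cached copy has neither validator, the result is empty and the request becomes an ordinary unconditional GET.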

5.2 If-Modified-Since: / Last-Modified

The specific process is as follows:

  1. The first time the client requests the resource, the server attaches the last modification date (Last-Modified) to the document it returns

  2. When the resource is requested again and the strong cache is not hit, the revalidation request includes an If-Modified-Since header carrying the date the cached copy was last modified: If-Modified-Since: <cached last-modified date>

  3. If the content has been modified, the server sends back the new document with a 200 status code and the latest modification date

  4. If the content has not been modified, a 304 Not Modified response is returned
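
The server's side of this exchange might look roughly like the sketch below. The `respond_if_modified_since` function is hypothetical; it assumes the modification time is a UTC-aware datetime and does not guard against a malformed If-Modified-Since value.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Mapping, Tuple


def respond_if_modified_since(request_headers: Mapping[str, str],
                              last_modified: datetime,
                              body: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET that may carry If-Modified-Since."""
    # HTTP dates have one-second resolution, so drop microseconds first
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    since = request_headers.get("If-Modified-Since")
    if since is not None and last_modified <= parsedate_to_datetime(since):
        return 304, headers, b""     # step 4: not modified, no body sent
    return 200, headers, body        # steps 1 and 3: full document with its date


# First request: no validator yet, so the full document is returned
modified = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
status, headers, _ = respond_if_modified_since({}, modified, b"<html>...</html>")
```

Sending the returned Last-Modified value back as If-Modified-Since on the next request yields a 304 as long as the document has not changed in between.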

5.3 If-None-Match / ETag

In some cases revalidation with the last-modified date is not sufficient:

  • Some documents are rewritten periodically (for example, by a background process) but often contain the same data. The content does not change, but the modification date does

  • Some documents have been modified, but the changes are not important enough to make caches worldwide reload the data (for example, changes to comments)

  • Some servers cannot accurately determine the date their pages were last modified

  • For servers that serve documents changing at sub-second intervals (e.g., real-time monitors), the one-second granularity of modification dates may not be sufficient

So HTTP allows comparing version identifiers called entity tags (ETag). An entity tag is an arbitrary label (a quoted string) attached to a document; the token generated and returned by the server is usually a hash or other fingerprint of the document's content. The client doesn't need to know how the fingerprint was generated, it just sends it to the server on the next request. If the fingerprint is still the same, the resource has not changed and the download can be skipped.

For example, take a response that was cached for 120 seconds and has since expired. The client automatically provides the ETag token in the If-None-Match HTTP request header. The server checks the token against the current resource. If it has not changed, the server returns a 304 Not Modified response, telling the browser that the response in the cache has not changed and can be reused for another 120 seconds. Note that the response does not have to be downloaded again, which saves time and bandwidth.
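
A sketch of the server side of ETag validation, under stated assumptions: the tag is derived from a truncated SHA-256 hash of the body (one common choice, not a requirement of HTTP), the 120-second lifetime mirrors the example in the text, and only a single exact tag in If-None-Match is handled. Both function names are hypothetical.

```python
import hashlib
from typing import Dict, Mapping, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted entity tag from a hash of the content."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond_if_none_match(request_headers: Mapping[str, str],
                          body: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET that may carry If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=120"}
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""     # unchanged: headers only, no body
    return 200, headers, body        # changed or first request: full response
```

Because the tag depends only on the content, rewriting a file with identical bytes keeps the same tag, which addresses the first limitation of modification dates listed earlier.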

Updating and invalidating cached responses

All HTTP requests made by the browser are first routed to the browser cache to check whether a valid cached response can satisfy the request. If there is a match, the response is read from the cache, which avoids both the network latency and the data costs of the transfer

But what if we want to update or invalidate a cached response? For example, suppose a CSS stylesheet is cached for up to 24 hours but we need to update it immediately. How can we tell every visitor holding the now obsolete cached copy of the CSS to update their cache? It can't be done without changing the resource URL.

So how can you get both client-side caching and fast updates? Change the resource's URL whenever its content changes, forcing the user to download the new response. Usually this is done by embedding a fingerprint of the file, or a version number, in its filename.
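
One way to embed a fingerprint in a filename is sketched below. The `fingerprinted_name` helper, the choice of SHA-256 and the 8-character digest length are all assumptions for illustration; build tools typically do this step automatically.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the extension, e.g. style.<hash>.css."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))
```

The same content always maps to the same name, so unchanged files keep their long-lived cache entries, while any change in content produces a new URL that no cache has seen before.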

 

 

  • The HTML is marked no-cache, which means the browser always revalidates the document on every request and gets the latest version when the content changes. In addition, the fingerprints are embedded in the CSS and JavaScript URLs within the HTML markup: if the content of those files changes, the HTML of the page changes too, and a new copy of the HTML response is downloaded

  • Browsers and intermediate caches (such as CDNs) are allowed to cache the CSS, which is set to expire after 1 year. This is safe because the file's fingerprint is embedded in the filename, so the URL changes when the CSS is updated

  • The JavaScript is also set to expire after 1 year, but is marked private, perhaps because it contains some private user data that the CDN should not cache

  • Images are cached without a version or unique fingerprint and are set to expire after one day
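
The policy in these four bullets could be written as a simple lookup from file type to Cache-Control value. This is a sketch: the table, the file extensions chosen and the `cache_control_for` function are illustrative assumptions, not the configuration of any particular server.

```python
ONE_DAY = 24 * 60 * 60        # seconds
ONE_YEAR = 365 * ONE_DAY      # seconds

# Assumed mapping of file extension to policy, following the bullets above
CACHE_POLICY = {
    ".html": "no-cache",                      # always revalidate
    ".css": f"public, max-age={ONE_YEAR}",    # fingerprinted, CDN may cache
    ".js": f"private, max-age={ONE_YEAR}",    # fingerprinted, browser only
    ".jpg": f"max-age={ONE_DAY}",             # no fingerprint, short lifetime
}


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value from the file extension."""
    for suffix, policy in CACHE_POLICY.items():
        if path.endswith(suffix):
            return policy
    return "no-cache"   # assumption: revalidate anything not listed
```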


Author: SGAMER-rain
Link: https://juejin.im/post/5ae081aaf265da0b767d263a

