Web caching and HTTP status codes

1. Web cache:

1. The role of Web caching

  • Reduce network bandwidth consumption
    For website operators and users alike, bandwidth costs money, and excessive bandwidth consumption only benefits the network carriers. When a cached copy is used, only minimal network traffic is generated, which effectively reduces operating costs.

  • Reduce server pressure
    Once a validity period is set for a network resource, users can reuse the local cached copy repeatedly, which reduces requests to the origin server and indirectly relieves the pressure on it.

  • Reduce network latency and speed up page loading
    Bandwidth cost matters to most website operators, but a large Internet company may not care about it. Is web caching still useful in that case? The answer is yes: for users, a cache significantly speeds up page loading and gives a better experience.

2. Types of web caches

  • Database data cache

    The general practice is to add a cache server, such as Redis, between the database and the business server. Data read from the database is stored on the cache server; if the same request is made again, the data is returned directly from the cache without querying the table again. This cuts the time spent on database lookups and improves efficiency.
  • Proxy server cache

    • Proxy server cache
      A proxy server on the Web is an intermediate server between the browser and the origin server.
      A complete proxy request works as follows: the client first establishes a connection with the proxy server and then, according to the proxy protocol the proxy server uses, asks it to connect to the origin server or to fetch a specified resource from the origin server. Proxy server caching means that the proxy keeps a copy of the resources it has fetched from the origin server. If the resource the client wants exists in the proxy server's cache and is still valid, the proxy server does not send a request to the origin server but returns the resource directly to the client.
    • CDN cache
      A CDN is often understood as a reverse proxy server, but "Large Website Architecture" explains that a CDN server and a reverse proxy server are different. A CDN server is deployed at the network service provider nearest to the user, so that the user can get the requested content quickly; this is the proximity principle that is often mentioned. Generally speaking, CDN servers are suited to storing static files and hot content. A reverse proxy server, by contrast, is deployed at the front of the website's own data center: when a request arrives at the data center, the reverse proxy server is the first server it reaches.
  • Browser cache

The browser can cache the information that the server sends to it. How does the browser decide whether to cache a response, and on what basis? When does a request go to the proxy server, and when does it go to the origin server?

1. HTTP caching
These questions are answered by the HTTP caching mechanism, which involves not only browsers but also proxy cache servers. HTTP caching can be roughly divided into a cache storage strategy, a cache expiration strategy, and a cache comparison strategy.

  • Cache storage strategy: determines whether an HTTP response may be stored by a client, and by which clients.
    For this purpose HTTP provides the Cache-Control header field, which determines whether a response is cached and whether a later request must go to the origin server.
    Cache-Control has multiple values, such as public, private, no-store, no-cache, no-transform, must-revalidate, proxy-revalidate, and max-age. What do these values mean?

    • max-age: the time, in seconds, for which the response may be cached. Once that time has elapsed, or if it is zero, the proxy server usually forwards the request to the origin server, and the cache comparison strategy comes into play.
    • public: the response may be stored by any cache, including shared caches that serve other users.
    • private: the response may be stored only by the cache of one specific user (for example, the browser), not by a shared cache.
    • no-cache: the response may be stored, but it must be revalidated with the origin server before each reuse. In effect this is close to max-age=0, which can in turn be thought of as Cache-Control: public/private combined with Expires set to the current time.
    • no-store: implies that the response is confidential; neither the request nor the response may be stored locally.
    • no-transform: intermediaries such as proxies must not transform the content of the message (for example, by recompressing images).
    • must-revalidate: once the cached response has expired, a cache must revalidate it with the origin server before using it again.
    • proxy-revalidate: the same as must-revalidate, but it applies only to shared caches such as proxy servers.
  • Cache expiration strategy: used to determine whether the locally stored data expires before deciding whether to request the server to obtain the data.
    This strategy mainly uses the Expires header field mentioned above. Expires gives the time until which the cached data is valid and tells the client that the local copy becomes invalid after that point in time (compared against the client's clock). Before that point in time, the client cache is still valid and can be used to load and display the resource.
    Note, however, that the max-age setting in Cache-Control takes precedence over Expires.
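The storage and expiration rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any particular browser or proxy implements it: the function names parse_cache_control and freshness_lifetime are made up for this example, the headers are passed as a plain dict, and no-store and no-cache are both simplified to "zero seconds of freshness".

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (public, no-cache, ...) map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def freshness_lifetime(headers, response_time):
    """Return how many seconds the response stays fresh, or None if unspecified.

    headers is a plain dict (hypothetical shape for this sketch);
    response_time is a timezone-aware datetime.
    """
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if "no-store" in cc or "no-cache" in cc:
        return 0  # simplification: never reuse without asking the origin server
    if cc.get("max-age") is not None:
        return max(0, int(cc["max-age"]))  # max-age takes precedence over Expires
    if "Expires" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        return max(0, int((expires - response_time).total_seconds()))
    return None


received = datetime(2024, 1, 1, tzinfo=timezone.utc)
headers = {
    "Cache-Control": "public, max-age=60",
    "Expires": "Mon, 01 Jan 2024 01:00:00 GMT",
}
print(freshness_lifetime(headers, received))  # 60, not 3600
```

Both headers are present in the usage example, and the result follows max-age rather than the one-hour Expires value.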

  • Cache comparison strategy: the client sends the identifier of its cached data to the server, and the server uses the identifier to determine whether the cached data is still valid and whether the data needs to be sent again.
    The client typically sends a new HTTP request to the server after detecting that the data has expired or when the browser is refreshed. The server does not return the data immediately; it first checks whether the request headers carry a conditional request identifier, compares it with the resource's Last-Modified and ETag values to determine whether the cached copy is still valid, and then decides how to respond.

    Conditional request fields, judged against Last-Modified and ETag
    The cache-related conditional request header fields are:

    1. If-Match

    The client sends a conditional request. On receiving it, the server compares the supplied value with the resource's ETag. If they match, the request is processed; otherwise the status code 412 Precondition Failed is returned.

    2. If-Modified-Since

    The client tells the server to handle the request only if the resource was updated after the time given in the If-Modified-Since field. If the resource has not been updated since then, the server returns the status code 304 Not Modified.

    3. If-None-Match

    The effect of this condition is the opposite of If-Match. If the ETag value does not match, the request is processed; if it matches, the status code 304 Not Modified is returned.

    4. If-Range

    If the ETag value (or the date) matches, the part of the resource the client already holds is still current, and the server is expected to process the request as a range request. If it does not match, the range is ignored and the entire file is returned.

    5. If-Unmodified-Since

    The effect of this condition is the opposite of If-Modified-Since. If the resource has not been updated, the request is processed; otherwise the status code 412 Precondition Failed is returned.
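As a rough sketch of how a server might evaluate these fields for a GET request, the function below returns the status code that the descriptions above imply. The name evaluate_conditions and the dict-based headers are assumptions made for this example; entity tags are compared as exact strings (weak validators are ignored), and If-Range is left out because it only affects range requests.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def split_tags(value):
    """Split a comma-separated list of entity tags."""
    return [tag.strip() for tag in value.split(",")]


def evaluate_conditions(headers, etag, last_modified):
    """Return 200, 304 or 412 for a GET carrying conditional headers.

    etag is the resource's current ETag (quotes included);
    last_modified is a timezone-aware datetime.
    """
    # Preconditions that must hold, otherwise 412 Precondition Failed.
    if "If-Match" in headers:
        wanted = split_tags(headers["If-Match"])
        if "*" not in wanted and etag not in wanted:
            return 412
    elif "If-Unmodified-Since" in headers:
        if last_modified > parsedate_to_datetime(headers["If-Unmodified-Since"]):
            return 412

    # Cache validators: a match means the client's copy is still good.
    if "If-None-Match" in headers:
        known = split_tags(headers["If-None-Match"])
        if "*" in known or etag in known:
            return 304
    elif "If-Modified-Since" in headers:
        if last_modified <= parsedate_to_datetime(headers["If-Modified-Since"]):
            return 304
    return 200


last_modified = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(evaluate_conditions({"If-None-Match": '"v1"'}, '"v1"', last_modified))  # 304
print(evaluate_conditions({"If-Match": '"v0"'}, '"v1"', last_modified))       # 412
```

When both an ETag-based and a date-based field are present, this sketch lets the ETag-based one win, which is the order the HTTP specification gives for evaluating preconditions.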

    2. Cookie
    HTTP cookies were originally used to store session information on the client. The standard requires the server to send a Set-Cookie HTTP header, containing the session information, as part of the response to an HTTP request. The storage space for a cookie is relatively small, at most about 4 KB, so what is generally stored is a session ID or similar content. The cookie that the client sends back in later HTTP requests lets the server verify the client, and lets the website identify the user for session tracking and store data locally (usually encrypted).

    • Restrictions
      • Size limit: most browsers limit a cookie to about 4096 B (plus or minus 1), but for cross-browser compatibility it is best to stay within 4095 B. If the maximum size is exceeded, the cookie is silently discarded.
      • Count limit per domain: Firefox allows up to 50 cookies and Opera up to 30; Safari and Chrome have no hard limit.
    • Components
      • Name: a unique name that identifies the cookie. Although names are often described as case-insensitive, it is best to treat them as case-sensitive, because some server software does. The name must be URL-encoded.
      • Value: the string value stored in the cookie, which must be URL-encoded.
      • Domain: the domain for which the cookie is valid. If it is not set explicitly, it defaults to the domain that set the cookie.
      • Path: the path within the specified domain for which the cookie should be sent to the server. Requests for other paths do not carry the cookie.
      • Expiration time: a timestamp indicating when the cookie should be deleted. By default, all cookies are deleted when the browser session ends, but you can set an explicit expiration time. A cookie that has not yet expired stays on the user's machine even after the browser is closed.
      • Secure flag: when specified, the cookie is sent to the server only over an SSL connection.

    Note, however, that important or sensitive data must not be stored in a cookie. Cookie data is not kept in a secure environment, and the data can be accessed by others, which easily leads to Web security problems. Do not store data such as credit card numbers or personal addresses in a cookie.
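To make the cookie components concrete, here is a small sketch that builds a Set-Cookie header value with Python's standard http.cookies module. The helper name build_session_cookie, the domain example.com and the one-hour lifetime are placeholders chosen for illustration; the 4095-byte check mirrors the conservative size limit mentioned above and is not something the module enforces itself.

```python
from http.cookies import SimpleCookie

MAX_COOKIE_BYTES = 4095  # conservative cross-browser limit from the text above


def build_session_cookie(session_id):
    """Return a Set-Cookie header value that carries only a session ID."""
    name = "sessionID"
    if len(f"{name}={session_id}".encode("utf-8")) > MAX_COOKIE_BYTES:
        raise ValueError("cookie too large; browsers would silently drop it")

    cookie = SimpleCookie()
    cookie[name] = session_id
    morsel = cookie[name]
    morsel["domain"] = "example.com"  # hypothetical domain
    morsel["path"] = "/"              # send the cookie for every path
    morsel["max-age"] = 3600          # keep for one hour instead of the session
    morsel["secure"] = True           # only send over an encrypted connection
    return morsel.OutputString()


print("Set-Cookie:", build_session_cookie("abc123"))
```

Only an opaque session ID goes into the cookie; anything sensitive stays on the server, in line with the warning above.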

    3. localStorage

    localStorage is the local storage scheme in HTML5 that replaces globalStorage. It is generally used to store data returned by Ajax requests so that the page renders faster the next time it is opened. Note that a page can access a given localStorage area only if it comes from the same domain name (a subdomain does not count), uses the same protocol, and is on the same port. The maximum storage capacity of localStorage is typically about 5 MB. The localStorage property gives access to a local Storage object. localStorage is similar to sessionStorage; the difference is that data stored in localStorage has no expiration time, while data stored in sessionStorage is cleared when the page session ends, that is, when the tab or browser is closed.

    4. sessionStorage
    Data stored in sessionStorage is cleared at the end of the page session, and the maximum storage capacity is typically about 5 MB. A page session lasts as long as the browser is open and survives page reloads and restores. Opening a page in a new tab or window starts a new session, which differs from how session cookies work. Like localStorage, sessionStorage data is scoped to the origin of the page, including its protocol.
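The same-domain, same-protocol, same-port rule can be illustrated outside the browser with a short Python sketch. It only models the scoping rule and does not touch any real storage; the function names storage_origin and shares_storage and the example URLs are invented for this illustration. It also says nothing about the extra per-tab isolation of sessionStorage.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def storage_origin(url):
    """Return the (protocol, host, port) triple that scopes web storage."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def shares_storage(url_a, url_b):
    """True if two pages would see the same localStorage area."""
    return storage_origin(url_a) == storage_origin(url_b)


print(shares_storage("https://example.com/a", "https://example.com:443/b"))  # True
print(shares_storage("https://example.com/", "https://www.example.com/"))    # False
print(shares_storage("https://example.com/", "http://example.com/"))         # False
```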

2. Common status codes:

We don't need to memorize every status code; it is enough to distinguish the classes clearly.
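Since the class is determined by the first digit alone, telling the classes apart takes very little code. The sketch below uses a made-up helper, status_class, together with Python's standard http.HTTPStatus enum to look up the official reason phrase.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code):
    """Map a status code to its class using only the first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")


for code in (200, 206, 301, 404, 503):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```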

1XX: rarely used. As a class, a 1XX status code indicates that the request has been received and is being processed.

2XX: the request succeeded. The most common ones are:

  • 200 OK: the request from the client was processed normally by the server.
  • 204 No Content: the server processed the request successfully, but the response contains no entity body, and none is allowed, so the page displayed by the browser is not updated. (Generally used when the client only needs to send information to the server and no new content has to be sent back.)
  • 206 Partial Content: the client made a range request and the server processed it successfully; the response contains the requested part of the resource.

3XX: redirection. These responses are usually returned with the Location response header field, and almost all browsers, on receiving a response that contains Location, automatically try to access the redirected resource it points to. Common 3XX redirection status codes are:

  • 301 Moved Permanently: permanent redirect. The requested resource has been assigned a new URI.
  • 302 Found: temporary redirect. The requested resource has been temporarily assigned a new URI, and the client is expected to use the new URI for this request only. This differs from 301: if the requested URI has been saved as a bookmark, a 301 indicates that the bookmark should be updated to the value of Location, while a 302 does not; it only means that the URI has changed temporarily for this visit.
  • 303 See Other: the requested resource is available under another URI, and a GET request should be used to retrieve it. 303 is very similar to 302, but 303 states explicitly that the client should use the GET method for the redirected request. When a 301, 302, or 303 status code is returned, almost all browsers change a POST to a GET, delete the request body, and then resend the request automatically. The original standard for 301 and 302 forbade changing POST to GET, but in practice browsers do it anyway.
  • 304 Not Modified: the client sent a conditional request and the server allows access to the resource, but the request did not satisfy the attached condition; that is, the resource on the server has not changed and the client can use its cached copy directly. Although it is in the 3XX class, 304 has nothing to do with redirection. The conditional requests referred to here are those that carry If-Modified-Since or If-None-Match.
  • 307 Temporary Redirect: temporary redirect. This status code is very similar to 302, but with 307 the browser follows the standard and does not change the POST method to GET.
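The method-changing behavior described for 301, 302, 303 and 307 can be summarized in a small decision function. This is a sketch of the browser behavior as described above, not code from any browser; the name follow_redirect and the (method, keep_body) return shape are assumptions for this example.

```python
def follow_redirect(status, method):
    """Return (method, keep_body) for the request that follows a redirect."""
    method = method.upper()
    if status == 303 and method != "HEAD":
        return "GET", False   # 303 explicitly asks for a GET
    if status in (301, 302) and method == "POST":
        return "GET", False   # what browsers do in practice
    if status in (301, 302, 303, 307):
        return method, True   # method and body are kept
    raise ValueError(f"{status} is not a redirect handled by this sketch")


print(follow_redirect(302, "POST"))  # ('GET', False)
print(follow_redirect(307, "POST"))  # ('POST', True)
```

304 deliberately falls through to the error branch, since it is not a redirect even though it sits in the 3XX class.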

4XX: Client error, indicating that the client is the cause of the error.

  • 400 Bad Request: the request message contains a syntax error. The request content must be corrected before it is sent again. (The claim that browsers treat this status code like 200 OK is unclear and still needs to be checked.)
  • 401 Unauthorized: the request requires HTTP authentication; the response is usually returned with the WWW-Authenticate response header field. If the request already carried credentials, 401 means that user authentication failed. When the browser receives a 401 response for the first time, it pops up an authentication dialog.
  • 403 Forbidden: access to the requested resource was refused by the server. The server does not have to give a detailed reason for the refusal (for example, missing access authorization).
  • 404 Not Found: the server cannot find the requested resource.

5XX: Server error, an error occurred on the server itself

  • 500 Internal Server Error: an error occurred while the server was executing the request, possibly because of a bug or a temporary failure in the Web application.
  • 503 Service Unavailable: the server is temporarily overloaded or down for maintenance and cannot process the request now. If the server knows roughly how long the condition will last, it can add a Retry-After field to the response header to tell the client how long to wait before sending the request again.
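On the client side, a Retry-After value can be either a number of seconds or an HTTP date. The following sketch turns either form into a wait time in seconds; retry_delay is a hypothetical helper, and passing the current time in as a parameter is a choice made here so that the example is deterministic.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay(retry_after, now):
    """Return how many seconds to wait before retrying after a 503.

    retry_after is the raw header value; now is a timezone-aware datetime.
    """
    value = retry_after.strip()
    if value.isdigit():
        return int(value)                   # delay given in seconds
    when = parsedate_to_datetime(value)     # delay given as an HTTP date
    return max(0, int((when - now).total_seconds()))


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(retry_delay("120", now))                            # 120
print(retry_delay("Mon, 01 Jan 2024 00:05:00 GMT", now))  # 300
```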


Origin blog.csdn.net/vipshop_fin_dev/article/details/108835534