The advantages and disadvantages of caching

Reposted from: https://www.cnblogs.com/bethunebtj/p/9159914.html

1. Why do you need a cache?

When dealing with highly concurrent requests, why do we keep bringing up caching? The most direct reason is that disk and network IO are hundreds of times slower than memory IO.
Let's do a simple calculation. Suppose we need a piece of data: reading it from the database's disk takes 0.1s, and carrying it across the switch takes 0.05s, so each request takes at least 0.15s (of course, real disk and network IO are not this slow; this is just an example). Handled one at a time, the database server can then serve only about 6.7 requests per second. If the same data sits in the local machine's memory, reading it takes only 10µs, and we can serve 100,000 requests per second.
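A quick back-of-the-envelope check of those numbers in Python:

```python
disk_read   = 0.10    # seconds to read the row from the database disk
network     = 0.05    # seconds to carry it across the switch
memory_read = 10e-6   # 10 microseconds to read it from local memory

print(1 / (disk_read + network))   # ~6.7 requests/second, handled one at a time
print(1 / memory_read)             # 100,000 requests/second
```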

Keeping frequently used data closer to the CPU reduces data transfer time and therefore increases processing efficiency. That is the whole point of a cache.

2. Where are caches used?

Everywhere. For example:

  • When we read data from the hard drive, the operating system also reads nearby data into memory ahead of time
  • When the CPU reads data from memory, it also pulls extra adjacent data into its various levels of cache
  • Input and output channels use buffers to accumulate data and send or receive it in batches, rather than processing it byte by byte (illustrated below)
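The buffering item can be seen directly in, say, Python's file API, where a user-chosen buffer size batches small writes into large disk operations:

```python
# open() buffers writes by default; the buffer accumulates many small
# writes in memory and flushes them to disk in large blocks.
with open("log.txt", "w", buffering=64 * 1024) as f:
    for i in range(100_000):
        f.write(f"line {i}\n")   # nearly all of these land in the buffer, not on disk
```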

That is the hardware and operating-system level. At the software design level, caches also appear in many places:

  • Browsers cache page resources, so that repeat visits to a page avoid downloading data (such as large images) from the network again
  • Static web assets are deployed to a CDN ahead of time; that is also a cache
  • Databases cache query results, so the same query runs faster the second time than the first
  • In-memory databases (such as redis) keep their data in memory rather than on disk, which can be seen as one big cache: the entire database is the cache
  • Applications keep the results of recent computations in local memory; if the next request repeats an earlier one, they return the stored result directly and skip the computation (see the sketch after this list)
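As a minimal illustration of that last item, Python's standard library ships exactly this kind of in-process result cache; the one-second sleep below just stands in for a slow computation:

```python
from functools import lru_cache
import time

@lru_cache(maxsize=1024)
def expensive_report(customer_id: int) -> str:
    time.sleep(1)                      # stand-in for a slow computation
    return f"report for {customer_id}"

expensive_report(42)   # takes ~1 s: computed and stored
expensive_report(42)   # instant: served from the in-process cache
```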

3. The pitfalls of caching

Caches are useful, but they come with plenty of traps:

Cache penetration

Cache penetration means a request arrives for data that is not in the cache, so it goes to the database, and the result is then written back into the cache. There are two risks here. The first is that many requests for the same missing data arrive at once, and the business system forwards all of them to the database. The second is that someone maliciously constructs keys that logically cannot exist and sends a flood of requests for them; every one of those requests reaches the database, which may bring it down.

How do we handle this? For malicious access, one idea is to validate up front and filter out the malicious keys directly, so they never reach the database layer. A second idea is to cache empty results: when a query finds that some data does not exist, record the fact that it does not exist in the cache, so the number of database queries is effectively reduced (sketched below).
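A minimal sketch of the empty-result idea, assuming a redis cache; `query_db` and the key layout are placeholders, not anything from the original post:

```python
import redis

r = redis.Redis()
NULL_SENTINEL = b"__NULL__"   # marker meaning "this key does not exist"

def query_db(user_id: str):
    # Placeholder for the real database lookup; returns None on a miss.
    return None

def get_user(user_id: str):
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        # Cache hit; the sentinel means the database has no such row.
        return None if cached == NULL_SENTINEL else cached

    row = query_db(user_id)
    if row is None:
        # Cache the miss with a short TTL: bogus keys can no longer
        # hammer the database, yet data that appears later is
        # picked up within a minute.
        r.set(key, NULL_SENTINEL, ex=60)
        return None

    r.set(key, row, ex=3600)
    return row
```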

What about non-malicious access? That is best discussed together with cache breakdown.

Cache breakdown

The scenario above, where many requests for a single missing piece of data all land on the database, can actually be classified as cache breakdown: for hot data, at the instant it expires, every request for it is sent to the database to rebuild the cache, and the database is crushed.

How do we prevent this? One idea is a global lock: all requests for a given piece of data share one lock, only the request that acquires the lock is allowed to query the database, and the other threads must wait. But services today are distributed, and a process-local lock cannot make threads on other servers wait, so a global lock is needed, for example one implemented with redis's SETNX (sketched below).
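A minimal sketch of such a global lock using redis-py's `SET ... NX EX` form (the key names, TTLs, and retry policy are illustrative assumptions):

```python
import time
import redis

r = redis.Redis()

def get_with_lock(key: str, rebuild, lock_ttl: int = 10):
    lock_key = f"lock:{key}"
    for _ in range(200):                     # bounded wait, ~10 s total
        value = r.get(key)
        if value is not None:
            return value
        # SET key value NX EX ttl is the atomic SETNX pattern: it
        # succeeds only for the one caller that creates the lock key.
        if r.set(lock_key, "1", nx=True, ex=lock_ttl):
            try:
                value = rebuild()            # only this caller hits the database
                r.set(key, value, ex=3600)
                return value
            finally:
                r.delete(lock_key)
        time.sleep(0.05)                     # someone else is rebuilding; retry
    raise TimeoutError(f"gave up refreshing {key}")
```

Releasing the lock in a `finally` block, plus the lock's own TTL, keeps a failed rebuild from blocking every other caller indefinitely.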

Another idea is to proactively refresh data that is about to expire. There are many ways to do this, such as starting a thread that polls the hot data, or partitioning all the cached data and refreshing each partition on a schedule (a bare-bones version follows). This second idea is related to the cache avalanche we are about to discuss.
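A bare-bones polling refresher along those lines; the hot-key list and `rebuild` function are stand-ins, not part of the original post:

```python
import threading
import time
import redis

r = redis.Redis()
HOT_KEYS = ["ad:homepage", "ad:banner"]   # illustrative hot keys

def rebuild(key: str) -> str:
    # Placeholder for the real database lookup.
    return f"fresh value for {key}"

def refresher(interval: int = 30):
    # Rewrite each hot key well before its TTL expires,
    # so readers never observe a missing hot key.
    while True:
        for key in HOT_KEYS:
            r.set(key, rebuild(key), ex=3600)
        time.sleep(interval)

threading.Thread(target=refresher, daemon=True).start()
```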

Cache avalanche

Cache avalanche refers to, for example, giving all cached data the same expiration time; then at one fateful moment the entire cache expires at once, every request instantly hits the database, and the database collapses.

The solutions are either to divide and conquer, splitting the cache into smaller zones that expire at staggered intervals, or to add a random value to each key's expiration time so the keys do not all expire together, spreading the refresh load over time.
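The jitter idea fits in a few lines; the base TTL and jitter range below are chosen arbitrarily for illustration:

```python
import random
import redis

r = redis.Redis()

BASE_TTL = 3600   # one hour
JITTER   = 600    # up to ten extra minutes

def set_with_jitter(key: str, value) -> None:
    # Each key gets a slightly different lifetime, so a batch of keys
    # written together will not all expire at the same instant.
    r.set(key, value, ex=BASE_TTL + random.randint(0, JITTER))
```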

Cache refresh

Refreshing the cache has its own pit. For example, at a previous job of mine, during a big promotional event, at the height of the traffic, all of the ads suddenly went blank. Tracing the cause afterwards: all the ad creatives lived in a cache, and a dedicated program refreshed that cache, doing a full refresh of all current creatives each time.

The problem lay precisely in that word "full". Under the heavy traffic of the big event, the pressure on the ad system was enormous, and the service responsible for supplying the updated creatives collapsed under the load. When the refresh program made its request, it got back a null result. Our dutiful refresh program then faithfully applied that null, emptied the entire cache, and every creative went blank.
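A defensive version of that refresh step, sketched under the assumption that an empty upstream response should never be allowed to wipe the cache:

```python
def refresh_creatives(cache, fetch_all):
    creatives = fetch_all()    # may fail or come back empty under heavy load
    if not creatives:
        # Upstream returned nothing: keep serving the stale copy rather
        # than emptying the cache and blanking every ad on the site.
        return
    for key, value in creatives.items():
        cache.set(key, value, ex=3600)
```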

In short, if you want to build a highly concurrent system around caching, you must account for all the corner cases and design carefully; any small oversight can crash the whole system.
