Cache avalanche problem

  1. What is a cache avalanche

      The two scenarios below make it clear what a cache avalanche is:

      1. Normally the Cache layer carries a large share of the requests and effectively shields the Storage layer (which is generally assumed to be less able to withstand pressure), so the call volume that actually reaches storage stays very low and everything runs comfortably.

      2. However, if the Cache layer crashes for some reason (machine downtime, the cache service hanging or not responding), all requests reach the Storage layer, its call volume spikes sharply, and it may not be able to hold up and may even go down itself.
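      To make the mechanics concrete, here is a minimal cache-aside read sketch in Java; the helper names (cacheGet, cacheSet, loadFromStorage) are placeholders, not any particular client library. Under normal operation most reads are answered in step 1; when the cache layer is down, every request falls through to step 2, which is exactly the avalanche described above.

```java
import java.util.concurrent.TimeUnit;

public class CacheAsideReader {

    public String get(String key) {
        String value = null;
        try {
            value = cacheGet(key);               // 1. try the cache first
        } catch (Exception e) {
            // cache layer unavailable: this request will fall through to storage
        }
        if (value == null) {
            value = loadFromStorage(key);        // 2. miss (or cache down) -> hit storage
            try {
                cacheSet(key, value, 300, TimeUnit.SECONDS); // 3. repopulate with a TTL
            } catch (Exception ignored) {
                // best effort: do not fail the read just because the cache write failed
            }
        }
        return value;
    }

    // Placeholders for a real cache client (memcache/redis) and a real store (MySQL/HBase)
    private String cacheGet(String key) { /* ... */ return null; }
    private void cacheSet(String key, String value, long ttl, TimeUnit unit) { /* ... */ }
    private String loadFromStorage(String key) { /* ... */ return "value-for-" + key; }
}
```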

 

The avalanche problem is known abroad as the "stampeding herd" (or "thundering herd") problem: after the cache crashes, traffic slams into the backend like a stampede of bison.

 

      

    2. The hazards of cache avalanches

       

           The harm of an avalanche is obvious. Generally speaking, storage could not handle the full request volume on its own long ago, which is why a cache layer was added in the first place; an avalanche therefore puts storage under heavy pressure and may even bring it down.

 

    3. How to prevent cache avalanches

   

    1. Ensure high availability of the Cache service:

        Just as an aircraft has multiple engines, if the cache is also highly available, the failure of individual instances has limited impact (a master-slave switchover happens, or a small share of traffic goes to the backend), and operations can be automated. For example:

 

     Consistent hashing in memcache:
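     The original diagram is not reproduced here, but the idea can be sketched in a few lines of Java: a hash ring with virtual nodes, so that when one memcache node dies only the keys that mapped to it are redistributed, instead of almost every key missing at once. This is an illustrative sketch under assumed names, not the ketama implementation used by real memcached clients.

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashRing {
    private final SortedMap<Integer, String> ring = new TreeMap<>();
    private static final int VIRTUAL_NODES = 100;   // virtual nodes smooth the distribution

    public void addNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    public void removeNode(String node) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.remove(hash(node + "#" + i));
        }
    }

    public String nodeFor(String key) {
        if (ring.isEmpty()) return null;
        // walk clockwise to the first virtual node at or after the key's hash
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }

    private int hash(String s) {
        // FNV-1a style hash; a production ring would use ketama/MD5
        int h = 0x811c9dc5;
        for (int i = 0; i < s.length(); i++) {
            h ^= s.charAt(i);
            h *= 0x01000193;
        }
        return h & 0x7fffffff;  // keep it non-negative for the TreeMap ordering
    }
}
```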

     

     Redis's Sentinel and Cluster mechanisms:
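     As a concrete illustration, here is a minimal sketch of reading and writing Redis through Sentinel with the Jedis client; the master name "mymaster" and the sentinel addresses are placeholders from a hypothetical sentinel.conf. When the master fails, Sentinel promotes a replica and the pool reconnects to the new master, so individual instance failures do not take the cache layer down.

```java
import java.util.HashSet;
import java.util.Set;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisSentinelPool;

public class SentinelExample {
    public static void main(String[] args) {
        Set<String> sentinels = new HashSet<>();
        sentinels.add("10.0.0.1:26379");   // placeholder sentinel addresses
        sentinels.add("10.0.0.2:26379");
        sentinels.add("10.0.0.3:26379");

        // "mymaster" is the monitored master name configured in sentinel.conf
        try (JedisSentinelPool pool = new JedisSentinelPool("mymaster", sentinels);
             Jedis jedis = pool.getResource()) {
            jedis.setex("hot:item:1", 300, "some-cached-value"); // write with a TTL
            System.out.println(jedis.get("hot:item:1"));         // read back
        }
    }
}
```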

     

     

    

    

   High-availability solutions for memcache and redis will be covered in later articles.

 

  2. Use dependency isolation components to rate-limit and degrade calls to the backend:

      In fact, any dependency can fail, whether it is the cache, MySQL, HBase, or someone else's API; we can treat all of them as resources. In a system with heavy concurrency, if one resource becomes inaccessible, then even with a timeout configured, threads can still pile up waiting on it, making other resources and interfaces inaccessible as well.

      I am sure everyone has encountered such pages; they are most likely the result of Taobao's degradation strategy.

       
       

       Degradation is completely normal in a high-concurrency system: for example, much of a recommendation service is personalized. If the personalized part cannot be served, it can be degraded and backfilled with hot (popular) data, so that the front-end page is not left as a big blank area.

       In real projects we isolate important resources such as HBase, Elasticsearch, ZooKeeper, Redis, and other people's APIs (HTTP or RPC), so that each resource runs in its own thread pool; even if one resource runs into trouble, the other services are unaffected.

       However, managing the thread pools, for example how to close a resource pool, reopen it, and tune its thresholds, is quite troublesome. Fortunately, Netflix provides a very powerful tool, Hystrix, which can isolate all kinds of resources in their own thread pools; a sketch follows below.

        For a detailed introduction to hystrix, please refer to: http://hot66hot.iteye.com/blog/2155036

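        As a rough illustration (not taken from the reference above), here is what a Hystrix command with thread-pool isolation and a fallback might look like; the group key, pool key, timeout value, and the recommendation-service calls are all assumed for the example. The wrapped call runs in its own pool, and getFallback() returns hot, non-personalized data, which is the degradation strategy described earlier.

```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixCommandProperties;
import com.netflix.hystrix.HystrixThreadPoolKey;

public class RecommendCommand extends HystrixCommand<String> {

    private final String userId;

    public RecommendCommand(String userId) {
        super(Setter
                .withGroupKey(HystrixCommandGroupKey.Factory.asKey("RecommendGroup"))
                .andThreadPoolKey(HystrixThreadPoolKey.Factory.asKey("RecommendPool"))
                .andCommandPropertiesDefaults(HystrixCommandProperties.Setter()
                        .withExecutionTimeoutInMilliseconds(200))); // fail fast instead of hanging threads
        this.userId = userId;
    }

    @Override
    protected String run() throws Exception {
        // personalized recommendation call (placeholder for an RPC/HTTP dependency)
        return callPersonalizedRecommendService(userId);
    }

    @Override
    protected String getFallback() {
        // degrade to hot (non-personalized) data so the page is not left blank
        return loadHotRecommendations();
    }

    private String callPersonalizedRecommendService(String userId) { /* ... */ return "personalized"; }
    private String loadHotRecommendations() { return "hot-items"; }
}
```

        A caller would invoke new RecommendCommand(userId).execute(); when the dependency times out or its thread pool is saturated, Hystrix returns the fallback instead of tying up the caller's threads.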

       

3. Rehearse ahead of time:

   Before the project goes live, run a drill: observe the load on the overall system and on storage after the cache crashes, and prepare a contingency plan in advance.
