Avalanche Effect in Distributed Applications

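Understanding the Avalanche Effect

Service C depends on service B, and service B depends on service A. When service A hangs, every request that service B sends to it blocks until it times out. B's threads and connections are gradually exhausted, and B stops responding too. Service C, which calls B, then fails in the same way. A single failing service thus cascades up the call chain like an avalanche.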

Causes of the Avalanche Effect

  • The service provider becomes unavailable
  • Retries amplify traffic
  • The service caller becomes unavailable

Causes of Service Provider Unavailability

  • Hardware failure: a server goes down or the network fails.
  • Program bugs.
  • Cache breakdown: this usually happens when the cache service restarts and all cached data is lost, or when a large number of cache entries expire within a short period. The resulting flood of cache misses sends requests straight to the backend and overloads the service provider until it becomes unavailable.
  • Heavy user traffic: a flash sale or major promotion that starts without sufficient preparation produces a burst of user requests, which can likewise make the service provider unavailable.

How Retries Increase Traffic

  • User retries: once the service provider is unavailable, users who lose patience with the long wait keep refreshing the page or even resubmit forms.
  • Code-level retries: the program itself retries failed calls, so every failure generates additional requests to a provider that is already struggling.
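
To see why code-level retries are dangerous, the following minimal Python sketch nests three layers of callers in front of a provider that always fails. The sketch is an illustration, not code from the original article, and all names in it are hypothetical. Each layer makes up to 3 attempts before giving up. Because each layer multiplies the attempts of the layer above it, a single user request becomes 3 × 3 × 3 = 27 calls to the provider.

```python
from typing import Callable


class ProviderDown(Exception):
    """Raised by the hypothetical, permanently failing service provider."""


def make_retrying_caller(downstream: Callable[[], str], max_attempts: int) -> Callable[[], str]:
    """Wrap `downstream` so that each call makes up to `max_attempts` attempts."""
    def call() -> str:
        for attempt in range(1, max_attempts + 1):
            try:
                return downstream()
            except ProviderDown:
                if attempt == max_attempts:
                    raise  # out of attempts: propagate the failure upward
        raise ValueError("max_attempts must be at least 1")
    return call


def count_provider_calls(layers: int, max_attempts: int) -> int:
    """Count how often a failing provider is hit by ONE user request."""
    calls = 0

    def provider() -> str:
        nonlocal calls
        calls += 1
        raise ProviderDown("service A is unavailable")

    caller: Callable[[], str] = provider
    for _ in range(layers):  # each service in the chain adds its own retries
        caller = make_retrying_caller(caller, max_attempts)

    try:
        caller()
    except ProviderDown:
        pass  # the user request still fails after all the retries
    return calls


if __name__ == "__main__":
    print(count_provider_calls(layers=3, max_attempts=3))  # 27
```

The multiplication is why retry counts should be kept small and, ideally, applied at only one layer of the call chain.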

Main Cause of Service Caller Unavailability

Synchronous waiting exhausts resources:

When the service caller uses synchronous calls, every call to a slow or hung provider holds a thread while it waits. Waiting threads pile up and tie up system resources. Once the thread resources are exhausted, the caller can no longer serve its own requests and becomes unavailable as well, so the avalanche effect spreads one level further.
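
The following Python sketch models service B as a thread pool with only two worker threads. It illustrates the mechanism and is not the article's code; the pool size, the 0.2-second wait, and all function names are arbitrary assumptions. Two requests make synchronous calls to a hung service A and block. A third request needs no remote call at all, yet it still cannot run, because no worker thread is free.

```python
import concurrent.futures
import threading
from typing import Tuple


def simulate_thread_exhaustion(pool_size: int = 2, wait_s: float = 0.2) -> Tuple[bool, str]:
    """Show that requests blocked on a hung dependency starve unrelated requests.

    Returns (third_request_timed_out, third_request_result_after_recovery).
    """
    service_a_recovered = threading.Event()

    def call_service_a() -> str:
        # Synchronous call: the worker thread blocks until service A responds.
        service_a_recovered.wait()
        return "response from A"

    def health_check() -> str:
        # Makes no remote call, but still needs a free worker thread.
        return "ok"

    service_b = concurrent.futures.ThreadPoolExecutor(max_workers=pool_size)
    try:
        # Every worker thread ends up waiting on the hung service A.
        for _ in range(pool_size):
            service_b.submit(call_service_a)
        pending = service_b.submit(health_check)
        try:
            pending.result(timeout=wait_s)
            timed_out = False
        except concurrent.futures.TimeoutError:
            timed_out = True  # no worker was free to run the health check

        # Once service A recovers, the blocked threads are released.
        service_a_recovered.set()
        return timed_out, pending.result(timeout=5)
    finally:
        service_b.shutdown(wait=True)
```

Calling `simulate_thread_exhaustion()` returns `(True, "ok")`. The trivial request times out while A is hung and succeeds only after A recovers.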

Solutions

  • Timeout mechanism: set a timeout on every remote call so that a hung provider cannot block the caller indefinitely.
  • Rate limiting: cap the traffic a service accepts.
  • Prevent cache breakdown.
  • Circuit breaking: when service B detects that service A's interface is unstable (for example, 5 out of 10 calls fail), it cuts off access to service A.
  • Service degradation: service B prepares a fallback mechanism, such as an alternate interface, to use while service A is unavailable.
  • Replace synchronous waiting with asynchronous calls: this prevents threads from being held for long periods and exhausted.
  • Automatic scaling: cloud servers are the best fit here (for example, AWS Auto Scaling or Azure). They provide plenty of redundancy and can scale out automatically.
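
Below is a minimal Python sketch of the first item, the timeout mechanism, combined with a fallback value in the spirit of service degradation. The `call_with_timeout` helper, the pool size, the 0.2-second limit, and the fallback string are illustrative assumptions. A Python thread cannot be forcibly killed, so a timed-out call keeps running in the background; the timeout only frees the caller from waiting on it.

```python
import concurrent.futures
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Shared pool for outbound calls; its size is an arbitrary choice for this sketch.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(fn: Callable[[], T], timeout_s: float, fallback: T) -> T:
    """Run fn(), but stop waiting after timeout_s seconds and return fallback instead."""
    future = _pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        future.cancel()  # only has an effect if fn() has not started yet
        return fallback


def slow_service_a() -> str:
    time.sleep(0.5)  # simulates a hung or overloaded provider
    return "fresh data"


def fast_service_a() -> str:
    return "fresh data"


if __name__ == "__main__":
    print(call_with_timeout(fast_service_a, timeout_s=0.2, fallback="cached data"))  # fresh data
    print(call_with_timeout(slow_service_a, timeout_s=0.2, fallback="cached data"))  # cached data
```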

Approaches to Rate Limiting

  • In-program rate limiting: use a semaphore, or a thread pool combined with a bounded queue.
  • Gateway rate limiting: because Nginx performs so well, many leading Internet companies control traffic with Nginx + Lua gateways, and OpenResty is becoming increasingly popular.
  • Rate limiting through user interaction: (1) show a loading animation to make users more tolerant of the wait; (2) enforce a mandatory waiting period on the submit button.
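
The semaphore approach can be sketched in Python as a concurrency limiter. It rejects excess requests immediately instead of letting them queue up and hold threads. `ConcurrencyLimiter` and `RejectedError` are hypothetical names, and a real service would pick the limit from load testing.

```python
import threading
from typing import Callable, TypeVar

T = TypeVar("T")


class RejectedError(Exception):
    """Raised when the limiter has no permits left."""


class ConcurrencyLimiter:
    """Allow at most `max_concurrent` calls at once; reject the rest without blocking."""

    def __init__(self, max_concurrent: int) -> None:
        self._permits = threading.BoundedSemaphore(max_concurrent)

    def run(self, fn: Callable[[], T]) -> T:
        # blocking=False: fail fast instead of tying up a thread while waiting.
        if not self._permits.acquire(blocking=False):
            raise RejectedError("too many concurrent requests")
        try:
            return fn()
        finally:
            self._permits.release()  # always return the permit, even on error
```

Rejected callers can return an error page or a degraded response right away. The thread pool + queue variant achieves the same effect by refusing new tasks once its bounded queue is full.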

Preventing Cache Breakdown

  • Multiple threads arrive to check the cache, but the data is not currently cached, so they all miss.
  • Use setNx() to build a mutex so that only one thread can acquire the lock.
  • The thread that holds the lock queries the DB and writes the result back into the cache.
  • Threads that fail to acquire the lock wait for a while (for example, 5 seconds) and then repeat the cache lookup from the first step.
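
The steps above can be sketched in Python as follows. To stay self-contained, the sketch replaces Redis with a hypothetical in-memory `FakeRedis`. Its `set_nx` mimics Redis `SET key value NX EX ttl`: it sets the key only if it is absent, with an expiry so that a crashed lock holder cannot block everyone forever. `load_from_db`, the key names, and the timings are also assumptions, and the article's 5-second wait is shortened to 50 ms. After acquiring the lock, the sketch checks the cache a second time. This stops a thread that missed just before another thread refilled the cache from querying the DB again.

```python
import threading
import time
from typing import Callable, Dict, Optional, Tuple


class FakeRedis:
    """Hypothetical in-memory stand-in for Redis, just enough for this sketch."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)
        self._guard = threading.Lock()

    def get(self, key: str) -> Optional[str]:
        with self._guard:
            entry = self._data.get(key)
            if entry is None or entry[1] < time.monotonic():
                return None
            return entry[0]

    def set(self, key: str, value: str, ttl_s: float = 3600.0) -> None:
        with self._guard:
            self._data[key] = (value, time.monotonic() + ttl_s)

    def set_nx(self, key: str, value: str, ttl_s: float) -> bool:
        """Set key only if absent or expired; True on success (like SET NX EX)."""
        with self._guard:
            entry = self._data.get(key)
            if entry is not None and entry[1] >= time.monotonic():
                return False
            self._data[key] = (value, time.monotonic() + ttl_s)
            return True

    def delete(self, key: str) -> None:
        with self._guard:
            self._data.pop(key, None)


def get_with_mutex(cache: FakeRedis, key: str, load_from_db: Callable[[str], str],
                   lock_ttl_s: float = 10.0, retry_wait_s: float = 0.05) -> str:
    lock_key = f"lock:{key}"
    while True:
        value = cache.get(key)  # step 1: check the cache
        if value is not None:
            return value
        if cache.set_nx(lock_key, "1", lock_ttl_s):  # step 2: only one thread wins
            try:
                value = cache.get(key)  # double-check: someone may have refilled it
                if value is None:
                    value = load_from_db(key)  # step 3: query the DB ...
                    cache.set(key, value)      # ... and write the result back
                return value
            finally:
                # A production version would store a unique token and delete only its own lock.
                cache.delete(lock_key)
        time.sleep(retry_wait_s)  # step 4: lost the race, wait and start over
```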

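Methods for Circuit Breaking and Degradation

Hystrix can implement rate limiting, circuit breaking, and degradation.

Rate limiting:

Configure the properties (for example, execution.isolation.semaphore.maxConcurrentRequests, or the thread pool's coreSize when thread isolation is used).

Extend HystrixCommand.

Override the run() method and issue the service request to be rate-limited inside it.

Circuit breaking:

Configure the properties (for example, circuitBreaker.requestVolumeThreshold and circuitBreaker.errorThresholdPercentage). Once the configured conditions are met, the circuit opens and calls no longer reach run().

Degradation:

Configure the properties.

Override the getFallback() method and call the degraded (fallback) logic inside it.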
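
Hystrix itself is a Java library, so the following is not its API. It is a minimal, single-threaded Python sketch of the same idea: a circuit breaker that counts failures of `run`, opens after a threshold, and then sends calls straight to `fallback`. After a cool-down it lets a trial call through (the "half-open" state). The class name, the threshold of consecutive failures, and the 5-second cool-down are illustrative assumptions. Hystrix instead trips on an error percentage over a rolling window.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Minimal closed -> open -> half-open circuit breaker (illustrative, not Hystrix)."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout_s:
            return "half-open"  # cool-down over: allow a trial call
        return "open"

    def call(self, run: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.state == "open":
            return fallback()  # short-circuit: do not touch the failing service
        try:
            result = run()
        except Exception:
            self._failures += 1
            if self.state == "half-open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # (re)open the circuit
            return fallback()  # degrade instead of propagating the error
        self._failures = 0
        self._opened_at = None  # a success closes the circuit again
        return result
```

In terms of the Hystrix steps above, `run` plays the role of run(), and `fallback` the role of getFallback(). `failure_threshold` and `reset_timeout_s` stand in for the circuit-breaker properties.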

Origin http://43.154.161.224:23101/article/api/json?id=326162265&siteId=291194637