0501 - Applying Hystrix Protection: the Timeout Mechanism and an Introduction to the Circuit Breaker Pattern

I. Overview

  The word "hystrix" means "porcupine". A porcupine is covered with quills and can protect itself from predators; it represents a defense mechanism, which matches the purpose of the framework itself. That is why the Netflix team named the framework Hystrix and used a cartoon porcupine as its logo.

1.1. The current problem

  Now let's assume that the service provider is very slow to respond, so the consumer's request to the provider will be forced to wait until the service returns. In a high load scenario, if nothing is done, this problem is likely to cause all threads processing user requests to be exhausted and unable to respond to further user requests.

1.2. The avalanche effect (cascading failure)

  In a microservice architecture there are usually multiple layers of service calls, and a large number of microservices communicate over the network to support the whole system. Microservices also depend heavily on one another. However, no service is 100% available and the network is often fragile, so some requests will inevitably fail. The failure of basic services then leads to cascading failures that eventually make the entire system unavailable; this phenomenon is known as the service avalanche effect. It describes a process in which the unavailability of service providers makes their consumers unavailable in turn, and the unavailability is gradually amplified.

  Suppose A is a service provider, B is a consumer of A, and C and D are consumers of B. The unavailability of A causes B to become unavailable, and when that unavailability snowballs to C and D, the avalanche effect has formed.

1.3. Solutions

1.3.1. Timeout mechanism

  When calling other services over the network, a timeout must always be set. Under normal circumstances a remote call returns within tens of milliseconds, but when the dependent service is unavailable, or the network is having problems, the response time can grow very long (tens of seconds). A remote call usually occupies a thread or process, and if the response is too slow that thread/process is not released. Threads and processes consume system resources, so if a large number of them accumulate without being released, the service's resources are exhausted and the upstream (higher-level) services become unavailable as well. Therefore every request must have a timeout.
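The idea above can be sketched with plain JDK classes: run the remote call on a worker thread and give up after a fixed deadline instead of blocking the caller indefinitely. This is a minimal illustration, not how Hystrix itself is implemented; the `callWithTimeout` helper and its parameters are invented here for demonstration, with the slow remote call simulated by `Thread.sleep`.

```java
import java.util.concurrent.*;

public class TimeoutDemo {
    // Run a simulated remote call that takes workMillis, but wait at most timeoutMillis.
    static String callWithTimeout(long workMillis, long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> future = pool.submit(() -> {
            Thread.sleep(workMillis); // stand-in for a slow network call
            return "response";
        });
        try {
            // The caller's thread is released after timeoutMillis no matter what.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so its thread is freed too
            return "fallback";
        } catch (Exception e) {
            return "error";
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithTimeout(5000, 1000)); // slow call -> fallback
        System.out.println(callWithTimeout(10, 1000));   // fast call -> response
    }
}
```

Note that cancelling the `Future` is what actually releases the worker thread; returning a fallback alone would still leave the blocked thread occupied.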

1.3.2. The circuit breaker pattern

  Imagine a home without a circuit breaker: when the current is overloaded (for example, by excessive power draw or a short circuit), the circuit never opens, the wiring heats up, and it may even burn out and start a fire. With a circuit breaker installed, an overload automatically cuts (trips) the circuit, protecting both the circuit and the home. Once the overload is resolved, the circuit works again as soon as the breaker is closed.

  In the same way, when a dependent service produces a large number of timeouts, letting new requests keep hitting it makes little sense and only wastes existing resources. For example, suppose we set the timeout to 1 second. If a large number of requests (say 50) fail to respond within 1 second over a short period, that usually indicates something is wrong. At that point there is no need to send more requests to this dependency; we should use a circuit breaker to avoid wasting resources.

  A circuit breaker enables fast failure. If it detects many similar errors (such as timeouts) within a period of time, it forces subsequent calls to fail fast and stops requesting the dependent service, preventing the application from repeatedly attempting operations that are likely to fail. The application can then continue running without waiting for the error to be fixed or wasting CPU time on long timeouts. The circuit breaker can also let the application check whether the error has been fixed; if so, the application tries invoking the operation again.

  The circuit breaker pattern acts like a proxy for an error-prone operation. The proxy records how many errors occurred in recent calls and then decides whether to allow the operation to proceed or to return an error immediately.

Considerations for implementing circuit breakers:

  1. Monitoring: track the total number of requests and how many of them fail; when the failure rate reaches a threshold (for example, 10%), the circuit breaker is opened.
  2. The state of the circuit breaker (closed, open, half-open).
  3. Shunting: routing requests away from the failing dependency (e.g. to a fallback) while the breaker is open.
  4. Self-healing: switching the circuit breaker state back once the dependency recovers.
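The four considerations above can be sketched as a small state machine. This is a simplified illustration written for this article, not Hystrix's actual implementation (Hystrix uses rolling windows of request statistics); the class name, thresholds, and method names here are all invented for the example. It counts consecutive failures, trips to OPEN at a threshold, fails fast while open, and after a cooldown lets a single probe through (HALF_OPEN) to test whether the dependency has recovered.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal illustrative circuit breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED.
public class SimpleCircuitBreaker {
    public enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures that trip the breaker
    private final long openTimeoutMillis; // how long to stay open before probing
    private final AtomicInteger failures = new AtomicInteger();
    private volatile State state = State.CLOSED;
    private volatile long openedAt;

    public SimpleCircuitBreaker(int failureThreshold, long openTimeoutMillis) {
        this.failureThreshold = failureThreshold;
        this.openTimeoutMillis = openTimeoutMillis;
    }

    // The "shunt": decide whether a request may go through or must fail fast.
    public synchronized boolean allowRequest() {
        if (state == State.OPEN) {
            if (System.currentTimeMillis() - openedAt >= openTimeoutMillis) {
                state = State.HALF_OPEN; // cooldown elapsed: let one probe through
                return true;
            }
            return false; // fail fast, do not touch the dependency
        }
        return true;
    }

    // Self-healing: a successful probe (or any success) closes the breaker.
    public synchronized void recordSuccess() {
        failures.set(0);
        state = State.CLOSED;
    }

    public synchronized void recordFailure() {
        if (state == State.HALF_OPEN || failures.incrementAndGet() >= failureThreshold) {
            state = State.OPEN; // trip (or re-trip after a failed probe)
            openedAt = System.currentTimeMillis();
            failures.set(0);
        }
    }

    public State getState() { return state; }
}
```

A caller would wrap each remote invocation: check `allowRequest()` first, return a fallback if it is false, and report the outcome via `recordSuccess()`/`recordFailure()`. Hystrix additionally computes the error rate over a sliding time window rather than counting consecutive failures, but the state transitions are the same in spirit.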
