Hystrix, an anti-avalanche weapon

Hystrix is a fault-tolerance component. This article explains how to use it by covering its functions, fuse (circuit breaker) design, workflow, and application.


1. What is the catastrophic avalanche effect

To understand Hystrix, start with a scenario. In a microservice architecture, if an underlying service fails and cannot respond, or responds slowly, its callers wait longer and the performance of the whole system declines. If a large number of requests arrive at that point, the container's resources are exhausted and all services become paralyzed. This is the catastrophic avalanche effect. This article focuses on using Hystrix for service fusing (circuit breaking) to address the avalanche problem.

2. What is Hystrix

Hystrix is a latency and fault tolerance library open-sourced by Netflix. It isolates access to remote systems, services, and third-party libraries, prevents cascading failures, and improves system availability and fault tolerance.

3. The role of Hystrix

(1) Downgrade

When a service is overloaded or a call fails (for example, the program throws an exception, the call times out, the fuse is open, or the thread pool/semaphore is full), the call can be downgraded: the specified fallback data is returned instead, which keeps the user experience acceptable.
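The downgrade idea can be sketched in plain Java. This is only an illustration under stated assumptions, not Hystrix code: `FallbackSketch` and its `call` method are hypothetical names, and only failure and timeout are handled here.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper (not part of Hystrix): runs the primary call on a
// small pool and returns fallback data if it fails or takes too long.
class FallbackSketch {
    private final ExecutorService pool;

    FallbackSketch(int threads) {
        // Daemon threads so a stuck call never keeps the JVM alive.
        this.pool = Executors.newFixedThreadPool(threads, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
    }

    <T> T call(Callable<T> primary, Supplier<T> fallback, long timeoutMillis) {
        Future<T> future = pool.submit(primary);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            future.cancel(true);
            return fallback.get();
        } catch (ExecutionException | TimeoutException e) {
            future.cancel(true); // stop waiting on the failed or slow call
            return fallback.get();
        }
    }
}
```

A caller would wrap the remote call as `primary` and supply static or cached data as `fallback`, so the user always gets some response.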

(2) Fuse

When the request failure rate reaches the specified threshold, the service is downgraded automatically: for a specified period the caller does not access the provider and directly returns the fallback data, which avoids wasting resources on repeated attempts that are likely to fail. Hystrix supports both fast failure and fast recovery.

(3) Isolation

Isolation comes in two forms: thread pool isolation and semaphore isolation. With thread pool isolation, requests for different resources are assigned to separate thread pools; each pool creates the threads that call the service, and the pool size caps the number of threads. With semaphore isolation, the work runs on threads we create ourselves (the caller's threads), and a semaphore limits how many of them may execute the task at the same time.
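As a rough illustration of semaphore isolation, here is a minimal sketch using `java.util.concurrent.Semaphore`. The class `SemaphoreIsolationSketch` is a hypothetical name for this article, not Hystrix's implementation.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper (not Hystrix's implementation): the caller's own
// thread does the work, and a semaphore caps how many callers run at once.
class SemaphoreIsolationSketch {
    private final Semaphore permits;

    SemaphoreIsolationSketch(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    <T> T call(Supplier<T> primary, Supplier<T> fallback) {
        // Do not wait for a permit: when the limit is hit, reject at once.
        if (!permits.tryAcquire()) {
            return fallback.get();
        }
        try {
            return primary.get();
        } catch (RuntimeException e) {
            return fallback.get(); // a failed call is downgraded as well
        } finally {
            permits.release();
        }
    }
}
```

Because no extra thread is involved, this variant is cheap, but a slow call still occupies the caller's thread; thread pool isolation trades extra threads for the ability to time out and walk away.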

(4) Rate limiting

Rate limiting mainly means setting a maximum QPS threshold for each type of request in advance. A request that would exceed the threshold is returned immediately, and downstream resources are not called.
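The QPS threshold described above can be illustrated with a generic fixed-window counter. This sketch is not taken from Hystrix; `QpsLimiterSketch` and the injected clock are assumptions made so the example is easy to test.

```java
import java.util.function.LongSupplier;

// Hypothetical fixed-window limiter (a generic technique, not a Hystrix API):
// at most maxPerSecond calls are admitted in each one-second window.
class QpsLimiterSketch {
    private final int maxPerSecond;
    private final LongSupplier clockMillis; // injected so tests can control time
    private long windowStart;
    private int count;

    QpsLimiterSketch(int maxPerSecond, LongSupplier clockMillis) {
        this.maxPerSecond = maxPerSecond;
        this.clockMillis = clockMillis;
        this.windowStart = clockMillis.getAsLong();
    }

    synchronized boolean tryAcquire() {
        long now = clockMillis.getAsLong();
        if (now - windowStart >= 1000) {
            // A new one-second window begins: forget the old count.
            windowStart = now;
            count = 0;
        }
        if (count >= maxPerSecond) {
            return false; // over the threshold: reject without calling downstream
        }
        count++;
        return true;
    }
}
```

In production code the clock would typically be `System::currentTimeMillis`; a request for which `tryAcquire()` returns false is answered directly with fallback data.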

(5) Operation and maintenance monitoring

Hystrix can monitor operational metrics and configuration changes in near real-time to quickly identify problems.

4. Hystrix's fuse design

(1) Fuse decision algorithm: requests are counted in a lock-free circular queue. By default each fuse maintains 10 buckets, one per second, and each bucket records how many requests succeeded, failed, timed out, or were rejected. By default, the fuse trips and intercepts requests when, within 10 seconds, there are at least 20 requests and the error rate is at least 50%.

(2) Fuse recovery: while requests are being fused, some requests are allowed through every 5 s; if they are all healthy (RT < 250 ms), normal request handling is restored.

(3) Fuse alarm: requests that are fused are logged, and an alarm is raised when abnormal requests exceed the configured limits.
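The bucket counting in (1) can be sketched as a small circular array. This is a simplified illustration, not Hystrix's code: `RollingWindowSketch` is a hypothetical class, it uses `synchronized` instead of a lock-free structure, and it tracks only totals and errors rather than all four outcomes.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

// Hypothetical rolling window: bucketCount slots reused in a circle,
// each holding the counts for one time slice of bucketMillis.
class RollingWindowSketch {
    private final int bucketCount;
    private final long bucketMillis;
    private final LongSupplier clockMillis; // assumed to return non-negative values
    private final long[] bucketStart;       // start time of the slice held by each slot
    private final int[] total;
    private final int[] errors;

    RollingWindowSketch(int bucketCount, long bucketMillis, LongSupplier clockMillis) {
        this.bucketCount = bucketCount;
        this.bucketMillis = bucketMillis;
        this.clockMillis = clockMillis;
        this.bucketStart = new long[bucketCount];
        this.total = new int[bucketCount];
        this.errors = new int[bucketCount];
        Arrays.fill(bucketStart, Long.MIN_VALUE); // no slot holds data yet
    }

    synchronized void record(boolean success) {
        long now = clockMillis.getAsLong();
        long sliceStart = now - (now % bucketMillis);
        int slot = (int) ((now / bucketMillis) % bucketCount);
        if (bucketStart[slot] != sliceStart) {
            // The slot still holds an expired slice: recycle it.
            bucketStart[slot] = sliceStart;
            total[slot] = 0;
            errors[slot] = 0;
        }
        total[slot]++;
        if (!success) errors[slot]++;
    }

    synchronized int totalRequests() {
        return sumLive(total);
    }

    synchronized int errorPercentage() {
        int all = sumLive(total);
        return all == 0 ? 0 : sumLive(errors) * 100 / all;
    }

    // Trip only when both the volume and the error-rate conditions hold.
    synchronized boolean shouldTrip(int minRequests, int errorThresholdPercent) {
        return totalRequests() >= minRequests && errorPercentage() >= errorThresholdPercent;
    }

    // Sum only the slots whose slice still falls inside the rolling window.
    private int sumLive(int[] counts) {
        long oldest = clockMillis.getAsLong() - bucketCount * bucketMillis;
        int sum = 0;
        for (int i = 0; i < bucketCount; i++) {
            if (bucketStart[i] > oldest) sum += counts[i];
        }
        return sum;
    }
}
```

With the defaults described above this would be constructed as `new RollingWindowSketch(10, 1000, System::currentTimeMillis)` and queried with `shouldTrip(20, 50)`.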

5. Hystrix workflow

When an error occurs in a call, a time window is opened (10 seconds by default) and Hystrix checks whether the number of calls in it reaches the minimum number of requests:

      • If not, the circuit stays closed even if every request failed; the statistics are reset and a new time window is opened
      • If yes, calculate the percentage of failed requests among all requests and determine whether the threshold is reached
          • If it is reached, the circuit is short-circuited (opened) and a sleep window starts (5 seconds by default). Every 5 seconds, Hystrix lets one request through. If that call succeeds, the circuit breaker is reset and the process starts over; otherwise the circuit stays short-circuited
          • If not, reset the statistics and start over
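The workflow above can be sketched as a small state machine. This is a simplified model for illustration only, not Hystrix's implementation: `CircuitBreakerSketch` is a hypothetical class, the statistics window is a simple non-overlapping window, and the clock is injected so the behavior can be tested.

```java
import java.util.function.LongSupplier;

// Hypothetical circuit breaker: CLOSED -> OPEN when the window has enough
// requests and too many failures; OPEN -> HALF_OPEN after the sleep window;
// HALF_OPEN -> CLOSED or OPEN depending on the single trial request.
class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int requestVolumeThreshold;  // e.g. 20
    private final int errorThresholdPercent;   // e.g. 50
    private final long windowMillis;           // e.g. 10_000
    private final long sleepWindowMillis;      // e.g. 5_000
    private final LongSupplier clockMillis;

    private State state = State.CLOSED;
    private long windowStart;
    private long openedAt;
    private int total;
    private int failures;

    CircuitBreakerSketch(int requestVolumeThreshold, int errorThresholdPercent,
                         long windowMillis, long sleepWindowMillis, LongSupplier clockMillis) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercent = errorThresholdPercent;
        this.windowMillis = windowMillis;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clockMillis = clockMillis;
        this.windowStart = clockMillis.getAsLong();
    }

    synchronized State state() {
        return state;
    }

    // Ask before every call; false means "return the fallback immediately".
    synchronized boolean allowRequest() {
        long now = clockMillis.getAsLong();
        if (state == State.OPEN && now - openedAt >= sleepWindowMillis) {
            state = State.HALF_OPEN; // let exactly one trial request through
            return true;
        }
        return state == State.CLOSED;
    }

    // Report the outcome of every call that was allowed through.
    synchronized void record(boolean success) {
        long now = clockMillis.getAsLong();
        if (state == State.HALF_OPEN) {
            if (success) {
                state = State.CLOSED; // trial succeeded: reset and start over
                windowStart = now;
                total = 0;
                failures = 0;
            } else {
                state = State.OPEN;   // trial failed: sleep again
                openedAt = now;
            }
            return;
        }
        if (state == State.OPEN) {
            return; // late results from calls started before tripping are ignored
        }
        if (now - windowStart >= windowMillis) {
            windowStart = now;        // window expired: reset the statistics
            total = 0;
            failures = 0;
        }
        total++;
        if (!success) failures++;
        if (total >= requestVolumeThreshold
                && failures * 100 / total >= errorThresholdPercent) {
            state = State.OPEN;
            openedAt = now;
        }
    }
}
```

A caller checks `allowRequest()` first, runs the remote call only when it returns true, and then reports the result through `record(...)`.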

The flow chart is as follows:

6. Application of Hystrix

6.1 Service consumers

(1) Add pom dependencies

<dependency>
  <groupId>org.springframework.cloud</groupId>
  <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>

(2) Add @EnableHystrix annotation to the startup class

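import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.hystrix.EnableHystrix;

@EnableHystrix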
@SpringBootApplication
public class TestConsumerApplication {
    public static void main(String[] args) {
        SpringApplication.run(TestConsumerApplication.class, args);
    }

}

(3) Add @HystrixCommand

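import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixProperty;
import com.netflix.hystrix.contrib.javanica.conf.HystrixPropertiesManager;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
// @Reference comes from Dubbo; its package depends on the Dubbo version in use.

@RestController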
public class HelloController {
    @Reference(version="1.0.0")
    private HelloService helloService;

    @RequestMapping("/hello")
    @HystrixCommand(fallbackMethod = "helloFallback", commandProperties = {
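            // Command execution timeout in milliseconds
            @HystrixProperty(name = HystrixPropertiesManager.EXECUTION_ISOLATION_THREAD_TIMEOUT_IN_MILLISECONDS, value = "3000"),
            // Minimum number of requests in the statistics window before the circuit can open
            @HystrixProperty(name = HystrixPropertiesManager.CIRCUIT_BREAKER_REQUEST_VOLUME_THRESHOLD, value = "5"),
            // Time to wait after the circuit opens before a trial request, in milliseconds (6 s)
            @HystrixProperty(name = HystrixPropertiesManager.CIRCUIT_BREAKER_SLEEP_WINDOW_IN_MILLISECONDS, value = "6000"),
            // Error percentage at which the circuit opens
            @HystrixProperty(name = HystrixPropertiesManager.CIRCUIT_BREAKER_ERROR_THRESHOLD_PERCENTAGE, value = "45")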
    })
    public String hello() {
        return helloService.getHello();
    }

    public String helloFallback() {
        return "fallback,hello";
    }
}

Note: The method named by fallbackMethod must have the same input parameters and return type as the original method.

(4) Notes on the name values of @HystrixProperty

  • circuitBreaker.enabled: Whether the circuit breaker is enabled. It is enabled by default.
  • circuitBreaker.requestVolumeThreshold: The minimum number of requests within the statistics window before the circuit breaker can trip. The default is 20; with fewer requests in the window, the circuit stays closed even if all of them fail.
  • execution.isolation.thread.timeoutInMilliseconds: The execution timeout of a command; a call that runs longer is treated as a timeout and goes to the fallback. The default is 1000 ms. The length of the statistics window itself is set by metrics.rollingStats.timeInMilliseconds, which defaults to 10 s.
  • circuitBreaker.sleepWindowInMilliseconds: How long to wait after the circuit breaker trips before trying to recover. The default is 5 s; within these 5 s the fallback method is called directly and the remote service is not requested.
  • circuitBreaker.errorThresholdPercentage: If the percentage of failed requests within the statistics window reaches this value, the circuit breaker trips. The default is 50%.

6.2 Service Provider

(1) Add pom dependencies

<dependency>
  <groupId>org.springframework.cloud</groupId>
  <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>

(2) Add @EnableHystrix annotation to the startup class

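import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.hystrix.EnableHystrix;

@EnableHystrix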
@SpringBootApplication
public class TestApplication {

    public static void main(String[] args) {
        SpringApplication.run(TestApplication.class, args);
    }

}

(3) Add @HystrixCommand to the method

@Service(version = "1.0.0", interfaceClass = HelloService.class)
public class HelloServiceImpl implements HelloService{

    @HystrixCommand
    @Override
    public String getHello() {
        return "provider, hello";
    }
}

Origin blog.csdn.net/weixin_43805705/article/details/131134660