Spring Cloud service degradation and circuit breaking, explained in detail

1. Overview of Hystrix Circuit Breakers

1.1 Problems faced by distributed systems

An application in a complex distributed architecture has dozens of dependencies, and each of them will inevitably fail at some point. Such failures can trigger a service avalanche. So what is a service avalanche?

When multiple microservices call one another, suppose microservice A calls microservices B and C, and B and C in turn call other microservices. This is the so-called "fan-out" (like an opening folding fan). If one microservice on the fan-out link responds too slowly or becomes unavailable, calls to microservice A pile up and occupy more and more system resources, eventually causing the system to crash. This is the so-called "avalanche effect": the availability of the whole system is destroyed.

For a high-traffic application, a single failing back-end dependency can saturate all resources on all servers within seconds. Worse still, such failures can increase latency between services and put pressure on queues, threads, and other system resources, producing cascading failures across the entire system. All of this shows that failures and latency must be isolated and managed, so that the failure of a single dependency cannot take down the whole application or system.

Typically, when one instance of a module fails, the module keeps receiving traffic, and in turn keeps calling other modules, leading to a cascading failure, i.e. an avalanche. To deal with this problem, we need techniques such as service degradation and service circuit breaking.

1.2 What is Hystrix

Hystrix is an open-source library for handling latency and fault tolerance in distributed systems. In a distributed system, many dependency calls will inevitably fail: timeouts, exceptions, and so on. Hystrix guarantees that a problem in one dependency does not bring down the whole service, avoiding cascading failures and improving the resilience of the distributed system.

The "circuit breaker" itself is a kind of switching device. When a service unit fails, the circuit breaker's fault monitoring (similar to a physical blown fuse) returns an alternative response that meets expectations and can be processed to the caller. (FallBack), instead of waiting for a long time or throwing an exception that the caller cannot handle, this ensures that the service caller’s thread will not be occupied for a long time and unnecessary, thus avoiding the failure of the distributed system Spread, and even avalanche.

1.3 Hystrix features

Its main features include service degradation, service circuit breaking, near-real-time monitoring, flow limiting, and isolation; see its official documentation for details. Hystrix is now in maintenance mode and no longer actively developed. Although alternatives exist, Hystrix and its ideas are still well worth learning!

1.4 Important Concepts of Hystrix

Service degradation (FallBack): suppose microservice B, which microservice A needs to call, is unavailable. Service B should provide a fallback solution rather than letting service A wait indefinitely. Do not let the client wait; immediately return a friendly message, such as telling the client that the server is busy and to try again later. What triggers service degradation? For example: a runtime exception, a timeout, a service circuit break triggering degradation, or a full thread pool/semaphore.

Service circuit breaking (Break): a service circuit break is the equivalent of a blown physical fuse. When the maximum service access is reached, access is denied directly and the "power" is cut; the service degradation method is then called to return a friendly message.

Service flow limiting (Flow Limit): for flash sales, high concurrency, and similar operations, crowding in all at once is strictly forbidden; everyone queues up, N requests per second, proceeding in an orderly fashion.

2. Hystrix case

2.1 Service provider 8003 module

Build Module

cloud-provider-hystrix-payment8003

pom.xml

<!-- hystrix -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>

yml configuration file / main startup class

server:
  port: 8003

spring:
  application:
    name: cloud-provider-hystrix-payment

eureka:
  client:
    register-with-eureka: true
    fetch-registry: true
    service-url:
      # standalone mode
      defaultZone: http://localhost:7001/eureka # address of the service registry to join

@SpringBootApplication
@EnableEurekaClient
public class PaymentHystrixMain8003 {
    public static void main(String[] args) {
        SpringApplication.run(PaymentHystrixMain8003.class,args);
    }
}

Business class

  • service
@Service
public class PaymentService {
    /** A method that can be accessed normally */
    public String paymentInfo_OK(Integer id) {
        return "Thread pool: " + Thread.currentThread().getName()
                + " paymentInfo_OK, id: " + id + "\t" + "ha~";
    }

    /** A method that simulates a timeout */
    public String paymentInfo_FAIL(Integer id) {
        int timeNumber = 3;
        // Pause the thread for a few seconds; the code itself is fine, this only simulates a timeout
        try {
            TimeUnit.SECONDS.sleep(timeNumber);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return "Thread pool: " + Thread.currentThread().getName()
                + " paymentInfo_FAIL, id: " + id + "\t" + "ha~ "
                + "took " + timeNumber + "s";
    }
}
  • controller
@RestController
@Slf4j
@RequestMapping("/payment/hystrix")
public class PaymentController {
    @Resource
    private PaymentService paymentService;

    @Value("$server.port")
    private String serverPort;

    @GetMapping("ok/{id}")
    public String paymentInfo_OK(@PathVariable("id")Integer id) {
        String result = paymentService.paymentInfo_OK(id);
        log.info("===> result: " + result);
        return result;
    }

    @GetMapping("timeout/{id}")
    public String paymentInfo_TimeOut(@PathVariable("id")Integer id) {
        String result = paymentService.paymentInfo_FAIL(id);
        log.info("===> result: " + result);
        return result;
    }
}

In other words, the cloud-provider-hystrix-payment service provides two methods: paymentInfo_OK, which can be accessed quickly, and paymentInfo_TimeOut, where we simulate complex business logic by putting the thread to sleep so that the method takes 3 seconds to execute.

After starting the registry and the 8003 service, we access the service's paymentInfo_OK (hereafter OK) and paymentInfo_TimeOut (hereafter TO) respectively. We find that http://localhost:8003/payment/hystrix/ok/1 responds very quickly, while each visit to http://localhost:8003/payment/hystrix/timeout/1 takes about 3 seconds.

The above is the base platform. The demonstration proceeds: correct => error => degradation/circuit break => correct.

2.2 8003 High Concurrency Test

Service provider self-test stress test:

While the TO service with its 3-second business logic is being accessed, the OK service, which takes very little time, can still be accessed normally. But under high concurrency, that is, when the TO service receives a large number of requests, can OK still be accessed as smoothly? Next, we use JMeter to run a high-concurrency stress test, sending 20,000 requests to the TO service.

In JMeter, create a new thread group (Hystrix test) to simulate high-concurrency access to the TO service; the thread group is configured as follows:

Then use this thread group to send HTTP requests to the TO service, creating the following HTTP request for the stress test:
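If JMeter is not at hand, roughly the same load can be generated with a plain Java client. This is only a sketch, not part of the original test plan; it assumes Java 11+ (for java.net.http) and mirrors the 20,000-request setup above:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TimeoutLoadTest {
    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8003/payment/hystrix/timeout/1"))
                .build();
        // 200 concurrent workers firing 20,000 requests in total,
        // mirroring the JMeter thread group above
        ExecutorService pool = Executors.newFixedThreadPool(200);
        for (int i = 0; i < 20_000; i++) {
            pool.submit(() -> client.send(request, HttpResponse.BodyHandlers.ofString()));
        }
        pool.shutdown();
    }
}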

Observing the back-end console of the 8003 service, we can see a large number of requests hitting the TO service. And what happens if we access the OK service at this time? => As you can see, the OK service can no longer be accessed as quickly as before. Here we simulated only 20,000 requests (we did not dare to simulate much larger traffic, for fear of killing the system outright); in practice, the volume may be far higher, and with more traffic the service may even hang completely. The reason is that Tomcat's default pool of worker threads is exhausted, leaving no spare threads to absorb the pressure.

The stress test above was only a self-test on the service provider 8003 itself. If an external service consumer (80) accesses the service at this moment, the consumer can only wait; and if the wait grows longer than the consumer can tolerate, the service provider may well be dragged down with it. The 8003 self-test already shows problems; what happens if we test from a service consumer?

2.3 Stress testing from the service consumer 80 module

Create a new Module cloud-consumer-feign-hystrix-order80 as the service consumer. The consumer uses Feign to access the provider's service; the corresponding service interface is written as follows:

@Component
@FeignClient("CLOUD-PROVIDER-HYSTRIX-PAYMENT")
public interface PaymentHystrixService {

    @GetMapping("/payment/hystrix/ok/{id}")
    public String paymentInfo_OK(@PathVariable("id") Integer id);

    @GetMapping("/payment/hystrix/timeout/{id}")
    public String paymentInfo_FAIL(@PathVariable("id") Integer id);
}

Then write its Controller:

@RestController
@Slf4j
public class OrderHystrixController {

    @Resource
    private PaymentHystrixService paymentHystrixService;

    @GetMapping("/consumer/payment/hystrix/ok/{id}")
    public String paymentInfo_OK(@PathVariable("id") Integer id) {
        String result = paymentHystrixService.paymentInfo_OK(id);
        return result;
    }

    @GetMapping("/consumer/payment/hystrix/timeout/{id}")
    public String paymentInfo_TimeOut(@PathVariable("id") Integer id) {
        String result = paymentHystrixService.paymentInfo_FAIL(id);
        return result;
    }
}
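Note that for the @FeignClient interface above to be scanned, the consumer's main startup class must enable Feign clients. A minimal sketch; the class name OrderHystrixMain80 is an assumption used here for illustration:

@SpringBootApplication
@EnableFeignClients
public class OrderHystrixMain80 {
    public static void main(String[] args) {
        SpringApplication.run(OrderHystrixMain80.class, args);
    }
}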

Start the 80 service and access the provider's OK service via http://localhost/consumer/payment/hystrix/ok/1, then run the stress test again. As before, the service can no longer be accessed quickly, and if the stress test uses even more threads, it is likely to cause timeout errors like the following:

Reason for the failure: the other interfaces of 8003 at the same level are dragged down, because the worker threads in Tomcat's thread pool are all occupied; calling 8003 at this point inevitably makes the client's responses slow. It is precisely because of this phenomenon that we need techniques such as service degradation, fault tolerance, and flow limiting.

2.4 Problems found

Problems to be solved:

  • A timeout makes the caller spin and wait => stop waiting once the timeout is exceeded
  • An error (the provider is down or the program throws) => errors must have a fallback

Solutions:

  • The called service (8003) times out: the caller (80) must not be stuck waiting forever; there must be service degradation
  • The called service (8003) is down: the caller (80) must not be stuck waiting forever; there must be service degradation
  • The called service (8003) is fine, but the caller (80) fails or has its own requirements (it is willing to wait less time than the provider needs): the caller handles its own degradation

3. Hystrix service degradation (FallBack)

3.1 Service degradation on the server side (service provider)

Degradation is configured with the @HystrixCommand annotation. On the service provider itself, set a peak timeout for its own calls: within the peak the call runs normally, and beyond it a handler method is needed to perform the service degradation.

First, enable @HystrixCommand on the service provider's business class to define how exceptions are handled: once the service method fails or exceeds its timeout, the fallbackMethod specified by @HystrixCommand is called automatically. In the provider's service class, we modify the TO method:

@Service
public class PaymentService {
    /** A method that can be accessed normally */
    public String paymentInfo_OK(Integer id) {
        return "Thread pool: " + Thread.currentThread().getName()
                + " paymentInfo_OK, id: " + id + "\t" + "ha~";
    }

    /** A method that simulates a timeout */
    @HystrixCommand(fallbackMethod = "paymentInfo_FailHandler",
                    commandProperties = {
            @HystrixProperty(
                    name = "execution.isolation.thread.timeoutInMilliseconds",
                    value = "3000")
    })
    public String paymentInfo_FAIL(Integer id) {
        //int age = 10/0; // simulating an exception would also trigger the fallback
        int timeNumber = 5;

        // Pause the thread for a few seconds; the code itself is fine, this only simulates a timeout
        try {
            TimeUnit.SECONDS.sleep(timeNumber);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return "Thread pool: " + Thread.currentThread().getName()
                + " paymentInfo_FAIL, id: " + id + "\t" + "ha~ "
                + "took " + timeNumber + "s";
    }

    /** Custom service degradation method */
    public String paymentInfo_FailHandler(Integer id) {
        return "System busy, please try again later";
    }
}

Then add the @EnableCircuitBreaker annotation to the main startup class to activate the circuit breaker. The TO service now takes 5 seconds, while the Hystrix peak timeout is configured as 3 seconds, so whenever the method throws an error or times out, the configured fallbackMethod is invoked. Visiting the TO service again, we find that it is indeed the degradation method that executes.
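For reference, a minimal sketch of the modified main startup class (the class name matches the one created earlier; only the @EnableCircuitBreaker annotation is new):

@SpringBootApplication
@EnableEurekaClient
@EnableCircuitBreaker
public class PaymentHystrixMain8003 {
    public static void main(String[] args) {
        SpringApplication.run(PaymentHystrixMain8003.class, args);
    }
}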

Note: the hot-deployment we configured earlier picks up changes to Java code, but when modifying properties inside @HystrixCommand it is recommended to restart the microservice (the changes sometimes do not take effect in time).

3.2 Service degradation on the client side (service consumer)

Just as the service provider can protect itself by degrading, the service consumer can protect itself with its own degradation too. In other words, Hystrix service degradation can be placed on the server side (service provider) or on the client side (service consumer), but it is usually the client side that degrades. Next, configure degradation protection on the service consumer, i.e. the client: modify the 80 consumer's configuration file and add the following to enable Hystrix support for Feign:

feign:
  hystrix:
    enabled: true

Add @EnableHystrix to the 80 consumer's main startup class to activate Hystrix. A minimal sketch, reusing the hypothetical class name OrderHystrixMain80 from the sketch above:
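@SpringBootApplication
@EnableFeignClients
@EnableHystrix
public class OrderHystrixMain80 {
    public static void main(String[] args) {
        SpringApplication.run(OrderHystrixMain80.class, args);
    }
}

Then add the @HystrixCommand annotation in the 80 Controller to implement service degradation: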

@RestController
@Slf4j
public class OrderHystrixController {

    @Resource
    private PaymentHystrixService paymentHystrixService;

    @GetMapping("/consumer/payment/hystrix/ok/{id}")
    public String paymentInfo_OK(@PathVariable("id") Integer id) {
        String result = paymentHystrixService.paymentInfo_OK(id);
        return result;
    }


    @GetMapping("/consumer/payment/hystrix/timeout/{id}")
    @HystrixCommand(fallbackMethod = "paymentInfo_FailHandler",
            commandProperties = {
                    @HystrixProperty(
                            name = "execution.isolation.thread.timeoutInMilliseconds",
                            value = "1500")
            })
    public String paymentInfo_TimeOut(@PathVariable("id") Integer id) {
        String result = paymentHystrixService.paymentInfo_FAIL(id);
        return result;
    }

    /** Custom service degradation method */
    public String paymentInfo_FailHandler(Integer id) {
        return "I am consumer 80: system busy, please try again later";
    }
}

In other words, if the consumer's call to the service provider takes more than 1.5 seconds, its own degradation method is invoked instead.

3.3 Unified global service degradation method

The current approach has problems. First, every business method has its own dedicated degradation method, which bloats the code; we should define a unified degradation method and keep it separate from the custom ones. Second, the degradation methods are mixed in with the business logic, which makes the code confusing and the business logic unclear.

For the first problem, we can use the @DefaultProperties(defaultFallback = "") annotation on the controller class to configure a global degradation method: methods that configure their own fallback via @HystrixCommand(fallbackMethod = "...") use their own, while those that do not are handled by the global method configured in @DefaultProperties(defaultFallback = ""). This separates the general degradation method from the dedicated ones, avoids code bloat, and reasonably reduces the amount of code. Modify the Controller of service consumer 80 as follows:

@RestController
@Slf4j
//used when no specific fallback is designated
@DefaultProperties(defaultFallback = "payment_Global_FailHandler")
public class OrderHystrixController {

    @Resource
    private PaymentHystrixService paymentHystrixService;

    @GetMapping("/consumer/payment/hystrix/ok/{id}")
    public String paymentInfo_OK(@PathVariable("id") Integer id) {
        String result = paymentHystrixService.paymentInfo_OK(id);
        return result;
    }


    @GetMapping("/consumer/payment/hystrix/timeout/{id}")
    //specifically designates which fallback method to use
//    @HystrixCommand(fallbackMethod = "paymentInfo_FailHandler",
//            commandProperties = {
//                    @HystrixProperty(
//                            name = "execution.isolation.thread.timeoutInMilliseconds",
//                            value = "1500")
//    })
    @HystrixCommand //no specific fallback designated, so the global one is used
    public String paymentInfo_TimeOut(@PathVariable("id") Integer id) {
        String result = paymentHystrixService.paymentInfo_FAIL(id);
        return result;
    }

    /** Custom service degradation method */
    public String paymentInfo_FailHandler(Integer id) {
        return "I am consumer 80: system busy, please try again later";
    }

    /**
     * Global service degradation method
     * @return a friendly global error message
     */
    public String payment_Global_FailHandler() {
        return "Global exception handling message";
    }
}

Note that regardless of whether a dedicated degradation method is configured, the @HystrixCommand annotation must still be added to the service method; otherwise, the degradation mechanism does not apply to it at all.

For the second problem, we can add an implementation class of the Feign client interface to handle degradation, achieving decoupling. Our 80 client already has the PaymentHystrixService interface; we create a new class PaymentFallbackService that implements this interface and overrides its methods to provide the exception handling for each one, and we declare this degradation class in the @FeignClient annotation on PaymentHystrixService:

@Service
//when an error occurs, look for the degradation methods in the PaymentFallbackService class
@FeignClient(value = "CLOUD-PROVIDER-HYSTRIX-PAYMENT",
              fallback = PaymentFallbackService.class)
public interface PaymentHystrixService {

    @GetMapping("/payment/hystrix/ok/{id}")
    public String paymentInfo_OK(@PathVariable("id") Integer id);

    @GetMapping("/payment/hystrix/timeout/{id}")
    public String paymentInfo_FAIL(@PathVariable("id") Integer id);
}

@Service
public class PaymentFallbackService implements PaymentHystrixService {
    @Override
    public String paymentInfo_OK(Integer id) {
        return "PaymentHystrixService paymentInfo_OK encountered an exception";
    }

    @Override
    public String paymentInfo_FAIL(Integer id) {
        return "PaymentHystrixService paymentInfo_FAIL encountered an exception";
    }
}

Then we remove all the coupled degradation code from the Controller:

@RestController
@Slf4j
public class OrderHystrixController {

    @Resource
    private PaymentHystrixService paymentHystrixService;

    @GetMapping("/consumer/payment/hystrix/ok/{id}")
    public String paymentInfo_OK(@PathVariable("id") Integer id) {
        String result = paymentHystrixService.paymentInfo_OK(id);
        return result;
    }

    @GetMapping("/consumer/payment/hystrix/timeout/{id}")
    public String paymentInfo_TimeOut(@PathVariable("id") Integer id) {
        String result = paymentHystrixService.paymentInfo_FAIL(id);
        return result;
    }
}

Then we shut down the 8003 service provider to simulate server downtime. When accessing the service fails, the degradation method in our configured PaymentFallbackService class is called; the code is thus decoupled, and the business logic is no longer cluttered.

4. Hystrix service circuit breaking

4.1 Overview of the circuit-breaking mechanism

Service degradation => then circuit breaking => then restoring the call link

The circuit-breaking mechanism is a link-protection mechanism for microservices that guards against the avalanche effect. When a microservice on the fan-out link becomes unavailable or its response time is too long, the service is degraded and calls to that node's microservice are broken, quickly returning an error response. In other words, a service circuit break triggers service degradation. When the node's microservice is detected to be responding normally again, the call link is restored: after the break, the circuit breaker re-allows access to the service.

In the Spring Cloud framework, the circuit-breaking mechanism is implemented by Hystrix. Hystrix monitors the calls between microservices; when failures reach a certain threshold (by default, at least 20 calls within a 10-second rolling window with more than 50% of them failing), the circuit breaker trips. The annotation for the circuit breaker is again @HystrixCommand. For the circuit breaker pattern itself, see Martin Fowler's CircuitBreaker article.

4.2 Examples

Add the following code to the Service of the 8003 service provider:

/**
 * Service circuit breaking
 * fallbackMethod                               service degradation method
 * circuitBreaker.enabled                       whether the circuit breaker is enabled
 * circuitBreaker.requestVolumeThreshold        request volume threshold
 * circuitBreaker.sleepWindowInMilliseconds     sleep (time) window
 * circuitBreaker.errorThresholdPercentage      failure percentage at which to trip
 *
 * The configuration below means: if, within a 10-second window, 10 requests are made
 * and 6 of them fail, the circuit breaker is triggered
 *
 * The properties usable in @HystrixProperty are listed in the
 * com.netflix.hystrix.HystrixCommandProperties class
 */
@HystrixCommand(fallbackMethod = "paymentCircuitBreaker_fallback", 
       commandProperties = {
             @HystrixProperty(name = "circuitBreaker.enabled", 
                           value = "true"),
             @HystrixProperty(name = "circuitBreaker.requestVolumeThreshold", 
                           value = "10"),
             @HystrixProperty(name = "circuitBreaker.sleepWindowInMilliseconds", 
                           value = "10000"),
             @HystrixProperty(name = "circuitBreaker.errorThresholdPercentage",
                           value = "60")
       })
public String paymentCircuitBreaker(Integer id) {
    if (id < 0) {
        throw new RuntimeException("===> id cannot be negative");
    }
    //using the hutool utility class; equivalent to UUID.randomUUID().toString()
    String serialNumber = IdUtil.simpleUUID();
    return Thread.currentThread().getName() + " call succeeded, serial number: " + serialNumber;
}

/**
 * Degradation method triggered by the circuit breaker
 * @param id the request id
 * @return a friendly error message
 */
public String paymentCircuitBreaker_fallback(Integer id) {
    return "id cannot be negative, please try again later. id: " + id;
}

The parameters of the circuit-breaking mechanism are configured in the @HystrixCommand annotation; their meanings and defaults are as follows:

Attribute name                           | Meaning                                | Default
-----------------------------------------|----------------------------------------|------------
circuitBreaker.enabled                   | Whether the circuit breaker is enabled | true
circuitBreaker.requestVolumeThreshold    | Request volume threshold               | 20 requests
circuitBreaker.sleepWindowInMilliseconds | Sleep (time) window                    | 5000 ms
circuitBreaker.errorThresholdPercentage  | Failure percentage at which to trip    | 50%

The precise meanings of these property names and their default values can be found in the com.netflix.hystrix.HystrixCommandProperties class. What we configured in the service above means: within a 10-second window, if 10 requests are made and 6 of them fail, the circuit breaker trips.

Add the service in the Controller:

@GetMapping("circuit/{id}")
public String paymentCircuitBreaker(@PathVariable("id") Long id) {
    String result = paymentService.paymentCircuitBreaker(id);
    log.info("====> result:" + result);
    return result;
}

4.3 Test

According to our business logic, the service can be accessed normally when id is a positive number, and accessing the service fails when id is negative. First we visit http://localhost:8003/payment/hystrix/circuit/11 as the correct request, and everything is normal!

Then we issue a large number of failing requests to force the circuit breaker open, followed by correct requests. We find that once the failing requests exceed our threshold, the circuit breaker trips, and even correct requests are short-circuited to the fallback. After a certain period of time, however, correct requests go through smoothly again. This is the overall flow of circuit breaking: after the breaker trips, the service is first degraded, and then the call link is gradually restored.
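The manual clicking can also be scripted. Below is a rough probe, a sketch assuming Java 11+ and the endpoints above; the 11-second pause lets the configured 10-second sleep window elapse:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CircuitBreakerProbe {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // 1. Trip the breaker: well over 10 failing requests inside the 10s window
        for (int i = 0; i < 20; i++) {
            System.out.println(get(client, "http://localhost:8003/payment/hystrix/circuit/-1"));
        }
        // 2. While OPEN, even a correct request is short-circuited to the fallback
        System.out.println(get(client, "http://localhost:8003/payment/hystrix/circuit/11"));
        // 3. After the 10s sleep window, a correct request closes the breaker again
        Thread.sleep(11_000);
        System.out.println(get(client, "http://localhost:8003/payment/hystrix/circuit/11"));
    }

    private static String get(HttpClient client, String url) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(URI.create(url)).build();
        return client.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }
}

Running it shows the three phases in order: failing calls, a short-circuited correct call, and finally a successful call once the breaker closes again.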

4.4 Summary

Combined with the description of the circuit-breaking mechanism on the official website, the process can be described as follows.

The precise way the circuit opens and closes is as follows:

  • Assuming the volume across the circuit reaches a certain threshold (HystrixCommandProperties.circuitBreakerRequestVolumeThreshold())...
  • And assuming the error percentage exceeds the threshold error percentage (HystrixCommandProperties.circuitBreakerErrorThresholdPercentage())...
  • Then the circuit breaker transitions from CLOSED to OPEN, triggering the break.
  • While it is open, it short-circuits all requests made against that circuit breaker.
  • After some amount of time (HystrixCommandProperties.circuitBreakerSleepWindowInMilliseconds()), the next single request is let through (this is the HALF-OPEN state). If that request fails, the circuit breaker returns to the OPEN state for the duration of another sleep window. If it succeeds, the circuit breaker transitions to CLOSED, and the logic in the first point takes over again.

That is, the circuit breaker has three states:

Status                        | Description
------------------------------|-------------------------------------------------------------
OPEN (breaker open)           | Requests no longer call the current service. The internally set clock is generally the MTTR (mean time to repair); when the open duration reaches the set clock, the breaker enters the half-open state (HALF-OPEN).
CLOSED (breaker closed)       | The breaker is closed, and calls to the service are not broken.
HALF-OPEN (breaker half-open) | Some requests call the current service according to the rules. If these requests succeed and conform to the rules, the service is considered back to normal and the breaker is closed.

The following is the fuse flow chart on the official website:

So when does the circuit breaker take effect?

Three important parameters govern the breaker: the snapshot time window, the request volume threshold, and the error percentage threshold.

  • Snapshot time window (metrics.rollingStats.timeInMilliseconds): whether the breaker opens is decided from statistics of requests and errors, and the statistical range is this snapshot time window, by default the most recent 10 seconds;
  • Request volume threshold (circuitBreaker.requestVolumeThreshold): within the snapshot time window, the total number of requests must reach this threshold before the breaker is even eligible to trip. The default is 20, meaning that if the Hystrix command is called fewer than 20 times within the window, the breaker will not open even if every single request times out or fails for other reasons;
  • Error percentage threshold (circuitBreaker.errorThresholdPercentage): when the total number of requests within the snapshot window exceeds the volume threshold, and the percentage of failed calls among them exceeds this threshold (50% by default), the breaker opens.

After the breaker opens

When a call arrives after the breaker has opened, the main logic is no longer invoked; the degradation method is called directly. The breaker automatically detects the errors and switches the degradation logic in for the main logic, mitigating the response latency.

How is the original main logic restored? => When the breaker opens and the main logic is broken, Hystrix starts a sleep time window, during which the degradation logic temporarily stands in for the main logic. When the sleep window expires, the breaker enters the half-open state and releases one request to the original main logic. If that request succeeds, the breaker closes and the main logic is restored; if it still fails, the breaker stays open, and the sleep window timer starts again.
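The cycle just described can be condensed into a small state-machine sketch. This is an illustrative simplification of the mechanism, not Hystrix's actual implementation:

enum State { CLOSED, OPEN, HALF_OPEN }

/** Simplified circuit-breaker state machine, for illustration only. */
class SimpleCircuitBreaker {
    private State state = State.CLOSED;
    private long openedAt;                                // when the breaker last opened
    private static final long SLEEP_WINDOW_MS = 5000;     // Hystrix default sleep window

    boolean allowRequest(long now) {
        if (state == State.OPEN && now - openedAt >= SLEEP_WINDOW_MS) {
            state = State.HALF_OPEN;  // sleep window elapsed: let a trial request through
        }
        return state != State.OPEN;   // OPEN short-circuits straight to the fallback
    }

    void onSuccess() {
        if (state == State.HALF_OPEN) {
            state = State.CLOSED;     // trial request succeeded: restore the main logic
        }
    }

    void onFailure(long now) {
        if (state == State.HALF_OPEN || tripThresholdReached()) {
            state = State.OPEN;       // trip (or re-trip) and restart the sleep window
            openedAt = now;
        }
    }

    private boolean tripThresholdReached() {
        // Hystrix decides this from rolling-window statistics: by default at least
        // 20 requests in 10 seconds with more than 50% failures. Omitted in this sketch.
        return false;
    }
}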

All common configurations of Hystrix

Service flow limiting will be explained in detail with Alibaba's Sentinel framework.

5. Hystrix workflow

Official website description: how Hystrix actually performs degradation, circuit breaking, and flow limiting.

The overall Hystrix work flow chart is as follows:

Step description:

6. Hystrix service monitoring: Hystrix Dashboard

6.1 Understanding

In addition to isolating dependency calls, Hystrix also provides near-real-time call monitoring: the Hystrix Dashboard. Hystrix continuously records the execution information of every request initiated through it and presents it to the user as statistics and graphics, including how many requests are executed per second, how many succeed, how many fail, and so on. Spring Cloud also integrates the Hystrix Dashboard, turning the monitoring data into a visual interface.

6.2 Use steps

Create a new Module cloud-consumer-hystrix-dashboard9001 as the Hystrix Dashboard service.

Add the dependency of Hystrix Dashboard:

<dependencies>
    <!-- Hystrix Dashboard -->
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-netflix-hystrix-dashboard</artifactId>
    </dependency>
    <!-- actuator monitoring -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <!-- common configuration -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-devtools</artifactId>
        <scope>runtime</scope>
        <optional>true</optional>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>

Write the configuration file application.yml and add a port:

server:
  port: 9001

Write the main startup class and add the @EnableHystrixDashboard annotation to the main startup class to enable the Hystrix Dashboard function:

@SpringBootApplication
@EnableHystrixDashboard
public class HystrixDashboardMain9001 {
    public static void main(String[] args) {
        SpringApplication.run(HystrixDashboardMain9001.class, args);
    }
}

Note: every service provider microservice to be monitored (e.g. our 8001/8002/8003) needs the actuator monitoring dependency:

<!-- actuator monitoring -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

Visit http://localhost:9001/hystrix, and we can see the Hystrix Dashboard's graphical interface.

Note: newer versions of Hystrix require the monitoring path to be specified in the main startup class (otherwise it reports the error: Unable to connect to Command Metric Stream). For the provider's service to be monitored by the Hystrix Dashboard, the following configuration needs to be added to the provider's main startup class:

/**
 * This configuration exists for service monitoring only and is unrelated to service
 * fault tolerance itself; it works around a pitfall introduced by the Spring Cloud
 * upgrade. The ServletRegistrationBean is needed because Spring Boot's default
 * servlet path is no longer "/hystrix.stream"; registering the servlet below in
 * your own project is sufficient.
 */
@Bean
public ServletRegistrationBean getServlet() {
    HystrixMetricsStreamServlet streamServlet = new HystrixMetricsStreamServlet();
    ServletRegistrationBean registrationBean = new ServletRegistrationBean(streamServlet);
    registrationBean.setLoadOnStartup(1);
    registrationBean.addUrlMappings("/hystrix.stream");
    registrationBean.setName("HystrixMetricsStreamServlet");
    return registrationBean;
}

In the Hystrix Dashboard's graphical interface, enter the address of the service provider to be monitored, i.e. the metrics stream registered above (for our provider, http://localhost:8003/hystrix.stream):

The following is the monitoring status of Dashboard on the service:

How to read the dashboard?

  • Seven colors: the seven colored counters each correspond to one type of request statistic (successes, timeouts, rejections, failures, and so on).
  • Solid circle: it carries two meanings. Its color reflects the health of the instance, decreasing from green through yellow and orange to red. Beyond the color, its size also changes with the instance's request traffic: the heavier the traffic, the larger the circle. The solid circles therefore make it possible to quickly spot faulty instances and high-pressure instances among a large number of instances.
  • Curve: records the relative change in traffic over the last 2 minutes; its rising and falling trend shows which way the traffic is moving.

Original link: https://www.cnblogs.com/mpolaris/p/14459451.html
