Hystrix service circuit breaking and service degradation

Source address: https://gitee.com/peachtec/springcloud

Problems in distributed systems

  Applications in a complex distributed architecture have dozens of dependencies, and each dependency will inevitably fail at some point.

Service avalanche

  When microservices call one another, suppose microservice A calls microservices B and C, which in turn call other microservices. This is called "fan-out". If one microservice on the fan-out chain responds too slowly or becomes unavailable, calls to microservice A tie up more and more system resources until the system collapses. This is the "avalanche effect".
  For high-traffic applications, a single latent back-end dependency can saturate all resources on all servers within a few seconds. Worse than outright failure, such a dependency can also increase latency between services, backing up queues, threads, and other system resources and causing further cascading failures across the system. All of this points to the need to isolate and manage failures and latency so that the failure of a single dependency cannot take down the entire application or system.

Service circuit breaking

  Circuit breaking is a link-protection mechanism for microservices that counters the avalanche effect. When a microservice on the fan-out chain is unavailable or its response time is too long, the service is degraded: calls to that node are short-circuited and an error response is returned quickly. Once calls to the node are detected to be healthy again, the call chain is restored. In the Spring Cloud framework, circuit breaking is implemented by Hystrix. Hystrix monitors calls between microservices and trips the breaker when failed calls reach a threshold. With the default settings, the breaker opens when at least 20 requests arrive within a 10-second rolling window and at least 50% of them fail; after a 5-second sleep window it lets a trial request through to check whether the service has recovered. The annotation for circuit breaking is @HystrixCommand.
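
  The trip-and-recover cycle described above can be sketched in plain Java. This is only an illustration, not Hystrix's implementation: the class SimpleCircuitBreaker and its constructor parameters are hypothetical, it uses one fixed statistics window instead of Hystrix's rolling buckets, and the clock is injected purely so the behaviour can be exercised without waiting.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Simplified circuit breaker (CLOSED -> OPEN -> HALF_OPEN -> CLOSED); not Hystrix's own code. */
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int requestVolumeThreshold; // minimum calls in a window before the breaker may trip
    private final int errorThresholdPercent;  // failure percentage that trips the breaker
    private final long windowMillis;          // length of the statistics window
    private final long sleepWindowMillis;     // time to stay open before allowing one trial call
    private final LongSupplier clock;         // injected clock in milliseconds (assumption, for testability)

    private State state = State.CLOSED;
    private long windowStart;
    private long openedAt;
    private int total;
    private int failures;

    SimpleCircuitBreaker(int requestVolumeThreshold, int errorThresholdPercent,
                         long windowMillis, long sleepWindowMillis, LongSupplier clock) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercent = errorThresholdPercent;
        this.windowMillis = windowMillis;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
        this.windowStart = clock.getAsLong();
    }

    synchronized State state() {
        return state;
    }

    /** Runs primary if the breaker allows it; otherwise, or on failure, returns the fallback value. */
    <T> T call(Supplier<T> primary, Supplier<T> fallback) {
        if (!allowRequest()) {
            return fallback.get(); // short-circuited: fail fast without touching the dependency
        }
        try {
            T result = primary.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }

    private synchronized boolean allowRequest() {
        long now = clock.getAsLong();
        if (state == State.OPEN) {
            if (now - openedAt < sleepWindowMillis) {
                return false;
            }
            state = State.HALF_OPEN; // sleep window elapsed: let a single trial call through
            return true;
        }
        if (state == State.HALF_OPEN) {
            return false; // a trial call is already in flight
        }
        if (now - windowStart >= windowMillis) { // start a fresh statistics window
            windowStart = now;
            total = 0;
            failures = 0;
        }
        return true;
    }

    private synchronized void onSuccess() {
        if (state == State.HALF_OPEN) { // trial call succeeded: close the breaker again
            state = State.CLOSED;
            windowStart = clock.getAsLong();
            total = 0;
            failures = 0;
        } else {
            total++;
        }
    }

    private synchronized void onFailure() {
        if (state != State.HALF_OPEN) {
            total++;
            failures++;
        }
        boolean overThreshold = total >= requestVolumeThreshold
                && failures * 100 >= errorThresholdPercent * total;
        if (state == State.HALF_OPEN || overThreshold) { // trial failed or too many errors: open
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }
}
```

  A caller might build it as new SimpleCircuitBreaker(20, 50, 10_000, 5_000, System::currentTimeMillis) and wrap each remote call in call(primary, fallback). While the breaker is open the primary supplier is never invoked, which is what keeps the caller's threads from piling up behind a dead dependency.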

What is Hystrix

  Hystrix is an open-source library for handling latency and fault tolerance in distributed systems. In a distributed system, many dependency calls will inevitably fail through timeouts, exceptions, and so on. Hystrix ensures that a problem in one dependency does not bring down the whole service, which avoids cascading failures and improves the resilience of the distributed system.
  A "circuit breaker" is itself a switching device. When a service unit fails, the breaker's fault monitoring returns to the caller an expected, handleable fallback response instead of making it wait a long time or throwing an exception the calling method cannot handle. This ensures that the caller's threads are not held unnecessarily for long periods, which prevents faults from spreading through the distributed system and turning into an avalanche.

The role of Hystrix

  1. Protect against, and give control over, latency and failure in dependencies accessed through third-party client libraries (usually over the network).
  2. Stop cascading failures in complex distributed systems.
  3. Fail fast and recover quickly.
  4. Fall back and degrade gracefully where possible.
  5. Enable near real-time monitoring, alerting, and operational control.

What problem does Hystrix solve

  Applications in a complex distributed architecture have dozens of dependencies, and each dependency will inevitably fail at some point. If the host application is not isolated from these external failures, it risks being taken down with them.
  For example, for an application that depends on 30 services, each with 99.99% uptime, you can expect:

99.99%^30 = 99.7% uptime
0.3% of 1 billion requests = 3,000,000 failures
2+ hours of downtime per month, even if all dependencies have excellent uptime.
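
  The arithmetic above can be checked with a few lines of Java. This is a sketch under two assumptions that are not stated in the text: the 30 dependencies fail independently of one another, and a month has 720 hours (30 days). The class and method names are made up for the example.

```java
/** Back-of-the-envelope uptime arithmetic; assumes dependencies fail independently. */
final class UptimeEstimate {
    /** Probability that all dependencies are up at the same time. */
    static double compositeUptime(double perDependencyUptime, int dependencies) {
        return Math.pow(perDependencyUptime, dependencies);
    }

    /** Expected number of failed requests for the given composite uptime. */
    static double expectedFailures(double compositeUptime, long requests) {
        return (1.0 - compositeUptime) * requests;
    }

    /** Expected downtime in hours for a month of the given length. */
    static double downtimeHours(double compositeUptime, double hoursInMonth) {
        return (1.0 - compositeUptime) * hoursInMonth;
    }

    public static void main(String[] args) {
        double uptime = compositeUptime(0.9999, 30);
        System.out.printf("composite uptime: %.2f%%%n", uptime * 100);
        System.out.printf("failures per 1 billion requests: %.0f%n", expectedFailures(uptime, 1_000_000_000L));
        System.out.printf("downtime per 720-hour month: %.2f h%n", downtimeHours(uptime, 720));
    }
}
```

  The results land close to the rounded figures quoted above: roughly 99.7% composite uptime, about 3 million failed requests per billion, and a little over 2 hours of downtime per month.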
  

  The reality is usually worse. Even when every dependency performs well, the combined effect of just 0.01% downtime on each of dozens of services can add up to hours of downtime per month if the whole system is not designed for resilience.
  When everything is healthy, the request flow looks like this:
[Figure: request flow when all dependencies are healthy]
  When one of the many dependent services becomes latent, it can block the entire user request:
[Figure: one latent dependency blocking user requests]
  Under high traffic, a single failing back-end service can saturate resources within a few seconds. (Saturation is not the only outcome. For example, if the failing requests end in timeouts, the number of threads can rise sharply, and if those requests are never released, the network becomes congested.)
  The application receives a request and reaches each dependent service over the network, so every one of those calls is a potential point of failure. Worse than a service simply going down, these services can increase the latency between services, which backs up queues, threads, and other resources and causes cascading failures across the system (the whole service cluster goes down, or the servers even become unreachable).
[Figure: a latent dependency saturating threads across the system]
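
  A rough calculation shows why saturation can happen "within a few seconds". The sketch below uses Little's law (requests in flight = arrival rate x time each request is held). All numbers in main are illustrative assumptions, not measurements from the original project: a container with 200 request threads, 100 requests per second, and a dependency whose latency grows from 50 ms to 5 s.

```java
/** Rough thread-saturation arithmetic based on Little's law; all inputs are illustrative. */
final class SaturationEstimate {
    /** Average number of threads busy at once: arrival rate multiplied by time held per request. */
    static double threadsInUse(double requestsPerSecond, double latencySeconds) {
        return requestsPerSecond * latencySeconds;
    }

    /** Seconds until the pool is exhausted when every incoming request hangs on the dependency. */
    static double secondsToSaturation(int poolSize, double requestsPerSecond) {
        return poolSize / requestsPerSecond;
    }

    public static void main(String[] args) {
        int poolSize = 200;              // assumed container thread pool size
        double requestsPerSecond = 100;  // assumed traffic

        System.out.println("threads busy at 50 ms latency: " + threadsInUse(requestsPerSecond, 0.05));
        System.out.println("threads needed at 5 s latency: " + threadsInUse(requestsPerSecond, 5.0));
        System.out.println("seconds until the pool is full: " + secondsToSaturation(poolSize, requestsPerSecond));
    }
}
```

  With these assumed numbers the healthy service keeps only a handful of threads busy, while the latent one would need more threads than the pool has, so the pool fills up about two seconds after the dependency starts hanging.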

Hystrix's design principles

  1. Prevent any single dependent service from exhausting all user threads of the container (for example Tomcat or JBoss).
  2. Shed load and fail fast instead of queuing (break the circuit rather than wait for recovery).
  3. Provide fallbacks wherever feasible to protect users from failures.
  4. Use isolation techniques (such as bulkheads, swim lanes, and circuit breakers) to limit the impact of any one dependent service failing.
  5. Optimize time-to-discovery through near real-time metrics, monitoring, and alerting.
  6. Optimize time-to-recovery by propagating configuration changes with low latency and by supporting dynamic property changes in most aspects of Hystrix, which lets you make real-time operational modifications through a low-latency feedback loop.
  7. Protect against failure of the entire dependency client execution, not just failures of network requests (network congestion).
  

How does Hystrix achieve its design goals?

Hystrix does this in the following ways:
  1. Wrap every call to a dependent service (or dependency) in a HystrixCommand or HystrixObservableCommand object, which typically executes in a separate thread (the command pattern).
  2. Time out calls that take longer than a defined threshold. Hystrix provides a timeout setting, which is usually configured slightly above the dependency's normal response time. Calls that exceed it are cut off directly (this is the main reason a timeout exception is returned).
  3. Maintain a small thread pool (or semaphore) for each dependency; if the thread pool or semaphore is full, requests are rejected immediately instead of waiting or queuing.
  4. Measure successes, failures (exceptions thrown by the client), timeouts, and thread rejections.
  5. Trip the circuit breaker when the error percentage of a service exceeds the threshold. While the breaker is open, either automatically or manually, all requests to that service are stopped for a period of time.
  6. Run fallback logic when a request fails, is rejected, times out, or is short-circuited.
  7. Monitor metrics and configuration changes in near real time.
  When dependent services are wrapped with Hystrix, the architecture shown in the figure above becomes similar to the one in the figure below. Each dependency is isolated from the others (by a HystrixCommand or HystrixObservableCommand). When a dependency hits an error, a timeout, a resource limit (thread pool or semaphore), or a rejection, the fallback logic is executed. The fallback logic responds to any type of failure in the dependency.
[Figure: dependencies isolated by Hystrix commands]
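
  Steps 1 to 3 and 6 above (run the call on a separate small thread pool, time it out, reject instead of queuing, and fall back) can be sketched with the JDK alone. This is not how HystrixCommand is written internally; IsolatedCommand and its parameters are hypothetical names, and the sketch leaves out the metrics and circuit breaker from steps 4, 5, and 7.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Runs calls to one dependency on its own small pool with a timeout and a fallback. */
final class IsolatedCommand {
    private final ExecutorService pool;
    private final long timeoutMillis;

    IsolatedCommand(int poolSize, long timeoutMillis) {
        // SynchronousQueue has no capacity: when all threads are busy, submit() is rejected
        // immediately instead of queuing the request.
        this.pool = new ThreadPoolExecutor(poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>(), runnable -> {
                    Thread thread = new Thread(runnable, "isolated-command");
                    thread.setDaemon(true); // do not keep the JVM alive for hung calls
                    return thread;
                });
        this.timeoutMillis = timeoutMillis;
    }

    <T> T execute(Callable<T> primary, Supplier<T> fallback) {
        Future<T> future;
        try {
            future = pool.submit(primary);
        } catch (RejectedExecutionException e) {
            return fallback.get(); // pool is full: reject right away
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);   // interrupt the worker so its thread can be reused
            return fallback.get(); // call took too long
        } catch (ExecutionException e) {
            return fallback.get(); // call threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback.get();
        }
    }
}
```

  For instance, new IsolatedCommand(10, 1000).execute(() -> remoteCall(), () -> defaultValue) would give the hypothetical remoteCall at most ten threads and one second. The point of the design is that a hung dependency can only consume its own small pool, so the container's request threads stay free for other work.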

Service circuit breaking

  Circuit breaking is done on the service provider, so I am going to modify a service provider module. For convenience, I create a new service provider module directly, copy the code from an existing provider module into it, and then add the Hystrix-related code.

  1. Add the Hystrix dependency
<!-- hystrix-->
<dependency>
   <groupId>org.springframework.cloud</groupId>
   <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>
  2. Add the circuit breaker annotation to the controller endpoint and specify the fallback method
    @GetMapping("/get/{id}")
    @HystrixCommand(fallbackMethod = "hystrixGet") // method to run when the call fails or the circuit is open
    public Dept get(@PathVariable("id") Integer id) { // public so that the Hystrix AOP proxy can intercept the call
        Dept dept = service.queryById(id);
        if (dept == null) {
            throw new RuntimeException("User does not exist");
        }
        return dept;
    }

    /**
     * Fallback method; its parameters must match those of the protected method
     */
    private Dept hystrixGet(Integer id) {
        return new Dept().setId(id)
                .setName("No matching information")
                .setDbSource("This database does not exist in MySQL");
    }
  3. Enable the circuit breaker on the startup class
@SpringBootApplication
@EnableEurekaClient    // register with Eureka automatically after startup
@EnableDiscoveryClient // service discovery
@EnableCircuitBreaker  // enable Hystrix circuit breaking
public class Dept_Hystrix_8001 {
    public static void main(String[] args) {
        SpringApplication.run(Dept_Hystrix_8001.class, args);
    }
}
  4. Start the service and test it. Whatever you query, the service no longer returns an error page as it did before; a failed query now invokes the fallback method specified on the circuit breaker.
    [Figure: fallback response returned for a failed query]

Service degradation

  Service degradation is done on the service consumer and is most convenient to configure together with Feign, so modify the Feign service-layer interface in the common module directly.

  1. In the common module, write a class that implements FallbackFactory and override its create method, which returns an instance of the Feign service-layer interface
@Component
public class ClientFallbackFactory implements FallbackFactory<ClientService> {
    @Override
    public ClientService create(Throwable throwable) {
        // throwable is the cause of the failed call; every method below is the degraded response
        return new ClientService() {
            @Override
            public boolean addDept(Dept dept) {
                return false;
            }

            @Override
            public Dept queryById(Integer id) {
                // test case
                return new Dept().setId(id).setName("Service is shut down").setDbSource("no database");
            }

            @Override
            public List<Dept> queryAll() {
                return null;
            }
        };
    }
}
  2. Set the fallback factory on the Feign service-layer interface
@FeignClient(value = "SPRINGCLOUD-SERVER", fallbackFactory = ClientFallbackFactory.class)
@Component
public interface ClientService {
    @PostMapping("/dept/add")
    boolean addDept(Dept dept);

    @GetMapping("/dept/get/{id}")
    Dept queryById(@PathVariable("id") Integer id);

    @GetMapping("/dept/get")
    List<Dept> queryAll();
}
  3. Enable degradation in the yaml configuration file of the Feign consumer module
# enable service degradation
feign:
  hystrix:
    enabled: true
  4. Start the services and test
    [Figure: test result]
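
  The reason step 1 uses a factory rather than a plain fallback object is that the factory receives the cause of the failure and can build a response from it. The idea can be sketched without Feign or Spring; DeptClient, Factory, and queryWithFallback below are hypothetical stand-ins for illustration, not the Feign API.

```java
/** Framework-free sketch of the fallback-factory idea; all names are hypothetical. */
final class FallbackFactorySketch {
    /** Stand-in for the Feign client interface. */
    interface DeptClient {
        String queryNameById(int id);
    }

    /** Stand-in for a fallback factory: builds a fallback that knows why the call failed. */
    interface Factory<T> {
        T create(Throwable cause);
    }

    /** Calls the primary client and switches to a freshly created fallback when it throws. */
    static String queryWithFallback(DeptClient primary, Factory<DeptClient> factory, int id) {
        try {
            return primary.queryNameById(id);
        } catch (RuntimeException e) {
            return factory.create(e).queryNameById(id);
        }
    }

    public static void main(String[] args) {
        DeptClient broken = id -> { throw new IllegalStateException("connection refused"); };
        Factory<DeptClient> factory = cause -> id -> "degraded (" + cause.getMessage() + ")";
        System.out.println(queryWithFallback(broken, factory, 1)); // prints "degraded (connection refused)"
    }
}
```

  In the real project the wrapping is done by the framework, so the consumer code only ever calls the ClientService interface and never sees the try/catch.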

Dashboard traffic monitoring

  Hystrix records the requests that pass through the client, and these requests can be monitored. The Dashboard is a web page for viewing this monitoring information.

  1. Create the "springcloud-consumer-hystrix-dashboard" module, which hosts the Dashboard monitoring page, and add the dependencies
<dependencies>
  <!-- dashboard -->
  <dependency>
      <groupId>org.springframework.cloud</groupId>
      <artifactId>spring-cloud-starter-netflix-hystrix-dashboard</artifactId>
  </dependency>
  <!-- hystrix-->
  <dependency>
      <groupId>org.springframework.cloud</groupId>
      <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
  </dependency>
  <!-- Ribbon -->
  <dependency>
      <groupId>org.springframework.cloud</groupId>
      <artifactId>spring-cloud-starter-netflix-ribbon</artifactId>
  </dependency>
  <!-- Eureka -->
  <dependency>
      <groupId>org.springframework.cloud</groupId>
      <artifactId>spring-cloud-starter-netflix-eureka-client</artifactId>
  </dependency>
  <dependency>
      <groupId>org.peach</groupId>
      <artifactId>springcloudentity</artifactId>
      <version>1.0</version>
  </dependency>
  <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-web</artifactId>
  </dependency>
  <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-devtools</artifactId>
  </dependency>
</dependencies>
  2. Configure the yaml file. For a first simple trial, set the port and allow the Dashboard to proxy streams from any host
server:
  port: 9001
hystrix:
  dashboard:
    proxy-stream-allow-list: "*"
  3. Enable monitoring on the startup class
@SpringBootApplication
@EnableHystrixDashboard // enable the monitoring page
public class DeptConsumerDashboard_9001 {
    public static void main(String[] args) {
        SpringApplication.run(DeptConsumerDashboard_9001.class, args);
    }
}
  4. Make sure every service provider module has the "spring-boot-starter-actuator" monitoring dependency, then start the Dashboard module and visit http://localhost:9001/hystrix in the browser. You should see the Dashboard's default page
    [Figure: Hystrix Dashboard default page]
  5. At this point the Dashboard is empty. To give it something to show, go to the service provider module, add the Hystrix dependency, register the metrics stream servlet, and add the "@HystrixCommand" annotation to each endpoint that should be monitored
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>
@SpringBootApplication
@EnableEurekaClient    // register with Eureka automatically after startup
@EnableDiscoveryClient // service discovery
@EnableCircuitBreaker  // enable Hystrix circuit breaking
public class Dept_Hystrix_8001 {
    public static void main(String[] args) {
        SpringApplication.run(Dept_Hystrix_8001.class, args);
    }

    // Expose the Hystrix metrics stream at the URL that the Dashboard reads
    @Bean
    public ServletRegistrationBean<HystrixMetricsStreamServlet> hystrixMetricsStreamServlet() {
        ServletRegistrationBean<HystrixMetricsStreamServlet> registrationBean =
                new ServletRegistrationBean<>(new HystrixMetricsStreamServlet());
        registrationBean.addUrlMappings("/actuator/hystrix.stream");
        return registrationBean;
    }
}
  6. Start the service and test it by visiting http://localhost:8001/actuator/hystrix.stream in the browser (use the port of your own service). Once the service is running, the page keeps printing ping lines, similar to a heartbeat. When a monitored endpoint is called, the page starts to show data, much like heartbeat packets being sent
    [Figure: output of the hystrix.stream endpoint]
  7. Next, open the Dashboard address and go to the monitoring page for the stream
    [Figure: Dashboard page before entering the monitoring view]
    On the monitoring page you can see the status of the monitored endpoints. There is no traffic yet.
    [Figure: monitoring page with no traffic]
    When we call the endpoint, the monitoring information changes accordingly
    [Figure: monitoring page after requests]
    Notes on reading the chart
      Circle
      The filled circle carries two meanings. Its color represents the health of the instance, which decreases from green to yellow to orange to red. Its size changes with the request traffic of the instance: the larger the traffic, the larger the circle. With the filled circle you can therefore quickly spot faulty instances and instances under heavy load among a large number of instances.
    [Figure: filled circle on the monitoring page]
      Curve
      The curve records the relative change in traffic over the last 2 minutes; use it to observe whether traffic is rising or falling.
    [Figure: traffic curve on the monitoring page]
      Other indicators
    [Figure: other indicators on the monitoring page]

Origin blog.csdn.net/weixin_45481406/article/details/110404439