Spring Cloud: Hystrix, the latency and fault-tolerance circuit breaker for distributed systems

Previous article: Spring Cloud - Ribbon load balancing and Feign remote calls, principles and examples

1. Introduction to Hystrix

Hystrix is an open source library for handling latency and fault tolerance in distributed systems. In a distributed system, calls to the many dependencies will inevitably fail at times, for example through timeouts or exceptions. Hystrix ensures that a problem in one dependency does not bring down the whole service, which avoids cascading failures and improves the resilience of the distributed system.
A "circuit breaker" is itself a switching device. When a service unit fails, the breaker's fault monitoring (similar to a blown fuse) returns an expected, processable alternative response (fallback) to the caller, instead of making the caller wait for a long time or throwing an exception that the caller cannot handle. This ensures that the caller's threads are not occupied unnecessarily for long periods, and so prevents the fault from spreading through the distributed system or even causing an avalanche.

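The idea of answering with a fallback instead of waiting can be sketched with plain JDK classes. This is only an illustration under stated assumptions, not Hystrix code: the class name, the callWithFallback helper, and the timeout values are all hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Plain-JDK illustration of "run with a timeout, else fall back"; not Hystrix code. */
final class TimeoutFallbackSketch {

    /** Runs the call on the pool and returns the fallback if it fails or exceeds timeoutMillis. */
    static <T> T callWithFallback(ExecutorService pool, Callable<T> call,
                                  long timeoutMillis, Supplier<T> fallback) {
        Future<T> future = pool.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);            // interrupt the stuck worker thread
            return fallback.get();
        } catch (ExecutionException e) {
            return fallback.get();          // the dependency threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            String fast = callWithFallback(pool, () -> "real user", 200, () -> "default user");
            String slow = callWithFallback(pool, () -> {
                Thread.sleep(2_000);        // simulates a hanging dependency
                return "real user";
            }, 100, () -> "default user");
            System.out.println(fast + " / " + slow);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The caller's thread waits at most timeoutMillis, so a slow dependency costs a bounded amount of time rather than tying up the thread indefinitely.
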
2. The avalanche effect

In a distributed environment, dependencies between services are very common, and one business call usually relies on several underlying services. Consider synchronous calls from a merchandise service to an inventory service: when the inventory service is unavailable, the merchandise service's request threads block. If a large number of requests call the inventory service, the merchandise service may eventually exhaust its resources and be unable to serve any requests. This unavailability can then propagate upward along the call chain, a phenomenon known as the avalanche effect. Source: https://my.oschina.net/7001/blog/1619842

3. How Hystrix works

For these microservice problems, Hystrix offers resource limiting (isolation), timeout monitoring, circuit breaking, and service degradation.

Isolation (thread pool isolation and semaphore isolation): limits the resources that calls to a distributed service may use, so that a problem calling one service does not affect calls to other services.
1) Thread pool isolation: requests are handed to a thread pool, which executes them, returns when the configured timeout expires, and places excess requests in the pool's queue. This mode needs a separate thread pool for each dependency, which costs some resources. The benefit is that it copes with bursty traffic: when a peak arrives, requests that cannot be processed immediately wait in the thread pool queue and are worked off gradually.
2) Semaphore isolation: an atomic counter (or semaphore) records how many requests are currently running. When a request arrives, the counter is checked against the configured maximum. If the maximum has been reached, the new request is rejected; otherwise the counter is incremented, and it is decremented again when the request returns. This mode strictly limits concurrency and returns immediately, so it cannot absorb bursty traffic: during a peak, requests beyond the limit are returned directly without calling the dependency.
Circuit breaking: when the failure rate (failures caused by network faults or timeouts) reaches a threshold, degradation is triggered automatically, so that calls fail fast and the system recovers quickly. Under normal conditions the circuit is closed (Closed). If calls keep failing or timing out, the circuit opens (Open) and all calls during the following period are rejected (fail fast). After some time, the breaker enters the half-open state (Half-Open) and lets a small number of requests through as a trial. If a trial call fails, the circuit returns to the open state; if it succeeds, the circuit returns to the closed state.
Caching: provides request caching and request collapsing (merging).
Monitoring: supports real-time monitoring, alerting, and control (configuration changes).
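
The closed, open, and half-open transitions described above can be sketched in plain Java. This is a minimal illustration, not Hystrix's implementation: it trips on a hypothetical count of consecutive failures rather than on a failure percentage, and the class name, the failureThreshold parameter, and the injected clock are assumptions made for the example.

```java
import java.util.function.LongSupplier;

/** Minimal closed -> open -> half-open state machine; an illustration only. */
final class CircuitBreakerSketch {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;     // assumed: consecutive failures that trip the breaker
    private final long sleepWindowMillis;   // how long calls are rejected before a trial is allowed
    private final LongSupplier clock;       // injectable time source, in milliseconds
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    CircuitBreakerSketch(int failureThreshold, long sleepWindowMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
    }

    /** Returns true if the caller may attempt the request. */
    synchronized boolean allowRequest() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= sleepWindowMillis) {
            state = State.HALF_OPEN;        // sleep window elapsed: let one trial through
            return true;
        }
        return state == State.CLOSED;       // OPEN and HALF_OPEN reject everything else
    }

    /** A successful call (including a successful trial) closes the circuit. */
    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    /** A failed trial, or too many failures in a row, opens the circuit. */
    synchronized void recordFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }

    synchronized State state() {
        return state;
    }
}
```

A caller would check allowRequest() before each remote call, run its fallback when the answer is false, and report the outcome with recordSuccess() or recordFailure().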

4. Getting started with Spring Cloud Hystrix

Hystrix uses a fallback mechanism when calls fail. Within one window of metrics.rollingStats.timeInMilliseconds (default: 10 seconds), if the calls to a service reach circuitBreaker.requestVolumeThreshold (default: 20 requests) and the failure percentage reaches circuitBreaker.errorThresholdPercentage (default: 50%), the circuit opens and the fallback method provided by the developer is executed, so that the whole service does not get stuck at this point.
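
As a rough sketch, the trip rule from the paragraph above comes down to two comparisons on the totals of one rolling window. The class and method names are hypothetical; only the two default values come from the text.

```java
/** Illustrative trip rule using the default values quoted above; not Hystrix source code. */
final class TripRuleSketch {
    static final int REQUEST_VOLUME_THRESHOLD = 20;     // circuitBreaker.requestVolumeThreshold
    static final int ERROR_THRESHOLD_PERCENTAGE = 50;   // circuitBreaker.errorThresholdPercentage

    /** Decides whether the circuit should open, given the totals of one rolling window. */
    static boolean shouldOpen(long totalRequests, long failedRequests) {
        if (totalRequests < REQUEST_VOLUME_THRESHOLD) {
            return false;                               // too little traffic to judge
        }
        long errorPercentage = failedRequests * 100 / totalRequests;
        return errorPercentage >= ERROR_THRESHOLD_PERCENTAGE;
    }

    public static void main(String[] args) {
        System.out.println(shouldOpen(19, 19));   // prints false: below the request volume
        System.out.println(shouldOpen(20, 10));   // prints true: 20 requests, 50% failed
    }
}
```

Note that 19 failures out of 19 requests do not open the circuit, because the window has not yet seen enough traffic.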

4.1 Adding Hystrix

Add the dependency to the pom file:

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix</artifactId>
</dependency>

Add @EnableHystrix or @EnableCircuitBreaker to the startup class (@EnableHystrix is itself annotated with @EnableCircuitBreaker, so the two have exactly the same effect):

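import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.hystrix.EnableHystrix;

@SpringBootApplication
@EnableHystrix
public class Application {

    public static void main(String[] args) {
        // run(...) is a static method, so no "new" is needed
        SpringApplication.run(Application.class, args);
    }

}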

Then add the @HystrixCommand annotation to the controller method (fallbackMethod is the name of the fallback method mentioned above):

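import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.Map;

@RestController
public class TestController {

    @HystrixCommand(fallbackMethod = "defaultUser")
    @GetMapping("/user")
    public Object getUser(Map<String, Object> params) {
        // Call another microservice here; the call may fail.
        // Placeholder: replace this line with the real remote call.
        throw new IllegalStateException("remote call failed");
    }

    /**
     * Returns some default information, or something else useful, when the call fails.
     */
    public Object defaultUser(Map<String, Object> params) {
        return "default value";
    }
}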

Configuration can be added to the @HystrixCommand annotation: the commandProperties attribute takes a list of @HystrixProperty annotations. For more details, see the official Hystrix wiki.

4.2 Propagating the security context or using Spring scopes

If you want some thread-local context to propagate into a @HystrixCommand, the default declaration does not work, because it executes the command in a thread pool (in case of timeouts). You can switch Hystrix to use the same thread as the caller, either through configuration or directly in the annotation, by asking it to use a different "isolation strategy". The following example shows how to set this in the annotation:

@HystrixCommand(fallbackMethod = "stubMyService",
    commandProperties = {
      @HystrixProperty(name="execution.isolation.strategy", value="SEMAPHORE")
    }
)

The same applies if you use @SessionScope or @RequestScope: if you encounter a runtime exception saying that it cannot find the scoped context, you need to use the same thread. You can also set the hystrix.shareSecurityContext property to true. Doing so automatically configures a Hystrix concurrency strategy plugin hook that transfers the SecurityContext from the main thread to the thread used by the Hystrix command. Hystrix does not allow multiple concurrency strategies to be registered, so as an extension mechanism you can declare your own HystrixConcurrencyStrategy as a Spring bean.
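
Why thread-local context is lost on a pool thread, and how wrapping the task restores it, can be shown with plain JDK classes. This is only a sketch of the general idea behind a concurrency strategy that wraps callables; CURRENT_USER and wrap are hypothetical names, not Hystrix or Spring APIs.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Shows why thread-local context is lost on a pool thread and how wrapping restores it. */
final class ContextPropagationSketch {
    // Stand-in for a security or request context; the name is hypothetical.
    static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    /** Captures the caller's context now and re-installs it around the task on the worker thread. */
    static <T> Callable<T> wrap(Callable<T> task) {
        final String captured = CURRENT_USER.get();     // read on the calling thread
        return () -> {
            String previous = CURRENT_USER.get();
            CURRENT_USER.set(captured);
            try {
                return task.call();
            } finally {
                CURRENT_USER.set(previous);             // do not leak context into pooled threads
            }
        };
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            CURRENT_USER.set("alice");
            Callable<String> readUser = () -> CURRENT_USER.get();
            System.out.println(pool.submit(readUser).get());        // prints "null"
            System.out.println(pool.submit(wrap(readUser)).get());  // prints "alice"
        } finally {
            CURRENT_USER.remove();
            pool.shutdown();
        }
    }
}
```

With SEMAPHORE isolation none of this is needed, because the command runs on the caller's own thread and sees its thread-local state directly.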

4.3 Health Indicators

The state of the connected circuit breakers is exposed through the health indicator, as shown in the following example:

{
    "hystrix": {
        "openCircuitBreakers": [
            "StoreIntegration::getStoresByLocationLink"
        ],
        "status": "CIRCUIT_OPEN"
    },
    "status": "UP"
}

4.4 Hystrix metrics stream

To enable the Hystrix metrics stream, add the spring-boot-starter-actuator dependency and set management.endpoints.web.exposure.include: hystrix.stream. The stream is then available at /actuator/hystrix.stream, where you can view the monitoring status.

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

4.5 Circuit breaker: Hystrix Dashboard

One of the main benefits of Hystrix is the set of metrics it gathers about each HystrixCommand. The Hystrix Dashboard displays the health of each circuit breaker in an efficient manner. For details, see the official documentation: https://cloud.spring.io/spring-cloud-static/spring-cloud-netflix/2.2.1.RELEASE/reference/html/#netflix-hystrix-dashboard-starter

Add the dependency to the pom file:

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-hystrix-dashboard</artifactId>
</dependency>

Add the @EnableHystrixDashboard annotation to the startup class:

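import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.hystrix.EnableHystrix;
import org.springframework.cloud.netflix.hystrix.dashboard.EnableHystrixDashboard;

@SpringBootApplication
@EnableHystrix
@EnableHystrixDashboard
public class Application {

    public static void main(String[] args) {
        // run(...) is a static method, so no "new" is needed
        SpringApplication.run(Application.class, args);
    }

}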

Open /hystrix to see the monitoring interface.

Dashboard explanation:
Solid circle: it carries two meanings. Its color indicates the health of the instance, with health declining from green to red. Its size changes with the instance's request traffic: the greater the traffic, the larger the circle. The solid circles therefore let you quickly spot failing instances and heavily loaded instances among a large number of instances.
Curve: records the relative change in traffic over the last 2 minutes, so you can observe whether traffic is rising or falling.

4.6 Integrating Feign with Hystrix

Feign supports Hystrix out of the box, but since the Spring Cloud Dalston release this support is disabled by default, because not every business scenario needs it. To use it you must first enable it with the following yml configuration:

feign:
  hystrix:
    enabled: true

Then set the fallback attribute in the @FeignClient annotation:

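import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.stereotype.Component;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;

@FeignClient(value = "SERVER-ORDER", fallback = OrderServiceFallBack.class)
public interface OrderServiceClient {

    // Declared as Object so that the fallback below can return a Result.
    @RequestMapping("/order")
    public Object getOrder(@RequestParam("ud") String id);

}

@Component
public class OrderServiceFallBack implements OrderServiceClient {
    @Override
    public Object getOrder(String id) {
        return Result.error("fallback test");
    }
}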

If you need the specific error message, you can write a fallback factory like this:

import feign.hystrix.FallbackFactory;
import org.springframework.stereotype.Component;

@Component
public class OrderServiceClientFallBackFactory implements FallbackFactory<OrderServiceClient> {
    @Override
    public OrderServiceClient create(Throwable throwable) {
        return new OrderServiceClient() {
            @Override
            public Object getOrder(String id) {
                String message = throwable.getMessage();
                // Log the error message or run other business logic here.
                return Result.error("fallback test");
            }
        };
    }
}

The client then only needs to specify fallbackFactory:

@FeignClient(value = "SERVER-ORDER", fallbackFactory = OrderServiceClientFallBackFactory.class)
public interface OrderServiceClient {

    // Declared as Object to match the return type used by the fallback factory.
    @RequestMapping("/order")
    public Object getOrder(@RequestParam("ud") String id);

}
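
The difference between a plain fallback and a fallback factory is that the factory receives the cause of the failure. A minimal plain-Java sketch of that idea follows; it does not use Feign, and OrderClient, getOrderSafely, and the messages are all hypothetical names chosen for the illustration.

```java
import java.util.function.Function;

/** Plain-JDK illustration of the fallback-factory idea; all names here are hypothetical. */
final class FallbackFactorySketch {
    interface OrderClient {
        String getOrder(String id);
    }

    /** Calls the primary client; on failure, builds a fallback that knows the cause and uses it. */
    static String getOrderSafely(OrderClient primary,
                                 Function<Throwable, OrderClient> fallbackFactory,
                                 String id) {
        try {
            return primary.getOrder(id);
        } catch (RuntimeException cause) {
            return fallbackFactory.apply(cause).getOrder(id);
        }
    }

    public static void main(String[] args) {
        OrderClient failing = id -> {
            throw new IllegalStateException("connection refused");
        };
        Function<Throwable, OrderClient> factory =
                cause -> id -> "fallback for " + id + " (" + cause.getMessage() + ")";
        System.out.println(getOrderSafely(failing, factory, "42"));
        // prints "fallback for 42 (connection refused)"
    }
}
```

Because the factory creates the fallback per failure, the fallback can log or branch on the exception, which a single shared fallback bean cannot do.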

At this point, the integration of Hystrix and Feign is complete. For the source, see the course microservice code (service-provider-course) in springcloud-Demo.

5. Important configuration properties

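Execution properties

hystrix.command.default.execution.isolation.strategy: the isolation strategy. Default THREAD; options are THREAD | SEMAPHORE.

hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds: the command execution timeout. Default 1000 ms.

hystrix.command.default.execution.timeout.enabled: whether execution has a timeout. Default true.

hystrix.command.default.execution.isolation.thread.interruptOnTimeout: whether to interrupt the execution when a timeout occurs. Default true.

hystrix.command.default.execution.isolation.semaphore.maxConcurrentRequests: the maximum number of concurrent requests. Default 10. This parameter only takes effect with the ExecutionIsolationStrategy.SEMAPHORE strategy. Once the maximum is reached, further requests are rejected. In theory the semaphore size is chosen by the same principle as the thread pool size, but a semaphore should only be used when each unit of execution is small and fast (millisecond level); otherwise use THREAD. The semaphore should be only a small fraction of the container's (Tomcat's) thread pool.

Fallback properties

These parameters apply to both the THREAD and SEMAPHORE strategies.

hystrix.command.default.fallback.isolation.semaphore.maxConcurrentRequests: if the number of concurrent fallback calls reaches this value, further requests are rejected with an exception and the fallback is not called. Default 10.

hystrix.command.default.fallback.enabled: whether hystrixCommand.getFallback() is attempted when execution fails or a request is rejected. Default true.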
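
The behavior of execution.isolation.semaphore.maxConcurrentRequests can be sketched with a JDK Semaphore. This is a simplified illustration, not Hystrix code; the class name and the execute method are assumptions made for the example.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Semaphore-style isolation on the caller's thread; an illustration, not Hystrix code. */
final class SemaphoreIsolationSketch {
    private final Semaphore permits;

    SemaphoreIsolationSketch(int maxConcurrentRequests) {
        this.permits = new Semaphore(maxConcurrentRequests);
    }

    /** Runs the call if a permit is free; otherwise rejects immediately with the fallback. */
    <T> T execute(Supplier<T> call, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get();          // limit reached: fail fast, never queue
        }
        try {
            return call.get();
        } catch (RuntimeException e) {
            return fallback.get();          // the call itself failed
        } finally {
            permits.release();              // counter goes back down when the request returns
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Because tryAcquire() never blocks, a request over the limit costs almost nothing, which is why this mode suits short, fast calls and cannot buffer a traffic peak.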

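Circuit breaker properties

hystrix.command.default.circuitBreaker.enabled: whether a circuit breaker tracks the health of the circuit and short-circuits requests when it is unhealthy. Default true.

hystrix.command.default.circuitBreaker.requestVolumeThreshold: the minimum number of requests in a rolling window. If set to 20 and only 19 requests arrive within one rolling window (for example, 10 seconds), the circuit does not trip even if all 19 fail. Default 20.

hystrix.command.default.circuitBreaker.sleepWindowInMilliseconds: how long requests are rejected after the circuit trips. If set to 5000, all requests are rejected for 5000 ms after the circuit opens; only after that does the breaker let a trial request through to decide whether to close the circuit again. Default 5000.

hystrix.command.default.circuitBreaker.errorThresholdPercentage: the error rate threshold. If the error rate is >= this value, the circuit opens, all requests are short-circuited, and the fallback is triggered. Default 50.

hystrix.command.default.circuitBreaker.forceOpen: forces the circuit breaker open. If this switch is on, all requests are rejected. Default false.

hystrix.command.default.circuitBreaker.forceClosed: forces the circuit breaker closed. If this switch is on, the circuit stays closed and ignores circuitBreaker.errorThresholdPercentage. Default false.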

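Metrics properties

hystrix.command.default.metrics.rollingStats.timeInMilliseconds: the length of the statistics window in milliseconds. Whether the circuit opens is calculated from the statistics of one rolling window. If the rolling window is set to 10000 ms, it is divided into n buckets, and each bucket holds the counts of successes, failures, timeouts, and rejections. Default 10000.

hystrix.command.default.metrics.rollingStats.numBuckets: the number of buckets a rolling window is divided into. With numBuckets = 10 and a rolling window of 10000 ms, each bucket covers 1 second. The values must satisfy rolling window % numBuckets == 0. Default 10.

hystrix.command.default.metrics.rollingPercentile.enabled: whether execution latencies are tracked and calculated as percentiles. Default true.

hystrix.command.default.metrics.rollingPercentile.timeInMilliseconds: the length of the rolling percentile window. Default 60000.

hystrix.command.default.metrics.rollingPercentile.numBuckets: the number of buckets in the rolling percentile window; the logic is the same as above. Default 6.

hystrix.command.default.metrics.rollingPercentile.bucketSize: if the bucket size is 100 and a bucket covers 10 s, then out of 500 executions in those 10 s only the last 100 are recorded in the bucket. Increasing this value increases memory use and sorting cost. Default 100.

hystrix.command.default.metrics.healthSnapshot.intervalInMilliseconds: the interval between health snapshots (used to compute success and error rates). Default 500 ms.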
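
How a rolling window divided into buckets behaves can be sketched as follows. This is a simplified illustration of the rollingStats idea rather than Hystrix's data structure: it tracks only successes and failures, takes the current time as a parameter, and uses hypothetical class and method names.

```java
import java.util.Arrays;

/** Bucketed rolling counter in the spirit of metrics.rollingStats.*; an illustration only. */
final class RollingWindowSketch {
    private final long bucketMillis;
    private final long[] bucketStart;   // start time of the data held in each slot, -1 if unused
    private final int[] successes;
    private final int[] failures;

    RollingWindowSketch(long windowMillis, int numBuckets) {
        if (windowMillis % numBuckets != 0) {
            throw new IllegalArgumentException("window must divide evenly into buckets");
        }
        this.bucketMillis = windowMillis / numBuckets;
        this.bucketStart = new long[numBuckets];
        Arrays.fill(bucketStart, -1L);
        this.successes = new int[numBuckets];
        this.failures = new int[numBuckets];
    }

    /** Records one call outcome in the bucket that covers nowMillis. */
    synchronized void record(long nowMillis, boolean success) {
        long start = nowMillis - nowMillis % bucketMillis;
        int slot = (int) ((nowMillis / bucketMillis) % bucketStart.length);
        if (bucketStart[slot] != start) {   // slot holds an expired bucket: reuse it
            bucketStart[slot] = start;
            successes[slot] = 0;
            failures[slot] = 0;
        }
        if (success) {
            successes[slot]++;
        } else {
            failures[slot]++;
        }
    }

    /** Error percentage over the buckets that still fall inside the window ending at nowMillis. */
    synchronized int errorPercentage(long nowMillis) {
        long windowMillis = bucketMillis * bucketStart.length;
        long total = 0;
        long failed = 0;
        for (int i = 0; i < bucketStart.length; i++) {
            if (bucketStart[i] >= 0 && nowMillis - bucketStart[i] < windowMillis) {
                total += successes[i] + failures[i];
                failed += failures[i];
            }
        }
        return total == 0 ? 0 : (int) (failed * 100 / total);
    }
}
```

With new RollingWindowSketch(10_000, 10) each bucket covers 1 second, and old outcomes drop out of the statistics one whole bucket at a time.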


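Request context properties

hystrix.command.default.requestCache.enabled: default true. Requires overriding getCacheKey(); nothing is cached when it returns null.

hystrix.command.default.requestLog.enabled: whether executions are logged to HystrixRequestLog. Default true.

Collapser properties

hystrix.collapser.default.maxRequestsInBatch: the maximum number of requests in a single batch; reaching this number triggers the batch. Default Integer.MAX_VALUE.

hystrix.collapser.default.timerDelayInMilliseconds: the delay before a batch is triggered, that is, the batch runs at its creation time plus this value. Default 10.

hystrix.collapser.default.requestCache.enabled: whether request caching is enabled for HystrixCollapser.execute() and HystrixCollapser.queue(). Default true.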
 
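Thread pool properties

The default of 10 threads suits most cases (and can sometimes be set lower). If it needs to be higher, there is a basic formula to follow: requests per second at peak when healthy × 99th percentile latency in seconds + some breathing room. That is, peak requests per second × (99th percentile response time + a buffer). For example, with 1000 requests per second and a 99th percentile response time of 60 ms, the formula gives 1000 × (0.060 + 0.012) = 72.

The basic principle is to keep the thread pool as small as possible, since it mainly exists to shed load and prevent resources from being blocked. When everything is healthy, the pool usually has only 1 or 2 threads active serving requests.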
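
The sizing arithmetic above can be written as a small helper. This is a sketch only: the method name and the choice to pass latencies in whole milliseconds (which avoids floating-point rounding) are assumptions, and the 12 ms headroom is simply the figure from the example.

```java
/** Worked example of the sizing rule quoted above, using whole milliseconds. */
final class PoolSizingSketch {

    /** threads = peak requests per second x (p99 latency + headroom), rounded up. */
    static long suggestedThreads(long peakRequestsPerSecond, long p99LatencyMillis, long headroomMillis) {
        long busyMillisPerSecond = peakRequestsPerSecond * (p99LatencyMillis + headroomMillis);
        return (busyMillisPerSecond + 999) / 1000;      // ceiling division by 1000 ms
    }

    public static void main(String[] args) {
        System.out.println(suggestedThreads(1000, 60, 12));   // prints 72
    }
}
```

Each request keeps a thread busy for its latency, so the product is the number of threads occupied at any instant during the peak.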
 
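hystrix.threadpool.default.coreSize: the maximum number of threads executing concurrently. Default 10.

hystrix.threadpool.default.maxQueueSize: the maximum size of the BlockingQueue. When set to -1, a SynchronousQueue is used; when positive, a LinkedBlockingQueue is used. This setting only takes effect at initialization; afterwards the thread pool's queue size cannot be changed without reinitializing the thread executor. Default -1.

hystrix.threadpool.default.queueSizeRejectionThreshold: requests are rejected once the queue reaches this value, even if maxQueueSize has not been reached. Because maxQueueSize cannot be changed dynamically, this parameter makes it possible to adjust the effective limit dynamically. If maxQueueSize == -1, this property has no effect.

hystrix.threadpool.default.keepAliveTimeMinutes: has no effect if corePoolSize and maxPoolSize are set to the same value (the default implementation). It only matters if you use a custom implementation through a plugin (https://github.com/Netflix/Hystrix/wiki/Plugins). Default 1.

hystrix.threadpool.default.metrics.rollingStats.timeInMilliseconds: the length of the thread pool statistics window. Default 10000.

hystrix.threadpool.default.metrics.rollingStats.numBuckets: the number of buckets the rolling window is divided into. Default 10.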
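
The interplay of maxQueueSize and queueSizeRejectionThreshold can be mirrored with plain JDK queue classes. This is a sketch of the rules as described above, not Hystrix source code; the class and method names are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;

/** Mirrors the queue rules described above with plain JDK classes; an illustration only. */
final class QueueConfigSketch {

    /** A non-positive maxQueueSize means direct hand-off; a positive value means a bounded queue. */
    static BlockingQueue<Runnable> newQueue(int maxQueueSize) {
        if (maxQueueSize <= 0) {
            return new SynchronousQueue<Runnable>();
        }
        return new LinkedBlockingQueue<Runnable>(maxQueueSize);
    }

    /** A request is rejected once the queue already holds rejectionThreshold items. */
    static boolean hasQueueSpace(int currentQueueSize, int maxQueueSize, int rejectionThreshold) {
        if (maxQueueSize <= 0) {
            return true;    // no queue in use, so the threshold does not apply
        }
        return currentQueueSize < rejectionThreshold;
    }
}
```

The queue itself is created once with a fixed capacity, while the threshold is just a number compared on every request, which is why only the threshold can be changed at runtime.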

For a more detailed and complete list of configuration properties, see the official Hystrix documentation: https://github.com/Netflix/Hystrix/wiki/Configuration
Next article: Spring Cloud - Config, the configuration center