Hystrix: a framework for rate limiting, degradation, and circuit breaking in distributed systems (Part 2)

III. Hystrix fault tolerance

       Hystrix helps control the interactions between distributed services by adding latency tolerance and fault tolerance logic. It does this by isolating the points of access between services, stopping cascading failures across them, and providing fallback options, all of which improve the overall resilience of the system. Hystrix provides three main fault tolerance mechanisms:

  • Resource isolation
  • Circuit breaking
  • Degradation (fallback)

1. Resource isolation: thread pool





2. Resource isolation: semaphore



Comparison of thread pool and semaphore isolation

Strategy    | Thread switching | Async support | Timeout support | Circuit breaking | Rate limiting | Overhead
Semaphore   | No               | No            | No              | Yes              | Yes           | Low
Thread pool | Yes              | Yes           | Yes             | Yes              | Yes           | High

       When a dependency call involves significant network overhead or is slow, thread pool isolation is the better choice. It keeps most container (Tomcat) threads available, so they do not end up blocked or waiting on a slow service and unable to return quickly. When the call goes to something like a cache service, semaphore isolation is a good fit: such calls usually return very quickly and do not hold the container thread for long, and avoiding the thread switch reduces overhead and makes the cache call more efficient.

       Thread pool isolation suits the vast majority of scenarios, namely calls to dependent services over the network. Semaphore isolation suits calls that do not reach an external dependency but run some relatively complex internal business logic, where plain semaphore-based rate limiting is enough.
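
       As a rough illustration of the semaphore strategy described above, the sketch below limits concurrency on the calling thread with java.util.concurrent.Semaphore. It is plain Java, not Hystrix code; the class name SemaphoreIsolation and its execute method are hypothetical names chosen for this example.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical helper, not part of Hystrix: semaphore-style isolation on the caller's thread. */
class SemaphoreIsolation {
    private final Semaphore permits;

    SemaphoreIsolation(int maxConcurrentRequests) {
        this.permits = new Semaphore(maxConcurrentRequests);
    }

    /** Runs the action on the calling thread, or returns the fallback if no permit is free. */
    <T> T execute(Supplier<T> action, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {   // non-blocking: reject instead of waiting
            return fallback.get();
        }
        try {
            return action.get();       // no thread switch, so no timeout is enforced here
        } finally {
            permits.release();         // always hand the permit back
        }
    }
}
```

       Because the action runs on the caller's thread, there is no thread switch and no way to interrupt a slow call, which matches the "No" entries for the semaphore row in the comparison above.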

3. Circuit breaking

Why use a circuit breaker?

       In a distributed architecture it is very common for an application to depend on several services. If one dependency becomes slow and blocks threads, every call to that dependency blocks as well. If the affected business has a high QPS, a large number of calls can pile up, and the application or service goes down because server resources are exhausted. Failures also propagate between applications: when many upstream services depend on the failing one, a single fault can trigger an avalanche effect.


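The circuit breaker works as follows:

Step 1: call allowRequest() to decide whether the request may be submitted to the thread pool

1. If the circuit breaker is forced open (circuitBreaker.forceOpen is true), the request is not allowed through and the call returns.

2. If the circuit breaker is forced closed (circuitBreaker.forceClosed is true), the request is allowed through. The actual state of the breaker is ignored in this case: the breaker still maintains its statistics and open/closed state, but they have no effect.

Step 2: call isOpen() to determine whether the circuit breaker is open

1. If the circuit breaker is open, go to step 3; otherwise continue.

2. If the total number of requests in the period is less than circuitBreaker.requestVolumeThreshold, the request is allowed through; otherwise continue.

3. If the error rate in the period is less than circuitBreaker.errorThresholdPercentage, the request is allowed through. Otherwise, open the circuit breaker and go to step 3.

Step 3: call allowSingleTest() to decide whether a single request may pass, in order to check whether the dependency has recovered

1. If the circuit breaker is open and the time since it opened, or since the last trial request was let through, exceeds circuitBreaker.sleepWindowInMilliseconds, the circuit breaker enters the half-open state and lets one trial request through; otherwise the request is not allowed.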
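
       The three steps above can be sketched as a small state machine. This is an illustrative reading of the description, not Hystrix's implementation: the class BreakerSketch, its public counters, and the markSuccess method are hypothetical, the thresholds are assumed example values, and the current time is passed in explicitly to keep the logic easy to follow.

```java
/** Hypothetical sketch of the three-step decision; names mirror the text, not Hystrix internals. */
class BreakerSketch {
    // Assumed settings, named after the properties mentioned in the text.
    boolean forceOpen = false;
    boolean forceClosed = false;
    int requestVolumeThreshold = 20;
    int errorThresholdPercentage = 50;
    long sleepWindowInMilliseconds = 5000;

    // Counters for the current statistics period (maintained by the caller in this sketch).
    int totalRequests = 0;
    int failedRequests = 0;

    private boolean open = false;
    private long openedOrLastTestedAt = 0;

    /** Step 1: forced states first, then the normal open and half-open checks. */
    boolean allowRequest(long nowMillis) {
        if (forceOpen) return false;
        if (forceClosed) {
            isOpen(nowMillis);   // state is still tracked, but it does not affect the decision
            return true;
        }
        return !isOpen(nowMillis) || allowSingleTest(nowMillis);
    }

    /** Step 2: trip the breaker once request volume and error rate both reach their thresholds. */
    boolean isOpen(long nowMillis) {
        if (open) return true;
        if (totalRequests == 0 || totalRequests < requestVolumeThreshold) return false;
        int errorPercentage = failedRequests * 100 / totalRequests;
        if (errorPercentage < errorThresholdPercentage) return false;
        open = true;
        openedOrLastTestedAt = nowMillis;
        return true;
    }

    /** Step 3: once the sleep window has passed, let one trial request through (half-open). */
    boolean allowSingleTest(long nowMillis) {
        if (open && nowMillis - openedOrLastTestedAt > sleepWindowInMilliseconds) {
            openedOrLastTestedAt = nowMillis;   // the next trial waits another sleep window
            return true;
        }
        return false;
    }

    /** Called when a trial request succeeds: close the breaker and reset the counters. */
    void markSuccess() {
        open = false;
        totalRequests = 0;
        failedRequests = 0;
    }
}
```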

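In addition, to provide a basis for these decisions, each circuit breaker maintains 10 buckets by default, one per second. When a new bucket is created, the oldest bucket is discarded. Each bucket keeps counters for successful, failed, timed-out, and rejected requests, and Hystrix collects and aggregates these counters.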
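
A minimal sketch of such a bucketed rolling window follows, assuming one-second buckets held in a ring so that the oldest bucket is overwritten when a new one starts. The class RollingCounter and its methods are hypothetical and much simpler than Hystrix's own metrics classes; timestamps are passed in and assumed to be non-negative and non-decreasing.

```java
import java.util.Arrays;

/** Hypothetical rolling counter: a fixed number of equal-length buckets kept in a ring. */
class RollingCounter {
    enum Event { SUCCESS, FAILURE, TIMEOUT, REJECTION }

    private final int numBuckets;
    private final long bucketMillis;
    private final long[][] counts;      // [ring position][event type]
    private final long[] bucketStart;   // start time of the bucket held at each ring position

    RollingCounter(int numBuckets, long bucketMillis) {
        this.numBuckets = numBuckets;
        this.bucketMillis = bucketMillis;
        this.counts = new long[numBuckets][Event.values().length];
        this.bucketStart = new long[numBuckets];
        Arrays.fill(bucketStart, -1L);  // -1 marks a ring position that was never used
    }

    /** Increments the counter for the bucket that covers nowMillis. */
    void record(Event e, long nowMillis) {
        counts[slot(nowMillis)][e.ordinal()]++;
    }

    /** Sums one counter over the buckets that still fall inside the window ending at nowMillis. */
    long sum(Event e, long nowMillis) {
        long newest = align(nowMillis);
        long oldest = newest - (numBuckets - 1) * bucketMillis;
        long total = 0;
        for (int i = 0; i < numBuckets; i++) {
            if (bucketStart[i] >= oldest && bucketStart[i] <= newest) {
                total += counts[i][e.ordinal()];
            }
        }
        return total;
    }

    /** Finds the ring position for nowMillis, discarding the old bucket stored there if needed. */
    private int slot(long nowMillis) {
        long start = align(nowMillis);
        int i = (int) ((start / bucketMillis) % numBuckets);
        if (bucketStart[i] != start) {
            Arrays.fill(counts[i], 0L);
            bucketStart[i] = start;
        }
        return i;
    }

    private long align(long t) {
        return t - t % bucketMillis;
    }
}
```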


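The execution strategy is as follows:

When Hystrix encounters a timed-out or failed request, it starts a 10 s window, and subsequent requests are evaluated as follows:

(1) Check whether the number of requests in the window exceeds the minimum request count

  • If it does not, the request is allowed through.
  • If it does, continue with the logic below.

(2) Check whether the failure rate exceeds a threshold. Errors here mean both timeouts and failures.

  • If it does not, the request is allowed through.
  • If it exceeds the error threshold, continue with the logic below.

(3) The circuit breaker opens

  • Requests fail immediately.
  • A 5 s window starts, and one request is let through every 5 s. If it succeeds, the downstream service has recovered; otherwise the circuit breaker stays open.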

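4. Degradation

       Degradation usually means one of two things. During peak traffic, some less important services are shut down so that core services keep running normally. Or, when a service is unavailable, fallback logic runs so that the call fails fast or returns fast instead of waiting on the failing service, which keeps the main business unaffected.

       To support fallback or degradation, typically for query operations, override the getFallback method of HystrixCommand or the resumeWithFallback method of HystrixObservableCommand. Doing anything that might fail inside the fallback logic is generally discouraged.

Hystrix runs the fallback logic in the following cases:

  • construct() or run() throws an exception
  • The circuit breaker is open and the command is short-circuited
  • The command's thread pool and queue, or its semaphore, are over capacity and the command is rejected
  • The command execution times out

       If the fallback logic needs to make a remote call, wrap that call in a separate HystrixCommand with a different ThreadPoolKey so that it is isolated from the main thread pool.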
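
       To make the fallback triggers listed above more concrete, here is a sketch in plain java.util.concurrent that routes three of them (an exception, a rejected submission, and a timeout) to a fallback. It does not use Hystrix and does not model the circuit breaker case; IsolatedCall and its execute method are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical helper: runs a call on a dedicated pool and falls back on failure. */
class IsolatedCall {
    static <T> T execute(ExecutorService pool, Callable<T> run,
                         long timeoutMillis, Supplier<T> fallback) {
        Future<T> future;
        try {
            future = pool.submit(run);
        } catch (RejectedExecutionException e) {
            return fallback.get();               // pool and queue are over capacity
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);                 // execution timed out: interrupt the worker
            return fallback.get();
        } catch (ExecutionException e) {
            return fallback.get();               // the call itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the caller's interrupt status
            future.cancel(true);
            return fallback.get();
        }
    }
}
```

       The fallback supplier here should be cheap and local, in line with the advice above that fallback logic should not do anything that can itself fail.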

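IV. Hystrix configuration

       Hystrix uses Netflix Archaius for configuration management by default. This project uses ZooKeeper as the configuration source and relies on archaius-zookeeper to configure Hystrix command, circuit breaker, thread pool, and monitoring parameters dynamically. Hystrix parameters are adjusted at runtime as production needs change, which gives us a way to govern the microservices.

       Each Hystrix parameter can be configured in four places, listed below from lowest to highest priority. If the same property is set in more than one place, the higher-priority value overrides the lower-priority one:

  • Built-in global default: the default value hard-coded in Hystrix, such as HystrixCommandProperties.default_executionTimeoutInMilliseconds
  • Dynamic global default property: the default value read from the global configuration file
  • Built-in instance default: the default set when creating a HystrixCommand, through annotations or arguments passed to the parent class constructor
  • Dynamic instance property: the value read from the configuration file for a specific instance

Taking the execution timeout of the Hystrix command sns.grassSearchIndex as an example, the properties from lowest to highest priority are: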

default_executionTimeoutInMilliseconds=1000
hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds=500
HystrixCommandProperties.Setter().withExecutionTimeoutInMilliseconds(1000)
hystrix.command.sns.grassSearchIndex.execution.isolation.thread.timeoutInMilliseconds=1500
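
The four lines above can be read as a lookup chain in which the highest-priority source that defines a value wins. The sketch below illustrates that ordering only; it is not how Hystrix or Archaius resolve properties internally, and TimeoutResolver with its parameters is a hypothetical helper.

```java
import java.util.Map;

/** Hypothetical helper showing the four-level priority order for the execution timeout. */
class TimeoutResolver {
    static final int BUILT_IN_GLOBAL_DEFAULT = 1000;   // hard-coded default from the text

    /**
     * @param dynamicConfig properties read from the configuration source
     * @param commandKey    for example "sns.grassSearchIndex"
     * @param setterValue   value passed in code when creating the command, or null if none
     */
    static int executionTimeout(Map<String, String> dynamicConfig,
                                String commandKey, Integer setterValue) {
        String suffix = ".execution.isolation.thread.timeoutInMilliseconds";
        String instanceKey = "hystrix.command." + commandKey + suffix;
        String globalKey = "hystrix.command.default" + suffix;

        if (dynamicConfig.containsKey(instanceKey)) {   // 4: dynamic instance property
            return Integer.parseInt(dynamicConfig.get(instanceKey));
        }
        if (setterValue != null) {                      // 3: built-in instance default
            return setterValue;
        }
        if (dynamicConfig.containsKey(globalKey)) {     // 2: dynamic global default
            return Integer.parseInt(dynamicConfig.get(globalKey));
        }
        return BUILT_IN_GLOBAL_DEFAULT;                 // 1: built-in global default
    }
}
```

With all four values from the example present, the resolved timeout is 1500, the dynamic instance property.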

Command Configuration

# Isolation strategy: THREAD or SEMAPHORE, default THREAD
hystrix.command.default.execution.isolation.strategy=THREAD
# Service timeout in milliseconds, default 1000
hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds=3000
# Timeout for the sns.grassSearchIndex service
hystrix.command.sns.grassSearchIndex.execution.isolation.thread.timeoutInMilliseconds=2000
# Whether timeout detection is enabled, default true
hystrix.command.default.execution.timeout.enabled=true
# Maximum number of concurrent requests under semaphore isolation; requests beyond it are rejected, default 10
hystrix.command.default.execution.isolation.semaphore.maxConcurrentRequests=10

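Circuit Breaker Configuration

# Whether the circuit breaker is enabled, default true
hystrix.command.default.circuitBreaker.enabled=true
# Minimum request count for the circuit breaker check. The error rate is only evaluated once the number of requests in the time window reaches this threshold; with only one or two requests, the breaker does not open even if all of them fail, because the sample is too small. Default 20
hystrix.command.default.circuitBreaker.requestVolumeThreshold=100
# Error percentage; the circuit is short-circuited once it is exceeded, default 50
hystrix.command.default.circuitBreaker.errorThresholdPercentage=75
# Time needed to go from the open state to the half-open state, that is, how long to wait after the circuit opens before one request is let in. Default 5000
hystrix.command.default.circuitBreaker.sleepWindowInMilliseconds=5000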

# Metrics
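# Size of the statistics time window, default 10000
hystrix.command.default.metrics.rollingStats.timeInMilliseconds=10000
# Number of buckets in the time window. The window size must be evenly divisible by this number, otherwise an error is raised. Each bucket holds counts of success, failure, timeout, and rejection. Default 10
hystrix.command.default.metrics.rollingStats.numBuckets=10
# Interval between health checks. Computing the error rate is CPU-intensive even with windowed statistics, so each computation waits for this interval before starting. Default 500
hystrix.command.default.metrics.healthSnapshot.intervalInMilliseconds=500
# Properties containing rollingPercentile all concern call latency statistics. This one controls whether latency statistics (95th percentile, 99th percentile, and so on) are enabled; if disabled, they all return -1. Default true
hystrix.command.default.metrics.rollingPercentile.enabled=true
# Time window for latency statistics, default 60000
hystrix.command.default.metrics.rollingPercentile.timeInMilliseconds=60000
# Number of buckets for latency statistics
hystrix.command.default.metrics.rollingPercentile.numBuckets=6
# Bucket size for latency statistics. Each bucket keeps data only for this many of the most recent requests; earlier ones are overwritten. For example, with bucket size=100 and window=10s, if there are 500 executions in those 10s, only the last 100 are recorded in the bucket. Increasing this value increases memory and sorting overhead. Default 100
hystrix.command.default.metrics.rollingPercentile.bucketSize=100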

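Thread Pool Configuration

# Core thread count; the pool size stays fixed. Default 10
hystrix.threadpool.default.coreSize=30
# Core thread count of the userLogin isolation thread pool
hystrix.threadpool.userLogin.coreSize=20
# Wait queue size; requests beyond it are rejected. This value cannot be changed dynamically. Default -1
hystrix.threadpool.default.maxQueueSize=50000
# Wait queue size of the userLogin isolation thread pool
hystrix.threadpool.userLogin.maxQueueSize=3000
# Queue size at which requests are rejected, even if maxQueueSize has not been reached. It makes up for maxQueueSize not being dynamically adjustable, so the effective queue size can be controlled through this value
hystrix.threadpool.default.queueSizeRejectionThreshold=45000
# Time window for thread pool statistics, default 10000
hystrix.threadpool.default.metrics.rollingStats.timeInMilliseconds=10000
# Number of buckets the rolling window is divided into, default 10
hystrix.threadpool.default.metrics.rollingStats.numBuckets=10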

V. Service monitoring

Hystrix Dashboard

       Hystrix Dashboard is mainly used to monitor Hystrix metrics in real time. The real-time feedback it provides helps us identify system problems quickly.

       Download the standalone-hystrix-dashboard jar from https://search.maven.org and run it, then visit http://localhost:7979/hystrix-dashboard/ to open the Hystrix Dashboard page:

nohup java -jar -DserverPort=7979 -DbindAddress=localhost standalone-hystrix-dashboard-1.5.3-all.jar &


       For clustered environments, the Turbine tool provided by Netflix can be used for monitoring. Download the turbine-web war package from https://search.maven.org and deploy it, modify the cluster node configuration, then add the Turbine address http://localhost:${port}/turbine.stream?cluster=default to the dashboard for monitoring.


turbine.aggregator.clusterConfig=default
turbine.instanceUrlSuffix=:8080/gateway/hystrix.stream
turbine.ConfigPropertyBasedDiscovery.test.instances=10.66.70.1,10.66.70.2,10.66.70.3


Origin www.cnblogs.com/qingfengEthan/p/12113115.html