Service availability: Hystrix

Introduction to Hystrix

In a distributed microservice architecture, services depend on one another and call each other over RPC. Under high concurrency, the stability of a dependency has a large impact on the service that calls it, and dependencies can fail in many uncontrollable ways: network latency, an overloaded service, blocked or contended threads, or a service that is simply unavailable.
Suppose a user request depends on several services and one of them, service I, becomes unavailable while the others remain healthy. When calls to service I block, most of the threads in the server's thread pool end up blocked (BLOCKED) waiting on it, which undermines the stability of the entire online service.


1, Hystrix overview

Hystrix: controls latency and failures in dependent RPC services by means of circuit breaking (also called tripping), fallback (degradation), rate limiting (isolation), and asynchronous execution.

Circuit breaker

The circuit breaker acts only on the calling side of a service, so only the consumer needs to change.
1) State transition logic
: A. Failure rate of a service = number of failed requests / total number of requests.
: B. Whether the breaker switches from closed to open is determined by comparing the current failure rate with the configured threshold.
:: B1. While the breaker is closed, requests pass through. If the failure rate stays below the threshold, the breaker remains closed; once it reaches the threshold, the breaker switches to open.
:: B2. While the breaker is open, requests are rejected. After a sleep period the breaker automatically enters the half-open state and lets a single request through. If that request succeeds, the breaker returns to closed; if it fails, the breaker stays open and subsequent requests continue to be rejected.
: C. When the called service is failing, the caller gets a result back quickly instead of piling up in synchronous waits.
: D. Because the breaker probes the service again after the sleep period, calls resume once the service has recovered.
2) Parameter settings
: A. circuitBreaker.requestVolumeThreshold // minimum number of requests in the rolling window before the breaker can trip, default 20
: B. circuitBreaker.sleepWindowInMilliseconds // how long after tripping before the breaker lets a trial request through, default 5000 (5 s)
: C. circuitBreaker.errorThresholdPercentage // error percentage threshold, default 50. When at least 20 requests arrive in the window and 50% or more of them fail, the breaker opens; calls to the service then fail immediately without invoking the remote service. After 5 s a trial request decides whether the breaker closes again or stays open.
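
The closed, open, and half-open transitions described above can be sketched in plain Java. This is a toy model for illustration, not Hystrix's implementation: the class name SimpleCircuitBreaker is hypothetical, the counters cover every call since the breaker last closed instead of a rolling window, and the current time is passed in as a parameter so the behaviour is easy to follow.

```java
import java.util.function.Supplier;

/** Toy circuit breaker (hypothetical class, not part of Hystrix). */
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int requestVolumeThreshold;   // minimum requests before the breaker may trip
    private final int errorThresholdPercentage; // error percentage at which it trips
    private final long sleepWindowMillis;       // time to stay open before a trial request

    private State state = State.CLOSED;
    private int total;
    private int failures;
    private long openedAt;

    SimpleCircuitBreaker(int requestVolumeThreshold, int errorThresholdPercentage, long sleepWindowMillis) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.sleepWindowMillis = sleepWindowMillis;
    }

    /** Simplification: the lock is held during the call, so calls are serialized. */
    synchronized <T> T call(Supplier<T> remote, Supplier<T> fallback, long nowMillis) {
        if (state == State.OPEN) {
            if (nowMillis - openedAt < sleepWindowMillis) {
                return fallback.get();           // fail fast, the remote service is not called
            }
            state = State.HALF_OPEN;             // sleep window elapsed: allow one trial request
        }
        try {
            T result = remote.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure(nowMillis);
            return fallback.get();
        }
    }

    private void onSuccess() {
        if (state == State.HALF_OPEN) {          // trial succeeded: close and reset the counters
            state = State.CLOSED;
            total = 0;
            failures = 0;
        } else {
            total++;
        }
    }

    private void onFailure(long nowMillis) {
        if (state == State.HALF_OPEN) {          // trial failed: open again for another sleep window
            trip(nowMillis);
            return;
        }
        total++;
        failures++;
        if (total >= requestVolumeThreshold && failures * 100 >= total * errorThresholdPercentage) {
            trip(nowMillis);
        }
    }

    private void trip(long nowMillis) {
        state = State.OPEN;
        openedAt = nowMillis;
    }

    synchronized State state() {
        return state;
    }
}
```

With the defaults (20, 50, 5000), nineteen consecutive failures leave the breaker closed, the twentieth opens it, and the first call made at least 5000 ms later acts as the trial request.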

Fallback (degradation)

Once a service has been circuit-broken, the remote server is no longer called. The client can instead provide its own local fallback method that returns a default value.

Isolation (rate limiting)

Thread isolation:
Each dependent service is assigned its own thread pool, so resources are isolated and one failing dependency cannot cause a service avalanche.
In production, avoid making the thread pools too large; otherwise a large number of blocked threads can slow down the server.
Semaphore isolation:
Limits concurrent access to a dependency and keeps blocking from spreading. The biggest difference from thread isolation is that the dependent code still runs on the request thread, which must first acquire a semaphore permit.
If the client is trusted and returns quickly, semaphore isolation can be used in place of thread isolation to reduce overhead.
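
As a rough illustration of semaphore isolation, the sketch below uses java.util.concurrent.Semaphore to cap concurrent calls to one dependency. The class name SemaphoreIsolation and its execute method are hypothetical and only model the idea; Hystrix's own implementation differs.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical sketch of semaphore isolation for a single dependency. */
final class SemaphoreIsolation {
    private final Semaphore permits;

    SemaphoreIsolation(int maxConcurrentRequests) {
        this.permits = new Semaphore(maxConcurrentRequests);
    }

    <T> T execute(Supplier<T> call, Supplier<T> fallback) {
        if (!permits.tryAcquire()) {
            return fallback.get();    // limit reached: reject at once instead of queueing
        }
        try {
            return call.get();        // runs on the caller's own thread, no thread hand-off
        } finally {
            permits.release();
        }
    }
}
```

Because the call runs on the request thread, there is no thread switch, but a slow dependency still holds that thread until it returns, which is why this mode suits fast, trusted clients.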

2, Hystrix execution flow

①, Each call creates a new HystrixCommand, with the dependent call wrapped in its run() method.
②, Call execute() or queue() to run it synchronously or asynchronously.
③, Check whether the circuit breaker is open. If it is open, skip to step ⑧ and apply the fallback; if it is closed, continue.
④, Check whether the thread pool, queue, or semaphore is full. If it is, go to the fallback in step ⑧; otherwise continue with the following steps.
⑤, Call HystrixCommand's run() method, which executes the dependent logic.
: 5a: If the call times out, go to step ⑧.
⑥, Determine whether the call succeeded.
: 6a: Return the result of the successful call.
: 6b: If the call failed, go to step ⑧.
⑦, Update the circuit breaker's health: every outcome (success, failure, rejection, timeout) is reported to the breaker, which uses these statistics to decide its state.
⑧, Run the getFallback() degradation logic.
: The following four cases trigger a getFallback() call:
:: (1): run() throws an exception other than HystrixBadRequestException.
:: (2): run() times out.
:: (3): The circuit breaker is open and intercepts the call.
:: (4): The thread pool, queue, or semaphore is full.
: 8a: A command that does not implement getFallback() throws an exception directly.
: 8b: If the fallback call succeeds, its result is returned.
: 8c: If the fallback call itself fails, an exception is thrown.
⑨, Return the result of the successful execution.
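
The steps above can be approximated in plain Java. The sketch below is a simplified model, not Hystrix code: CommandFlowSketch and Outcome are hypothetical names, the breaker state is passed in as a boolean, a Semaphore stands in for the "thread pool / queue / semaphore full" check, and the HystrixBadRequestException exclusion is left out.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical model of the execution flow; step numbers refer to the list above. */
final class CommandFlowSketch {
    /** Outcomes reported to the breaker statistics (step 7). */
    enum Outcome { SUCCESS, FAILURE, TIMEOUT, REJECTED, SHORT_CIRCUITED }

    static <T> T execute(boolean circuitOpen, Semaphore capacity, ExecutorService pool,
                         Callable<T> run, Supplier<T> fallback, long timeoutMillis,
                         Consumer<Outcome> report) {
        if (circuitOpen) {                                  // step 3: breaker open
            report.accept(Outcome.SHORT_CIRCUITED);
            return fallback.get();                          // step 8
        }
        if (!capacity.tryAcquire()) {                       // step 4: no capacity left
            report.accept(Outcome.REJECTED);
            return fallback.get();
        }
        Future<T> future = null;
        try {
            future = pool.submit(run);                      // step 5: run the dependent logic
            T result = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            report.accept(Outcome.SUCCESS);                 // steps 6a and 7
            return result;                                  // step 9
        } catch (TimeoutException e) {                      // step 5a: timed out
            future.cancel(true);
            report.accept(Outcome.TIMEOUT);
            return fallback.get();
        } catch (RejectedExecutionException e) {            // pool refused the task
            report.accept(Outcome.REJECTED);
            return fallback.get();
        } catch (ExecutionException e) {                    // step 6b: run() threw
            report.accept(Outcome.FAILURE);
            return fallback.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            report.accept(Outcome.FAILURE);
            return fallback.get();
        } finally {
            capacity.release();
        }
    }
}
```

Every path that does not return the result of run() ends in the fallback, which mirrors the four trigger cases listed under step ⑧.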

3, Hystrix parameters

Timeout (default 1000 ms; unit: ms)
(1) hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds
Configured on the caller; sets the timeout for all of the caller's methods. It has lower priority than the per-command setting below.
(2) hystrix.command.HystrixCommandKey.execution.isolation.thread.timeoutInMilliseconds
Configured on the caller; sets the timeout for one specific method of the caller (HystrixCommandKey is the method name).
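
The precedence between the two settings can be pictured as a simple lookup: the per-command key wins, then the default key, then the built-in 1000 ms. The helper below is a hypothetical illustration over a plain Map and does not use Hystrix's own property resolution; BestCommand is the command key used later in this article.

```java
import java.util.Map;

/** Hypothetical helper showing the precedence of the two timeout settings. */
final class TimeoutLookup {
    static final int BUILT_IN_DEFAULT_MS = 1000;
    private static final String SUFFIX = ".execution.isolation.thread.timeoutInMilliseconds";

    static int timeoutFor(String commandKey, Map<String, String> props) {
        String specific = props.get("hystrix.command." + commandKey + SUFFIX);
        if (specific != null) {
            return Integer.parseInt(specific);   // per-command setting has the highest priority
        }
        String global = props.get("hystrix.command.default" + SUFFIX);
        if (global != null) {
            return Integer.parseInt(global);     // caller-wide default comes next
        }
        return BUILT_IN_DEFAULT_MS;              // nothing configured
    }
}
```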

Core thread count of the thread pool
hystrix.threadpool.default.coreSize (default is 10)

Queue
(1) hystrix.threadpool.default.maxQueueSize (maximum queue length, default -1, which uses a SynchronousQueue; other values use a LinkedBlockingQueue. Changing the value away from -1 requires a restart, that is, it cannot be adjusted dynamically. To adjust the limit dynamically, use the setting below.)
(2) hystrix.threadpool.default.queueSizeRejectionThreshold (number of queued requests at which new requests are rejected, default 5. When this option is configured, it acts as the effective queue size.)
Note: if maxQueueSize = -1, this option has no effect.
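
How the two queue settings interact can be sketched as follows. QueueSketch is a hypothetical class; treating every non-positive maxQueueSize like -1 is an assumption made here for simplicity.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;

/** Hypothetical sketch of queue selection and the rejection threshold. */
final class QueueSketch {
    /** The queue type is fixed when the pool is created, hence the restart requirement. */
    static BlockingQueue<Runnable> newQueue(int maxQueueSize) {
        if (maxQueueSize <= 0) {
            return new SynchronousQueue<Runnable>();             // no queueing, direct hand-off
        }
        return new LinkedBlockingQueue<Runnable>(maxQueueSize);  // bounded queue
    }

    /** The threshold can change at runtime because it is only compared, never allocated. */
    static boolean hasQueueSpace(int maxQueueSize, int queued, int queueSizeRejectionThreshold) {
        if (maxQueueSize <= 0) {
            return true;                                         // threshold has no effect
        }
        return queued < queueSizeRejectionThreshold;
    }
}
```

With maxQueueSize = 5 and queueSizeRejectionThreshold = 3, as in the example later in this article, the fourth request that would have to wait is rejected even though the queue itself could hold five.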

Circuit breaker
(1) hystrix.command.default.circuitBreaker.requestVolumeThreshold (minimum number of requests in the rolling window before the circuit can trip; default 20)
For example, if the value is 20 and only 19 requests are received in the rolling window (say a window of 10 seconds), the circuit will not trip open even if all 19 failed.
In short, the breaker can trip only after at least 20 requests have arrived within the 10 s window.
(2) hystrix.command.default.circuitBreaker.sleepWindowInMilliseconds (how long after the circuit trips before a request is let through to test whether the service has recovered; default 5 s)
(3) hystrix.command.default.circuitBreaker.errorThresholdPercentage (error percentage threshold; when it is reached, the circuit trips. Default 50%)
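
To make the idea of a rolling window concrete, the sketch below keeps per-bucket counters (for example 10 buckets of 1 s each) and evaluates the two thresholds over the buckets that are still inside the window. RollingCounts is a hypothetical, single-threaded illustration and assumes non-negative timestamps; it is not how Hystrix stores its metrics.

```java
/** Hypothetical rolling-window counter; not thread-safe, timestamps must be >= 0. */
final class RollingCounts {
    private final long bucketMillis;
    private final long[] bucketStart;   // start time of the data held in each slot
    private final int[] total;
    private final int[] errors;

    RollingCounts(int buckets, long bucketMillis) {
        this.bucketMillis = bucketMillis;
        this.bucketStart = new long[buckets];
        this.total = new int[buckets];
        this.errors = new int[buckets];
    }

    void record(long nowMillis, boolean error) {
        int i = slotFor(nowMillis);
        total[i]++;
        if (error) {
            errors[i]++;
        }
    }

    boolean shouldTrip(long nowMillis, int requestVolumeThreshold, int errorThresholdPercentage) {
        long windowMillis = bucketMillis * total.length;
        long currentStart = nowMillis - nowMillis % bucketMillis;
        int requests = 0;
        int failed = 0;
        for (int i = 0; i < total.length; i++) {
            if (currentStart - bucketStart[i] < windowMillis) {   // slot is still inside the window
                requests += total[i];
                failed += errors[i];
            }
        }
        if (requests < requestVolumeThreshold) {
            return false;                                         // too few requests to judge
        }
        return failed * 100 >= requests * errorThresholdPercentage;
    }

    /** Returns the slot for this time, clearing it first if it holds data from an older window. */
    private int slotFor(long nowMillis) {
        long start = nowMillis - nowMillis % bucketMillis;
        int i = (int) ((nowMillis / bucketMillis) % total.length);
        if (bucketStart[i] != start) {
            bucketStart[i] = start;
            total[i] = 0;
            errors[i] = 0;
        }
        return i;
    }
}
```

With 10 buckets of 1000 ms and thresholds 20 and 50, nineteen failures do not trip the breaker, a twentieth does, and the same failures no longer count once they are more than 10 s old.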

Fallback
hystrix.command.default.fallback.isolation.semaphore.maxConcurrentRequests
Maximum number of concurrent calls to HystrixCommand.getFallback() allowed from calling threads; default 10. Calls beyond this limit get an exception. Note: this setting also takes effect when the isolation mode is THREAD.

Integrating Hystrix

1, Technical stack

Project framework: Spring Boot
Distributed coordination: Dubbo
Middleware: ZooKeeper
Logging: SLF4J
Build tool: Maven
IDE: IntelliJ IDEA

2, Preparing the environment

2.1 Add dependencies

  <!--hystrix -->
  <dependency>
    <groupId>com.netflix.hystrix</groupId>
    <artifactId>hystrix-core</artifactId>
    <version>${hystrix-version}</version>
  </dependency>
  <dependency>
    <groupId>com.netflix.hystrix</groupId>
    <artifactId>hystrix-metrics-event-stream</artifactId>
    <version>${hystrix-version}</version>
  </dependency>
  <dependency>
    <groupId>com.netflix.hystrix</groupId>
    <artifactId>hystrix-javanica</artifactId>
    <version>${hystrix-version}</version>
  </dependency>
  <dependency>
    <groupId>com.netflix.hystrix</groupId>
    <artifactId>hystrix-servo-metrics-publisher</artifactId>
    <version>${hystrix-version}</version>
  </dependency>

Here hystrix-version is 1.5.18; substitute the version you need.

2.2 Hystrix configuration

Configure Hystrix on the service caller:
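import com.netflix.hystrix.contrib.javanica.aop.aspectj.HystrixCommandAspect;
import com.netflix.hystrix.contrib.metrics.eventstream.HystrixMetricsStreamServlet;
import org.springframework.boot.web.servlet.ServletRegistrationBean; // package may differ in older Spring Boot versions
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class HystrixConfig {

    // Register the aspect that processes @HystrixCommand annotations
    @Bean
    public HystrixCommandAspect hystrixCommandAspect() {
        return new HystrixCommandAspect();
    }

    // Register the metrics stream servlet
    @Bean
    public ServletRegistrationBean hystrixMetricsStreamServlet() {
        return new ServletRegistrationBean(new HystrixMetricsStreamServlet(), "/hystrix.stream");
    }

}
hystrixCommandAspect() registers HystrixCommandAspect as a Spring bean so that @HystrixCommand annotations take effect; hystrixMetricsStreamServlet() exposes /hystrix.stream so that hystrix-dashboard can monitor the application.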

2.3 Hystrix code implementation

Add the @HystrixCommand annotation to the method that calls the dependent service, and configure its parameters.

import java.util.concurrent.atomic.AtomicInteger;

import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import com.netflix.hystrix.contrib.javanica.annotation.HystrixProperty;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class ComsumerServiceImpl implements ConsumerService {
    private static final Logger LOGGER = LoggerFactory.getLogger(ComsumerServiceImpl.class);

    @Autowired
    private OrderService orderService;

    private AtomicInteger successCount = new AtomicInteger(0);
    private AtomicInteger failCount = new AtomicInteger(0);

    @Override
    @HystrixCommand(groupKey = "BestGroup", commandKey = "BestCommand", threadPoolKey = "best", fallbackMethod = "getNameFallback",
            commandProperties = {
                    //@HystrixProperty(name = "execution.isolation.strategy", value = "SEMAPHORE"), // use SEMAPHORE isolation; the default is THREAD
                    @HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "1000"), // timeout in milliseconds; on timeout the fallback runs
                    @HystrixProperty(name = "circuitBreaker.requestVolumeThreshold", value = "10"), // minimum requests per statistics window (10 s) before the breaker can trip; default 20
                    @HystrixProperty(name = "circuitBreaker.sleepWindowInMilliseconds", value = "5000"), // how long after tripping before a trial request; default 5000 ms
                    @HystrixProperty(name = "circuitBreaker.errorThresholdPercentage", value = "10"), // failure percentage at which the breaker trips; default 50
                    @HystrixProperty(name = "fallback.isolation.semaphore.maxConcurrentRequests", value = "50") // maximum fallback concurrency; default 10
            },
            threadPoolProperties = {
                    @HystrixProperty(name = "coreSize", value = "10"), // core size of the thread pool, i.e. the maximum number of concurrent executions; default 10
                    @HystrixProperty(name = "maxQueueSize", value = "5"), // maximum length of the BlockingQueue; default -1
                    @HystrixProperty(name = "keepAliveTimeMinutes", value = "2"),
                    @HystrixProperty(name = "queueSizeRejectionThreshold", value = "3") // queue size rejection threshold: an artificial maximum queue size at which rejections occur even if maxQueueSize has not been reached.
                    // This property exists because the maxQueueSize of a BlockingQueue cannot be changed dynamically, and we want to allow the queue size that affects rejections to be changed dynamically.
                    // Default: 5. Note: this property does not apply if maxQueueSize == -1.
            })
    public String getName(String id) {
        LOGGER.info("consumer->service->thread:{},id:{}", Thread.currentThread().getName(), id);
        String result = orderService.getOrder(id);
        int i = successCount.incrementAndGet();
        LOGGER.info("success count:{}", i);
        return result;
    }

    // Fallback method; its signature must match that of getName
    public String getNameFallback(String id) {
        LOGGER.info("consumer->service->fallback->thread:{},id:{}", Thread.currentThread().getName(), id);
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        int i = failCount.incrementAndGet();
        LOGGER.info("failure count==={}", i);
        return "fallback";
    }

}
The following simulates a scenario in which the dependent service times out:
1, Set the timeout of the calling method to 1000 milliseconds
@HystrixProperty(name = "execution.isolation.thread.timeoutInMilliseconds", value = "1000")
Note: the effective timeout is the smaller of this value and the Dubbo timeout of the dependent service; here the Dubbo service timeout is set to 2000 milliseconds
<dubbo:reference id="orderService" retries="0" timeout="2000" interface="com.bestpay.service.OrderService" check="false" filter="TraceFilter" />
2, Set the circuit breaker's error threshold to 50%
@HystrixProperty(name = "circuitBreaker.errorThresholdPercentage", value = "50")
3, Make the service provider time out at random, taking more than 1000 ms
import java.util.Random;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class OrderServiceImpl implements OrderService {

    private static final Logger LOGGER = LoggerFactory.getLogger(OrderServiceImpl.class);

    @Override
    public String getOrder(String orderid) {
        LOGGER.info("thread:{}", Thread.currentThread().getName());
        try {
            // Randomly sleep 2000 ms (longer than the 1000 ms Hystrix timeout) or 500 ms (within it)
            boolean flag = new Random().nextBoolean();
            if (flag) {
                Thread.sleep(2000);
            } else {
                Thread.sleep(500);
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        LOGGER.info("provider->service->param:{}", orderid);
        return orderid + ",OK";
    }

}
4, Start the related services, simulate concurrent load with JMeter, and watch the state of the circuit breaker in the Hystrix monitor.
When the error rate exceeds 50%, the circuit breaker opens. Subsequent calls to the dependent service are short-circuited and degraded to the fallback, which protects the calling service and improves its availability.
The above demonstrates only the circuit-breaking scenario; Hystrix isolation is left for the reader to explore.
Note: the error rate is calculated as (Timed-out + Threadpool Rejected + Failure) / Total.
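
The error-rate formula in the note can be written out as a small helper. ErrorRate and its parameter names are hypothetical and simply mirror the counter names in the note; the result is truncated to a whole percentage, which is an assumption about rounding.

```java
/** Hypothetical helper mirroring the error-rate formula from the note. */
final class ErrorRate {
    static int errorPercentage(long timedOut, long threadPoolRejected, long failure, long total) {
        if (total <= 0) {
            return 0;   // no requests yet, so no error rate to report
        }
        return (int) ((timedOut + threadPoolRejected + failure) * 100 / total);
    }
}
```

For instance, 3 timeouts, 1 thread-pool rejection, and 1 failure out of 10 requests give an error rate of 50%.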

About the author: Orange Finance IT, responsible for server-side development, focusing on microservices, distributed systems, performance tuning, and high availability. Peers are welcome to get in touch.

Origin blog.csdn.net/weixin_39178876/article/details/88415338