A First Taste of the Hystrix Circuit Breaker

When a service has many external dependencies and one of them becomes unavailable, the entire cluster can be affected (for example, timeouts cause a large number of requests to block, so new external requests can no longer get in). In this situation, Hystrix is very useful.

For this reason I studied the Hystrix framework and recorded my first experience with it.

I. How It Works

Based on the official website and related blog posts, the working mechanism can be summarized briefly. The general process is as follows:

Request -> check whether the circuit breaker is open -> call the service -> on failure, run the fallback and increment the failure count -> end

The following is the main flow chart

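graph LR
    A(Request)-->B{Circuit breaker open?}
    B --> | open | D[fallback logic]
    B --> | closed | E[Thread pool / Semaphore]
    E --> F{Thread pool full / no semaphore available?}
    F --> | yes | D
    F --> | no | G{Run on pool thread / calling thread succeeded?}
    G --> | yes | I(End)
    G --> | no | D
    D --> I(End)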
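
The flow above can be sketched in plain Java. This is only a simplified model written for this post, not how Hystrix is actually implemented: the class FlowSketch and its methods are made-up names, a java.util.concurrent.Semaphore stands in for the thread pool / semaphore, and the breaker state is a flag that is set by hand.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical, simplified model of the request flow; not Hystrix's real code.
class FlowSketch {
    private final Semaphore permits;                 // stands in for the thread pool / semaphore
    private volatile boolean circuitOpen = false;    // stands in for the circuit breaker state
    private final AtomicInteger failureCount = new AtomicInteger();

    FlowSketch(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    void setCircuitOpen(boolean open) {
        this.circuitOpen = open;
    }

    int getFailureCount() {
        return failureCount.get();
    }

    String execute(Supplier<String> call, Supplier<String> fallback) {
        if (circuitOpen) {
            return fallback.get();          // breaker is open: go straight to fallback
        }
        if (!permits.tryAcquire()) {
            return fallback.get();          // no capacity left: rejected, go to fallback
        }
        try {
            return call.get();              // the protected service call
        } catch (RuntimeException e) {
            failureCount.incrementAndGet(); // failure count +1
            return fallback.get();
        } finally {
            permits.release();
        }
    }
}
```

In real Hystrix the failure count feeds back into the breaker state automatically; here the two are kept separate so that each branch of the chart is easy to see.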

Hystrix mainly provides two isolation strategies: one isolates calls in a thread pool, the other limits them by acquiring a semaphore.

Thread pool mode: supports asynchronous execution, supports timeouts, supports limiting concurrency

Semaphore mode: runs on the calling thread, so there is no asynchronous execution and a slow call cannot be cut short by a timeout; it supports limiting concurrency and has lower overhead
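
To make the difference concrete, here is a minimal plain-Java sketch of the two styles. It is an illustration under my own assumptions, not Hystrix code: IsolationSketch, viaThreadPool and viaSemaphore are hypothetical names, and rejection when the pool is full is left out to keep it short.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper contrasting the two isolation styles.
class IsolationSketch {

    // Thread pool style: the work runs on a pool thread, so the caller can stop waiting after a timeout.
    static String viaThreadPool(ExecutorService pool, Callable<String> call, long timeoutMs) {
        Future<String> future = pool.submit(call);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);                    // interrupt the worker thread
            return "fallback: timeout";
        } catch (Exception e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt(); // keep the interrupt flag for the caller
            }
            return "fallback: failure";
        }
    }

    // Semaphore style: the work runs on the calling thread; only the number of concurrent calls is limited.
    static String viaSemaphore(Semaphore permits, Callable<String> call) {
        if (!permits.tryAcquire()) {
            return "fallback: rejected";            // no permit available
        }
        try {
            return call.call();
        } catch (Exception e) {
            return "fallback: failure";
        } finally {
            permits.release();
        }
    }
}
```

The semaphore variant never hands the work to another thread, which is why it is cheaper but cannot abandon a slow call.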

With these basic concepts in place, we can start trying it out.

II. Trying It Out

1. Introduce dependencies

<dependency>
    <groupId>com.netflix.hystrix</groupId>
    <artifactId>hystrix-core</artifactId>
    <version>1.5.12</version>
</dependency>

2. Basic usage

According to the official documentation, two kinds of command are supported: HystrixObservableCommand, which is based on the observer pattern, and the basic HystrixCommand. Let's start with a simple example:

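import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixCommandKey;
import com.netflix.hystrix.HystrixCommandProperties;
import com.netflix.hystrix.HystrixThreadPoolKey;
import com.netflix.hystrix.HystrixThreadPoolProperties;
import com.netflix.hystrix.exception.HystrixRuntimeException;
import org.junit.Test;

import java.io.IOException;
import java.util.Map;

public class HystrixConfigTest extends HystrixCommand<String> {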

    private final String name;

    public HystrixConfigTest(String name, boolean ans) {
        super(Setter.withGroupKey(
                // The group key is the one required setting for every command. When no ThreadPoolKey
                // is specified, its literal value decides which thread pool/semaphore the command uses.
                HystrixCommandGroupKey.Factory.asKey("CircuitBreakerTestGroup"))
                // Each CommandKey represents one dependency; the same dependency should use the same
                // CommandKey name. Dependency isolation essentially isolates calls that share a CommandKey.
                .andCommandKey(HystrixCommandKey.Factory.asKey("CircuitBreakerTestKey_" + ans))
                // CommandGroup separates business dependencies; different remote calls within the same
                // dependency (e.g. one to Redis, one over HTTP) can be isolated with HystrixThreadPoolKey.
                .andThreadPoolKey(HystrixThreadPoolKey.Factory.asKey("CircuitBreakerTest_" + ans))
                .andThreadPoolPropertiesDefaults(    // configure the thread pool
                        HystrixThreadPoolProperties.Setter()
                                // number of threads in the pool; large enough that the pool does not
                                // fill up before the breaker trips
                                .withCoreSize(12)
                )
                .andCommandPropertiesDefaults(    // configure the circuit breaker
                        HystrixCommandProperties.Setter()
                                .withCircuitBreakerEnabled(true)
                                .withCircuitBreakerRequestVolumeThreshold(3)
                                .withCircuitBreakerErrorThresholdPercentage(80)
//                                .withCircuitBreakerForceOpen(true)    // when true, all requests are rejected and go straight to fallback
//                                .withCircuitBreakerForceClosed(true)    // when true, errors are ignored
//                                .withExecutionIsolationStrategy(HystrixCommandProperties.ExecutionIsolationStrategy.SEMAPHORE)    // semaphore isolation
                                .withExecutionIsolationSemaphoreMaxConcurrentRequests(20)
                                .withExecutionTimeoutEnabled(true)
                                .withExecutionTimeoutInMilliseconds(200)
                                // how long the breaker stays open before a trial request is let through
                                .withCircuitBreakerSleepWindowInMilliseconds(1000)
//                                .withExecutionTimeoutInMilliseconds(5000)
                )
        );
        );
        this.name = name;
    }

    @Override
    protected String run() throws Exception {
        System.out.println("running run():" + name + " thread: " + Thread.currentThread().getName());
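        int num = Integer.parseInt(name);
        if (num % 2 == 0 && num < 10) {    // even numbers below 10: return immediately
            return name;
        } else if (num < 40) {    // sleep longer than the 200 ms timeout to simulate a timeout
            Thread.sleep(300);
            return "sleep+" + name;
        } else {    // 40 and above: return immediately
            return name;
        }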
    }
//
//    @Override
//    protected String getFallback() {
//        Throwable t = this.getExecutionException();
//        if(t instanceof HystrixRuntimeException) {
//            System.out.println(Thread.currentThread() + " --> " + ((HystrixRuntimeException) t).getFailureType());
//        } else if (t instanceof HystrixTimeoutException) {
//            System.out.println(t.getCause());
//        } else {
//            t.printStackTrace();
//        }
//        System.out.println(Thread.currentThread() + " --> ----------over------------");
//        return "CircuitBreaker fallback: " + name;
//    }

    public static class UnitTest {

        @Test
        public void testSynchronous() throws IOException, InterruptedException {
            for (int i = 0; i < 50; i++) {
                if (i == 41) {
                    Thread.sleep(2000);
                }
                try {
                    System.out.println("===========" + new HystrixConfigTest(String.valueOf(i), i % 2 == 0).execute());
                } catch (HystrixRuntimeException e) {
                    System.out.println(i + " : " + e.getFailureType() + " >>>> " + e.getCause() + " <<<<<");
                } catch (Exception e) {
                    System.out.println("a HystrixBadRequestException thrown by run() is caught here: " + e.getCause());
                }
            }

            System.out.println("------ listing existing threads ---------");
            Map<Thread, StackTraceElement[]> map = Thread.getAllStackTraces();
            for (Thread thread : map.keySet()) {
                System.out.println("--->name-->" + thread.getName());
            }
            System.out.println("thread num: " + map.size());

            System.in.read();
        }
    }
}

Usage is relatively simple. The general steps are:

  • Extend the HystrixCommand class
  • Define a constructor that passes the various configuration settings to the parent constructor
  • Implement the run method, which contains the call that the circuit breaker monitors

Writing the code above is fairly easy, but several things are harder to deal with:

  1. What does each configuration item mean, and how does it take effect?
  2. What if some exceptions should not enter the circuit breaker logic?
  3. How can monitoring data be obtained?
  4. How can the various cases be simulated (timeout? service exception? breaker open? thread pool full? no semaphore available? retry in the half-open state?)

3. What I learned from testing

After adding, removing, and changing parts of the code above, I think I understand the following points, although I am not sure all of them are correct.

a. Configuration

  • groupKey determines which thread pool or semaphore is used: unless a ThreadPoolKey is specified, one group corresponds to one thread pool (or semaphore)
  • commandKey is very important; it is used to tell business calls apart
    • Put simply, a group is like an app that provides services, and a command corresponds to one service provided by that app. An app can have multiple services, and here all requests to one app go into one thread pool (or share one semaphore)
  • Enable the circuit breaker, specify the minimum number of requests (within 10s) before it can trip, and specify the condition for tripping (failure rate)
  • Set the isolation strategy (thread pool or semaphore)
  • Set the retry time (by default, 5s after the breaker opens, a few requests are let through to check whether the service has recovered)
  • Set the thread pool size, the semaphore size, and the queue size
  • Set the timeout, and whether the timeout is enabled
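
The trip and recovery rules in the list above can be written down as a small state machine. This is a rough plain-Java sketch, not the Hystrix implementation: BreakerSketch and its methods are names I made up, the 10s rolling window is left out (counts never expire), and any request is allowed once the sleep window has passed. The clock is passed in as a parameter so the behavior is easy to follow.

```java
// Hypothetical sketch of the trip/recover rules; names and structure are illustrative only.
class BreakerSketch {
    private final int requestVolumeThreshold;    // minimum number of requests before the breaker may trip
    private final int errorThresholdPercentage;  // failure rate (percent) that trips the breaker
    private final long sleepWindowMs;            // how long to stay open before allowing a trial request

    private int total;
    private int errors;
    private boolean open;
    private long openedAt;

    BreakerSketch(int requestVolumeThreshold, int errorThresholdPercentage, long sleepWindowMs) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.sleepWindowMs = sleepWindowMs;
    }

    boolean isOpen() {
        return open;
    }

    // A request is allowed when the breaker is closed, or when the sleep window has elapsed.
    boolean allowRequest(long nowMs) {
        return !open || nowMs - openedAt >= sleepWindowMs;
    }

    // Record the outcome of one call observed at time nowMs.
    void record(boolean success, long nowMs) {
        if (open) {
            if (success) {          // trial request succeeded: close and start counting afresh
                open = false;
                total = 0;
                errors = 0;
            } else {                // trial request failed: stay open for another sleep window
                openedAt = nowMs;
            }
            return;
        }
        total++;
        if (!success) {
            errors++;
        }
        if (total >= requestVolumeThreshold && errors * 100 / total >= errorThresholdPercentage) {
            open = true;
            openedAt = nowMs;
        }
    }
}
```

With the values from the sample command (3 requests, 80%, 1000 ms), three consecutive failures open the breaker, requests are refused for the next second, and one successful trial request closes it again.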

b. Usage

The run method is where the actual service call is made. If certain failures should not count toward the circuit breaker's failure rate (for example, the caller used the service incorrectly and the service threw an exception, but the service itself is healthy), wrap the calling logic and rethrow those exceptions as HystrixBadRequestException.

For example:

@Override
protected String run() {
    try {
        return func.apply(route, parameterDescs);
    } catch (Exception e) {
        if (exceptionExcept(e)) {
            // An exception we do not care about: keep it out of the circuit breaker logic
            throw new HystrixBadRequestException("unexpected exception!", e);
        } else {
            throw e;
        }
    }
}
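
The effect of this wrapping can be shown without Hystrix. In the plain-Java sketch below, IgnoredException is a hypothetical stand-in for HystrixBadRequestException and BadRequestSketch is a made-up class: the ignored exception goes straight back to the caller without being counted and without triggering the fallback, while every other exception is counted as a failure.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch: some exceptions are excluded from the failure statistics.
class BadRequestSketch {

    // Stand-in for HystrixBadRequestException (an assumption made for this illustration).
    static class IgnoredException extends RuntimeException {
        IgnoredException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private int failures;

    int getFailures() {
        return failures;
    }

    String call(Callable<String> work) {
        try {
            return work.call();
        } catch (IgnoredException e) {
            throw e;            // caller's mistake: propagate, do not count, no fallback
        } catch (Exception e) {
            failures++;         // a real failure: count it and fall back
            return "fallback";
        }
    }
}
```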

c. How to get the reason for failure

When a failure occurs, Hystrix wraps the original exception in a HystrixRuntimeException, so the caller can handle it as follows:

try {
    System.out.println("===========" + new HystrixConfigTest(String.valueOf(i), i % 2 == 0).execute());
} catch (HystrixRuntimeException e) {
    System.out.println(i + " : " + e.getFailureType() + " >>>> " + e.getCause() + " <<<<<");
} catch (Exception e) {
    System.out.println("a HystrixBadRequestException thrown by run() is caught here: " + e.getCause());
}

When fallback logic is defined, the exception is not thrown to the caller, so the exception information has to be obtained inside the fallback method:

// Get the exception information
Throwable t = this.getExecutionException();

The next step is to find the cause of the failure. If the exception is a HystrixRuntimeException, its FailureType indicates the root cause:

((HystrixRuntimeException) t).getFailureType()

d. How to get statistics

Hystrix provides a set of monitoring plugins, but most teams already have their own monitoring and statistics, so this data has to be adapted to them. I have not yet worked out an elegant way to handle these statistics.

4. Summary

The main goal here was to see how Hystrix can be used. My overall impression is that the design is interesting, but there are too many configuration parameters, and I do not fully understand many of them.

Also, it is not very easy to use when special cases need handling (such as monitoring, alerting, or filtering particular exceptions). The main reason is that I do not yet clearly understand the framework's internal working mechanism.

III. Other

Personal blog: Z+|blog

A personal blog built on hexo + github pages, where I record all my posts from study and work. You are welcome to visit.

Disclaimer

Believing everything in books is worse than having no books at all. The content above is purely one person's opinion. Because my ability and knowledge are limited, if you find bugs or have better suggestions, you are welcome to point them out at any time.

Origin http://43.154.161.224:23101/article/api/json?id=325477090&siteId=291194637