Jingdong architect: Analysis of Hystrix thread isolation technology

Meet Hystrix

Hystrix is a fault-tolerance framework open sourced by Netflix. It bundles the common fault-tolerance techniques: thread isolation, semaphore isolation, degradation (fallback) strategies, and circuit breaking.
Under highly concurrent access, the stability of the services a system depends on has a large impact on the system itself, and those dependencies bring many uncontrollable factors: slow network connections, resources that suddenly become busy or temporarily unavailable, services going offline, and so on. To build a stable and reliable distributed system, we need a set of fault-tolerance techniques like these.
This article mainly discusses thread isolation techniques.

Why thread isolation is needed

Suppose we have three kinds of business calls: order query, commodity query, and user query. Each relies on a third-party service (the order service, the commodity service, and the user service), and all three services are called via RPC. If calls to the order service block while a large number of order queries keep arriving, the number of threads in the container keeps growing until the container's threads and CPU resources are exhausted and the entire service becomes unavailable to the outside world. In a cluster environment, this turns into an avalanche, as shown below.

 

[Figure: Order service unavailable]

[Figure: The entire Tomcat container is unavailable]
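The idea behind the remedy can be sketched with plain JDK executors, without Hystrix. In this minimal sketch, each dependency gets its own small pool, so hung order calls can only exhaust the order pool. The class name, the method name, and the pool size of 2 are all assumptions made for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of per-dependency pools (not Hystrix code).
final class BulkheadSketch {

    /** Returns true if a commodity query still completes while the order pool is stuck. */
    static boolean commodityStillAnswers() {
        ExecutorService orderPool = Executors.newFixedThreadPool(2);
        ExecutorService commodityPool = Executors.newFixedThreadPool(2);
        CountDownLatch orderServiceHangs = new CountDownLatch(1);
        try {
            // Simulate an order service that never answers: both order threads block.
            Callable<String> stuckOrderCall = () -> {
                orderServiceHangs.await();
                return "orders";
            };
            orderPool.submit(stuckOrderCall);
            orderPool.submit(stuckOrderCall);

            // The commodity pool has its own threads, so this call is unaffected.
            Callable<String> commodityCall = () -> "commodities";
            Future<String> commodity = commodityPool.submit(commodityCall);
            return "commodities".equals(commodity.get(1, TimeUnit.SECONDS));
        } catch (Exception e) {
            return false;
        } finally {
            orderServiceHangs.countDown(); // release the simulated hang
            orderPool.shutdown();
            commodityPool.shutdown();
        }
    }
}
```

If all three queries shared one pool instead, the two stuck order calls would occupy every thread and the commodity query would wait behind them.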

How Hystrix achieves thread isolation through thread pools

Hystrix uses the Command pattern to encapsulate each type of business request in a corresponding command: query order -> order Command, query commodity -> commodity Command, query user -> user Command. Each type of Command corresponds to a thread pool. Once created, the thread pool is stored in a ConcurrentHashMap. For the order query, for example:

final static ConcurrentHashMap<String, HystrixThreadPool> threadPools = new ConcurrentHashMap<String, HystrixThreadPool>();
threadPools.put("hystrix-order", new HystrixThreadPoolDefault(threadPoolKey, propertiesBuilder));

When a second order query arrives, the thread pool is fetched directly from the Map. The process is as follows:

[Figure: Hystrix thread execution process and asynchrony]
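The "create once, then reuse" lookup described above can be sketched with the JDK alone. This is not Hystrix's actual code: the class name, the method name, and the fixed pool size of 2 are assumptions for illustration, and a plain ExecutorService stands in for HystrixThreadPool.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical registry that keeps one pool per key (not Hystrix code).
final class PoolRegistrySketch {

    private static final ConcurrentHashMap<String, ExecutorService> POOLS =
            new ConcurrentHashMap<>();

    /** Returns the pool registered for the key, creating it on first use. */
    static ExecutorService poolFor(String key) {
        // computeIfAbsent creates the pool at most once per key, even under concurrent calls.
        return POOLS.computeIfAbsent(key, k -> Executors.newFixedThreadPool(2));
    }
}
```

Two calls to poolFor("hystrix-order") return the same pool instance, while poolFor("hystrix-user") returns a different one, so each command type keeps its own threads.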

The threads in the pool are created as follows (from the Hystrix source code):

public ThreadPoolExecutor getThreadPool(final HystrixThreadPoolKey threadPoolKey, HystrixProperty<Integer> corePoolSize, HystrixProperty<Integer> maximumPoolSize, HystrixProperty<Integer> keepAliveTime, TimeUnit unit, BlockingQueue<Runnable> workQueue) {
    ThreadFactory threadFactory = null;
    if (!PlatformSpecific.isAppEngineStandardEnvironment()) {
        threadFactory = new ThreadFactory() {
            protected final AtomicInteger threadNumber = new AtomicInteger(0);

            @Override
            public Thread newThread(Runnable r) {
                Thread thread = new Thread(r, "hystrix-" + threadPoolKey.name() + "-" + threadNumber.incrementAndGet());
                thread.setDaemon(true);
                return thread;
            }

        };
    } else {
        threadFactory = PlatformSpecific.getAppEngineThreadFactory();
    }

    final int dynamicCoreSize = corePoolSize.get();
    final int dynamicMaximumSize = maximumPoolSize.get();

    if (dynamicCoreSize > dynamicMaximumSize) {
        logger.error("Hystrix ThreadPool configuration at startup for : " + threadPoolKey.name() + " is trying to set coreSize = " +
                dynamicCoreSize + " and maximumSize = " + dynamicMaximumSize + ".  Maximum size will be set to " +
                dynamicCoreSize + ", the coreSize value, since it must be equal to or greater than the coreSize value");
        return new ThreadPoolExecutor(dynamicCoreSize, dynamicCoreSize, keepAliveTime.get(), unit, workQueue, threadFactory);
    } else {
        return new ThreadPoolExecutor(dynamicCoreSize, dynamicMaximumSize, keepAliveTime.get(), unit, workQueue, threadFactory);
    }
}
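Two details of the method above are easy to try out in isolation: the thread naming scheme, and the clamping of the maximum size, which is needed because the ThreadPoolExecutor constructor throws IllegalArgumentException when maximumPoolSize is smaller than corePoolSize. The following is a reduced sketch under assumptions, not the Hystrix implementation: the class and method names, the one-minute keep-alive, and the unbounded queue are chosen only for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, reduced version of the pool creation logic (not Hystrix code).
final class NamedPoolSketch {

    static ThreadPoolExecutor newPool(String key, int coreSize, int maximumSize) {
        AtomicInteger threadNumber = new AtomicInteger(0);
        ThreadFactory factory = r -> {
            // Same naming scheme as in the source above: hystrix-<key>-<n>
            Thread t = new Thread(r, "hystrix-" + key + "-" + threadNumber.incrementAndGet());
            t.setDaemon(true); // daemon threads do not keep the JVM alive
            return t;
        };
        // Clamp: the constructor rejects maximumSize < coreSize.
        int effectiveMax = Math.max(coreSize, maximumSize);
        return new ThreadPoolExecutor(coreSize, effectiveMax, 1, TimeUnit.MINUTES,
                new LinkedBlockingQueue<>(), factory);
    }

    /** Runs one task in a fresh pool and returns the name of the thread that ran it. */
    static String firstThreadName(String key) {
        ThreadPoolExecutor pool = newPool(key, 2, 1); // misconfigured on purpose
        try {
            Callable<String> task = () -> Thread.currentThread().getName();
            return pool.submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException("task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Such thread names make it easy to tell from a thread dump which dependency a blocked thread belongs to.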

There are four ways to execute a Command. They differ as follows:

  • execute(): Runs run() synchronously. After execute() is called, Hystrix runs run() on a thread from the command's thread pool, and the caller blocks at the execute() call until run() completes.

  • queue(): Runs run() asynchronously. queue() returns a Future immediately while Hystrix runs run() on a thread from the command's thread pool. The caller obtains the result of run() through Future.get(), which blocks until the result is available.

  • observe(): Starts run()/construct() before any observer is registered. First, calling observe() immediately triggers run()/construct() (for a subclass of HystrixCommand, Hystrix runs run() on a pool thread without blocking the caller; for a subclass of HystrixObservableCommand, construct() runs on the calling thread by default). Second, the caller invokes subscribe() on the Observable returned by observe() to register its callbacks. If run()/construct() succeeds, onNext() and onCompleted() are triggered; if it fails, onError() is triggered.

  • toObservable(): Starts run()/construct() only after an observer is registered. First, toObservable() returns an Observable (for example, Observable<String>) without executing anything. Second, calling subscribe() triggers run()/construct() (for a subclass of HystrixCommand, Hystrix runs run() on a pool thread and the caller does not wait for it; for a subclass of HystrixObservableCommand, construct() runs on the calling thread by default, so the caller waits for construct() to finish before continuing). If run()/construct() succeeds, onNext() and onCompleted() are triggered; if it fails, onError() is triggered.
    Note:
    execute() and queue() exist only on HystrixCommand, while observe() and toObservable() are available on both HystrixCommand and HystrixObservableCommand. Under the hood, HystrixCommand is itself implemented with Observable (the Hystrix source code uses RxJava heavily), even though it returns only a single result. The queue() method of HystrixCommand actually calls toObservable().toBlocking().toFuture(), and execute() actually calls queue().get().
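The relationship between execute() and queue() can be illustrated with a stripped-down stand-in built only on JDK types. MiniCommand and ExecutionModesSketch are hypothetical names and this is not how Hystrix is implemented internally (Hystrix goes through RxJava, as noted above). The sketch only shows the shape of the contract: queue() hands the work to a pool and returns a Future, and execute() is queue() followed by a blocking get().

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a command (not Hystrix code).
abstract class MiniCommand<R> {
    private final ExecutorService pool;

    MiniCommand(ExecutorService pool) {
        this.pool = pool;
    }

    /** The dependency call; runs on a pool thread, never on the caller's thread. */
    protected abstract R run() throws Exception;

    /** Asynchronous: returns immediately with a Future. */
    Future<R> queue() {
        Callable<R> task = this::run;
        return pool.submit(task);
    }

    /** Synchronous: the same as queue(), but the caller blocks for the result. */
    R execute() {
        try {
            return queue().get();
        } catch (Exception e) {
            throw new IllegalStateException("command failed", e);
        }
    }
}

final class ExecutionModesSketch {
    static String runBoth() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            MiniCommand<String> cmd = new MiniCommand<String>(pool) {
                @Override
                protected String run() {
                    return "orders";
                }
            };
            String sync = cmd.execute();          // blocks until run() finishes
            Future<String> future = cmd.queue();  // returns at once
            String async = future.get();          // blocks only here
            return sync + "," + async;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

One difference worth noting: this sketch reuses the same command object twice for brevity, whereas a real Hystrix command instance can be executed only once.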

How to apply this in real code

package myHystrix.threadpool;

import com.netflix.hystrix.*;
import org.junit.Test;

import java.util.List;
import java.util.concurrent.Future;

/**
 * Created by wangxindong on 2017/8/4.
 */
public class GetOrderCommand extends HystrixCommand<List> {

    // Not initialized in this example; in real code it would be injected or passed in.
    OrderService orderService;

    public GetOrderCommand(String name){
        super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey("ThreadPoolTestGroup"))
                .andCommandKey(HystrixCommandKey.Factory.asKey("testCommandKey"))
                .andThreadPoolKey(HystrixThreadPoolKey.Factory.asKey(name))
                .andCommandPropertiesDefaults(
                        HystrixCommandProperties.Setter()
                                .withExecutionTimeoutInMilliseconds(5000)
                )
                .andThreadPoolPropertiesDefaults(
                        HystrixThreadPoolProperties.Setter()
                                .withMaxQueueSize(10)   // queue size
                                .withCoreSize(2)    // number of threads in the thread pool
                )
        );
    }

    @Override
    protected List run() throws Exception {
        return orderService.getOrderList();
    }

    public static class UnitTest {
        @Test
        public void testGetOrder(){
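            // Synchronous alternative: new GetOrderCommand("hystrix-order").execute();
            // Asynchronous call: queue() returns a Future; call future.get() to wait for the result.
            Future<List> future = new GetOrderCommand("hystrix-order").queue();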
        }

    }
}

Summary

The thread that executes the dependency's code is separated from the request thread (such as a Tomcat thread), so the request thread is free to decide when to stop waiting and leave. This is what we usually call asynchronous programming, and Hystrix implements it on top of RxJava. The thread pool size limits the number of concurrent calls; once the threads are saturated, further requests are rejected, which keeps a dependency's problems from spreading.

[Figure: Thread isolation]
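Rejection on saturation can be reproduced with a plain ThreadPoolExecutor. The sketch below assumes 2 threads and a 10-slot queue, mirroring the values in the GetOrderCommand example, and simulates a dependency that blocks. The class and method names are hypothetical. Hystrix's effective limits may differ from this sketch, since it also has a separate queueSizeRejectionThreshold property.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of rejecting work once a bounded pool is full (not Hystrix code).
final class SaturationSketch {

    /** Submits blocking tasks to a pool with 2 threads and 10 queue slots; returns the number rejected. */
    static int rejectedOutOf(int requests) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>(10));
        int rejected = 0;
        try {
            for (int i = 0; i < requests; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // simulate a dependency that does not answer
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    // Both threads are busy and the queue is full: fail fast.
                    // A real caller would return a fallback (degraded) result here.
                    rejected++;
                }
            }
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
        }
        return rejected;
    }
}
```

With 15 requests, 2 occupy the threads, 10 wait in the queue, and the remaining 3 are rejected immediately instead of piling up behind the slow dependency.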

 

Advantages of thread isolation:
[1]: The application is fully protected: even if the thread pool of one dependent service is full, other parts of the application are unaffected.
[2]: A new client library can be introduced with much lower risk: if a problem occurs, it stays confined to that library and does not affect anything else, so we can adopt new libraries boldly.
[3]: When a failed dependent service returns to normal, the application immediately resumes normal performance.
[4]: If some parameters of the application are misconfigured, the state of the thread pool reveals it quickly (through latency, timeouts, rejections, and so on), and the wrong configuration can be corrected at run time through dynamic properties.
[5]: If a service's performance changes and needs tuning, such as raising or lowering the timeout or changing the number of retries, this shows up in the thread pool metrics and can be adjusted through dynamic properties without affecting other callers.
[6]: Beyond isolation, the dedicated thread pools provide built-in concurrency, which makes it possible to build an asynchronous facade on top of synchronous calls, so asynchronous programming becomes easy (Hystrix uses the RxJava asynchronous framework).

Although thread pools provide thread isolation, the underlying client code must still set its own timeouts. It must not block indefinitely, or the thread pool will stay saturated.
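A caller-side timeout can be sketched with Future.get(timeout). The names below are hypothetical, and the sketch creates a single-thread executor per call only to stay self-contained. Note the limitation that motivates the paragraph above: cancel(true) only interrupts the worker, and blocking socket I/O generally does not react to interrupts, so the client library still needs its own connect and read timeouts to actually free the pool thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration of a caller-side timeout with a fallback (not Hystrix code).
final class TimeoutSketch {

    static String callWithTimeout(Callable<String> remoteCall, long timeoutMillis, String fallback) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<String> future = pool.submit(remoteCall);
        try {
            // The caller waits at most timeoutMillis, no matter how slow the call is.
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker if it is interruptible
            return fallback;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("call failed", e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

For example, a call that sleeps for five seconds with a 100 ms timeout returns the fallback, while a call that answers at once returns its own result.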

Disadvantages of thread isolation:
[1]: The main disadvantage of thread pools is the added computational overhead: executing each business request (wrapped in a command) involves queuing, scheduling, and context switching. Netflix, however, considers this overhead small enough that it has no significant cost or performance impact.

The Netflix API processes 10+ billion Hystrix Command executions per day using thread isolation. Each API instance has 40+ thread-pools with 5–20 threads in each (most are set to 10).

Services that do not depend on network access, such as those that only read an in-memory cache, are not well suited to thread pool isolation; semaphore isolation fits them better and will be introduced in a later article.
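As a rough preview of the idea only, semaphore isolation can be sketched with java.util.concurrent.Semaphore: the work runs on the caller's own thread, so there is no queuing or thread handoff, and the semaphore merely caps how many callers may be inside at once. The class and method names are hypothetical, and this is not Hystrix's implementation.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical illustration of limiting concurrency with a semaphore (not Hystrix code).
final class SemaphoreIsolationSketch {

    private final Semaphore permits;

    SemaphoreIsolationSketch(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs the work on the calling thread if a permit is free; otherwise returns the fallback. */
    <T> T call(Supplier<T> work, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback; // too many concurrent callers: reject without waiting
        }
        try {
            return work.get();
        } finally {
            permits.release();
        }
    }
}
```

Because the work runs on the calling thread, this approach cannot cut off a call that hangs, which is one reason it suits fast in-memory lookups rather than network calls.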

Therefore, we can confidently use Hystrix's thread isolation to guard against avalanche failures, which can be fatal in production.

