【线上踩坑】Spring Cloud Hystrix线程池不足

现象:

昨天突然线上很多接口获取失败,通过 kibana发现大量异常,具体异常信息:

...into fallback. Rejected command because thread-pool queueSize is at rejection threshold.

异常代码出处:

@FeignClient(name = "api", fallbackFactory = LoadBalancingFallbackFactory.class)
public interface LoadBalancingFeignClient {

    @PostMapping(value = "/api/loadBalancing/server")
    Result currentServer();

}

@Slf4j
@Component
public class LoadBalancingFallbackFactory implements FallbackFactory<LoadBalancingFeignClient> {

    @Override
    public LoadBalancingFeignClient create(Throwable throwable) {
        final String msg = throwable.getMessage();
        return () -> {
            log.error("loadBalancingFeignClient currentServer into fallback. {}", msg);
            return Result.error();
        };****
    }

}复制代码

原因:

看到这里已经很明显了,是由于hystrix线程池不够用,直接熔断导致的。项目apollo配置:

hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds = 3500
hystrix.threadpool.default.maxQueueSize = 60
hystrix.threadpool.default.queueSizeRejectionThreshold = 40复制代码

hystrix参数简析:

maxQueueSize:线程池大小,默认为-1,创建的队列是SynchronousQueue,如果设置大于0则根据其大小创建LinkedBlockingQueue。

queueSizeRejectionThreshold:动态控制线程池队列的上限,即使maxQueueSize没有达到,达到queueSizeRejectionThreshold该值后,请求也会被拒绝,默认值5

相关源码:

hystrix-core-1.5.12-sources.jar!/com/netflix/hystrix/strategy/concurrency/HystrixContextScheduler.java

    private class HystrixContextSchedulerWorker extends Worker {

        private final Worker worker;

        private HystrixContextSchedulerWorker(Worker actualWorker) {
            this.worker = actualWorker;
        }

        @Override
        public void unsubscribe() {
            worker.unsubscribe();
        }

        @Override
        public boolean isUnsubscribed() {
            return worker.isUnsubscribed();
        }

        @Override
        public Subscription schedule(Action0 action, long delayTime, TimeUnit unit) {
            if (threadPool != null) {
                if (!threadPool.isQueueSpaceAvailable()) {
                    throw new RejectedExecutionException("Rejected command because thread-pool queueSize is at rejection threshold.");
                }
            }
            return worker.schedule(new HystrixContexSchedulerAction(concurrencyStrategy, action), delayTime, unit);
        }

        @Override
        public Subscription schedule(Action0 action) {
            if (threadPool != null) {
                if (!threadPool.isQueueSpaceAvailable()) {
                    throw new RejectedExecutionException("Rejected command because thread-pool queueSize is at rejection threshold.");
                }
            }
            return worker.schedule(new HystrixContexSchedulerAction(concurrencyStrategy, action));
        }

    }复制代码

解决办法:

  • 适当调大Hystrix线程队列参数
  • 动态水平扩容服务
  • 优化下游服务,减少服务响应时间


【线上踩坑】系列主要是用于简单记录工作中实际遇到的线上问题,如需要更深入的学习和了解,可以直接联系作者。


file


file

猜你喜欢

转载自juejin.im/post/5e3a29e7f265da573255188f