Dubbo Source Code Analysis 8: More on the Provider Thread Pool Being EXHAUSTED

         In the last article, "Dubbo Source Code Implementation VI", we learned that the Provider role in a Dubbo cluster has two thread pools: an IO thread pool (unbounded by default) and a business thread pool (200 threads by default). Under high concurrency, or when business processing slows down, the business thread pool easily fills up and throws "RejectedExecutionException: Thread pool is EXHAUSTED!". The premise, of course, is that the Provider's thread pool is configured without a waiting queue (which is the default).
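To make the failure mode concrete, here is a minimal, self-contained sketch using the plain JDK (not Dubbo code; the class and method names are illustrative). It has the same shape as Dubbo's default fixed provider pool: a fixed number of workers and a SynchronousQueue, i.e. no waiting queue, so a task is rejected the moment every worker is busy.

```java
import java.util.concurrent.*;

// Illustrative sketch: mimics Dubbo's default "fixed" provider pool, which uses
// a SynchronousQueue when queues=0, so a request is rejected as soon as every
// worker thread is busy -- the "Thread pool is EXHAUSTED!" situation.
public class FixedPoolRejectionDemo {

    // Returns true if the extra task was rejected.
    public static boolean isThirdTaskRejected() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>(),          // queues = 0: no waiting queue
                new ThreadPoolExecutor.AbortPolicy());     // Dubbo uses AbortPolicyWithReport
        CountDownLatch ready = new CountDownLatch(2);
        CountDownLatch block = new CountDownLatch(1);
        Runnable slowTask = () -> {
            ready.countDown();
            try { block.await(); } catch (InterruptedException ignored) { }
        };
        pool.execute(slowTask);
        pool.execute(slowTask);
        ready.await();                                     // both workers are now busy
        boolean rejected = false;
        try {
            pool.execute(slowTask);                        // no queue, no free worker
        } catch (RejectedExecutionException e) {
            rejected = true;                               // analogous to "EXHAUSTED!"
        }
        block.countDown();
        pool.shutdown();
        return rejected;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(isThirdTaskRejected() ? "rejected" : "accepted");
    }
}
```

Configuring a waiting queue (queues &gt; 0) would delay the rejection instead of raising it immediately, which is why the exception only appears with the no-queue default.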

          Since the Provider throws this exception, it is clearly overloaded. Yet online we found the Consumer side oblivious: its requests just sat there waiting until they timed out. This can easily trigger an "avalanche" across the whole system, because it violates the fail-fast principle. What we want is that once the Provider cannot accept requests because its thread pool is full, the Consumer senses it immediately and fails fast, releasing its own thread. It turned out the Dispatcher was misconfigured: the default is all, and it should be message.

         Let's look at what happens from the source code, assuming Netty implements the IO layer. As mentioned last time, NettyHandler, NettyServer, MultiMessageHandler, and HeartbeatHandler all implement the ChannelHandler interface, which defines operations such as receiving, sending, disconnecting, and exception handling. These four handlers are invoked in sequence on the IO thread. But which handler comes after HeartbeatHandler? This is where the Dispatcher comes into play: it is Dubbo's scheduler, deciding whether an operation runs on the IO thread or in the business thread pool. Here is the official diagram (http://dubbo.io/user-guide/demos/threadmodel.html):
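The chain-plus-dispatcher idea can be sketched as a tiny decorator (a self-contained toy; the Handler interface and all names below are illustrative, not Dubbo's API). The dispatcher wrapper sits between the IO-side handlers and the business logic and simply moves the call onto another thread pool:

```java
import java.util.concurrent.*;

// Toy one-method handler standing in for Dubbo's ChannelHandler (illustrative).
interface Handler {
    void received(String message);
}

public class DispatchChainDemo {

    // "all"/"message"-style wrapper: hands the event over to a thread pool,
    // so the wrapped handler runs off the calling (IO) thread.
    static Handler dispatchToPool(Handler next, ExecutorService pool) {
        return message -> pool.execute(() -> next.received(message));
    }

    // Returns the name of the thread that actually ran the business handler.
    public static String threadThatHandled(String message) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor(
                r -> new Thread(r, "business-pool"));
        final String[] name = new String[1];
        CountDownLatch done = new CountDownLatch(1);
        Handler business = msg -> {
            name[0] = Thread.currentThread().getName();
            done.countDown();
        };
        Handler chain = dispatchToPool(business, pool);
        chain.received(message);   // called from the "IO" (here: main) thread
        done.await();
        pool.shutdown();
        return name[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("handled on: " + threadThatHandled("request-1"));
    }
}
```

A "direct"-style dispatcher would simply return `next` unchanged, keeping everything on the IO thread.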

[Figure: Dubbo thread model — the Dispatcher routes events from the IO threads to the business ThreadPool]

The ThreadPool after the Dispatcher in the figure above is what we have been calling the business thread pool. There are five Dispatcher strategies, with all as the default. Summarizing the official documentation (the original post included a screenshot here):

all: all messages are dispatched to the business thread pool, including requests, responses, connection events, disconnection events, and heartbeats.
direct: no message is dispatched to the thread pool; everything executes directly on the IO thread.
message: only request and response messages are dispatched to the thread pool; connection, disconnection, and heartbeat events execute directly on the IO thread.
execution: only request messages are dispatched to the thread pool; everything else, responses included, executes on the IO thread.
connection: connection and disconnection events are queued and executed one by one in order on the IO thread; other messages are dispatched to the thread pool.

Because the default is all, requests, responses, connection events, disconnection events, and heartbeats are all handled by the business thread pool, which undoubtedly adds to its burden, since it holds only 200 threads by default. Each Dispatcher has a corresponding ChannelHandler, and these ChannelHandlers form a call chain. If all is configured, AllChannelHandler comes next in the chain; if message is configured, MessageOnlyChannelHandler comes next. These ChannelHandlers are all subclasses of WrappedChannelHandler, which by default simply delegates request, response, connection, disconnection, and heartbeat handling to the wrapped Handler:

protected final ChannelHandler handler;

public void connected(Channel channel) throws RemotingException {
    handler.connected(channel);
}

public void disconnected(Channel channel) throws RemotingException {
    handler.disconnected(channel);
}

public void sent(Channel channel, Object message) throws RemotingException {
    handler.sent(channel, message);
}

public void received(Channel channel, Object message) throws RemotingException {
    handler.received(channel, message);
}

public void caught(Channel channel, Throwable exception) throws RemotingException {
    handler.caught(channel, exception);
}

 

Clearly, if WrappedChannelHandler's methods are used directly, the handler calls complete on the current thread (here, the IO thread). Now let's look at the implementation of AllChannelHandler:

public void connected(Channel channel) throws RemotingException {
    ExecutorService cexecutor = getExecutorService();
    try {
        cexecutor.execute(new ChannelEventRunnable(channel, handler, ChannelState.CONNECTED));
    } catch (Throwable t) {
        throw new ExecutionException("connect event", channel, getClass() + " error when process connected event .", t);
    }
}

public void disconnected(Channel channel) throws RemotingException {
    ExecutorService cexecutor = getExecutorService();
    try {
        cexecutor.execute(new ChannelEventRunnable(channel, handler, ChannelState.DISCONNECTED));
    } catch (Throwable t) {
        throw new ExecutionException("disconnect event", channel, getClass() + " error when process disconnected event .", t);
    }
}

public void received(Channel channel, Object message) throws RemotingException {
    ExecutorService cexecutor = getExecutorService();
    try {
        cexecutor.execute(new ChannelEventRunnable(channel, handler, ChannelState.RECEIVED, message));
    } catch (Throwable t) {
        throw new ExecutionException(message, channel, getClass() + " error when process received event .", t);
    }
}

public void caught(Channel channel, Throwable exception) throws RemotingException {
    ExecutorService cexecutor = getExecutorService();
    try {
        cexecutor.execute(new ChannelEventRunnable(channel, handler, ChannelState.CAUGHT, exception));
    } catch (Throwable t) {
        throw new ExecutionException("caught event", channel, getClass() + " error when process caught event .", t);
    }
}

 

As you can see, AllChannelHandler overrides all of WrappedChannelHandler's key operations and submits them to the ExecutorService (the business thread pool) for asynchronous processing. The only method without an asynchronous version is sent, which is used for sending responses; yet the official documentation says that with all, responses also go through the business thread pool. Is the documentation wrong? No, and here is the key point: once the business thread pool is full, execute throws a RejectedExecutionException, and processing falls into the caught method, which also uses the business thread pool. Very likely the pool is still full at that moment, and the tragedy completes: the downstream HeaderExchangeHandler never gets a chance to run. Since the error response after exception handling is produced by HeaderExchangeHandler#caught, NettyHandler#writeRequested is never invoked either, so the Consumer never receives the Provider's "thread pool exhausted" exception and can only wait until it times out.

          From the above analysis we can conclude: with the Dispatcher set to all, once the Provider's thread pool is full, exception handling also requires the business thread pool. If you are lucky and a thread in the pool happens to be idle, the Consumer receives the Provider's "thread pool exhausted" exception; but very likely the pool is still full at that moment, in which case no thread is available for the exception-handling and response steps, no response reaches the Consumer, and the Consumer can only wait until it times out!
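The failure chain can be reproduced with a minimal sketch (plain JDK, illustrative names, not Dubbo code): saturate a one-thread pool, then model received() and caught() both needing that same pool, as AllChannelHandler does. No error response ever gets out:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of the failure chain under dispatcher=all: the rejected request's
// exception handling ALSO needs the business pool, so when the pool is still
// full, no thread ever writes the error response back to the consumer.
public class AllDispatcherFailureDemo {

    // Returns true if an error response was "sent" back to the consumer.
    public static boolean errorResponseSent() throws InterruptedException {
        ThreadPoolExecutor businessPool = new ThreadPoolExecutor(
                1, 1, 0, TimeUnit.MILLISECONDS, new SynchronousQueue<Runnable>());
        CountDownLatch busy = new CountDownLatch(1);
        CountDownLatch block = new CountDownLatch(1);
        businessPool.execute(() -> {            // saturate the single worker
            busy.countDown();
            try { block.await(); } catch (InterruptedException ignored) { }
        });
        busy.await();

        AtomicBoolean responseSent = new AtomicBoolean(false);
        try {
            // received(): dispatcher=all hands the request to the business pool
            businessPool.execute(() -> responseSent.set(true));
        } catch (RejectedExecutionException rejected) {
            try {
                // caught(): under dispatcher=all this ALSO needs the business pool
                businessPool.execute(() -> responseSent.set(true));
            } catch (RejectedExecutionException rejectedAgain) {
                // pool still full -> the error response is never written
            }
        }
        block.countDown();
        businessPool.shutdown();
        return responseSent.get();
    }
}
```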

 

           This is why on the Consumer side we sometimes see the thread-pool-exhausted exception, and other times a timeout exception.

 

        Why does setting the Dispatcher to message avoid this problem? Look directly at the implementation of MessageOnlyChannelHandler:

 

public void received(Channel channel, Object message) throws RemotingException {
    ExecutorService cexecutor = executor;
    if (cexecutor == null || cexecutor.isShutdown()) {
        cexecutor = SHARED_EXECUTOR;
    }
    try {
        cexecutor.execute(new ChannelEventRunnable(channel, handler, ChannelState.RECEIVED, message));
    } catch (Throwable t) {
        throw new ExecutionException(message, channel, getClass() + " error when process received event .", t);
    }
}

 

That's right: MessageOnlyChannelHandler only overrides WrappedChannelHandler's received method, which means only request processing uses the business thread pool, while all other non-business operations execute directly on the IO thread. Isn't that exactly what we want? With the message Dispatcher, the situation where the Provider's thread pool is full yet the Consumer keeps waiting no longer occurs, because the IO thread pool is unbounded by default, so there is always a thread available to handle exceptions and responses (if you deliberately make it bounded, then I have nothing to say).
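For contrast, here is a sketch of the message behavior (again plain JDK with illustrative names, not Dubbo code): when the business pool rejects the request, the exception handling runs inline on the calling (IO) thread, which is always available, so the error can still be reported back to the Consumer:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of dispatcher=message: only received() goes to the business pool;
// exception handling stays on the IO thread, so the rejection can be reported.
public class MessageDispatcherDemo {

    // Returns true if an error response was "sent" back to the consumer.
    public static boolean errorResponseSent() throws InterruptedException {
        ThreadPoolExecutor businessPool = new ThreadPoolExecutor(
                1, 1, 0, TimeUnit.MILLISECONDS, new SynchronousQueue<Runnable>());
        CountDownLatch busy = new CountDownLatch(1);
        CountDownLatch block = new CountDownLatch(1);
        businessPool.execute(() -> {            // saturate the single worker
            busy.countDown();
            try { block.await(); } catch (InterruptedException ignored) { }
        });
        busy.await();

        AtomicBoolean responseSent = new AtomicBoolean(false);
        try {
            // received(): dispatched to the (full) business pool -> rejected
            businessPool.execute(() -> responseSent.set(true));
        } catch (RejectedExecutionException rejected) {
            // exception handling runs inline on the IO thread -- no pool needed,
            // so the error response can still be written to the consumer
            responseSent.set(true);
        }
        block.countDown();
        businessPool.shutdown();
        return responseSent.get();
    }
}
```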

 

Therefore, to reduce the risk of a system-wide avalanche when the Provider thread pool is full, it is recommended to set the Dispatcher to message:


<dubbo:protocol name="dubbo" port="8888" threads="500" dispatcher="message" />
