Netty source: how the Selector.select bug fix is implemented

1 Overview

You probably already know about the Java NIO Selector.select bug: select can return early even though no event is ready. Since select is normally called in a loop, the loop then spins idly.

Netty takes this into account in NioEventLoop. It counts how many times select returns abnormally (the comments in the Netty source call this "prematurely", that is, returning early), and once the count exceeds a threshold it creates a new Selector to work around the bug.
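
The counting idea can be sketched in isolation. This is a simplified illustration, not Netty's code: the class PrematureReturnCounter and its record method are hypothetical names, and the bookkeeping is reduced to "reset on a normal return, count otherwise" (the real loop, shown in section 3, resets the counter to 1 rather than 0).

```java
/** Hypothetical helper that tracks consecutive premature select returns. */
final class PrematureReturnCounter {
    private final int threshold; // 0 or less means "never rebuild"
    private int count;

    PrematureReturnCounter(int threshold) {
        this.threshold = threshold;
    }

    /**
     * Records one return from select.
     * Returns true when the caller should rebuild the Selector.
     */
    boolean record(int selectedKeys, boolean timeoutElapsed) {
        if (selectedKeys != 0 || timeoutElapsed) {
            count = 0; // normal return: something was ready or the timeout passed
            return false;
        }
        count++; // returned early with nothing ready
        if (threshold > 0 && count >= threshold) {
            count = 0;
            return true;
        }
        return false;
    }

    int count() {
        return count;
    }

    public static void main(String[] args) {
        PrematureReturnCounter counter = new PrematureReturnCounter(3);
        for (int i = 1; i <= 3; i++) {
            boolean rebuild = counter.record(0, false);
            System.out.println("premature return " + i + ", rebuild = " + rebuild);
        }
    }
}
```

With a threshold of 3, the third consecutive premature return is the one that asks for a rebuild; any normal return in between starts the count over.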

2 Configuration

Netty provides the configuration parameter io.netty.selectorAutoRebuildThreshold, which lets users set how many premature select returns are tolerated before a new Selector is created. Once the count reaches this threshold, the Selector is rebuilt automatically. The default is 512.

However, if io.netty.selectorAutoRebuildThreshold is set to a value less than 3, Netty treats the feature as disabled.
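
As a rough illustration of the rule above, the sketch below reads the property and applies the "less than 3 means disabled" rule. The class RebuildThresholdSketch and its methods are hypothetical and are not Netty's own code; only the property name, the default of 512 and the minimum of 3 come from the text above.

```java
/** Hypothetical sketch of how the threshold setting could be interpreted. */
final class RebuildThresholdSketch {
    static final String PROPERTY = "io.netty.selectorAutoRebuildThreshold";
    static final int DEFAULT_THRESHOLD = 512;
    // Assumed to be 3, matching the "less than 3" rule described in the text.
    static final int MIN_PREMATURE_SELECTOR_RETURNS = 3;

    /** Returns the effective threshold; 0 means the workaround is disabled. */
    static int effectiveThreshold(int configured) {
        return configured < MIN_PREMATURE_SELECTOR_RETURNS ? 0 : configured;
    }

    /** Reads the system property, falling back to the default when it is unset. */
    static int fromSystemProperty() {
        return effectiveThreshold(Integer.getInteger(PROPERTY, DEFAULT_THRESHOLD));
    }

    public static void main(String[] args) {
        System.out.println(effectiveThreshold(512)); // stays enabled
        System.out.println(effectiveThreshold(2));   // treated as disabled
        System.setProperty(PROPERTY, "1024");
        System.out.println(fromSystemProperty());
    }
}
```

In practice the value is usually passed as a JVM flag, for example -Dio.netty.selectorAutoRebuildThreshold=1024, so that it is in place before any event loop is created.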

3 The principle

Netty's logic for detecting and handling premature returns from Selector.select lives mainly in the NioEventLoop.select method:

//NioEventLoop
private void select(boolean oldWakenUp) throws IOException {
    Selector selector = this.selector;
    try {
        // Reset the counter to 0
        int selectCnt = 0;
        long currentTimeNanos = System.nanoTime();
        // Compute the deadline for this select call from the registered scheduled tasks
        long selectDeadLineNanos = currentTimeNanos + delayNanos(currentTimeNanos);
        for (;;) {
            // Recompute on every iteration how long select may block
            long timeoutMillis = (selectDeadLineNanos - currentTimeNanos + 500000L) / 1000000L;
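            // If the time left to block is 0 or less, a scheduled task is about to expire.
            // If this is the first iteration (selectCnt == 0), call
            // selector.selectNow once, then leave the loop and return.
            // selectNow is called mainly to pick up, as far as possible,
            // any network events that are already ready for processing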
            if (timeoutMillis <= 0) {
                if (selectCnt == 0) {
                    selector.selectNow();
                    selectCnt = 1;
                }
                break;
            }

            // If a task was submitted when wakenUp value was true, the task didn't get a chance to call
            // Selector#wakeup. So we need to check task queue again before executing select operation.
            // If we don't, the task might be pended until select operation was timed out.
            // It might be pended until idle timeout if IdleStateHandler existed in pipeline.
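            // If no scheduled task is about to expire, but tasks were submitted earlier
            // (not limited to scheduled tasks) and wakenUp is successfully set to true,
            // call selectNow and return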
            if (hasTasks() && wakenUp.compareAndSet(false, true)) {
                selector.selectNow();
                selectCnt = 1;
                break;
            }
            // Call select; the blocking time is the time computed above until
            // the nearest scheduled task is due
            int selectedKeys = selector.select(timeoutMillis);
            // Increment the counter
            selectCnt ++;

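            // selectedKeys != 0: a non-zero number of ready keys means select
            // returned normally because events really are ready
            // oldWakenUp: the selector had already been woken up elsewhere
            // when this method was entered
            // wakenUp.get(): likewise means the selector was woken up
            // hasTasks() || hasScheduledTasks(): there are tasks or
            // scheduled tasks to run
            // If any of the above holds, leave the loop and return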
            if (selectedKeys != 0 || oldWakenUp || wakenUp.get() || hasTasks() || hasScheduledTasks()) {
                // - Selected something,
                // - waken up by user, or
                // - the task queue has a pending task.
                // - a scheduled task is ready for processing
                break;
            }
            // If the thread was interrupted, reset the counter to 1 and return
            if (Thread.interrupted()) {
                // Thread was interrupted so reset selected keys and break so we not run into a busy loop.
                // As this is most likely a bug in the handler of the user or it's client library we will
                // also log it.
                //
                // See https://github.com/netty/netty/issues/2426
                if (logger.isDebugEnabled()) {
                    logger.debug("Selector.select() returned prematurely because " +
                            "Thread.currentThread().interrupt() was called. Use " +
                            "NioEventLoop.shutdownGracefully() to shutdown the NioEventLoop.");
                }
                selectCnt = 1;
                break;
            }
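            // Check whether select returned because the computed timeout
            // has elapsed. That also counts as a normal return: reset the
            // counter to 1 and go on to the next iteration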
            long time = System.nanoTime();
            if (time - TimeUnit.MILLISECONDS.toNanos(timeoutMillis) >= currentTimeNanos) {
                // timeoutMillis elapsed without anything selected.
                selectCnt = 1;
            } else if (SELECTOR_AUTO_REBUILD_THRESHOLD > 0 &&
                    selectCnt >= SELECTOR_AUTO_REBUILD_THRESHOLD) {
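                // Entering this branch means the select bug workaround is enabled,
                // i.e. io.netty.selectorAutoRebuildThreshold is configured
                // to be at least 3, and the number of premature select returns
                // has reached the configured threshold, so the selector is rebuilt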
                // The selector returned prematurely many times in a row.
                // Rebuild the selector to work around the problem.
                logger.warn(
                        "Selector.select() returned prematurely {} times in a row; rebuilding Selector {}.",
                        selectCnt, selector);
                // Rebuild the selector
                rebuildSelector();
                selector = this.selector;
                // After the rebuild, call the non-blocking select once
                // and return
                // Select again to populate selectedKeys.
                selector.selectNow();
                selectCnt = 1;
                break;
            }

            currentTimeNanos = time;
        }

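        // Premature returns were counted but no rebuild happened (for example
        // because the workaround is disabled): just log it to help troubleshooting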
        if (selectCnt > MIN_PREMATURE_SELECTOR_RETURNS) {
            if (logger.isDebugEnabled()) {
                logger.debug("Selector.select() returned prematurely {} times in a row for Selector {}.",
                        selectCnt - 1, selector);
            }
        }
    } catch (CancelledKeyException e) {
        if (logger.isDebugEnabled()) {
            logger.debug(CancelledKeyException.class.getSimpleName() + " raised by a Selector {} - JDK bug?",
                    selector, e);
        }
        // Harmless exception - log anyway
    }
}
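
The two pieces of arithmetic in the loop above are easy to misread, so here they are pulled out into a small standalone sketch. The class SelectTimingSketch and its method names are hypothetical; the expressions themselves are copied from the listing. The first rounds the remaining nanoseconds to the nearest millisecond, the second decides whether select blocked for at least the requested timeout.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical extraction of the timing arithmetic used in the select loop. */
final class SelectTimingSketch {
    /** Adding 500000 ns (0.5 ms) before dividing rounds to the nearest millisecond. */
    static long timeoutMillis(long deadlineNanos, long nowNanos) {
        return (deadlineNanos - nowNanos + 500000L) / 1000000L;
    }

    /** True when at least timeoutMillis passed between the two timestamps. */
    static boolean timeoutElapsed(long beforeNanos, long afterNanos, long timeoutMillis) {
        return afterNanos - TimeUnit.MILLISECONDS.toNanos(timeoutMillis) >= beforeNanos;
    }

    public static void main(String[] args) {
        System.out.println(timeoutMillis(1_400_000L, 0L)); // 1.4 ms left rounds down to 1
        System.out.println(timeoutMillis(1_600_000L, 0L)); // 1.6 ms left rounds up to 2
        System.out.println(timeoutMillis(400_000L, 0L));   // 0.4 ms left rounds to 0: selectNow path

        // select(1) that came back after 1 ms is a normal timeout ...
        System.out.println(timeoutElapsed(0L, 1_000_000L, 1L));
        // ... but coming back after 0.2 ms with nothing selected is premature.
        System.out.println(timeoutElapsed(0L, 200_000L, 1L));
    }
}
```

Only returns for which the second check is false, and for which none of the earlier break conditions applied, keep incrementing selectCnt toward the rebuild threshold.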

The source of rebuildSelector, which is called above, is as follows:

//NioEventLoop
/**
* Replaces the current {@link Selector} of this event loop with newly created {@link Selector}s to work
* around the infamous epoll 100% CPU bug.
*/
public void rebuildSelector() {
    // If not running on the event loop thread, submit the rebuild to the task queue
    if (!inEventLoop()) {
        execute(new Runnable() {
            @Override
            public void run() {
                rebuildSelector0();
            }
        });
        return;
    }
    // Otherwise we are on the event loop thread; call the actual rebuild method directly
    rebuildSelector0();
}

private void rebuildSelector0() {
    final Selector oldSelector = selector;
    final SelectorTuple newSelectorTuple;

    // If the old selector is null, return immediately
    if (oldSelector == null) {
        return;
    }

    try {
        // Create a new selector
        newSelectorTuple = openSelector();
    } catch (Exception e) {
        logger.warn("Failed to create a new Selector.", e);
        return;
    }

    // For every key registered on the old selector, register its channel
    // again on the newly created selector, one by one
    // Register all channels to the new Selector.
    int nChannels = 0;
    for (SelectionKey key: oldSelector.keys()) {
        Object a = key.attachment();
        try {
            if (!key.isValid() || key.channel().keyFor(newSelectorTuple.unwrappedSelector) != null) {
                continue;
            }

            int interestOps = key.interestOps();
            key.cancel();
            SelectionKey newKey = key.channel().register(newSelectorTuple.unwrappedSelector, interestOps, a);
            if (a instanceof AbstractNioChannel) {
                // Update SelectionKey
                ((AbstractNioChannel) a).selectionKey = newKey;
            }
            nChannels ++;
        } catch (Exception e) {
            logger.warn("Failed to re-register a Channel to the new Selector.", e);
            if (a instanceof AbstractNioChannel) {
                AbstractNioChannel ch = (AbstractNioChannel) a;
                ch.unsafe().close(ch.unsafe().voidPromise());
            } else {
                @SuppressWarnings("unchecked")
                NioTask<SelectableChannel> task = (NioTask<SelectableChannel>) a;
                invokeChannelUnregistered(task, key, e);
            }
        }
    }

    // Point this NioEventLoop at the newly created selector
    selector = newSelectorTuple.selector;
    unwrappedSelector = newSelectorTuple.unwrappedSelector;

    try {
        // Close the old selector
        // time to close the old selector as everything else is registered to the new one
        oldSelector.close();
    } catch (Throwable t) {
        if (logger.isWarnEnabled()) {
            logger.warn("Failed to close the old Selector.", t);
        }
    }

    logger.info("Migrated " + nChannels + " channel(s) to the new Selector.");
}
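
The same migration steps can be tried out with plain java.nio, without Netty. This is a minimal sketch under assumptions: the class SelectorMigrationSketch and its migrate method are hypothetical, error handling is left out, and a Pipe stands in for a real network channel so that the example needs no sockets.

```java
import java.io.IOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

/** Hypothetical standalone version of the "re-register on a new Selector" step. */
final class SelectorMigrationSketch {
    /** Moves every valid registration to a fresh Selector, then closes the old one. */
    static Selector migrate(Selector oldSelector) throws IOException {
        Selector newSelector = Selector.open();
        for (SelectionKey key : oldSelector.keys()) {
            // Skip dead keys and channels that are already on the new selector.
            if (!key.isValid() || key.channel().keyFor(newSelector) != null) {
                continue;
            }
            int interestOps = key.interestOps();
            Object attachment = key.attachment();
            SelectableChannel channel = key.channel();
            key.cancel(); // drop the old registration
            channel.register(newSelector, interestOps, attachment);
        }
        oldSelector.close();
        return newSelector;
    }

    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();
        pipe.source().configureBlocking(false);
        Selector oldSelector = Selector.open();
        pipe.source().register(oldSelector, SelectionKey.OP_READ, "demo");

        Selector newSelector = migrate(oldSelector);
        System.out.println("old selector open: " + oldSelector.isOpen());
        System.out.println("keys on new selector: " + newSelector.keys().size());

        newSelector.close();
        pipe.source().close();
        pipe.sink().close();
    }
}
```

Interest ops and attachments are read before the old key is cancelled, in the same order as in rebuildSelector0, because a cancelled key no longer allows interestOps() to be queried.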

Origin blog.csdn.net/weixin_34236869/article/details/90884552