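1. Preface
Before JDK 1.4, Java only offered traditional blocking I/O (BIO), where the server has to create a dedicated thread for every client connection. JDK 1.4 introduced NIO, non-blocking I/O: a single thread is enough to accept requests from many clients, while the actual processing of each request can be done efficiently by multiple worker threads. I/O handling is thereby decoupled from the business logic, and the I/O thread is multiplexed across connections.
2. Source Code Analysis
Let's start with a typical piece of NIO code: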
Selector selector = Selector.open();
ServerSocketChannel ssc = ServerSocketChannel.open();
ssc.configureBlocking(false);
ssc.socket().bind(new InetSocketAddress(9527));
ssc.register(selector, SelectionKey.OP_ACCEPT);
while (true) {
    int n = selector.select();
    if (n <= 0) continue;
    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
    while (it.hasNext()) {
        SelectionKey key = it.next();
        if (key.isAcceptable()) {
            // Accept the new connection and register it with the same selector
            SocketChannel sc = ((ServerSocketChannel) key.channel()).accept();
            sc.configureBlocking(false);
            sc.register(key.selector(), SelectionKey.OP_READ | SelectionKey.OP_WRITE);
        }
        if (key.isReadable()) {
            SocketChannel channel = (SocketChannel) key.channel();
            ByteBuffer bf = ByteBuffer.allocate(10);
            int read = channel.read(bf);
            System.out.println("read " + read + " : " + new String(bf.array()).trim());
        }
        if (key.isWritable()) {
            SocketChannel channel = (SocketChannel) key.channel();
            channel.write(ByteBuffer.wrap("hello client".getBytes()));
        }
        // Remove the handled key; the selector never removes it from the selected set itself
        it.remove();
    }
}
2.1 Selector.open(): obtaining a selector
public static Selector open() throws IOException {
    return SelectorProvider.provider().openSelector();
}

public static SelectorProvider provider() {
    synchronized (lock) {
        if (provider != null)
            return provider;
        return AccessController.doPrivileged(
            new PrivilegedAction<SelectorProvider>() {
                public SelectorProvider run() {
                    if (loadProviderFromProperty())
                        return provider;
                    if (loadProviderAsService())
                        return provider;
                    provider = sun.nio.ch.DefaultSelectorProvider.create();
                    return provider;
                }
            });
    }
}
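As the Selector source shows, open() delegates to the SelectorProvider. The call provider = sun.nio.ch.DefaultSelectorProvider.create(); returns a different implementation class depending on the operating system: on Windows it returns WindowsSelectorProvider, while on Linux it chooses between the select/poll model and the epoll model according to the kernel version.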
public static SelectorProvider create() {
    PrivilegedAction pa = new GetPropertyAction("os.name");
    String osname = (String) AccessController.doPrivileged(pa);
    if ("SunOS".equals(osname)) {
        return new sun.nio.ch.DevPollSelectorProvider();
    }

    // use EPollSelectorProvider for Linux kernels >= 2.6
    if ("Linux".equals(osname)) {
        pa = new GetPropertyAction("os.version");
        String osversion = (String) AccessController.doPrivileged(pa);
        String[] vers = osversion.split("\\.", 0);
        if (vers.length >= 2) {
            try {
                int major = Integer.parseInt(vers[0]);
                int minor = Integer.parseInt(vers[1]);
                if (major > 2 || (major == 2 && minor >= 6)) {
                    return new sun.nio.ch.EPollSelectorProvider();
                }
            } catch (NumberFormatException x) {
                // format not recognized
            }
        }
    }

    return new sun.nio.ch.PollSelectorProvider();
}

// sun.nio.ch.EPollSelectorProvider
public AbstractSelector openSelector() throws IOException {
    return new EPollSelectorImpl(this);
}

// sun.nio.ch.PollSelectorProvider
public AbstractSelector openSelector() throws IOException {
    return new PollSelectorImpl(this);
}
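As shown, if the Linux kernel version is 2.6 or later, the concrete SelectorProvider is EPollSelectorProvider; otherwise it falls back to the default PollSelectorProvider. This selection logic only exists since JDK 5 update 9.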
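To see which provider and selector implementation a given JVM actually picks, the classes can be inspected at runtime. The following is a minimal sketch; the class name ProviderProbe and its method names are made up for illustration, and the printed class names depend entirely on the platform and JDK version in use.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;
import java.nio.channels.spi.SelectorProvider;

// Hypothetical helper, not part of the JDK: reports which implementation is in use.
final class ProviderProbe {

    /** Class name of the system-wide default SelectorProvider. */
    static String providerName() {
        return SelectorProvider.provider().getClass().getName();
    }

    /** Opens a selector, records its implementation class name, then closes it. */
    static String selectorName() {
        try (Selector selector = Selector.open()) {
            return selector.getClass().getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("os.name    = " + System.getProperty("os.name"));
        System.out.println("os.version = " + System.getProperty("os.version"));
        System.out.println("provider   = " + providerName());
        System.out.println("selector   = " + selectorName());
    }
}
```

Running it on different operating systems is a quick way to confirm the platform-specific selection described above without reading the JDK source.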
// Windows version of sun.nio.ch.DefaultSelectorProvider
public static SelectorProvider create() {
    return new sun.nio.ch.WindowsSelectorProvider();
}

// sun.nio.ch.WindowsSelectorProvider
public AbstractSelector openSelector() throws IOException {
    return new WindowsSelectorImpl(this);
}

WindowsSelectorImpl(SelectorProvider sp) throws IOException {
    super(sp);
    pollWrapper = new PollArrayWrapper(INIT_CAP);
    wakeupPipe = Pipe.open();
    wakeupSourceFd = ((SelChImpl)wakeupPipe.source()).getFDVal();

    // Disable the Nagle algorithm so that the wakeup is more immediate
    SinkChannelImpl sink = (SinkChannelImpl)wakeupPipe.sink();
    (sink.sc).socket().setTcpNoDelay(true);
    wakeupSinkFd = ((SelChImpl)sink).getFDVal();

    pollWrapper.addWakeupSocket(wakeupSourceFd, 0);
}

// PollArrayWrapper
void addWakeupSocket(int fdVal, int index) {
    putDescriptor(index, fdVal);
    putEventOps(index, POLLIN);
}
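From here on, the analysis follows the Windows implementation. While openSelector instantiates WindowsSelectorImpl, two things happen:
1) A PollArrayWrapper is instantiated. pollWrapper uses the Unsafe class to allocate a block of native (off-heap) memory that holds the pollfd structures, i.e. the socket handle fdVal and the event mask recorded at registration time.
2) Pipe.open() opens a pipe (its implementation is examined below), which yields two file descriptors, wakeupSourceFd and wakeupSinkFd. The source-side descriptor (wakeupSourceFd) is placed into pollWrapper: addWakeupSocket marks the source's POLLIN event (data available to read) as interesting, so as soon as data is written to the sink end, wakeupSourceFd becomes ready.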
public static Pipe open() throws IOException {
    return SelectorProvider.provider().openPipe();
}

public Pipe openPipe() throws IOException {
    return new PipeImpl(this);
}

PipeImpl(final SelectorProvider sp) throws IOException {
    try {
        AccessController.doPrivileged(new Initializer(sp));
    } catch (PrivilegedActionException x) {
        throw (IOException)x.getCause();
    }
}

private Initializer(SelectorProvider sp) {
    this.sp = sp;
}

public Void run() throws IOException {
    LoopbackConnector connector = new LoopbackConnector();
    connector.run();
    // ... omitted
}

private class LoopbackConnector implements Runnable {
    @Override
    public void run() {
        ServerSocketChannel ssc = null;
        SocketChannel sc1 = null;
        SocketChannel sc2 = null;
        try {
            // Loopback address
            InetAddress lb = InetAddress.getByName("127.0.0.1");
            assert(lb.isLoopbackAddress());
            InetSocketAddress sa = null;
            for(;;) {
                // Bind ServerSocketChannel to a port on the loopback
                // address
                if (ssc == null || !ssc.isOpen()) {
                    ssc = ServerSocketChannel.open();
                    ssc.socket().bind(new InetSocketAddress(lb, 0));
                    sa = new InetSocketAddress(lb, ssc.socket().getLocalPort());
                }

                // Establish connection (assume connections are eagerly
                // accepted)
                sc1 = SocketChannel.open(sa);
                ByteBuffer bb = ByteBuffer.allocate(8);
                long secret = rnd.nextLong();
                bb.putLong(secret).flip();
                sc1.write(bb);

                // Get a connection and verify it is legitimate
                sc2 = ssc.accept();
                bb.clear();
                sc2.read(bb);
                bb.rewind();
                if (bb.getLong() == secret)
                    break;
                sc2.close();
                sc1.close();
            }

            // Create source and sink channels
            source = new SourceChannelImpl(sp, sc1);
            sink = new SinkChannelImpl(sp, sc2);
        } catch (IOException e) {
            try {
                if (sc1 != null)
                    sc1.close();
                if (sc2 != null)
                    sc2.close();
            } catch (IOException e2) {}
            ioe = e;
        } finally {
            try {
                if (ssc != null)
                    ssc.close();
            } catch (IOException e2) {}
        }
    }
}
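The pipe-creation code shows that the implementation is also tightly bound to the operating system. Taking Windows as the example: a PipeImpl object is created, and the AccessController.doPrivileged call immediately executes the Initializer's run method. On Windows, run creates two local SocketChannels and connects them (a random long is written across the connection to verify that the two sockets really belong together); the two SocketChannels then serve as the source and sink ends of the pipe. According to other references, Linux uses the pipe provided by the operating system directly.
At this point Selector.open() is complete. To summarize, it mainly does the following:
1. Instantiates the pollWrapper object, which will later hold the pollfd structures containing the socket handle fdVal and the events recorded at registration time.
2. Sets up a pipe used for self-wakeup, implemented differently per operating system: Windows creates a pair of socket channels connected to each other, while Linux uses the system pipe directly. In addition, depending on the Linux kernel version, a different underlying event notification mechanism is chosen: select/poll or epoll.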
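The same self-wakeup idea can be reproduced with the public java.nio.channels.Pipe API: register the source end for OP_READ and write a single byte to the sink end, and a blocked select returns. This is only an illustrative sketch of the technique, not the JDK's internal code; the class name SelfPipeDemo and its method name are made up for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical demo class: shows that one byte on the sink makes the source selectable.
final class SelfPipeDemo {

    /** Writes one byte to the sink and returns how many keys select() reports as ready. */
    static int readyAfterOneByte() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try {
                // The source end must be non-blocking before it can be registered.
                pipe.source().configureBlocking(false);
                pipe.source().register(selector, SelectionKey.OP_READ);

                // One byte on the sink end is enough to make the source readable.
                pipe.sink().write(ByteBuffer.wrap(new byte[] {1}));

                // Bounded wait so the sketch cannot hang if something goes wrong.
                return selector.select(1000);
            } finally {
                pipe.sink().close();
                pipe.source().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ready keys = " + readyAfterOneByte());
    }
}
```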
2.2 serverSocketChannel.register(selector, SelectionKey.OP_ACCEPT): registering the channel
public final SelectionKey register(Selector sel, int ops, Object att)
        throws ClosedChannelException {
    synchronized (regLock) {
        SelectionKey k = findKey(sel);
        if (k != null) {
            k.interestOps(ops);
            k.attach(att);
        }
        if (k == null) {
            // New registration
            synchronized (keyLock) {
                if (!isOpen())
                    throw new ClosedChannelException();
                k = ((AbstractSelector)sel).register(this, ops, att);
                addKey(k);
            }
        }
        return k;
    }
}
If the channel is already registered with this selector, the existing key simply gets the new interest set and attachment. Otherwise the registration is carried out by the selector.
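The "already registered" branch is observable through the public API: registering the same channel with the same selector twice returns the same SelectionKey, now carrying the new interest set and attachment. A minimal sketch follows; the class name ReRegisterDemo is hypothetical, and an unconnected SocketChannel is used only because it accepts several interest ops.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// Hypothetical demo class: re-registration reuses the existing key.
final class ReRegisterDemo {

    /** Returns true if the second register() call reused and updated the first key. */
    static boolean sameKeyOnReRegister() {
        try (Selector selector = Selector.open();
             SocketChannel channel = SocketChannel.open()) {
            channel.configureBlocking(false);

            SelectionKey first = channel.register(selector, SelectionKey.OP_CONNECT, "first");
            SelectionKey second = channel.register(selector, SelectionKey.OP_READ, "second");

            return first == second
                    && second.interestOps() == SelectionKey.OP_READ
                    && "second".equals(second.attachment());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("same key reused: " + sameKeyOnReRegister());
    }
}
```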
// SelectorImpl
protected final SelectionKey register(AbstractSelectableChannel ch,
                                      int ops,
                                      Object attachment) {
    if (!(ch instanceof SelChImpl))
        throw new IllegalSelectorException();
    SelectionKeyImpl k = new SelectionKeyImpl((SelChImpl)ch, this);
    k.attach(attachment);
    synchronized (publicKeys) {
        implRegister(k);
    }
    k.interestOps(ops);
    return k;
}

// WindowsSelectorImpl
protected void implRegister(SelectionKeyImpl ski) {
    synchronized (closeLock) {
        if (pollWrapper == null)
            throw new ClosedSelectorException();
        growIfNeeded();
        channelArray[totalChannels] = ski;
        ski.setIndex(totalChannels);
        fdMap.put(ski);
        keys.add(ski);
        pollWrapper.addEntry(totalChannels, ski);
        totalChannels++;
    }
}

private void growIfNeeded() {
    if (channelArray.length == totalChannels) {
        int newSize = totalChannels * 2; // Make a larger array
        SelectionKeyImpl temp[] = new SelectionKeyImpl[newSize];
        System.arraycopy(channelArray, 1, temp, 1, totalChannels - 1);
        channelArray = temp;
        pollWrapper.grow(newSize);
    }
    if (totalChannels % MAX_SELECTABLE_FDS == 0) { // more threads needed
        pollWrapper.addWakeupSocket(wakeupSourceFd, totalChannels);
        totalChannels++;
        threadsCount++;
    }
}

// PollArrayWrapper
void addEntry(int index, SelectionKeyImpl ski) {
    putDescriptor(index, ski.channel.getFDVal());
}
Registering through the selector mainly does the following:
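- Creates a SelectionKeyImpl from the current channel and selector, and attaches the attachment.
- If the current slot count totalChannels equals the size of the SelectionKeyImpl array, grows both the array and pollWrapper.
- If totalChannels % MAX_SELECTABLE_FDS == 0, one more helper thread is needed for the selector: a slot is reserved for an additional wakeup socket and threadsCount is incremented. On Windows the select call is limited in how many descriptors it can handle, so one call polls at most 1024 of them; beyond that, polling has to be spread across several threads.
- ski.setIndex(totalChannels) records the key's index in the array.
- keys.add(ski) adds the key to the set of registered keys.
- fdMap.put(ski) stores the mapping from the key's file descriptor to the key.
- pollWrapper.addEntry writes the socket handle held by the SelectionKeyImpl into the corresponding pollfd.
- k.interestOps(ops) eventually writes the events into the corresponding pollfd as well.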
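The slot bookkeeping in growIfNeeded can be replayed in isolation to see when an extra helper thread becomes necessary. The sketch below is a simplified simulation, not JDK code: the class and method names are made up, and it assumes that totalChannels starts at 1 (slot 0 already holds the wakeup socket) and threadsCount starts at 0, which the excerpts above do not show.

```java
// Hypothetical simulation of the slot accounting in implRegister/growIfNeeded.
final class SlotLayoutSketch {

    // Same limit the text mentions for one select call on Windows.
    static final int MAX_SELECTABLE_FDS = 1024;

    /** Returns the helper-thread count after registering the given number of channels. */
    static int helperThreadsFor(int channels) {
        int totalChannels = 1; // assumption: slot 0 is taken by the first wakeup socket
        int threadsCount = 0;  // assumption: no helper threads initially

        for (int i = 0; i < channels; i++) {
            // Mirrors growIfNeeded(): every 1024th slot is reserved for a wakeup socket.
            if (totalChannels % MAX_SELECTABLE_FDS == 0) {
                totalChannels++;
                threadsCount++;
            }
            // Mirrors implRegister(): the channel itself takes the next slot.
            totalChannels++;
        }
        return threadsCount;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 1023, 1024, 2046, 2047}) {
            System.out.println(n + " channels -> " + helperThreadsFor(n) + " helper thread(s)");
        }
    }
}
```

Under these assumptions each block of 1024 slots holds one wakeup socket plus 1023 channels, so the first helper thread is needed with the 1024th channel and the second with the 2047th.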
2.3 selector.select();
public int select() throws IOException {
    return select(0);
}

public int select(long timeout) throws IOException {
    if (timeout < 0)
        throw new IllegalArgumentException("Negative timeout");
    return lockAndDoSelect((timeout == 0) ? -1 : timeout);
}

private int lockAndDoSelect(long timeout) throws IOException {
    synchronized (this) {
        if (!isOpen())
            throw new ClosedSelectorException();
        synchronized (publicKeys) {
            synchronized (publicSelectedKeys) {
                return doSelect(timeout);
            }
        }
    }
}
When selector.select() or select(0) is called, the JDK normalizes the argument, so the timeout actually passed to doSelect is -1. For selectNow() the timeout is 0, and passing a negative value directly throws an exception. doSelect then takes us back to the Windows implementation:
protected int doSelect(long timeout) throws IOException {
    if (channelArray == null)
        throw new ClosedSelectorException();
    this.timeout = timeout; // set selector timeout
    processDeregisterQueue();
    if (interruptTriggered) {
        resetWakeupSocket();
        return 0;
    }
    // Calculate number of helper threads needed for poll. If necessary
    // threads are created here and start waiting on startLock
    adjustThreadsCount();
    finishLock.reset(); // reset finishLock
    // Wakeup helper threads, waiting on startLock, so they start polling.
    // Redundant threads will exit here after wakeup.
    startLock.startThreads();
    // do polling in the main thread. Main thread is responsible for
    // first MAX_SELECTABLE_FDS entries in pollArray.
    try {
        begin();
        try {
            subSelector.poll();
        } catch (IOException e) {
            finishLock.setException(e); // Save this exception
        }
        // Main thread is out of poll(). Wakeup others and wait for them
        if (threads.size() > 0)
            finishLock.waitForHelperThreads();
    } finally {
        end();
    }
    // Done with poll(). Set wakeupSocket to nonsignaled for the next run.
    finishLock.checkForException();
    processDeregisterQueue();
    int updated = updateSelectedKeys();
    // Done with poll(). Set wakeupSocket to nonsignaled for the next run.
    resetWakeupSocket();
    return updated;
}

// SubSelector
private int poll() throws IOException { // poll for the main thread
    return poll0(pollWrapper.pollArrayAddress,
                 Math.min(totalChannels, MAX_SELECTABLE_FDS),
                 readFds, writeFds, exceptFds, timeout);
}
processDeregisterQueue handles the cancelled-key set (calling cancel() on a key puts it into that set). For each cancelled key it removes the corresponding entry from channelArray, adjusts the channel and thread counts, removes the key from fdMap and keys, and deregisters the key from its channel; if that channel is already closed and no longer registered with any selector, its underlying resources are released as well. The method is called both before and after poll, which ensures that keys cancelled while the thread was blocked in poll are cleaned up promptly.
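adjustThreadsCount is similar to the thread-count adjustment seen earlier: it adjusts the number of helper threads according to the operating system's limit on how many file descriptors one select call can handle.
subSelector.poll() is the core of select. It is implemented by the native function poll0, which receives pollWrapper.pollArrayAddress as an argument. The readFds, writeFds and exceptFds arrays hold the results of the underlying select: the first element of each array is the number of sockets on which events occurred, and the remaining elements are the socket handles (fd) of those sockets.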
// WindowsSelectorImpl.c
Java_sun_nio_ch_WindowsSelectorImpl_00024SubSelector_poll0(JNIEnv *env, jobject this,
                                   jlong pollAddress, jint numfds,
                                   jintArray returnReadFds, jintArray returnWriteFds,
                                   jintArray returnExceptFds, jlong timeout)
{
    static struct timeval zerotime = {0, 0};

    if (timeout == 0) {
        tv = &zerotime;
    } else if (timeout < 0) {
        tv = NULL;
    } else {
        tv = &timevalue;
        tv->tv_sec = (long)(timeout / 1000);
        tv->tv_usec = (long)((timeout % 1000) * 1000);
    }

    // ... code omitted ...

    /* Call select */
    if ((result = select(0, &readfds, &writefds, &exceptfds, tv))
                                                         == SOCKET_ERROR) {
        /* Bad error - this should not happen frequently */
        /* Iterate over sockets and call select() on each separately */
        // ... code omitted ...
        for (i = 0; i < numfds; i++) {
            /* prepare select structures for the i-th socket */
            // ... code omitted ...
            /* call select on the i-th socket */
            if (select(0, &errreadfds, &errwritefds, &errexceptfds, &zerotime)
                                                         == SOCKET_ERROR) {
                // ... code omitted ...
            }
        }
    }
}
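From this native poll0 implementation (its essential part is the call to select; the rest only handles SOCKET_ERROR by calling select on each socket separately) we can see that Windows relies on the underlying select function. Here select polls the FDs in pollArray for events; if any occurred, it collects the FDs concerned and stops blocking. As noted earlier, selector.select() and select(0) reach poll0 with a timeout of -1, selectNow() reaches it with 0, and a negative argument is rejected with an exception. At the native level, a tv that points to a zero timeval makes select return immediately, while a NULL tv makes it block indefinitely until an event occurs.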
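The two edge cases of the timeout handling can be checked from the public API. The sketch below, with the made-up class name TimeoutDemo, shows that a negative timeout is rejected before any polling happens and that selectNow() returns immediately with 0 when nothing is registered.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;

// Hypothetical demo class for the timeout rules of select(long) and selectNow().
final class TimeoutDemo {

    /** Returns true if select(-1) is rejected with IllegalArgumentException. */
    static boolean negativeTimeoutRejected() {
        try (Selector selector = Selector.open()) {
            try {
                selector.select(-1);
                return false;
            } catch (IllegalArgumentException expected) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** selectNow() never blocks; with no registered channels it reports 0 ready keys. */
    static int selectNowOnEmptySelector() {
        try (Selector selector = Selector.open()) {
            return selector.selectNow();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("negative timeout rejected: " + negativeTimeoutRejected());
        System.out.println("selectNow on empty selector: " + selectNowOnEmptySelector());
    }
}
```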
The last step is the call to updateSelectedKeys, which updates the selected keys. Its implementation:
private int updateSelectedKeys() {
    updateCount++;
    int numKeysUpdated = 0;
    numKeysUpdated += subSelector.processSelectedKeys(updateCount);
    for (SelectThread t: threads) {
        numKeysUpdated += t.subSelector.processSelectedKeys(updateCount);
    }
    return numKeysUpdated;
}

// Above, processSelectedKeys is called for the main thread and for every helper
// thread (the threads created because of the per-call file handle limit).
private int processSelectedKeys(long updateCount) {
    int numKeysUpdated = 0;
    numKeysUpdated += processFDSet(updateCount, readFds,
                                   PollArrayWrapper.POLLIN,
                                   false);
    numKeysUpdated += processFDSet(updateCount, writeFds,
                                   PollArrayWrapper.POLLCONN |
                                   PollArrayWrapper.POLLOUT,
                                   false);
    numKeysUpdated += processFDSet(updateCount, exceptFds,
                                   PollArrayWrapper.POLLIN |
                                   PollArrayWrapper.POLLCONN |
                                   PollArrayWrapper.POLLOUT,
                                   true);
    return numKeysUpdated;
}

// processSelectedKeys calls processFDSet for the read, write and exception
// descriptor sets in turn.
private int processFDSet(long updateCount, int[] fds, int rOps,
                         boolean isExceptFds) {
    int numKeysUpdated = 0;
    for (int i = 1; i <= fds[0]; i++) {
        int desc = fds[i];
        if (desc == wakeupSourceFd) {
            synchronized (interruptLock) {
                interruptTriggered = true;
            }
            continue;
        }
        MapEntry me = fdMap.get(desc);
        // If me is null, the key was deregistered in the previous
        // processDeregisterQueue.
        if (me == null)
            continue;
        SelectionKeyImpl sk = me.ski;

        // The descriptor may be in the exceptfds set because there is
        // OOB data queued to the socket. If there is OOB data then it
        // is discarded and the key is not added to the selected set.
        if (isExceptFds &&
            (sk.channel() instanceof SocketChannelImpl) &&
            discardUrgentData(desc)) {
            continue;
        }

        if (selectedKeys.contains(sk)) { // Key in selected set
            if (me.clearedCount != updateCount) {
                if (sk.channel.translateAndSetReadyOps(rOps, sk) &&
                    (me.updateCount != updateCount)) {
                    me.updateCount = updateCount;
                    numKeysUpdated++;
                }
            } else { // The readyOps have been set; now add
                if (sk.channel.translateAndUpdateReadyOps(rOps, sk) &&
                    (me.updateCount != updateCount)) {
                    me.updateCount = updateCount;
                    numKeysUpdated++;
                }
            }
            me.clearedCount = updateCount;
        } else { // Key is not in selected set yet
            if (me.clearedCount != updateCount) {
                sk.channel.translateAndSetReadyOps(rOps, sk);
                if ((sk.nioReadyOps() & sk.nioInterestOps()) != 0) {
                    selectedKeys.add(sk);
                    me.updateCount = updateCount;
                    numKeysUpdated++;
                }
            } else { // The readyOps have been set; now add
                sk.channel.translateAndUpdateReadyOps(rOps, sk);
                if ((sk.nioReadyOps() & sk.nioInterestOps()) != 0) {
                    selectedKeys.add(sk);
                    me.updateCount = updateCount;
                    numKeysUpdated++;
                }
            }
            me.clearedCount = updateCount;
        }
    }
    return numKeysUpdated;
}
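The code above does the following:
1. Skips wakeupSourceFd: this descriptor only exists for wakeup purposes and has nothing to do with user operations, so it is ignored.
2. Skips descriptors that are no longer in fdMap, because their keys have already been deregistered.
3. Skips OOB data (out-of-band data, sometimes called expedited data, is used when one side of a connection has something urgent to tell the other side quickly); this is not something the user cares about either.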
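4. If the channel's key is not yet in the selected-key set, the key's ready set is cleared and then set to the bit mask of the operations the operating system currently reports as ready; the key is added to the selected-key set if any of those operations is in its interest set.
5. If the key is already in the selected-key set, the bit mask of the operations the operating system reports as ready is merged into the ready set.
Note that in the code shown, the choice between resetting the ready set (translateAndSetReadyOps) and merging into it (translateAndUpdateReadyOps) depends on me.clearedCount != updateCount, that is, on whether the key has already been processed in the current select round; the same descriptor can appear in more than one of readFds, writeFds and exceptFds. Membership in the selected-key set decides whether the key still has to be added to that set.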
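The difference between "set" and "update" of the ready set comes down to plain bit-mask arithmetic. The following sketch is a simplified model, not the JDK's translateAndSetReadyOps or translateAndUpdateReadyOps: the class and method names are hypothetical, and the translation from native poll events to SelectionKey ops is left out.

```java
import java.nio.channels.SelectionKey;

// Hypothetical model of the two ways a key's ready set can be changed.
final class ReadyOpsSketch {

    /** "Set": the old ready set is discarded and replaced by what is reported now. */
    static int setReadyOps(int oldReady, int reported, int interest) {
        return reported & interest;
    }

    /** "Update": what is reported now is OR-ed into the existing ready set. */
    static int updateReadyOps(int oldReady, int reported, int interest) {
        return oldReady | (reported & interest);
    }

    public static void main(String[] args) {
        int interest = SelectionKey.OP_READ | SelectionKey.OP_WRITE;
        int oldReady = SelectionKey.OP_READ;   // readiness recorded earlier
        int reported = SelectionKey.OP_WRITE;  // readiness reported by this pass

        System.out.println("set    -> " + setReadyOps(oldReady, reported, interest));
        System.out.println("update -> " + updateReadyOps(oldReady, reported, interest));
    }
}
```

With the values in main, "set" leaves only OP_WRITE in the ready set, while "update" keeps OP_READ and adds OP_WRITE.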
2.4 wakeup
public Selector wakeup() {
    synchronized (interruptLock) {
        if (!interruptTriggered) {
            setWakeupSocket();
            interruptTriggered = true;
        }
    }
    return this;
}

// Sets Windows wakeup socket to a signaled state.
private void setWakeupSocket() {
    setWakeupSocket0(wakeupSinkFd);
}

private native void setWakeupSocket0(int wakeupSinkFd);

// WindowsSelectorImpl.c
JNIEXPORT void JNICALL
Java_sun_nio_ch_WindowsSelectorImpl_setWakeupSocket0(JNIEnv *env, jclass this,
                                                     jint scoutFd)
{
    /* Write one byte into the pipe */
    send(scoutFd, (char*)&POLLIN, 1, 0);
}
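If a thread is blocked in select, calling wakeup makes the blocked selection operation return immediately. As the Windows implementation above shows, this works by writing one byte to the sink end of the pipe: the source file descriptor becomes ready, poll returns, and as a result select returns. On Solaris or Linux the pipe is created with the pipe system call instead, so the operating system's own pipe is used directly. The code also shows that wakeup sets the interruptTriggered flag, so calling wakeup several times in a row has the same effect as calling it once.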
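The coalescing of repeated wakeup calls can be observed by timing two consecutive selections. In this sketch (class name WakeupDemo and the timeout values are arbitrary choices for illustration), wakeup() is called twice before any selection: the first select returns right away, and the second one blocks for its full timeout because no wakeup is pending any more.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;

// Hypothetical demo class: two wakeup() calls only cancel one selection operation.
final class WakeupDemo {

    /** Returns {elapsed ms of first select, elapsed ms of second select}. */
    static long[] measure() {
        try (Selector selector = Selector.open()) {
            // Two wakeups before any selection operation is in progress.
            selector.wakeup();
            selector.wakeup();

            long t0 = System.nanoTime();
            selector.select(2000); // consumes the pending wakeup, returns at once
            long first = (System.nanoTime() - t0) / 1_000_000;

            long t1 = System.nanoTime();
            selector.select(300);  // nothing pending: blocks until the timeout
            long second = (System.nanoTime() - t1) / 1_000_000;

            return new long[] {first, second};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] elapsed = measure();
        System.out.println("first select:  " + elapsed[0] + " ms");
        System.out.println("second select: " + elapsed[1] + " ms");
    }
}
```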