thrift java.io.IOException: Connection reset by peer引发的oom

server 退出前的异常log信息:

[2015-06-20 10:45:56,713 WARN] Got an IOException in internalRead!
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:142)
        at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.internalRead(AbstractNonblockingServer.java:539)
        at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:338)
        at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:203)
        at org.apache.thrift.server.TThreadedSelectorServer$SelectorThread.select(TThreadedSelectorServer.java:590)
        at org.apache.thrift.server.TThreadedSelectorServer$SelectorThread.run(TThreadedSelectorServer.java:545)
[2015-06-20 10:46:08,110 ERROR] run() exiting due to uncaught error
java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
        at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:371)
        at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:203)
        at org.apache.thrift.server.TThreadedSelectorServer$SelectorThread.select(TThreadedSelectorServer.java:590)
        at org.apache.thrift.server.TThreadedSelectorServer$SelectorThread.run(TThreadedSelectorServer.java:545)

一个thrift rpc 请求接收过程如下:(参考中文注释)

/**
     * Give this FrameBuffer a chance to read. The selector loop should have
     * received a read event for this FrameBuffer.
     * 
     * @return true if the connection should live on, false if it should be
     *         closed
     */
    public boolean read() {
      if (state_ == FrameBufferState.READING_FRAME_SIZE) {
        // try to read the frame size completely 
		// 读取 frame 头,即frame 的size,根据 thrift 协议定义,应该是请求体前4个字节
        if (!internalRead()) { 
          return false;
        }

        // if the frame size has been read completely, then prepare to read the
        // actual frame.
        if (buffer_.remaining() == 0) {
          // pull out the frame size as an integer.
          int frameSize = buffer_.getInt(0);
          if (frameSize <= 0) {
            LOGGER.error("Read an invalid frame size of " + frameSize
                + ". Are you using TFramedTransport on the client side?");
            return false;
          }

          // if this frame will always be too large for this server, log the
          // error and close the connection.
		  // frame 长度异常判断
          if (frameSize > MAX_READ_BUFFER_BYTES) {   
            LOGGER.error("Read a frame size of " + frameSize
                + ", which is bigger than the maximum allowable buffer size for ALL connections.");
            return false;
          }

          // if this frame will push us over the memory limit, then return.
          // with luck, more memory will free up the next time around.
		   // frame 长度异常判断
          if (readBufferBytesAllocated.get() + frameSize > MAX_READ_BUFFER_BYTES) {
            return true; 
          }

          // increment the amount of memory allocated to read buffers
          readBufferBytesAllocated.addAndGet(frameSize + 4); 

          // reallocate the readbuffer as a frame-sized buffer
		  // 为 frame 分配足够大内存空间, 从 log 打印信息来看 server 就是在这个调用点 oom 了
          buffer_ = ByteBuffer.allocate(frameSize + 4);  
          buffer_.putInt(frameSize);

          state_ = FrameBufferState.READING_FRAME;
        } else {
          // this skips the check of READING_FRAME state below, since we can't
          // possibly go on to that state if there's data left to be read at
          // this one.
          return true;
        }
      }

      // it is possible to fall through from the READING_FRAME_SIZE section
      // to READING_FRAME if there's already some frame data available once
      // READING_FRAME_SIZE is complete.

      if (state_ == FrameBufferState.READING_FRAME) {
		// 读取 frame 体
        if (!internalRead()) {
          return false; 
        }

        // since we're already in the select loop here for sure, we can just
        // modify our selection key directly.
        if (buffer_.remaining() == 0) {
          // get rid of the read select interests
          selectionKey_.interestOps(0);
          state_ = FrameBufferState.READ_FRAME_COMPLETE;
        }

        return true;
      }

      // if we fall through to this point, then the state must be invalid.
      LOGGER.error("Read was called but state is invalid (" + state_ + ")");
      return false;
    } 
	
    /**
     * Perform a read into buffer.
     * 
     * @return true if the read succeeded, false if there was an error or the
     *         connection closed.
     */
	 // 读 SocketChannel
    private boolean internalRead() {
      try {
        if (trans_.read(buffer_) < 0) {
          return false;
        }
        return true;
      } catch (IOException e) {
        LOGGER.warn("Got an IOException in internalRead!", e);
        return false;
      }
    }

代码分析到这里问题就可以定位到是 frame size(MAX_READ_BUFFER_BYTES) 过大导致的异常退出。

 public static abstract class AbstractNonblockingServerArgs<T extends AbstractNonblockingServerArgs<T>> extends AbstractServerArgs<T> {
	  // 默认是 Long.MAX_VALUE,肯定 OOM
    public long maxReadBufferBytes = Long.MAX_VALUE;

    public AbstractNonblockingServerArgs(TNonblockingServerTransport transport) {
      super(transport);
      transportFactory(new TFramedTransport.Factory());
    }
  }
  
 /**
   * The maximum amount of memory we will allocate to client IO buffers at a
   * time. Without this limit, the server will gladly allocate client buffers
   * right into an out of memory exception, rather than waiting.
   */
   // 该注释指出了问题所在
  final long MAX_READ_BUFFER_BYTES;

  // 可以通过构造函数设置MAX_READ_BUFFER_BYTES
  public AbstractNonblockingServer(AbstractNonblockingServerArgs args) {
    super(args);
    MAX_READ_BUFFER_BYTES = args.maxReadBufferBytes;
  }

问题复现(验证)

  1. echo “something”    nc 到 thrift 端口
  2. telnet 到 thrift 端口,随意敲几个字符

thrift server log中打印几行错误信息后即退出。

原因:

异常的客户端连接到 thrift server 端口,随机的发送数据导致 thrift协议解析到异常大的 frameSize。(内网攻击)

解决办法:

使用 TFramedTransport,设置初始化函数中maxLength,默认值DEFAULT_MAX_LENGTH = 16384000;
其他请务必设置AbstractNonblockingServerArgs.maxReadBufferBytes,默认值Long.MAX_VALUE;

例如:

服务端修改

客户端

参考:

http://www.concurrent.work/java/thrift/case/one-java-thrift-case-caused-by-default-parameters/

http://www.voidcn.com/article/p-nvaknkpy-bpn.html

猜你喜欢

转载自blog.csdn.net/liuxiao723846/article/details/86025590