An analysis of a Netty off-heap memory leak caused by an incorrect decoding process

The root cause was a slow memory leak introduced by faulty handling in an online TCP proxy's forwarding logic. The symptom: the RSS of the process hosting the Netty service climbed slowly to a high level and then stayed there. According to the application's forwarding statistics, the number of uplink and downlink messages exchanged per day was indeed very high, and at the time we wrongly assumed that Netty's pooled off-heap allocator was itself the reason RSS grew. A wrong diagnosis leads to the wrong fix, so we had to find the real cause of the RSS growth.

1. Add JVM parameters

-XX:NativeMemoryTracking=detail

-Dio.netty.leakDetectionLevel=advanced

(1) The NativeMemoryTracking parameter enables the JVM's native memory tracking report, which lets you take a baseline and then compare memory growth between two points in time:

jcmd <pid> VM.native_memory baseline
jcmd <pid> VM.native_memory summary.diff

There are many detailed explanations of the NMT report format online, so I will not repeat them here. Comparing the reports from the two points in time showed that the Internal section had grown very large, which points to allocations outside the heap. At this stage we could only roughly conclude that the growth came from off-heap memory; the specific cause was still unknown.

(2) io.netty.leakDetectionLevel enables Netty's buffer leak detection report.

With leak detection enabled, Netty reports where memory allocated through the allocator was never released, printing a detailed stack trace showing where the leaked buffer was allocated and touched.
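
For reference, the same level can also be set programmatically through Netty's ResourceLeakDetector API; a minimal sketch (the class name is my own):

import io.netty.util.ResourceLeakDetector;

public final class LeakDetectionConfig {
    public static void enable() {
        // Equivalent to -Dio.netty.leakDetectionLevel=advanced
        ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.ADVANCED);
    }
}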

Usually the Netty leak point can be located with the two methods above, but sometimes we also need to inspect what the off-heap memory actually contains before we can determine the cause.

2. Use pmap to analyze off-heap memory leaks

There are many articles online with detailed pmap tutorials, so here I will only cover the approach and the scenarios where it applies. The pmap-based analysis works at the level of memory segments: find the segment with the largest RSS, then use gdb to dump it and inspect its contents. For routine cases this helps a lot, because the key information stored in the largest segment usually points at the culprit. However, if the network traffic is transmitted encrypted, you cannot learn anything by running strings or viewing the hex contents of the dump.
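
As a rough sketch of that workflow (the pid and the address range are placeholders to be taken from the pmap output):

pmap -x <pid> | sort -n -k3                                       # mappings with the largest RSS (3rd column) sort last
gdb --batch --pid <pid> -ex "dump memory seg.bin <start> <end>"   # dump the suspicious segment to a file
strings seg.bin | less                                            # inspect contents; useless if the payload is encrypted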

The troubleshooting methods are covered above; now let me walk through the Netty off-heap memory leak that the incorrect processing logic caused.

Protocol format

Data packet structure:
heartbeat        0x00, 0x00
business data    content length + PB-serialized content

Netty's ProtobufVarint32FrameDecoder is used to handle the PB frames (varint length prefix + PB-serialized content).

Netty processing flow

graph TD
    HeartbeatFilter[Heartbeat filter extending ChannelInboundHandlerAdapter] --> ProtobufVarint32FrameDecoder
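
A minimal sketch of the corresponding pipeline setup (the initializer and the HeartbeatFilterHandler name are my own; the filter itself is shown below):

import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.codec.protobuf.ProtobufVarint32FrameDecoder;

public class ProxyChannelInitializer extends ChannelInitializer<SocketChannel> {
    @Override
    protected void initChannel(SocketChannel ch) {
        ch.pipeline()
          .addLast(new HeartbeatFilterHandler())        // the heartbeat filter shown below
          .addLast(new ProtobufVarint32FrameDecoder()); // varint length prefix + PB body
    }
}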

This pipeline looks harmless, but a usage error in it leaks off-heap memory. Let's look at the handler that extends ChannelInboundHandlerAdapter:

public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
        ByteBuf buffer = null;
        if (msg instanceof ByteBuf) {
            buffer = (ByteBuf) msg;
            int size = buffer.readableBytes();
            if (size >= 2) {
                // peek at the first two bytes without moving readerIndex
                byte b1 = buffer.getByte(0);
                byte b2 = buffer.getByte(1);
                if (b1 == 0x00 && b2 == 0x00) {
                    // consume the two heartbeat bytes, then release them
                    ByteBuf heartBeat = buffer.readBytes(2);
                    heartBeat.release();
                    // forward whatever follows the heartbeat, if anything
                    int remSize = buffer.readableBytes();
                    if (remSize > 0) {
                        super.channelRead(ctx, buffer);
                    }
                    return;
                }
            } else {
                return;
            }
        }
    }

The flow above looks correct, so why does RSS keep rising even though heartBeat.release() is called?

buffer.readBytes(2) does not return a view of the original buffer: it copies the two bytes into a newly allocated ByteBuf, and release() frees only that copy. The original buffer's readerIndex has moved, but the original buffer itself is never released when nothing remains to forward (a packet containing only a heartbeat), so the off-heap memory backing it is never returned to the pool.
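
For contrast, here is a minimal corrected sketch of the same filter (reusing the HeartbeatFilterHandler name from the pipeline sketch above): skip the heartbeat bytes in place instead of copying them, and always either forward or release the original buffer:

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

public class HeartbeatFilterHandler extends ChannelInboundHandlerAdapter {
    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
        if (!(msg instanceof ByteBuf)) {
            super.channelRead(ctx, msg);
            return;
        }
        ByteBuf buffer = (ByteBuf) msg;
        if (buffer.readableBytes() >= 2
                && buffer.getByte(buffer.readerIndex()) == 0x00
                && buffer.getByte(buffer.readerIndex() + 1) == 0x00) {
            buffer.skipBytes(2); // move readerIndex past the heartbeat; no copy, no new allocation
        }
        if (buffer.isReadable()) {
            super.channelRead(ctx, buffer); // the downstream decoder now owns the buffer
        } else {
            buffer.release(); // nothing left to forward, so we must release it ourselves
        }
    }
}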

Let's look at how ByteToMessageDecoder handles this:

  public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
        if (msg instanceof ByteBuf) {
            selfFiredChannelRead = true;
            CodecOutputList out = CodecOutputList.newInstance();
            try {
                first = cumulation == null;
                cumulation = cumulator.cumulate(ctx.alloc(),
                        first ? Unpooled.EMPTY_BUFFER : cumulation, (ByteBuf) msg);
                callDecode(ctx, cumulation, out);
            } catch (DecoderException e) {
                throw e;
            } catch (Exception e) {
                throw new DecoderException(e);
            } finally {
                try {
                    if (cumulation != null && !cumulation.isReadable()) {
                        numReads = 0;
                        try {
                            cumulation.release();
                        } catch (IllegalReferenceCountException e) {
                            //noinspection ThrowFromFinallyBlock
                            throw new IllegalReferenceCountException(
                                    getClass().getSimpleName() + "#decode() might have released its input buffer, " +
                                            "or passed it down the pipeline without a retain() call, " +
                                            "which is not allowed.", e);
                        }
                        cumulation = null;
                    } else if (++numReads >= discardAfterReads) {
                        // We did enough reads already try to discard some bytes, so we not risk to see a OOME.
                        // See https://github.com/netty/netty/issues/4275
                        numReads = 0;
                        discardSomeReadBytes();
                    }

                    int size = out.size();
                    firedChannelRead |= out.insertSinceRecycled();
                    fireChannelRead(ctx, out, size);
                } finally {
                    out.recycle();
                }
            }
        } else {
            ctx.fireChannelRead(msg);
        }
    }

The key point here is discardSomeReadBytes(). Many references explain the difference between discardSomeReadBytes() and discardReadBytes(), and it comes down to performance: discardReadBytes() compacts the buffer on every call, while discardSomeReadBytes() compacts only under certain conditions. Compacting means moving the remaining bytes to the front of the underlying array, so the two calls differ noticeably in cost.
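
A small sketch of the two calls on a pooled direct buffer (the sizes are arbitrary):

import io.netty.buffer.ByteBuf;
import io.netty.buffer.PooledByteBufAllocator;

public final class DiscardDemo {
    public static void main(String[] args) {
        ByteBuf buf = PooledByteBufAllocator.DEFAULT.directBuffer(1024);
        buf.writeBytes(new byte[512]);
        buf.skipBytes(256); // readerIndex = 256, writerIndex = 512

        // Always compacts: copies the 256 unread bytes down to index 0 on every call.
        buf.discardReadBytes();

        // Compacts only under certain conditions (e.g. once readerIndex has passed
        // half the capacity), so most calls avoid the copy entirely.
        buf.discardSomeReadBytes();

        buf.release();
    }
}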

Netty gives us a convenient and powerful foundation when building applications, but we need a deep understanding of its API to avoid writing code with problems like this one.

Origin juejin.im/post/7238027797313765413