Netty Source Code Analysis Series: How Netty Solves the TCP Sticky Packet and Half Packet Problems

Scan the QR code or search for the WeChat official account 菜鸟飞呀飞 to follow it and read more articles on Spring 源码分析 (Spring source code analysis) and Java 并发编程 (Java concurrency programming).


The problem

The previous article analyzed how the Netty server accepts a new connection; once a connection has been accepted, data can be read from and written to it. During those reads and writes, for a TCP connection, Netty has to deal with the TCP sticky packet and half packet problems, and that is the focus of this article. Before you start reading, consider the following two questions.

    1. What are the TCP sticky packet and half packet problems? Do sticky packets and half packets exist for the UDP protocol?
    2. How does Netty solve them?

What are sticky packets and half packets?

In the TCP/IP model, TCP and UDP are both transport-layer protocols, but they differ greatly in how they transmit data.

UDP sends and receives data in units of datagrams, and the UDP header contains a 16-bit field that records the length of the datagram, so the application layer can easily separate one message from another. In other words, UDP transmits data with boundaries, and therefore sticky packets and half packets do not occur.

TCP, on the other hand, transmits data as a byte stream. When the application layer sends data, the data is first written into the TCP socket buffer; when the buffer is full, the data is written out, which can produce sticky packets and half packets. What the receiver gets is likewise a stream of bytes. A "stream" can be pictured as a river: there are no boundaries between the packets flowing through it, and the TCP header has no field that records the length of an individual packet, so when the receiver's application layer reads data from the byte stream, it has no way to split it back into the original packets.
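To make this concrete, here is a small illustration using plain java.net sockets rather than Netty; the port number, the short sleep and the message contents are arbitrary assumptions. Depending on how TCP happens to segment and deliver the stream, a single read() on the receiver may return the bytes of both writes at once (a sticky packet) or only part of one of them (a half packet).

import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class StickyPacketDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(9000);

        // Sender: two logically separate "packets" written back to back
        new Thread(() -> {
            try (Socket client = new Socket("localhost", 9000)) {
                OutputStream out = client.getOutputStream();
                out.write("packet1".getBytes());
                out.write("packet2".getBytes());
                out.flush();
            } catch (Exception ignored) {
            }
        }).start();

        // Receiver: a single read may return "packet1packet2" (sticky packet)
        // or only part of one message (half packet); TCP gives no boundary
        try (Socket accepted = server.accept()) {
            Thread.sleep(100); // give both writes a chance to arrive before reading
            InputStream in = accepted.getInputStream();
            byte[] buf = new byte[1024];
            int n = in.read(buf);
            System.out.println("read " + n + " bytes: " + new String(buf, 0, n));
        }
        server.close();
    }
}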

Illustrations of sticky packets and half packets

Suppose the sender sends two complete packets to the receiver in a row. With TCP, the following situations can occur; in the figures below, packet1 and packet2 denote the two complete packets sent by the sender.

In the first case, neither sticky packets nor half packets occur: the receiver receives the two complete, independent packets packet1 and packet2. This is the normal situation, shown in Figure 1.

Figure 1: Normal packets

In the second case, sticky packets occur. The sender writes packet1 into its TCP socket buffer, but TCP does not send the data immediately because the buffer is not yet full. The sender then writes packet2, which also goes into the socket buffer; now the buffer is full, so TCP sends out all the buffered data at once, and from the receiver's point of view only a single packet arrives. Since the TCP header has no field recording the length of each packet, the receiver simply cannot tell packet1 and packet2 apart; this is the sticky packet problem. Sticky packets can also arise on the receiving side, when the application layer does not read data from the TCP socket in time. This case is shown in Figure 2.

Figure 2: Sticky packets

In the third case, half packets occur. The sender still sends the two packets packet1 and packet2, but TCP transmits them in several pieces, and no single piece contains the whole of packet1 and packet2; each piece contains only part of packet1 or packet2. The two packets are effectively split apart, which is why this is also called packet splitting (unpacking). This case is shown in Figure 3.

Figure 3: Half packets

Causes of sticky packets and half packets

From the figures above we can roughly see how sticky packets and half packets arise.

  • Causes of sticky packets
      1. Each write by the sender is smaller than the socket buffer;
      2. The receiver does not read data out of the socket buffer in time.
  • Causes of half packets
      1. The data written by the sender is larger than the socket buffer;
      2. The data to be sent is larger than the MSS or the MTU, so it has to be split. (The MSS is the maximum segment size at the TCP layer: the data TCP hands down to the IP layer cannot exceed this value. The MTU is the maximum transmission unit: the largest amount of data the physical layer allows an upper layer to transmit at once, which limits the size of the data the IP layer can transfer. See the worked example below.)
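For a concrete sense of the numbers (assuming a typical Ethernet MTU of 1500 bytes and IPv4 and TCP headers without options, 20 bytes each): MSS = MTU - IP header - TCP header = 1500 - 20 - 20 = 1460 bytes. A single application write of, say, 4096 bytes therefore has to be split across at least three TCP segments.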

Ultimately, though, the root cause of sticky packets and half packets is that TCP transmits data as a byte stream: there are no boundaries between packets, so the receiver cannot accurately tell individual packets apart.

How Netty solves the sticky packet and half packet problems

As application-layer developers we cannot change the fact that TCP transmits data as a byte stream, unless we design a TCP-like protocol of our own, which is very hard, would not necessarily perform better than TCP, and would never be as widely deployed as TCP already is. Netty, as a high-performance network framework, naturally has to support TCP, and supporting TCP means solving the TCP sticky packet and half packet problems; otherwise every developer would have to solve them by hand, which is time-consuming and error-prone.

Netty solves the TCP sticky packet and half packet problems by providing a series of codecs. As the name implies, a codec takes the byte stream read from the TCP socket and, following certain rules, encodes or decodes it, for example parsing the binary byte stream into complete packets. Netty ships with many common codecs: the decoders all extend the abstract class ByteToMessageDecoder, and the encoders all extend the abstract class MessageToByteEncoder.
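Before diving into the abstract class, here is a minimal sketch of how such a decoder is typically registered on the server side; the port, frame length limit and business handler are illustrative assumptions, and LineBasedFrameDecoder is one of the built-in decoders listed later in this article.

import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.handler.codec.LineBasedFrameDecoder;
import io.netty.handler.codec.string.StringDecoder;

public class CodecPipelineSketch {
    public static void main(String[] args) throws Exception {
        ServerBootstrap bootstrap = new ServerBootstrap();
        bootstrap.group(new NioEventLoopGroup(1), new NioEventLoopGroup())
                .channel(NioServerSocketChannel.class)
                .childHandler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel ch) {
                        // Decoder first: re-assembles the byte stream into complete frames
                        ch.pipeline().addLast(new LineBasedFrameDecoder(1024));
                        // Then turn each frame's bytes into a String
                        ch.pipeline().addLast(new StringDecoder());
                        // The business handler now sees one complete message per call
                        ch.pipeline().addLast(new SimpleChannelInboundHandler<String>() {
                            @Override
                            protected void channelRead0(ChannelHandlerContext ctx, String msg) {
                                System.out.println("received a complete message: " + msg);
                            }
                        });
                    }
                });
        bootstrap.bind(8080).sync();
    }
}

The key design point is the ordering: the frame decoder sits in front, so every handler behind it sees one complete message per channelRead call instead of an arbitrary slice of the byte stream.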

Today this article mainly walks through the source of the abstract decoder class ByteToMessageDecoder; the concrete decoder implementations will be analyzed in detail in the next two articles. For encoders, encoding is simply the reverse of decoding, so it will not be covered separately; interested readers can read that code on their own.

ByteToMessageDecoder is actually a ChannelHandler, and an implementation of it only takes effect after it has been added to the pipeline. Once the decoder is in the pipeline, whenever an OP_READ event occurs, the pipeline propagates the channelRead() method through all handlers and this handler is executed. The abstract decoder class defines channelRead() as follows.

@Override
public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
    if (msg instanceof ByteBuf) {
        // Holds the decoded data objects; think of it as a list
        CodecOutputList out = CodecOutputList.newInstance();
        try {
            ByteBuf data = (ByteBuf) msg;
            first = cumulation == null;
            if (first) {
                // First read: assign directly
                cumulation = data;
            } else {
                // Accumulate the data
                cumulation = cumulator.cumulate(ctx.alloc(), cumulation, data);
            }
            // Call the decode method
            callDecode(ctx, cumulation, out);
        } catch (DecoderException e) {
            throw e;
        } catch (Exception e) {
            throw new DecoderException(e);
        } finally {
            // Some code omitted...

            // size is the number of decoded data objects
            int size = out.size();
            decodeWasNull = !out.insertSinceRecycled();
            // Propagate downstream; if size is 0, nothing was decoded, so nothing is
            // propagated and decoding resumes the next time data is read
            fireChannelRead(ctx, out, size);
            out.recycle();
        }
    } else {
        ctx.fireChannelRead(msg);
    }
}

The core logic of Netty's decoders is: accumulate the byte-stream data that has been read into an accumulator, then call the concrete decoder implementation to decode the accumulated data. If a data object can be decoded, a complete packet has been read, and the decoded object is propagated down the pipeline so that the business handlers behind it can execute their logic. If no data object can be decoded, a complete packet has not yet been read; nothing is propagated, the decoder keeps waiting, and newly read bytes keep being accumulated until enough data has been collected to decode an object, which is then propagated downstream.

There are three important points here. First, the byte-stream data accumulator; second, the concrete decoding step, which is done in the specific decoder implementation class; third, the decoded data objects: after the concrete decoder turns the byte stream into data objects, they are stored in a List and then propagated down the pipeline. Let's go through these steps against the source code above.

First, the code checks whether msg is a ByteBuf. Data that has already been decoded is no longer a ByteBuf, so it does not need decoding and falls into the else branch, where it is simply propagated downstream. Byte-stream data that has not yet been decoded is still a ByteBuf, so it enters the if branch to be decoded.

Inside the if branch, an out object is defined first; you can simply treat it as a collection that stores successfully decoded data objects, i.e. the List mentioned in the third point above. Next comes the cumulation object, the byte-stream accumulator, which is filled by the cumulator (whose default is MERGE_CUMULATOR). By checking whether cumulation is null we know whether this is the first read: if it is null, there is no previously accumulated data, so msg is assigned to cumulation directly, meaning that the current byte-stream data msg is all the data in the accumulator; if the accumulator is not null, it already holds earlier data (a half packet occurred before), so the current byte-stream data msg has to be accumulated into the accumulator.

How is the byte-stream data accumulated into the accumulator? By calling the cumulator's cumulate() method. The strategy design pattern is used here, and the default cumulator is MERGE_CUMULATOR, whose source is as follows.

public ByteBuf cumulate(ByteBufAllocator alloc, ByteBuf cumulation, ByteBuf in) {
    try {
        final ByteBuf buffer;
        // If there is no room left for the new data, expand the buffer.
        // cumulation.writerIndex() > cumulation.maxCapacity() - in.readableBytes() can be rewritten as:
        // cumulation.writerIndex() + in.readableBytes() > cumulation.maxCapacity()
        // i.e. if the write index plus the readable length exceeds the ByteBuf's maximum capacity,
        // the buffer needs to be expanded
        if (cumulation.writerIndex() > cumulation.maxCapacity() - in.readableBytes()
                || cumulation.refCnt() > 1 || cumulation.isReadOnly()) {
            // Expand
            buffer = expandCumulation(alloc, cumulation, in.readableBytes());
        } else {
            buffer = cumulation;
        }
        // Write the new data in
        buffer.writeBytes(in);
        return buffer;
    } finally {
        // Release the memory to prevent OOM
        in.release();
    }
}

As the code shows, before the data is accumulated into the accumulator, the method first checks whether expansion is needed; if so, expandCumulation() is called to expand the buffer. Finally writeBytes() writes the data into the accumulator and the accumulator is returned. The expansion here involves a memory copy, because this accumulator is MERGE_CUMULATOR. Netty also provides another kind of accumulator, COMPOSITE_CUMULATOR, which does not need a memory copy to grow: it composes ByteBufs together using the CompositeByteBuf class.
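The difference between the two strategies can be pictured with a small standalone sketch (not Netty's internal code; the buffer contents are arbitrary examples): the merge style copies bytes into one contiguous buffer, while the composite style stitches the existing buffers together without copying their contents.

import io.netty.buffer.ByteBuf;
import io.netty.buffer.CompositeByteBuf;
import io.netty.buffer.Unpooled;
import io.netty.util.CharsetUtil;

public class CumulatorContrast {
    public static void main(String[] args) {
        ByteBuf part1 = Unpooled.copiedBuffer("packe".getBytes());
        ByteBuf part2 = Unpooled.copiedBuffer("t1".getBytes());

        // MERGE style: copy both parts into one contiguous buffer (memory copy)
        ByteBuf merged = Unpooled.buffer();
        merged.writeBytes(part1.duplicate());
        merged.writeBytes(part2.duplicate());

        // COMPOSITE style: stitch the buffers together without copying their bytes
        CompositeByteBuf composite = Unpooled.compositeBuffer();
        composite.addComponents(true, part1, part2);

        System.out.println(merged.toString(CharsetUtil.UTF_8));    // packet1
        System.out.println(composite.toString(CharsetUtil.UTF_8)); // packet1

        merged.release();
        composite.release(); // releasing the composite also releases part1 and part2
    }
}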

So the question is: a memory-copy-based approach is apparently slower, so why does Netty use the memory-copy accumulator by default? The Netty source explains it as follows:

/**
 * Cumulate {@link ByteBuf}s by add them to a {@link CompositeByteBuf} and so do no memory copy whenever possible.
 * Be aware that {@link CompositeByteBuf} use a more complex indexing implementation so depending on your use-case
 * and the decoder implementation this may be slower then just use the {@link #MERGE_CUMULATOR}.
 */

Roughly, the comment means: the accumulator only accumulates data; the concrete decoding is done by implementation classes of the abstract decoder, and Netty cannot know in advance how a given decoder implementation decodes. If it decodes against a CompositeByteBuf, the more complex indexing may actually make it slower than working against a plain ByteBuf, so it is not necessarily more efficient. That is why Netty defaults to MERGE_CUMULATOR, the memory-copy-based accumulator.
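If you know your decoder handles a CompositeByteBuf well, the accumulator can be swapped per decoder instance. The following is a minimal sketch reusing the pipeline setup style of the earlier example; MyFrameDecoder stands in for any ByteToMessageDecoder subclass and is purely hypothetical, while setCumulator() and COMPOSITE_CUMULATOR are public members of ByteToMessageDecoder.

// A minimal sketch: choosing the composite (non-copying) accumulator for a decoder.
// MyFrameDecoder is a hypothetical ByteToMessageDecoder subclass used only for illustration.
ChannelInitializer<SocketChannel> initializer = new ChannelInitializer<SocketChannel>() {
    @Override
    protected void initChannel(SocketChannel ch) {
        MyFrameDecoder decoder = new MyFrameDecoder();
        // COMPOSITE_CUMULATOR stitches ByteBufs together instead of copying them
        decoder.setCumulator(ByteToMessageDecoder.COMPOSITE_CUMULATOR);
        ch.pipeline().addLast(decoder);
    }
};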

Once the data has been accumulated into the accumulator, callDecode(ctx, cumulation, out) is called to decode it. Its simplified source is as follows.

protected void callDecode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) {
    try {
        // Keep looping as long as there is readable data
        while (in.isReadable()) {
            int outSize = out.size();
            // If out already contains objects, a data object has been decoded and can be propagated
            if (outSize > 0) {
                // Propagate downstream and clear out
                fireChannelRead(ctx, out, outSize);
                out.clear();
                if (ctx.isRemoved()) {
                    break;
                }
                outSize = 0;
            }
            // Record the number of readable bytes before decoding
            int oldInputLength = in.readableBytes();
            // Call the decode method
            decodeRemovalReentryProtection(ctx, in, out);
            if (ctx.isRemoved()) {
                break;
            }
            // If the number of objects in out did not change, no new object was decoded
            if (outSize == out.size()) {
                // No new object was decoded and the readable bytes in the accumulator did not
                // change either, so the data read in this loop iteration is not enough to decode
                // an object; break out of the loop and wait for the next read
                if (oldInputLength == in.readableBytes()) {
                    break;
                } else {
                    continue;
                }
            }
            // The number of objects in out changed, so a new object was decoded, but the readable
            // bytes in the accumulator did not change, which indicates an error
            if (oldInputLength == in.readableBytes()) {
                throw new DecoderException(
                        StringUtil.simpleClassName(getClass()) +
                                ".decode() did not read anything but decoded a message.");
            }

            if (isSingleDecode()) {
                break;
            }
        }
    } catch (DecoderException e) {
        throw e;
    } catch (Exception cause) {
        throw new DecoderException(cause);
    }
}

The method's second parameter is the accumulator mentioned earlier, and its third parameter, out, stores the successfully decoded data objects mentioned above. The method's logic can be followed through the comments in the code; the core line is the decoding call:

// Call the decode method
decodeRemovalReentryProtection(ctx, in, out);

The source of that method is shown below. Inside it, the decoder subclass's decode() method is called, and that is what actually decodes the data.

final void decodeRemovalReentryProtection(ChannelHandlerContext ctx, ByteBuf in, List<Object> out)
        throws Exception {
    decodeState = STATE_CALLING_CHILD_DECODE;
    try {
        // Call the subclass's decode method
        decode(ctx, in, out);
    } finally {
        boolean removePending = decodeState == STATE_HANDLER_REMOVED_PENDING;
        decodeState = STATE_INIT;
        if (removePending) {
            handlerRemoved(ctx);
        }
    }
}

decode(ctx, in, out) is an abstract method whose concrete logic is implemented by the decoder implementation classes. Clearly, the parent class calls an abstract method and the subclasses supply its concrete logic: this is the template method design pattern. The commonly used Netty decoders are listed below; the core decoding logic of these subclasses will be analyzed in the next two articles, and a minimal custom decoder sketch follows the list.

FixedLengthFrameDecoder      (fixed-length decoder)
LineBasedFrameDecoder        (line-separator-based decoder)
DelimiterBasedFrameDecoder   (custom-delimiter-based decoder)
LengthFieldBasedFrameDecoder (length-field-based decoder)
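To make the template method relationship concrete, here is a minimal, purely illustrative decoder (not one of the built-in classes above): it assumes each frame is a 4-byte length field followed by the body, which is a greatly simplified version of what LengthFieldBasedFrameDecoder does. The parent class handles the accumulation; the subclass only decides whether the accumulated bytes contain a complete frame.

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.ByteToMessageDecoder;
import java.util.List;

// A hypothetical decoder, for illustration only: each frame is a 4-byte length
// field followed by that many bytes of body. The parent ByteToMessageDecoder
// does the accumulation; this subclass only implements decode().
public class SimpleLengthFieldDecoder extends ByteToMessageDecoder {
    @Override
    protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) {
        // Not even the length field has arrived yet: keep accumulating
        if (in.readableBytes() < 4) {
            return;
        }
        in.markReaderIndex();
        int bodyLength = in.readInt();
        // Half packet: the body is not complete yet, rewind and wait for more data
        if (in.readableBytes() < bodyLength) {
            in.resetReaderIndex();
            return;
        }
        // A complete frame is available: slice it out and pass it down the pipeline
        out.add(in.readRetainedSlice(bodyLength));
    }
}

If a complete frame has not yet arrived, decode() simply returns without adding anything to out; as shown in callDecode() above, the loop then breaks and waits for the next read.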

Finally, back in the finally block of channelRead() (shown again below), the code first gets the number of decoded data objects in out, then calls fireChannelRead() to propagate the parsed data objects downstream. If size is 0, no object was decoded, so nothing is propagated and decoding continues after the next read.

finally {
    // Some code omitted...

    // size is the number of decoded data objects
    int size = out.size();
    decodeWasNull = !out.insertSinceRecycled();
    // Propagate downstream; if size is 0, nothing was decoded, so nothing is
    // propagated and decoding resumes the next time data is read
    fireChannelRead(ctx, out, size);
    out.recycle();
}

Summary

This article first described the TCP sticky packet and half packet phenomena, in which several packets are merged into one or a packet is split into several, and explained that their root cause is that TCP transmits data as a byte stream. It then walked through the source to show how Netty solves sticky packets and half packets with codecs. The decoding logic of the concrete decoders will be analyzed in the next two articles.


Origin juejin.im/post/5e088c97f265da33ea00aae9