Netty Source Code Analysis: Common Decoders (Part 1)

Scan the QR code or search for the WeChat official account 菜鸟飞呀飞 to follow it and read more articles on Spring source code analysis and Java concurrent programming.


Foreword

The previous article analyzed how Netty uses decoders to solve the TCP sticky-packet and half-packet problems, but it did not examine in detail how a decoder actually decodes the data. This article analyzes how these specific decoders work.

Netty provides several commonly used decoders that cover almost every scenario. Ordered from simplest to most complex, they are:

FixedLengthFrameDecoder (fixed-length decoder)
LineBasedFrameDecoder (line-delimiter-based decoder)
DelimiterBasedFrameDecoder (custom-delimiter-based decoder)
LengthFieldBasedFrameDecoder (length-field-based decoder)

The first three decoders are relatively simple and easy to understand. The last one is more complex and harder to understand, but it fits the widest range of scenarios. Because the first three are simple, this article analyzes all of their source code together. The source code of LengthFieldBasedFrameDecoder will be analyzed separately in a later article.

FixedLengthFrameDecoder

As the class name suggests, this is a decoder based on a fixed length. When it is initialized, you pass it an int, frameLength; during decoding, every frameLength bytes are decoded into one data object. For example, suppose the sender sends data in four writes, A, BC, DEFG, and HI, for a total of 9 bytes. If the decoder is created with frameLength = 3, every 3 bytes are decoded as one frame, and the results are ABC, DEF, and GHI.

+---+----+------+----+          +-----+-----+-----+
| A | BC | DEFG | HI |   ->     | ABC | DEF | GHI |
+---+----+------+----+          +-----+-----+-----+

The source code of FixedLengthFrameDecoder, with comments, is shown below. It is short, so it needs no further analysis; the comments explain it.

public class FixedLengthFrameDecoder extends ByteToMessageDecoder {

    // Number of bytes decoded into each frame
    private final int frameLength;

    public FixedLengthFrameDecoder(int frameLength) {
        checkPositive(frameLength, "frameLength");
        // Set the number of bytes per decoded frame
        this.frameLength = frameLength;
    }

    @Override
    protected final void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) throws Exception {
        // Decode
        Object decoded = decode(ctx, in);
        // If a data object was decoded, add it to the out list
        if (decoded != null) {
            out.add(decoded);
        }
    }

    protected Object decode(
            @SuppressWarnings("UnusedParameters") ChannelHandlerContext ctx, ByteBuf in) throws Exception {
        // If fewer than frameLength bytes are readable, return null and wait for more data
        if (in.readableBytes() < frameLength) {
            return null;
        } else {
            // Read frameLength bytes and return them as one frame
            return in.readRetainedSlice(frameLength);
        }
    }
}
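To see how a fixed-length decoder copes with half packets spread over several reads, here is a minimal stdlib-only Java sketch. It is not Netty code: the class name FixedLengthSketch and its feed() method are hypothetical, and a plain byte array stands in for the accumulated ByteBuf that ByteToMessageDecoder keeps and repeatedly passes to decode().

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stdlib-only sketch of fixed-length framing (hypothetical class, not Netty's implementation).
class FixedLengthSketch {
    private final int frameLength;
    // Bytes received but not yet decoded; plays the role of the cumulative ByteBuf.
    private byte[] cumulation = new byte[0];

    FixedLengthSketch(int frameLength) {
        if (frameLength <= 0) {
            throw new IllegalArgumentException("frameLength must be positive: " + frameLength);
        }
        this.frameLength = frameLength;
    }

    // Appends one incoming read and returns every complete frame now available.
    List<String> feed(String chunk) {
        byte[] in = chunk.getBytes(StandardCharsets.US_ASCII);
        byte[] merged = Arrays.copyOf(cumulation, cumulation.length + in.length);
        System.arraycopy(in, 0, merged, cumulation.length, in.length);

        List<String> out = new ArrayList<>();
        int readerIndex = 0;
        // Same test as decode(): stop once fewer than frameLength bytes remain.
        while (merged.length - readerIndex >= frameLength) {
            out.add(new String(merged, readerIndex, frameLength, StandardCharsets.US_ASCII));
            readerIndex += frameLength;
        }
        // Keep the incomplete tail (a half packet) for the next read.
        cumulation = Arrays.copyOfRange(merged, readerIndex, merged.length);
        return out;
    }
}
```

Feeding it the four writes from the example above (A, BC, DEFG, HI) with frameLength = 3 yields ABC, DEF, GHI. The bytes left over after each read simply wait in the cumulation, which is the same reason decode() returns null when fewer than frameLength bytes are readable.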

LineBasedFrameDecoder

LineBasedFrameDecoder is a decoder based on line delimiters. What does that mean? Whenever it reads a line delimiter (\n or \r\n), it decodes one data object. In the example below, the delimiters are not included in the decoded output.

+---+------+----------+------+          +----+-----+------+
| A | B\nC | DE\r\nFG | HI\n |   ->     | AB | CDE | FGHI |
+---+------+----------+------+          +----+-----+------+

From this example, LineBasedFrameDecoder looks simple, but its implementation is not quite that simple. LineBasedFrameDecoder defines several important member variables:

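// Maximum length of a decoded frame
private final int maxLength;

// Whether to throw an exception immediately once the data being read exceeds maxLength; true means immediately
private final boolean failFast;

// Whether to strip the \r\n or \n delimiter from decoded data; true means strip, false means keep
private final boolean stripDelimiter;

// Data longer than maxLength cannot be decoded and must be discarded; discarding is then set to true
private boolean discarding;

// Number of bytes discarded so far
private int discardedBytes;

// Position reached by the last scan for a delimiter
private int offset;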

When decoding, the decoder first finds the position of a line delimiter, then computes the length from the current reader index to that position. If the length exceeds maxLength, the data is invalid, cannot be decoded, and must be discarded. For example, with maxLength = 4, only two packets in the example below are decoded correctly: AB and CDEF. GHIJBCA is longer than maxLength, so it is discarded.

[Figure: LineBasedFrameDecoder example]

Whether the decoded data keeps the trailing \r\n or \n is controlled by the stripDelimiter property: true means the delimiter is stripped, so the decoded data does not contain it. In addition, when the data read up to a delimiter is longer than maxLength, it must be discarded. But when is the failure reported: immediately, or only after the rest of the oversized frame has been read and discarded? That is controlled by the failFast property; true means the failure is reported immediately. When the decoder has to keep discarding data across reads, it sets the discarding property to true.

Now let's walk through the decoding process of the line-based decoder using its source code. LineBasedFrameDecoder extends the abstract class ByteToMessageDecoder mentioned in the previous article and overrides its abstract method decode().

protected final void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) throws Exception {
    // Decode
    Object decoded = decode(ctx, in);
    // If data was decoded, store the result in out
    if (decoded != null) {
        out.add(decoded);
    }
}

As you can see, the core logic is in another, overloaded decode() method. Its source is long, so I have condensed it; the overall skeleton is as follows.

protected Object decode(ChannelHandlerContext ctx, ByteBuf buffer) throws Exception {
    // Index of \n or \r\n (negative if none is found)
    final int eol = findEndOfLine(buffer);
    if (!discarding) {
        // Delimiter found
        if (eol >= 0) {
            // Decode...
            return frame;
        } else {
            // ...
            return null;
        }
    } else {
        // Delimiter found
        if (eol >= 0) {
            // ...
        } else {
            // ...
        }
        return null;
    }
}

First, the decoder looks for the index of \n or \r\n. The search is simple: it traverses the bytes. If no line delimiter is found, it returns a value less than 0; if one is found, it returns a value greater than or equal to 0. If the delimiter is \n, it returns the index of \n; if it is \r\n, it returns the index of \r.
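A minimal sketch of that search over a plain byte array is shown below. The name EolSketch.findEndOfLine and its signature are hypothetical; the real method works on a ByteBuf and also uses the offset field to avoid rescanning bytes it has already checked, which this sketch omits.

```java
// Stdlib-only sketch of the end-of-line search (hypothetical helper, not Netty's method).
class EolSketch {
    // Scans the readable range [readerIndex, writerIndex) and returns the index of '\n',
    // or the index of '\r' when the line ends with "\r\n"; returns -1 if there is no '\n'.
    static int findEndOfLine(byte[] buf, int readerIndex, int writerIndex) {
        for (int i = readerIndex; i < writerIndex; i++) {
            if (buf[i] == '\n') {
                // Step back onto '\r' so the caller can treat "\r\n" as a 2-byte delimiter.
                if (i > readerIndex && buf[i - 1] == '\r') {
                    return i - 1;
                }
                return i;
            }
        }
        return -1;
    }
}
```

For the bytes DE\r\nFG the method returns 2 (the \r), for B\nC it returns 1 (the \n), and for HI it returns -1, which is exactly the signal the decode() skeleton above branches on.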

The remaining logic splits into two parts depending on whether the decoder is in discard mode. The first part runs when it is not in discard mode (discarding == false); the second part runs when it is (discarding == true). Each part has two cases, delimiter found and delimiter not found, so there are four cases in total. When the decode method is called for the first time, discarding is false, i.e., the decoder is in non-discard mode.

Case 1: non-discard mode and delimiter found (eol >= 0). The corresponding code is shown below. The decoder first computes the length of the data between the reader index and the delimiter, then checks whether it exceeds maxLength. If it does, the data is invalid and must be discarded. How is it discarded? The reader index of the ByteBuf is moved past the delimiter, and then fail() is called to report the error. If the length is within the limit, the data is valid and can be decoded. stripDelimiter decides whether the delimiter is kept; the decoded data is assigned to frame and returned. This is the ideal case.

// Non-discard mode and delimiter found
if (eol >= 0) {
    final ByteBuf frame;
    // Length of the data to extract
    final int length = eol - buffer.readerIndex();
    // 2 if the delimiter is \r\n, 1 if it is \n
    final int delimLength = buffer.getByte(eol) == '\r' ? 2 : 1;

    // Data longer than maxLength cannot be decoded; skip it by moving the reader index past the delimiter
    if (length > maxLength) {
        // Skip this data
        buffer.readerIndex(eol + delimLength);
        // Report the failure
        fail(ctx, length);
        return null;
    }

    // Whether to strip the delimiter
    if (stripDelimiter) {
        // Read the data without the delimiter
        frame = buffer.readRetainedSlice(length);
        // Skip the delimiter
        buffer.skipBytes(delimLength);
    } else {
        // Read the data including the delimiter
        frame = buffer.readRetainedSlice(length + delimLength);
    }
    return frame;
}
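The arithmetic in case 1 can be checked with a small stdlib-only sketch. FirstCaseSketch, decodeLine() and Result are hypothetical names; readerIndex and eol are passed in explicitly instead of being read from a ByteBuf, and the call to fail() is reduced to returning a null frame.

```java
import java.nio.charset.StandardCharsets;

// Stdlib-only sketch of case 1 (non-discard mode, delimiter found); not Netty code.
class FirstCaseSketch {
    // Outcome of one call: the frame (null if the line was too long) and the new reader index.
    static final class Result {
        final String frame;
        final int newReaderIndex;

        Result(String frame, int newReaderIndex) {
            this.frame = frame;
            this.newReaderIndex = newReaderIndex;
        }
    }

    static Result decodeLine(byte[] buf, int readerIndex, int eol, int maxLength, boolean stripDelimiter) {
        int length = eol - readerIndex;             // bytes before the delimiter
        int delimLength = buf[eol] == '\r' ? 2 : 1; // "\r\n" or "\n"
        if (length > maxLength) {
            // Too long: skip the line and its delimiter (the real decoder also calls fail()).
            return new Result(null, eol + delimLength);
        }
        // With stripDelimiter only the payload is read; the delimiter is consumed either way.
        int frameLength = stripDelimiter ? length : length + delimLength;
        String frame = new String(buf, readerIndex, frameLength, StandardCharsets.US_ASCII);
        return new Result(frame, eol + delimLength);
    }
}
```

For the bytes DE\r\nFG with readerIndex 0, eol is 2 (the \r), so length = 2 and delimLength = 2. With stripDelimiter the frame is DE, without it DE\r\n, and in both cases the reader index ends at 4, just past the delimiter.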

Case 2: non-discard mode and delimiter not found (eol < 0). The corresponding code is shown below. Since no delimiter was found, nothing can be decoded yet. However, because of the maxLength limit, the decoder checks whether the readable data in the buffer already exceeds the maximum. If it does, the data is certainly invalid, so all of it is discarded and the decoder enters discard mode. When is the failure reported: right now, or only when the end of the oversized line is eventually found? That depends on the failFast member variable: true means the failure is reported immediately, false means it is reported later, in case 3.

else {
    // No delimiter found: check whether the readable data already exceeds maxLength; if so, discard it
    final int length = buffer.readableBytes();
    if (length > maxLength) {
        // Record all readable bytes in this buffer as discarded
        discardedBytes = length;
        // Move the reader index to skip this data
        buffer.readerIndex(buffer.writerIndex());
        // Enter discard mode
        discarding = true;
        offset = 0;
        // With failFast, report the failure immediately
        if (failFast) {
            fail(ctx, "over " + discardedBytes);
        }
    }
    return null;
}

Case 3: discard mode and delimiter found (eol >= 0). The corresponding code is shown below. The parent class calls the subclass's decode() method in a loop, so once the current data has to be discarded, the decoder enters discard mode and stays there across calls. Although a delimiter is now found, the data before it belongs to the oversized frame, so everything up to and including this delimiter is discarded (the earlier calls already discarded the first part of the frame). Finally, discarding is set back to false: the oversized frame has been fully skipped, so the next call can decode normally.

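// Discard mode and delimiter found
if (eol >= 0) {
    // Bytes discarded earlier + readable bytes before the delimiter this time
    final int length = discardedBytes + eol - buffer.readerIndex();
    // Length of the delimiter
    final int delimLength = buffer.getByte(eol) == '\r' ? 2 : 1;
    // Skip the data to be discarded, including the delimiter
    buffer.readerIndex(eol + delimLength);
    // Reset the discarded byte count to 0
    discardedBytes = 0;
    // Leave discard mode
    discarding = false;
    // Without failFast, the failure is reported only now
    if (!failFast) {
        fail(ctx, length);
    }
}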

Case 4: discard mode and delimiter not found (eol < 0). The corresponding code is shown below. Since no delimiter is found, nothing can be decoded, and since the decoder is in discard mode, all the data read this time belongs to the invalid frame: the reader index skips every readable byte, and discardedBytes accumulates their count. Note that the code does not leave discard mode here. Why? If the decoder left discard mode after throwing away only the first part of an oversized frame, the remaining part arriving in the next read might be shorter than maxLength and would be decoded as if it were valid, even though it is not.

else {
    // No delimiter found
    // Bytes discarded earlier + all readable bytes this time
    discardedBytes += buffer.readableBytes();
    // Skip all readable bytes
    buffer.readerIndex(buffer.writerIndex());
    // We skipped everything in the buffer, so reset the scan offset to 0
    offset = 0;
}

In both the third and the fourth case, i.e., in discard mode, nothing can be decoded: the method returns null and produces no object.
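The four cases can be combined into one stdlib-only model, shown below as a sketch rather than Netty's implementation. LineDecoderSketch and feed() are hypothetical names; errors are collected in a list instead of firing TooLongFrameException, and the reads used in the note after the code are made-up illustrative data, not the data from the figure.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stdlib-only model of the four branches of LineBasedFrameDecoder.decode() (not Netty code).
class LineDecoderSketch {
    private final int maxLength;
    private final boolean stripDelimiter;
    private final boolean failFast;
    private boolean discarding;
    private int discardedBytes;
    private byte[] buf = new byte[0]; // unread bytes; index 0 plays the role of readerIndex
    final List<String> errors = new ArrayList<>(); // stands in for fail()

    LineDecoderSketch(int maxLength, boolean stripDelimiter, boolean failFast) {
        this.maxLength = maxLength;
        this.stripDelimiter = stripDelimiter;
        this.failFast = failFast;
    }

    // Appends one read and, like ByteToMessageDecoder, calls decodeOnce() while it makes progress.
    List<String> feed(String chunk) {
        byte[] in = chunk.getBytes(StandardCharsets.US_ASCII);
        byte[] merged = Arrays.copyOf(buf, buf.length + in.length);
        System.arraycopy(in, 0, merged, buf.length, in.length);
        buf = merged;
        List<String> out = new ArrayList<>();
        while (buf.length > 0) {
            int before = buf.length;
            String frame = decodeOnce();
            if (frame != null) {
                out.add(frame);
            }
            if (buf.length == before) {
                break; // nothing consumed: wait for the next read
            }
        }
        return out;
    }

    private String decodeOnce() {
        int eol = findEndOfLine(); // readerIndex is 0, so eol is also the line length
        if (!discarding) {
            if (eol >= 0) { // case 1: normal mode, delimiter found
                int delimLength = buf[eol] == '\r' ? 2 : 1;
                String frame = null;
                if (eol > maxLength) {
                    errors.add("frame length (" + eol + ") exceeds " + maxLength);
                } else {
                    int frameLength = stripDelimiter ? eol : eol + delimLength;
                    frame = new String(buf, 0, frameLength, StandardCharsets.US_ASCII);
                }
                consume(eol + delimLength);
                return frame;
            }
            if (buf.length > maxLength) { // case 2: normal mode, no delimiter, already too long
                discardedBytes = buf.length;
                consume(buf.length);
                discarding = true;
                if (failFast) {
                    errors.add("over " + discardedBytes);
                }
            }
            return null;
        }
        if (eol >= 0) { // case 3: discard mode, delimiter found
            int length = discardedBytes + eol;
            int delimLength = buf[eol] == '\r' ? 2 : 1;
            consume(eol + delimLength);
            discardedBytes = 0;
            discarding = false;
            if (!failFast) {
                errors.add("frame length (" + length + ") exceeds " + maxLength);
            }
        } else { // case 4: discard mode, no delimiter; stay in discard mode
            discardedBytes += buf.length;
            consume(buf.length);
        }
        return null;
    }

    // Index of '\n', or of '\r' when the line ends with "\r\n"; -1 if absent.
    private int findEndOfLine() {
        for (int i = 0; i < buf.length; i++) {
            if (buf[i] == '\n') {
                return (i > 0 && buf[i - 1] == '\r') ? i - 1 : i;
            }
        }
        return -1;
    }

    private void consume(int n) {
        buf = Arrays.copyOfRange(buf, n, buf.length);
    }
}
```

With maxLength = 4, stripDelimiter = true and failFast = false, the three reads AB\nCDEF\r\nGHIJK, LM and N\nOP\n produce AB and CDEF, then nothing, then OP, and a single error for the 8-byte line GHIJKLMN is recorded when its \n finally arrives. With failFast = true, the error ("over 5") is recorded as soon as the first 5 bytes are seen instead. The tail LMN shows why case 4 keeps the decoder in discard mode: had it left discard mode after dropping GHIJK, LMN would have looked like a short, valid line.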

As the analysis above shows, fail() is called when data is discarded. The method has several overloads, but they all end up calling the following one.

private void fail(final ChannelHandlerContext ctx, String length) {
    ctx.fireExceptionCaught(
            new TooLongFrameException(
                    "frame length (" + length + ") exceeds the allowed maximum (" + maxLength + ')'));
}

It simply creates a TooLongFrameException, whose message you have probably seen in real applications, and propagates it along the pipeline until it reaches a handler's exceptionCaught() method.

Overall, LineBasedFrameDecoder is simple in concept: it splits data on \r\n or \n. If a resulting piece is longer than the configured maximum length, it is discarded; otherwise it is decoded successfully.

DelimiterBasedFrameDecoder

DelimiterBasedFrameDecoder is a delimiter-based decoder: it decodes according to delimiters specified by the user. If the user defines a semicolon (;) as the delimiter, the data is decoded at each semicolon. The user can also specify several delimiters; whenever any one of them is matched in the data being read, one frame is decoded. For example, in the figure below, the delimiters are comma, exclamation mark, semicolon, and line break, and the decoded results are AB, CDEF, AGHI, and BCA.

[Figure: DelimiterBasedFrameDecoder example]

In addition, if and only if exactly two delimiters are specified and they are \r\n and \n, the delimiter-based decoder turns into a line-based decoder: it decodes directly with LineBasedFrameDecoder.

DelimiterBasedFrameDecoder defines several important fields whose meanings and uses are almost the same as those of the member variables in LineBasedFrameDecoder. Their meanings are as follows.

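// Delimiter array; several delimiters can be specified at once, so they are stored in an array
private final ByteBuf[] delimiters;

// Maximum frame length
private final int maxFrameLength;

// Whether to strip the delimiter; true means strip
private final boolean stripDelimiter;

// Whether to fail immediately; true means immediately
private final boolean failFast;

// Whether the decoder is in discard mode
private boolean discardingTooLongFrame;

// Total number of bytes discarded
private int tooLongFrameLength;

/** Set only when decoding with "\n" and "\r\n" as the delimiter.  */
// If the delimiters are \r\n and \n, decode directly with the line-based decoder
private final LineBasedFrameDecoder lineBasedDecoder;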

Compared with LineBasedFrameDecoder, the delimiter-based decoder has two additional member variables. One is delimiters, an array that stores the user-defined delimiters; since users can define several delimiters, an array is used. The other is lineBasedDecoder, a line-based decoder. If and only if the delimiter array consists of \r\n and \n, the delimiter-based decoder acts as a line-based decoder; this field is initialized in the DelimiterBasedFrameDecoder constructor, shown below.

public DelimiterBasedFrameDecoder(
        int maxFrameLength, boolean stripDelimiter, boolean failFast, ByteBuf... delimiters) {

    // Other code omitted...
    if (isLineBased(delimiters) && !isSubclass()) {
        lineBasedDecoder = new LineBasedFrameDecoder(maxFrameLength, stripDelimiter, failFast);
        this.delimiters = null;
    } else {
        // Other code omitted...
        lineBasedDecoder = null;
    }
    // Other code omitted...
}

During construction, isLineBased(delimiters) checks whether the delimiters are \r\n and \n. If so (and the decoder is not a subclass), a line-based decoder is created and assigned to lineBasedDecoder; otherwise lineBasedDecoder is set to null. The source of isLineBased() is as follows.

private static boolean isLineBased(final ByteBuf[] delimiters) {
    // Returns true only when the array contains exactly \r\n and \n
    if (delimiters.length != 2) {
        return false;
    }

    ByteBuf a = delimiters[0];
    ByteBuf b = delimiters[1];
    // Make sure a holds \r\n (the longer one) and b holds \n
    if (a.capacity() < b.capacity()) {
        a = delimiters[1];
        b = delimiters[0];
    }
    return a.capacity() == 2 && b.capacity() == 1
            && a.getByte(0) == '\r' && a.getByte(1) == '\n'
            && b.getByte(0) == '\n';
}

As the source of isLineBased() shows, it returns true if and only if the delimiters are \r\n and \n, in which case the delimiter-based decoder becomes a line-based decoder.

Likewise, the delimiter-based decoder overrides the abstract decode() method of its parent class.

protected final void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) throws Exception {
    // Call the overloaded decode() method to decode
    Object decoded = decode(ctx, in);
    // If decoding succeeded, add the result to out
    if (decoded != null) {
        out.add(decoded);
    }
}

The core logic is in the overloaded decode(ctx, in) method. Its source is also long; I have condensed it, with slight changes for readability. The skeleton is roughly as follows.

protected Object decode(ChannelHandlerContext ctx, ByteBuf buffer) throws Exception {
    // If the line-based decoder was initialized, the delimiters are \r\n and \n; decode with it directly
    if (lineBasedDecoder != null) {
        return lineBasedDecoder.decode(ctx, buffer);
    }
    int minFrameLength = Integer.MAX_VALUE;
    ByteBuf minDelim = null;
    // Iterate over all delimiters and find the one at the smallest index
    for (ByteBuf delim: delimiters) {
        // Find the position of the earliest delimiter
    }
    // A delimiter was found
    if (minDelim != null) {

        // In discard mode
        if (discardingTooLongFrame) {

            return null;
        } else {
            return frame;
        }
    } else {
        // No delimiter was found
        // Check whether the decoder is in non-discard mode
        if (!discardingTooLongFrame) {

        } else {

        }
        return null;
    }
}

First, the method checks whether lineBasedDecoder is null. If it is not, the delimiters are \r\n and \n, so the line-based decoder decodes directly; otherwise, the logic that follows is executed.

Unlike the line-based decoder analyzed earlier, the delimiter-based decoder can have several delimiters, so it first has to find which delimiter appears first in the readable data and at what position. How? It iterates over the delimiters, finds the index of each one in the readable data, and picks the delimiter with the smallest index; that is the one that occurs first.
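The search described above can be sketched with the standard library alone. MinDelimiterSketch, indexOf() and findMinDelimiter() are hypothetical names, and byte arrays stand in for ByteBuf; the idea, taking the smallest index over all delimiters, follows the description above rather than reproducing Netty's exact code.

```java
import java.nio.charset.StandardCharsets;

// Stdlib-only sketch of picking the earliest of several delimiters (not Netty code).
class MinDelimiterSketch {
    // Naive search: index of the first occurrence of needle in haystack, or -1.
    static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    // Returns {frameLength, delimiterIndex} for the delimiter that occurs first, or null if none occurs.
    static int[] findMinDelimiter(byte[] buffer, byte[][] delimiters) {
        int minFrameLength = Integer.MAX_VALUE;
        int minDelimIndex = -1;
        for (int d = 0; d < delimiters.length; d++) {
            int frameLength = indexOf(buffer, delimiters[d]);
            if (frameLength >= 0 && frameLength < minFrameLength) {
                minFrameLength = frameLength;
                minDelimIndex = d;
            }
        }
        return minDelimIndex < 0 ? null : new int[] {minFrameLength, minDelimIndex};
    }

    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }
}
```

For the input AB;CD,EF with comma, exclamation mark, semicolon and \n as delimiters, the semicolon at index 2 wins over the comma at index 5, so the first frame is AB; when no delimiter occurs, the result is null, which corresponds to minDelim == null in the skeleton above.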

The rest of the logic is almost the same as in the line-based decoder. There are two cases, delimiter found and delimiter not found, and each is subdivided by whether the decoder is in discard mode, giving four cases in total. The only difference is the order of the checks: the line-based decoder first checks whether it is in discard mode and then whether a delimiter was found, while the delimiter-based decoder does the reverse; the approach is the same. Data is decoded only when a delimiter is found and the decoder is not in discard mode; otherwise null is returned. The details match the line-based decoder analyzed above, so they are not repeated here.

Summary

This article analyzed the three common decoders mentioned at the beginning: the fixed-length decoder, the line-based decoder, and the delimiter-based decoder. All three are relatively simple and, together with the figures and examples, easy to understand. Their principle is similar to the str.split() method we often use in development, except that they also have to handle discard mode. The next article will analyze the length-field-based decoder, LengthFieldBasedFrameDecoder. It is a little more complex than the three decoders analyzed today, but it is the most versatile.
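To make the comparison with split() concrete, the sketch below contrasts splitting each read on its own with splitting an accumulated buffer. Both methods are hypothetical helpers built on java.lang.String and StringBuilder, and maxLength and discard mode are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Stdlib-only contrast between plain split() and a cumulating splitter (hypothetical helpers).
class SplitVsDecoderSketch {
    // Naive: split every read on its own, so a frame cut across two reads comes out as two pieces.
    static List<String> splitEachChunk(List<String> chunks, String delimiter) {
        List<String> out = new ArrayList<>();
        for (String chunk : chunks) {
            for (String part : chunk.split(Pattern.quote(delimiter))) {
                if (!part.isEmpty()) {
                    out.add(part);
                }
            }
        }
        return out;
    }

    // Stateful: keep the unterminated tail and prepend it to the next read, as a frame decoder does.
    static List<String> splitWithCumulation(List<String> chunks, String delimiter) {
        List<String> out = new ArrayList<>();
        StringBuilder cumulation = new StringBuilder();
        for (String chunk : chunks) {
            cumulation.append(chunk);
            int idx;
            while ((idx = cumulation.indexOf(delimiter)) >= 0) {
                out.add(cumulation.substring(0, idx));
                cumulation.delete(0, idx + delimiter.length());
            }
        }
        return out;
    }
}
```

For the reads AB;CD, EF;G and H; the per-read split yields AB, CD, EF, G, H, while the cumulating version yields AB, CDEF, GH. Keeping unread bytes between calls is what lets the decoders above handle half packets.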

Source: juejin.im/post/5e0a16e4e51d4575d434e363