How to use Netty to receive 350,000 objects per second on a single machine

There are plenty of demos on the Internet that use Netty and protostuff to transmit RPC objects, and most of them are cut from the same mold. I copied one at the beginning too; it ran locally without a hitch and never threw an exception.

After deploying to the pre-release environment and running a stress test, problems piled up and errors were thrown endlessly. Admittedly, the stress test used a large amount of data with very dense requests: each client machine sends 20,000 objects during the first 100ms of every second, rests for the remaining 900ms, and repeats in an endless loop. There are 40 client machines in total, all sending to 2 Netty servers at the same time, so on average each server receives about 400,000 objects per second. Because of the business logic behind the server, the measured rate it can actually process is about 350,000 per second.

After modifying the code from the Internet many times and testing repeatedly, it finally ran with no errors and no exceptions, and a single machine can receive more than 350,000 objects per second. So I am writing this article to record it. The code below is consistent with the logic running online.

Protostuff serialization and deserialization

Nothing special here; it started from a utility class found online.

Add the dependencies to the pom:

<protostuff.version>1.7.2</protostuff.version>
<dependency>
    <groupId>io.protostuff</groupId>
    <artifactId>protostuff-core</artifactId>
    <version>${protostuff.version}</version>
</dependency>

<dependency>
    <groupId>io.protostuff</groupId>
    <artifactId>protostuff-runtime</artifactId>
    <version>${protostuff.version}</version>
</dependency>

The utility class:

import io.protostuff.LinkedBuffer;
import io.protostuff.ProtobufIOUtil;
import io.protostuff.Schema;
import io.protostuff.runtime.RuntimeSchema;

import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

public class ProtostuffUtils {
    /**
     * Avoid re-allocating the buffer on every serialization.
     * In production this saving is meaningless (the time saved is tiny), and under high
     * concurrency a shared buffer throws an exception because it is used again before
     * it has been cleared.
     */
//    private static LinkedBuffer buffer = LinkedBuffer.allocate(LinkedBuffer.DEFAULT_BUFFER_SIZE);
    /**
     * Schema cache
     */
    private static Map<Class<?>, Schema<?>> schemaCache = new ConcurrentHashMap<>();

    /**
     * Serialize the given object into a byte array.
     *
     * @param obj
     * @param <T>
     * @return
     */
    @SuppressWarnings("unchecked")
    public static <T> byte[] serialize(T obj) {
        Class<T> clazz = (Class<T>) obj.getClass();
        Schema<T> schema = getSchema(clazz);
        LinkedBuffer buffer = LinkedBuffer.allocate(LinkedBuffer.DEFAULT_BUFFER_SIZE);
        byte[] data;
        try {
            data = ProtobufIOUtil.toByteArray(obj, schema, buffer);
//            data = ProtostuffIOUtil.toByteArray(obj, schema, buffer);
        } finally {
            buffer.clear();
        }

        return data;
    }

    /**
     * Deserialize a byte array into an instance of the given class.
     *
     * @param data
     * @param clazz
     * @param <T>
     * @return
     */
    public static <T> T deserialize(byte[] data, Class<T> clazz) {
        Schema<T> schema = getSchema(clazz);
        T obj = schema.newMessage();
        ProtobufIOUtil.mergeFrom(data, obj, schema);
//        ProtostuffIOUtil.mergeFrom(data, obj, schema);
        return obj;
    }

    @SuppressWarnings("unchecked")
    private static <T> Schema<T> getSchema(Class<T> clazz) {
        Schema<T> schema = (Schema<T>) schemaCache.get(clazz);
        if (Objects.isNull(schema)) {
            // RuntimeSchema creates the schema lazily and caches it internally,
            // so RuntimeSchema.getSchema() can always be called; it is thread-safe
            schema = RuntimeSchema.getSchema(clazz);
            if (Objects.nonNull(schema)) {
                schemaCache.put(clazz, schema);
            }
        }

        return schema;
    }
}

There is a pitfall here: most of the code online uses a static shared buffer, as in the commented-out line at the top. That is fine single-threaded, but with multiple threads it is very easy for one thread to reuse the buffer before another has cleared it, which throws an exception. The supposed benefit of not allocating a buffer on every call is, in my measurements, negligible.

In addition, I changed the two ProtostuffIOUtil calls to ProtobufIOUtil, because ProtostuffIOUtil also produced exceptions under load; after the change the exceptions disappeared.

Custom encoding and decoding

The decoder:

import com.jd.platform.hotkey.common.model.HotKeyMsg;
import com.jd.platform.hotkey.common.tool.ProtostuffUtils;
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.ByteToMessageDecoder;
 
import java.util.List;
 
/**
 * @author wuweifeng
 * @version 1.0
 * @date 2020-07-29
 */
public class MsgDecoder extends ByteToMessageDecoder {
    @Override
    protected void decode(ChannelHandlerContext channelHandlerContext, ByteBuf in, List<Object> list) {
        try {
 
            byte[] body = new byte[in.readableBytes()];  // transmission is normal
            in.readBytes(body);
 
            list.add(ProtostuffUtils.deserialize(body, HotKeyMsg.class));
 
//            if (in.readableBytes() < 4) {
//                return;
//            }
//            in.markReaderIndex();
//            int dataLength = in.readInt();
//            if (dataLength < 0) {
//                channelHandlerContext.close();
//            }
//            if (in.readableBytes() < dataLength) {
//                in.resetReaderIndex();
//                return;
//            }
//
//            byte[] data = new byte[dataLength];
//            in.readBytes(data);
//
//            Object obj = ProtostuffUtils.deserialize(data, HotKeyMsg.class);
//            list.add(obj);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The encoder:

import com.jd.platform.hotkey.common.model.HotKeyMsg;
import com.jd.platform.hotkey.common.tool.Constant;
import com.jd.platform.hotkey.common.tool.ProtostuffUtils;
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.MessageToByteEncoder;
 
/**
 * @author wuweifeng
 * @version 1.0
 * @date 2020-07-30
 */
public class MsgEncoder extends MessageToByteEncoder {
 
    @Override
    public void encode(ChannelHandlerContext ctx, Object in, ByteBuf out) {
        if (in instanceof HotKeyMsg) {
            byte[] bytes = ProtostuffUtils.serialize(in);
            byte[] delimiter = Constant.DELIMITER.getBytes();
 
            byte[] total = new byte[bytes.length + delimiter.length];
            System.arraycopy(bytes, 0, total, 0, bytes.length);
            System.arraycopy(delimiter, 0, total, bytes.length, delimiter.length);
 
            out.writeBytes(total);
        }
    }
}

First look at the decoder, which Netty uses after receiving a message to convert the bytes into an object (the custom HotKeyMsg). Note the big block I commented out: the posts you find online are all written that way, reading a length prefix before the body. That approach is fine in ordinary scenarios and decodes correctly, but at hundreds of thousands of messages per second it is very prone to sticky-packet (TCP packet coalescing) problems. So I added a DelimiterBasedFrameDecoder, a delimiter-based frame decoder, in front of this decoder.

When a message is received it passes through the delimiter decoder first, so by the time it reaches MsgDecoder it is the byte stream of exactly one object, already split out, and it can be deserialized directly with the protostuff utility class. Constant.DELIMITER is a special string I defined to serve as the separator.
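
The actual delimiter string is not shown in the article, so the value below is only a placeholder; any string whose bytes are unlikely to occur inside a serialized payload will do:

public class Constant {
    // placeholder value, not the production delimiter
    public static final String DELIMITER = "$$(*)$$";
}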

Now look at the encoder. It first serializes the outgoing object into a byte[] with ProtostuffUtils, then appends my custom separator to the tail. That way, whenever an object is sent out, it passes through the encoder and automatically gets the delimiter attached.

The corresponding server-side code is roughly like this:
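
The original article showed this part as an image, so here is a minimal sketch of the pipeline wiring, reconstructed from the description; the event loop groups, port 8080, the 1 MB max frame length, and the NettyServerHandler name are placeholders:

EventLoopGroup bossGroup = new NioEventLoopGroup(1);
EventLoopGroup workerGroup = new NioEventLoopGroup();
ServerBootstrap bootstrap = new ServerBootstrap();
bootstrap.group(bossGroup, workerGroup)
        .channel(NioServerSocketChannel.class)
        .childHandler(new ChannelInitializer<SocketChannel>() {
            @Override
            protected void initChannel(SocketChannel ch) {
                ch.pipeline()
                        // frame the inbound byte stream on the custom delimiter first,
                        // so MsgDecoder always sees exactly one object's bytes
                        .addLast(new DelimiterBasedFrameDecoder(1024 * 1024,
                                Unpooled.copiedBuffer(Constant.DELIMITER.getBytes())))
                        .addLast(new MsgDecoder())
                        .addLast(new MsgEncoder())
                        .addLast(new NettyServerHandler()); // placeholder business handler
            }
        });
bootstrap.bind(8080).sync();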

After that, the handler can use the transferred object directly.
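
A sketch of such a handler, using the placeholder NettyServerHandler name from above; by the time channelRead0 fires, the pipeline has already produced a ready-made HotKeyMsg:

public class NettyServerHandler extends SimpleChannelInboundHandler<HotKeyMsg> {
    @Override
    protected void channelRead0(ChannelHandlerContext ctx, HotKeyMsg msg) {
        // msg is already deserialized; business logic can use it directly.
        // For the loopback test described below, write it back unchanged.
        ctx.writeAndFlush(msg);
    }
}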

Now look at the client side.

It is the same as the server side, with the same codecs; there is no difference, because the client and the server communicate using the same object definition.
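
A client-side sketch under the same assumptions (NettyClientHandler is again a placeholder name):

Bootstrap bootstrap = new Bootstrap();
bootstrap.group(new NioEventLoopGroup())
        .channel(NioSocketChannel.class)
        .handler(new ChannelInitializer<SocketChannel>() {
            @Override
            protected void initChannel(SocketChannel ch) {
                ch.pipeline()
                        .addLast(new DelimiterBasedFrameDecoder(1024 * 1024,
                                Unpooled.copiedBuffer(Constant.DELIMITER.getBytes())))
                        .addLast(new MsgDecoder())
                        .addLast(new MsgEncoder())
                        .addLast(new NettyClientHandler()); // placeholder
            }
        });
Channel channel = bootstrap.connect("127.0.0.1", 8080).sync().channel();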

The same is true for handlers.

Single machine and cluster

With all of the above written, we can actually test it. Start one client and one server, send the object to the server in an endless loop (sketched below), and have the server write each received object back to the client as-is. You will find it runs smoothly: sending millions per second is no problem, the codecs behave, and both client and server look fine, provided your protobuf utility class is written like mine and does not share a buffer. The articles found online basically end at this point; send a few messages at random and everything is OK. In reality, code like that falls into a pit once it goes online.
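
The sending loop in this first test is trivial; a sketch, assuming channel is the connected client channel from the sketch above and msg is any pre-built HotKeyMsg:

// flat-out send; the server's handler writes every object straight back
while (true) {
    channel.writeAndFlush(msg);
}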

The next local test is also easy: start several clients, point them all at the same server, and have each send objects in an endless loop, then watch both ends for exceptions. Compared with the first scenario, nothing changes on the client side; the difference is on the server side, which previously received from only one client at a time and now receives from several at once. If you are not careful, problems surface at this step; I recommend trying it yourself.

After that, let's go one step further. I started two servers on two different ports, since online there really are two different servers. The client sends objects to both servers simultaneously in an endless loop, roughly as in the sketch below.
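
The loop itself was shown as a screenshot in the original post; a sketch of its shape, assuming channel1 and channel2 are the connections to the two servers and msg is the HotKeyMsg being sent:

while (true) {
    // burst: push the batch to both servers at (nearly) the same instant
    for (int i = 0; i < 20000; i++) {
        channel1.writeAndFlush(msg).sync(); // the sync() is what keeps this stable, see below
        channel2.writeAndFlush(msg).sync();
    }
    Thread.sleep(900); // rest for the remainder of the second
}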

For sending messages we normally use channel.writeAndFlush(). Try removing the sync from the code above and running it: exceptions get thrown in heaps. We were clearly sending to two different channels, but because the writes happened at the same instant, severe packet coalescing occurred; many of the messages the servers received were malformed, and large numbers of errors were reported. Spacing the two channels' writes 100ms apart made the problem go away, and in the end sending synchronously with sync also eliminates the exceptions entirely.

The code above has been through the stress test: with 40 clients and 2 servers, each server receives about 400,000 objects per second on average and runs continuously and stably.

Original post: blog.csdn.net/qq_46388795/article/details/108664404