10 - Communication Protocol: How to Optimize RPC Network Communication?

SpringCloud and Dubbo are two of the most widely used microservice frameworks, and the industry has long compared them. Many engineers argue about which of the two is better.

I remember that when our department built its microservice framework, we also struggled over technology selection for a long time, and at one point the discussion became heated. SpringCloud is very popular, has a complete microservice ecosystem, and won the votes of many colleagues, but our final choice was Dubbo. Why?

1. RPC communication is the core of a large-scale service framework

We often discuss microservices, so the first thing to understand is what their core is; only then can we accurately grasp the real requirements when making a technology selection.

In my personal understanding, the core of microservices is remote communication and service governance. Remote communication provides the bridge between services, and service governance provides their logistical support. Therefore, when making a technology selection, we should weigh these two core requirements most heavily.

We know that splitting services increases communication costs. In flash-sale or promotional scenarios in particular, services call each other's methods: for example, after a flash-sale order succeeds, the order system, payment system, and coupon system all need to be called. This remote communication can easily become the system's bottleneck. Therefore, provided that the service-governance requirements are met, the performance of remote communication is the main factor in technology selection.

At present, service communication in many microservice frameworks is based on RPC. Without extra components, SpringCloud implements RPC communication with the Feign component (HTTP + JSON serialization), while Dubbo uses SPI extensions to support many RPC frameworks, including RMI, Dubbo, and Hessian (the default is the Dubbo protocol with Hessian serialization). In different business scenarios, the criteria for selecting and optimizing RPC communication also differ.

For example, the department I mentioned at the beginning chose Dubbo as its microservice framework. The selection criterion at the time was that RPC communication had to support the high concurrency of flash sales: in this business scenario, requests are characterized by instantaneous peaks, a large request volume, and small request and response payloads. The Dubbo protocol in Dubbo supports this kind of request very well.

The following is a simple performance test based on Dubbo 2.6.4, comparing the communication performance of Dubbo + Protobuf serialization against HTTP + JSON serialization (essentially a comparison of a single long-lived TCP connection with Protobuf serialization versus short-lived HTTP connections with JSON serialization). To verify the performance of the two at different data volumes, I prepared load tests for both small objects and large objects. This also gives us an indirect sense of the RPC communication capability of each.

This test comes from my earlier work. Given the complexity of the test environment, I will give the results directly here; if you are interested, you can leave a comment to discuss them with me.

The test results show that an RPC communication framework implemented with a single long-lived TCP connection and Protobuf serialization has a very clear advantage in both response time and throughput.

In high-concurrency scenarios, whether we are choosing a back-end service framework or a middleware team is designing one in-house, RPC communication is the focus of optimization.

In fact, there are many mature RPC communication frameworks today. If your company does not have its own middleware team, you can also extend an open-source RPC framework. Before we formally start optimizing, let's briefly review RPC.

2. What is RPC communication?

When RPC comes up, do you also think of MVC and SOA? These concepts are easy to confuse if you have not lived through the evolution of these architectures. The picture below shows their evolution history.

Whether it is a microservice, SOA, or RPC architecture, these are all distributed service architectures, and they all need services to communicate with one another. We usually call this kind of communication RPC communication.

RPC (Remote Procedure Call) is a communication technology for requesting services from a remote program over the network. An RPC framework encapsulates the underlying network communication, serialization, and related details; we only need to import each service's interface package, and calling an RPC service in code looks just like calling a local method. Because of this convenient, transparent remote calling, RPC is widely used in today's enterprise and Internet projects and is at the core of implementing distributed systems.
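This "a remote call looks like a local call" transparency is typically achieved with a client-side stub. Here is a minimal sketch using a JDK dynamic proxy; the `OrderService` interface and the canned response are hypothetical, and a real framework would serialize the call and send it over the network where the comment indicates:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical service interface; the caller only ever sees this contract.
interface OrderService {
    String create(String item);
}

public class RpcProxyDemo {
    // A stub factory: returns an object that looks local but intercepts every call.
    @SuppressWarnings("unchecked")
    public static <T> T stub(Class<T> iface) {
        InvocationHandler handler = (proxy, method, args) -> {
            // A real RPC framework would serialize method name + args here,
            // send them to the provider over TCP, and deserialize the response.
            return "remote result of " + method.getName();
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[]{iface}, handler);
    }

    public static void main(String[] args) {
        OrderService service = stub(OrderService.class);  // feels like a local object
        System.out.println(service.create("book"));       // prints "remote result of create"
    }
}
```

This interception point is where frameworks such as Dubbo plug in their transport and serialization layers.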

RMI (Remote Method Invocation) is one of the earliest frameworks in the JDK to implement RPC communication. RMI is very important for building distributed Java applications and is a key underlying technology of the Java ecosystem. Many open-source RPC frameworks are designed on the principles of the RMI implementation, and the Dubbo framework also integrates with RMI. Next, let's look at how RMI works and see which performance bottlenecks need optimizing.

3. RMI: the RPC communication framework built into the JDK

RMI is maturely applied in EJB and the Spring framework and is the core solution for pure-Java distributed application systems. RMI lets an application in one virtual machine call a remote method just as it would a local method, encapsulating all the remote-communication details for us.

3.1. RMI implementation principle

The remote proxy object is RMI's core component. Besides the virtual machine where the object itself lives, other virtual machines, possibly on different hosts, can also invoke the object's methods. Through the remote proxy object, a remote application communicates with the service over a network protocol.

We can use a picture to understand the entire RMI communication process in detail:
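To make the call flow concrete, here is a minimal single-process RMI example; the `HelloService` interface, the port 2099, and the registry name are arbitrary choices for this demo:

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Every remote method must declare RemoteException.
interface HelloService extends Remote {
    String hello(String name) throws RemoteException;
}

public class RmiDemo {
    static class HelloServiceImpl implements HelloService {
        public String hello(String name) { return "Hello, " + name; }
    }

    public static String callRemotely() throws Exception {
        HelloServiceImpl impl = new HelloServiceImpl();
        // exportObject generates the remote proxy (stub) that marshals calls over TCP.
        HelloService stub = (HelloService) UnicastRemoteObject.exportObject(impl, 0);
        Registry registry = LocateRegistry.createRegistry(2099);
        registry.rebind("HelloService", stub);

        // "Client" side: look up the stub and call it like a local object.
        HelloService client = (HelloService)
                LocateRegistry.getRegistry("localhost", 2099).lookup("HelloService");
        String result = client.hello("RMI");

        // Clean up so the JVM can exit.
        registry.unbind("HelloService");
        UnicastRemoteObject.unexportObject(impl, true);
        UnicastRemoteObject.unexportObject(registry, true);
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(callRemotely());  // prints "Hello, RMI"
    }
}
```

Although both sides run in one JVM here, the call still travels through the stub, Java serialization, and a TCP socket, which is exactly where the bottlenecks discussed next arise.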

3.2. Performance bottleneck of RMI in high concurrency scenarios

  • Java default serialization

RMI uses Java's default serialization. I covered Java serialization in detail in Lecture 09: its performance is not very good, and frameworks in other languages do not support it.

  • TCP short connection

Since RMI is implemented on short-lived TCP connections, high concurrency leads to a large number of connections being created and destroyed, which undoubtedly costs the system a great deal of performance.

  • Blocking Network I/O

In Lecture 08, I mentioned the I/O bottleneck in network communication. If Socket programming uses the traditional blocking I/O model, short-connection communication in high-concurrency scenarios easily blocks on I/O, and performance drops sharply.

4. An optimization path for RPC communication in high-concurrency scenarios

The performance bottlenecks of SpringCloud's RPC communication are very similar to RMI's: SpringCloud is implemented on the HTTP protocol (short connections) and JSON serialization, which gives it no advantage under high concurrency. So how do we optimize RPC communication for scenarios of instantaneous high concurrency?

RPC communication covers establishing the connection, designing the message format, choosing the transport protocol, and encoding and decoding the transmitted data. Next, we will optimize each of these layers in turn and gradually achieve overall performance optimization.

4.1. Select the appropriate communication protocol

To communicate between different machines over a network, we must first understand the basics of computer network communication. Network communication is the process of exchanging data streams between two devices; it is built on a network transport protocol plus the encoding and decoding of the transmitted data. The transport protocols include TCP and UDP, both exposed through the Socket programming interface and extended for particular application scenarios. The following two diagrams give a rough picture of Socket communication over TCP and over UDP.

Socket communication based on TCP is connection-oriented: a three-way handshake establishes the connection before data is transferred, which provides reliability. The transmitted data has no boundaries and uses a byte-stream mode.

With Socket communication based on UDP, the client does not establish a connection; it only creates a socket and sends datagrams to the server, with no guarantee they will arrive. So in terms of data transmission, UDP-based Socket communication is unreliable. UDP sends data in datagram mode: each datagram carries its own length, which is transmitted to the server together with the data.

From this comparison we get the optimization approach: to guarantee reliable data transmission, we usually use the TCP protocol. On a LAN, if there is no reliability requirement, we can also consider UDP, which is after all more efficient than TCP.
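To make the datagram mode concrete, here is a minimal loopback UDP round trip in plain Java. The message content and timeout are arbitrary, and remember that real UDP offers no delivery guarantee, though loopback delivery is reliable enough for a demo:

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class UdpDemo {
    public static String sendReceive(String msg) throws Exception {
        try (DatagramSocket server = new DatagramSocket(0);   // bind any free port
             DatagramSocket client = new DatagramSocket()) {
            byte[] out = msg.getBytes("UTF-8");
            // No connection setup: just address a datagram to the server's port.
            client.send(new DatagramPacket(out, out.length,
                    InetAddress.getLoopbackAddress(), server.getLocalPort()));

            DatagramPacket in = new DatagramPacket(new byte[1024], 1024);
            server.setSoTimeout(2000);  // don't block forever if the datagram is lost
            server.receive(in);         // one receive == one whole datagram (message boundary)
            return new String(in.getData(), 0, in.getLength(), "UTF-8");
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sendReceive("ping"));  // prints "ping"
    }
}
```

Note the contrast with TCP: there is no handshake, and each `receive` yields exactly one datagram rather than an unbounded byte stream.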

4.2. Use a single long connection

If the Socket communication is implemented based on the TCP protocol, what other optimizations can we do?

Communication between services differs from communication between a client and a server. Because clients are numerous, serving them over short connections avoids holding connections open for long periods and wasting system resources.

In service-to-service communication, however, there are nowhere near as many consumers connecting as there are clients, yet the volume of requests from consumers to providers is just as large. Implementing it over long-lived connections saves a great deal of TCP connection setup and teardown, reducing performance overhead and saving time.

4.3. Optimize Socket communication

To establish network communication between two machines, we generally implement a TCP connection with Java Socket programming. Traditional Socket communication mainly suffers from blocking I/O, thread-model defects, and memory copies. We can instead use a more mature communication framework such as Netty. Netty4 optimizes Socket communication programming in many ways, described below.

Non-blocking I/O: in Lecture 08, we mentioned that the multiplexer Selector implements non-blocking I/O communication.

Efficient Reactor threading model: Netty uses a master-slave Reactor multi-threading model. The server uses a main thread to accept client connection requests; once a connection is established, it listens for I/O events, and when an event fires a link is created.

Links are registered with the I/O worker threads responsible for I/O operations, and those worker threads handle all subsequent I/O. This threading model solves the problem of a single NIO thread being unable to monitor a huge number of clients and satisfy a huge number of I/O operations under high load and high concurrency.

Serial design: after the server receives a message, it performs link operations such as encoding, decoding, reading, and sending. If these operations ran in parallel, they would cause serious lock contention and lower system performance. To improve performance, Netty completes these link operations serially and lock-free: its Pipeline executes each operation of the link without thread switching.

Zero copy: in Lecture 08, we mentioned that sending data from memory to the network involves two copies, first from user space to kernel space, then from kernel space to network I/O. NIO's ByteBuffer can use Direct Buffer mode to allocate physical memory outside the heap, so data can be written toward kernel space without the second byte-buffer copy.
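The Direct Buffer mode is available in plain NIO as well; a small sketch (the 1 KB capacity is an arbitrary choice):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class DirectBufferDemo {
    public static ByteBuffer fill(String text) {
        // allocateDirect reserves memory outside the Java heap, so the OS can
        // use it for network I/O without an extra heap-to-native copy.
        ByteBuffer buf = ByteBuffer.allocateDirect(1024);
        buf.put(text.getBytes(StandardCharsets.UTF_8));
        buf.flip();  // switch the buffer from writing mode to reading mode
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer buf = fill("hello");
        // A direct buffer reports isDirect() == true
        System.out.println(buf.isDirect() + " " + buf.remaining());
    }
}
```

Direct buffers are more expensive to allocate than heap buffers, so frameworks like Netty pool and reuse them rather than allocating per request.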

Beyond these optimizations, we can also improve network throughput through the TCP parameter options that socket programming exposes. Netty sets these parameters via ChannelOption.

TCP_NODELAY: this option controls whether the Nagle algorithm is enabled. The Nagle algorithm batches small packets into one larger packet via buffering, preventing floods of small packets from congesting the network and improving transmission efficiency. For latency-sensitive application scenarios, we can disable this algorithm.

SO_RCVBUF and SO_SNDBUF: The size of the socket send buffer and receive buffer can be adjusted according to the scenario.

SO_BACKLOG: the backlog parameter specifies the size of the pending-connection queue. The server accepts client connection requests one at a time; when multiple clients arrive at once, connection requests it cannot yet process are placed in this queue to await handling.

SO_KEEPALIVE: with this option set, the connection periodically checks the status of clients that have not sent data for a long time; once a disconnected client is detected, the server reclaims the connection. We can shorten this interval to reclaim connections more efficiently.
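The same four options exist in plain `java.net` socket programming; a small sketch (the 64 KB buffer sizes and the backlog of 128 are illustrative values, and the OS may adjust the buffer sizes it actually grants):

```java
import java.net.ServerSocket;
import java.net.Socket;

public class TcpOptionsDemo {
    public static Socket tunedSocket() throws Exception {
        Socket socket = new Socket();             // not yet connected
        socket.setTcpNoDelay(true);               // TCP_NODELAY: disable Nagle for low latency
        socket.setSendBufferSize(64 * 1024);      // SO_SNDBUF (a hint; OS may adjust)
        socket.setReceiveBufferSize(64 * 1024);   // SO_RCVBUF (a hint; OS may adjust)
        socket.setKeepAlive(true);                // SO_KEEPALIVE: detect dead peers
        return socket;
    }

    public static void main(String[] args) throws Exception {
        // Second constructor argument is the backlog (SO_BACKLOG): pending-connection queue size.
        try (ServerSocket server = new ServerSocket(0, 128);
             Socket s = tunedSocket()) {
            System.out.println("TCP_NODELAY=" + s.getTcpNoDelay());
        }
    }
}
```

In Netty these map to `ChannelOption.TCP_NODELAY`, `ChannelOption.SO_SNDBUF`, `ChannelOption.SO_RCVBUF`, `ChannelOption.SO_KEEPALIVE`, and `ChannelOption.SO_BACKLOG` on the bootstrap.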

4.4. Tailor-made message format

The next step is the message itself. We need to design a message format that describes validation, the operation, and the data to transmit. To improve transmission efficiency, we should design it around our own business and architecture, aiming for a small body, sufficient functionality, and easy parsing. We can refer to the following data format:
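As a hypothetical example of such a format (the magic number, version, field sizes, and layout are all invented for illustration, not from any real protocol), a fixed header plus a length-prefixed body keeps overhead small and makes the byte stream easy to split into frames:

```java
import java.nio.ByteBuffer;

public class MessageCodec {
    // Hypothetical magic number identifying our protocol on the wire.
    static final short MAGIC = (short) 0xCAFE;

    // Header: magic(2) + version(1) + type(1) + bodyLength(4), then the body.
    public static byte[] encode(byte type, byte[] body) {
        ByteBuffer buf = ByteBuffer.allocate(2 + 1 + 1 + 4 + body.length);
        buf.putShort(MAGIC);      // validation: reject foreign traffic early
        buf.put((byte) 1);        // protocol version
        buf.put(type);            // operation: e.g. request / response / heartbeat
        buf.putInt(body.length);  // length prefix lets the decoder find frame boundaries
        buf.put(body);
        return buf.array();
    }

    public static byte[] decode(byte[] frame) {
        ByteBuffer buf = ByteBuffer.wrap(frame);
        if (buf.getShort() != MAGIC) throw new IllegalArgumentException("bad magic");
        buf.get();                // version (ignored in this sketch)
        buf.get();                // type (ignored in this sketch)
        byte[] body = new byte[buf.getInt()];
        buf.get(body);
        return body;
    }
}
```

The length prefix is what gives a TCP byte stream message boundaries; decoders like Netty's LengthFieldBasedFrameDecoder rely on exactly this idea.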

4.5. Encoding and decoding

We have already analyzed serialization's role in encoding and decoding. For a good network communication protocol, compatibility with an excellent serialization framework is very important. If we are transmitting pure data objects, we can choose Protobuf serialization, whose relatively good performance helps improve network communication.

4.6. Tune the TCP parameter settings of Linux

If RPC is implemented on short-lived TCP connections, we can optimize network communication by modifying Linux TCP configuration items. Before starting, let's review the three-way handshake that establishes a TCP connection and the four-way handshake that closes one; this will help with what follows.

Three-way handshake

Four-way handshake

We can run the sysctl -a | grep net.xxx command to view the Linux system's default TCP parameter settings. To modify a configuration item, edit /etc/sysctl.conf (for example with vim), add the item to change, and run the sysctl -p command to make the modified settings take effect. We usually improve network throughput and reduce latency by modifying the following configuration items.
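For illustration, here are a few commonly tuned TCP items as they might appear in /etc/sysctl.conf. The values are illustrative starting points, not recommendations; always measure in your own environment before and after changing them:

```shell
# /etc/sysctl.conf fragment: common TCP tuning knobs (illustrative values)

# Allow reuse of sockets in TIME_WAIT state for new outbound connections
net.ipv4.tcp_tw_reuse = 1

# Shorten how long a closed connection lingers in FIN-WAIT-2
net.ipv4.tcp_fin_timeout = 30

# Larger accept queue for listening sockets (pairs with SO_BACKLOG in the app)
net.core.somaxconn = 1024

# Larger SYN backlog to absorb bursts of new connections
net.ipv4.tcp_max_syn_backlog = 2048

# Apply the changes with: sysctl -p
```

These knobs mainly matter for short-connection workloads, where TIME_WAIT accumulation and connection-queue overflow become visible under load.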

The above is our detailed explanation of RPC optimization at each layer. Apart from the final Linux TCP configuration tuning, the optimizations are mostly made at the level of code and programming, and together they form an optimization path toward a complete RPC communication framework.

Once you understand all this, you can make a technology selection suited to your own business scenarios and solve the performance problems that arise along the way.

5. Summary

In today's distributed systems, especially as systems become microservice-based, communication between services is very frequent. Mastering the principles of inter-service communication and how to optimize communication protocols is an essential skill.

In systems with many high-concurrency scenarios, I prefer the RPC communication protocol implemented by Dubbo: the Dubbo protocol establishes a single long-lived connection, its network I/O uses non-blocking NIO reads and writes, and it is compatible with high-performance serialization frameworks such as Kryo, FST, and Protobuf. It is very practical for high-concurrency business scenarios that transmit small objects.

In enterprise systems, the business is often more complex than in typical Internet products. Services may transmit not only data but also pictures and files, so RPC protocol design there weighs functional requirements more heavily and does not chase performance to the extreme. Other communication frameworks have greater advantages in functionality, ecosystem, usability, and ease of adoption.

6. Thinking questions

There are many frameworks for Java RPC communication today, and just as many protocols for implementing it. Besides the Dubbo protocol, which other RPC communication protocols have you used? After this lecture, can you compare their advantages and disadvantages?


Origin blog.csdn.net/qq_34272760/article/details/132498767