Dubbo 3.0 Preview: Comparison of Common Protocols and Exploration of New Forms of RPC Protocol

Author | Guo Hao (Xiang Sheng), Head of RPC Framework for Alibaba Economy

Introduction: The Dubbo community has planned a series of articles, [Dubbo Cloud Native Road], that reviews the development of Apache Dubbo and its community and looks ahead to what comes next. The series covers three areas: Dubbo technical interpretation, community operation, and application case analysis. This article is the fourth in the series.

Preface

The protocol is the foundation of RPC. In what format is data transmitted on the connection? How does the server determine the size of a received request? Can multiple requests be in flight on the same connection at the same time? How should an error be reported if a request fails? These are all questions the protocol has to answer.

By definition, a protocol specifies how data is transmitted across a network by fixing rules, formats and semantics. RPC requires both ends of the communication to understand the same protocol. Data travels over the network as a bit stream; if the peer does not recognize the local protocol, it cannot extract useful information from the request, and the needs of the upper-layer business cannot be met.

A simple protocol needs to define three things: the data exchange format, the protocol format and the request method.

In RPC, the data exchange format is also called the serialization format. Commonly used serialization formats include JSON, Protobuf and Hessian. A serialization format is generally evaluated along three dimensions:

  • The size of the byte array after serialization
  • Serialization and deserialization speed
  • Readability after serialization

When a protocol selects its serialization method, it trades off these three dimensions according to its specific needs. The smaller the serialized byte array, the more network traffic is saved, but serialization itself may take more time. Text-based formats such as JSON and XML are often easier for developers to accept, because text is easier to understand than a continuous byte array and can be recognized easily by equipment at every layer. The price of that readability, however, is a significant decrease in performance.
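
The size-versus-readability trade-off can be made concrete with a small sketch. The record, its field names and the fixed binary layout below are assumptions chosen only for illustration; they are not taken from any of the serialization libraries mentioned above.

```python
import json
import struct

# Hypothetical record used only for this illustration.
record = {"user_id": 42, "score": 3.5, "active": True}

# Text encoding: self-describing and human-readable.
as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")

# Binary encoding: big-endian uint32, float64, bool. The field names and
# their order live in a schema shared by both ends, not in the bytes.
LAYOUT = struct.Struct(">Id?")
as_binary = LAYOUT.pack(record["user_id"], record["score"], record["active"])


def decode_binary(data: bytes) -> dict:
    """Rebuild the record; only possible if the reader knows LAYOUT."""
    user_id, score, active = LAYOUT.unpack(data)
    return {"user_id": user_id, "score": score, "active": active}


print(len(as_json), as_json)
print(len(as_binary), as_binary.hex())
```

The binary form is smaller because it leaves out field names and punctuation, but it is unreadable without the shared layout, which is exactly the readability cost described above.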

The protocol format is closely tied to the RPC framework. By function, there are two types:

  • One is a compact protocol, which only provides simple metadata and data content for calling;
  • The other is a composite protocol, which carries metadata of the framework layer to provide functional enhancements. A representative of this type of protocol is RSocket.

The request method is closely related to the protocol format. Common request models include synchronous Request/Response and asynchronous Request/Response. The difference is whether the client has to wait for the response before sending the next request. If it does not have to wait, multiple requests can be outstanding on a link at the same time, which is also called multiplexing. Another request model is Streaming: a complete business call consists of multiple RPCs, each transmitting part of the data, which suits streaming data transmission.
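
Multiplexing relies on each request carrying an identifier that the response echoes back. The following is a minimal sketch of the client-side bookkeeping, with the network replaced by direct method calls; the class and method names are hypothetical.

```python
from concurrent.futures import Future
from itertools import count


class MultiplexedClient:
    """Tracks in-flight requests on one connection by request id."""

    def __init__(self):
        self._next_id = count(1)
        self._pending = {}  # request id -> Future awaiting the response
        self.sent = []      # stands in for bytes written to the socket

    def send(self, payload):
        request_id = next(self._next_id)
        future = Future()
        self._pending[request_id] = future
        self.sent.append((request_id, payload))
        return request_id, future

    def on_response(self, request_id, result):
        # Called by a (hypothetical) read loop for each decoded response.
        self._pending.pop(request_id).set_result(result)


client = MultiplexedClient()
id_a, fut_a = client.send("slow query")
id_b, fut_b = client.send("fast query")

# Responses arrive in the opposite order; the ids keep them matched up.
client.on_response(id_b, "fast result")
client.on_response(id_a, "slow result")
print(fut_a.result(), "|", fut_b.result())
```

Without the request id, the client could only pair responses with requests by arrival order, which forces one outstanding request per link or strictly ordered responses.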

With these three basic conventions, a simple RPC protocol can be implemented.
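
As a rough sketch of how the three conventions fit together, the frame below uses JSON as the data exchange format, a fixed header (request id plus body length) as the protocol format, and the request id to allow asynchronous Request/Response. The layout and the method names in the payload are assumptions made for this example, not a format used by any framework discussed here.

```python
import json
import struct

# Hypothetical frame layout: 8-byte request id, 4-byte body length, body.
HEADER = struct.Struct(">QI")


def encode_frame(request_id: int, body: dict) -> bytes:
    payload = json.dumps(body).encode("utf-8")
    return HEADER.pack(request_id, len(payload)) + payload


def decode_frames(buffer: bytes):
    """Return (complete frames, leftover bytes that still need more data)."""
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        request_id, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # body not fully received yet
        body = json.loads(buffer[offset + HEADER.size:end])
        frames.append((request_id, body))
        offset = end
    return frames, buffer[offset:]


stream = encode_frame(1, {"method": "sayHello", "args": ["dubbo"]})
stream += encode_frame(2, {"method": "ping", "args": []})

# Simulate a partial TCP read: the last 3 bytes have not arrived yet.
frames, rest = decode_frames(stream[:-3])
print(frames, len(rest))
```

The length field is what lets the server determine the size of a received request, and the leftover bytes are simply kept until the next read completes the frame.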

A core part of Dubbo3 is defining the next-generation RPC protocol. Beyond basic communication functions, the new protocol should also have the following characteristics:

  • Unified cross-language binary format
  • Support Streaming and application layer full-duplex call model
  • Easy to extend
  • Can be recognized by all layers of equipment

Here we compare some commonly used protocols to explore what form the new protocol should take.

HTTP/1.1

HTTP/1.1 is probably the most widely used protocol. Its simple, clear syntax, cross-language nature and native support on mobile clients make it the de facto most widely accepted RPC solution.

However, measured against the requirements of an RPC protocol, HTTP/1.1 has the following problems:

  • Head-of-line (HOL) blocking leads to poor performance on a single connection. Even with pipelining, responses still have to be returned in request order;

  • The text-based protocol repeats a lot of verbose and often useless header information in every request, wasting bandwidth and hurting performance;

  • The pure Request/Response model cannot implement Server Push, so the server can only rely on client polling. For the same reason, full-duplex Streaming is not safe to rely on.
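
The effect of head-of-line blocking can be shown with simple arithmetic. The processing times below are made-up numbers, and the model assumes the server works on all requests concurrently and ignores network time; it is only meant to show why in-order responses hurt.

```python
from itertools import accumulate

# Hypothetical server-side processing times, in milliseconds, for three
# requests sent back to back on one connection.
service_ms = [50, 5, 5]


def pipelined_completion(times):
    # Responses must be written in request order, so a response cannot
    # leave before every earlier response has left (running maximum).
    return list(accumulate(times, max))


def multiplexed_completion(times):
    # Each response is returned as soon as it is ready.
    return list(times)


print(pipelined_completion(service_ms))
print(multiplexed_completion(service_ms))
```

In the pipelined case the two fast requests are held back by the slow one in front of them, even though the server finished them long before.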

RESP

RESP is the communication protocol used by Redis. Its concise, easy-to-understand format has helped Redis clients appear quickly in many languages. However, RESP resembles HTTP/1.1 and shares the same performance problems:

  • Its ability to express serialization is weak, so another serialization method is usually needed. However, the protocol has no way to declare which serialization method is used, so clients can only rely on an agreed convention;

  • It also suffers from head-of-line blocking, and pipelining cannot fundamentally solve the single-connection performance problem;

  • Pub/Sub also runs into a throughput bottleneck on a single connection.
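
To illustrate the serialization point, here is a small sketch that encodes a command as a RESP array of bulk strings. The helper name, the key and the choice of JSON for the value are assumptions made for this example.

```python
import json


def encode_resp_command(*parts: bytes) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(parts)]
    for part in parts:
        out.append(b"$%d\r\n%s\r\n" % (len(part), part))
    return b"".join(out)


# The value is JSON here only because this client chose JSON. Nothing in
# the frame records that choice, so every reader must know it out of band.
value = json.dumps({"id": 7}).encode("utf-8")
wire = encode_resp_command(b"SET", b"user:7", value)
print(wire)
```

Each bulk string carries only a length and raw bytes, so a structured value has to be serialized by the client beforehand and decoded by convention on the other side.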

Dubbo2.0

The Dubbo 2.0 protocol is defined directly on top of the TCP transport layer, which gives the greatest flexibility in defining protocol functions. It is precisely because of this flexibility that RPC protocols are generally custom private protocols.

The Dubbo protocol body contains an extensible attachments section, which makes it possible to pass additional attributes alongside the RPC method call. This is a good design. The header, however, has no comparable extensible attachments. Here the ASCII header design defined by HTTP is a useful reference for dividing responsibilities between Body Attachments and Header Attachments. Other shortcomings include:

  • Some RPC request locators in the body, such as Service Name, Method Name and Version, could be moved up into the Header and decoupled from the specific serialization protocol, so that network infrastructure can recognize them more easily and use them for traffic control;

  • Extensibility is not good enough and a protocol upgrade design is missing. For example, no status flag bits are reserved in the Header, and there is no special packet for protocol upgrade or negotiation of the kind HTTP has;

  • The protocol, as implemented in the Java version of the code, is not concise or general enough. For example, some language-bound content is transmitted on the link, and the message body contains redundant content, such as Service Name appearing in both the Body and the Attachments.
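
For context, the sketch below parses a Dubbo 2 header as it is commonly documented: a fixed 16 bytes holding a magic number, a flags byte, a status byte, a request id and a body length. This layout, the flag values and the serialization id used in the sample are assumptions not stated in this article and should be checked against the Dubbo protocol documentation.

```python
import struct

# Assumed layout: magic (2), flags (1), status (1), request id (8),
# body length (4), all big-endian, 16 bytes in total.
DUBBO2_HEADER = struct.Struct(">HBBQI")
MAGIC = 0xDABB
FLAG_REQUEST, FLAG_TWOWAY, FLAG_EVENT = 0x80, 0x40, 0x20
SERIALIZATION_MASK = 0x1F


def parse_header(data: bytes) -> dict:
    magic, flags, status, request_id, body_len = DUBBO2_HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("not a Dubbo 2 frame")
    return {
        "is_request": bool(flags & FLAG_REQUEST),
        "two_way": bool(flags & FLAG_TWOWAY),
        "event": bool(flags & FLAG_EVENT),
        "serialization_id": flags & SERIALIZATION_MASK,
        "status": status,
        "request_id": request_id,
        "body_length": body_len,
    }


# Sample two-way request; serialization id 2 is assumed to mean Hessian2.
sample = DUBBO2_HEADER.pack(MAGIC, FLAG_REQUEST | FLAG_TWOWAY | 2, 0, 1001, 256)
print(parse_header(sample))
```

Under this layout the header has no slot for the service name, method name, version or any attachments, so an intermediary has to deserialize the body to route or throttle a call. That is the limitation the points above describe.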

HTTP/2.0

HTTP/2.0 retains all the semantics of HTTP/1. While maintaining compatibility, it greatly improves the communication model and transmission efficiency, mainly to solve the problems of HTTP/1:

  • Multiplexing on a single link. Compared with a link that one Request-Response exchange occupies exclusively, Frame-based transmission uses the link more efficiently. The StreamId provides context, so the client can accept Responses returned out of order and match them by StreamId;

  • Header compression with HPACK, which caches Headers using a static table and a dynamic table, reducing the amount of data transmitted;

  • Request-Stream semantics, with native support for Server Push and Stream data transmission;

  • Binary framing, so Header and Data frames can be handled separately.
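
The framing and StreamId points can be sketched with the fixed 9-byte frame header defined in RFC 7540: a 24-bit payload length, an 8-bit type, an 8-bit flags field, and a reserved bit followed by a 31-bit stream identifier. The helper names and the sample frames are made up for this example, and only a few frame types are listed.

```python
FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS", 0x4: "SETTINGS", 0x8: "WINDOW_UPDATE"}
END_STREAM = 0x1


def build_frame_header(length: int, frame_type: int, flags: int, stream_id: int) -> bytes:
    return (
        length.to_bytes(3, "big")
        + bytes([frame_type, flags])
        + (stream_id & 0x7FFFFFFF).to_bytes(4, "big")
    )


def parse_frame_header(data: bytes) -> dict:
    if len(data) < 9:
        raise ValueError("an HTTP/2 frame header is 9 bytes")
    return {
        "length": int.from_bytes(data[0:3], "big"),
        "type": FRAME_TYPES.get(data[3], hex(data[3])),
        "flags": data[4],
        # Mask off the reserved high bit of the stream identifier.
        "stream_id": int.from_bytes(data[5:9], "big") & 0x7FFFFFFF,
    }


# DATA frames of two streams interleaved on the same connection.
wire = [
    build_frame_header(16, 0x0, 0, 1),
    build_frame_header(8, 0x0, END_STREAM, 3),
    build_frame_header(4, 0x0, END_STREAM, 1),
]
for raw in wire:
    print(parse_frame_header(raw))
```

Because every frame names its stream, stream 3 can finish while stream 1 is still sending, which is what removes head-of-line blocking at the HTTP layer.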

Although HTTP/2.0 overcomes the problems above, some points remain controversial, such as whether flow control is necessary on top of TCP, and whether using HPACK to stay compatible with HTTP semantics is overly complicated.

gRPC

Compared with frameworks that build their application-layer protocol on bare TCP, gRPC chooses HTTP/2.0 as its transport protocol. The upper-layer protocol functions are realized by constraining the header content and the payload format.
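
A rough sketch of those two constraints, following the public gRPC over HTTP/2 specification: the call is addressed through request headers, and each message in the DATA payload is prefixed with a 1-byte compressed flag and a 4-byte big-endian length. The service and method names are hypothetical, and the header list is partial (for example, :authority is omitted).

```python
import struct


def grpc_request_headers(service: str, method: str) -> list:
    # The RPC locator lives in the headers, outside the serialized payload.
    return [
        (":method", "POST"),
        (":scheme", "http"),
        (":path", f"/{service}/{method}"),
        ("te", "trailers"),
        ("content-type", "application/grpc+proto"),
    ]


def frame_message(message: bytes, compressed: bool = False) -> bytes:
    """Length-prefixed message: compressed flag, length, then the bytes."""
    return struct.pack(">BI", int(compressed), len(message)) + message


def unframe_message(data: bytes):
    compressed, length = struct.unpack_from(">BI", data)
    return bool(compressed), data[5:5 + length]


headers = grpc_request_headers("demo.Greeter", "SayHello")
framed = frame_message(b"\x0a\x05dubbo")  # opaque, already-serialized bytes
print(dict(headers)[":path"], framed.hex())
```

Since the service and method appear in :path, proxies and gateways can route or throttle calls without understanding the payload serialization.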

Here are some design concepts of gRPC:

  • Coverage & Simplicity: the protocol design and framework implementation should be general and simple enough to run on any device, even resource-constrained ones such as IoT and mobile devices;

  • Interoperability & Reach: it should be built on a widely used protocol, one that almost all infrastructure on the network supports;

  • General Purpose & Performant: scenarios and performance have to be balanced. The protocol should suit a wide range of scenarios while still delivering as much performance as possible;

  • Payload Agnostic: the payload transmitted over the protocol should be language and platform neutral;

  • Streaming: the protocol must support communication models such as Request-Response, Request-Stream, and Bi-Stream;

  • Flow Control: the protocol itself can sense and limit traffic;

  • Metadata Exchange: in addition to the RPC service definition, it provides a way to transmit additional data.

Guided by these design concepts, gRPC was designed as a cross-language, cross-platform, general-purpose protocol. It already has most of the functions needed, or can easily be extended with new ones. However, there is no silver bullet in software engineering. Compared with a proprietary protocol on bare TCP, gRPC will be worse in terms of extreme performance. But for most applications, and compared with HTTP/1.1, gRPC over HTTP/2 improves performance substantially while remaining readable.

In terms of serialization, gRPC is designed to be payload neutral, but real cross-language scenarios need a strongly standardized interface definition language to guarantee consistent serialization results. The official gRPC implementation uses protobuf for performance-sensitive scenarios and JSON for scenarios that favor development efficiency. Taking everything together, from the choice of serialization method to the comparison of the protocols along each dimension, extending a new protocol on the basis of gRPC is the best choice.

Dubbo3.0

The Dubbo3.0 protocol is based on gRPC and provides extensions for the application layer, exception handling, protocol-layer load balancing, and Reactive support. There are three main goals:

  • In the distributed large-scale cluster scenario, provide more complete load balancing to obtain higher performance and ensure stability;

  • Support tracing/monitoring and other distributed standard extensions, support microservice standardization and smooth migration;

  • Reactive semantics are enhanced at the protocol layer to provide distributed back-pressure capabilities and more complete Streaming support.
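
Back-pressure is commonly built on credits: the receiver states how many items it can accept, and the sender never exceeds that number. The sketch below shows the idea in the spirit of the Reactive Streams request(n) signal. It is a generic illustration with hypothetical names, not the mechanism defined by the Dubbo3.0 protocol.

```python
from collections import deque


class CreditedStream:
    """Sender side of a stream that emits only what the receiver asked for."""

    def __init__(self, items):
        self._queue = deque(items)
        self._credits = 0
        self.delivered = []  # stands in for items written to the network

    def request(self, n: int):
        # The receiver grants n more credits.
        self._credits += n
        self._drain()

    def _drain(self):
        while self._credits > 0 and self._queue:
            self.delivered.append(self._queue.popleft())
            self._credits -= 1


stream = CreditedStream(range(10))
stream.request(3)
print(stream.delivered)
stream.request(2)
print(stream.delivered)
```

A slow consumer simply grants credits more slowly, so pressure propagates upstream instead of filling buffers along the way.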

Beyond the protocol layer, the new Dubbo3.0 protocol also improves ease of use, including support for both an IDL compiler and an Annotation compiler. The client will have better native support for asynchronous callbacks, Future-based asynchronous calls and synchronous calls, and the server will use non-reflective invocation, which significantly improves the performance of both client and server. For users who migrate, the Dubbo framework will provide smooth protocol upgrade support and aims to double performance with as few code or configuration changes as possible.
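
The three client call styles mentioned above can be contrasted with a small sketch. A thread pool and a local function stand in for the remote stub here; none of this is Dubbo's actual API.

```python
from concurrent.futures import ThreadPoolExecutor


def say_hello(name: str) -> str:
    # Stand-in for a remote call; a real stub would go over the network.
    return f"hello, {name}"


results = []
with ThreadPoolExecutor(max_workers=2) as pool:
    # 1. Synchronous: block until the result is available.
    results.append(pool.submit(say_hello, "sync").result())

    # 2. Future: keep the handle and collect the result later.
    future = pool.submit(say_hello, "future")

    # 3. Callback: run a function once the call completes.
    pool.submit(say_hello, "callback").add_done_callback(
        lambda done: results.append(done.result())
    )

    results.append(future.result())

print(sorted(results))
```

The callback and Future styles let the caller keep working while the call is in flight, which is where a multiplexed protocol pays off.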

Summary

This article introduced the basic concepts of RPC protocols, compared some commonly used protocols, and, after weighing their advantages and disadvantages, proposed the Dubbo3.0 protocol. The Dubbo3.0 protocol aims to lead in ease of use, cross-platform and cross-language support, and high performance. Full support for the Dubbo3.0 protocol is expected in March 2021. Stay tuned.

Origin blog.51cto.com/13778063/2541340