Dubbo source code analysis 1: How to write an RPC framework by hand?


Introduction

When developing a monolithic project, most of us have written code like the following, where the service provider and the service caller live in the same application.

public interface HelloService {
    public String sayHello(String content);
}

public class HelloServiceImpl implements HelloService {

    @Override
    public String sayHello(String content) {
        return "hello, " + content;
    }
}

public class Test {

    public static void main(String[] args) {
        HelloService helloService = new HelloServiceImpl();
        String msg = helloService.sayHello("world");
        // prints: hello, world
        System.out.println(msg);
    }
}

However, because monolithic applications have many drawbacks, many companies now split unrelated functions into separate services.

How can we call a remote service as if it were a local one? This is where an RPC (Remote Procedure Call) framework comes in. It hides the details of network communication, serialization and similar operations, so that calling a remote service is as convenient as calling a local one.

Well-known RPC frameworks include Spring Cloud, Alibaba's Dubbo, Facebook's Thrift, Google's gRPC, etc.

RPC call process

The process of an RPC call is as follows:

After the caller invokes a method, the proxy class assembles the method to be called and its parameters into a message body that can be transmitted over the network.
The caller sends the message body to the provider.
The provider decodes the message and obtains the method and parameters of the call.
The provider executes the corresponding method through reflection and returns the result.
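
To make the provider side of this process more concrete, here is a minimal sketch of the last two steps: reading the call details from an already decoded message, then invoking the method through reflection. The names RpcRequest, export and handle are hypothetical and chosen only for illustration; they are not Dubbo's classes, and the network and decoding parts are left out.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

class ProviderDispatchSketch {

    interface HelloService {
        String sayHello(String content);
    }

    static class HelloServiceImpl implements HelloService {
        @Override
        public String sayHello(String content) {
            return "hello, " + content;
        }
    }

    /** Hypothetical message body: what the provider needs in order to find and call a method. */
    static final class RpcRequest {
        final String interfaceName;
        final String methodName;
        final Class<?>[] parameterTypes;
        final Object[] arguments;

        RpcRequest(String interfaceName, String methodName, Class<?>[] parameterTypes, Object[] arguments) {
            this.interfaceName = interfaceName;
            this.methodName = methodName;
            this.parameterTypes = parameterTypes;
            this.arguments = arguments;
        }
    }

    // Provider-side table: interface name -> implementation object.
    private static final Map<String, Object> SERVICES = new HashMap<>();

    static void export(Class<?> serviceInterface, Object implementation) {
        SERVICES.put(serviceInterface.getName(), implementation);
    }

    /** Looks up the implementation named in the request and invokes the method by reflection. */
    static Object handle(RpcRequest request) {
        Object service = SERVICES.get(request.interfaceName);
        if (service == null) {
            throw new IllegalStateException("No provider for " + request.interfaceName);
        }
        try {
            Method method = service.getClass().getMethod(request.methodName, request.parameterTypes);
            return method.invoke(service, request.arguments);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Invocation failed", e);
        }
    }

    public static void main(String[] args) {
        export(HelloService.class, new HelloServiceImpl());
        RpcRequest request = new RpcRequest(
                HelloService.class.getName(), "sayHello",
                new Class<?>[] {String.class}, new Object[] {"world"});
        System.out.println(handle(request)); // prints: hello, world
    }
}
```

In a real framework the RpcRequest would arrive as bytes from the network and the return value would be encoded and sent back; here both ends run in one process so the reflection step can be seen in isolation.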

Below we analyze how an RPC framework is implemented and which parts can be extended.
To make this more concrete, I wrote a GitHub project that builds an RPC framework step by step, from simple to complex. Stars are welcome.

https://github.com/erlieStar/simple-rpc

Generate proxy class

As mentioned above, when the caller invokes a method, it is actually invoking a method of the proxy class. The proxy class takes care of serialization, encoding and decoding for us. So how is the proxy class generated?

Let's take a look at mainstream practices.

Facebook's Thrift and Google's gRPC both have you define a schema file and then run a code generator that produces the client proxy classes and the service interfaces. The caller sends requests directly through the generated proxy class, and the provider implements the generated interface.

The biggest advantage of this approach is that it works across languages: one schema file can generate Java code or Python code, so a Java caller and a Python provider can communicate normally. In addition, the protocol is binary, so communication is relatively efficient.

There are several ways to generate proxy classes in Java:

JDK dynamic proxy (implement the InvocationHandler interface)
Bytecode manipulation library (such as cglib, Javassist)

Dubbo can generate proxy classes in two ways: JDK dynamic proxy and Javassist. The default is Javassist, because it is more efficient.
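
As an illustration of the JDK dynamic proxy approach, here is a minimal sketch of a caller-side proxy. The Transport interface and the createProxy method are hypothetical names I made up for this example; the fake transport in main stands in for the network so the code runs in a single process. It shows only the interception idea and is not how Dubbo's own proxy factory is written.

```java
import java.lang.reflect.Proxy;

class JdkProxySketch {

    interface HelloService {
        String sayHello(String content);
    }

    /** Hypothetical transport; a real one would serialize the call and send it over the network. */
    interface Transport {
        Object send(String interfaceName, String methodName, Class<?>[] parameterTypes, Object[] args);
    }

    @SuppressWarnings("unchecked")
    static <T> T createProxy(Class<T> serviceInterface, Transport transport) {
        return (T) Proxy.newProxyInstance(
                serviceInterface.getClassLoader(),
                new Class<?>[] {serviceInterface},
                // This lambda is the InvocationHandler: every call on the proxy lands here.
                (proxy, method, args) -> transport.send(
                        serviceInterface.getName(), method.getName(), method.getParameterTypes(), args));
    }

    public static void main(String[] args) {
        // Stand-in for the remote provider so the sketch runs in one process.
        Transport fakeProvider = (iface, method, types, arguments) -> "hello, " + arguments[0];
        HelloService helloService = createProxy(HelloService.class, fakeProvider);
        System.out.println(helloService.sayHello("world")); // prints: hello, world
    }
}
```

Note that this sketch forwards every method to the transport, including toString, hashCode and equals; a production proxy would normally answer those locally.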

Protocol

Spring Cloud communicates over the HTTP protocol, so which protocol does Dubbo use?

Why do we need a protocol?

Because data is transmitted over the network as a stream of bytes, an RPC request is not necessarily delivered to the provider as a whole; it may be split into multiple packets. How does the provider recognize where one message ends and the next begins?

For example, for the text ABCDEF, the provider may receive ABC and then DEF, or AB, CD and then EF. What should the provider do with this data?

Simple: just agree on a rule. There are many possible rules; here are 3 examples.

Fixed-length protocol: every message has the same length. For example, with a length of 50 bytes, decoding starts each time 50 bytes have been read. See Netty's FixedLengthFrameDecoder.
Special terminator: a delimiter marks the end of each message. For example, once \n is read, one complete message has been received; until then, reading continues. See Netty's DelimiterBasedFrameDecoder.
Variable-length protocol (protocol header + protocol body): a fixed-length field in the header gives the length of the message body, and the rest of the content is the message body. The header can also carry other commonly used attributes; the headers of the HTTP protocol, such as Content-Type and Content-Length, play this role. See Netty's LengthFieldBasedFrameDecoder.
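
The variable-length rule can be sketched in plain Java without Netty. The example below assumes a 4-byte big-endian length field followed by a UTF-8 body; that layout and the names encode, Decoder and feed are my own choices for illustration. It simulates the ABCDEF message arriving in two pieces and shows that the decoder only emits it once all bytes are there.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LengthPrefixedFraming {

    /** Encodes one message as a 4-byte big-endian length followed by the body. */
    static byte[] encode(String message) {
        byte[] body = message.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + body.length).putInt(body.length).put(body).array();
    }

    /** Accumulates incoming chunks and emits a message only once all of its bytes have arrived. */
    static final class Decoder {
        private byte[] pending = new byte[0];

        List<String> feed(byte[] chunk) {
            // Append the new chunk to whatever was left over from earlier reads.
            byte[] joined = Arrays.copyOf(pending, pending.length + chunk.length);
            System.arraycopy(chunk, 0, joined, pending.length, chunk.length);
            ByteBuffer buffer = ByteBuffer.wrap(joined);

            List<String> messages = new ArrayList<>();
            while (buffer.remaining() >= 4) {
                buffer.mark();
                int length = buffer.getInt();
                if (buffer.remaining() < length) {
                    buffer.reset(); // body incomplete: keep the header and wait for more data
                    break;
                }
                byte[] body = new byte[length];
                buffer.get(body);
                messages.add(new String(body, StandardCharsets.UTF_8));
            }
            // Whatever was not consumed stays pending for the next call.
            pending = new byte[buffer.remaining()];
            buffer.get(pending);
            return messages;
        }
    }

    public static void main(String[] args) {
        byte[] wire = encode("ABCDEF"); // 10 bytes: 4-byte length + 6-byte body
        Decoder decoder = new Decoder();
        // Simulate the network splitting one message into two packets.
        System.out.println(decoder.feed(Arrays.copyOfRange(wire, 0, 7)));  // prints: []
        System.out.println(decoder.feed(Arrays.copyOfRange(wire, 7, 10))); // prints: [ABCDEF]
    }
}
```

The sketch does not validate the length field; a real decoder should reject negative or unreasonably large lengths.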

Dubbo communicates through a custom protocol. The format of the protocol header is as follows
[Figure: Dubbo protocol header layout and the meaning of each bit]
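
For readers who want something more tangible than a picture, here is a rough sketch of how such a header could be packed and unpacked. It assumes the 16-byte layout as I understand it from the Dubbo documentation (2-byte magic number 0xdabb, 1 flag byte holding the request, two-way and event bits plus a 5-bit serialization id, 1 status byte, 8-byte request id, 4-byte body length). The class and method names are hypothetical, the serialization id 2 passed in main is only an example value, and the details should be checked against the Dubbo source before relying on them.

```java
import java.nio.ByteBuffer;

class DubboHeaderSketch {

    static final int HEADER_LENGTH = 16;
    static final short MAGIC = (short) 0xdabb;
    static final int FLAG_REQUEST = 0x80;       // 1 = request, 0 = response
    static final int FLAG_TWO_WAY = 0x40;       // the caller expects a response
    static final int FLAG_EVENT = 0x20;         // event message, for example a heartbeat
    static final int SERIALIZATION_MASK = 0x1f; // low 5 bits: serialization id

    static byte[] encodeRequestHeader(int serializationId, boolean twoWay, long requestId, int bodyLength) {
        int flag = FLAG_REQUEST | (serializationId & SERIALIZATION_MASK);
        if (twoWay) {
            flag |= FLAG_TWO_WAY;
        }
        return ByteBuffer.allocate(HEADER_LENGTH)
                .putShort(MAGIC)     // bytes 0-1: magic number
                .put((byte) flag)    // byte 2: flags + serialization id
                .put((byte) 0)       // byte 3: status, only meaningful in responses
                .putLong(requestId)  // bytes 4-11: request id
                .putInt(bodyLength)  // bytes 12-15: length of the body that follows
                .array();
    }

    static String describe(byte[] header) {
        ByteBuffer buffer = ByteBuffer.wrap(header);
        int magic = buffer.getShort() & 0xffff;
        int flag = buffer.get() & 0xff;
        int status = buffer.get() & 0xff;
        long requestId = buffer.getLong();
        int bodyLength = buffer.getInt();
        return String.format(
                "magic=0x%04x request=%b twoWay=%b event=%b serialization=%d status=%d id=%d length=%d",
                magic, (flag & FLAG_REQUEST) != 0, (flag & FLAG_TWO_WAY) != 0, (flag & FLAG_EVENT) != 0,
                flag & SERIALIZATION_MASK, status, requestId, bodyLength);
    }

    public static void main(String[] args) {
        byte[] header = encodeRequestHeader(2, true, 1L, 128);
        System.out.println(describe(header));
    }
}
```

Because the body length sits at a fixed offset in a fixed-size header, the receiver can read 16 bytes first and then knows exactly how many more bytes belong to the same message.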

Why does Dubbo define its own protocol instead of using the existing HTTP protocol?

The main reason is that a custom protocol can improve performance:

An HTTP request is relatively large and carries a lot of content that an RPC call does not need. A custom protocol can trim most of it.
HTTP is stateless, and without keep-alive the connection has to be re-established for every request and is closed once the response completes. A custom protocol can keep reusing one long-lived connection.

How do we design a custom protocol?

Serialization

The content of the protocol header is represented by bits, while the protocol body is encapsulated into an object in the application. For example, Dubbo encapsulates a request into Request and a response into Response.

As mentioned earlier, data is transmitted over the network as binary data, but the caller's input parameters and the provider's return value are objects, so serialization and deserialization are required.

There are several ways to serialize:
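
As a small illustration of what serialization and deserialization mean here, the sketch below round-trips a request object through JDK native serialization. The Request class in it is a hypothetical stand-in written for this example, not Dubbo's Request class, and JDK serialization is used only because it needs no extra dependency.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class JdkSerializationSketch {

    /** Hypothetical request object used only for this example. */
    static final class Request implements Serializable {
        private static final long serialVersionUID = 1L;
        final String methodName;
        final Object[] arguments;

        Request(String methodName, Object... arguments) {
            this.methodName = methodName;
            this.arguments = arguments;
        }
    }

    /** Object -> bytes, ready to be placed in the protocol body. */
    static byte[] serialize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Bytes -> object, as done by the receiving side. */
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = serialize(new Request("sayHello", "world"));
        Request decoded = (Request) deserialize(wire);
        System.out.println(wire.length + " bytes -> " + decoded.methodName + "(" + decoded.arguments[0] + ")");
    }
}
```

Printing the byte count makes the space overhead visible, and it is worth remembering that JDK deserialization of untrusted data is a known security risk.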

JDK native serialization
JSON
Protobuf
Kryo
Hessian2
MessagePack

When choosing a serialization method, we mainly consider the following factors:

Efficiency
Space overhead
Versatility and compatibility
Security

Communication

There are four common IO models as follows

Synchronous blocking IO (Blocking IO)
Synchronous non-blocking IO (Non-blocking IO)
IO Multiplexing (IO Multiplexing)
Asynchronous IO (Asynchronous IO)

I will not go through these 4 IO models one by one here; see the following article:

Understand the underlying principles of Java NIO in 10 minutes

Because RPC is generally used in high-concurrency scenarios, we choose the IO multiplexing model. Netty's IO multiplexing is based on the Reactor pattern. I will analyze how this pattern supports high concurrency in a follow-up article.
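
To give a feel for IO multiplexing before that follow-up article, here is a small sketch using the JDK's own Selector rather than Netty. To keep it runnable without opening network ports, it uses java.nio.channels.Pipe as a stand-in for client connections; a real server would register a ServerSocketChannel and SocketChannels in the same way. The method name run and the "connection-N" labels are made up for the example, and it assumes one short message per channel.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class SelectorSketch {

    /** Sends each message down its own pipe, then reads them all back with one thread and one Selector. */
    static List<String> run(String... messages) {
        List<Pipe> pipes = new ArrayList<>();
        List<String> received = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            for (int i = 0; i < messages.length; i++) {
                Pipe pipe = Pipe.open();
                pipes.add(pipe);
                pipe.source().configureBlocking(false);
                pipe.source().register(selector, SelectionKey.OP_READ, "connection-" + i);
                pipe.sink().write(ByteBuffer.wrap(messages[i].getBytes(StandardCharsets.UTF_8)));
            }
            while (received.size() < messages.length) {
                selector.select(); // blocks until at least one registered channel is readable
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    ByteBuffer buffer = ByteBuffer.allocate(64);
                    int read = ((Pipe.SourceChannel) key.channel()).read(buffer);
                    if (read > 0) {
                        buffer.flip();
                        received.add(key.attachment() + ": " + StandardCharsets.UTF_8.decode(buffer));
                        key.cancel(); // this sketch expects one message per channel
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (Pipe pipe : pipes) {
                try {
                    pipe.sink().close();
                    pipe.source().close();
                } catch (IOException ignored) {
                    // nothing useful to do while cleaning up
                }
            }
        }
        return received;
    }

    public static void main(String[] args) {
        // The order of the two entries may vary, depending on which channel is reported first.
        System.out.println(run("ping", "pong"));
    }
}
```

The point is that a single thread waits on many channels at once and only touches the ones that actually have data, instead of dedicating one blocked thread to each connection.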

Registry

The registry plays a role similar to a phone book. It stores the mapping between a service name and the service's actual addresses. When we want to communicate with a service, we only need to look up its address by service name.

More importantly, this phone book is dynamic. When the address of a service changes, the entry in the phone book changes too, and when a service becomes unavailable, its address disappears from the phone book.

This dynamic phone book is the registry.

There are many ways to implement a registry, such as Zookeeper, Redis, Nacos, etc.

Here is how to implement a registry with Zookeeper.

Zookeeper has two basic types of nodes: persistent nodes and ephemeral (temporary) nodes.

When we register a service in Zookeeper, we use an ephemeral node, so that the node is deleted when the service's connection is lost.

Node type	Explanation
Persistent node	The data is always stored on the Zookeeper server. Even after the session between the server and the client that created the node is closed, the node is not deleted
Persistent sequential node	A persistent node whose name additionally carries a sequence number
Ephemeral (temporary) node	The data is not stored permanently on the Zookeeper server. When the session of the client that created the node is closed, the node is deleted from the Zookeeper server
Ephemeral (temporary) sequential node	An ephemeral node whose name additionally carries a sequence number

How do services communicate when the registry goes down?

When one Zookeeper node goes down, the client automatically switches to another node in the cluster. Even if all of them go down, it does not matter, because Dubbo keeps a local copy of the mapping. This mapping can be kept in a Map or in a file.

When a new service is registered in the registry, will the local cache be updated?

If a watcher has been registered, of course it will. When the watched node or its children change, a notification is pushed to the watching client, which can then update its local cache.
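
The interplay between the registry, the watcher and the local cache can be sketched with a purely in-memory stand-in. Nothing below talks to Zookeeper: Registry, LocalCache, subscribe and follow are hypothetical names for this example, and unregister merely plays the role of an ephemeral node disappearing. Unlike this sketch, standard Zookeeper watches fire only once and have to be registered again after each notification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

class RegistrySketch {

    /** In-memory stand-in for the registry: service name -> provider addresses, plus watchers. */
    static final class Registry {
        private final Map<String, TreeSet<String>> providers = new HashMap<>();
        private final Map<String, List<Consumer<List<String>>>> watchers = new HashMap<>();

        synchronized void register(String service, String address) {
            providers.computeIfAbsent(service, key -> new TreeSet<>()).add(address);
            notifyWatchers(service);
        }

        /** Plays the role of an ephemeral node being deleted when the provider's session ends. */
        synchronized void unregister(String service, String address) {
            TreeSet<String> addresses = providers.get(service);
            if (addresses != null && addresses.remove(address)) {
                notifyWatchers(service);
            }
        }

        /** Registers a watcher and returns the addresses known right now. */
        synchronized List<String> subscribe(String service, Consumer<List<String>> watcher) {
            watchers.computeIfAbsent(service, key -> new ArrayList<>()).add(watcher);
            return snapshot(service);
        }

        private List<String> snapshot(String service) {
            TreeSet<String> addresses = providers.get(service);
            return addresses == null ? new ArrayList<String>() : new ArrayList<String>(addresses);
        }

        private void notifyWatchers(String service) {
            List<Consumer<List<String>>> interested = watchers.get(service);
            if (interested == null) {
                return;
            }
            for (Consumer<List<String>> watcher : interested) {
                watcher.accept(snapshot(service));
            }
        }
    }

    /** Caller-side cache: keeps the last known addresses even if the registry later becomes unreachable. */
    static final class LocalCache {
        private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

        void follow(Registry registry, String service) {
            cache.put(service, registry.subscribe(service, latest -> cache.put(service, latest)));
        }

        List<String> addressesOf(String service) {
            List<String> addresses = cache.get(service);
            return addresses == null ? new ArrayList<String>() : addresses;
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        LocalCache cache = new LocalCache();
        registry.register("HelloService", "10.0.0.1:20880");
        cache.follow(registry, "HelloService");
        registry.register("HelloService", "10.0.0.2:20880");   // pushed to the cache
        registry.unregister("HelloService", "10.0.0.1:20880"); // provider went away
        System.out.println(cache.addressesOf("HelloService")); // prints: [10.0.0.2:20880]
    }
}
```

If the registry object were thrown away after the last line, the cache would still answer with the last addresses it saw, which is the behaviour described above for a registry outage.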

The events in Zookeeper are as follows.
[Figure: Zookeeper event types]
You can think of this watch mechanism as a distributed observer pattern.

Load balancing strategy

A service is never deployed on just one node. Each time we make a call, we need to select one node to send it to, and this is where the load balancing strategy comes in.

Common load balancing strategies are as follows:

Random
Round robin
Consistent hash
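
Here is a compact sketch of the three strategies, to show how little code the basic idea of each one needs. The class names, the CRC32 hash and the number of virtual nodes are my own assumptions for this example; Dubbo's built-in implementations also handle weights and other details, and use different hashing.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.zip.CRC32;

class LoadBalanceSketch {

    /** Random: every node has the same chance on every call. */
    static String random(List<String> nodes) {
        return nodes.get(ThreadLocalRandom.current().nextInt(nodes.size()));
    }

    /** Round robin: nodes are used in turn. */
    static final class RoundRobin {
        private final AtomicInteger next = new AtomicInteger();

        String select(List<String> nodes) {
            // floorMod keeps the index valid even after the counter overflows.
            return nodes.get(Math.floorMod(next.getAndIncrement(), nodes.size()));
        }
    }

    /** Consistent hash: the same key always goes to the same node while that node stays in the ring. */
    static final class ConsistentHash {
        private final TreeMap<Long, String> ring = new TreeMap<>();

        ConsistentHash(List<String> nodes, int virtualNodes) {
            for (String node : nodes) {
                for (int i = 0; i < virtualNodes; i++) {
                    ring.put(hash(node + "#" + i), node); // several ring positions per node
                }
            }
        }

        String select(String key) {
            // First ring position at or after the key's hash, wrapping around to the start.
            Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
            return (entry != null ? entry : ring.firstEntry()).getValue();
        }

        private static long hash(String value) {
            CRC32 crc = new CRC32();
            crc.update(value.getBytes(StandardCharsets.UTF_8));
            return crc.getValue();
        }
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("10.0.0.1:20880", "10.0.0.2:20880", "10.0.0.3:20880");
        System.out.println("random:      " + random(nodes));
        RoundRobin roundRobin = new RoundRobin();
        System.out.println("round robin: " + roundRobin.select(nodes) + ", " + roundRobin.select(nodes));
        ConsistentHash ring = new ConsistentHash(nodes, 100);
        System.out.println("consistent:  " + ring.select("user-42") + " and again " + ring.select("user-42"));
    }
}
```

Random and round robin spread calls evenly, while consistent hash is the one to pick when calls with the same key should keep landing on the same node.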

Summary

Of course, a mature RPC framework has to consider many more things, such as routing strategies, retry on failure, monitoring, asynchronous calls, etc. They are not part of the main flow, so I won't go into them here.
