Talking about Dubbo

What is Dubbo

Dubbo is a lightweight, high-performance Java RPC framework.

It provides three core capabilities: interface-oriented remote method invocation, intelligent fault tolerance and load balancing, and automatic service registration and discovery. Together these offer a high-performance, transparent RPC remote call solution and an SOA service governance solution.

What is RPC

RPC (Remote Procedure Call) is a protocol for requesting a service from a program on a remote computer over the network, without the caller needing to understand the underlying network technology. For example, if two services A and B are deployed on two different machines, how does service A call a method in service B? Plain HTTP requests would work, but they may be slower and leave less room for optimization. RPC emerged to solve this problem.

What is the principle of RPC?


  1. The service consumer (client) calls the service as if it were a local call;
  2. After receiving the call, the client stub is responsible for assembling methods, parameters, etc. into a message body capable of network transmission;
  3. The client stub finds the service address and sends the message to the server;
  4. The server stub decodes after receiving the message;
  5. The server stub calls the local service according to the decoding result;
  6. The local service executes and returns the result to the server stub;
  7. The server stub packages the returned result into a message and sends it to the consumer;
  8. The client stub receives the message and decodes it;
  9. The service consumer gets the final result.
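
The nine steps above can be sketched in a single process. This is a minimal illustration, not how Dubbo implements it: the names RpcSketch, GreetingService, clientStub and serverStub are hypothetical, the "network" is a direct method call that passes a byte array, and plain Java serialization stands in for a real wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class RpcSketch {
    /** Hypothetical interface shared by consumer and provider. */
    public interface GreetingService {
        String greet(String name);
    }

    /** Local implementation that lives on the provider side. */
    public static class GreetingServiceImpl implements GreetingService {
        public String greet(String name) {
            return "Hello, " + name;
        }
    }

    /** Server stub: decode the request, call the local service, encode the result (steps 4-7). */
    public static byte[] serverStub(Object service, byte[] request) throws Exception {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(request))) {
            String methodName = in.readUTF();
            Class<?>[] paramTypes = (Class<?>[]) in.readObject();
            Object[] args = (Object[]) in.readObject();
            Method method = service.getClass().getMethod(methodName, paramTypes);
            Object result = method.invoke(service, args);
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(result);
            }
            return bytes.toByteArray();
        }
    }

    /** Client stub: a dynamic proxy that encodes the call and decodes the reply (steps 1-3, 8-9). */
    public static <T> T clientStub(Class<T> iface, Object remoteService) {
        InvocationHandler handler = (proxy, method, args) -> {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeUTF(method.getName());
                out.writeObject(method.getParameterTypes());
                out.writeObject(args);
            }
            // Assumption: a real framework would send these bytes over a socket here.
            byte[] response = serverStub(remoteService, bytes.toByteArray());
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(response))) {
                return in.readObject();
            }
        };
        Object proxy = Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(proxy);
    }

    public static void main(String[] args) {
        GreetingService service = clientStub(GreetingService.class, new GreetingServiceImpl());
        System.out.println(service.greet("Dubbo")); // prints "Hello, Dubbo"
    }
}
```

The consumer only sees the GreetingService interface; everything between the proxy call and the returned string is hidden in the two stubs, which is the point of step 1.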

What problem does RPC solve

It makes calls between different services in a distributed or microservice system as simple as local calls.

HTTP already exists, so why use RPC for service calls?

RPC is a concept, a design for solving the calling problem between different services. It generally includes a transmission protocol and a serialization protocol.

HTTP, by contrast, is a protocol. An RPC framework can use HTTP as its transmission protocol or use TCP directly. Different protocols generally suit different scenarios.

Transmission protocols include the HTTP/2 protocol used by the well-known gRPC (grpc / grpc.io) and custom-message TCP protocols such as Dubbo's.

Serialization protocols include text-based encodings such as XML and JSON, and binary encodings such as protobuf and binpack.

Why use a custom TCP protocol for RPC between back-end processes

(Misconception) Compared with a custom TCP message protocol, the extra overhead of HTTP is establishing and tearing down connections.

HTTP supports connection reuse through connection pooling: a certain number of connections are kept open, so connections are not created and destroyed frequently.

Second, HTTP can also carry content encoded with a binary serialization protocol such as protobuf, so the biggest difference between the two still lies in the transmission protocol.

The TCP payload of a typical HTTP/1.1 message contains too much wasted information. Even if the body uses a binary encoding, the message metadata (the header key-value pairs) is still text-encoded, which costs a lot of bytes. In the example figure, the effective bytes account for only about 30% of the message, that is, about 70% of the bytes are spent on metadata and encoding overhead. Of course, a real message body may be longer than in that example, but the share taken by the header is still considerable.

HTTP is like Mandarin, and RPC is like a gang's internal jargon.

The advantage of speaking Mandarin is that everyone can understand it and everyone can speak it.

The advantage of jargon is that it can be more concise, more confidential, and more customizable. The disadvantage is that the party doing the "speaking" (the client side) must also understand it, and once everyone speaks one jargon, it is hard to change.


Why use dubbo

I think the case for Dubbo rests mainly on the following four capabilities it provides:

Load balancing

When the same service is deployed on several machines, decide which machine's instance should be called.

Service call chain generation

As a system grows, there are more and more services, and the dependencies between them become tangled and complicated. It can even become impossible to tell which application must start before which, and the architect can no longer fully describe the architecture. Dubbo can help us map out how services call each other.

Service access pressure and duration statistics, resource scheduling and governance

Manage cluster capacity in real time based on access pressure and improve cluster utilization.

Service degradation

When a service goes down, a fallback service is called instead.

Dubbo's architecture

Illustration of Dubbo's architecture


  • Provider: Service provider that exposes the service
  • Consumer: Service consumer that invokes remote services
  • Registry: Registry for service registration and discovery
  • Monitor: a monitoring center that counts the number and time of service calls
  • Container: Service running container

Call relationship description:

  1. The service container is responsible for starting, loading, and running the service provider.
  2. When the service provider starts, it registers its services with the registry.
  3. When the service consumer starts, it subscribes to the services it needs from the registry.
  4. The registry returns the list of provider addresses to the consumer. If the list changes, the registry pushes the changed data to the consumer over a long-lived connection.
  5. The service consumer selects one provider from the address list using a soft load balancing algorithm and calls it. If the call fails, it selects another provider and calls again.
  6. Service consumers and providers accumulate call counts and call durations in memory and send the statistics to the monitoring center once a minute.
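
Steps 2 to 4 (register, subscribe, push on change) can be sketched with an in-memory registry. This is only an illustration under stated assumptions: RegistrySketch and its methods are hypothetical names, not Dubbo or ZooKeeper APIs, and a callback stands in for the long-lived connection over which changes are pushed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class RegistrySketch {
    // service name -> provider addresses
    private final Map<String, List<String>> addresses = new ConcurrentHashMap<>();
    // service name -> consumer callbacks (stand-in for long-lived connections)
    private final Map<String, List<Consumer<List<String>>>> listeners = new ConcurrentHashMap<>();

    /** Step 2: a provider registers its address under a service name. */
    public void register(String service, String address) {
        addresses.computeIfAbsent(service, k -> new CopyOnWriteArrayList<>()).add(address);
        notifyListeners(service);
    }

    /** A provider going away also counts as a change that is pushed to consumers. */
    public void unregister(String service, String address) {
        List<String> list = addresses.get(service);
        if (list != null && list.remove(address)) {
            notifyListeners(service);
        }
    }

    /** Steps 3-4: a consumer subscribes and immediately receives the current address list. */
    public void subscribe(String service, Consumer<List<String>> listener) {
        listeners.computeIfAbsent(service, k -> new CopyOnWriteArrayList<>()).add(listener);
        listener.accept(snapshot(service));
    }

    private void notifyListeners(String service) {
        List<String> current = snapshot(service);
        List<Consumer<List<String>>> subscribed =
                listeners.getOrDefault(service, Collections.emptyList());
        for (Consumer<List<String>> listener : subscribed) {
            listener.accept(current);
        }
    }

    private List<String> snapshot(String service) {
        return new ArrayList<>(addresses.getOrDefault(service, Collections.emptyList()));
    }
}
```

A consumer would typically keep the most recent list it received and pick providers from that local copy, so every call does not have to go through the registry.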

Dubbo's load balancing strategy

What is load balancing

For example, a service in our system has particularly heavy traffic, so we deploy it on multiple servers. When a client initiates a request, any of those servers can process it, so choosing the server that handles each request matters a great deal. If one server ends up handling every request, deploying the service on multiple servers is pointless. Load balancing exists to avoid a single server taking all the requests, which easily leads to problems such as downtime and crashes. The name itself says it: balance the load.

Random LoadBalance (default, weight-based random load balancing mechanism)

Random: the selection probability is set according to weight.

The random strategy first checks whether all Invokers have the same weight. If they do, the processing is simple: use random.nextInt(length) to generate a random index and select the corresponding Invoker. If no provider weight is set in Dubbo Admin, all Invokers have the same weight, which defaults to 100. If the weights differ, the selection probability must be set in proportion to the weights.

The algorithm is roughly as follows: Suppose there are 4 Invokers.

invoker weight
A 10
B 20
C 20
D 30

The total weight of A, B, C and D is 10 + 20 + 20 + 30 = 80. Lay out the 80 numbers as in the following figure:

+-----------------------------------------------------------------------------------+
|          |                    |                    |                              |
+-----------------------------------------------------------------------------------+
0          10                   30                   50                             80

|-----A----|---------B----------|----------C---------|---------------D--------------|


---------------------15

-------------------------------------------37

-----------------------------------------------------------54

The figure has four regions whose lengths equal the weights of A, B, C and D. Use random.nextInt(10 + 20 + 20 + 30) to pick one of the 80 numbers (0 to 79), then determine which region the number falls in. For example, 37 falls in region C, so Invoker C is chosen; 15 falls in region B and 54 falls in region D.
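
The region lookup can be written as a short sketch. It follows the idea described above rather than reproducing Dubbo's source; WeightedRandomSketch, selectByOffset and select are hypothetical names, and invokers are represented by their names only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

public class WeightedRandomSketch {
    /** Maps an offset in [0, totalWeight) to the invoker whose region contains it. */
    public static String selectByOffset(Map<String, Integer> weights, int offset) {
        for (Map.Entry<String, Integer> entry : weights.entrySet()) {
            // Walk the regions left to right, subtracting each region's length.
            offset -= entry.getValue();
            if (offset < 0) {
                return entry.getKey();
            }
        }
        throw new IllegalArgumentException("offset is outside the total weight");
    }

    /** Picks a random offset and returns the matching invoker. */
    public static String select(Map<String, Integer> weights, Random random) {
        int totalWeight = 0;
        for (int weight : weights.values()) {
            totalWeight += weight;
        }
        return selectByOffset(weights, random.nextInt(totalWeight));
    }

    public static void main(String[] args) {
        Map<String, Integer> weights = new LinkedHashMap<>(); // keeps A, B, C, D order
        weights.put("A", 10);
        weights.put("B", 20);
        weights.put("C", 20);
        weights.put("D", 30);
        System.out.println(selectByOffset(weights, 37)); // prints "C"
        System.out.println(select(weights, new Random()));
    }
}
```

Splitting the offset lookup from the random draw keeps the region logic deterministic, which makes the 15, 37 and 54 cases from the text easy to check.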


RoundRobin LoadBalance (not recommended, weight-based round-robin load balancing mechanism)

Round robin: the rotation ratio is set according to the agreed weights.

Round-robin load balancing calls all providers in turn. Like the random strategy, it also has the concept of weight. The round-robin algorithm lets RPC calls be distributed strictly according to the ratio we set, whether the number of calls is small or large.

However, round-robin load balancing also has a shortcoming: requests pile up on a slow provider. For example, the second machine is very slow but has not gone down. Every request routed to the second machine gets stuck there, and over time all requests end up stuck on the second machine.
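
One way to distribute calls strictly by weight is the "smooth" weighted round-robin technique sketched below. Choosing this particular variant is an assumption made for illustration; the text above does not say which variant Dubbo uses, and the class name SmoothWeightedRoundRobinSketch is hypothetical.

```java
public class SmoothWeightedRoundRobinSketch {
    private final String[] names;
    private final int[] weights;
    private final int[] current; // running score per invoker, starts at 0
    private final int totalWeight;

    public SmoothWeightedRoundRobinSketch(String[] names, int[] weights) {
        this.names = names.clone();
        this.weights = weights.clone();
        this.current = new int[weights.length];
        int total = 0;
        for (int weight : weights) {
            total += weight;
        }
        this.totalWeight = total;
    }

    /** Returns the next invoker; over totalWeight calls each one is picked exactly weight times. */
    public synchronized String next() {
        int best = 0;
        for (int i = 0; i < weights.length; i++) {
            current[i] += weights[i];          // every invoker gains its weight
            if (current[i] > current[best]) {  // ties keep the earlier invoker
                best = i;
            }
        }
        current[best] -= totalWeight;          // the winner pays back the total
        return names[best];
    }
}
```

With weights 5, 1 and 1 for A, B and C, seven consecutive calls yield A, A, B, A, C, A, A, so the heavy invoker is interleaved rather than called five times in a row. Note that this does nothing about the slow-provider problem described above: the ratio is fixed regardless of how fast each machine responds.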

LeastActive LoadBalance (least active calls)

The goal is for slower machines to receive fewer requests.

The invoker with the fewest active calls is chosen, with a random choice among invokers that have the same active count. The active count is the difference between the number of calls started and the number of calls finished.
A slow provider therefore receives fewer requests, because the slower the provider, the larger that difference.

Each service provider maintains an active-count counter. When machine A begins to process a request, its counter is incremented by 1; at this point A has not finished processing. When processing completes, the counter is decremented by 1. Machine B finishes its request quickly after receiving it, so the active counts of A and B are 1 and 0 respectively. When a new request arrives, machine B is selected because it has the smallest active count, so the slow machine A receives fewer requests.

If only one Invoker has the smallest active count, that Invoker is returned directly. If several Invokers share the smallest active count, their weights are not all equal, and the total weight is greater than 0, then a random number is generated in the range [0, totalWeight) and the Invoker is chosen according to that number and the weights.
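
The selection rule can be sketched as follows. This is a simplified illustration, not Dubbo's implementation: LeastActiveSketch and its Invoker class are hypothetical, and begin() and end() stand in for whatever hook increments and decrements the counter around a real call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicInteger;

public class LeastActiveSketch {
    /** Hypothetical invoker with a weight and an active-call counter. */
    public static class Invoker {
        private final String name;
        private final int weight;
        private final AtomicInteger active = new AtomicInteger();

        public Invoker(String name, int weight) {
            this.name = name;
            this.weight = weight;
        }

        public void begin() { active.incrementAndGet(); } // a call started
        public void end() { active.decrementAndGet(); }   // a call finished
        public int active() { return active.get(); }
        public String name() { return name; }
    }

    public static Invoker select(List<Invoker> invokers, Random random) {
        if (invokers.isEmpty()) {
            throw new IllegalArgumentException("no invokers available");
        }
        // Collect every invoker that shares the smallest active count.
        int leastActive = Integer.MAX_VALUE;
        List<Invoker> candidates = new ArrayList<>();
        for (Invoker invoker : invokers) {
            int count = invoker.active();
            if (count < leastActive) {
                leastActive = count;
                candidates.clear();
            }
            if (count == leastActive) {
                candidates.add(invoker);
            }
        }
        if (candidates.size() == 1) {
            return candidates.get(0);
        }
        // Several candidates: fall back to a weighted random choice among them.
        int totalWeight = 0;
        for (Invoker invoker : candidates) {
            totalWeight += invoker.weight;
        }
        if (totalWeight > 0) {
            int offset = random.nextInt(totalWeight);
            for (Invoker invoker : candidates) {
                offset -= invoker.weight;
                if (offset < 0) {
                    return invoker;
                }
            }
        }
        return candidates.get(random.nextInt(candidates.size()));
    }
}
```

If A has one call in flight and B has none, select returns B every time, which matches the machine A and machine B example above.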

ConsistentHash LoadBalance (consistent hashing)

Requests with the same parameters are always sent to the same provider. (If what you need is not random load balancing but all requests of one kind going to one node, use this consistent hash strategy.)

When a provider goes down, the requests originally sent to it are spread across the other providers by means of virtual nodes, without causing drastic changes.
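
A hash ring with virtual nodes can be sketched with a sorted map. The details here are assumptions for illustration: ConsistentHashSketch is a hypothetical class, the ring position is taken from the first 8 bytes of an MD5 digest, and the request key is a plain string standing in for the call parameters.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ConsistentHashSketch {
    // ring position -> provider name; each provider appears virtualNodes times
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public ConsistentHashSketch(List<String> providers, int virtualNodes) {
        for (String provider : providers) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(hash(provider + "#" + i), provider);
            }
        }
    }

    /** Returns the provider owning the first virtual node at or after the key's position. */
    public String select(String requestKey) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no providers on the ring");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(requestKey));
        return (entry != null ? entry : ring.firstEntry()).getValue(); // wrap around
    }

    /** Removes a provider that went down; only keys it owned move elsewhere. */
    public void remove(String provider) {
        ring.values().removeIf(provider::equals);
    }

    private static long hash(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long value = 0;
            for (int i = 0; i < 8; i++) {
                value = (value << 8) | (digest[i] & 0xFF);
            }
            return value;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }
}
```

Removing one provider deletes only its virtual nodes, so a key owned by a surviving provider keeps the same owner, while the removed provider's keys fall through to whichever nodes come next on the ring.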

ZooKeeper downtime and Dubbo direct connection

If the ZooKeeper registry goes down, service consumers can still call the providers' services for a period of time. They communicate using the locally cached address list, which is just one manifestation of Dubbo's robustness.

Dubbo robustness

  1. If the monitoring center goes down, usage is not affected; only part of the sampled data is lost.
  2. If the database goes down, the registry can still serve the service list from its cache, but new services cannot be registered.
  3. The registry is a peer-to-peer cluster; if any node goes down, clients automatically switch to another one.
  4. If all registry nodes go down, service providers and service consumers can still communicate through the local cache.
  5. Service providers are stateless; if any one of them goes down, usage is not affected.
  6. If all service providers go down, the service consumer application becomes unavailable and reconnects indefinitely, waiting for a provider to recover.

Dubbo service exposure

After the Spring beans are instantiated, the last step of refreshing the container publishes a ContextRefreshedEvent. This notifies the ServiceBean class, which implements ApplicationListener, by calling back its onApplicationEvent method. In that method Dubbo calls the export method of ServiceConfig, the parent class of ServiceBean, and it is this method that actually publishes the service (asynchronously or not).

Dubbo protocols

dubbo

A single long-lived connection with NIO asynchronous communication, suitable for service calls with high concurrency and small data volumes where consumers far outnumber providers. Transmission protocol: TCP, asynchronous, Hessian serialization.

rmi

Implemented with the JDK standard RMI protocol. Parameter and return objects must implement the Serializable interface, and the standard Java serialization mechanism is used. It uses multiple blocking short connections over TCP with synchronous transmission, handles mixed packet sizes, suits a similar number of consumers and providers, and can transfer files. It is suitable for conventional remote service calls and for interoperating with RMI. When an older version of the Commons Collections package is on the classpath, Java serialization has a security vulnerability.

webservice

A remote calling protocol based on WebService, integrated with the CXF implementation, providing interoperability with native WebService. Multiple short connections, HTTP-based synchronous transmission, suitable for system integration and cross-language calls.

http

A remote calling protocol based on HTTP form submission, implemented with Spring's HttpInvoker. Multiple short connections over HTTP, mixed incoming parameter sizes, more providers than consumers. Suitable for services that must be called by both applications and browser JavaScript.

hessian

Integrates the Hessian service: communication is based on HTTP, services are exposed through a Servlet, and Dubbo embeds Jetty as the default server implementation, providing interoperability with Hessian services. Multiple short connections, synchronous HTTP transmission, Hessian serialization, large incoming parameters, more providers than consumers, heavier pressure on providers, and files can be transmitted.

memcached

RPC protocol based on memcached

redis

RPC protocol based on redis


Are calls between Dubbo services blocking?

Illustration of the consumer-side thread pool (old model)


  1. The business thread makes a request and gets a Future instance.
  2. The business thread then calls future.get and blocks, waiting for the business result to return.
  3. When the business data comes back, it is handed to a separate consumer-side thread pool for deserialization and other processing, and future.set is called to store the deserialized business result.
  4. The business thread then returns the result directly.
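
The four steps of this model can be sketched with the JDK's CompletableFuture. This is an illustration under assumptions, not Dubbo code: FutureCallSketch is a hypothetical class, the raw response is passed in as a byte array instead of arriving from the network, and complete() plays the role of the future.set mentioned in step 3.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FutureCallSketch {
    /** Returns {decoded result, name of the thread that decoded it}. */
    public static String[] call(byte[] rawResponse) throws Exception {
        // Separate consumer-side pool that does the deserialization work.
        ExecutorService consumerPool =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "consumer-pool-1"));
        try {
            // Step 1: the business thread gets a Future for the pending call.
            CompletableFuture<String[]> future = new CompletableFuture<>();

            // Step 3: the response is decoded on the pool and stored in the future.
            consumerPool.execute(() -> {
                String decoded = new String(rawResponse, StandardCharsets.UTF_8);
                future.complete(new String[] {decoded, Thread.currentThread().getName()});
            });

            // Steps 2 and 4: the business thread blocks until the result is set.
            return future.get(5, TimeUnit.SECONDS);
        } finally {
            consumerPool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        String[] result = call("ok".getBytes(StandardCharsets.UTF_8));
        System.out.println(result[0] + " decoded on " + result[1]);
    }
}
```

The decoding thread is "consumer-pool-1", not the caller, which is the extra thread hop that this model pays for on every call.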

Illustration of the consumer-side thread pool (new model)

  1. The business thread makes a request and gets a Future instance.
  2. Before calling future.get(), it first calls ThreadlessExecutor.wait(). This makes the business thread wait on a blocking queue until an element is added to the queue.
  3. When the business data comes back, a Runnable task is created and put into the ThreadlessExecutor queue.
  4. The business thread takes the task and executes it on its own thread: it deserializes the business data and sets it on the Future.
  5. The business thread then returns the result directly.

Compared with the old thread pool model, the business thread itself waits for and parses the returned result, which removes the overhead of the extra thread pool.
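
The queue-based model can be sketched as follows. QueueExecutor here is a hypothetical stand-in written for this example, not Dubbo's ThreadlessExecutor, and a plain thread named "io-thread" simulates the component that receives the raw response.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.LinkedBlockingQueue;

public class ThreadlessSketch {
    /** Hypothetical executor that only queues tasks; the waiting thread runs them. */
    public static class QueueExecutor implements Executor {
        private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

        @Override
        public void execute(Runnable task) {
            queue.add(task);
        }

        /** Blocks the calling thread until a task arrives, then runs it on that same thread. */
        public void waitAndRun() throws InterruptedException {
            queue.take().run();
        }
    }

    /** Returns {decoded result, name of the thread that decoded it}. */
    public static String[] call(byte[] rawResponse) throws Exception {
        QueueExecutor executor = new QueueExecutor();
        CompletableFuture<String[]> future = new CompletableFuture<>();

        // Step 3: when the response arrives, only a task is enqueued; nothing is decoded yet.
        Thread io = new Thread(() -> executor.execute(() -> {
            String decoded = new String(rawResponse, StandardCharsets.UTF_8);
            future.complete(new String[] {decoded, Thread.currentThread().getName()});
        }), "io-thread");
        io.start();

        // Steps 2 and 4: the business thread waits on the queue and decodes the data itself.
        executor.waitAndRun();

        // Step 5: the future is already complete, so get() returns immediately.
        return future.get();
    }
}
```

Here the thread name recorded during decoding is the caller's own, so no additional pool thread is involved in producing the result.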


What is the difference between Dubbo and Spring Cloud?

Dubbo is a product of the SOA era, and its focus is mainly on service invocation, traffic distribution, traffic monitoring, and circuit breaking.

Spring Cloud was born in the era of microservice architecture. It considers all aspects of microservice governance and also builds on the strengths of Spring and Spring Boot.

The two frameworks started out with different goals: Dubbo is positioned as a service governance framework, while Spring Cloud aims to build an ecosystem.

At the bottom layer, Dubbo uses an NIO framework such as Netty, transmits over TCP, and works with Hessian serialization to complete RPC communication.

Spring Cloud communicates over HTTP, calling remote procedures through REST interfaces. Relatively speaking, HTTP requests have larger packets and occupy more bandwidth. However, REST is more flexible than RPC: the service provider and the caller depend only on a contract, with no strong dependency at the code level, which is more appropriate in a microservice environment that emphasizes rapid evolution. Whether to favor speed or convenience and flexibility depends on the specific circumstances.




Origin juejin.im/post/5e9307f6f265da47e752702a