9 Dubbo Questions from the "I Want to Get into a Big Tech Company" Series

This is the fourth article in the interview series, covering Dubbo. Dubbo itself is not complicated, and the official documentation is clear and detailed. Interviews generally do not ask many Dubbo questions, and the ones they do ask are basically the same: layering, how it works, load-balancing strategies, fault-tolerance mechanisms, and the SPI mechanism. The biggest question is usually how to design an RPC framework, but if you understand the layers and how they work, you have essentially already answered it, right?

Can you talk about Dubbo's layering?

Broadly speaking, Dubbo is divided into three layers. The business logic layer is what we provide ourselves: interfaces, implementations, and some configuration. The RPC layer is the core layer that actually performs RPC calls. It encapsulates the whole call process, including load balancing, cluster fault tolerance, and proxying. The remoting layer encapsulates the network transport protocol and data conversion.

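At a finer granularity, Dubbo follows the 10-layer model shown in the figure: service, config, proxy, registry, cluster, monitor, protocol, exchange, transport, and serialize. Upper layers depend on the layers below them. Apart from the service (business logic) and config layers, which are APIs, every layer is an extension point implemented through SPI.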

Can you tell me how Dubbo works?

  1. When the services start, the provider and the consumer connect to the registry according to their configuration. The provider registers its services and the consumer subscribes to the services it needs.

  2. The registry returns the provider information to the consumer based on the subscription, and the consumer caches it locally. If the information changes, the registry pushes the update to the consumer.

  3. The consumer generates a proxy object and selects a provider according to the load-balancing strategy. The number of calls and the call times are periodically reported to the monitor.

  4. With the proxy object in hand, the consumer makes interface calls through it.

  5. When the provider receives a request, it deserializes the data and then uses the proxy to call the concrete interface implementation.
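To make steps 1 and 2 concrete, here is a minimal in-memory sketch of the register/subscribe/push interaction. `InMemoryRegistry` and `ConsumerCache` are hypothetical names, not Dubbo classes. A real registry such as ZooKeeper or Nacos runs as a separate process and pushes changes over the network.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CopyOnWriteArraySet;
import java.util.function.Consumer;

// Hypothetical in-memory registry: providers register addresses, consumers subscribe.
final class InMemoryRegistry {
    private final Map<String, Set<String>> providers = new ConcurrentHashMap<>();
    private final Map<String, List<Consumer<List<String>>>> listeners = new ConcurrentHashMap<>();

    // Step 1, provider side: register an address and push the change to subscribers.
    void register(String service, String address) {
        providers.computeIfAbsent(service, k -> new CopyOnWriteArraySet<>()).add(address);
        notifyListeners(service);
    }

    void unregister(String service, String address) {
        Set<String> addresses = providers.get(service);
        if (addresses != null && addresses.remove(address)) {
            notifyListeners(service);
        }
    }

    // Steps 1 and 2, consumer side: get the current list now and a push on every change.
    void subscribe(String service, Consumer<List<String>> listener) {
        listeners.computeIfAbsent(service, k -> new CopyOnWriteArrayList<>()).add(listener);
        listener.accept(snapshot(service));
    }

    private List<String> snapshot(String service) {
        return new ArrayList<>(providers.getOrDefault(service, Set.of()));
    }

    private void notifyListeners(String service) {
        List<String> current = snapshot(service);
        for (Consumer<List<String>> listener : listeners.getOrDefault(service, List.of())) {
            listener.accept(current);
        }
    }
}

// Consumer-side local cache of provider addresses (step 2).
final class ConsumerCache {
    private volatile List<String> addresses = List.of();

    void onNotify(List<String> latest) {
        addresses = List.copyOf(latest);
    }

    List<String> addresses() {
        return addresses;
    }
}
```

A consumer would subscribe once with `registry.subscribe(service, cache::onNotify)` and then read `cache.addresses()` on every call. This keeps the registry off the call path.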

Why communicate through proxy objects?

The main purpose is to make the interface proxy transparent. The call details are encapsulated so that users can call remote methods just as they call local ones. The proxy is also where other strategies are applied, such as:

1. Load balancing across providers

2. Handling of call failures and timeouts, with degradation and fault-tolerance mechanisms

3. Filtering operations, such as adding caching or mock data

4. Interface call data statistics
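The sketch below illustrates the "transparent proxy" idea with the JDK's `java.lang.reflect.Proxy`. Dubbo ships its own proxy factories (Javassist-based by default), so this is not how Dubbo generates proxies. `GreetingService` and `RemoteCallHandler` are hypothetical, and the `remote` function stands in for the actual network call. The caller only sees a plain interface, while the handler counts calls per method (point 4 above) before forwarding them.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiFunction;

interface GreetingService {
    String greet(String name);
}

// Hypothetical handler: forwards every call to a "remote" function and records call counts.
final class RemoteCallHandler implements InvocationHandler {
    private final BiFunction<String, Object[], Object> remote; // stands in for the network call
    private final Map<String, AtomicLong> callCounts = new ConcurrentHashMap<>();

    RemoteCallHandler(BiFunction<String, Object[], Object> remote) {
        this.remote = remote;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) {
        // Statistics are collected here, invisible to the caller.
        callCounts.computeIfAbsent(method.getName(), k -> new AtomicLong()).incrementAndGet();
        return remote.apply(method.getName(), args);
    }

    long count(String methodName) {
        AtomicLong counter = callCounts.get(methodName);
        return counter == null ? 0 : counter.get();
    }

    static <T> T createProxy(Class<T> iface, RemoteCallHandler handler) {
        return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }
}
```

Usage: `RemoteCallHandler.createProxy(GreetingService.class, handler).greet("Dubbo")` looks like a local call. Load balancing, fault tolerance, or caching would hook into `invoke` in the same way.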

Can you describe the service exposure process?

  1. When the container starts, a Dubbo tag parser is created to parse the Dubbo tags into ServiceConfig. Once the container has been created, the ContextRefreshedEvent callback is triggered, and the service is exposed.
  2. An Invoker is obtained through ProxyFactory. It holds the target object and the URL of the service to be executed.
  3. DubboProtocol then converts the wrapped invoker into an Exporter and starts the server to listen on the port.
  4. Finally, RegistryProtocol saves the mapping between the URL and the invoker and registers the service with the registry.
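The four steps above can be compressed into a toy sketch. The implementation is wrapped as an invoker, stored under its URL, and the URL is then "registered". `SimpleInvoker`, `ReflectiveInvoker`, and `ToyRegistryProtocol` are hypothetical, heavily simplified stand-ins for Dubbo's `Invoker`, `ProxyFactory`, and `RegistryProtocol`. Opening a real server socket (step 3) is left out, and `handle` plays the part of the server after a request has been decoded.

```java
import java.lang.reflect.Method;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for an Invoker: "call method X with these arguments".
interface SimpleInvoker {
    Object invoke(String methodName, Class<?>[] paramTypes, Object[] args);
}

// Step 2 (roughly what ProxyFactory produces): dispatches to the implementation via reflection.
final class ReflectiveInvoker implements SimpleInvoker {
    private final Class<?> iface;
    private final Object target;

    ReflectiveInvoker(Class<?> iface, Object target) {
        this.iface = iface;
        this.target = target;
    }

    @Override
    public Object invoke(String methodName, Class<?>[] paramTypes, Object[] args) {
        try {
            Method method = iface.getMethod(methodName, paramTypes);
            return method.invoke(target, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Call to " + methodName + " failed", e);
        }
    }
}

// Steps 3 and 4, simplified: keep URL -> invoker and publish the URL to a fake registry.
final class ToyRegistryProtocol {
    private final Map<String, SimpleInvoker> exported = new ConcurrentHashMap<>();
    private final List<String> registry = new CopyOnWriteArrayList<>();

    String export(Class<?> iface, Object impl, String host, int port) {
        String url = "dubbo://" + host + ":" + port + "/" + iface.getName();
        exported.put(url, new ReflectiveInvoker(iface, impl));
        registry.add(url);
        return url;
    }

    List<String> registeredUrls() {
        return List.copyOf(registry);
    }

    // What the server would do after decoding an incoming request for `url`.
    Object handle(String url, String methodName, Class<?>[] paramTypes, Object... args) {
        SimpleInvoker invoker = exported.get(url);
        if (invoker == null) {
            throw new IllegalStateException("Nothing exported at " + url);
        }
        return invoker.invoke(methodName, paramTypes, args);
    }
}

interface EchoService {
    String echo(String message);
}
```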

Can you describe the service reference process?

Once a service is exposed, the client has to reference it before it can invoke it.

  1. First, the client subscribes to the service from the registry based on its configuration.

  2. DubboProtocol then connects to the server using the provider address and interface information obtained from the subscription, opens a client, and creates the invoker.

  3. Once the invoker is created, a proxy object for the service interface is generated from it. This proxy object is used to call the provider remotely, and the service reference is complete.

What are the load balancing strategies?

  1. Weighted random: suppose we have servers = [A, B, C] with weights = [5, 3, 2], so the total weight is 10. Lay these weights out on a one-dimensional axis: the interval [0, 5) belongs to server A, [5, 8) to server B, and [8, 10) to server C. Then generate a random number in [0, 10) and pick the server whose interval it falls into.

  2. Least active: each service provider has an active count, which starts at 0 for all providers. The count is incremented by 1 when a request is received and decremented by 1 when the request completes. After the service has been running for a while, better-performing providers finish requests faster, so their active counts drop faster, and they are given priority for new requests.

  3. Consistent hash: the hash algorithm maps each provider's invoker, together with its virtual nodes, onto a ring of [0, 2^32 - 1]. To route a request, compute the MD5 hash of the key and select the invoker of the first node whose value is greater than or equal to that hash.

(Picture from the official Dubbo documentation)

  4. Weighted round-robin: for example, if the weight ratio of servers A, B, and C is 5:2:1, then out of 8 requests, server A receives 5, server B receives 2, and server C receives 1.
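The weighted random strategy (item 1) can be sketched in Java as follows. `WeightedRandom` is a hypothetical helper, not Dubbo's `RandomLoadBalance`. `pick` takes the random point as a parameter so the interval logic can be checked deterministically, and `select` draws that point from [0, total). Weights are assumed to be positive.

```java
import java.util.concurrent.ThreadLocalRandom;

// Weighted random selection: weights are laid out on [0, total) and the random point picks the interval.
final class WeightedRandom {
    // Returns the index whose interval contains `point`, where 0 <= point < sum(weights).
    static int pick(int[] weights, int point) {
        int offset = point;
        for (int i = 0; i < weights.length; i++) {
            offset -= weights[i];
            if (offset < 0) {
                return i; // point lies inside [start_i, start_i + weights[i])
            }
        }
        throw new IllegalArgumentException("point must be in [0, totalWeight)");
    }

    static int select(int[] weights) {
        int total = 0;
        for (int w : weights) {
            total += w;
        }
        return pick(weights, ThreadLocalRandom.current().nextInt(total)); // uniform in [0, total)
    }
}
```

With weights {5, 3, 2}, points 0 to 4 map to index 0 (A), 5 to 7 map to index 1 (B), and 8 to 9 map to index 2 (C). This matches the intervals described above.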
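For the least-active strategy (item 2), a minimal sketch keeps an atomic counter per provider. The counter goes up when a request starts and down when it finishes, and selection picks the smallest count. `ActiveProvider` and `LeastActive` are hypothetical names. This sketch breaks ties uniformly at random, which is a simplification rather than Dubbo's exact tie-breaking rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical provider handle that tracks how many requests are in flight.
final class ActiveProvider {
    final String name;
    final AtomicInteger active = new AtomicInteger(); // starts at 0

    ActiveProvider(String name) {
        this.name = name;
    }
}

final class LeastActive {
    // Returns a provider with the smallest active count; ties are broken uniformly at random.
    static ActiveProvider select(List<ActiveProvider> providers) {
        int least = Integer.MAX_VALUE;
        List<ActiveProvider> candidates = new ArrayList<>();
        for (ActiveProvider p : providers) {
            int a = p.active.get();
            if (a < least) {
                least = a;
                candidates.clear();
                candidates.add(p);
            } else if (a == least) {
                candidates.add(p);
            }
        }
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no providers");
        }
        return candidates.get(ThreadLocalRandom.current().nextInt(candidates.size()));
    }

    // +1 when the request starts, -1 when it completes (also on failure).
    static <T> T call(ActiveProvider provider, Supplier<T> request) {
        provider.active.incrementAndGet();
        try {
            return request.get();
        } finally {
            provider.active.decrementAndGet();
        }
    }
}
```

A fast provider spends less time inside `call`, so its counter returns to a low value sooner and `select` favors it.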
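The consistent-hash idea (item 3) can be sketched with a `TreeMap` used as the ring. Each provider gets several virtual nodes, and every node and key is hashed with MD5 onto [0, 2^32 - 1]. `ConsistentHashRing` is a hypothetical class. The node naming `provider#i` and the choice of the digest's first four bytes are assumptions of this sketch, not details taken from Dubbo.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Consistent-hash ring: each provider is placed on [0, 2^32 - 1] several times (virtual nodes).
final class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    ConsistentHashRing(List<String> providers, int virtualNodes) {
        for (String provider : providers) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(hash(provider + "#" + i), provider); // hypothetical node naming
            }
        }
    }

    // Provider of the first node whose hash is >= hash(key), wrapping to the start of the ring.
    String select(String key) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // MD5 digest, first four bytes read as an unsigned 32-bit number.
    static long hash(String value) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(value.getBytes(StandardCharsets.UTF_8));
            return ((long) (d[3] & 0xFF) << 24)
                    | ((long) (d[2] & 0xFF) << 16)
                    | ((long) (d[1] & 0xFF) << 8)
                    | (d[0] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }
}
```

Because the ring is sorted, removing a provider only remaps the keys that used to land on that provider's nodes. Every other key keeps its provider, which is the main reason to use consistent hashing for sticky routing.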
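For weighted round-robin (item 4), one common technique is the "smooth" variant sketched below. On every pick, each server's current value grows by its weight, the largest one wins, and the total weight is subtracted from the winner. `SmoothWeightedRoundRobin` is a hypothetical class, and this is one way to get the 5:2:1 split rather than a copy of Dubbo's implementation.

```java
// Smooth weighted round-robin: spreads picks according to weight without long bursts.
final class SmoothWeightedRoundRobin {
    private final String[] servers;
    private final int[] weights;
    private final int[] current; // running "current weight" per server
    private final int total;

    SmoothWeightedRoundRobin(String[] servers, int[] weights) {
        this.servers = servers.clone();
        this.weights = weights.clone();
        this.current = new int[weights.length];
        int sum = 0;
        for (int w : weights) {
            sum += w;
        }
        this.total = sum;
    }

    synchronized String next() {
        int best = 0;
        for (int i = 0; i < weights.length; i++) {
            current[i] += weights[i];          // every server gains its weight
            if (current[i] > current[best]) {
                best = i;                      // remember the largest current value
            }
        }
        current[best] -= total;                // the winner pays back the total weight
        return servers[best];
    }
}
```

For weights 5:2:1, eight consecutive calls to `next()` return A, B, A, A, C, A, B, A. That is five A's, two B's, and one C, interleaved instead of five A's in a row.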

What are the cluster fault tolerance methods?

  1. Failover Cluster (automatic failover): Dubbo's default fault-tolerance scheme. When a call fails, it automatically switches to another available node. The number of retries can be configured when referencing the service. The default is 2 retries, not counting the first call, so a call is attempted at most 3 times.

  2. Failback Cluster (automatic recovery): when a call fails, the failure and the call information are logged, an empty result is returned to the consumer, and a scheduled task retries the failed call every 5 seconds.

  3. Failfast Cluster (fail fast): the call is made only once, and an exception is thrown immediately if it fails.

  4. Failsafe Cluster (fail safe): if the call throws an exception, the exception is only logged, not rethrown, and an empty result is returned.

  5. Forking Cluster (parallel calls): a thread pool creates multiple threads that call several providers concurrently and put the results into a blocking queue. As soon as one provider returns successfully, its result is returned immediately.

  6. Broadcast Cluster (broadcast mode): every provider is called one by one. If any of them fails, an exception is thrown after the loop finishes.
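Several of these strategies reduce to a few lines each. The sketch below models a provider as a hypothetical `Function` from request to response that may throw, and implements Failover, Failsafe, Forking, and Broadcast. Failfast is simply a single call that lets the exception propagate, and Failback needs a retry scheduler, so both are omitted. For Forking, this sketch swaps the blocking queue for `ExecutorService.invokeAny`, which returns the result of the first task that completes successfully.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

final class FaultTolerance {
    // Failover: the first call plus up to `retries` retries, each on the next provider in the list.
    static <Q, R> R failover(List<Function<Q, R>> providers, Q request, int retries) {
        RuntimeException last = new IllegalStateException("no providers");
        for (int attempt = 0; attempt <= retries && attempt < providers.size(); attempt++) {
            try {
                return providers.get(attempt).apply(request);
            } catch (RuntimeException e) {
                last = e; // remember the failure and switch to the next provider
            }
        }
        throw last;
    }

    // Failsafe: log the exception instead of rethrowing it and return an empty (null) result.
    static <Q, R> R failsafe(Function<Q, R> provider, Q request) {
        try {
            return provider.apply(request);
        } catch (RuntimeException e) {
            System.err.println("Failsafe ignored: " + e);
            return null;
        }
    }

    // Forking: call all providers in parallel and return the first successful result.
    static <Q, R> R forking(List<Function<Q, R>> providers, Q request) {
        ExecutorService pool = Executors.newFixedThreadPool(providers.size());
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (Function<Q, R> provider : providers) {
                tasks.add(() -> provider.apply(request));
            }
            return pool.invokeAny(tasks); // fails only if every call failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("All providers failed", e.getCause());
        } finally {
            pool.shutdownNow(); // cancel calls that are still running
        }
    }

    // Broadcast: call every provider; if any failed, throw after the loop has finished.
    static <Q, R> void broadcast(List<Function<Q, R>> providers, Q request) {
        RuntimeException first = null;
        for (Function<Q, R> provider : providers) {
            try {
                provider.apply(request);
            } catch (RuntimeException e) {
                if (first == null) {
                    first = e;
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

In Dubbo itself, Failover would pick each retry target through the load balancer. Walking the list in order here just keeps the sketch deterministic.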

Do you understand the Dubbo SPI mechanism?

SPI stands for Service Provider Interface, a service discovery mechanism. In essence, the fully qualified names of an interface's implementation classes are configured in a file, and a service loader reads that file and loads the implementation classes. This allows the implementation behind an interface to be swapped dynamically at runtime.

Dubbo implements many of its extensions through SPI. However, it does not use Java's native SPI mechanism. It uses its own enhanced and improved version instead.

SPI has many applications in dubbo, including protocol extension, cluster extension, routing extension, serialization extension and so on.

To use it, add a file under the META-INF/dubbo directory, named after the fully qualified interface name, with entries of the form:

key=com.xxx.value

Dubbo's ExtensionLoader then loads the implementation class for a given key. The advantage is that implementations are loaded on demand, which improves performance.
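The on-demand loading described above can be illustrated with a tiny loader. It parses `key=fully.qualified.ClassName` lines and only instantiates a class the first time its key is requested, caching the instance afterwards. `MiniExtensionLoader`, `Serialization`, `JsonSerialization`, and `HessianSerialization` are all hypothetical. The file content is passed in as a string instead of being read from `META-INF/dubbo/`, so this is not Dubbo's `ExtensionLoader` API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

interface Serialization {
    String name();
}

final class JsonSerialization implements Serialization {
    public String name() { return "json"; }
}

final class HessianSerialization implements Serialization {
    public String name() { return "hessian2"; }
}

// Hypothetical loader: reads key=ClassName lines and instantiates lazily, one instance per key.
final class MiniExtensionLoader<T> {
    private final Class<T> type;
    private final Properties config = new Properties();
    private final Map<String, T> instances = new ConcurrentHashMap<>();
    private final AtomicInteger created = new AtomicInteger(); // implementations instantiated so far

    MiniExtensionLoader(Class<T> type, String fileContent) {
        this.type = type;
        try {
            config.load(new StringReader(fileContent)); // only class names are read here
        } catch (IOException e) {
            throw new IllegalArgumentException(e);
        }
    }

    T getExtension(String key) {
        return instances.computeIfAbsent(key, this::create);
    }

    private T create(String key) {
        String className = config.getProperty(key);
        if (className == null) {
            throw new IllegalArgumentException("No extension named " + key);
        }
        try {
            Object obj = Class.forName(className.trim()).getDeclaredConstructor().newInstance();
            created.incrementAndGet();
            return type.cast(obj);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create " + className, e);
        }
    }

    int createdCount() {
        return created.get();
    }
}
```

Asking only for `"hessian2"` instantiates `HessianSerialization` and never touches `JsonSerialization`. By contrast, iterating Java's `ServiceLoader` instantiates each provider it reaches.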

How would you design an RPC framework?

  1. First, a service registry is needed so that providers can register services and consumers can subscribe to them.
  2. A load-balancing mechanism is needed to decide which provider the consumer calls, along with fault-tolerance and retry mechanisms.
  3. A communication protocol and the tooling to implement it are required. For example, you might communicate over HTTP or RMI and then choose the framework and tools that fit that protocol. Serialization of the transmitted data also has to be considered.
  4. Beyond these basics, monitoring, a configuration management console, and logging are additional improvements to consider.
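As a rough sketch of how the first three elements fit together, here is a single-process toy. It uses a map as the registry, random choice as load balancing, Java serialization as the codec, and a direct method call as the "network". `TinyRpc`, `RpcRequest`, and `HelloService` are hypothetical names, and a real framework would put a socket and a proper codec where `handle` is called.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ThreadLocalRandom;

// Wire format of one call (element 3: serialization).
final class RpcRequest implements Serializable {
    final String methodName;
    final Class<?>[] paramTypes;
    final Object[] args;

    RpcRequest(String methodName, Class<?>[] paramTypes, Object[] args) {
        this.methodName = methodName;
        this.paramTypes = paramTypes;
        this.args = args;
    }
}

final class TinyRpc {
    private final Map<String, List<String>> registry = new ConcurrentHashMap<>(); // element 1
    private final Map<String, Object> servers = new ConcurrentHashMap<>();        // address -> impl
    private final Map<String, Class<?>> serverTypes = new ConcurrentHashMap<>();  // address -> interface

    <T> void export(Class<T> iface, T impl, String address) {
        servers.put(address, impl);
        serverTypes.put(address, iface);
        registry.computeIfAbsent(iface.getName(), k -> new CopyOnWriteArrayList<>()).add(address);
    }

    <T> T refer(Class<T> iface) {
        Object proxy = Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface},
                (p, method, args) -> {
                    List<String> addresses = registry.getOrDefault(iface.getName(), List.of());
                    if (addresses.isEmpty()) {
                        throw new IllegalStateException("No provider for " + iface.getName());
                    }
                    // Element 2: load balancing, here a plain random choice.
                    String address = addresses.get(ThreadLocalRandom.current().nextInt(addresses.size()));
                    byte[] request = serialize(new RpcRequest(method.getName(), method.getParameterTypes(), args));
                    return deserialize(handle(address, request)); // the "transport" is a direct call
                });
        return iface.cast(proxy);
    }

    // Provider side: decode the request, invoke the implementation, encode the result.
    private byte[] handle(String address, byte[] requestBytes) throws Exception {
        RpcRequest req = (RpcRequest) deserialize(requestBytes);
        Method m = serverTypes.get(address).getMethod(req.methodName, req.paramTypes);
        return serialize(m.invoke(servers.get(address), req.args));
    }

    private static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    private static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}

interface HelloService {
    String hello(String name);
}
```

Swapping the random choice for one of the load-balancing strategies above, or wrapping the call in a fault-tolerance strategy, only touches the proxy handler. This separation is the same one Dubbo's layering gives you.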

So, in essence, once you are familiar with one or two RPC frameworks, it is easy to see how you could implement one yourself.


Origin blog.csdn.net/awl910213/article/details/109148810