Dubbo vs mac-rpc performance evaluation: synchronous calls

Dubbo is a high-performance distributed microservice framework for Java, open-sourced by Alibaba. Built on remote method invocation, it exposes and manages the services in a system as remote procedure calls (RPC) and provides supporting service-oriented (SOA) governance, forming a complete distributed microservice framework.

The Dubbo project appears to have started around 2009, but for unknown reasons official maintenance stopped in 2012. Even so, Dubbo remained popular with many third-party companies in China and is widely used in their projects. As a result, official maintenance of Dubbo resumed in 2017, and the latest version is 2.6.1.

mac-rpc is a new open source RPC framework based on Java AIO that is small yet powerful. Its biggest highlight is its asynchronous capability: Dubbo's asynchronous support is relatively weak, whereas mac-rpc is asynchronous by design, which gives it a clear advantage in the performance of asynchronous method calls. (See the official website, www.boarsoft.com, for its implementation principles.)

In most application scenarios, however, an RPC framework is used for synchronous method calls. mac-rpc also performs very well in synchronous calls and can compete with dubbo. This article therefore compares only the synchronous method call performance of the two.

For the evaluation of asynchronous calls, see: "Asynchronous Calls for Dubbo vs mac-rpc Performance Evaluation"

1. Test environment

Hardware: one laptop with an Intel i5-6300HQ CPU @ 2.30GHz and 8G memory

Software: JDK1.7, dubbo 2.6.1, mac-rpc 1.0.1

Note: The laptop automatically lowers its CPU frequency when the CPU temperature gets too high, which causes the test results to fluctuate. The author therefore ran the two frameworks in turn and averaged the results of multiple runs. The resulting data may not be very accurate, but it gives a general idea of the performance level of both.

2. Test scenario

Dubbo has many adjustable parameters, and tuning it is complicated and time-consuming, so this test uses its default parameters as far as possible. Several rounds of testing showed that Dubbo performs best with a fixed-size thread pool. To keep Dubbo from throwing thread pool exhaustion exceptions, we fixed the server-side thread pool size at 600 and used 300 threads on the consumer side to initiate calls in parallel.

In addition, dubbo does not perform well when transferring large objects, and its performance degrades seriously if the amount of data is too large. So the test program only assembles and returns 10 to 10,000 characters, because in practice the data that a single call sends back and forth over the network falls roughly within this range for the vast majority of RPC method calls.

Note: For an RPC framework, transferring only small objects during method calls is acceptable. A microservice framework, however, hosts many kinds of services, and forbidding all of them from returning large objects is rather unsatisfactory. On this point mac-rpc has the advantage.

3. Test points

In production practice we often see that a call takes only a few milliseconds, or even zero milliseconds, when tested on its own, yet under a stress test or in production the delay grows to tens or hundreds of milliseconds, or even several seconds, until the specified response time is exceeded. There are many reasons for this, including disk IO, network IO, thread switching, lock waiting, high CPU load, insufficient memory, frequent GC, and third-party system delay. It is also common to find that physical resource consumption is very low even while system throughput is low and responses are slow.

To simulate the production environment more realistically, we call Thread.sleep inside the service method to stand in for the delay caused by the factors above, and then observe how RPC method calls actually perform under that delay. In addition, under high pressure the size of the input and output data also has a significant impact on performance, so we also test performance under different data volumes.
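The kind of service method described above can be sketched as follows. This is only an illustration, not the author's actual test program: the class name `SimulatedService`, the method `call`, and both parameters are hypothetical, and the code is kept JDK 1.7 compatible to match the test environment.

```java
import java.util.Arrays;

/** Hypothetical test service: sleeps to simulate latency, then returns a payload of a given size. */
class SimulatedService {

    /**
     * @param delayMillis  artificial delay standing in for IO, lock waits, GC pauses, etc.
     * @param payloadChars number of characters to assemble and return
     */
    static String call(long delayMillis, int payloadChars) {
        if (delayMillis > 0) {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and fail the call instead of hiding the interruption.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted during simulated delay", e);
            }
        }
        char[] payload = new char[payloadChars];
        Arrays.fill(payload, 'x');
        return new String(payload);
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        String result = call(20, 1000);
        long elapsedMillis = (System.nanoTime() - start) / 1000000L;
        System.out.println("chars=" + result.length() + ", elapsed=" + elapsedMillis + "ms");
    }
}
```

In a real test the same method body would sit behind the Dubbo or mac-rpc service interface, so that the delay and the payload size can be varied independently.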

Another important factor is the granularity of microservices. The smaller the service granularity, the higher the flexibility, but the greater the overhead of calling each other in the system, and the more complicated the management. Proper granularity is very important to improve the throughput of the system.

3. Test method

The service consumer uses 300 threads to initiate synchronous RPC method calls in parallel, and the service provider uses a fixed thread pool of 600 threads. The tests measure performance with data volumes of 10, 1000, 5000, and 10000 characters under delays of 0ms, 10ms, 20ms, 50ms, and 100ms.
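A consumer-side driver for this kind of test might look roughly like the sketch below. It is an assumption about how such a measurement could be made, not the author's harness: `SyncLoadDriver`, `SyncCall`, and `Result` are hypothetical names, and the `SyncCall` interface merely stands in for a real Dubbo or mac-rpc proxy. Each thread makes a fixed number of blocking calls, and the driver reports TPS and average response time.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical load driver: each thread makes a fixed number of blocking calls. */
class SyncLoadDriver {

    /** Stand-in for a synchronous RPC stub; a real test would invoke the RPC proxy here. */
    interface SyncCall {
        String invoke() throws Exception;
    }

    /** Aggregated measurements for one test run. */
    static final class Result {
        final long completedCalls;
        final double tps;
        final double avgResponseMillis;

        Result(long completedCalls, double tps, double avgResponseMillis) {
            this.completedCalls = completedCalls;
            this.tps = tps;
            this.avgResponseMillis = avgResponseMillis;
        }
    }

    static Result run(final SyncCall call, int threads, final int callsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        final CountDownLatch startGate = new CountDownLatch(1);
        final CountDownLatch done = new CountDownLatch(threads);
        final AtomicLong totalNanos = new AtomicLong();
        final AtomicLong completed = new AtomicLong();

        for (int i = 0; i < threads; i++) {
            pool.execute(new Runnable() {
                @Override
                public void run() {
                    try {
                        startGate.await(); // wait for the start signal so threads begin together
                        for (int n = 0; n < callsPerThread; n++) {
                            long t0 = System.nanoTime();
                            call.invoke(); // synchronous: blocks until the result arrives
                            totalNanos.addAndGet(System.nanoTime() - t0);
                            completed.incrementAndGet();
                        }
                    } catch (Exception e) {
                        e.printStackTrace(); // a failed call ends this thread's run
                    } finally {
                        done.countDown();
                    }
                }
            });
        }

        long wallStart = System.nanoTime();
        startGate.countDown();
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } finally {
            pool.shutdown();
        }
        long wallNanos = System.nanoTime() - wallStart;

        long calls = completed.get();
        double tps = calls / (wallNanos / 1e9);
        double avgMillis = calls == 0 ? 0.0 : totalNanos.get() / (double) calls / 1e6;
        return new Result(calls, tps, avgMillis);
    }

    public static void main(String[] args) {
        Result r = run(new SyncCall() {
            @Override
            public String invoke() throws Exception {
                Thread.sleep(10); // simulated 10ms service delay
                return "ok";
            }
        }, 300, 20);
        System.out.printf("calls=%d tps=%.1f avg=%.2fms%n",
                r.completedCalls, r.tps, r.avgResponseMillis);
    }
}
```

Measuring wall-clock time across all threads gives the TPS figure, while summing per-call times gives the average response time; the two diverge as contention grows, which is what a test like this is meant to expose.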

At the same time, in order to avoid the impact of disk IO and network IO on performance, the test program does not log or write to disk. Both the service consumer and the service provider are on the same computer.

Note: Longer delays and larger data volumes sharply increase the test time. To save time, the number of calls per thread is reduced accordingly, to 5000 or 2000, as the delay and data volume increase.

4. Test results

Note: per thread = number of calls made per thread

5. Resource consumption

In terms of resources, Dubbo's thread pool uses a SynchronousQueue, which means that when the service consumer initiates calls from 300 threads, the service provider also starts 300 threads to execute them. The thread pool of mac-rpc uses a LinkedBlockingQueue by default, so the number of consumer threads has no effect on the number of provider threads. mac-rpc also uses CallerRunsPolicy by default as its strategy for handling insufficient threads, while dubbo uses AbortPolicy. Unless the cached thread pool is used, dubbo throws RejectedExecutionException when there are not enough threads. In terms of both ease of configuration and flexibility, mac-rpc is better.
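The difference between the two queue and rejection strategies can be reproduced with the plain JDK `ThreadPoolExecutor`, as in the sketch below. It does not use either framework's own code: the class `PoolPolicyDemo` and its methods are hypothetical, and the bounded queue capacity is an assumption made so that CallerRunsPolicy actually triggers (with an unbounded LinkedBlockingQueue, tasks would simply queue up).

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo of the two thread pool configurations described in the text. */
class PoolPolicyDemo {

    /** Fixed pool with direct hand-off and AbortPolicy (the behaviour the text attributes to dubbo). */
    static ThreadPoolExecutor handOffPool(int threads) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>(),
                new ThreadPoolExecutor.AbortPolicy());
    }

    /** Fixed pool with a queue and CallerRunsPolicy; the bounded capacity is an assumption. */
    static ThreadPoolExecutor queueingPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Submits sleeping tasks and returns {rejectedCount, ranOnCallerThreadCount}. */
    static int[] submit(ThreadPoolExecutor pool, int tasks, final long sleepMillis) {
        final Thread caller = Thread.currentThread();
        final AtomicInteger onCaller = new AtomicInteger();
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                pool.execute(new Runnable() {
                    @Override
                    public void run() {
                        if (Thread.currentThread() == caller) {
                            onCaller.incrementAndGet(); // CallerRunsPolicy ran the task in the submitter
                        }
                        try {
                            Thread.sleep(sleepMillis);
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++; // AbortPolicy: no idle worker and no queue slot
            }
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {rejected, onCaller.get()};
    }

    public static void main(String[] args) {
        int[] a = submit(handOffPool(2), 4, 200);
        System.out.println("hand-off pool: rejected=" + a[0] + ", ranOnCaller=" + a[1]);
        int[] b = submit(queueingPool(2, 1), 4, 200);
        System.out.println("queueing pool: rejected=" + b[0] + ", ranOnCaller=" + b[1]);
    }
}
```

With two worker threads and four slow tasks, the hand-off pool should reject the tasks that find no idle worker, while the queueing pool should buffer one task and run the overflow on the submitting thread, slowing the caller down instead of failing the call.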

With a fixed thread pool of 600 threads, mac-rpc's CPU usage (about 25%) is slightly higher than dubbo's (about 20%), while its memory usage is significantly lower, holding steady at around 50M compared with 50~200M for dubbo.

Dubbo service provider resource consumption:

mac-rpc service provider resource consumption:

6. Test conclusion

When the delay of the RPC method is small (less than about 15ms) and the amount of data is also small (less than about 2000~3000 bytes), Dubbo has an obvious advantage over mac-rpc. As the delay and data volume increase, the advantages of mac-rpc gradually emerge: its response time and TPS remain relatively stable, while dubbo's performance drops significantly and the gap becomes large. As the pressure continues to increase, dubbo's performance gets worse, while mac-rpc holds up comparatively well.

On the whole, for synchronous method calls mac-rpc shows better performance, better stability, and a better ability to withstand pressure than dubbo. mac-rpc currently has only simple microservice governance capabilities. Because of the author's limited time and the short development period, its functionality and governance methods still need to be improved and strengthened.