Major official announcement: Nacos 2.0 released, with a 10x performance improvement

Author: Schiwon

Since the release of Nacos 1.0, Nacos has been quickly adopted by thousands of companies and has built a strong ecosystem. However, as users adopted it more deeply, some performance problems gradually surfaced. We therefore started designing Nacos 2.0 as a next-generation product. After half a year, the design has been fully implemented, and measured performance has improved by about 10 times, which we believe can meet the performance needs of all users. On behalf of the community, let me introduce this next-generation release.

Introduction to Nacos

Nacos is a dynamic service discovery, configuration management, and service management platform that makes it easier to build cloud-native applications. It was incubated at Alibaba and matured through ten years of Double Eleven peak traffic, which shaped its core strengths: ease of use, stability and reliability, and excellent performance.

Nacos 2.0 architecture

The new 2.0 architecture not only improves performance by about 10 times, but also introduces a layered abstraction of the kernel and a plug-in extension mechanism.

The layered architecture of Nacos 2.0 is shown in the figure below. Compared with Nacos 1.X, the main changes are:

  • The communication layer is unified on the gRPC protocol. Flow control and load balancing between client and server are also improved, raising overall throughput.

  • The storage and consistency models are fully abstracted and layered, making the architecture simpler and clearer, the code more robust, and performance better.

  • Extensible interfaces are provided to improve integration capabilities, for example allowing users to plug in their own security mechanisms.

Nacos 2.0 service discovery: upgraded consistency model

For service discovery under the Nacos 2.0 architecture, the client sends service registration and subscription requests over gRPC. The server uses a Client object to record which services the client has published over that gRPC connection and which services it has subscribed to, and synchronizes the Client across servers. In practice, however, lookups go from service to client, that is, which client instances belong to a given service. The 2.0 server therefore builds indexes and metadata to quickly generate information similar to the Service in 1.X, and pushes the resulting service data through a gRPC stream.
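The client-centric model and the service-to-client index described above can be sketched in a few lines. This is a minimal in-memory illustration, not Nacos's implementation: the names `Client`, `Registry`, `register`, `subscribe`, and `disconnect` are hypothetical, and gRPC transport and cross-server synchronization are left out.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Client:
    """Per-connection record of what one client published and subscribed to."""
    client_id: str
    published: dict = field(default_factory=dict)  # service name -> instance address
    subscribed: set = field(default_factory=set)   # service names


class Registry:
    """Hypothetical server-side store: Clients are the source of truth,
    and two reverse indexes answer 'which clients are under this service'."""

    def __init__(self):
        self.clients = {}                     # client_id -> Client
        self.publishers = defaultdict(set)    # service -> ids of publishing clients
        self.subscribers = defaultdict(set)   # service -> ids of subscribed clients

    def _client(self, client_id):
        if client_id not in self.clients:
            self.clients[client_id] = Client(client_id)
        return self.clients[client_id]

    def register(self, client_id, service, address):
        """Record a published instance; return the subscribers to push to."""
        self._client(client_id).published[service] = address
        self.publishers[service].add(client_id)
        return sorted(self.subscribers[service])

    def subscribe(self, client_id, service):
        """Record a subscription; return the current instance list."""
        self._client(client_id).subscribed.add(service)
        self.subscribers[service].add(client_id)
        return self.instances(service)

    def disconnect(self, client_id):
        """Drop everything tied to a connection; return the affected services."""
        client = self.clients.pop(client_id, None)
        if client is None:
            return set()
        for service in client.published:
            self.publishers[service].discard(client_id)
        for service in client.subscribed:
            self.subscribers[service].discard(client_id)
        return set(client.published)

    def instances(self, service):
        """Service view rebuilt from the index, similar in spirit to 1.X."""
        return sorted(self.clients[c].published[service]
                      for c in self.publishers[service])
```

Because all state hangs off the connection-scoped `Client`, removing a connection cleans up its registrations in one step, and the indexes keep service lookups cheap.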

Nacos 2.0 configuration management: upgraded communication mechanism

Previously, configuration management used HTTP/1.1 Keep-Alive with a 30-second heartbeat to simulate a long-lived connection. The protocol was hard to understand, memory consumption was high, and push performance was weak. Nacos 2.0 uses gRPC to address these problems and greatly reduces memory consumption.

Nacos 2.0 architecture advantages

Nacos 2.0 greatly reduces resource consumption, improves throughput, optimizes the interaction between client and server, and is friendlier to users. Although observability is slightly reduced, the overall cost-effectiveness is very high.

Nacos 2.0 performance improvement

Nacos consists of two major modules, service discovery and configuration management, whose business models differ slightly, so the stress test results for each are presented separately below.

Performance improvement of Nacos 2.0 service discovery

For service discovery scenarios, we focus mainly on the number of clients, the number of service instances, and the number of service subscribers, and on how the server performs during synchronization, push, and steady state at large scale. We also look at how the system behaves when a large number of services go online and offline.

  • Capacity and steady state test

This scenario mainly focuses on system performance as the service scale and client instance scale increase.

As the results show, version 2.0.0 can stably support 100,000 (10W) clients, and CPU consumption is very low once a steady state is reached. During the initial large-scale registration stage, the burst of registrations and pushes causes some push timeouts, but the pushes succeed after retrying and data consistency is not affected.
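The retry-on-timeout behavior mentioned above can be illustrated with a small sketch. The article does not describe Nacos's actual retry policy, so everything here is an assumption made for illustration: the `FlakySubscriber` class, the `push_with_retry` helper, and the fixed limit of three attempts are all hypothetical.

```python
class FlakySubscriber:
    """Hypothetical subscriber whose first `failures` pushes time out."""

    def __init__(self, failures):
        self.failures = failures
        self.received = []

    def push(self, payload):
        if self.failures > 0:
            self.failures -= 1
            raise TimeoutError("push timed out")
        self.received.append(payload)


def push_with_retry(push, payload, max_attempts=3):
    """Call push(payload) until it succeeds and return the attempts used.

    Re-raises TimeoutError if every attempt times out.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            push(payload)
            return attempt
        except TimeoutError:
            if attempt == max_attempts:
                raise
```

In this sketch a timed-out push delivers nothing, so the subscriber ends up with exactly one copy of the payload once a retry succeeds.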

With the 1.X version, by contrast, at 100,000 (10W) and 50,000 (5W) clients the server is stuck in Full GC, pushes fail completely, and the cluster is unavailable. At 20,000 (2W) clients the server runs normally, but heartbeats cannot be processed in time, so a large number of services cycle repeatedly between removal and re-registration; a steady state is never reached and CPU stays high. 1.X can run stably at 12,000 (1.2W) clients, but its steady-state CPU consumption is more than 3 times that of 2.0 at a larger scale.

  • Frequent change test

This scenario mainly focuses on the throughput and failure rate of the different versions under large-scale business releases and frequent service pushes.

During frequent changes, both 2.0 and 1.X remain stable once they reach a steady state. Version 2.0 no longer has an instantaneous push storm, so its push failure rate is zero, whereas in 1.X the instability of UDP push causes a very small number of pushes to time out and be retried.

Performance improvement of Nacos 2.0 configuration management

Since configuration is a write-less, read-more scenario, the bottleneck lies mainly in the number of listening clients a single server can handle and in the push and retrieval of configuration. The stress tests for configuration management therefore focus on single-server connection capacity and on a comparison under heavy push load.

  • Nacos 2.0 connection capacity test

This scenario mainly focuses on the system pressure under different client scales.

Nacos 2.0 can support up to 42,000 (4.2W) configuration client connections on a single machine. During connection establishment, a large number of subscription requests must be processed, so CPU consumption is high; after reaching a steady state, CPU consumption drops to almost nothing.

With Nacos 1.X, by contrast, at 6,000 clients the steady-state CPU stays high and GC is frequent. The main reason is that long polling maintains the connection by holding the request: a Response must be returned every 30 seconds, after which the client re-initiates the connection and request. This requires a lot of context switching, and the server must also hold every Request and Response. At 12,000 (1.2W) clients, 1.X can no longer reach a steady state, so it cannot support that many clients.
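A back-of-the-envelope calculation shows why the 30-second hold cycle is costly even when no configuration changes. The sketch below assumes exactly one held request per client and evenly spread clients; both are simplifying assumptions, and the function name is hypothetical.

```python
def long_poll_cycles_per_second(clients: int, hold_seconds: float = 30.0) -> float:
    """Idle response/re-request cycles per second the server must handle.

    Assumes one held request per client, each returned and re-issued
    every `hold_seconds`, with clients spread evenly over time.
    """
    return clients / hold_seconds


# Client counts taken from the 1.X test results above.
idle_load_6k = long_poll_cycles_per_second(6_000)    # 200.0 cycles per second
idle_load_12k = long_poll_cycles_per_second(12_000)  # 400.0 cycles per second
```

Under these assumptions the server turns over hundreds of requests per second just to keep connections alive, while also holding one pending request per client in memory.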

  • Nacos 2.0 frequent push test

This scenario focuses on system performance under different push scales.

In the frequent-change scenario, both versions were tested with 6,000 client connections. The performance overhead of version 2.0 is clearly much lower than that of version 1.X; in the 3,000 TPS push scenario, the improvement is roughly 3 times.

Nacos 2.0 performance conclusion

For service discovery scenarios, Nacos 2.0 can run stably at a scale of 100,000 (10W) clients; compared with the 12,000 (1.2W) clients of Nacos 1.X, this is an increase of about 10 times.
For configuration management scenarios, a single Nacos 2.0 machine can support up to 42,000 (4.2W) client connections, an increase of 7 times over Nacos 1.X, and push performance is significantly better than in 1.X.
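The capacity ratios can be checked directly from the figures quoted in the test sections. The sketch below reads "W" as 10,000 and takes 6,000 clients as the 1.X configuration baseline, since that is the client count used in the 1.X connection test; the variable names are illustrative.

```python
def improvement(new_capacity: int, old_capacity: int) -> float:
    """Ratio of supported client counts between two versions."""
    return new_capacity / old_capacity


# Service discovery: 100,000 clients (2.0) vs 12,000 clients (1.X).
discovery_ratio = improvement(100_000, 12_000)

# Configuration management: 42,000 connections (2.0) vs 6,000 (1.X).
config_ratio = improvement(42_000, 6_000)
```

The configuration ratio is exactly 7, matching the text. The service discovery ratio works out to a little over 8, which the article rounds to about 10 times.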

Nacos ecosystem and 2.X roadmap

Over three years of development, Nacos has come to support almost all RPC frameworks and the microservice ecosystem, and has led the development of the cloud-native microservice ecosystem.

Nacos is a core component of the entire microservice ecosystem. It can interoperate seamlessly with the K8s service discovery system, communicate with Istio through the MCP/XDS protocol, and deliver Nacos services to the Sidecar; it can also be combined with CoreDNS to expose Nacos services to downstream callers as domain names.

Nacos has been integrated with various microservice RPC frameworks for service discovery; in addition, it can help the high-availability framework Sentinel manage and distribute its various governance rules.

Using only an RPC framework is sometimes not simple enough, because some RPC frameworks, such as gRPC and Thrift, require you to start the server yourself and tell the client which IP to call. In that case they need to be integrated with application frameworks such as SCA (Spring Cloud Alibaba) or Dapr. Alternatively, an Envoy Sidecar can handle traffic control, so that the application-layer RPC does not need to know the service's IP list.

Finally, Nacos can also communicate with various microservice gateways to realize the distribution of the access layer and the invocation of microservices.

The Nacos ecosystem in practice at Alibaba

At present, Nacos has unified its in-house, open source, and commercial versions. Alibaba's internal business domains such as DingTalk, Kaola, Ele.me, and Youku have all adopted the Nacos service in the cloud product MSE, which integrates seamlessly with both Alibaba's and the cloud-native technology stack. Let's take DingTalk as a brief example.

Nacos runs on the microservice engine MSE (a fully managed Nacos cluster), which handles maintenance and multi-cluster management. The business's Dubbo3 or HSF services register with the Nacos cluster through Dubbo3 when they start. Nacos then synchronizes the service information to the Istio and Ingress-Envoy gateways through the MCP protocol.

User traffic enters the group's VPC network from the north and first passes through the unified-access Ingress-Tengine gateway, which can resolve and route domain names to different data centers, units, and so on. This week we also released Tengine 2.3.3 in sync: the kernel was upgraded to Nginx Core 1.18.0, and it supports the Dubbo protocol, DTLSv1 and DTLSv1.2, and the Prometheus format, improving the ecosystem completeness, security, and observability of Alibaba Cloud microservices.

After passing through the unified access layer gateway, the user request is forwarded through the Ingress-Envoy microservice gateway to the corresponding microservice, which is then invoked. If services in other network domains need to be called, the traffic is routed to the corresponding VPC network through the Ingress-Envoy microservice gateway, connecting services across different security domains, network domains, and business domains.

Calls between microservices go through the Envoy Sidecar or through traditional self-subscription by the microservices. The user request is ultimately completed through these mutual invocations, and the result is returned to the user.

Nacos 2.X roadmap

Building on the performance problems solved in 2.0, Nacos 2.X will implement new features through plug-ins and rework a large number of old features, making Nacos more convenient and easier to extend.

Summary


As a next-generation release, Nacos 2.0 resolves the performance problems of Nacos 1.X and improves performance by about 10 times. Abstraction and layering make the architecture simpler, and plug-ins make it easier to extend, so Nacos can support more scenarios and integrate with a broader ecosystem. We believe that, with subsequent iterations, Nacos 2.X will become easier to use, solve more microservice problems, and explore further in the direction of Mesh.

Join us


Everyone is welcome to submit issues and PRs on the Nacos GitHub repository for discussion and contribution, or to join the Nacos community group and take part in community discussions. We would also like to take this opportunity to thank the 200+ contributors to Nacos. Thank you for advancing open source in China!

Beyond open source participation, we also welcome capable and motivated people to join Alibaba Cloud and help build cloud native; see the job posting link for details.


Origin blog.csdn.net/weixin_39860915/article/details/115249855