Design principles of the Nacos registry: make it easy for your applications to achieve efficient service registration and discovery!

Service discovery emerged once applications began to run on, and be accessed from, more than one machine. In today's network architecture each host has its own IP address, so service discovery essentially means obtaining, by some mechanism, the IP addresses on which a service is deployed.

DNS is the earliest protocol for translating a network name into an IP address. In early architectures, DNS+LVS+Nginx was basically enough to handle discovery for all RESTful services, with the service IP list usually configured in Nginx or LVS. Later, RPC services appeared and services went online and offline much more frequently. People began to look for a registry product that could support dynamic registration and deregistration and push IP list changes to consumers.

The Internet software industry generally favors open source products: the code is transparent, anyone can contribute, there is a community for discussion and learning, and, more importantly, the software is free. Individual developers and small and medium-sized companies often make open source products their first choice.

1 Open source products

1.1 Zookeeper

Zookeeper is the classic service registry product, although it was not originally positioned as one. For a long time it was the only option that came to mind in China when an RPC service registry was mentioned, largely because of Dubbo's popularity there.

1.2 Consul and Eureka

Both appeared in 2014:

  • Consul is designed to include many functions needed in distributed service governance, and can support service registration, health check, configuration management, Service Mesh, etc.
  • Eureka became popular along with the concept of microservices, and its deep integration with the Spring Cloud ecosystem has won it a large number of users

1.3 Nacos

Built on Alibaba's experience of running services in production at large scale, Nacos aims to give users a new choice in the service registration and configuration management market.

Figure 1 Service Discovery:

1.4 Advantages of open source products

Developers can read the source code to understand a product's functional and architectural design, and can measure its performance by deploying it locally. Comparison articles covering the various products usually follow.

However, current comparisons of registries often stop at superficial feature lists, without any in-depth discussion of architecture or performance.

1.5 Pain points

The service registry is often a silent supporting product hidden behind the service framework. A good service framework usually supports multiple configuration centers, but the choice of registry is still tightly coupled to the service framework. Typically, a service framework ships with a default service registry. This spares users the trouble of choosing one, but the limitations of a single registry mean that users of several service frameworks must deploy several completely different registries, and how data is coordinated between those registries is also an open question.

This article takes an in-depth look at the design principles of the Nacos registry from several angles and, drawing on our experience and research, tries to summarize the main points that should be followed and considered when designing a service registry product.

2 Data Model

The core data of a registry:

  • service name
  • its corresponding network address

When a service registers multiple instances, we need to filter out unhealthy instances or distribute traffic according to certain characteristics of the instances, so attributes such as health status and weight are stored on each instance. As services grow in scale, it gradually becomes necessary to set permission rules for the whole service and switches that apply to all of its instances, so some attributes are set at the service level. Later, we found that the instances of a single service may need to be divided into multiple subsets. For example, when a service is deployed in multiple data centers, the instances in each data center may need different configuration. For this reason another data level, the cluster, is placed between the service and the instance.

By comparison:

  • Zookeeper has no data model designed for service discovery. Its data is organized as a more abstract tree of key-value pairs, so in theory it can store data with any semantics
  • Eureka and Consul both support extending data at the instance level, which covers most scenarios but cannot meet the needs of large-scale, multi-environment service data storage
  • Nacos uses a three-layer service-cluster-instance model, distilled from years of internal production experience. It basically covers the storage and management of service data in all scenarios

Although the Nacos data model is relatively complex, it does not force you to use everything in it. In most scenarios you can simply ignore these extra attributes, and the model then reduces to the same data model as Eureka and Consul.
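The three-layer model described above can be sketched as follows. This is a minimal illustration only, not the actual Nacos classes: the class names, the `DEFAULT_CLUSTER` constant, and the method names are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

DEFAULT_CLUSTER = "DEFAULT"  # hypothetical default cluster name for this sketch


@dataclass
class Instance:
    ip: str
    port: int
    healthy: bool = True   # instance-level attribute: health status
    weight: float = 1.0    # instance-level attribute: traffic weight


@dataclass
class Cluster:
    name: str
    instances: List[Instance] = field(default_factory=list)


@dataclass
class Service:
    name: str
    # Service-level attributes such as switches or permission rules.
    metadata: Dict[str, str] = field(default_factory=dict)
    clusters: Dict[str, Cluster] = field(default_factory=dict)

    def register(self, instance: Instance, cluster: str = DEFAULT_CLUSTER) -> None:
        # Callers that ignore clusters end up in one default cluster,
        # which reduces the model to plain service -> instances.
        self.clusters.setdefault(cluster, Cluster(cluster)).instances.append(instance)

    def healthy_instances(self, cluster: Optional[str] = None) -> List[Instance]:
        # With no cluster given, look across every cluster of the service.
        if cluster is None:
            selected = list(self.clusters.values())
        else:
            selected = [self.clusters[cluster]] if cluster in self.clusters else []
        return [i for c in selected for i in c.instances if i.healthy]
```

A caller that never passes `cluster` sees only a service and its instances, which is the reduced model mentioned above.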

Data Segregation Model

As a shared service component, the registry must guarantee data isolation and security when it is used by multiple users or business teams, which is very common once the business grows slightly larger. In addition, service registries are often deployed in the cloud, which requires the registry's data model to fit the general model used on the cloud.

Zookeeper, Consul, and Eureka have no clear model for service isolation in their open source versions. From the very beginning, Nacos considered how to let users isolate data along multiple dimensions while also migrating smoothly to the corresponding commercial product on Alibaba Cloud.

Figure 3 The four-layer data logic isolation model of the service:

A user account may correspond to an enterprise or to an individual. This data is generally not passed through to the service registry. A user account can create multiple namespaces, and each namespace corresponds to a client instance. The physical registry cluster behind a namespace can be selected by routing rules, so internal upgrades and migrations of the registry go unnoticed by users, and physical clusters with different service levels can be offered according to the user's tier.

Below the namespace is a two-dimensional service identifier made up of the service group and the service name, which is enough to isolate services at the interface level.
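The isolation layers below the user account can be sketched as a composite lookup key. This is an illustrative sketch under stated assumptions: `ServiceKey`, `Registry`, and `route_namespace` are hypothetical names, and the routing rule is a plain dictionary rather than whatever rule format a real deployment uses.

```python
from typing import Dict, List, NamedTuple


class ServiceKey(NamedTuple):
    namespace: str      # created by a user account; the account itself is not passed in
    group: str          # service group
    service_name: str   # service name


class Registry:
    def __init__(self) -> None:
        self._addresses: Dict[ServiceKey, List[str]] = {}

    def register(self, key: ServiceKey, address: str) -> None:
        self._addresses.setdefault(key, []).append(address)

    def lookup(self, key: ServiceKey) -> List[str]:
        # Only an exact match on namespace, group and name returns data.
        return list(self._addresses.get(key, []))


def route_namespace(namespace: str, rules: Dict[str, str], default_cluster: str) -> str:
    # Hypothetical routing rule: map a namespace to a physical registry cluster,
    # falling back to a default cluster when no rule matches.
    return rules.get(namespace, default_cluster)
```

Two services with the same name stay isolated as long as their namespace or group differs, because the full key never matches.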

Another new feature introduced in Nacos 1.0.0 is the distinction between temporary (ephemeral) instances and persistent instances. The key difference between the two lies in how health checks are performed: temporary instances use client-side reporting, while persistent instances use server-side probing. Temporary instances must be removed automatically once they become unhealthy and do not need to be stored persistently, which makes them a good fit for Gossip-like protocols. Persistent instances are health-checked by server-side probing because the client does not report heartbeats, so offline instances naturally cannot be removed automatically.

Figure 4 Temporary instance and persistent instance:

Medium and large companies typically have both types of services:

  • Some basic components, such as databases and caches, usually cannot report heartbeats. Services of this type need to be registered as persistent instances
  • Upper-level business services, such as microservices or Dubbo services, can add heartbeat-reporting logic on the Provider side, so they can use dynamic service registration
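The two health check modes can be contrasted in a small sketch. It is an assumption-laden illustration, not Nacos code: `HealthChecker`, its method names, and the `probe` callback are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
from typing import Callable, Dict, List


class HealthChecker:
    def __init__(self, heartbeat_timeout: float) -> None:
        self.heartbeat_timeout = heartbeat_timeout
        self.ephemeral: Dict[str, float] = {}   # address -> time of last heartbeat
        self.persistent: Dict[str, bool] = {}   # address -> healthy flag

    def register_ephemeral(self, address: str, now: float) -> None:
        self.ephemeral[address] = now

    def heartbeat(self, address: str, now: float) -> None:
        # Client-side reporting: the instance itself refreshes its timestamp.
        self.ephemeral[address] = now

    def register_persistent(self, address: str) -> None:
        self.persistent[address] = True

    def sweep(self, now: float, probe: Callable[[str], bool]) -> List[str]:
        # Temporary instances: remove those whose heartbeat has expired.
        expired = [a for a, last in self.ephemeral.items()
                   if now - last > self.heartbeat_timeout]
        for address in expired:
            del self.ephemeral[address]
        # Persistent instances: the server probes them and only updates the
        # health flag; the instance is never removed automatically.
        for address in list(self.persistent):
            self.persistent[address] = probe(address)
        return expired
```

In this sketch an unreachable database stays registered and is merely flagged unhealthy, while a business service that stops sending heartbeats disappears from the list.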

Nacos 2.0 keeps the persistent and non-persistent settings, but with some adjustments. In Nacos 1.0, the persistent or non-persistent attribute is stored and identified as part of an instance's metadata, so persistent and non-persistent instances can coexist under the same service. In practice, however, this mode:

  • Causes considerable confusion and complexity for operations staff
  • Is contradictory from a system architecture point of view, since a single service ends up with both persistent and non-persistent instances

As a result, this capability was never widely used. To simplify the Nacos service data model, reduce operational complexity, and improve usability, Nacos 2.0 makes two changes:

  • Whether data is persistent is abstracted to the service level
  • A service can no longer have persistent and non-persistent instances at the same time; each instance inherits the persistence attribute of its service
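The service-level rule described above can be sketched like this. The `Service` class and its `register` signature are hypothetical and exist only to show the inheritance and the rejection of mixed instances; they are not the Nacos 2.0 API.

```python
from typing import Dict, Optional


class Service:
    def __init__(self, name: str, ephemeral: bool = True) -> None:
        self.name = name
        self.ephemeral = ephemeral              # persistence is decided per service
        self.instances: Dict[str, bool] = {}    # address -> inherited ephemeral flag

    def register(self, address: str, ephemeral: Optional[bool] = None) -> None:
        # An instance may omit the flag and inherit it from the service,
        # but it may not contradict the service-level setting.
        if ephemeral is not None and ephemeral != self.ephemeral:
            raise ValueError(
                f"service {self.name!r} is "
                f"{'ephemeral' if self.ephemeral else 'persistent'}; "
                "mixed instance types are not allowed"
            )
        self.instances[address] = self.ephemeral
```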

3 Data Consistency

Data consistency is an eternal topic in distributed systems. At the protocol level, no new members have joined the set of consistency options for a long time. At present, the options basically fall into two categories:

  • Leader-based, non-peer deployments with single-point write consistency
  • Peer-to-peer deployments with multi-write consistency

When choosing a service registration center, no protocol can cover all scenarios, such as:

  • When registered service nodes do not send regular heartbeats to the registry, a strongly consistent protocol seems to be the only option: the data cannot be re-registered through heartbeat compensation, so the first registration must guarantee that no data is lost
  • When clients send regular heartbeats to report their health status, the success rate of the first registration is less critical (it still matters, but a small number of failed writes can be tolerated), because subsequent heartbeats can still compensate for the missing data. In that case the single-point bottleneck of the Paxos protocol is not worth the cost, which is why Eureka uses a custom Renew mechanism instead of Paxos
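The heartbeat compensation idea in the second case can be sketched as follows. This is loosely modeled on the Renew idea described above and is not Eureka's actual API: `Server`, `Client`, and their methods are hypothetical names for this example.

```python
from typing import Dict


class Server:
    def __init__(self) -> None:
        self.leases: Dict[str, str] = {}   # instance id -> address

    def register(self, instance_id: str, address: str) -> None:
        self.leases[instance_id] = address

    def renew(self, instance_id: str) -> bool:
        # Returns False when the server has no record of the instance,
        # for example because the first registration was lost.
        return instance_id in self.leases


class Client:
    def __init__(self, server: Server, instance_id: str, address: str) -> None:
        self.server = server
        self.instance_id = instance_id
        self.address = address

    def heartbeat(self) -> str:
        # A failed renew is compensated by registering again, so a lost
        # first write is repaired on the next heartbeat.
        if not self.server.renew(self.instance_id):
            self.server.register(self.instance_id, self.address)
            return "re-registered"
        return "renewed"
```

Because every heartbeat can repair missing data, the first write does not need a strongly consistent commit in this mode.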

Each of these two kinds of consistency protocol has its own usage scenarios, and different service registration requirements lead to different protocol choices. The way Zookeeper is used in the Dubbo ecosystem would actually be better served by Eureka's Renew mechanism: Dubbo services register with Zookeeper as temporary nodes, must regularly send heartbeats to Zookeeper to renew those nodes, and have the nodes removed by Zookeeper when the service goes offline. Although Zookeeper uses ZAB to guarantee strong data consistency, it lacks data-center disaster recovery capabilities and cannot adapt to some large-scale scenarios.

Nacos 1.0.0 officially supports the coexistence of two consistency protocols, AP and CP, because it needs to support the registration of multiple service types while providing essential capabilities such as data-center disaster recovery and cluster scaling. Version 1.0.0 restructured the read, write, and synchronization logic for data and separated business-related CRUD from the underlying consistency synchronization logic. Business reads and writes (mainly writes, since reads are served directly from the business-layer cache) are abstracted into data types defined by Nacos and passed to the consistency service for synchronization. The choice between CP and AP consistency is made by a proxy that forwards requests according to controllable rules.
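The proxy that forwards writes to either the AP or the CP consistency service can be sketched as below. The class names, the `ephemeral.` key prefix, and the in-memory stores are assumptions made for illustration; the real rule format and the real synchronization logic are not shown in this article.

```python
from typing import Any, Dict

EPHEMERAL_PREFIX = "ephemeral."  # hypothetical key convention used as the routing rule


class InMemoryConsistency:
    """Stand-in for a consistency service; a real one would replicate data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self.store[key] = value

    def get(self, key: str) -> Any:
        return self.store.get(key)


class DelegateConsistency:
    """Proxy that hides the AP/CP choice from the business layer."""

    def __init__(self, ap: InMemoryConsistency, cp: InMemoryConsistency) -> None:
        self.ap = ap
        self.cp = cp

    def _select(self, key: str) -> InMemoryConsistency:
        # Controllable rule: ephemeral data goes to AP, everything else to CP.
        return self.ap if key.startswith(EPHEMERAL_PREFIX) else self.cp

    def put(self, key: str, value: Any) -> None:
        self._select(key).put(key, value)

    def get(self, key: str) -> Any:
        return self._select(key).get(key)
```

The business layer only calls `put` and `get` on the proxy, so the rule can change without touching CRUD code.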

The current consistency protocol implementations are CP consistency based on a simplified Raft and AP consistency based on Distro, a self-developed protocol. Raft needs little introduction: writes go through the Leader, and its CP guarantee is not strict, but it ensures that more than half of the nodes see consistent data and keeps the probability of data loss small. The Distro protocol draws on the internal ConfigServer and the open source Eureka, and achieves basically the same result without relying on third-party storage. Distro's focus is on logic optimization and performance tuning.
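The majority guarantee of a Leader-based write can be illustrated with a few lines. This is a generic quorum sketch, not the simplified Raft used by Nacos; `majority` and `leader_write` are hypothetical helper names.

```python
from typing import List


def majority(cluster_size: int) -> int:
    # Smallest number of nodes that is more than half of the cluster.
    return cluster_size // 2 + 1


def leader_write(follower_acks: List[bool]) -> bool:
    # The leader counts itself as one acknowledgement, then adds the
    # followers that confirmed the write.
    acks = 1 + sum(follower_acks)
    cluster_size = len(follower_acks) + 1
    return acks >= majority(cluster_size)
```

In a three-node cluster one follower acknowledgement is enough, which is why a minority of nodes may briefly lag behind.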

Figure 5 Nacos Consistency Protocol:



Origin blog.csdn.net/qq_33589510/article/details/132656852