Nacos Architecture and Principles - CAP Consistency Protocol (Raft & Distro)



Why Nacos needs a consensus protocol

Nacos aims to minimize deployment and operations costs for its users, so that a single package is all that is needed to start Nacos quickly in either standalone or cluster mode.

Nacos is also a component that must store data, so to achieve this goal it has to implement data storage internally.

  • In standalone mode this is not much of a problem: a simple embedded relational database is enough;
  • In cluster mode, however, Nacos must ensure data consistency and data synchronization across all nodes. Solving this requires introducing a consensus algorithm that keeps the data on every node consistent.



Why Nacos chose Raft and Distro

Why does Nacos run both a CP protocol and an AP protocol in a single cluster? The answer lies in Nacos's use cases: Nacos is a component that combines service registration and discovery with configuration management. The problem of guaranteeing data consistency between the nodes of a cluster therefore has to be split into two aspects.

From the perspective of service registration and discovery

Services learn about each other's instances, and which of those instances can serve requests normally, from the service registration and discovery center. This places high demands on the availability of the registry component: it must keep providing registration and discovery to its clients as far as possible;

At the same time, Nacos's service registration and discovery design uses a heartbeat mechanism that automatically compensates for missing service data: if data is lost, this mechanism can quickly make up for the loss.
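The compensation idea can be illustrated with a small single-node sketch. It is not Nacos code: the class and method names (`EphemeralRegistry`, `beat`, `clientHeartbeat`) and the 15-second lease are hypothetical assumptions. A heartbeat renews an instance's lease; if the server no longer knows the instance (for example after losing its in-memory data), the heartbeat fails and the client simply re-registers, which restores the missing entry.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of heartbeat-based data compensation for ephemeral instances.
// Names and the lease length are illustrative assumptions, not the Nacos API.
final class EphemeralRegistry {
    static final long LEASE_MILLIS = 15_000; // assumed expiry window

    // instanceId -> timestamp (ms) of the last heartbeat
    private final Map<String, Long> lastBeat = new HashMap<>();

    void register(String instanceId, long now) {
        lastBeat.put(instanceId, now);
    }

    /** Renews the lease; returns false if this node has no record of the instance. */
    boolean beat(String instanceId, long now) {
        if (!lastBeat.containsKey(instanceId)) {
            return false; // data was lost here: the client should re-register
        }
        lastBeat.put(instanceId, now);
        return true;
    }

    /** Removes instances whose last heartbeat is older than LEASE_MILLIS. */
    void expire(long now) {
        lastBeat.values().removeIf(t -> now - t > LEASE_MILLIS);
    }

    boolean contains(String instanceId) {
        return lastBeat.containsKey(instanceId);
    }

    /** Simulates a node losing its in-memory data. */
    void clear() {
        lastBeat.clear();
    }

    /** One client-side heartbeat step: re-register if the server reports the instance unknown. */
    static void clientHeartbeat(EphemeralRegistry server, String id, long now) {
        if (!server.beat(id, now)) {
            server.register(id, now);
        }
    }

    public static void main(String[] args) {
        EphemeralRegistry server = new EphemeralRegistry();
        String id = "order-service#10.0.0.1:8080";
        server.register(id, 0);
        server.clear();                         // data lost on the server
        clientHeartbeat(server, id, 5_000);     // the next heartbeat repairs it
        System.out.println(server.contains(id)); // true
        server.expire(30_000);                  // no heartbeat since t=5000
        System.out.println(server.contains(id)); // false
    }
}
```

Because the client is the source of truth for an ephemeral instance, losing the server-side copy is recoverable, which is what makes a weaker consistency model acceptable for this data.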

Therefore, to meet the availability requirements of the registry, a strongly consistent consensus algorithm is not suitable here. Such an algorithm can only serve requests while a majority of the nodes is available: if only half of the nodes in the cluster or fewer are available, the whole algorithm simply "goes on strike". An eventually consistent algorithm, by contrast, better guarantees service availability while still ensuring that the data on all nodes converges within a certain period of time.
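The majority rule behind this "strike" can be made concrete. For a cluster of n nodes, a majority-based (CP) algorithm such as Raft needs floor(n/2) + 1 reachable nodes to make progress. The sketch below only computes that arithmetic; the class and method names are illustrative.

```java
// Illustrative quorum arithmetic for a majority-based (CP) protocol such as Raft.
final class Quorum {
    /** Smallest number of nodes that forms a strict majority of clusterSize. */
    static int majority(int clusterSize) {
        return clusterSize / 2 + 1;
    }

    /** A CP protocol can keep serving writes only while a majority is reachable. */
    static boolean canServe(int clusterSize, int availableNodes) {
        return availableNodes >= majority(clusterSize);
    }

    /** Number of node failures the cluster survives while remaining available. */
    static int toleratedFailures(int clusterSize) {
        return clusterSize - majority(clusterSize);
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 4, 5}) {
            System.out.printf("n=%d majority=%d tolerates=%d%n",
                    n, majority(n), toleratedFailures(n));
        }
        // A 3-node cluster with only one live node has no quorum and stops serving.
        System.out.println(canServe(3, 1)); // false
    }
}
```

Note that a 4-node cluster tolerates no more failures than a 3-node one, which is why odd cluster sizes are usually recommended for Raft-based deployments.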

All of the above applies to non-persistent (ephemeral) services in Nacos service registration and discovery, that is, services whose clients must report heartbeats to renew their instances.

For persistent services, all data is created directly by calling the Nacos server, so Nacos must guarantee strong consistency of that data across nodes. For this type of service data, a strongly consistent consensus algorithm is chosen.
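In code, this split amounts to routing each instance write to a different protocol depending on whether the instance is ephemeral. The sketch below is a hypothetical illustration: the `Protocol` enum, the `Instance` class, and the `route` method are assumptions made for clarity, not Nacos's internal types.

```java
// Hypothetical routing of instance writes to an AP or CP protocol by the ephemeral flag.
final class ProtocolRouter {
    enum Protocol { DISTRO_AP, RAFT_CP }

    static final class Instance {
        final String serviceName;
        final String ip;
        final int port;
        final boolean ephemeral; // true: kept alive by client heartbeats

        Instance(String serviceName, String ip, int port, boolean ephemeral) {
            this.serviceName = serviceName;
            this.ip = ip;
            this.port = port;
            this.ephemeral = ephemeral;
        }
    }

    /** Ephemeral data can be repaired by heartbeats, so availability wins (AP);
     *  persistent data exists only on the server, so consistency wins (CP). */
    static Protocol route(Instance instance) {
        return instance.ephemeral ? Protocol.DISTRO_AP : Protocol.RAFT_CP;
    }

    public static void main(String[] args) {
        Instance web = new Instance("web", "10.0.0.1", 8080, true);
        Instance db = new Instance("mysql", "10.0.0.9", 3306, false);
        System.out.println(route(web)); // DISTRO_AP
        System.out.println(route(db));  // RAFT_CP
    }
}
```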



From the point of view of configuration management

Configuration data is created and managed directly on the Nacos server. A configuration change can be considered successfully saved only once a majority of the nodes have stored it; otherwise the change may be lost. That would be very serious: if an important configuration change is published and the change is then lost, it is likely to cause a serious production incident. Configuration data management therefore requires strong consistency across a majority of the nodes in the cluster, and only a strongly consistent consensus algorithm can be used here.
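In Raft, the "a majority of nodes have saved it" rule is how a leader decides that a log entry, such as a configuration change, is committed: it tracks the highest log index replicated on each node and commits the highest index stored on a majority. The sketch below shows only that calculation under simplified assumptions; real Raft additionally requires the entry to come from the leader's current term, which is omitted here.

```java
import java.util.Arrays;

// Simplified Raft commit rule: the commit index is the highest log index
// replicated on a majority of nodes (leader included).
// Real Raft also requires that entry to be from the leader's current term (omitted).
final class CommitIndex {
    /** matchIndex[i] = highest log index known to be stored on node i. */
    static long commitIndex(long[] matchIndex) {
        long[] sorted = matchIndex.clone();
        Arrays.sort(sorted); // ascending
        int n = sorted.length;
        // At least n/2 + 1 nodes hold an index >= sorted[(n - 1) / 2],
        // and no larger index is held by a majority.
        return sorted[(n - 1) / 2];
    }

    public static void main(String[] args) {
        // The leader has entries up to 7; followers lag behind to different degrees.
        long[] match = {7, 7, 5, 3, 2};
        System.out.println(commitIndex(match)); // 5
    }
}
```

In this example a configuration change written at index 6 is not yet committed: only two of the five nodes hold it, so it must not be reported as saved.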


Why Raft and Distro?

Raft (CP mode)

For strongly consistent consensus, the Raft protocol is the most widely used in industrial production today. Raft is relatively easy to understand, and there are many mature industrial-grade implementations of strong consensus algorithms, such as:

  • Ant Financial's JRaft
  • ZooKeeper's ZAB
  • Consul's Raft
  • Baidu's braft
  • Apache Ratis

Because Nacos is built on the Java stack, the choice was limited to JRaft, ZAB, and Apache Ratis. ZAB is tightly bound to ZooKeeper, and the Nacos team also wanted to be able to communicate with the support team of the Raft library at any time, so JRaft was chosen. Another reason for choosing JRaft is that it supports multiple Raft Groups, which makes it possible for Nacos to shard its data across several groups in the future.
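To see why multiple Raft Groups matter, consider sharding keys across several independent groups, each with its own leader and log, so that writes to different shards do not all contend on a single leader. The sketch below is a hypothetical illustration of that idea, not JRaft's API: the group names and the hash-modulo routing rule are assumptions.

```java
import java.util.List;

// Hypothetical Multi-Raft sharding: each key is owned by exactly one Raft group,
// chosen by hashing the key. Each group would run its own leader election and log.
final class RaftGroupRouter {
    private final List<String> groups;

    RaftGroupRouter(List<String> groups) {
        if (groups.isEmpty()) {
            throw new IllegalArgumentException("at least one group required");
        }
        this.groups = List.copyOf(groups);
    }

    /** Deterministically maps a key to a group; floorMod keeps negative hash codes in range. */
    String groupFor(String key) {
        return groups.get(Math.floorMod(key.hashCode(), groups.size()));
    }

    public static void main(String[] args) {
        RaftGroupRouter router = new RaftGroupRouter(List.of("group-0", "group-1", "group-2"));
        for (String key : List.of("order-service", "user-service", "pay-service")) {
            System.out.println(key + " -> " + router.groupFor(key));
        }
    }
}
```

A real deployment would also need a way to rebalance keys when the number of groups changes; plain modulo hashing moves most keys, which is why consistent hashing or a fixed slot table is often preferred.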


Distro (AP mode)

The Distro protocol is an eventual consistency protocol developed in-house by Alibaba. Many eventual consistency protocols exist, such as Gossip and the data synchronization algorithm in Eureka; Distro combines and optimizes the strengths of the Gossip and Eureka protocols. In the original Gossip, the nodes that receive each message are chosen at random, so messages are inevitably sent to the same node repeatedly, which increases network load and adds processing overhead on the receiving nodes. Distro instead introduces the concept of an authoritative server: each node is responsible for a portion of the data and synchronizes its own data to the other nodes, which effectively reduces message redundancy.
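The authoritative-server idea can be sketched as follows: every node sorts the same member list, hashes each key to pick its responsible node, and only that node pushes the key's data to its peers. The hash-modulo rule and the names (`DistroPartition`, `responsibleNode`, `keysToSync`) are hypothetical assumptions for illustration; Nacos's actual mapping and synchronization logic differ in detail.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of Distro-style "authoritative server" partitioning (hypothetical names).
// Every node computes the same owner for a key, so each key is broadcast by
// exactly one node instead of being gossiped at random.
final class DistroPartition {
    private final List<String> members; // sorted so all nodes agree on the ordering

    DistroPartition(List<String> members) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("cluster has no members");
        }
        this.members = members.stream().sorted().collect(Collectors.toUnmodifiableList());
    }

    String responsibleNode(String key) {
        return members.get(Math.floorMod(key.hashCode(), members.size()));
    }

    boolean isResponsible(String self, String key) {
        return responsibleNode(key).equals(self);
    }

    /** Keys this node owns and must push to every other member. */
    List<String> keysToSync(String self, List<String> localKeys) {
        return localKeys.stream()
                .filter(k -> isResponsible(self, k))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        DistroPartition p = new DistroPartition(
                List.of("10.0.0.3:8848", "10.0.0.1:8848", "10.0.0.2:8848"));
        List<String> keys = List.of("order-service", "user-service", "pay-service", "cart-service");
        int total = 0;
        for (String node : p.members) {
            List<String> owned = p.keysToSync(node, keys);
            total += owned.size();
            System.out.println(node + " owns " + owned);
        }
        // Each key has exactly one owner, so each update is sent by one node only.
        System.out.println(total == keys.size()); // true
    }
}
```

This sketch does not handle membership changes; when the member list changes, ownership of some keys moves and their data must be resynchronized.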


Evolution of the Nacos Consistency Protocol

Early Nacos consensus protocol

First, look at the architecture of early Nacos versions:

  • In the early Nacos architecture, the consistency protocols for service registration and for configuration management were separate, and neither had been sunk into the Nacos kernel module as a general-purpose capability.

  • The consistency protocol implementation in the service discovery module was tightly coupled with the service registration and discovery logic and was full of service-discovery-specific concepts.

  • This made the logic of the service registration and discovery module complex and hard to maintain. It also coupled the module to the data state of the consistency protocol layer, making it difficult to fully separate computing from storage and limiting the computing layer's ability to scale out horizontally.

To solve this problem, the Nacos consistency protocol had to be abstracted and sunk into the kernel as a core module capability, so that the service registration and discovery module acts purely as a computing layer. This also lays the architectural foundation for the configuration module to remove its dependence on external database storage.



The current consistency protocol layer of Nacos

As mentioned above, the current Nacos kernel has fully sunk the consistency protocol into the kernel module as a core Nacos capability, where it serves both the service registration and discovery module and the configuration management module. Let's take a look at the current Nacos architecture.


In the new architecture, the consistency protocol has been moved down from the original service registration and discovery module into the kernel module, which provides a unified abstract interface wherever possible. As a result, the upper-layer service registration and discovery module and configuration management module no longer need to be coupled to any consistency semantics. With this abstraction layer decoupling them, each module can evolve quickly, and both performance and availability are greatly improved.

How Nacos sinks the consistency protocol

Nacos has now sunk both the AP and CP protocols into the kernel module while keeping the user experience as unchanged as possible. So how did Nacos achieve this?


Consensus protocol abstraction

  • A consistency protocol exists to guarantee data consistency, and data can only come into existence through a write;

  • Data must also be readable, and both the read action and the result it returns must be guaranteed by the consistency protocol.

Therefore, the two most basic methods of the consistency protocol are the write action and the read action.


  • Users of the consistency protocol only need its getData and write methods.
  • The consistency protocol is abstracted in the consistency package, which defines the interfaces for both the AP and CP protocols, and the concrete protocol implementations are pluggable. This further decouples the implementation logic of the consensus protocols from the service registration and discovery module and the configuration management module.
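The read/write abstraction can be sketched in Java as follows. This is a simplified, hypothetical shape rather than Nacos's actual signatures: `ReadRequest`, `WriteRequest`, and `Response` are minimal stand-ins, the `APProtocol`/`CPProtocol` marker interfaces only mirror the AP/CP split described above, and `LocalProtocol` is a single-node toy with no replication. The point is that callers depend only on `getData` and `write`, while different implementations can be plugged in behind the same interface.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified, hypothetical read/write consistency-protocol abstraction.
final class ProtocolDemo {
    static final class ReadRequest {
        final String key;
        ReadRequest(String key) { this.key = key; }
    }

    static final class WriteRequest {
        final String key;
        final String value;
        WriteRequest(String key, String value) { this.key = key; this.value = value; }
    }

    static final class Response {
        final boolean success;
        final String data; // null for writes or when the key is absent
        Response(boolean success, String data) { this.success = success; this.data = data; }
    }

    /** The two basic operations every consistency protocol must offer. */
    interface ConsistencyProtocol {
        Response getData(ReadRequest request);
        Response write(WriteRequest request);
    }

    /** Marker interfaces so AP and CP implementations can be plugged in separately. */
    interface APProtocol extends ConsistencyProtocol {}
    interface CPProtocol extends ConsistencyProtocol {}

    /** Toy single-node implementation; a real one would replicate the write first. */
    static final class LocalProtocol implements CPProtocol {
        private final Map<String, String> state = new ConcurrentHashMap<>();

        @Override public Response getData(ReadRequest request) {
            String v = state.get(request.key);
            return new Response(v != null, v);
        }

        @Override public Response write(WriteRequest request) {
            state.put(request.key, request.value);
            return new Response(true, null);
        }
    }

    public static void main(String[] args) {
        ConsistencyProtocol protocol = new LocalProtocol(); // the caller sees only the interface
        protocol.write(new WriteRequest("app.yaml", "timeout: 3s"));
        System.out.println(protocol.getData(new ReadRequest("app.yaml")).data); // timeout: 3s
    }
}
```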


However, abstracting the consistency protocol alone is not enough. If that were all, service registration and discovery and configuration management would still depend on the consistency protocol interface, leaving a stateful interface coupled into both computing modules;

Moreover, even with a fairly high-level protocol abstraction, the service module and configuration module would still have to explicitly handle the protocol's read and write requests in their own code and implement their own storage on top of it, which is undesirable. Service discovery and configuration modules should focus on using and computing on data rather than on how data is stored or kept consistent; data storage and multi-node consistency should be guaranteed by the storage layer.

To further reduce how often the consistency protocol appears in these two modules, and to make it visible only inside the kernel module as far as possible, Nacos took one more step: data storage abstraction.


Data storage abstraction

Since the consistency protocol exists to guarantee data consistency, it can be used to implement a storage layer. The service module and configuration module then depend on a storage interface instead of on the consistency protocol interface.

The implementations that can sit behind a storage interface are much richer than a consistency protocol alone, and the service and configuration modules no longer have to take on the extra coding work (snapshots, state machine implementation, data synchronization) that comes with depending on the consistency protocol directly. This lets both modules focus on their core logic.
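To make that "extra coding work" concrete, here is a hypothetical, simplified state-machine contract that a module would have to implement if it used a Raft-style protocol directly. The interface and its methods (`apply`, `saveSnapshot`, `loadSnapshot`) and the text command format are assumptions loosely modeled on typical Raft libraries, not the JRaft or Nacos API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified state-machine contract required by a Raft-style protocol.
final class StateMachineDemo {
    interface ReplicatedStateMachine {
        void apply(String command);                  // called for each committed log entry
        Map<String, String> saveSnapshot();          // lets the log be compacted
        void loadSnapshot(Map<String, String> snap); // restores state on restart or catch-up
    }

    /** The boilerplate a service module would otherwise have to carry itself. */
    static final class KvStateMachine implements ReplicatedStateMachine {
        private final Map<String, String> state = new HashMap<>();

        @Override public void apply(String command) {
            // Assumed command format: "PUT key value" or "DEL key".
            String[] parts = command.split(" ", 3);
            switch (parts[0]) {
                case "PUT": state.put(parts[1], parts[2]); break;
                case "DEL": state.remove(parts[1]); break;
                default: throw new IllegalArgumentException("bad command: " + command);
            }
        }

        @Override public Map<String, String> saveSnapshot() {
            return new HashMap<>(state); // copy, so later applies do not alter the snapshot
        }

        @Override public void loadSnapshot(Map<String, String> snap) {
            state.clear();
            state.putAll(snap);
        }

        String get(String key) { return state.get(key); }
    }

    public static void main(String[] args) {
        KvStateMachine leader = new KvStateMachine();
        for (String cmd : List.of("PUT a 1", "PUT b 2", "DEL a")) {
            leader.apply(cmd);
        }
        // A new follower catches up from a snapshot instead of replaying the whole log.
        KvStateMachine follower = new KvStateMachine();
        follower.loadSnapshot(leader.saveSnapshot());
        System.out.println(follower.get("b") + " " + follower.get("a")); // 2 null
    }
}
```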

For data abstraction, we take only the service registration and discovery module as an example.


  • Because the storage operations of Nacos's service module are mostly lookups by one or more unique keys, a Key-Value storage interface is the best fit.

  • Once the Key-Value storage interface is defined, what remains is the concrete implementation of this KVStore. A KVStore could be backed directly by Redis or by a database, or it could be built on the consistency protocol of the Nacos kernel module to provide an in-memory or persistent, distributed, strongly (or weakly) consistent KV store.

  • Along these functional boundaries, the Nacos process is further divided into a computing logic layer and a storage logic layer. The two layers interact only through a thin layer of data-operation glue code, so computing logic and storage are fully separated within a single Nacos process.
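A minimal sketch of that storage boundary, assuming a hypothetical `KVStore` interface and key layout (the names are illustrative, not Nacos's): the service module computes against the interface, and whether the data ends up in memory, in a database, or in a Raft/Distro-backed store is decided behind it.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

final class KvStoreDemo {
    /** Hypothetical storage boundary between the computing and storage layers. */
    interface KVStore {
        void put(String key, String value);
        Optional<String> get(String key);
        void delete(String key);
    }

    /** One possible backend; others could wrap Redis, a DB, or the kernel's consistency protocol. */
    static final class InMemoryKVStore implements KVStore {
        private final Map<String, String> data = new ConcurrentHashMap<>();
        @Override public void put(String key, String value) { data.put(key, value); }
        @Override public Optional<String> get(String key) { return Optional.ofNullable(data.get(key)); }
        @Override public void delete(String key) { data.remove(key); }
    }

    /** Computing-layer code: knows the key scheme, not how data is replicated or persisted. */
    static final class ServiceRegistry {
        private final KVStore store;

        ServiceRegistry(KVStore store) { this.store = store; }

        private static String key(String namespace, String service) {
            return namespace + "##" + service; // hypothetical key layout
        }

        void registerService(String namespace, String service, String metadata) {
            store.put(key(namespace, service), metadata);
        }

        Optional<String> findService(String namespace, String service) {
            return store.get(key(namespace, service));
        }

        void deregisterService(String namespace, String service) {
            store.delete(key(namespace, service));
        }
    }

    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry(new InMemoryKVStore());
        registry.registerService("public", "order-service", "{\"owner\":\"team-a\"}");
        System.out.println(registry.findService("public", "order-service").isPresent()); // true
        registry.deregisterService("public", "order-service");
        System.out.println(registry.findService("public", "order-service").isPresent()); // false
    }
}
```

Swapping `InMemoryKVStore` for another implementation changes nothing in `ServiceRegistry`, which is the "thin glue layer" separation described above.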


The storage layer itself also follows a plug-in design. Small and medium-sized companies that are sensitive to operations costs can simply deploy a Nacos cluster using its built-in distributed storage components. If the volume of service instance and configuration data is large and a mature PaaS layer is available, existing storage components can be reused to fully separate Nacos's computing layer from its storage layer.
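The plug-in choice can be sketched as a small factory keyed by a configuration value. The backend names (`embedded`, `external`), the way the name is passed in, and both store classes are hypothetical; the external backend is only a stub standing in for an existing storage service.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical pluggable storage backends selected by name.
final class StoragePlugins {
    interface KVStore {
        void put(String key, String value);
        Optional<String> get(String key);
        String name();
    }

    /** Stand-in for built-in distributed storage (plain memory here). */
    static final class EmbeddedStore implements KVStore {
        private final Map<String, String> data = new ConcurrentHashMap<>();
        @Override public void put(String k, String v) { data.put(k, v); }
        @Override public Optional<String> get(String k) { return Optional.ofNullable(data.get(k)); }
        @Override public String name() { return "embedded"; }
    }

    /** Stub for an external store offered by a PaaS layer; the map pretends to be a remote client. */
    static final class ExternalStore implements KVStore {
        private final Map<String, String> remote = new ConcurrentHashMap<>();
        @Override public void put(String k, String v) { remote.put(k, v); }
        @Override public Optional<String> get(String k) { return Optional.ofNullable(remote.get(k)); }
        @Override public String name() { return "external"; }
    }

    private static final Map<String, Supplier<KVStore>> PLUGINS = Map.of(
            "embedded", EmbeddedStore::new,
            "external", ExternalStore::new);

    /** Picks a backend by name; the computing layer never sees the concrete class. */
    static KVStore load(String backend) {
        Supplier<KVStore> factory = PLUGINS.get(backend);
        if (factory == null) {
            throw new IllegalArgumentException("unknown storage backend: " + backend);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        // A small team keeps the default; a large deployment could pass "external".
        String backend = args.length > 0 ? args[0] : "embedded";
        KVStore store = load(backend);
        store.put("public##order-service", "{}");
        System.out.println(store.name() + " -> " + store.get("public##order-service").orElse("missing"));
    }
}
```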


Origin blog.csdn.net/yangshangwei/article/details/131101178