NetEase interview: How does Eureka achieve AP? Nacos supports both CP and AP, how is that done?

Before we start

In the reader exchange group (50+) of Nien, the 40-year-old architect, some friends have recently obtained interview opportunities at first-tier Internet companies such as NetEase, Weibo, Alibaba, Autohome, Jitu, Youzan, Xiyin, Baidu and Didi, and ran into a few very important interview questions:

  • Is Eureka AP or CP? Talk about its cluster data consistency process?
  • Is Nacos AP or CP? Talk about its cluster data consistency process?

Similar problems that other friends have encountered include:

  • How does Eureka achieve AP? Nacos supports both CP and AP, how is that implemented?

Therefore, Nien will give you a systematic review here, so that you can fully demonstrate your strong "technical muscles" and make the interviewer "drool with admiration".

This question and its reference answer are also included in the V112 version of our " Nien Java Interview Guide PDF ", for later friends to refer to and to help everyone improve their architecture, design, and development skills.

For the PDFs of "Nien Architecture Notes", "Nien High Concurrency Trilogy" and "Nien Java Interview Guide", please go to the official account [Technical Freedom Circle] to obtain

Data consistency issues in the registration center cluster

The service registration center must be highly available, which means that it cannot be a single point, but must be a registration center cluster.

The next question is:

In a microservice registration center cluster, how to ensure that the registration information or metadata information of the microservice Provider is consistent?

First, let’s review the CAP theorem.

CAP theorem

There is an important theory in distributed systems: CAP.

  1. C: Consistency
    In a distributed system, data exists in multiple copies. A failure may cause a write to succeed on some copies and fail on others, leaving the copies inconsistent. Consistency (C) requires that after an update operation succeeds, all copies of the data are the same.
  2. A: Availability
    Whenever the client performs a read or write against the cluster, the request should receive a normal (non-error) response.
  3. P: Partition Tolerance
    When a communication failure splits the cluster into multiple partitions that cannot talk to each other, the cluster should still be able to keep operating.

Is the microservice registration center AP or CP?

Back to the scene of the microservice registration center.

There are many kinds of registry middleware for microservices, such as the traditional distributed coordination component ZooKeeper, the traditional microservice registry Eureka, Alibaba's microservice registry Nacos, and the distributed coordination component etcd, etc.

Is the microservice registration center AP or CP?

The first thing to make clear is that Eureka is an AP-type registry that prioritizes high availability; it is not a CP strong-consistency system, and its data consistency is only weak (eventual).

ZooKeeper is a CP-type registry, which tries to guarantee strong data consistency. To do so, ZooKeeper sacrifices availability (A): for example, while a new leader is being elected, the cluster does not serve requests.

Therefore, Eureka and ZooKeeper sit at two opposite extremes.

Eureka chose A, while ZooKeeper gives priority to C.

Eureka favors availability: service consumers can obtain a service list at any time, but strong data consistency is not guaranteed, so consumers may get a stale service list.

Nacos is compatible and can support both AP mode and CP mode.

Nacos has officially supported both AP and CP consistency protocols since version 1.0.0. The CP consistency protocol is a strong-consistency implementation based on a simplified Raft protocol.

Eureka's data synchronization method

How to copy between multiple copies

First, let’s look at the way data is synchronized, or the way multiple copies are replicated.
Usually, the way data in a distributed system is replicated among multiple copies can be roughly divided into the following two types:

  • master-slave replication

This is the Master-Slave mode. There is one master replica and the rest are slave replicas. All write operations are submitted to the master replica, which then propagates the updates to the slave replicas.

As a result, write pressure is concentrated on the master replica, which becomes the bottleneck of the system, while the slave replicas can share the read load.

  • Peer to peer replication

This is the Peer to Peer mode. There is no master-slave distinction between replicas. Any replica can receive write operations, and then each replica will update data to each other.

Advantage of the Peer to Peer replication mode:

Any replica can accept write requests, so there is no single write bottleneck. However, data conflicts may occur while replicas synchronize with each other.

Eureka adopts the Peer to Peer model.

Eureka's Peer to Peer mode synchronization process

After Eureka Server starts, it will use the local Eureka Client to initiate a request to one of the other Eureka Server nodes to obtain registered service information and copy this information to other peer nodes.

Whenever Eureka Server's own information changes, such as when a microservice client initiates a registration, renewal or logout request to it, it will push the latest information to other Eureka Servers to maintain data synchronization.
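As a concrete illustration, a small Eureka cluster is usually wired up by pointing each server's defaultZone at its peers, so that each node can register with and replicate to the others. A minimal sketch (hostnames and ports here are illustrative, not from any real deployment):

# application.properties of peer1 (peer2 mirrors this and points back at peer1)
eureka.instance.hostname=peer1
eureka.client.register-with-eureka=true
eureka.client.fetch-registry=true
eureka.client.service-url.defaultZone=http://peer2:8762/eureka/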

Circular copy problem

Of course, there's a problem here: the circular replication problem.

Specifically, if a change that one Eureka Server received from another server is synchronized back to the sender again, the data will bounce between servers in an endless replication loop.

When Eureka Server performs a replication operation, it uses an http header named HEADER_REPLICATION to distinguish the replication operation.

If a request carries the HEADER_REPLICATION header, it is not an ordinary request from an application instance's client, but a replication request from another server. When an Eureka Server receives such a replication request, it applies the change locally but does not replicate it any further, which avoids the infinite loop.
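To make the mechanism concrete, here is a minimal sketch of the idea, with made-up names rather than Eureka's real classes: the replication flag (derived from the HEADER_REPLICATION header) decides whether a registration is fanned out to the peers again.

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: InstanceInfo and PeerNode are hypothetical stand-ins, not Eureka types.
class ReplicationSketch {
    private final Map<String, InstanceInfo> registry = new ConcurrentHashMap<>();
    private final List<PeerNode> otherPeers;

    ReplicationSketch(List<PeerNode> otherPeers) { this.otherPeers = otherPeers; }

    // isReplication is derived from the HEADER_REPLICATION http header
    public void register(InstanceInfo info, boolean isReplication) {
        registry.put(info.getId(), info);      // always apply the change locally
        if (!isReplication) {                  // request came from a real application client
            for (PeerNode peer : otherPeers) {
                peer.replicateRegister(info);  // forwarded with HEADER_REPLICATION set
            }
        }
        // isReplication == true: store locally only, never forward again, so no loop
    }

    interface PeerNode { void replicateRegister(InstanceInfo info); }
    interface InstanceInfo { String getId(); }
}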

Another problem is data conflicts.

For example, server A initiates a synchronization request to server B. If A's data is older than B's, then B cannot accept A's data. In this case, how does B know that A's data is old? What should A do at this time?

The newness of data is usually defined by a version number. Eureka uses lastDirtyTimestamp, an attribute similar to a version number, to implement this.

lastDirtyTimestamp is an attribute of the service instance in the registry, which indicates the last change time of this service instance.
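A hedged sketch of that comparison (illustrative only, not Eureka's real method): whichever copy carries the larger lastDirtyTimestamp wins, and the holder of the stale copy is expected to resynchronize.

// Illustrative conflict resolution on lastDirtyTimestamp; "Registration" is a
// hypothetical holder type, not a real Eureka class.
class DirtyTimestampSketch {
    record Registration(String instanceId, long lastDirtyTimestamp) {}

    static Registration resolveConflict(Registration local, Registration incoming) {
        if (incoming.lastDirtyTimestamp() > local.lastDirtyTimestamp()) {
            return incoming;   // the incoming copy is newer: overwrite the local one
        }
        // the incoming copy is older: keep the local data; the response tells the
        // sender that its copy is stale so it can sync back
        return local;
    }
}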

Replication between nodes may go wrong. How to detect and compensate for errors?

In addition, there is an important mechanism in the Eureka cluster: the heartbeat (the renewal operation), which completes the final repair of the data. Since errors may occur during replication between nodes, the heartbeat mechanism allows them to be discovered and fixed.
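In Spring Cloud Netflix, the renewal (heartbeat) rhythm is plain configuration; the values below are the commonly cited defaults and are shown only for illustration.

# client sends a renewal (heartbeat) every 30 seconds by default
eureka.instance.lease-renewal-interval-in-seconds=30
# server evicts the instance if no renewal arrives within 90 seconds by default
eureka.instance.lease-expiration-duration-in-seconds=90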

To summarize, Eureka’s data synchronization method

  • Eureka uses the Peer to Peer mode for data replication.
  • Eureka solves the problem of circular replication through the http header which is HEADER_REPLICATION.
  • Eureka resolves replication conflicts through lastDirtyTimestamp.
  • Eureka implements data repair through the heartbeat mechanism.

Nacos satisfies both AP and CP

Unlike Eureka and Zookeeper clusters, Nacos can support both AP and CP.

Nacos supports CP + AP, which means a Nacos cluster can behave in CP mode or AP mode depending on how clients register, and it works in AP mode by default.

  • If a Nacos client registers with ephemeral=true, the Nacos cluster treats that client node in AP mode, implemented with the Distro protocol;

  • If a Nacos client registers with ephemeral=false, the Nacos cluster treats that node in CP mode, implemented with the Raft protocol.

So AP and CP coexist in the same cluster; which one applies depends on the attribute each client node registers with (a minimal configuration example follows below).

Therefore, Nacos can well meet the business needs of different scenarios.
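With Spring Cloud Alibaba, this per-instance choice is typically a single property on the client; the snippet below is illustrative, so check the documentation of your version for the exact key and default.

# AP (default): temporary instance, kept alive by client heartbeats, synced via Distro
spring.cloud.nacos.discovery.ephemeral=true

# CP: persistent instance, written through the Raft leader
spring.cloud.nacos.discovery.ephemeral=false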

Quickly understand the Distro protocol

The Distro protocol is an AP distributed protocol developed by Nacos itself. It is designed specifically for temporary (ephemeral) instances, to ensure that even if some Nacos nodes go down, registration and discovery of temporary instances keeps working.

As the built-in protocol of a stateful middleware application, Distro ensures that each Nacos node handles a large number of registration requests in a uniformly coordinated way and stores the data consistently.

The synchronization process of the Distro protocol is roughly similar to Eureka's Peer to Peer mode.

The synchronization process of the Distro protocol is roughly as follows:

  • Each node is equally capable of handling write requests and simultaneously synchronizing new data to other nodes.
  • Each node is only responsible for part of the data, and regularly sends the check value of the data it is responsible for to other nodes to maintain data consistency.
  • Each node handles read requests independently and responds locally in a timely manner.

The next few sections will introduce how the Distro protocol works through different scenarios.

Distro node newly added to the cluster scenario

The newly added Distro node will pull all data.

The specific operation is to visit all Distro nodes in sequence and pull the full amount of data by sending requests to other machines.

After completing the full pull operation, each Nacos machine maintains all currently registered non-persistent instance data.

Heartbeat scenario

After the Distro cluster is started, heartbeats will be sent regularly between each machine.

Heartbeat information mainly includes meta-information of all data on each machine (meta-information is used to ensure that the amount of data transmission in the network is maintained at a low level). This data verification is performed in the form of heartbeat, that is, each machine initiates a data verification request to other machines at a fixed time interval.

If during the data verification process, a machine finds that the data on other machines is inconsistent with local data, it will initiate a full pull request to complete the data.
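A minimal sketch of that verification loop, with made-up names rather than Nacos' actual Distro classes: the sender ships only checksums of the data it owns, and a receiver that spots a mismatch pulls the full records back.

import java.util.Map;

// Sketch only: checksumOfLocalCopy(), fetchFullRecord() and storeLocally() are
// hypothetical helpers, not Nacos APIs.
class DistroVerifySketch {
    void onVerify(String senderAddress, Map<String, String> remoteChecksums) {
        for (Map.Entry<String, String> e : remoteChecksums.entrySet()) {
            String key = e.getKey();
            if (!e.getValue().equals(checksumOfLocalCopy(key))) {
                // the local replica is out of date: pull the full data from its owner
                storeLocally(key, fetchFullRecord(senderAddress, key));
            }
        }
    }

    String checksumOfLocalCopy(String key) { return ""; }                            // placeholder
    Object fetchFullRecord(String senderAddress, String key) { return new Object(); } // placeholder
    void storeLocally(String key, Object data) {}                                    // placeholder
}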

Write operation scenario

For a Distro cluster that is already running, when a client's write request to register a non-persistent instance hits a Nacos server, the Distro cluster processes it as follows.

The whole flow includes the following steps:

  • A front-placed Filter intercepts the request, calculates which Distro node is responsible for it based on the IP and port information contained in the request, and forwards the request to that responsible node (a sketch of this calculation follows the list).
  • The Controller on the responsible node parses the write request.
  • The Distro protocol regularly performs Sync tasks to synchronize all instance information that the local machine is responsible for to other nodes.
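The responsible-node calculation in the first step can be pictured roughly like this; it is modelled loosely on Nacos' DistroMapper but simplified, and the distro key is assumed to be derived from the request (for example the service name).

import java.util.List;

// Simplified sketch of "which node owns this request" routing; not the real implementation.
class DistroRoutingSketch {
    static String responsibleServer(String distroKey, List<String> healthyServers) {
        int hash = Math.abs(distroKey.hashCode() % Integer.MAX_VALUE);
        return healthyServers.get(hash % healthyServers.size());
    }

    static boolean isResponsible(String distroKey, List<String> healthyServers, String self) {
        // if this returns false, the Filter forwards the write request to the responsible server
        return self.equals(responsibleServer(distroKey, healthyServers));
    }
}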

Read operation scenario

Since the full amount of data is stored on each machine, in each read operation, the Distro machine will directly obtain the data locally to achieve fast response.

This mechanism ensures that the Distro protocol, as an AP protocol, can respond to read operations promptly.

  • Under network partition conditions, all read operations can still return results normally;
  • When the network recovers, each Distro node will merge and restore each data fragment.

To sum up, Distro’s data synchronization

The Distro protocol is a consistency protocol developed by Nacos for temporary instance data.

Data is kept in an in-memory cache; a full data synchronization is performed at startup, and data verification runs periodically.

Following the design concept of the Distro protocol, each Distro node can receive read and write requests. The request scenarios of the Distro protocol are mainly divided into the following three situations:

  1. When the node receives a write request belonging to the instance for which the node is responsible, it writes directly.
  2. When the node receives a write request for an instance it is not responsible for, the request is routed within the cluster and forwarded to the responsible node, which completes the write.
  3. When the node receives any read request, it is queried directly on the local machine and returned (because all instances are synchronized to each machine).

As Nacos' built-in temporary instance consistency protocol, the Distro protocol ensures that in a distributed environment, the service information status on each node can be notified to other nodes in a timely manner, supporting the storage and consistency maintenance of hundreds of thousands of service instances.

Quickly understand the Raft protocol

Nacos has officially supported the AP and CP consistency protocols since version 1.0.0, and the CP consistency protocol is implemented on the basis of a simplified Raft.

Raft is a protocol for managing the consistency of a replicated log. Compared with the Paxos protocol, Raft is easier to understand and implement.

To make it easier to understand, Raft decomposes the consensus problem into several parts, including leader election, log replication and safety, and enforces a stronger degree of coherency to reduce the number of states that must be considered.

Compared with Paxos, the Raft algorithm is more intuitive to understand.

The Raft algorithm divides the Server into three states, or it can also be called roles:

  • Leader : responsible for interacting with clients and for log replication. There is at most one leader in the system at any time.
  • Follower : passively responds to RPCs and never actively initiates them.
  • Candidate : a temporary role that only exists during the leader election phase. A node that wants to become the leader initiates a vote request and becomes a candidate; if the election succeeds it becomes the leader, otherwise it falls back to follower.

The transitions between these states (roles) are: a follower becomes a candidate when its election timeout fires; a candidate becomes the leader if it wins the vote, or falls back to follower if another leader emerges; a leader steps down to follower if it observes a higher term.

In Raft, the problem is broken down into: leader selection, log replication, security, and member changes.

Replication of the state machine is achieved by replicating the log:

Log: Each machine saves a log. The log comes from the client's request and contains a series of commands.

State Machine: The state machine executes these commands in sequence.

Consistency model: In a distributed environment, ensure that the logs of multiple machines are consistent, so that the state during state machine playback is consistent.

The leader election process of the Raft algorithm

Raft uses a heartbeat mechanism to initiate leader election. When the server starts, the server becomes a follower. The follower will remain in follower status as long as it receives valid RPCs from the leader or candidate. If the follower does not receive a message within a period of time (this period is called the election timeout), it will assume that there is currently no available leader, and then start the process of electing a new leader.

1. Term

The concept of a term is analogous to dynastic changes in Chinese history. The Raft algorithm divides time into terms of arbitrary length.

Terms are numbered with consecutive integers. Each term begins with an election, in which one or more candidates attempt to become leader. If a candidate wins the election, it serves as leader for the rest of that term. In some cases the votes may be split so that no leader is elected; the term then ends and a new term with a new election starts immediately. The Raft algorithm guarantees that there is at most one leader in a given term.

2. RPC

Communication between server nodes in the Raft algorithm uses remote procedure calls (RPCs), and the basic consistency algorithm only requires two types of RPCs. A third RPC is added to transfer snapshots between servers.

There are three types of RPC (rough message shapes are sketched after this list):

  • RequestVote RPC : initiated by a candidate during an election
  • AppendEntries RPC : initiated by the leader; it serves as the heartbeat, and log replication is also carried in this command
  • InstallSnapshot RPC : The leader uses this RPC to send snapshots to followers that are too far behind.
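Rough message shapes for the two core RPCs are sketched below; the field names come from the Raft paper, not from Nacos or any other specific implementation, and the snapshot RPC is omitted.

import java.util.List;

// Illustrative shapes only, with field names taken from the Raft paper.
class RaftRpcShapes {
    static class LogEntry { long term; String command; }

    static class RequestVoteRequest {
        long term;            // candidate's term
        String candidateId;   // candidate requesting the vote
        long lastLogIndex;    // index of the candidate's last log entry
        long lastLogTerm;     // term of the candidate's last log entry
    }

    static class AppendEntriesRequest {
        long term;                // leader's term
        String leaderId;
        long prevLogIndex;        // log entry immediately preceding the new ones
        long prevLogTerm;
        List<LogEntry> entries;   // empty for a pure heartbeat
        long leaderCommit;        // leader's commit index
    }
}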
3. Election process

(1) The follower increments its current term and transitions to candidate.

(2) The candidate votes for itself and sends RequestVote RPCs to the other servers in the cluster.

(3) A server that receives RequestVote votes for at most one candidate per term, on a first-come, first-served basis, and only for a candidate whose log is at least as up-to-date as its own.

(Figures omitted: the initial follower nodes; Node1 becomes a candidate and initiates an election; the other nodes vote to confirm the election; Node1 becomes the leader and starts sending heartbeats.)

The candidate stays in state (2) until one of the following three things happens:

  • The node wins the election, that is, it receives votes from a majority of nodes, and transitions to the leader state.
  • Another server becomes the leader, that is, the candidate receives a legitimate heartbeat (with a term value greater than or equal to its own current term), and falls back to the follower state.
  • After a period of time no winner has emerged, and a new round of election starts.

To avoid repeated split votes in which no leader can be determined, Raft uses randomized election timeouts.
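A tiny sketch of that randomization (the 150 to 300 ms range is the example range from the Raft paper, not Nacos' actual setting): each follower picks a fresh random timeout, so it is unlikely that two nodes time out and split the vote again and again.

import java.util.concurrent.ThreadLocalRandom;

// Each follower re-rolls its election timeout; whoever fires first starts the next election.
class ElectionTimeoutSketch {
    static long nextElectionTimeoutMs() {
        return ThreadLocalRandom.current().nextLong(150, 301);  // 150..300 ms, randomized
    }
}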

4. Log replication

The main purpose of log replication (Log Replication) is to keep the nodes consistent; the operations performed in this phase all serve consistency and high availability.

Once a Leader has been elected, it is responsible for handling client requests: all transaction (update) requests must go through the Leader first. Log replication is what guarantees that every node executes the same sequence of operations.

In Raft, when the Leader receives a client log entry (a transaction request), it first appends the entry to its local log and then synchronizes it to the Followers through the heartbeat (AppendEntries). Each Follower records the entry and sends an ACK back to the Leader. Once the Leader has received ACKs from a majority (n/2 + 1) of nodes, it marks the entry as committed, appends it to its local disk, and notifies the client; in the next heartbeat, the Leader tells all Followers to commit the entry and store it on their own local disks.
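The "commit on majority" rule in that flow can be sketched as follows; markCommitted() and notifyFollowersToCommit() are hypothetical helpers, and this is illustrative logic rather than Nacos' RaftCore.

class MajorityCommitSketch {
    // ackCount already includes the leader's own local append
    static boolean tryCommit(long logIndex, int ackCount, int clusterSize) {
        int majority = clusterSize / 2 + 1;
        if (ackCount >= majority) {
            markCommitted(logIndex);            // persist locally and answer the client
            notifyFollowersToCommit(logIndex);  // piggy-backed on the next heartbeat
            return true;
        }
        return false;                           // keep waiting for more follower ACKs
    }

    static void markCommitted(long logIndex) {}            // placeholder
    static void notifyFollowersToCommit(long logIndex) {}  // placeholder
}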

How to implement the Raft algorithm

When the Nacos server starts, it calls the RaftCore.init() method through the RunningConfig.onApplicationEvent() method.

start election
public static void init() throws Exception {

    Loggers.RAFT.info("initializing Raft sub-system");

    // Start the Notifier, which polls Datums and notifies the RaftListeners
    executor.submit(notifier);

    // Get the Raft cluster nodes and update the PeerSet
    peers.add(NamingProxy.getServers());

    long start = System.currentTimeMillis();

    // Load Datum and term data from disk for data recovery
    RaftStore.load();

    Loggers.RAFT.info("cache loaded, peer count: {}, datum count: {}, current term: {}",
        peers.size(), datums.size(), peers.getTerm());

    // Wait until the notifier has drained its task queue
    while (true) {
        if (notifier.tasks.size() <= 0) {
            break;
        }
        Thread.sleep(1000L);
        System.out.println(notifier.tasks.size());
    }

    Loggers.RAFT.info("finish to load data from disk, cost: {} ms.", (System.currentTimeMillis() - start));

    GlobalExecutor.register(new MasterElection());  // Leader election
    GlobalExecutor.register(new HeartBeat());       // Raft heartbeat
    GlobalExecutor.register(new AddressServerUpdater(), GlobalExecutor.ADDRESS_SERVER_UPDATE_INTERVAL_MS);

    if (peers.size() > 0) {
        if (lock.tryLock(INIT_LOCK_TIME_SECONDS, TimeUnit.SECONDS)) {
            initialized = true;
            lock.unlock();
        }
    } else {
        throw new Exception("peers is empty.");
    }

    Loggers.RAFT.info("timer started: leader timeout ms: {}, heart-beat timeout ms: {}",
        GlobalExecutor.LEADER_TIMEOUT_MS, GlobalExecutor.HEARTBEAT_INTERVAL_MS);
}

The init method mainly does the following things:

  1. Get the Raft cluster nodes: peers.add(NamingProxy.getServers());
  2. Recover the Raft cluster data: RaftStore.load();
  3. Raft leader election: GlobalExecutor.register(new MasterElection());
  4. Raft heartbeat: GlobalExecutor.register(new HeartBeat());
  5. Raft publishes content
  6. Raft ensures content consistency

Election process

The nodes inside the Raft cluster communicate with each other through RESTful interfaces, and the code is in RaftController. The RaftController controller handles the communication between the nodes of the Raft cluster. The endpoints are as follows:

  • POST http://{ip:port}/v1/ns/raft/vote : initiate a vote request
  • POST http://{ip:port}/v1/ns/raft/beat : the Leader sends heartbeat information to the Followers
  • GET http://{ip:port}/v1/ns/raft/peer : get this node's RaftPeer information
  • PUT http://{ip:port}/v1/ns/raft/datum/reload : reload a given datum (log) entry
  • POST http://{ip:port}/v1/ns/raft/datum : the Leader receives incoming data and stores it
  • DELETE http://{ip:port}/v1/ns/raft/datum : the Leader receives a data deletion operation
  • GET http://{ip:port}/v1/ns/raft/datum : get the data stored on this node
  • GET http://{ip:port}/v1/ns/raft/state : get this node's state information {UP or DOWN}
  • POST http://{ip:port}/v1/ns/raft/datum/commit : a Follower receives data from the Leader and stores it
  • DELETE http://{ip:port}/v1/ns/raft/datum : a Follower receives a data deletion operation from the Leader
  • GET http://{ip:port}/v1/ns/raft/leader : get the Leader node information of the current cluster
  • GET http://{ip:port}/v1/ns/raft/listeners : get all the event listeners of the current Raft cluster
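For a quick look, the read-only endpoints can be queried directly on a running node; the commands below assume a local standalone Nacos 1.x server with the default /nacos context path.

curl http://127.0.0.1:8848/nacos/v1/ns/raft/leader   # the current Leader of the cluster
curl http://127.0.0.1:8848/nacos/v1/ns/raft/peer     # this node's RaftPeer information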
RaftPeerSet
Heartbeat mechanism

Raft uses a heartbeat mechanism to trigger leader election.

The heartbeat timing task is registered in GlobalExecutor through GlobalExecutor.register(new HeartBeat()). The specific operations include:

  • Reset the heartbeat timeout and election timeout of the Leader node;
  • sendBeat() sends heartbeat packet
public class HeartBeat implements Runnable {

    @Override
    public void run() {
        try {
            if (!peers.isReady()) {
                return;
            }

            // Count down the heartbeat timer of the local node
            RaftPeer local = peers.local();
            local.heartbeatDueMs -= GlobalExecutor.TICK_PERIOD_MS;
            if (local.heartbeatDueMs > 0) {
                return;
            }

            // Time is up: reset the timer and broadcast a heartbeat
            local.resetHeartbeatDue();

            sendBeat();
        } catch (Exception e) {
            Loggers.RAFT.warn("[RAFT] error while sending beat {}", e);
        }
    }
}

This article briefly explains the implementation of Raft consistency in Nacos. For a more detailed process, you can download the source code and view RaftCore to learn more.

The source code can be checked out at the following address:

git clone https://github.com/alibaba/nacos.git

A few words at the end

Interview questions related to data consistency in the registration center are very common interview questions.

If you can answer the above content fluently and thoroughly, the interviewer will basically be impressed and attracted by you.

Before the interview, it is recommended that you systematically review the 5,000-page " Nien Java Interview Guide PDF ". If you have any questions during your review, you can come and talk to Nien, the 40-year-old architect.

In the end, the interviewer will like you so much that they "can't help drooling", and the offer will be on its way.

Recommended reading

" Ten billions of visits, how to design a cache architecture "

" Multi-level cache architecture design "

" Message Push Architecture Design "

" Alibaba 2nd interview: How many nodes do you deploy? How do you deploy for 1000W concurrency?"

" Meituan 2nd interview: Five-nines (99.999%) high availability, how is it achieved?"

" NetEase interview: 2000W TPS on a single node, how does Kafka do it?"

" ByteDance interview: What is the relationship between transaction compensation and transaction retry?"

" NetEase interview: 25W QPS high-throughput writes to MySQL, 100W rows written in 4 seconds, how is it achieved?"

" How to structure billion-level short videos? "

" Blow up, rely on "bragging" to get through JD.com, monthly salary 40K "

" It's so fierce, I rely on "bragging" to get through SF Express, and my monthly salary is 30K "

" It exploded...Jingdong asked for 40 questions on one side, and after passing it, it was 500,000+ "

" I'm so tired of asking questions... Ali asked 27 questions while asking for his life, and after passing it, it's 600,000+ "

" After 3 hours of crazy asking on Baidu, I got an offer from a big company. This guy is so cruel!"

" Ele.me is too cruel: Face an advanced Java, how hard and cruel work it is "

" After an hour of crazy asking by Byte, the guy got the offer, it's so cruel!"

" Accept Didi Offer: From three experiences as a young man, see what you need to learn?"

"Nien Architecture Notes", "Nien High Concurrency Trilogy", "Nien Java Interview Guide" PDF, please go to the following official account [Technical Freedom Circle] to get ↓↓↓
