Straight to the point: let's talk about consistency in distributed systems!

Author | bdseeker

Source | Road to BigData (ID: bigdata3186)

Header image | CSDN, via Visual China


Preface

After I published the previous article, "CAP", I went back over it several times and found that the last section, on the relationships among CAP, ACID, BASE, and "BACP" (my own coinage), was not expressed rigorously. Unfortunately, a published article cannot be modified, and many readers have messaged me privately to ask about the details. So here is an update, starting with the picture!


Some readers also told me that CAP is a bit of a niche topic and suggested I switch to something more mainstream. I admit that everyone has a preferred technology stack, but personally I believe these big data fundamentals matter a great deal. Once I finish this series on big data fundamentals, I will gradually move on to specific technology stacks, so don't worry! The road to big data is long; we will walk it step by step. As I say in every article, I believe persistence always pays off.

Without further ado, let's get to the topic. Many readers know two or three kinds of consistency, such as strong consistency and eventual consistency. But do you think that's all there is? This article walks through the consistency models in detail. Corrections are welcome!

Strong consistency

Although the consistency family has many subdivisions, there are really only two major categories. One of them is the strong consistency introduced in the previous article, "CAP", which specifically includes strict consistency (also called atomic consistency or linear consistency) and sequential consistency.

1. Strict (atomic/linear) consistency

Strict consistency means that once data is updated, all client reads and writes are based on the updated data. As shown in the figure below, assume each piece of data has three replicas, placed on three nodes. When Client1 sets the value of X to 1, strict consistency requires that, once Client1's update completes, all clients read and write based on the latest value: Client10 reads x=1, and Client100's update x+=1 is applied on top of x=1. At the next moment, whichever replica Client1000 reads from, the value of X will be 2.

At this point everything seems perfect. But think about it: what is the subtext behind strict consistency?

Synchronous data replication

Strict consistency implies that all data is replicated synchronously during a write: the write is considered successful only if every replica is written successfully. Isn't HDFS the best example? And it is atomic: a write either succeeds or fails, with no intermediate state. That is why it is called atomic consistency.
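As a rough illustration, here is a minimal sketch of the all-or-nothing idea in Python. The class names and in-process "replicas" are my own illustrative stand-ins, not HDFS's actual write pipeline:

```python
# Sketch of synchronous replication: a write is acknowledged only after
# every replica has applied it, so a subsequent read from ANY replica
# sees the latest value. Illustrative only; real systems do this via RPC.

class Replica:
    def __init__(self):
        self.store = {}

    def apply(self, key, value):
        self.store[key] = value

    def read(self, key):
        return self.store.get(key)

class SyncReplicatedStore:
    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]

    def write(self, key, value):
        # All-or-nothing: every replica must apply the write before we
        # report success (a real system would also handle replica failure).
        for r in self.replicas:
            r.apply(key, value)
        return True

    def read(self, key, replica_index=0):
        return self.replicas[replica_index].read(key)
```

After `write("x", 1)` returns, reading from any of the three replicas yields the same value, which is exactly the guarantee described above.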

Strict consistency does not consider the client

Many articles online try to analyze consistency separately from the client side and the server side; but when we analyzed CAP in the previous article, we said we should leave the client out of it. So who is right and who is wrong?

Look again: when we reason about the consistency of a distributed system, what do we actually care about? Is it the times and order at which multiple clients send read and write requests to the backend? No. What we really care about is the order in which the server completes each request. In the example above, Client1000 reads x=2 only because Client100's write has completed; if Client100's write had not yet completed, strong consistency would require Client1000 to read X as 1, not 2. So what matters is when an operation completes, not when it is initiated. For consistency purposes, the client's perspective is not meaningful. Of course, idempotence of concurrent operations from multiple clients must still be guaranteed.

From another angle, the client is Party A, the one with the consistency requirement, right? The distributed system, as Party B, can only satisfy or reject Party A's requirement; it cannot ask Party A to change!

Based on a strict global clock

We noted above that the order in which operations complete is crucial. Look closely at the example: every behavior is measured along the time dimension. The order is: Client1 updates x=1 -> Client10 reads x=1 / Client100 applies x+=1 on top of x=1 -> Client1000 reads x=2. Each operation happens on the basis of the previous operation's completion. A distributed service therefore needs a reference time against which to order operations. You may ask: don't the machines run NTP for time calibration? The problem is that NTP cannot guarantee that every machine's clock is exactly the same.

For example, suppose the node holding replica D2 runs a few seconds ahead of the D1 node's clock. Client1's update completes (taking 500 ms), and then Client10's request executes and completes. If machine time is used as the reference, Client10's read appears to precede Client1's update, which clearly violates strong consistency.

How do we generally provide a global clock? Here are three common approaches, briefly.

Hybrid Logical Clock (HLC)

A hybrid logical clock combines the node's local physical time and a logical counter with the physical time carried in messages from other nodes. Both Kudu and CockroachDB use this approach. Although HLC incorporates physical time, it still relies heavily on the machine's NTP and is not an exact clock in the strict sense. HLC therefore needs a bound on clock error: Kudu, for example, defines a maximum clock error. If local NTP is not running, Kudu fails directly at startup; if the error exceeds the maximum clock error, it still reports an error. In other words, once the bound set for HLC is exceeded, HLC can no longer work correctly.

Looking at the implementation logic of HLC, there are quite a few steps; the logical component serves as an intermediate or fallback value during time comparison. Since this is not the focus of this article, I won't elaborate here. Interested readers can read the paper: https://cse.buffalo.edu/tech-reports/2014-04.pdf.
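To make the idea concrete, here is a simplified HLC update rule in Python, following the structure described in the paper linked above. The method names and the injected `physical_now` parameter are my own simplifications (so the example is deterministic), not Kudu's or CockroachDB's actual code:

```python
# Simplified hybrid logical clock: the state is a pair (l, c), where l is
# the highest physical time seen and c is a logical counter breaking ties
# within the same physical tick. Timestamps compare as tuples.

class HLC:
    def __init__(self):
        self.l = 0
        self.c = 0

    def send_or_local(self, physical_now):
        # On a local or send event: absorb local physical time, bump the
        # counter only if physical time did not advance.
        old_l = self.l
        self.l = max(self.l, physical_now)
        self.c = self.c + 1 if self.l == old_l else 0
        return (self.l, self.c)

    def receive(self, physical_now, msg_l, msg_c):
        # On receive: take the max of local l, message l, and physical time,
        # then update the counter depending on which source "won".
        old_l = self.l
        self.l = max(self.l, msg_l, physical_now)
        if self.l == old_l and self.l == msg_l:
            self.c = max(self.c, msg_c) + 1
        elif self.l == old_l:
            self.c += 1
        elif self.l == msg_l:
            self.c = msg_c + 1
        else:
            self.c = 0
        return (self.l, self.c)
```

Even if the receiver's physical clock lags (as in the D1/D2 skew example earlier), its HLC timestamp still lands after the sender's, which is the property the skewed-NTP scenario breaks.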

TrueTime

The previous article also mentioned that Google relies on its deep pockets to reduce the probability of network partitions. Facing the global clock problem, Google's Spanner uses atomic clocks plus GPS, a pure hardware approach, to calibrate the cluster's machines, with accuracy at the millisecond level. Let ε denote the clock uncertainty, so the true time lies within the range [t-ε, t+ε]. Returning to the example above, Client10's machine clock is then at most 2ε ahead of or behind Client1's. Spanner therefore introduces a commit-wait scheme: simply put, after an operation completes, wait a short while, until the uncertainty interval has passed, and the global order naturally falls into place. Google keeps the uncertainty to a few milliseconds; for a global, cross-region distributed system like Spanner, waiting a few extra milliseconds is not a big deal.

Unfortunately, Google has not open-sourced this hardware solution, and its applicability is limited, so for most of us it only whets the appetite.
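A toy sketch of the commit-wait idea, with an assumed uncertainty bound `EPSILON` standing in for TrueTime's interval API (Spanner's real implementation is hardware-backed and far more involved):

```python
import time

# Commit wait, sketched: after applying a write, do not acknowledge it
# until the clock-uncertainty interval around the commit timestamp has
# fully elapsed. Then no later transaction anywhere can be assigned an
# earlier timestamp. EPSILON is an assumed bound, not a TrueTime value.

EPSILON = 0.005  # 5 ms, illustrative

def commit(apply_write, epsilon=EPSILON):
    apply_write()
    t_commit = time.monotonic()
    # Wait out the uncertainty window before acknowledging.
    while time.monotonic() < t_commit + epsilon:
        time.sleep(epsilon / 10)
    return t_commit
```

The cost is exactly the "wait a few ms" mentioned above: every commit's acknowledgement is delayed by roughly the uncertainty bound.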

Time service center: Timestamp Oracle (TSO)

There is a "time service center" in real life, too. It is apparently in Shaanxi; you can look up the exact location. Its role is to provide an accurate reference time for China's infrastructure and systems, avoiding drift. (My own speculation: in wartime, we cannot let all our infrastructure be paralyzed just because another country tampers with the reference time, so a national time service center matters a great deal.)

Distributed services have a similar solution. Take TiDB as an example: to calibrate time, TiDB adopts the TSO scheme, in which the timestamps for all events are allocated uniformly by the PD node. Although this generates very frequent calls, according to TiDB's official documentation the network overhead within the same IDC is very low, only 0.x ms. If you face a cross-IDC network, you can try co-locating PD nodes with TiDB nodes (TiKV still needs independent deployment for storage-compute separation), which removes the network overhead. Of course, if the clients themselves are spread across IDCs, there is still no good answer.
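The essence of a TSO can be sketched as a single thread-safe counter. This toy version only illustrates the idea of centrally allocated, strictly increasing timestamps; it is not PD's actual implementation (PD batches allocations and persists a high-water mark, among other things):

```python
import threading

# Minimal timestamp oracle: one central allocator hands out strictly
# increasing timestamps, so event ordering never depends on any machine's
# wall clock. The price is an RPC to this allocator per timestamp.

class TimestampOracle:
    def __init__(self):
        self._ts = 0
        self._lock = threading.Lock()

    def next_ts(self):
        with self._lock:
            self._ts += 1
            return self._ts
```

Every caller that orders its operations by these timestamps agrees with every other caller, which is exactly what the per-machine clocks above could not guarantee.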

2. Sequential consistency

Above we covered strict (linear/atomic) consistency. Absolute global order under a global clock is hard to achieve: HLC is complicated to implement, Google's atomic clock + GPS is not open source, and TSO adds system complexity. Implementing a global clock is just that difficult!

Can we take a step back, abandon the ordered counter called "time", and construct an easier-to-maintain counter? One that does not guarantee absolute global order, but makes the distributed service relatively ordered globally?

As shown in the figure below, D1 performs the updates x=1 then x=2, and D3 performs a=1 then a=2. When a client reads from node D2, sequential consistency requires that the relative order of each node's operations is the same everywhere: x=1 must come before x=2, and a=1 before a=2. The figure shows one of the orderings permitted by sequential consistency.

Logic Clock

Logic Clock: if the name is unfamiliar, perhaps its other name, Lamport Timestamp, rings a bell. Still nothing? Then surely you know Paxos. (If you work in big data and don't know Paxos, you really need to brush up on the basics.) Lamport Timestamps and Paxos were in fact created by the same person, Lamport. Paxos is a cornerstone of distributed consensus algorithms, so you can guess that the logical clock is no slouch either.

As shown in the figure above, in the Logic Clock algorithm each machine keeps an internal timestamp; take A and B, both starting at 0. Whenever A or B executes an event, its own timestamp is incremented by 1. When A sends a message to B, it attaches its timestamp, e.g. <message, timestamp>; B then compares the timestamp in the message with its local timestamp and takes the maximum, max(local timestamp, message timestamp), to update its local timestamp. This constructs a new counter that achieves global relative order: locally, timestamp += 1; on receive, take the max. Operations are then ordered even if the nodes' clocks differ. The drawback is equally obvious: the new counter no longer corresponds to real time.

How consistent is ZooKeeper?

We will not cover the consensus algorithm Paxos here (stay tuned for follow-up articles); let's talk directly about ZooKeeper's consistency. Much material online claims that zk is eventually consistent. Sorry, zk is strongly consistent, specifically the sequential consistency within the strong category. Why shouldn't we call zk eventually consistent?

It's like scoring 99 on an exam but claiming you scored 60. It isn't just about the number: calling zk eventually consistent downgrades it from the strong tier to the weak one.

You can also look at the first picture in this article: zk is a CP system. Working backwards from CAP, if zk were eventually consistent it would be an AP system; but zk is actually unavailable during leader election, i.e. A is not achieved, and we have a contradiction.

ZooKeeper's Zxid is in fact a home-grown counter in the spirit of the logical clock: based on Zxid, all nodes can agree on the same order of operations, and, as noted above, Zxid cannot be mapped to real time. Much material online says ZooKeeper writes are linearizable; I disagree. Its two-phase commit (to be discussed in a later article) has no rollback but does resynchronize automatically, which is linear in a sense. But if a network partition occurs during the commit phase, the data will not be synchronized to the partitioned node; if another client then reads from that node, it reads stale data. A failed write does not mean every node failed to write, and a successful write does not mean every node wrote successfully. So I think zk only qualifies as sequentially consistent, not linearly consistent.

A quick example

If sequential consistency still feels abstract, think of the distributed system as WeChat Moments. I post a Moment; Jay Chou sees it and comments, and then Wang Leehom comments. Sequential consistency means that when our mutual friend JJ Lin views the Moment, he must see Jay Chou's comment before Wang Leehom's. When our friend Kris Wu views it, he may see only Jay Chou's comment, with Wang Leehom's arriving late, but never missing, and he will never see Wang Leehom's comment appear before Jay Chou's.

Weak consistency

The other major category in the consistency family is weak consistency. Compared with strong consistency, weak consistency allows data to be temporarily inconsistent in exchange for availability.

Why is consistency divided into strong and weak?

Per the CAP theory from the previous article, consistency is the C in CAP. In practice, you will find that more and more distributed systems value availability (A) over consistency (C). High availability requires the system to respond to clients' read and write requests within a bounded time and return results rather than errors.

Facing different scenarios, the various AP systems can only do their best within weak consistency, and so many weak consistency models have emerged. Let's analyze them one by one.

Eventual consistency

As mentioned in the earlier discussion of BASE, eventual consistency does not guarantee that the same piece of data is identical on every node at every moment; rather, over time, the same data on different nodes always converges toward consistency. The period during which the data is inconsistent is called the inconsistency window. Simply put, some time after a write, the data on all nodes eventually reaches a consistent state. As shown below.
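A toy anti-entropy round sketches how replicas converge and how the inconsistency window closes. The (value, version) scheme and class names here are my own illustration, not any particular system's protocol:

```python
# Eventual consistency via a toy anti-entropy pass: replicas keep a
# (value, version) pair and adopt the newest version they hear about.
# Between the write and the sync round, stale reads are possible: that
# gap is the "inconsistency window".

class GossipReplica:
    def __init__(self):
        self.value = None
        self.version = 0

    def write(self, value, version):
        if version > self.version:
            self.value, self.version = value, version

def anti_entropy(replicas):
    # One full sync round: everyone adopts the newest state seen anywhere.
    newest = max(replicas, key=lambda r: r.version)
    for r in replicas:
        r.write(newest.value, newest.version)
```

Before `anti_entropy` runs, a read from a lagging replica returns the old value; afterwards, all replicas agree.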

Causal consistency

Causal consistency emphasizes a cause-effect relationship between operations on the data. Initially X has the value 1. When D1 updates X to 2, D1 then communicates with node D2, passing the message <D1, D2, X, 2>; all subsequent reads and writes on D2 are performed on the basis of the new value. Meanwhile, node D3 may still read the old value 1 of X within the inconsistency window. D1 updating the data and notifying D2 is the cause; D2 accepting the notification and updating its local state is the effect. That is causal consistency.
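The cause-and-effect chain above can be sketched like this (the node and method names are illustrative):

```python
# Causal consistency sketch: an update plus a notification message is the
# "cause"; a notified node applying the new value is the "effect". A node
# that received no notification (D3) may legally serve the old value
# inside the inconsistency window.

class CausalNode:
    def __init__(self, name, x=1):
        self.name = name
        self.x = x

    def update(self, value, notify):
        self.x = value
        for peer in notify:        # send <src, dst, key, value> to peers
            peer.on_message(value)

    def on_message(self, value):
        self.x = value             # effect: apply the causally related write
```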

Read-your-writes consistency

What is read-your-writes consistency? As the name implies, you always read the data you yourself wrote. Causal consistency defines an update-notification mechanism between cluster nodes; read-your-writes consistency uses the same idea, except that here the notification is from a node to itself: a node's write affects all of that node's subsequent reads and writes. So read-your-writes consistency is really a special case of causal consistency. Since the principle is similar, see the figure below.

Session consistency

Session consistency is scoped to a session and is usually applied in database-like systems. Each session is an independent access connection; within one session many read and write operations can be performed, and sessions are independent of one another.

As shown in the figure below, session consistency requires that, as long as session 1 still exists, read-your-writes consistency holds within that session. If D1's session 1 is terminated and session 2 is established, then even for a session on the same node within the inconsistency window, the data is not guaranteed to be consistent. Session consistency is therefore a special case of read-your-writes consistency. (Feeling dizzy? One special case after another. Don't worry; at the end we will lay out the relationships among all the consistency models. Hang in there!)

Monotonic consistency

Monotonic consistency comes in two flavors, reads and writes: monotonic read consistency and monotonic write consistency. Both are easy to understand. Monotonic reads guarantee that reads across the whole system are ordered: assuming the value of X only increases, a read of X=2 must come after a read of X=1. Monotonic writes guarantee that writes are ordered: system-wide, the write X=2 must happen after the write X=1. Since this one is straightforward, I'll skip the figure (drawing figures is exhausting!). Personally, I think monotonic write consistency should be the baseline for most distributed services; if write order cannot be guaranteed, just thinking about the consequences is painful.
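One common way to get monotonic reads is to enforce them on the client side: remember the highest version already observed and refuse to read a replica that would travel backwards in time. A sketch with illustrative names and version numbers:

```python
# Monotonic-read guard, sketched client-side: track the highest version
# seen so far; a read from a replica with a lower version would violate
# monotonic reads, so reject it (a real client would retry elsewhere).

class MonotonicReader:
    def __init__(self):
        self.last_seen = 0

    def read(self, replica_version, replica_value):
        if replica_version < self.last_seen:
            raise ValueError("stale replica: would violate monotonic reads")
        self.last_seen = replica_version
        return replica_value
```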


The consistency relationships, and common misunderstandings

The consistency relationships

As shown in the figure below, one picture clarifies the relationships among the consistency models. If you have forgotten any of the concepts by now, revisit the analysis above~

Misunderstanding 1: Strong consistency = linear consistency?

Plenty of material online equates strong consistency with linear consistency, or places sequential consistency at the same level as strong consistency for comparison. Both are wrong. As we saw above, strong consistency includes linear consistency and sequential consistency. So strong consistency is not necessarily linear consistency, but linear consistency is necessarily strong consistency.

Misunderstanding 2: Consistency is not black and white

There is no hard wall between strong and weak consistency: a distributed system may satisfy one or several models at the same time. Take ZooKeeper again, which satisfies both sequential consistency and eventual consistency. The more consistency models a system satisfies, the more scenarios it covers, and of course the higher its complexity. From this we can see that no distributed system fits all scenarios; if one claims to, its data consistency logic must be extremely complicated.



Origin: blog.csdn.net/csdnnews/article/details/108633603