Consistency Models, Illustrated

Introduction: This article relies on many diagrams and contains no difficult formulas. It aims to explain clearly what problem consistency models solve, and then covers three of them: sequential consistency, linear consistency, and causal consistency.

Overview

What problem does it solve?

To keep a distributed system available, data must be stored with a certain degree of redundancy: a write is considered successful only once the data has been stored on multiple servers. As for how many redundant copies to keep, there are the Majority and Quorum approaches; see the earlier article: Weekly (No. 17): Read-Write Quorum System and its practice in Raft.

Storing the same data on multiple machines to provide redundancy is also known as a replica strategy. This approach brings the following benefits:

  • Fault tolerance: even if several machines in the distributed system fail, the system can still serve external requests as usual.
  • Improved throughput: since the same data is stored on multiple machines, requests for that data (at least read requests) can be spread across the replicas, so the whole system can scale out linearly by adding machines as the request volume grows.

At the same time, the replica strategy has problems of its own to solve, the most important of which is consistency: does data written on one machine look the same to the other machines in the system?

Obviously, even when everything is working properly, after one machine successfully writes data it takes time to broadcast the change to the other machines, and more time before those machines see the result of the change. In other words, during this time lag the data may be transiently inconsistent.

Because this time lag objectively exists, there is in a sense no absolute data consistency. In other words, data consistency is always defined within a strict scope, and the stricter the consistency, the higher the cost.

To solve the consistency problem, a consistency model must first be defined. Wikipedia defines a consistency model as follows:

In computer science, a consistency model specifies a contract between the programmer and a system, wherein the system guarantees that if the programmer follows the rules for operations on memory, memory will be consistent and the results of reading, writing, or updating memory will be predictable.

Let's use a common everyday scenario to explain consistency models:

wechat

In the image above:

  • Think of WeChat Moments as a large distributed system:

    • This distributed system provides write operations (posting to Moments) and read operations (reading Moments).
    • More than one machine must be storing the Moments data, and together these machines make up this large distributed system.
    • Different users do not necessarily write to the same machine when posting to Moments. The same holds for reads: when reading Moments, clients do not necessarily read from the same machine.
  • This Moments distributed system has two kinds of clients: clients that post to Moments write data, and clients that read Moments read data. Of course, in many cases the same client both reads and writes.

The next question is:

  • Can everyone reading Moments see globally consistent data? That is, does everyone see Moments posts arranged in the same order?

Obviously not: even when looking at the comments and replies under the same Moments post, different people often see them in different orders. This leads to the next question:

  • If different people see Moments (including comments) in different orders, what rules should these orders follow to be reasonable?

Which ordering rules are reasonable is exactly the question a consistency model answers.

Consistency Model Legend

This article intends to use various legends to explain the consistency model, so before continuing to explain, it is necessary to explain the various elements in the legend first. The following figure is an example:

sample

In the image above, there are the following elements:

  • On the far left are the process numbers P1 and P2 in the distributed system.

  • The horizontal axis is the time axis, and the time increases from left to right, but there is no strict time scale here.

  • Events that occur in a process are named "process number_event number within the process", for example P1_1. In addition:

    • An event has a start time and an end time. A rectangle represents the execution of an event, so the width of the rectangle along the time axis is the execution duration of the event.

    • Events may overlap in execution time, such as P1_1 and P2_1 in the figure. Such overlapping events are called concurrent events. Either of two concurrent events may be considered to come first; this is discussed later.

    • Use different colors to distinguish events occurring on different processes.

    • Each event is associated with an operation. To simplify the problem, there are currently only read and write operations:

      • w(x) = A: Writes A to the variable x.
      • r(x) = A: reads A from variable x.
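The elements of the legend can also be written down as a small data structure. The following Python sketch is only an illustration: the Event class, its field names, the concurrent helper, and the timestamps are assumptions introduced here, not something taken from the original figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    process: int   # process number, e.g. 1 for P1
    index: int     # event number within the process, e.g. 1 for P1_1
    op: str        # "w" for a write, "r" for a read
    var: str       # variable name, e.g. "x"
    value: str     # value written or read
    start: float   # left edge of the rectangle on the time axis
    end: float     # right edge of the rectangle on the time axis

    @property
    def name(self) -> str:
        # Naming rule from the legend: process number, underscore, event number.
        return f"P{self.process}_{self.index}"


def concurrent(a: Event, b: Event) -> bool:
    """Two events are concurrent when their execution intervals overlap."""
    return a.start < b.end and b.start < a.end


# Hypothetical timestamps, chosen so that P1_1 and P2_1 overlap.
p1_1 = Event(1, 1, "w", "x", "A", start=0.0, end=2.0)
p2_1 = Event(2, 1, "w", "x", "B", start=1.0, end=3.0)
p1_2 = Event(1, 2, "r", "x", "B", start=4.0, end=5.0)

print(p1_1.name, concurrent(p1_1, p2_1))  # P1_1 True
print(p1_2.name, concurrent(p1_1, p1_2))  # P1_2 False
```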

Event order within a single process

Let's return to the Moments example. Many people comment under a single Moments post, and these comments can be viewed as two-dimensional data:

  • Process (that is, who made the comment) is one dimension.
  • Time is another dimension, the order in which these comments appear.

However, from the perspective of someone reading these comments, the process dimension has to be removed from this two-dimensional data: the events are flattened onto a single dimension, the timeline of the reading process. For the example above, it looks like this:

wechat-2

In the figure above, from the perspective of the reading process P3, the events of the two writing processes need to be flattened onto P3's own timeline. As the figure shows, the flattened events can be arranged in several ways.

Arranging the events of multiple writing processes on the timeline of a single process is a permutation problem. If the writing processes produce n events in total, there are n! permutations of those events. For example, the events a, b, and c have the following permutations:

{(a,b,c),(a,c,b),(b,a,c),(b,c,a),(c,a,b),(c,b,a)}
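As a quick check of the n! count, the permutations of the three events can be enumerated with Python's standard library. This is a minimal sketch; the variable names are chosen here for illustration.

```python
from itertools import permutations
from math import factorial

events = ["a", "b", "c"]

# All possible ways to place the events on a single timeline.
arrangements = list(permutations(events))
for arrangement in arrangements:
    print(arrangement)

# The number of arrangements is n! for n events.
print(len(arrangements) == factorial(len(events)))  # True
```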

A consistency model answers exactly this: among all these possible arrangements of events, and given the required strictness of consistency, which are acceptable and which must never occur?

As we will see later, the looser the consistency model, the more event arrangements it admits; conversely, the stricter the model, the fewer it admits.

Consistency models

This article discusses the following three consistency models: linear consistency (more commonly called linearizability), sequential consistency, and causal consistency. They are listed from strictest to weakest: linear consistency is the strictest and causal consistency is the weakest. There are other consistency models, but they are beyond the scope of this article.

Sequential consistency

The definition of sequential consistency first appeared in the paper "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs", which describes a sequentially consistent system as one in which:

the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.

The paper states two requirements:

Requirement R1: Each processor issues memory requests in the order specified by its program.

Requirement R2: Memory requests from all processors issued to an individual memory module are serviced from a single FIFO queue. Issuing a memory request consists of entering the request on this queue.

Let's look at condition 1 first:

  • Condition 1: The execution order of each process must be consistent with the program execution order of the process.

As mentioned earlier, when the reading process reads the events of multiple processes, it is in effect flattening events that span the time and process dimensions onto its own single timeline. Condition 1 requires that after this flattening, the events issued by each process still execute in an order consistent with that process's program order.

The following figure gives a counter-example that violates this condition:

program-order

In the picture above:

  • From the perspective of process P1: the program order is to execute P1_1 first and then P1_2.
  • But after the events are rearranged from the perspective of P3, P1_2 appears before P1_1. This is not allowed, because it violates the program order of the original process P1.
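Condition 1 is easy to check mechanically. The sketch below is an illustration only: it assumes each event is represented as a hypothetical (process, event number) pair, so ("P1", 2) stands for P1_2, and the function name is made up for this example.

```python
def respects_program_order(arrangement):
    """Condition 1: within each process, event numbers must appear in increasing order."""
    last_seen = {}  # process -> highest event number seen so far
    for process, index in arrangement:
        if index <= last_seen.get(process, 0):
            return False  # an earlier event of this process shows up too late
        last_seen[process] = index
    return True


# P1_1 before P1_2: program order of P1 is preserved.
ok = [("P1", 1), ("P2", 1), ("P1", 2)]
# P1_2 before P1_1: the situation in the counter-example.
bad = [("P1", 2), ("P2", 1), ("P1", 1)]

print(respects_program_order(ok))   # True
print(respects_program_order(bad))  # False
```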

However, condition 1 alone is not enough to satisfy sequential consistency, so there is condition 2:

  • Condition 2: Reads and writes of a variable should behave like a first-in-first-out (FIFO) queue, i.e. every read returns the most recently written value.

FIFO

Let's take an example to fully illustrate sequential consistency:

seq-model

In the figure above, there are three processes that read and write the variable x:

  • Process P1: event P1_1 modifies x to A, and event P1_2 modifies x to B.
  • Process P2: Event P2_1 modifies x to C.
  • Process P3: event P3_1 reads x as A, event P3_2 reads x as C, and event P3_3 reads x as B.

Note: In the above figure, events P1_2 and P2_1 overlap, which means that these two events are "concurrent events"; that is, either of them may be considered to occur and complete first.

The lower half of the figure shows three possible permutations of events:

  • The first arrangement:

    • Replacing each event with its operation, the execution sequence is: {w(x)=A, r(x)=A, w(x)=B, r(x)=B, w(x)=C, r(x)=C}.
    • This sequence violates neither condition 1 (the program order within each process is not disturbed) nor condition 2 (every read returns the most recently written value).
  • The second arrangement:

    • Replacing each event with its operation, the execution sequence is: {w(x)=A, r(x)=A, w(x)=B, r(x)=C, w(x)=C, r(x)=B}.

    • Since P1_2 and P2_1 are concurrent events, their order can be chosen arbitrarily. Here P1_2 is executed first, and you can see:

      • w(x)=B, r(x)=C, which violates condition 2.
  • The third arrangement:

    • Replacing each event with its operation, the execution sequence is: {w(x)=A, r(x)=A, w(x)=C, r(x)=B, w(x)=B, r(x)=C}.

    • This time P2_1 is executed first, and you can see:

      • w(x)=C, r(x)=B and w(x)=B, r(x)=C: condition 2 is violated.
      • P3_3 executes before P3_2, violating condition 1.
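Condition 2 can be checked by replaying a sequence and remembering the last value written to each variable. The sketch below is an illustration under assumptions: operations are written as hypothetical (op, variable, value) tuples and the function name is made up here. The three sequences are the operation sequences listed for the three arrangements.

```python
def reads_see_latest_write(sequence):
    """Condition 2: every read returns the most recently written value of its variable."""
    latest = {}  # variable -> last value written
    for op, var, value in sequence:
        if op == "w":
            latest[var] = value
        elif latest.get(var) != value:
            return False  # the read saw something other than the latest write
    return True


first = [("w", "x", "A"), ("r", "x", "A"), ("w", "x", "B"),
         ("r", "x", "B"), ("w", "x", "C"), ("r", "x", "C")]
second = [("w", "x", "A"), ("r", "x", "A"), ("w", "x", "B"),
          ("r", "x", "C"), ("w", "x", "C"), ("r", "x", "B")]
third = [("w", "x", "A"), ("r", "x", "A"), ("w", "x", "C"),
         ("r", "x", "B"), ("w", "x", "B"), ("r", "x", "C")]

print(reads_see_latest_write(first))   # True
print(reads_see_latest_write(second))  # False
print(reads_see_latest_write(third))   # False
```

This only covers condition 2; the program-order requirement of condition 1 has to be checked separately.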

The above is the explanation of sequential consistency, which requires two conditions:

  • The program order of a single process cannot be disturbed: the order of events within the same process must be preserved.
  • Every read returns the latest written value, and once an arrangement is chosen, all processes in the system must observe that same arrangement.

Note that as long as these two conditions are met, there is no other hard requirement on the order of events from different processes, so an arrangement that appears to contradict the order in which events actually occurred can still be valid. In fact, this is already reflected in the figure above:

  • Event P3_1 obviously occurs later than events P1_2 and P2_1, but as long as it is the first read event immediately following P1_1 after reordering, sequential consistency is not violated. Under this premise, P3_1 can even appear before P1_2 and P2_1, which seems very counterintuitive.

Another example is the following picture:

seq-model-2

In the above figure, the three events are deliberately drawn separately, which means that the three events do not overlap, that is, there is a clear sequence, but from the perspective of the sequential consistency model:

  • Both {P1_1,P2_1,P3_1} and {P1_1,P3_1,P2_1} are valid because neither violates conditions 1 and 2.
  • Only the bottom arrangement {P3_1,P2_1,P1_1} is invalid, because condition 2 is violated.

It can be seen that once conditions 1 and 2 are satisfied, sequential consistency places no hard requirement on the order of events from different processes. Even if intuition says one event should come earlier, an arrangement that does not violate these two conditions satisfies the sequential consistency model.

Hence there is the stricter linear consistency, which builds on the conditions of sequential consistency and places stricter requirements on the order of events.

Linear consistency

Linear consistency requires that the conditions of sequential consistency be met first, and adds one more condition, which may be called condition 3:

  • Condition 3: If events of different processes do not overlap in time, that is, they are not concurrent events, then the arrangement must preserve the order in which they actually occurred.

With this stronger condition, only {P1_1,P2_1,P3_1} in the figure above satisfies linear consistency.
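The effect of condition 3 on three non-overlapping events can be sketched in a few lines of Python. The time intervals and the function name below are assumptions made for illustration; the only property taken from the figure is that the three events do not overlap and occur in the order P1_1, P2_1, P3_1.

```python
from itertools import permutations

# Hypothetical (start, end) intervals for three events that do not overlap.
intervals = {"P1_1": (0, 1), "P2_1": (2, 3), "P3_1": (4, 5)}


def respects_real_time_order(arrangement, intervals):
    """Condition 3: if one event ended before another started, it must be placed first."""
    for i, later in enumerate(arrangement):
        for earlier in arrangement[:i]:
            # `earlier` is placed before `later`; reject the arrangement if
            # `later` had in fact already ended before `earlier` started.
            if intervals[later][1] < intervals[earlier][0]:
                return False
    return True


allowed = [p for p in permutations(intervals)
           if respects_real_time_order(p, intervals)]
print(allowed)  # [('P1_1', 'P2_1', 'P3_1')]
```

Of the six permutations, only the one that follows the actual order of occurrence survives, which matches the statement above.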

Another example to illustrate linearizability:

image.png

This is the first diagram to explain the sequential consistency model, but under the condition of linear consistency, no permutation that can satisfy the condition can be found.

This is because:

  • Events P2_1 and P1_2 both follow event P1_1 and this order needs to be maintained.
  • And event P3_1 is after events P2_1 and P1_2, this order also needs to be maintained.
  • If the previous two orderings are kept, then when P3_1 executes it cannot read A; it must read B or C (the result of P1_2 or P2_1).
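This argument can be confirmed by brute force over all permutations. The following Python sketch is an illustration under assumptions: the timestamps are hypothetical and were chosen only to match the description (P1_2 and P2_1 overlap, both start after P1_1 ends, and P3_1 starts after both have ended), only the events mentioned in the argument are included, and the helper names are made up here.

```python
from itertools import permutations

# (name, process, op, value, start, end), listed in program order per process.
events = [
    ("P1_1", "P1", "w", "A", 0, 1),
    ("P1_2", "P1", "w", "B", 2, 4),
    ("P2_1", "P2", "w", "C", 3, 5),
    ("P3_1", "P3", "r", "A", 6, 7),
]


def program_order_ok(seq):
    # Condition 1: events of the same process keep their program order.
    for i, a in enumerate(seq):
        for b in seq[i + 1:]:
            if a[1] == b[1] and events.index(a) > events.index(b):
                return False
    return True


def reads_ok(seq):
    # Condition 2: every read returns the most recently written value of x.
    latest = None
    for _, _, op, value, _, _ in seq:
        if op == "w":
            latest = value
        elif value != latest:
            return False
    return True


def real_time_ok(seq):
    # Condition 3: an event that ended before another started must be placed first.
    for i, a in enumerate(seq):
        for b in seq[i + 1:]:
            if b[5] < a[4]:  # b ended before a started, yet a is placed first
                return False
    return True


def names(seq):
    return [e[0] for e in seq]


sequential = [names(s) for s in permutations(events)
              if program_order_ok(s) and reads_ok(s)]
linearizable = [names(s) for s in permutations(events)
                if program_order_ok(s) and reads_ok(s) and real_time_ok(s)]

print(len(sequential))  # 3
print(linearizable)     # []
```

Under these assumptions, conditions 1 and 2 alone still admit arrangements in which P3_1 is placed right after P1_1, while adding condition 3 leaves none.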

Summary of Sequential Consistency and Linear Consistency

It can be seen that if linear consistency is satisfied, sequential consistency must be satisfied too, because the conditions of sequential consistency are a proper subset of the conditions of linear consistency.

In addition to these conditions, both models have one more requirement: all processes in the system agree on the order. That is, if process A uses a certain arrangement that meets the requirements, then even if other valid arrangements exist, every other process in the system must use that same arrangement; only one valid arrangement can be in use across the system.

This requirement makes a system that satisfies sequential and linear consistency "behave as if there is only one copy" to the outside world.

But causal consistency is different: as long as the conditions of causal consistency are satisfied, it does not matter if the sequence of events of different processes is not consistent.

causal consistency

Compared with sequential and linear consistency, causal consistency is simpler. It only needs to satisfy the happen-before relationship described for the Lamport clock:

  • The symbol → is introduced to denote the happen-before relationship between events.
  • In the same process, if event a occurs before event b, then a→b. (This is because, according to rule 1, a process adds one to its local Lamport clock for each event it issues, so the order of events within the same process is well defined.)
  • In different processes, if event a is a process sending a message and event b is the receiving process receiving that message, then a→b must also hold. (This is because, according to rule 2, the receiving process takes the maximum of its local clock and the clock carried by the message and adds 1, so although the send event and the receive event are in different processes, their Lamport clocks can still be compared to determine their order.)
  • Finally, the happen-before relationship is transitive: if a→b and b→c, then a→c.
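The two clock rules mentioned in the parentheses can be sketched as a small class. This is a minimal illustration, not a complete implementation: the class name LamportClock and its method names are assumptions made here, and message delivery is reduced to passing an integer timestamp.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Rule 1: advance the local clock by one for each local event."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is itself an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, message_time):
        """Rule 2: take the maximum of the local clock and the message clock, then add 1."""
        self.time = max(self.time, message_time) + 1
        return self.time


p1, p2 = LamportClock(), LamportClock()
a = p1.tick()      # local event on P1
b = p1.send()      # P1 sends a message, so a -> b within P1
c = p2.receive(b)  # P2 receives it, so b -> c across processes
print(a, b, c)     # 1 2 3
```

Because a→b and b→c, transitivity gives a→c, and the timestamps reflect it: 1 < 2 < 3. The converse does not hold in general: a smaller Lamport timestamp alone does not prove that one event happened before another.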

Commenting on Moments is a perfect illustration of causal consistency:

  • Commenting on another user's comment: this is equivalent to one process sending a message to another, and it must satisfy the happen-before relationship; that is, a comment must exist before it can be replied to.
  • Comments from the same user: these are equivalent to events in the same process and must also satisfy the happen-before relationship.

Take the following picture as an example: there are 4 readers in Moments, and the order of comments they see is different:

  • For the top two readers, the order of reading satisfies causal consistency, so even if the order is different, it is correct.

  • For the bottom two readers, neither order satisfies causal consistency:

    • The event "A replies to B" appears before "B replies to A", which violates the happen-before relationship between processes.
    • "A replies to C" should come before "A replies to B" in process A, so this order violates the order of events within the same process.
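Checking whether a reader's view is causally consistent comes down to checking that every happen-before pair keeps its order. The sketch below uses hypothetical comment identifiers (c1, c2, c3) and a made-up function name rather than the comments from the figure.

```python
def respects_happens_before(order, happens_before):
    """Return True if every (a, b) pair with a -> b has a placed before b in `order`."""
    position = {event: i for i, event in enumerate(order)}
    return all(position[a] < position[b] for a, b in happens_before)


# Hypothetical comments: c2 is a reply to c1, and c3 is unrelated to both.
happens_before = [("c1", "c2")]

reader_1 = ["c1", "c2", "c3"]
reader_2 = ["c3", "c1", "c2"]  # different order, still causally consistent
reader_3 = ["c2", "c1", "c3"]  # the reply appears before the comment it answers

print(respects_happens_before(reader_1, happens_before))  # True
print(respects_happens_before(reader_2, happens_before))  # True
print(respects_happens_before(reader_3, happens_before))  # False
```

Readers 1 and 2 see different orders, yet both are acceptable, which is the point made above: causal consistency does not require all readers to agree on a single order.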

Summary

  • In a distributed system, multiple processes work together and generate multiple events, and these events can be arranged in multiple ways.

  • A consistency model essentially answers the question: According to the definition of the consistency model, what sequence of events meets the requirements?

  • Both sequential consistency and linear consistency aim to make all processes in the system appear to share a single global order of events, but their requirements differ. Sequential consistency requires:

    • Condition 1: The execution order of each process must be consistent with the program execution order of the process.
    • Condition 2: Reads and writes of a variable should behave like a first-in-first-out (FIFO) queue, that is, every read returns the most recently written value.

    As long as these two conditions are met, sequential consistency places no hard requirement on the order of events from different processes. Linear consistency adds condition 3 on top of this:

    • Condition 3: If events of different processes do not overlap in time, that is, they are not concurrent events, then the arrangement must preserve the order in which they actually occurred.
  • Causal consistency is weaker: it only requires that the happen-before relationship be satisfied. Because the happen-before relationship is defined in terms of the Lamport clock, which is a kind of logical clock, the orders that different readers see may be somewhat counterintuitive, but as long as the happen-before relationship is satisfied they are correct.

References

[1] Weekly (No. 17): Read-Write Quorum System and its practice in Raft: https://www.codedump.info/post/20220528-weekly-17/
[2] happen-before: https://www.codedump.info/post/20220703-weekly-21/#happen-before relationship
[3] How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs: https://www.microsoft.com/en-us/research/uploads/prod/2016/12/How-to-Make-a-Multiprocessor-Computer-That-Correctly-Executes-Multiprocess-Programs.pdf
[4] Linearizability: A Correctness Condition for Concurrent Objects: https://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf
[5] History of the development of distributed system consistency (1): https://danielw.cn/history-of-distributed-systems-1
[6] Analysis Distributed: Analysis of Strong and Weak Consistency - Tie Lei's personal blog: http://zhangtielei.com/posts/blog-distributed-strong-weak-consistency.html

Origin my.oschina.net/u/5489811/blog/6965826