The Achilles' heel of distributed systems: data consistency


 

01

Why do you need a distributed system?

        

Anything that is used and developed sustainably must have real value, and distributed systems are no exception. In my view, the main purposes of building a distributed system are "fast" and "massive".

Being "fast" has two aspects:

  • The first is high processing speed when the system runs.

  • The second is fast development (a short delivery schedule).

Both points are essentially the same: a task is split into two or more parts that are carried out simultaneously, so the total time is shortened.

For example, suppose a job takes one person two minutes. If I get two people to each do half of it, then ideally the job is finished in one minute.
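The idea of splitting one job into parts that run at the same time can be sketched in a few lines of Python. This is a minimal illustration, assuming the job is a sum over a list that can be divided freely; `split_into_parts` and `parallel_sum` are hypothetical helper names, and a thread pool stands in for the "people" doing each part.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_parts(items, n_parts):
    """Split a list into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(items[start:end])
        start = end
    return parts

def parallel_sum(items, n_workers=2):
    """Give each worker one part, run the parts concurrently, then combine."""
    parts = split_into_parts(items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, parts))
    return sum(partial_sums)

print(parallel_sum(list(range(1, 101)), n_workers=2))
```

Threads only show the structure here: in CPython, CPU-bound work needs separate processes or separate machines to actually finish faster, which is where distributed systems come in.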

Of course, the second aspect can also be addressed in other ways, but the first cannot, because no single program or computer has unlimited performance. If one did, distributed systems would not be as commonplace as they are today (after all, a problem that money can solve is not really a problem).

 "Mass" is due to the infinite hard drive does not exist, so we need to store data to a different hard disk, in order to meet the demand. These drives may be in different hosts, different rooms, different regions in the future it might be on a different planet.

02

Side effects of distributed systems

       

Everything has two sides. Alongside the benefits described above, distributed systems also bring what the industry generally regards as their biggest problem: data consistency.

A system exists to be used by people, and the usage scenarios it serves are what we call its business.

The business is the core of a system, and in the final analysis the business is built on data. A system may be slow, and it may be very complicated, and users may put up with that; what they cannot tolerate is problems with the data: wrong data, inconsistent data and so on.

Being distributed means dividing the work and collaborating: each person is responsible for only part of a task. Examples of this are everywhere in daily life.

Take throwing a party: some people prepare the food, some prepare the drinks, and some set up the venue.

All of this can happen at the same time, but if any link is dropped, or any part fails to fit the party's theme, the party is a failure. (For some reason, what comes to mind is a conference where everyone shouts "Cheers!" and downs a goblet of Erguotou...)

Another example is an e-commerce business scenario in a program:

The scenario involves four operations. Their order does not really matter; what matters is that they either all succeed or all fail. An inconsistency in any one of them is a problem.
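A minimal sketch of the "all succeed or all fail" requirement, assuming each of the four operations has a matching undo (compensating) action. The operation names are hypothetical placeholders, since the text does not name the four operations. Undoing completed steps in reverse order is the idea behind compensation in the Saga pattern; it is only one technique (two-phase commit is another), not necessarily the one the author has in mind.

```python
class StepFailed(Exception):
    pass

def run_all_or_nothing(steps, state):
    """Run (do, undo) pairs in order; if any step fails, undo completed ones in reverse."""
    done = []
    try:
        for do, undo in steps:
            do(state)
            done.append(undo)
    except StepFailed:
        for undo in reversed(done):
            undo(state)
        return False
    return True

def make_step(name, fail=False):
    """Build a hypothetical step that records its effect in state['applied']."""
    def do(state):
        if fail:
            raise StepFailed(name)
        state["applied"].append(name)
    def undo(state):
        state["applied"].remove(name)
    return do, undo

names = ["create_order", "deduct_stock", "charge_payment", "add_points"]
ok_state = {"applied": []}
print(run_all_or_nothing([make_step(n) for n in names], ok_state), ok_state)

bad_state = {"applied": []}
steps = [make_step(n, fail=(n == "charge_payment")) for n in names]
print(run_all_or_nothing(steps, bad_state), bad_state)
```

This only works cleanly inside one process. Across machines the undo requests can themselves be delayed or lost, which is exactly the kind of network problem described later in the article.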

In essence, this is similar to the problem of communication between people. I previously wrote an article specifically about communication, which interested readers can look up: "A Brief Chat on Communication Efficiency". The party example above illustrates the same point.

The only difference from human communication is that, in a program, consistency does not necessarily depend on getting a response: even when no response comes back, the outcome can still be consistent.

When one task is split into 100 parts, things get frightening: from a probabilistic point of view, the probability of reaching consensus is only 2/5050.

This program example is not rigorous. A real distributed system has "read" operations as well as "write" operations, so consistency is more complicated than this; it is described in more detail below.
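To put rough numbers on why splitting one task into many parts is risky, here is a deliberately simple model. It is an assumption of this sketch, not the article's own calculation: each of the n parts independently succeeds with probability p, and the outcome counts as consistent only when all parts succeed or all parts fail, so P(consistent) = p^n + (1 - p)^n.

```python
def consistent_probability(n: int, p: float) -> float:
    """P(all n independent parts succeed) + P(all n fail), under the assumed model."""
    return p ** n + (1 - p) ** n

for n in (1, 10, 100):
    print(n, round(consistent_probability(n, 0.99), 4))
```

Even with 99% reliability per part, the chance of a consistent outcome falls to roughly 0.37 once the task is split into 100 parts.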

 

03

Causes of inconsistent data

So what causes data to become inconsistent?

The first cause is programming problems, in other words, buggy code. This is easy to understand, and the remedy is equally obvious: do more testing to verify that the code behaves as expected.

Unit tests, interface tests, automated tests, integration tests and the like are all cost-effective ways to push the number of bugs as close to zero as possible, and they have even given rise to a sizable job role: the test engineer.

But even if there really were no bugs, inconsistent data could still occur, because software runs on hardware, and hardware brings failure factors of its own.

For most of us, hardware is much harder to control than software. Of all the hardware factors, network problems are the most serious. Moreover, the larger and more complex the network, the worse they get: problems on a wide area network are bigger and more serious than on a local area network.

Imagine that each host is just a tiny node in a huge web: the more links it carries, the more likely something goes wrong.

Some readers may ask: hard disks, power outages and the like can also cause problems, so why are network problems the most serious?

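Hard disks and power supplies are like parts of your own body, such as your hands and feet. The network, by contrast, is the channel through which people communicate, like a phone call. Even if you never hang up, many things can interrupt a call: the other party choosing to end it, a bad signal, or even interception by a third party.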

I think everyone would agree that a phone call going wrong is far more likely than your own hands and feet refusing to obey you.

In practice, the problems commonly encountered on networks include latency, packet loss and out-of-order delivery.
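A common countermeasure for loss and reordering is to number every message, so the receiver can put messages back in order and notice gaps (TCP does this with sequence numbers). The sketch below simulates an unreliable channel with a seeded random generator; the function names and the drop-then-shuffle behavior are assumptions for illustration, and latency is not modeled.

```python
import random

def unreliable_channel(messages, drop_rate, rng):
    """Simulate loss and reordering: drop some messages, shuffle the rest."""
    delivered = [m for m in messages if rng.random() >= drop_rate]
    rng.shuffle(delivered)
    return delivered

def reassemble(received, expected_count):
    """Restore order by sequence number and report which numbers never arrived."""
    by_seq = {seq: payload for seq, payload in received}
    in_order = [by_seq[s] for s in sorted(by_seq)]
    missing = [s for s in range(expected_count) if s not in by_seq]
    return in_order, missing

rng = random.Random(42)
sent = [(seq, f"msg-{seq}") for seq in range(8)]  # (sequence number, payload)
received = unreliable_channel(sent, drop_rate=0.25, rng=rng)
in_order, missing = reassemble(received, expected_count=len(sent))
print("arrived:", in_order)
print("missing (would need retransmission):", missing)
```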

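To solve these problems, many theories and solutions have emerged over the decades since the internet first appeared in 1969 (when, under a protocol developed by ARPA, the US military connected four universities over a network). I will go through them one by one in later articles; this article first takes a closer look at what consistency actually is.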

 

 

04

Consistency in detail

        

First, what does it mean to reach consistency? Put simply:

 

The same thing looks exactly the same no matter when or where it is observed.

 

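Take a football match. Whether we are in the stadium or in front of the TV, when we see the ball passed from player A to player B, the information is the same.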

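Strictly speaking, though, this is not true consistency: the TV receives the information through satellite signals, networks and other transmission links, so we inevitably see it later than the people in the stadium.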

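Even among the people in the stadium, each one theoretically sees the event with a slightly different delay depending on where they sit; it is just that light travels so fast that, over a few hundred meters, the difference is far too small to notice.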
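A quick back-of-the-envelope check of the "too small to notice" claim, assuming light travels at about 3 × 10^8 m/s and taking 300 m as the spread within a stadium. The satellite line assumes a geostationary satellite at roughly 35,786 km altitude and ignores processing delays:

```latex
\begin{aligned}
t_{\text{stadium}} &= \frac{300\ \text{m}}{3\times10^{8}\ \text{m/s}} = 10^{-6}\ \text{s} = 1\ \mu\text{s} \\
t_{\text{satellite}} &\approx \frac{2 \times 3.58\times10^{7}\ \text{m}}{3\times10^{8}\ \text{m/s}} \approx 0.24\ \text{s}
\end{aligned}
```

So the satellite path alone adds a delay about five orders of magnitude larger than the difference between seats in the stadium.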

The conclusion: once the time dimension is taken into account, true consistency does not exist.

Moreover, in a distributed system there is no need to achieve consistency in this true sense.

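The closer we get to it, the more the system effectively collapses back into a single monolith that can do only one thing at a time, completely losing the advantage of being "fast", one of the two purposes of a distributed system. This is why many variants of consistency have emerged, each suited to different scenarios. For ease of understanding, we will go through them from least strict to most strict.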

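In most cases, to be as "fast" as possible, most solutions used in real systems provide so-called eventual consistency: they tolerate inconsistency under certain conditions, guarantee local consistency first, and then reach global consistency through a series of complex state synchronizations.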
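A minimal sketch of eventual consistency, assuming a last-writer-wins rule on (timestamp, replica id) and a pairwise anti-entropy sync step. Both are common textbook choices rather than the only options, and the class and method names are hypothetical. Each replica accepts writes locally right away, replicas may disagree for a while, and synchronization brings them to the same state.

```python
class Replica:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.data = {}  # key -> (timestamp, writer_id, value)

    def write(self, key, value, timestamp):
        """Accept a write locally without contacting other replicas."""
        self._merge_entry(key, (timestamp, self.replica_id, value))

    def read(self, key):
        entry = self.data.get(key)
        return entry[2] if entry else None

    def _merge_entry(self, key, entry):
        # Last-writer-wins: keep the entry with the larger (timestamp, writer_id).
        current = self.data.get(key)
        if current is None or entry[:2] > current[:2]:
            self.data[key] = entry

    def sync_from(self, other):
        """One anti-entropy step: pull every entry the other replica knows."""
        for key, entry in other.data.items():
            self._merge_entry(key, entry)

a, b, c = Replica("a"), Replica("b"), Replica("c")
a.write("status", "draft", timestamp=1)
b.write("status", "published", timestamp=2)
print([r.read("status") for r in (a, b, c)])  # ['draft', 'published', None]

# Two full rounds of all-pairs sync; one round already suffices in this small example.
for _ in range(2):
    for x in (a, b, c):
        for y in (a, b, c):
            if x is not y:
                x.sync_from(y)
print([r.read("status") for r in (a, b, c)])
```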

Eventual consistency has many possible variants. Here are a few common ones as a starting point for discussion:

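  • Causal consistency: only requires that the order of causally related operations is preserved. Take the reply feature of WeChat Moments: the question "Have you eaten?" must appear before the answer "Yes, I have."

  • Read-your-writes consistency: the name sounds awkward, but it is easy to explain. When you reply under a WeChat Moments post, your other friends do not need to see the reply immediately, but you yourself must see it right away; otherwise, where did your reply go?

  • Session consistency: a single chat with someone can be thought of as a session. A chat does contain some causal relationships, but in most cases what matters more is the logical sequence of the messages.

    For example, you explain something in three messages: "First...", "Then...", "Finally...". If consistency is not guaranteed here, the other person might see them as "Finally...", "First...", "Then...".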
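Read-your-writes can be sketched by letting each client session remember the version of its own latest write and refusing to read from a replica that has not caught up to it. This is one possible approach under stated assumptions: a single primary assigns increasing versions, replication to one follower is triggered manually with `replicate()`, and all class and method names are hypothetical.

```python
class Store:
    """A primary that assigns versions, plus one lagging follower replica."""
    def __init__(self):
        self.version = 0
        self.primary = {}
        self.follower = {}
        self.follower_version = 0
        self.log = []  # (version, key, value), replayed onto the follower later

    def write(self, key, value):
        self.version += 1
        self.primary[key] = value
        self.log.append((self.version, key, value))
        return self.version

    def replicate(self):
        """Bring the follower up to date (in reality this happens asynchronously)."""
        for version, key, value in self.log:
            if version > self.follower_version:
                self.follower[key] = value
                self.follower_version = version

class Session:
    def __init__(self, store):
        self.store = store
        self.last_written = 0  # version of this session's most recent write

    def write(self, key, value):
        self.last_written = self.store.write(key, value)

    def read(self, key):
        # Use the follower only if it already contains this session's own writes.
        if self.store.follower_version >= self.last_written:
            return self.store.follower.get(key)
        return self.store.primary.get(key)

store = Store()
me, friend = Session(store), Session(store)
me.write("reply", "Yes, I have.")
print(me.read("reply"))      # served by the primary: I see my own reply
print(friend.read("reply"))  # the friend reads the lagging follower: None for now
store.replicate()
print(friend.read("reply"))  # after replication the friend sees it too
```

Because the guarantee lives in the `Session` object, it holds only within that session, which loosely mirrors how session consistency scopes its promises to one conversation.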

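Stricter than these local guarantees is global sequential consistency [Appendix 1, proposed in 1979]. It guarantees that all processes see the same global execution order, and that each process's own operations appear in that order in the sequence in which they actually happened.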

Take the football match again. Suppose what actually happens is:

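  1. Messi passes the ball to Cristiano Ronaldo.

  2. Ronaldo passes the ball back to Messi. Everyone should then see the events in exactly this order.

    Even if the spectators in the stadium have already seen event 2 while we in front of the TV have not yet seen event 1, that is fine: the order in which the events happened is the same for the whole world.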
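One simple way to get a single order that every observer agrees on is a central sequencer that stamps each event with the next number, with every replica applying events strictly in stamp order. This sketches that idea (a form of total-order broadcast), assuming a single, reliable sequencer; the class names are hypothetical, and delivery delays are simulated by handing the events to each viewer in a shuffled order.

```python
import random

class Sequencer:
    """Assigns one global order to events from any source."""
    def __init__(self):
        self.next_seq = 0

    def stamp(self, event):
        stamped = (self.next_seq, event)
        self.next_seq += 1
        return stamped

class Viewer:
    """A replica (or spectator) that may receive events late or out of order."""
    def __init__(self):
        self.pending = {}
        self.expected = 0
        self.seen = []

    def receive(self, stamped):
        seq, event = stamped
        self.pending[seq] = event
        # Apply events only in sequence order; hold back anything that arrives early.
        while self.expected in self.pending:
            self.seen.append(self.pending.pop(self.expected))
            self.expected += 1

sequencer = Sequencer()
events = [sequencer.stamp("Messi passes to Ronaldo"),
          sequencer.stamp("Ronaldo passes back to Messi")]

rng = random.Random(7)
stadium, tv = Viewer(), Viewer()
for viewer in (stadium, tv):
    arrival = events[:]
    rng.shuffle(arrival)  # each viewer may receive the events in a different order
    for stamped in arrival:
        viewer.receive(stamped)

print(stadium.seen == tv.seen)
```

The sequencer is a single point of failure and a throughput bottleneck, which illustrates why stricter ordering guarantees cost some of the "fast" that distributed systems are built for.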

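Stricter still is adding a real-time ordering requirement on top of global sequential consistency, which the industry calls linearizability [Appendix 2, proposed in 1990].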

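To use the Messi and Ronaldo example again as an analogy: after Messi passes to Ronaldo, the whole stadium "pauses", and Ronaldo can make the return pass only after everyone watching the match has received the news of the first pass.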

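This requires a god (a global clock) to do the "pausing". It is the limit of what we can achieve in practice, and among the systems that meet this requirement, the best known is Google's Spanner.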
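Spanner's published design relies on TrueTime, a clock API that returns an uncertainty interval rather than a single instant, together with a "commit wait": a transaction's timestamp is chosen no earlier than the interval's upper bound, and the commit is not acknowledged until that timestamp is guaranteed to have passed. The sketch below imitates only that idea with a simulated clock whose uncertainty `epsilon` is an assumed constant; `SimulatedTrueTime` and `commit_with_wait` are hypothetical names, not Spanner's API.

```python
import time
from dataclasses import dataclass

@dataclass
class Interval:
    earliest: float
    latest: float

class SimulatedTrueTime:
    """Hypothetical clock: true time is assumed to lie within +/- epsilon of the local reading."""
    def __init__(self, epsilon: float):
        self.epsilon = epsilon

    def now(self) -> Interval:
        t = time.monotonic()
        return Interval(t - self.epsilon, t + self.epsilon)

def commit_with_wait(clock: SimulatedTrueTime, apply_write) -> float:
    # Pick a timestamp no earlier than any possible "real" current time.
    commit_ts = clock.now().latest
    apply_write(commit_ts)
    # Commit wait: hold the acknowledgement until commit_ts is certainly in the past.
    while clock.now().earliest <= commit_ts:
        time.sleep(clock.epsilon / 10)
    return commit_ts

clock = SimulatedTrueTime(epsilon=0.005)  # assumed 5 ms uncertainty
log = []
start = time.monotonic()
ts = commit_with_wait(clock, log.append)
print(f"waited about {time.monotonic() - start:.3f} s before acknowledging")
```

The wait is roughly twice the clock uncertainty, so a tighter global clock directly shortens the "pause" in the Messi and Ronaldo analogy.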

The different levels of consistency are summarized as follows:


Source: www.cnblogs.com/williamjie/p/11095777.html