Distributed Systems Theory (1): CAP Theory

One. Introduction to CAP Theory

The CAP principle, also known as the CAP theorem, states that no distributed system can satisfy all three of C, A, and P at the same time.

C (Consistency): all nodes see the same data at the same time.

A (Availability): every request receives a response, whether it succeeds or fails.

P (Partition Tolerance): the system keeps operating when a network failure splits it into partitions.

 

 

As Gilbert describes it, consistency here is essentially what relational databases call ACID: a user request either succeeds or fails, with no intermediate state; once a transaction completes, all later transactions build on the resulting state; uncommitted transactions do not affect each other; and once a transaction completes, its result is durable. Availability means that, for a given system, every request should "succeed" and receive a "response". Partition tolerance refers to the fault tolerance of the distributed system: when one node fails, the cluster as a whole remains usable.

 

Two. CAP Theory Explained

Consider a network in which N1 and N2 are two nodes of a distributed system. They share a data block V, whose value is V0.

 

 

To satisfy consistency, the V0 held by A (on N1) must match the V0 held by B (on N2), i.e. V0 = V0.

To satisfy availability, a request to either A or B must receive a response.

To satisfy partition tolerance, a crash of A or B, or a network failure between them, must not affect the availability of the system as a whole.

Suppose program A updates V0 to V1 and then updates B's copy to V1 as well. A request that then accesses B gets V1.

 

 

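In a distributed system, however, things do not always go the way you expect. Network partitions are a normal condition, and if network delay keeps the update message sent by N1 from reaching N2, the copy of the data on N2 is still V0. A request to B then returns V0, while a request to A returns V1. From the user's point of view, the same request yields different results.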
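The divergence described above can be reproduced with a small in-memory sketch. The `Node` and `Cluster` classes and the `partitioned` flag are hypothetical names invented for illustration; a real system would replicate over a network rather than assign a field directly.

```python
class Node:
    """One replica holding its own local copy of the shared value V."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


class Cluster:
    """Two replicas, N1 and N2, that both start with V0 (illustrative only)."""

    def __init__(self):
        self.n1 = Node("N1", "V0")
        self.n2 = Node("N2", "V0")
        self.partitioned = False  # True while replication messages are lost

    def write_via_n1(self, value):
        # The write always lands on N1 first.
        self.n1.value = value
        # The replication message only reaches N2 when the network is healthy.
        if not self.partitioned:
            self.n2.value = value


# Healthy network: both replicas end up with V1.
cluster = Cluster()
cluster.write_via_n1("V1")
healthy = (cluster.n1.value, cluster.n2.value)

# Partitioned network: N2 keeps serving the old V0.
cluster = Cluster()
cluster.partitioned = True
cluster.write_via_n1("V1")
split = (cluster.n1.value, cluster.n2.value)

print(healthy)  # ('V1', 'V1')
print(split)    # ('V1', 'V0')
```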

 

 

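At this point the designer of the system has to choose between two options:

1) Sacrifice data consistency to preserve availability: respond to the user with the stale value V0.

2) Sacrifice availability to preserve data consistency: block until the network connection recovers and the data update operation M completes, then respond to the user with the latest value V1.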
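The two options can be sketched as two read paths on replica N2. This is a minimal illustration, not a production design: `ReplicaN2`, `read_ap`, `read_cp`, and `Unavailable` are hypothetical names, and the blocking wait is given a timeout (an assumption added here) so that the example terminates.

```python
import threading


class Unavailable(Exception):
    """Raised when a consistent read gives up waiting for update M."""


class ReplicaN2:
    def __init__(self):
        self.value = "V0"
        # Set once update M (V0 -> V1) has been applied on this replica.
        self.synced = threading.Event()

    def apply_update(self, value):
        self.value = value
        self.synced.set()

    def read_ap(self):
        # Option 1: stay available, answer with possibly stale data.
        return self.value

    def read_cp(self, timeout):
        # Option 2: block until the update arrives; refuse to answer otherwise.
        if not self.synced.wait(timeout):
            raise Unavailable("update M has not reached N2 yet")
        return self.value


n2 = ReplicaN2()

stale = n2.read_ap()  # answers immediately with the old value

try:
    n2.read_cp(timeout=0.01)
    blocked = False
except Unavailable:
    blocked = True  # the consistent read could not be served

n2.apply_update("V1")  # the network recovers and M is applied
fresh = n2.read_cp(timeout=0.01)

print(stale, blocked, fresh)  # V0 True V1
```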

 

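Three. Trade-offs Among C, A, and P

Next, let's look at how to trade off among the three properties of CAP:

1. CA without P

If P is not required (partitions are not allowed), then C (strong consistency) and A (availability) can both be guaranteed. But partitions are not something you get to opt out of; they will always occur. In practice, a CA system is more often one in which each subsystem still maintains CA after a partition.

Common examples:

Single-site databases; cluster databases; examples found online also include the LDAP protocol and the xFS file system

Implementation techniques:

Two-phase commit; cache validation protocols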
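As a rough illustration of two-phase commit, the sketch below has a coordinator function collect votes in a prepare phase and then commit or abort everywhere. The `Participant` class and `two_phase_commit` function are hypothetical, and the sketch leaves out timeouts, durable logging, and coordinator failure, all of which a real protocol has to handle.

```python
class Participant:
    """A node taking part in the transaction (illustrative only)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates a node that votes "no"
        self.value = "V0"
        self.pending = None

    def prepare(self, value):
        # Phase 1: stage the change and vote.
        if not self.can_commit:
            return False
        self.pending = value
        return True

    def commit(self):
        # Phase 2 (success): make the staged change visible.
        self.value = self.pending
        self.pending = None

    def abort(self):
        # Phase 2 (failure): discard the staged change.
        self.pending = None


def two_phase_commit(participants, value):
    """Return True if every participant committed, False if all aborted."""
    votes = [p.prepare(value) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


ok_nodes = [Participant("N1"), Participant("N2")]
ok = two_phase_commit(ok_nodes, "V1")

bad_nodes = [Participant("N1"), Participant("N2", can_commit=False)]
failed = two_phase_commit(bad_nodes, "V1")

print(ok, [p.value for p in ok_nodes])       # True ['V1', 'V1']
print(failed, [p.value for p in bad_nodes])  # False ['V0', 'V0']
```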

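2. CP without A

If A (availability) is not required, every request must be strongly consistent across nodes, and a partition (P) can stretch the synchronization time indefinitely. Under those terms CP can be guaranteed. Many traditional distributed database transactions follow this model, as does middleware such as Zookeeper.

Common examples:

Distributed databases; distributed locks; majority protocols; Zookeeper

Implementation techniques:

Pessimistic locking; making minority partitions unavailable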
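Making the minority partition unavailable usually comes down to a majority check before a request is accepted. The sketch below assumes a fixed cluster size and a known count of reachable nodes; `has_quorum` and `handle_write` are hypothetical helpers, not the API of any particular system.

```python
def has_quorum(reachable_nodes, cluster_size):
    """True when this side of the partition holds a strict majority."""
    return reachable_nodes > cluster_size // 2


def handle_write(reachable_nodes, cluster_size):
    # A CP system refuses writes on the minority side rather than diverge.
    if not has_quorum(reachable_nodes, cluster_size):
        return "rejected"
    return "accepted"


# A 5-node cluster splits into a group of 3 and a group of 2.
majority_side = handle_write(reachable_nodes=3, cluster_size=5)
minority_side = handle_write(reachable_nodes=2, cluster_size=5)

# An even split of a 4-node cluster leaves neither side with a majority.
even_split = handle_write(reachable_nodes=2, cluster_size=4)

print(majority_side, minority_side, even_split)  # accepted rejected rejected
```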

 

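3. AP without C

To be highly available while tolerating partitions, consistency has to be given up. Once a partition occurs, nodes may lose contact with each other. To stay highly available, each node can only serve requests from its local data, which leads to inconsistency in the global data. Many of today's NoSQL systems fall into this category.

Common examples:

Web caches; DNS; NoSQL

Implementation techniques:

Expiration or leases; conflict resolution; optimistic locking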
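Of the techniques listed, optimistic locking is the easiest to show in a few lines: a writer reads a value together with its version and the write is applied only if the version has not changed in the meantime. `VersionedStore` and `write_if_unchanged` are hypothetical names for this sketch, which runs in a single process and ignores replication entirely.

```python
class VersionedStore:
    """A single value guarded by a version counter (illustrative only)."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def write_if_unchanged(self, new_value, expected_version):
        # Reject the write if someone else has updated the value since the read.
        if expected_version != self.version:
            return False
        self.value = new_value
        self.version += 1
        return True


store = VersionedStore("V0")

# Two clients read the same version without taking any lock.
_, version_seen_by_a = store.read()
_, version_seen_by_b = store.read()

first = store.write_if_unchanged("V1", version_seen_by_a)   # succeeds
second = store.write_if_unchanged("V2", version_seen_by_b)  # conflict detected

# The losing client re-reads and retries against the new version.
_, latest_version = store.read()
retry = store.write_if_unchanged("V2", latest_version)

print(first, second, retry, store.read())  # True False True ('V2', 2)
```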

 

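The significance of CAP:

When designing a system architecture, weigh C, A, and P against the specific business scenario. Take most portal websites: the number of machines is huge, deployment nodes are widely dispersed, and network failures are the norm, so availability must be guaranteed. The design therefore gives up some consistency and chooses the AP model. For a banking system with strict data consistency requirements, the system may be temporarily unavailable but the data must stay consistent, so choosing the CP model is entirely reasonable.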

Origin www.cnblogs.com/lovegrace/p/11391842.html