Distributed Systems Theory (1): CAP Theory
One. Introduction to CAP theory
The CAP principle, also known as the CAP theorem, states that no distributed system can satisfy all three of C, A, and P at the same time.
C (Consistency): all nodes see the same data at the same time.
A (Availability): every request is guaranteed to receive a response, whether it succeeds or fails.
P (Partition Tolerance): the system can still operate when a network failure splits it into partitions.
Regarding the definitions above, Gilbert holds that the consistency discussed here is essentially the ACID of relational databases: a user request either succeeds or fails and can never be left in an intermediate state (atomic); once a transaction completes, all later transactions must build on the state it left behind (consistent); uncommitted transactions do not affect each other (isolated); and once a transaction completes, its result is durable. Availability means that, for a given system, every request should "succeed" and receive a "response". Partition tolerance is really the fault tolerance of the distributed system: when one node fails, normal use of the cluster as a whole is not affected.
Two. CAP theory explained
As shown in the figure, N1 and N2 are two nodes of a distributed system on a network. They share a data block V, whose value is V0. In what follows, A and B denote the programs (and their copies of V) on N1 and N2 respectively.
- When consistency is satisfied, the V0 held by A and the V0 held by B are the same, i.e. V0 = V0.
- When availability is satisfied, a request sent to either A or B gets a response.
- When partition tolerance is satisfied, the failure of A or B, or a network failure between them, does not affect the operation of the system as a whole.
Suppose program A updates V from V0 to V1, and the update is then propagated to B's copy, so that B also holds V1. A request that then reads from B gets V1. As shown below:
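In a distributed system, however, things do not always go the way you expect. Networks in distributed systems are normally subject to partitions. If network delay keeps the update message sent from N1 from reaching N2, the copy on N2 is still V0. A request to B then returns V0 while a request to A returns V1, so from the user's point of view the same request yields different results. As shown below:
At this point, the designer has to choose between two options:
(1) Sacrifice consistency to preserve availability: respond to the user with the stale value V0.
(2) Sacrifice availability to preserve consistency: block and wait until the network connection recovers and the update operation M completes, then respond to the user with the latest value V1.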
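The two options above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Replica` class, the `read_ap` / `read_cp` methods, and the `Unavailable` exception are hypothetical names invented for this example, and option (2) is modeled by raising an error instead of actually blocking until the network recovers.

```python
class Unavailable(Exception):
    """Raised when a replica refuses to answer during a partition."""


class Replica:
    """Hypothetical replica holding one copy of the shared value V."""

    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.peer = None
        self.partitioned = False  # True while the link to the peer is down

    def write(self, value):
        """Apply a write locally, then try to send update M to the peer."""
        self.value = value
        if self.partitioned:
            return False  # M could not be delivered
        self.peer.value = value
        return True

    def read_ap(self):
        """Option (1): always answer, even if the local copy is stale."""
        return self.value

    def read_cp(self):
        """Option (2): refuse to answer while the peer is unreachable.

        A real system would typically block and retry; raising keeps
        the sketch short.
        """
        if self.partitioned:
            raise Unavailable(f"{self.name} cannot reach its peer")
        return self.value


def make_pair():
    a, b = Replica("A", "V0"), Replica("B", "V0")
    a.peer, b.peer = b, a
    return a, b


def set_partition(a, b, down):
    a.partitioned = b.partitioned = down


a, b = make_pair()
set_partition(a, b, True)   # the network between N1 and N2 fails
a.write("V1")               # A holds V1, B still holds V0
print(b.read_ap())          # stale but available: V0
try:
    b.read_cp()
except Unavailable as exc:
    print("unavailable:", exc)
set_partition(a, b, False)  # the network recovers
a.write("V1")               # update M is delivered this time
print(b.read_cp())          # consistent: V1
```

With `read_ap` the user always gets an answer but may see V0 after A already holds V1; with `read_cp` the user never sees stale data but gets no answer while the partition lasts.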
Three. Trade-offs among C, A, and P
Next, let's look at how the trade-offs among the three are made:
(1)CA without P
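If P is not required (partitions are not allowed), then C (strong consistency) and A (availability) can both be guaranteed. In practice, though, partitions are not something you get to opt out of; they will always occur. A CA system is therefore better understood as one in which, after a partition, each subsystem still maintains CA on its own.
Common examples:
Single-site databases, cluster databases, and so on; other examples found online include the LDAP protocol and the xFS file system.
Implementation techniques:
Two-phase commit; cache validation protocols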
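As a rough illustration of two-phase commit, the sketch below has a coordinator collect a vote from every participant and commit only if all of them vote yes. The `Participant` class and `two_phase_commit` function are hypothetical names made up for this example; timeouts, logging, and crash recovery, which a real protocol needs, are deliberately left out.

```python
class Participant:
    """Hypothetical participant in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        """Phase 1: vote yes (True) or no (False)."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if the transaction committed on every participant."""
    # Phase 1 (prepare): ask every participant to vote.
    votes = [p.prepare() for p in participants]
    # Phase 2 (commit or abort): commit only on a unanimous yes.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


ok = two_phase_commit([Participant("db1"), Participant("db2")])
failed = two_phase_commit([Participant("db1"), Participant("db2", can_commit=False)])
print(ok, failed)
```

Because the coordinator needs an answer from every participant, a single unreachable node stalls the whole transaction, which is why this technique fits systems that assume no partitions.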
(2) CP without A
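If A (availability) is not required, every request must be strongly consistent across nodes, and a partition (P) can stretch the synchronization time out indefinitely; on those terms, CP can be guaranteed. Many traditional distributed database transactions follow this model, as does middleware such as Zookeeper.
Common examples:
Distributed databases; distributed locks; majority protocols; Zookeeper
Implementation techniques:
Pessimistic locking; making minority partitions unavailable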
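The idea of making minority partitions unavailable can be sketched as a simple majority check: a group of nodes serves requests only if it can reach more than half of the cluster. The function names `has_majority` and `handle_request` are hypothetical, and the sketch assumes a fixed, known cluster size.

```python
def has_majority(reachable_nodes, cluster_size):
    """True if this side of the partition holds a strict majority."""
    return len(reachable_nodes) > cluster_size // 2


def handle_request(reachable_nodes, cluster_size):
    """Serve only from the majority side; the minority side refuses."""
    if not has_majority(reachable_nodes, cluster_size):
        return "unavailable"
    return "ok"


# A 5-node cluster split into a group of 3 and a group of 2.
majority_side = {"n1", "n2", "n3"}
minority_side = {"n4", "n5"}
print(handle_request(majority_side, 5))
print(handle_request(minority_side, 5))
```

Since at most one side of any split can hold a strict majority, two sides can never both accept writes, which preserves consistency at the cost of availability on the minority side. In an even split, such as 2 and 2 in a 4-node cluster, neither side has a majority and both refuse.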
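(3) AP without C
To get high availability while tolerating partitions, consistency has to be given up. Once a partition occurs, nodes may lose contact with each other; to stay highly available, each node can only serve requests from its local data, and that leads to inconsistency in the global data. Many of today's NoSQL systems fall into this category.
Common examples:
Web caches; DNS; NoSQL
Implementation techniques:
Expirations or leases; conflict resolution; optimistic locking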
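Expiration is the easiest of these techniques to show in code. The sketch below is a minimal time-to-live cache, assuming a hypothetical `TTLCache` class whose callers pass in the current time explicitly so the behavior is easy to follow; a real cache would usually read a clock itself.

```python
class TTLCache:
    """Hypothetical cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def put(self, key, value, now):
        self._store[key] = (value, now + self.ttl)

    def get(self, key, now):
        """Return the cached value, or None if it is missing or expired."""
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            # Expired: drop the entry so the caller refetches from the origin.
            del self._store[key]
            return None
        return value


cache = TTLCache(ttl_seconds=30)
cache.put("V", "V0", now=0)
print(cache.get("V", now=10))  # still within the TTL
print(cache.get("V", now=30))  # expired
```

Within the TTL the cache answers from local data even if the origin has already moved on to V1, so reads stay available but may be stale; the TTL bounds how long that staleness can last.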
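The significance of CAP:
When designing a system architecture, CAP should be weighed against the specific business scenario. Take most portal websites: they run a large number of machines on widely dispersed nodes, network failures are the norm, and availability has to be guaranteed, so the design usually gives up some consistency and chooses the AP model. For a banking system, where the demand for data consistency is high, the system may be temporarily unavailable but the data must stay consistent, so choosing the CP model is entirely reasonable.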