Distributed Transaction Basics

This article covers the basics of distributed transactions: some fundamental algorithms, theorems, and simple applications. The next article introduces specific practices in the Internet industry.

1, CAP Theorem

The CAP theorem was proposed by Eric Brewer, a professor at the University of California, Berkeley. Its core idea is that any networked shared-data system can satisfy at most two of three properties: consistency, availability, and partition tolerance. The three properties are defined as follows:

  • Consistency: all nodes hold the latest version of the data, as in a relational database; once data is updated, every subsequent request can see the update
  • Availability: the data remains highly available
  • Partition tolerance: the system tolerates network partitions, i.e. parts of the network becoming unreachable from each other

Specifically, in a distributed system, any database design or web application can support at most two of the properties above. Any horizontal scaling strategy must obviously rely on data partitioning, so designers have to choose between consistency and availability.

1.1 Application examples of CAP

  • CA (consistency + availability) without P (partition tolerance)
    Standalone MySQL and Oracle.
    This case does not exist in a distributed cluster, since multiple nodes must synchronize with each other (for example primary/standby replication), which involves the network.

  • CP (consistency + partition tolerance) without A (availability)
    Distributed databases such as Redis, HBase, and ZooKeeper.
    A request to ZooKeeper always gets a consistent result. When the master node suffers a network failure, a leader election is triggered, and the cluster is unavailable during the election. ZooKeeper therefore cannot guarantee availability for every request: it may drop some requests, and the client has to retry to get a result.

  • AP (availability + partition tolerance) without C (consistency)
    12306 ticketing
    When you buy a ticket, the system tells you tickets are available (although there may actually be none), and after a while it tells you the order failed because the tickets are sold out. This simply gives up strong consistency and settles for the next best thing, eventual consistency.
    Eureka
    All nodes are peers. When some nodes go down, clients switch to the other nodes immediately, so the service stays available, but the information returned may not be up to date. Once the network is stable, newly registered instance information is synchronized to the other nodes.
    Once a network problem occurs, nodes may lose contact with each other. To ensure high availability, each node must answer user requests immediately, which leads to globally inconsistent data.
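
The difference between the CP and AP choices can be illustrated with a toy two-replica store. This is only a sketch under stated assumptions: the `Replica` and `Cluster` classes, the `mode` flag, and the naive merge in `heal` are hypothetical names invented for illustration and do not describe how any of the systems above work internally.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Two replicas; `mode` is "CP" or "AP" (hypothetical toy model)."""

    def __init__(self, mode):
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, replica, key, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # CP: refuse the write rather than let the replicas diverge.
                raise RuntimeError("unavailable during partition")
            # AP: accept the write locally; the replicas are now inconsistent.
            replica.data[key] = value
            return
        # Healthy network: replicate synchronously to the other node.
        replica.data[key] = value
        other.data[key] = value

    def heal(self):
        """End the partition and merge both replicas (a wins on conflict)."""
        self.partitioned = False
        merged = {**self.b.data, **self.a.data}
        self.a.data = dict(merged)
        self.b.data = dict(merged)


ap = Cluster("AP")
ap.partitioned = True
ap.write(ap.a, "stock", 0)      # accepted, but replica b does not see it yet
ap.heal()                       # replicas converge: eventual consistency
```

In CP mode the same write raises an error while the partition lasts, which is the loss of availability described above.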

2, BASE theory

In distributed systems we often pursue availability, which in many scenarios matters more than consistency. How, then, do we achieve high availability? Our predecessors have given us another theory, BASE, which extends the CAP theorem. BASE stands for:

  • Basically Available
  • Soft state
  • Eventually consistent

BASE is the result of trading off consistency against availability in CAP. Its core idea is that even though strong consistency cannot be achieved, each application can, according to its own business characteristics, adopt an appropriate way to bring the system to eventual consistency.

3, Paxos protocol

The Paxos protocol was proposed by Leslie Lamport around 1990. Lamport received the 2013 Turing Award for his contributions to distributed systems, of which Paxos, widely used in cloud computing, is a prominent example.

With Paxos, as long as f + 1 of the system's 2f + 1 nodes are available, the system as a whole remains available and guarantees strong consistency, which is a great improvement in availability. Still assuming that the availability of a single node is P, the probability that f + 1 or more of the 2f + 1 nodes are working is

$$P_{\text{total}} = \sum_{i=f+1}^{2f+1} \binom{2f+1}{i} P^{i} (1-P)^{2f+1-i}$$

With P = 0.99 and f = 2, P(total) = 0.9999901494: availability rises from the two nines of a single node to five nines. Yearly downtime drops from 87.6 hours to about 0.086 hours, which is enough for the vast majority of applications.
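
The availability figure can be checked numerically by summing the binomial probabilities that at least f + 1 of the 2f + 1 nodes are up. The function name `cluster_availability` is made up for this sketch, and node failures are assumed to be independent.

```python
from math import comb


def cluster_availability(p, f):
    """Probability that at least f + 1 of 2f + 1 independent nodes are up."""
    n = 2 * f + 1
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(f + 1, n + 1))


total = cluster_availability(0.99, 2)
print(f"{total:.10f}")                 # 0.9999901494
print(f"{(1 - 0.99) * 8760:.1f}")      # 87.6  (hours of downtime per year, single node)
print(f"{(1 - total) * 8760:.3f}")     # 0.086 (hours of downtime per year, 5-node cluster)
```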

Lamport wrote two papers, "The Part-Time Parliament" and "Paxos Made Simple", which give a fairly complete account of the Paxos workflow and its proof. Paxos likens each data write request to a proposal. Each proposal has a unique number and is submitted by a proposer. To take effect, a proposal must be accepted by f + 1 of the 2f + 1 nodes. The 2f + 1 nodes are called the voting committee (Quorum) for the proposal, and each node in the committee is called an Acceptor. The protocol must also satisfy two constraints: a) an Acceptor must accept the first proposal it receives; b) if a proposal with value v has been accepted by a majority of Acceptors, every subsequently accepted proposal must also have value v (v can be understood as the content of the proposal; a proposal consists of a proposal number and v).

The Paxos process has two phases. The first is the prepare phase, in which the Proposer learns the latest state of proposals; the second is the commit phase, in which it submits a proposal based on the state it has learned. The complete protocol is as follows:

Phase 1

  1. The Proposer selects a proposal number n and sends a prepare request numbered n to more than half of the Acceptors.

  2. If an Acceptor receives a prepare request numbered n, and n is greater than the number of every prepare request it has already responded to, it promises not to accept any proposal numbered less than n, and replies with the highest-numbered proposal it has already accepted (if any).

Phase 2

  1. If the Proposer receives responses to its prepare request (numbered n) from more than half of the Acceptors, it sends those Acceptors an accept request for a proposal with number n and value v, where v is the value of the highest-numbered proposal among the responses, or any value if the responses contain no proposal.

  2. If an Acceptor receives an accept request for a proposal numbered n, it accepts the proposal unless it has already responded to a prepare request with a number greater than n.
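
The two phases above can be sketched for a single decision as follows. This is a minimal in-memory illustration, not a production implementation: the `Acceptor` class and `propose` function are hypothetical names, messages are modelled as direct method calls that never get lost, and the accept request is sent to all Acceptors for simplicity.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0        # highest prepare number responded to
        self.accepted = None     # (number, value) of the highest accepted proposal

    def prepare(self, n):
        """Phase 1, step 2: promise not to accept proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value):
        """Phase 2, step 2: accept unless a higher-numbered prepare was answered."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors, n, value):
    """Run both phases for proposal number n; return the chosen value or None."""
    majority = len(acceptors) // 2 + 1
    # Phase 1, step 1: send prepare(n) and collect the promises.
    replies = [a.prepare(n) for a in acceptors]
    granted = [previous for ok, previous in replies if ok]
    if len(granted) < majority:
        return None
    # Phase 2, step 1: adopt the value of the highest-numbered accepted proposal, if any.
    previous = [p for p in granted if p is not None]
    if previous:
        value = max(previous, key=lambda p: p[0])[1]
    acks = sum(a.accept(n, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, 1, "x")    # "x" is chosen
second = propose(acceptors, 2, "y")   # still "x": constraint b keeps the chosen value
stale = propose(acceptors, 1, "z")    # None: number 1 is lower than the promises made
```

The second call shows why a later proposal cannot overwrite a value that a majority has already accepted: the Proposer learns "x" in phase 1 and must resubmit it.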

Figure 3 describes the protocol as a sequence diagram:
Figure 3: Sequence diagram of the Paxos protocol flow
The procedure above looks complicated because the protocol must remain correct under many boundary conditions, such as an empty initial value or two Proposers submitting proposals at the same time. The core of Paxos, however, can be described simply: the Proposer first learns the latest proposal content from a majority of Acceptors, then forms a new proposal from the highest-numbered proposal it has learned and submits it; if a majority of Acceptors vote for the proposal, the proposal is adopted. Because both learning and adoption involve more than half of the Acceptors, the Proposer is guaranteed to learn the latest adopted value, and the Acceptor sets of any two adopted proposals must share at least one Acceptor. Under constraint b, that shared Acceptor guarantees data consistency, which is why Paxos is also called a majority protocol.

The real greatness of Paxos lies in its simplicity. The protocol tolerates the loss of any message, and consistency does not depend on the successful delivery of any particular message. This greatly simplifies the design of distributed systems and closely matches distributed environments, where the network may partition. By contrast, two-phase commit (2PC), which predates Paxos, also guarantees strong consistency, but it is highly complex and depends on the availability of a single coordinator.

If Paxos is so powerful, why is there a ZAB protocol?

4, ZAB protocol

Although the Paxos protocol is complete, applying it to a real distributed system leaves some problems to solve:

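  • With multiple Proposers, Paxos does not guarantee that a proposal submitted first is accepted first. How can a real application guarantee the order in which multiple proposals are accepted?

  • Paxos allows multiple Proposers to submit proposals, which can cause livelock. The scenario is as follows: before proposal n has completed phase 2, the phase 1 prepare request of a new proposal n+1 reaches the Acceptor. According to the protocol, the Acceptor responds to the new prepare request and promises not to accept any request numbered less than n+1, so proposal n may fail to pass. Likewise, if the submitter of proposal n then submits proposal n+2 before proposal n+1 has completed phase 2, proposal n+1 may also fail to pass.

  • Paxos stipulates that once a value v has been accepted by a majority of Acceptors, no later proposal can change v. What if the value needs to be changed in practice?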

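ZAB, the core algorithm of ZooKeeper, solves the first two problems with one simple constraint: all proposals are forwarded to a single Leader (chosen from the Acceptors by a leader election algorithm), which submits them. The Leader guarantees the order among proposals and also avoids the livelock caused by multiple Proposers.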

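Figure 4 shows the ZAB process as a sequence diagram. Compared with Paxos, the Prepare phase is omitted: because the Leader itself holds the latest state of the proposals, it does not need to learn proposal content. The Follower in the figure corresponds to the Acceptor in Paxos, and the Observer corresponds to the Learner in Paxos.
Figure 4: Workflow of the ZAB protocol
Introducing a Leader in ZAB also brings a new problem: what if the Leader crashes? The solution is to elect a new Leader. Leader election is itself a Paxos-style proposal and voting process, which is not discussed here.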
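
The idea of routing every proposal through one Leader can be sketched as below. This is a simplified illustration, not ZooKeeper's implementation: `Leader`, `Follower`, and their methods are hypothetical names, the transaction id is a plain counter, every Follower always acknowledges, and leader election and failure handling are left out.

```python
class Follower:
    def __init__(self):
        self.log = []          # proposals acknowledged, in arrival order
        self.committed = []    # values delivered after the commit message

    def on_proposal(self, zxid, value):
        self.log.append((zxid, value))
        return True            # acknowledge the proposal

    def on_commit(self, zxid):
        self.committed.append(dict(self.log)[zxid])


class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.next_zxid = 1     # simplified, monotonically increasing id
        self.committed = []

    def submit(self, value):
        """Assign the next id, broadcast, and commit on a majority of acks."""
        zxid = self.next_zxid
        self.next_zxid += 1
        # The Leader counts its own vote plus the Followers' acknowledgements.
        acks = 1 + sum(f.on_proposal(zxid, value) for f in self.followers)
        cluster_size = len(self.followers) + 1
        if acks > cluster_size // 2:
            self.committed.append(value)
            for f in self.followers:
                f.on_commit(zxid)
            return zxid
        return None


followers = [Follower(), Follower()]
leader = Leader(followers)
ids = [leader.submit(v) for v in ["a", "b", "c"]]
```

Because only the Leader assigns ids, every Follower delivers "a", "b", "c" in the same order, and no two proposers can block each other.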

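How, then, can the value v of a proposal be modified? This is outside the scope of the ZAB protocol. Studying the ZooKeeper source code shows how it is done: ZooKeeper provides the concept of a znode, which can be modified. For each znode, ZooKeeper records an auto-incrementing, consecutive version number. Any modification of a znode (create/set/setAcl) triggers a Paxos majority vote, and once the vote passes the znode version number is incremented by 1. This amounts to running Paxos many times, once per znode version, to get around the restriction that a single Paxos run cannot change the proposal value.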
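
The versioning idea can be sketched as one immutable decision per version. The `VersionedValue` class and the `majority_vote` callable are hypothetical stand-ins invented for this illustration; the callable represents one round of majority agreement and is not a ZooKeeper API.

```python
class VersionedValue:
    """Each version slot is decided once; changing the value opens a new slot."""

    def __init__(self):
        self.decided = []      # decided[i] is the immutable value of version i

    def update(self, value, majority_vote):
        # One agreement round per modification (assumed to return True on success).
        if not majority_vote(value):
            return None
        self.decided.append(value)
        return len(self.decided) - 1   # the version number just created

    @property
    def current(self):
        return self.decided[-1] if self.decided else None


node = VersionedValue()
v0 = node.update("a", lambda value: True)
v1 = node.update("b", lambda value: True)
rejected = node.update("c", lambda value: False)   # vote fails, nothing changes
```

No decided value is ever overwritten, yet readers see the value change from "a" to "b" because they read the latest version.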

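From the perspective of the core consistency algorithm, ZAB does borrow the majority idea of Paxos. But it is the global ordering guarantee it provides, together with the modifiable znodes that ZooKeeper offers its users, that made Paxos shine in the open-source world. ZAB's value is therefore more than an optimized implementation of Paxos, and it is no wonder that the authors of ZAB keep stressing that ZAB and Paxos are different algorithms.

The CAP theorem tells us that network partitions are unavoidable in a distributed environment, so we have to trade off data consistency against availability. Paxos offers an extremely simple algorithm that maximizes availability while guaranteeing consistency. ZooKeeper's ZAB protocol simplifies Paxos further and provides a global ordering guarantee, which allows Paxos to be widely applied in industrial scenarios.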

5, Two-phase commit protocol (2PC)

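The two-phase commit protocol (2PC) is the core protocol of distributed transactions. In this protocol, a transaction manager (TM) coordinates the activities of one or more resource managers (RM). All resource managers report their status to the transaction manager, which decides, based on the reported status (prepared or failed to prepare), whether the resource managers should commit the transaction or roll it back.

The two-phase commit flow is as follows:

  1. The application submits a request to the transaction manager, starting a distributed transaction;
  2. In phase 1, the transaction manager contacts all resource managers and tells them to prepare to commit the transaction;
  3. Each resource manager returns a prepared (or failed to prepare) message to the transaction manager (a response timeout counts as failure);
  4. In phase 2:
    1. If all resource managers have prepared (as in Figure 1), the transaction manager tells all resource managers to commit the transaction;
    2. If any resource manager failed to prepare (such as resource manager B in Figure 2), the transaction manager tells all resource managers to roll back the transaction.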

Figure 1: Two-phase commit when all resource managers prepare successfully
Figure 2: Two-phase commit when resource manager B fails to prepare
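
The flow above can be sketched with an in-memory coordinator. This is an illustration only: `ResourceManager`, its `can_prepare` flag, and `two_phase_commit` are hypothetical names, and real implementations also need timeouts, durable logs, and recovery.

```python
class ResourceManager:
    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare   # simulates success or failure of prepare
        self.state = "idle"

    def prepare(self):
        self.state = "prepared" if self.can_prepare else "failed"
        return self.can_prepare

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(resource_managers):
    """Play the role of the transaction manager for one transaction."""
    # Phase 1: ask every resource manager to prepare (a timeout would count as False).
    votes = [rm.prepare() for rm in resource_managers]
    # Phase 2: commit only if everyone prepared; otherwise roll everyone back.
    if all(votes):
        for rm in resource_managers:
            rm.commit()
        return "committed"
    for rm in resource_managers:
        rm.rollback()
    return "rolled_back"


ok = two_phase_commit([ResourceManager("A"), ResourceManager("B")])
failed_group = [ResourceManager("A"), ResourceManager("B", can_prepare=False)]
failed = two_phase_commit(failed_group)
```

The second call corresponds to the case where resource manager B fails to prepare: A had already prepared but is rolled back as well.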

6, TCC model

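Try-Confirm-Cancel (TCC) is named after its three operations: the preliminary operation (Try), the confirm operation (Confirm), and the cancel operation (Cancel). Their business meanings are as follows:

  • Try phase: check the business system and reserve resources;
  • Confirm phase: confirm and commit on the business system. By default the Confirm phase is assumed not to fail: as long as Try succeeds, Confirm must succeed;
  • Cancel phase: when business execution fails and a rollback is needed, cancel the business operation and release the reserved resources.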

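TCC is an extension of the two-phase commit protocol (2PC). Try corresponds to the phase 1 prepare step in 2PC (Prepare), Confirm corresponds to the phase 2 commit in 2PC (Commit), and Cancel corresponds to the phase 2 rollback in 2PC (Rollback).

Unlike 2PC, TCC is a programming model: it is 2PC at the application layer. All three TCC operations are implemented in application code, which thereby takes over the role of the 2PC resource manager.

Because TCC is implemented in application code, a TCC resource manager can manage resources across databases and across applications. It turns accesses to different databases and different business operations into a single atomic operation through code, which solves the transaction problem in complex business scenarios. In addition, each TCC operation is a local database transaction from the database's point of view: when the operation ends, the local transaction ends and the database resources are released. This avoids the poor performance caused by database-level 2PC holding resources.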
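
A minimal sketch of the Try/Confirm/Cancel pattern follows, assuming a hypothetical `Reservable` resource (for example an account balance or a stock count) and a hypothetical `run_tcc` coordinator. Each method stands for one short local transaction; retries and idempotency, which real TCC frameworks need, are omitted.

```python
class Reservable:
    def __init__(self, available):
        self.available = available
        self.reserved = 0

    def try_(self, amount):
        """Try: check the resource and reserve it."""
        if self.available < amount:
            return False
        self.available -= amount
        self.reserved += amount
        return True

    def confirm(self, amount):
        """Confirm: consume the reservation made by Try."""
        self.reserved -= amount

    def cancel(self, amount):
        """Cancel: release the reservation made by Try."""
        self.reserved -= amount
        self.available += amount


def run_tcc(steps):
    """steps is a list of (participant, amount); returns True if confirmed."""
    tried = []
    for participant, amount in steps:
        if participant.try_(amount):
            tried.append((participant, amount))
        else:
            # A Try failed: compensate every participant that already reserved.
            for p, a in tried:
                p.cancel(a)
            return False
    for p, a in tried:
        p.confirm(a)
    return True


account = Reservable(100)   # money available to pay
stock = Reservable(1)       # items in inventory
first_order = run_tcc([(account, 30), (stock, 1)])    # True: both reserved, then confirmed
second_order = run_tcc([(account, 30), (stock, 1)])   # False: no stock, payment is released
```

After the second order, the 30 units reserved on the account are back in `available`, which is the compensation performed by Cancel.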

7, Flexible transactions

7.1 Definition of flexible transactions

Rigid transactions (such as those on a single database) fully comply with ACID, the four basic properties required for database transactions to execute correctly:

  • Atomicity
  • Consistency
  • Isolation
  • Durability

Flexible transactions (such as distributed transactions) relax the requirements on consistency and isolation in order to meet the needs of availability, performance, and service degradation, and follow the BASE theory:

  • Basically available
  • Soft state
  • Eventual consistency

Flexible transactions still partially follow ACID:

  • Atomicity: strictly followed
  • Consistency: strictly followed once the transaction completes; consistency during the transaction may be relaxed
  • Isolation: parallel transactions must not affect each other; the visibility of a transaction's intermediate results may be safely relaxed
  • Durability: strictly followed

7.2 Classification of flexible transactions

Flexible transactions fall into four types: two-phase, compensation, asynchronous assurance, and best-effort notification.

  • Two-phase
    The two-phase commit protocol for distributed transactions, with XA and JTA/JTS as the corresponding technologies. This is the classic model for distributed transaction processing.

  • Compensation
    TCC (Try-Confirm-Cancel) transactions can be classified as the compensation type. After Try has succeeded, if the transaction needs to be rolled back, Cancel acts as the compensation mechanism and undoes the Try operation. Each TCC operation is a local transaction that commits as early as possible (with no two-phase constraint); when the global transaction requires a rollback, the "compensation" is carried out by another local transaction.
    TCC moves the two-phase commit protocol from the resource layer to the business layer, making it part of the business model.

  • Asynchronous assurance
    Turns some synchronously blocking transactional operations into asynchronous ones to avoid contention for database transactions, for example a transactional message mechanism.

  • Best-effort notification
    Notification is carried out by a notification server (message notification). It is allowed to fail, and there is a supplementary mechanism.
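
Best-effort notification can be sketched as bounded retries followed by a hand-off to a supplementary mechanism. Everything here is an assumption made for illustration: `notify_best_effort`, the `send` callable, the `pending` list standing in for a later reconciliation job, and the absence of delays between attempts.

```python
def notify_best_effort(send, message, pending, max_attempts=3):
    """Try to deliver `message`; return the attempt number that succeeded, or None.

    `send` is a hypothetical callable that returns True on successful delivery.
    Undelivered messages are appended to `pending` for a supplementary mechanism
    (for example a periodic reconciliation query) to pick up later.
    """
    for attempt in range(1, max_attempts + 1):
        if send(message):
            return attempt
    pending.append(message)
    return None


def make_flaky_sender(failures):
    """Build a sender that fails `failures` times and then succeeds."""
    state = {"left": failures}

    def send(message):
        if state["left"] > 0:
            state["left"] -= 1
            return False
        return True

    return send


pending = []
delivered_on = notify_best_effort(make_flaky_sender(2), "order paid", pending)
gave_up = notify_best_effort(make_flaky_sender(5), "order shipped", pending)
```

The first message is delivered on the third attempt; the second exhausts its attempts and ends up in `pending`, where the supplementary mechanism would handle it.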

Origin blog.csdn.net/zhaohong_bo/article/details/90445307