Basic Theory of Distributed Transactions

What is a distributed transaction

A distributed transaction is one in which the transaction participants, the servers that support transactions, the resource servers, and the transaction manager are located on different nodes of a distributed system.

Simply put, one large operation is composed of several small operations. These small operations are spread across different servers and belong to different applications, and the distributed transaction must ensure that they either all succeed or all fail.

In essence, a distributed transaction exists to keep data in different databases consistent.

Why distributed transactions arise

Compared with the local transactions discussed above, distributed transactions arise for two reasons:

  • Services are spread across multiple nodes
  • Resources are spread across multiple nodes

Services on multiple nodes

With the rapid growth of the Internet, service-oriented architectures such as microservices and SOA are now in large-scale use.

Take a simple example: within one company, a user's assets may be divided into several parts, such as balance, points, and coupons.

Inside the company, the points feature may be maintained by one team's microservice while coupons are maintained by another team.
In this case there is no guarantee that, after the points have been deducted, the coupon deduction will also succeed.
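The problem can be sketched in a few lines. This is a minimal in-process illustration, not a real service call: `PointsService`, `CouponService`, and `pay` are hypothetical names standing in for two independently maintained microservices.

```python
class InsufficientCoupons(Exception):
    """Raised by the (hypothetical) coupon service when it cannot deduct."""


class PointsService:
    def __init__(self, balance):
        self.balance = balance

    def deduct(self, amount):
        if amount > self.balance:
            raise ValueError("not enough points")
        self.balance -= amount  # takes effect immediately in this service


class CouponService:
    def __init__(self, coupons):
        self.coupons = coupons

    def deduct(self, count):
        if count > self.coupons:
            raise InsufficientCoupons("not enough coupons")
        self.coupons -= count


def pay(points, coupons, point_cost, coupon_cost):
    points.deduct(point_cost)    # step 1 succeeds on its own
    coupons.deduct(coupon_cost)  # step 2 can still fail independently


points = PointsService(balance=100)
coupons = CouponService(coupons=0)

try:
    pay(points, coupons, point_cost=30, coupon_cost=1)
    outcome = "ok"
except InsufficientCoupons:
    outcome = "coupon step failed"

# The points are already gone even though the payment as a whole failed.
print(outcome, points.balance, coupons.coupons)
```

Nothing here undoes the first step when the second one fails, which is exactly the gap a distributed transaction has to close.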

Resources on multiple nodes

Similarly, because the Internet has grown so quickly, once a MySQL instance holds enough data it generally has to be split into multiple databases and tables (sharding).

Take an Alipay transfer: when you send money to a friend, your account data may well live in a database in Beijing while your friend's lives in Shanghai, so again there is no guarantee that both updates succeed together.
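A rough sketch of the transfer, using two in-memory SQLite databases as stand-ins for the Beijing and Shanghai databases. The table layout, the `transfer` helper, and the simulated failure are assumptions made for illustration only. Each database commits its own local transaction, so a failure between the two commits leaves the money debited but never credited.

```python
import sqlite3


def make_db(owner, balance):
    """Create an independent database holding a single account."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (owner TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO account VALUES (?, ?)", (owner, balance))
    conn.commit()
    return conn


def balance(conn, owner):
    row = conn.execute(
        "SELECT balance FROM account WHERE owner = ?", (owner,)
    ).fetchone()
    return row[0]


beijing = make_db("you", 100)      # your account
shanghai = make_db("friend", 50)   # your friend's account


def transfer(amount, fail_credit=False):
    # Local transaction 1: debit in the Beijing database.
    beijing.execute(
        "UPDATE account SET balance = balance - ? WHERE owner = ?", (amount, "you")
    )
    beijing.commit()

    # Local transaction 2: credit in the Shanghai database.
    try:
        if fail_credit:
            raise ConnectionError("Shanghai database unreachable (simulated)")
        shanghai.execute(
            "UPDATE account SET balance = balance + ? WHERE owner = ?",
            (amount, "friend"),
        )
        shanghai.commit()
    except ConnectionError:
        shanghai.rollback()
        return False
    return True


ok = transfer(30, fail_credit=True)
print(ok, balance(beijing, "you"), balance(shanghai, "friend"))
```

Each connection is ACID on its own, but no single commit covers both databases.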

Theoretical basis of distributed transactions

As the discussion above shows, distributed transactions emerged alongside the rapid growth of the Internet; they are a necessity.

The four ACID properties of databases described earlier can no longer meet the needs of distributed transactions, so new theories were proposed.


CAP

The CAP theorem is also known as Brewer's theorem. For architects designing distributed systems (not just distributed transactions), CAP is the introductory theory.

Consistency: for a given client, a read operation is guaranteed to return the result of the latest write.

For data distributed across different nodes, if the data is updated on one node and every other node can then read the updated value, this is called strong consistency. If some node cannot read it, the distributed system is inconsistent.
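As a small sketch of this definition, the toy store below keeps one dictionary per node. `ReplicatedStore` and its `replicate` flag are hypothetical, chosen only to contrast a write that reaches every node with one that reaches a single node.

```python
class ReplicatedStore:
    """Toy model: each node is a plain dict holding its own copy of the data."""

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]

    def write(self, key, value, replicate=True):
        # With replicate=True the write returns only after every node has it.
        # With replicate=False only node 0 is updated.
        targets = self.nodes if replicate else self.nodes[:1]
        for node in targets:
            node[key] = value

    def read(self, key, node_index):
        return self.nodes[node_index].get(key)


store = ReplicatedStore(node_count=2)

store.write("x", 1)                     # reaches both nodes
consistent_read = store.read("x", 1)    # node 1 sees the update

store.write("x", 2, replicate=False)    # only node 0 is updated
stale_read = store.read("x", 1)         # node 1 still returns the old value

print(consistent_read, stale_read)
```

The second read is the inconsistent case described above: the data was updated on one node, but another node does not return it.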

Availability: a non-failing node returns a reasonable response (not an error and not a timeout) within a reasonable time. The two keys to availability are a reasonable time and a reasonable response.

A reasonable time means the request must not block indefinitely and should return within a reasonable period. A reasonable response means the system should return a definite result and that result should be correct: for example, if the right answer is 500, it should return 500 rather than 400.

Partition tolerance: when a network partition occurs, the system can continue to work. For example, in a cluster of several machines, one machine may have network problems, but the cluster as a whole still works properly.

Anyone familiar with CAP knows that the three properties cannot all be satisfied at once. If you are interested, you can look up the proof of the CAP theorem. In a distributed system the network can never be 100% reliable, so partitions are in fact inevitable.

Suppose we choose CA and give up P. When a partition occurs, requests must be rejected in order to preserve consistency, but A does not allow that. So a distributed system cannot, in theory, choose a CA architecture; it can only choose CP or AP.
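The trade-off can be made concrete with a toy two-node cluster. This is only a sketch under simplifying assumptions: `TwoNodeCluster`, `Unavailable`, and the `partitioned` flag are hypothetical, and real systems detect partitions and coordinate writes in far more involved ways.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request in order to protect consistency."""


class TwoNodeCluster:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = {"a": None, "b": None}
        self.partitioned = False

    def write(self, node, value):
        if not self.partitioned:
            # Healthy network: the write reaches both nodes.
            for name in self.values:
                self.values[name] = value
            return
        if self.mode == "CP":
            # Keep the nodes consistent by refusing the write.
            raise Unavailable("cannot reach peer; write rejected")
        # AP: stay available, accept the write locally, allow divergence.
        self.values[node] = value

    def is_consistent(self):
        return self.values["a"] == self.values["b"]


def write_during_partition(mode):
    cluster = TwoNodeCluster(mode)
    cluster.write("a", 1)
    cluster.partitioned = True
    try:
        cluster.write("a", 2)
        accepted = True
    except Unavailable:
        accepted = False
    return accepted, cluster.is_consistent()


cp_result = write_during_partition("CP")  # (accepted, consistent)
ap_result = write_during_partition("AP")
print(cp_result, ap_result)
```

In CP mode the write is rejected and the nodes stay in agreement; in AP mode the write is accepted and the nodes diverge until the partition heals.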

CP gives up availability in pursuit of consistency and partition tolerance. ZooKeeper, for example, pursues strong consistency.

AP gives up consistency (here meaning strong consistency) in pursuit of partition tolerance and availability. This is the choice made in many distributed system designs, and the BASE theory described below is an extension of AP.

Note that CAP as a theory ignores network latency: it assumes that when a transaction commits, the data is copied from node A to node B with no delay. In reality this is clearly impossible, so there is always some window of time during which the data is inconsistent.

Also, choosing two of the three in CAP does not mean abandoning the third entirely. If you choose CP, for example, that does not mean you give up A. Because partitions are rare, most of the time you still need to guarantee both C and A.

Even when a partition does occur, you should prepare to restore A afterwards, for example by keeping logs that allow the other machines to be brought back to an available state.


BASE

BASE is an abbreviation of three phrases: Basically Available, Soft state, and Eventually consistent. It is an extension of the AP option in CAP.

Basically Available: when a failure occurs, the distributed system is allowed to lose part of its functionality, as long as the core functions remain available.
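One common way to read "basically available" is graceful degradation. The sketch below is hypothetical (`render_product_page` and the two service callables are invented for illustration): the core function, the price, must succeed, while a failing non-core function, recommendations, is dropped instead of failing the whole request.

```python
def render_product_page(product_id, price_service, recommendation_service):
    # Core function: if this fails, the whole request fails.
    page = {"product_id": product_id, "price": price_service(product_id)}

    # Non-core function: on failure, degrade rather than propagate the error.
    try:
        page["recommendations"] = recommendation_service(product_id)
        page["degraded"] = False
    except ConnectionError:
        page["recommendations"] = []
        page["degraded"] = True
    return page


def price_service(product_id):
    return 42  # stand-in for a healthy core service


def broken_recommendations(product_id):
    raise ConnectionError("recommendation service is down (simulated)")


page = render_product_page("sku-1", price_service, broken_recommendations)
print(page)
```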

Soft state: the system is allowed to have intermediate states that do not affect its availability. This corresponds to the inconsistency in CAP.

Eventually consistent: over time, all nodes will end up holding the same data.

BASE addresses the network latency that CAP does not: through soft state and eventual consistency, BASE guarantees consistency after a delay.
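Soft state and eventual consistency can be sketched with a queue of replication messages that have not been delivered yet. `EventuallyConsistentStore` and its methods are hypothetical names; the queue stands in for network delay between nodes.

```python
from collections import deque


class EventuallyConsistentStore:
    def __init__(self, replica_count):
        self.replicas = [dict() for _ in range(replica_count)]
        self.pending = deque()  # undelivered (replica_index, key, value) updates

    def write(self, key, value):
        # The write is acknowledged after updating replica 0 only.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def replicate_once(self):
        """Deliver one pending update; return False when nothing is left."""
        if not self.pending:
            return False
        index, key, value = self.pending.popleft()
        self.replicas[index][key] = value
        return True

    def converged(self):
        return all(replica == self.replicas[0] for replica in self.replicas)


store = EventuallyConsistentStore(replica_count=3)
store.write("balance", 70)

soft_state = store.converged()   # replicas disagree while updates are in flight

while store.replicate_once():    # time passes, updates are delivered
    pass

final_state = store.converged()  # all replicas now hold the same data
print(soft_state, final_state)
```

Between the write and the last delivery the store is in a soft state; once the queue drains, the replicas agree.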

BASE is the opposite of ACID. It is a completely different model from ACID's strong consistency: it gains availability by sacrificing strong consistency, allowing data to be inconsistent for a period of time as long as it eventually reaches a consistent state.

Origin blog.csdn.net/Code_shadow/article/details/100069189