Detailed explanation of CAP, ACID, BASE theory and NWR practice strategy

[ACID]
       In traditional database systems, transactions have four properties, known collectively as ACID (Jim Gray discusses transactions at length in Transaction Processing: Concepts and Techniques).

  • Atomicity: A transaction is an atomic unit of work; either all of its changes to the data are performed or none are.
  • Consistency: Data must be in a consistent state both when a transaction starts and when it completes. All relevant data rules must be applied to the transaction's modifications to preserve data integrity, and at the end of the transaction all internal data structures (such as B-tree indexes or doubly linked lists) must also be correct.
  • Isolation: The database system provides isolation mechanisms so that each transaction runs in an "independent" environment, unaffected by concurrent operations outside it. The intermediate states of a transaction are invisible to the outside world, and vice versa.
  • Durability: Once a transaction completes, its modifications to the data are permanent, even in the event of a system failure.
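
As a rough illustration of atomicity and consistency working together, the following sketch uses Python's built-in sqlite3 module. The `accounts` table, the `make_db`, `transfer`, and `balances` helpers, and the non-negative balance rule are all assumptions made up for this example; they are not taken from any system discussed here.

```python
import sqlite3


def make_db():
    """Create a hypothetical in-memory bank with two accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"  # the data rule
    )
    with conn:
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)",
            [("alice", 100), ("bob", 50)],
        )
    return conn


def transfer(conn, src, dst, amount):
    """Credit dst, then debit src, as one transaction."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK rule, so the earlier credit was undone too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

With this setup, `transfer(conn, "alice", "bob", 500)` should fail on the debit, and the credit that had already been applied to `bob` is rolled back with it, so neither balance changes.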

       For transactions on a single node, the database guarantees ACID through concurrency control (two-phase locking or multiversioning) and a recovery mechanism (logging). For distributed transactions that span multiple nodes, ACID is guaranteed through the two-phase commit protocol.
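
The two-phase commit idea can be sketched in a few lines. This is a minimal in-memory model, not a real protocol implementation: the `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names, and timeouts, logging, and coordinator failure are deliberately left out.

```python
class Participant:
    """Hypothetical node taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # assumption: simulates a "no" vote
        self.committed = {}           # state visible after commit
        self.staged = None            # prepared but not yet committed writes

    def prepare(self, writes):
        # Phase 1: stage the writes and vote yes, or vote no.
        if not self.can_commit:
            return False
        self.staged = dict(writes)
        return True

    def commit(self):
        # Phase 2 (commit): make the staged writes visible.
        if self.staged is not None:
            self.committed.update(self.staged)
            self.staged = None

    def abort(self):
        # Phase 2 (abort): discard anything that was staged.
        self.staged = None


def two_phase_commit(participants, writes):
    """Coordinator: commit everywhere only if every participant votes yes."""
    votes = [p.prepare(writes) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

A single "no" vote in the first phase aborts the transaction on every participant, which is how the global transaction stays atomic in this model.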

       It can be said that database systems developed rapidly in response to the needs of the financial industry. For the financial industry, neither availability nor performance is the most important requirement; consistency is. Users can tolerate system failures and service outages, but they will not tolerate money unaccountably disappearing from their accounts (unaccountable increases, of course, are fine). Strongly consistent transactions are the fundamental guarantee behind all of this.


[Data Replication]
       Data replication belongs to the field of distributed computing. It is not limited to databases, but here it mainly refers to replication in distributed databases.

       In a distributed database system composed of multiple replicas, transactions differ from those of a single database system mainly in atomicity and consistency. For atomicity, all operations of a distributed transaction must be either committed or rolled back on every relevant replica; that is, beyond the atomicity of each local transaction, the atomicity of the global transaction must also be controlled. For consistency, the consistency that a single copy provides must be maintained across multiple copies.

       Over nearly 20 years of research, a variety of replication protocols have been proposed to address these core issues of atomicity and consistency in distributed transactions. These protocols differ greatly in both external functions and internal implementations, so we can classify and explain them along these two dimensions.

       From the perspective of external functions, according to the literature [1], protocols can be classified along two dimensions: where and when a transaction is executed. By where the transaction is executed, they fall into two categories: the master-slave (Primary/Copy) method and the Update-Anywhere method.

       In the former, only one Primary node in the system is designated to accept update requests. After the transaction's operations complete, they are broadcast to the other Copy nodes either before or after the transaction is committed.

       The latter is slightly more complicated. Every replica in the system has equal status and can receive update requests; the updates made at each node are propagated to the other replica nodes either before or after transaction conflict detection and transaction commit.

       Concurrency control in Primary/Copy mode is relatively simple and can be handled by the Primary's local transaction control. Transaction atomicity is also relatively simple to achieve, generally with the Primary node acting as the coordinator. Its drawback is just as obvious: only a single node can process update requests, so it easily becomes a single-point performance bottleneck for update-intensive applications such as OLTP. The Update-Anywhere method is the complement: it can improve transaction throughput by accepting updates at multiple points, but it brings complex concurrency control and atomicity issues among multiple distributed transactions.
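
To see why Update-Anywhere needs conflict detection, consider this small sketch. It assumes a simple per-key version counter as the detection rule; the `Node` class and its `local_update` and `receive` methods are hypothetical, and real systems use richer schemes.

```python
class Node:
    """Hypothetical Update-Anywhere replica with per-key version numbers."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (value, version)

    def local_update(self, key, value):
        # Accept the update locally and record which version it was based on.
        _, version = self.store.get(key, (None, 0))
        self.store[key] = (value, version + 1)
        return {"key": key, "value": value, "base": version, "origin": self.name}

    def receive(self, update):
        # Apply a propagated update only if it was based on our current version.
        _, version = self.store.get(update["key"], (None, 0))
        if version != update["base"]:
            return False  # conflict: someone else updated the key concurrently
        self.store[update["key"]] = (update["value"], version + 1)
        return True


a, b = Node("a"), Node("b")
first = a.local_update("x", 1)
applied = b.receive(first)          # no concurrent write, so this applies

from_a = a.local_update("x", 2)     # both nodes now update "x" concurrently
from_b = b.local_update("x", 3)
conflict_at_b = not b.receive(from_a)
conflict_at_a = not a.receive(from_b)
```

Both propagated updates are rejected in the concurrent case, and some resolution policy then has to decide which value wins. With a single Primary this situation cannot arise, because all updates are ordered at one node.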

       By when updates are propagated relative to transaction commit, protocols fall into two categories: eager and lazy. Eager protocols propagate updates before the transaction commits, while lazy protocols propagate the transaction's operations to the other replicas after the commit. The former is usually called synchronous replication and the latter asynchronous replication.

       The advantage of asynchronous replication is a faster response, at the cost of consistency; algorithms that implement this type of protocol generally need additional compensation mechanisms. The advantage of synchronous replication is that consistency can be guaranteed (generally through a two-phase commit protocol), but the overhead is high, availability suffers (see the CAP section), and it produces more conflicts and deadlocks. It is worth noting that the Lazy + Primary/Copy replication protocol is very practical in real production environments, and MySQL replication belongs to this category.
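
The difference between eager and lazy propagation in a Primary/Copy setup can be sketched as follows. This is a toy in-memory model under stated assumptions: the `Primary` and `Replica` classes, the `mode` parameter, and the `ship_backlog` method are hypothetical names, and it does not describe how MySQL or any other product implements replication.

```python
from collections import deque


class Replica:
    """Hypothetical Copy node that only applies changes sent by the Primary."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value


class Primary:
    """Hypothetical Primary node; the only node that accepts updates."""

    def __init__(self, replicas, mode="lazy"):
        if mode not in ("eager", "lazy"):
            raise ValueError("mode must be 'eager' or 'lazy'")
        self.data = {}
        self.replicas = replicas
        self.mode = mode
        self.backlog = deque()  # changes committed locally but not yet shipped

    def write(self, key, value):
        if self.mode == "eager":
            # Synchronous: every replica applies the change before the commit.
            for replica in self.replicas:
                replica.apply(key, value)
            self.data[key] = value
        else:
            # Asynchronous: commit locally first, propagate later.
            self.data[key] = value
            self.backlog.append((key, value))

    def ship_backlog(self):
        # Lazy propagation step, run some time after the commit.
        while self.backlog:
            key, value = self.backlog.popleft()
            for replica in self.replicas:
                replica.apply(key, value)
```

In lazy mode a read from a replica between `write` and `ship_backlog` returns stale data, which is the consistency that asynchronous replication gives up in exchange for a faster response.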


[CAP]
       At the PODC (Principles of Distributed Computing) conference in 2000, Brewer proposed the famous CAP theory. In 2002, Seth Gilbert and Nancy Lynch proved it. CAP stands for Consistency, Availability, and Partition Tolerance.

  • Consistency: Consistency refers to the atomicity of data. In classic databases this atomicity is guaranteed by transactions: when a transaction completes, whether it succeeds or is rolled back, the data is in a consistent state. In a distributed environment, consistency refers to whether the data on multiple nodes is the same.
  • Availability: Availability means the service is always available: when a user sends a request, the service returns a result within a bounded time.
  • Partition Tolerance: Partition refers to a network partition, where groups of nodes can no longer communicate with each other; a partition-tolerant system keeps working when this happens. One way to picture it is that key data and services are generally located in different IDCs, and the links between them can fail.

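       CAP theory tells us that a distributed system cannot satisfy consistency, availability, and partition tolerance all at once; at most two of the three can be met at the same time. As the Chinese saying goes, you cannot have both the fish and the bear's paw. For a distributed data system, partition tolerance is a basic requirement; otherwise it could not be called a distributed system. Architects should therefore not waste effort trying to design a perfect distributed system that satisfies all three, and should instead make trade-offs. This also means that designing a distributed system is a process of finding a balance between C (consistency) and A (availability) according to the characteristics of the business, which requires architects to truly understand the system's requirements and the nature of the business.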
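
The trade-off between C and A during a partition can be made concrete with a toy two-node store. This is only a sketch under simplifying assumptions: the `TwoNodeStore` class, its `mode` values "CP" and "AP", and the `Unavailable` exception are hypothetical names invented for this example.

```python
class Unavailable(Exception):
    """Raised when a consistency-first store refuses to serve a request."""


class TwoNodeStore:
    """Hypothetical store with two replicas and a simulated network link."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = [{}, {}]
        self.partitioned = False  # True when the two nodes cannot talk

    def write(self, node, key, value):
        if not self.partitioned:
            # Healthy network: update both replicas together.
            for replica in self.nodes:
                replica[key] = value
            return
        if self.mode == "CP":
            # Choose consistency: reject the write rather than diverge.
            raise Unavailable("peer unreachable, write refused")
        # Choose availability: accept locally and let the replicas diverge.
        self.nodes[node][key] = value

    def read(self, node, key):
        return self.nodes[node].get(key)
```

During a partition the "CP" store stays consistent but stops accepting writes, while the "AP" store keeps answering but the two nodes may return different values for the same key.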


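[BASE]
        BASE comes from the practice of Internet e-commerce. It evolved gradually from CAP theory, and its core idea is that even when strong consistency cannot be achieved, an application can use an approach suited to its own characteristics to reach eventual consistency. BASE is short for Basically Available, Soft state, and Eventually consistent, and is an extension of C and A in CAP. The meaning of BASE: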

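  • Basically Available: the system is basically available;
  • Soft state: soft state / flexible transactions, meaning that state may be out of sync for a period of time;
  • Eventual consistency: the data eventually becomes consistent.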

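        BASE is the opposite of ACID and is completely different from the ACID model: it sacrifices strong consistency to obtain basic availability and flexible reliability, while requiring that eventual consistency be reached.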
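
Soft state and eventual consistency can be illustrated with a tiny sketch. It assumes last-writer-wins by timestamp as the reconciliation rule, which is only one of several possible choices; the `Replica` class and the `anti_entropy` function are hypothetical names.

```python
class Replica:
    """Hypothetical replica that keeps the newest value seen for each key."""

    def __init__(self):
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Last-writer-wins: keep the write only if it is newer than ours.
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Exchange state in both directions so the two replicas converge."""
    for key, (timestamp, value) in list(a.store.items()):
        b.write(key, value, timestamp)
    for key, (timestamp, value) in list(b.store.items()):
        a.write(key, value, timestamp)


a, b = Replica(), Replica()
a.write("cart", ["book"], timestamp=1)
b.write("cart", ["book", "pen"], timestamp=2)
before = (a.read("cart"), b.read("cart"))  # soft state: replicas disagree
anti_entropy(a, b)
after = (a.read("cart"), b.read("cart"))   # both hold the newest value
```

Between the two writes and the `anti_entropy` call the replicas are out of sync, which corresponds to the soft state; once the exchange has run they agree again.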

       CAP and BASE theory form the theoretical foundation of NoSQL, which is currently very popular in the Internet field.

 

http://my.oschina.net/moooofly/blog/113790

http://my.oschina.net/tantexian/blog/654112
