Distributed systems: a theoretical progression from ACID to CAP to BASE

A distributed system is what you turn to when a single, locally integrated solution can no longer meet business needs in terms of hardware or resources: you adopt a multi-node design whose resources can be expanded. Distributed computing studies how to divide a problem that requires very large computing power into many small parts, distribute those parts to multiple computers for processing, and finally combine the partial results into the final result.

So before we look at distributed systems, we should start from the monolithic structure.

1. From local transactions to distributed theory

Before you can understand distributed systems, you need to understand one concept: the "transaction".

A transaction provides a mechanism for grouping all the operations involved in an activity into one indivisible unit of execution. The operations that make up a transaction are committed only if every one of them executes successfully; if any operation fails, the entire transaction is rolled back.

Simply put, transactions provide an "all or nothing" mechanism.

2. ACID theory

Transactions operate on data, and that data is usually stored in a database. So when introducing transactions we have to introduce the properties of database transactions. ACID is the abbreviation for the four basic properties that a correctly executed database transaction must have:

  • Atomicity

  • Consistency

  • Isolation

  • Durability

(1) Atomicity

All operations in a transaction either complete or do not happen at all; a transaction can never stop partway through.

For example: a bank transfer of 100 yuan from account A to account B takes two steps:

A. Withdraw 100 yuan from account A

B. Deposit 100 yuan into account B

These two steps must either both complete or both not happen. If only the first step completes and the second fails, 100 yuan inexplicably disappears.
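The transfer example can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the `account` table, the `transfer` helper, and the starting balances are hypothetical names and values chosen for the example. The `with conn:` block commits when both updates succeed and rolls back if either one raises.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic unit (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            deposited = conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            ).rowcount
            if deposited != 1:
                # Step B failed, so step A must be undone as well.
                raise ValueError(f"unknown account: {dst}")
        return True
    except ValueError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

ok = transfer(conn, "A", "B", 100)           # both steps succeed
failed = transfer(conn, "B", "nobody", 50)   # deposit fails, withdrawal is rolled back
print(ok, failed, balances(conn))
```

After the failed second transfer, account B still holds the full 100 yuan, because the withdrawal was rolled back together with the failed deposit.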

(2) Consistency

Before the transaction starts and after the transaction ends, the consistency constraints of the database data are not broken.

For example: given the integrity constraint A+B=100, if a transaction changes A, it must also change B so that A+B=100 still holds after the transaction ends; otherwise the transaction fails.
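One way to picture the A+B=100 constraint is a SQLite CHECK constraint. This is a sketch under assumptions: the single-row `pair` table and the `try_update` helper are hypothetical, and real schemas usually express such invariants differently. An update that changes only A violates the constraint and is rejected; an update that changes A and B together is accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pair ("
    " id INTEGER PRIMARY KEY,"
    " a INTEGER NOT NULL,"
    " b INTEGER NOT NULL,"
    " CHECK (a + b = 100))"  # the integrity constraint A+B=100
)
conn.execute("INSERT INTO pair (id, a, b) VALUES (1, 60, 40)")
conn.commit()

def try_update(sql):
    """Run one update in its own transaction; report whether it was accepted."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

only_a = try_update("UPDATE pair SET a = a - 10 WHERE id = 1")
both = try_update("UPDATE pair SET a = a - 10, b = b + 10 WHERE id = 1")
final = conn.execute("SELECT a, b FROM pair WHERE id = 1").fetchone()
print(only_a, both, final)
```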

(3) Isolation

The database allows multiple concurrent transactions to read and modify data at the same time. If the data one transaction accesses is being modified by another transaction, then as long as that other transaction has not committed, the data the first transaction sees is unaffected by the uncommitted changes. Isolation prevents the data inconsistency that interleaved execution of concurrent transactions would otherwise cause.

For example: a transaction is transferring 100 yuan from account A to account B. If B checks the account before that transaction has completed, B will not see the newly added 100 yuan.
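The isolation example can be observed with two SQLite connections to the same file. This is a minimal sketch, assuming SQLite's default rollback-journal mode and a hypothetical `account` table; other databases and isolation levels behave differently in the details. The reader connection plays the role of B checking the balance.

```python
import os
import sqlite3
import tempfile

def demo_isolation():
    """Return B's balance as seen by a second connection before and after commit."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bank.db")
        writer = sqlite3.connect(path)
        reader = sqlite3.connect(path)
        try:
            writer.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
            writer.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
            writer.commit()

            query = "SELECT balance FROM account WHERE name = 'B'"
            # The transfer has started but is not committed yet.
            writer.execute("UPDATE account SET balance = balance - 100 WHERE name = 'A'")
            writer.execute("UPDATE account SET balance = balance + 100 WHERE name = 'B'")
            before_commit = reader.execute(query).fetchall()[0][0]

            writer.commit()
            after_commit = reader.execute(query).fetchall()[0][0]
            return before_commit, after_commit
        finally:
            reader.close()
            writer.close()

before_commit, after_commit = demo_isolation()
print(before_commit, after_commit)
```

The reader sees the old balance while the transfer is uncommitted and the new balance only after the commit.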

(4) Durability

Once a transaction has completed, its changes to the data are permanent and are not lost even if the system fails.

ACID for local transactions can be summed up in a few words: "commit together, roll back on failure". It strictly guarantees the consistency of data within a single transaction.

Distributed transactions cannot achieve ACID in this strict form because of the constraints of CAP theory. To see how far the properties above can be guaranteed in a distributed system, we turn to the famous CAP theory.


3. CAP Theory

When designing a large-scale, scalable network service, you run into three desirable properties: consistency, availability, and partition tolerance.

The CAP theorem says that a distributed computer system cannot provide all three guarantees of consistency, availability, and partition tolerance at the same time; at most two can be satisfied.

As shown in the figure above, only two of the three CAP properties can hold at once, and mature distributed products exist for each pairwise combination.

Next, let's introduce the three characteristics of CAP. We use an application scenario to analyze the meaning of each characteristic in CAP.

The scenario as a whole consists of five processes:

Process 1. The client sends a request (for example: add, modify, or delete an order)

Process 2. The web business layer processes the request and modifies and stores the resulting data

Process 3. The Master and Backup in the storage layer synchronize data

Process 4. The web business layer retrieves data from the storage layer

Process 5. The web business layer returns data to the client

(1) Consistency

All nodes see the same data at the same time.

Once a data update has completed and success has been returned to the client, all nodes in the distributed system hold exactly the same data at any given moment.

Consistency in CAP also comes in several levels, such as strong consistency, weak consistency, and eventual consistency, which later chapters introduce.

Consistency means that a read performed after a write returns the latest state of the data. When the data is distributed across multiple nodes, a read from any node returns the latest state.

Goals of consistency:
  • If the web business layer writes to the Master successfully, reading the new data from the Backup also succeeds.

  • If the web business layer fails to write to the Master, reading the new data from the Backup also fails.

How to achieve it:

After a write to the Master, lock the Backup while the data is being synchronized to it, and release the lock once synchronization completes, so that a query against the Backup cannot return old data after the new data has been written successfully.

Characteristics of distributed consistency:
  1. Because of the data synchronization step, write operations respond with some delay.
  2. To guarantee consistency, resources are temporarily locked and are released once synchronization completes.
  3. A request to a node whose data synchronization failed returns an error; it never returns old data.
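The lock-while-synchronizing idea can be sketched as a toy in-process model. Everything here is an assumption made for illustration: `SyncReplicatedStore`, its `link_up` switch, and the in-memory dictionaries stand in for a real Master, a real Backup, and the network between them.

```python
import threading

class SyncReplicatedStore:
    """Toy Master/Backup pair with synchronous replication (hypothetical model)."""

    def __init__(self):
        self.master = {}
        self.backup = {}
        self._backup_lock = threading.Lock()
        self.link_up = True  # hypothetical switch simulating the Master-Backup link

    def write(self, key, value):
        # Holding the lock blocks Backup reads until synchronization finishes,
        # so a reader can never observe old data after a successful write.
        with self._backup_lock:
            if not self.link_up:
                # Synchronization cannot complete: report an error, keep old state.
                raise ConnectionError("sync to Backup failed; write rejected")
            self.master[key] = value
            self.backup[key] = value

    def read_from_backup(self, key):
        with self._backup_lock:
            return self.backup.get(key)

store = SyncReplicatedStore()
store.write("order:1", "created")
print(store.read_from_backup("order:1"))

store.link_up = False
try:
    store.write("order:1", "paid")
    rejected = False
except ConnectionError:
    rejected = True
print(rejected, store.master["order:1"])
```

The cost is visible in the design: every write waits for the Backup, and a broken link turns writes into errors instead of letting the two copies diverge.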

(2) Availability

Reads and writes always succeed

Service is always available with normal response times.

The metrics for availability are as follows:

Availability class                        Availability (%)    Tolerable downtime per year
Fault-tolerant availability               99.9999             < 1 min
Very high availability                    99.999              < 5.3 min
Availability with failover capability     99.99               < 53 min
High availability                         99.9                < 8.8 h
Commodity availability                    99                  < 87.6 h (about 3.65 days)
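The tolerable downtime follows directly from the availability percentage. The short calculation below is a sketch that assumes a 365-day year (leap years ignored); the function name is chosen only for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year

def yearly_downtime_minutes(availability_percent):
    """Maximum downtime per year, in minutes, for a given availability level."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)

for level in (99.9999, 99.999, 99.99, 99.9, 99):
    minutes = yearly_downtime_minutes(level)
    print(f"{level}% -> {minutes:.2f} min/year ({minutes / 60:.2f} h)")
```

For instance, 99% availability leaves 1% of 525,600 minutes, which is 5,256 minutes or 87.6 hours per year.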
Availability goals:
  • While the Master is being updated, the Backup responds to a query immediately on receiving it.

  • The Backup must never respond with a timeout or an error.

How to achieve it:
  1. After a write to the Master, synchronize the data to the Backup.
  2. To keep the Backup available, do not lock its resources.
  3. Even if the data has not been synchronized yet, the Backup must still answer the query, with old data or default data if necessary; it must not return an error or time out.

Characteristics of distributed availability:

Every request receives a response, with no timeouts and no errors.

(3) Partition tolerance

The system continues to operate despite arbitrary message loss or failure of part of the system.

In a distributed system, the system should continue to run despite any message loss or the failure of some nodes.

Usually, the nodes of a distributed system are deployed on different subnets, and each subnet is a network partition. Network problems will inevitably cause communication failures between nodes, and the system must still be able to serve external requests when that happens.

Goals of partition tolerance:
  • A failure to synchronize data from the Master to the Backup does not affect reads and writes.

  • The failure of one node does not affect the service provided by the other nodes.

How to achieve it:
  1. Prefer asynchronous over synchronous operations where possible, for example by synchronizing data from the Master to the Backup asynchronously, so that nodes are loosely coupled.
  2. Add more Backup nodes, so that when one Backup goes down the others can continue to provide service.

Characteristics of partition tolerance:

Partition tolerance is a basic capability of distributed systems.
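The two measures above, asynchronous synchronization and multiple Backup nodes, can be sketched as a toy model. The class name, the per-Backup pending queues, and the `backup_up` flags are all assumptions made for the example, not a real replication protocol.

```python
from collections import deque

class AsyncReplicatedStore:
    """Toy Master with asynchronously replicated Backups (hypothetical model)."""

    def __init__(self, backup_count=2):
        self.master = {}
        self.backups = [{} for _ in range(backup_count)]
        self.backup_up = [True] * backup_count                  # simulated node health
        self.pending = [deque() for _ in range(backup_count)]   # per-Backup update log

    def write(self, key, value):
        # The write succeeds as soon as the Master has it; sync happens later.
        self.master[key] = value
        for log in self.pending:
            log.append((key, value))

    def replicate(self):
        # Deliver queued updates to every Backup that is currently reachable.
        for i, log in enumerate(self.pending):
            while self.backup_up[i] and log:
                key, value = log.popleft()
                self.backups[i][key] = value

    def read(self, key):
        # Serve the read from the first Backup that is still up.
        for i, backup in enumerate(self.backups):
            if self.backup_up[i]:
                return backup.get(key)
        raise ConnectionError("no Backup node available")

store = AsyncReplicatedStore()
store.write("order", 1)
store.replicate()

store.backup_up[0] = False       # one Backup goes down
store.write("order", 2)          # the write is unaffected
store.replicate()
served = store.read("order")     # answered by the surviving Backup
stale = store.backups[0]["order"]

store.backup_up[0] = True        # the node comes back and catches up
store.replicate()
caught_up = store.backups[0]["order"]
print(served, stale, caught_up)
```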

4. CAP's "2 out of 3" proof

(1) Basic scene

This section explains why the three CAP properties cannot all be satisfied at the same time.

The figure above shows the basic scenario we use to prove CAP. The distributed network contains two nodes, Host1 and Host2, with a working network connection between them. Host1 runs the program Process1 and its database Data; Host2 runs the program Process2 and its database Data.

(2) CAP features

If consistency (C) is satisfied: the Data on Host1 and the Data on Host2 are identical, that is, Data(0) = Data(0).

If availability (A) is satisfied: whether the user sends a request to Host1 or to Host2, a result is returned immediately.

If partition tolerance (P) is satisfied: if either Host1 or Host2 drops out of the system (fails), the other host continues to operate normally.

(3) The normal operation process of the distributed system

As shown in the figure above, this is how the distributed system operates normally.

A. The user sends a data update request to Host1, and Process1 updates the database from Data(0) to Data(1)

B. The distributed system synchronizes the data: Data(1) on Host1 is copied to Host2, so that the data on Host2 also changes from Data(0) to Data(1)

C. When the user sends a request to Host2, Process2 responds with the latest data, Data(1)

In terms of the CAP properties:

  • Whether the Data in Host1's database and in Host2's database is the same is consistency (C)

  • Whether Host1 and Host2 respond to user requests is availability (A)

  • Whether the system tolerates problems in the network between Host1 and Host2 is partition tolerance (P)

This is the normal operation process. Here all three CAP properties can be satisfied at the same time, which is the ideal state. In real applications, however, failures are unavoidable. When one occurs, can all of CAP still be satisfied, and if not, how should we choose?


(4) Abnormal operation process of distributed system

Suppose the network between Host1 and Host2 is disconnected. We need to tolerate this kind of network failure, which means satisfying partition tolerance (P). Can we then satisfy consistency (C) and availability (A) at the same time?

Suppose that while the network between Host1 and Host2 is disconnected:

A. The user sends a data update request to Host1, and the data on Host1 is updated from Data(0) to Data(1)

B. Because the network is disconnected at this point, the distributed system's synchronization fails, and the data on Host2 is still Data(0)

C. A user sends a data read request to Host2. Since the data has not been synchronized, Process2 cannot immediately return the latest data, Data(1), so it has two choices.

First, sacrifice consistency (C) and respond to the user with the old data, Data(0);

Second, sacrifice availability (A): block and wait until the network connection is restored and data synchronization has completed, then respond to the user with the latest data, Data(1).

This process shows that a distributed system that must satisfy partition tolerance (P) can choose only one of consistency (C) and availability (A).
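The two-host argument can be replayed in a few lines. This is only a toy simulation: the `Cluster` class, its `mode` parameter, and the use of `TimeoutError` to stand in for "block and wait" are assumptions made for illustration.

```python
class Host:
    def __init__(self, name):
        self.name = name
        self.data = 0        # Data(0)
        self.synced = True   # False while this host may be missing an update

class Cluster:
    """Two hosts that choose either consistency (CP) or availability (AP)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.host1, self.host2 = Host("Host1"), Host("Host2")
        self.partitioned = False

    def update_on_host1(self, value):
        self.host1.data = value              # step A
        if self.partitioned:
            self.host2.synced = False        # step B: synchronization fails
        else:
            self.host2.data = value

    def read_from_host2(self):               # step C
        if self.host2.synced:
            return self.host2.data
        if self.mode == "AP":
            return self.host2.data           # sacrifice C: answer with old data
        raise TimeoutError("Host2 is waiting for synchronization")  # sacrifice A

    def heal(self):
        self.partitioned = False
        self.host2.data, self.host2.synced = self.host1.data, True

ap = Cluster("AP")
ap.partitioned = True
ap.update_on_host1(1)
ap_answer = ap.read_from_host2()             # old Data(0), but an immediate answer

cp = Cluster("CP")
cp.partitioned = True
cp.update_on_host1(1)
try:
    cp.read_from_host2()
    cp_blocked = False
except TimeoutError:
    cp_blocked = True                        # no answer until the partition heals
cp.heal()
cp_answer = cp.read_from_host2()             # latest Data(1) after synchronization
print(ap_answer, cp_blocked, cp_answer)
```

With the partition in place, no code path in `read_from_host2` can return Data(1) immediately, which is exactly the choice between C and A described above.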

(5) The inevitability of "2 out of 3"

CAP theory tells us that consistency, availability, and partition tolerance cannot all be satisfied at the same time. So which one should be given up?

CA abandons P:

In a distributed system it is not possible to do without P. Giving up partition tolerance (P) means not partitioning at all and ignoring network failures and node crashes; consistency and availability can then both be achieved, but the system is no longer a standard distributed system. The relational databases we use most often, deployed on a single node, satisfy CA.

With no data synchronization between a master database and a slave database, the database can respond to every query, and through transactions (atomic operations) and isolation levels every query returns the latest data.

Note:

For a distributed system, P is a basic requirement. Of the three CAP properties, the only real trade-off is between C and A, while doing everything possible to strengthen P.

CP abandons A:

If a distributed system does not require strong availability, that is, if the system may be down or unresponsive for long periods, it can guarantee CP and give up A.

Systems that give up availability in pursuit of consistency and partition tolerance include Redis and HBase; ZooKeeper, which is widely used in distributed systems, also prioritizes CP among the three CAP properties.

Scenario:

For an inter-bank transfer, a transfer request is considered complete only after the banking systems of both parties have completed the entire transaction.

AP abandons C:

Give up consistency in pursuit of partition tolerance and availability. This is the choice many distributed system designs make. The precondition for AP is that users can accept queried data being out of date for a certain period of time.

AP implementations usually still guarantee eventual consistency. The BASE theory discussed later is an extension of AP.

Scenario 1:

A Taobao order refund. A refund that succeeds today is credited to the account tomorrow; this is fine as long as users can accept the money arriving within a certain period of time.

Scenario 2:

Buying tickets on 12306. Between availability and consistency, it gives up consistency and chooses availability.

You have probably run into this when buying a ticket on 12306: at purchase time the site shows that tickets are available (although there may actually be none left), and you can enter the verification code and place the order as usual. A while later, the system tells you that the order failed because there were not enough tickets left. The system keeps serving normally in terms of availability and makes some sacrifice in data consistency. This hurts the user experience somewhat, but it does not seriously block the user's flow.

However, saying that many websites sacrifice consistency for availability is not quite accurate. In the ticket example above, only strong consistency is given up; the system settles for the next best thing, eventual consistency. That is, the ticket inventory may be inconsistent at the moment an order is placed, but after a period of time the data is guaranteed to become consistent.

(6) Summary:

CA abandons P: if P is not required (partitions are not allowed), C (strong consistency) and A (availability) can both be guaranteed. Because partitions then never exist, a CA system is more about letting each subsystem still maintain CA after the system is partitioned.

CP abandons A: if A (availability) is not required, every request must be strongly consistent across servers, and a partition (P) can prolong the synchronization time indefinitely; under these conditions CP can be guaranteed. Many traditional distributed database transactions follow this model.

AP abandons C: to be highly available while allowing partitions, consistency must be given up. Once a partition occurs, nodes may lose contact with each other; to stay highly available, each node can serve only its local data, which leads to globally inconsistent data. Many NoSQL systems fall into this category.

5. Thinking

Thinking: How would you design an e-commerce system according to CAP theory?

  • First, the core modules of an e-commerce website include users, orders, products, payment, promotion management, and so on.

1. The user module covers login, personal settings, personal orders, shopping cart, favorites, and so on. These functions guarantee AP; short-lived data inconsistency does not affect their use.
2. The order module's flow of placing an order, paying, and deducting inventory is the core of the entire system and needs CA. In extreme cases, sacrifice A to guarantee C.
3. The product module's listing and delisting of products and its inventory management guarantee CP.
4. Search is not a function with strict real-time requirements, so guaranteeing AP is enough.
5. For promotions, short-lived data inconsistency only means that some discount information is not visible; existing discounts must remain available, and discounts can be calculated in advance, so AP can be guaranteed.
6. Payment is an independent system, or it uses a third party such as Alipay or WeChat, in which case CAP is the third party's responsibility. A payment system places extremely high demands on CAP: C must be guaranteed, and between A and P, A is relatively more important, because a partition must not leave everyone unable to pay.

6. Distributed BASE theory

The three CAP properties cannot all be satisfied at the same time, yet partition tolerance (P) is a must for distributed systems. It would be ideal if a system could still come close to having C, A, and P together, and this is where BASE theory comes from.

(1) BASE theory

General definition

BASE is an abbreviation of three phrases: Basically Available, Soft state, and Eventually consistent.

BASE is the result of trading off consistency against availability in CAP. It is distilled from the distributed-systems practice of large-scale Internet systems and evolved gradually from the CAP theorem. Its core idea is that even when strong consistency cannot be achieved, each application can use an approach suited to its own business characteristics to make the system eventually consistent.

Two opposing concepts: ACID and BASE

ACID is the design philosophy commonly used by traditional databases; its model pursues strong consistency.

BASE was proposed to support large distributed systems; it gains high availability by sacrificing strong consistency.

(2) Basically Available

This really means two compromises.

  • A compromise on response time: normally an online search engine must return query results to users within 0.5 seconds, but when a failure occurs (such as a power outage or network failure in some of the system's data centers), the response time for query results grows to 1~2 seconds.

  • A compromise on functionality: normally, consumers shopping on an e-commerce website (such as Taobao) can complete almost every order smoothly, but during holiday shopping peaks (such as Double Eleven and Double Twelve), some consumers may be directed to a downgrade page.
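The functionality compromise can be sketched as a handler that falls back to a downgrade page. The function names, the `is_overloaded` check, and the response shape are all hypothetical; a real site would base the decision on load metrics, circuit breakers, or similar signals.

```python
def order_page(user_id, render_full_page, is_overloaded):
    """Serve the full page when possible, otherwise a downgrade page."""
    downgrade = {"status": "degraded", "body": "The site is busy, please try again shortly."}
    if is_overloaded():
        return downgrade                    # peak traffic: shed load deliberately
    try:
        return {"status": "ok", "body": render_full_page(user_id)}
    except ConnectionError:
        return downgrade                    # a dependency failed: still respond

def healthy_backend(user_id):
    return f"order form for user {user_id}"

def failing_backend(user_id):
    raise ConnectionError("inventory service unreachable")

normal = order_page(42, healthy_backend, lambda: False)
peak = order_page(42, healthy_backend, lambda: True)
outage = order_page(42, failing_backend, lambda: False)
print(normal["status"], peak["status"], outage["status"])
```

In all three cases the user receives a response, which is what "basically available" asks for; only the quality of that response varies.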

(3) Soft state

  • Atomicity (hard state) -> requires the data copies on multiple nodes to be consistent; this is a "hard state"

  • Soft state (weak state) -> allows data in the system to exist in an intermediate state and considers that this state does not affect the overall availability of the system; that is, it allows delays between the data copies on different nodes.

(4) Eventually consistent

A system cannot stay in the soft state described above forever; there has to be a time limit. Once that limit has passed, all replicas must be guaranteed to hold consistent data, which is how the data reaches eventual consistency. The time limit depends on factors such as network latency, system load, and the design of the data replication scheme.

A slightly more formal statement:

Provided no new updates are made, the system guarantees that the data will eventually reach a consistent state, so every client's access to the system will eventually return the latest value.
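Soft state followed by convergence can be illustrated with two replicas that exchange versioned entries. This is a sketch under assumptions: the `Replica` class, the caller-supplied version numbers, and the `anti_entropy` exchange (newest version wins) are one simple convergence scheme chosen for the example, not the only way to reach eventual consistency.

```python
class Replica:
    """Toy replica storing key -> (version, value); the newest version wins."""

    def __init__(self):
        self.store = {}

    def write(self, key, value, version):
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

def anti_entropy(a, b):
    """Exchange entries so both replicas end with the newest version of each key."""
    for key, (version, value) in list(a.store.items()):
        b.write(key, value, version)
    for key, (version, value) in list(b.store.items()):
        a.write(key, value, version)

r1, r2 = Replica(), Replica()
r1.write("tickets_left", 10, version=1)
r2.write("tickets_left", 10, version=1)

r1.write("tickets_left", 9, version=2)    # update reaches only r1 at first
soft_state = (r1.read("tickets_left"), r2.read("tickets_left"))

anti_entropy(r1, r2)                       # no new updates: replicas converge
converged = (r1.read("tickets_left"), r2.read("tickets_left"))
print(soft_state, converged)
```

Between the update and the exchange the replicas disagree, which is the soft state; once the exchange runs with no further updates, every read returns the latest value.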

(5) BASE summary

In general, BASE theory targets large-scale, highly available, scalable distributed systems and is the opposite of the ACID properties of traditional transactions. Unlike ACID's strong consistency model, it gains availability by sacrificing strong consistency and allows data to be inconsistent for a period of time.

Origin blog.csdn.net/wang11876/article/details/131734749