Distributed Development (1) --- CAP Theory

Traditional (relational) database design follows the ACID rules. Distributed system design has a corresponding theory: CAP.

1. ACID

ACID is an acronym for the four properties that database transactions need in order to execute correctly: Atomicity, Consistency, Isolation, and Durability. A database that supports transactions must have all four. Otherwise it cannot guarantee that data stays correct during transaction processing.

1. Atomicity

Atomicity means that the operations in a transaction either all complete or none of them do. A transaction never stops partway through. If an error occurs while the transaction is executing, it is rolled back to the state before the transaction started, as if it had never been executed.

Example

Take a bank transfer. Deducting money from account A and adding it to account B must happen together. The transfer consists of two steps:

A:1000 - 200 = 800
B:300 + 200 = 500

Atomicity means that the two steps either succeed together or fail together. Only one of them can never take effect on its own; if it did, the account balances would no longer match.
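The transfer above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration with these assumptions: an in-memory database and a hypothetical `account` table. The `fail_midway` flag is an invented hook that simulates a crash between the debit and the credit. Using the connection as a context manager commits on success and rolls back on an exception, so either both UPDATEs take effect or neither does.

```python
import sqlite3


def make_db():
    # Hypothetical schema for illustration; CHECK forbids negative balances.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 300)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount, fail_midway=False):
    """Debit src and credit dst as one transaction; return True if it committed."""
    try:
        with conn:  # commit if the block finishes, roll back if it raises
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
            if fail_midway:
                raise RuntimeError("simulated crash after the debit")
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        return True
    except (RuntimeError, sqlite3.Error):
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


conn = make_db()
transfer(conn, "A", "B", 200)
print(balances(conn))  # {'A': 800, 'B': 500}
transfer(conn, "A", "B", 200, fail_midway=True)
print(balances(conn))  # still {'A': 800, 'B': 500}: the debit was rolled back
```

The CHECK constraint shows a second case: an operation that would break a rule (here, a negative balance) aborts the whole transaction rather than leaving half of it applied.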


2. Consistency

Consistency means that a transaction encapsulates state changes (unless it is read-only). It must always leave the system in a consistent state, no matter how many transactions run concurrently.

In plain terms, executing a transaction must move the database from one consistent state to another. The database's integrity holds both before the transaction begins and after it ends.

Example
Take the bank transfer scenario again.
Suppose there are five accounts, each with a balance of 100 yuan, for a total of 500 yuan. Several transfers might run among these accounts at the same time: 5 yuan between A and B, 10 yuan between C and D, and 15 yuan between B and E. No matter how many run concurrently, the five accounts must still total 500 yuan.
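A small in-memory simulation can check the invariant in this example, which is that the five balances always sum to 500 yuan. This is only a sketch and rests on these assumptions:
- The `Bank` class is hypothetical.
- A single `threading.Lock` stands in for the concurrency control that a real database would provide. Without it, interleaved read-modify-write steps could lose updates and break the total.

```python
import random
import threading


class Bank:
    """Hypothetical in-memory ledger; the lock plays the role of DB concurrency control."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self._lock = threading.Lock()

    def transfer(self, src, dst, amount):
        with self._lock:  # the whole check-debit-credit sequence runs without interleaving
            if self.balances[src] < amount:
                return False  # reject instead of allowing a negative balance
            self.balances[src] -= amount
            self.balances[dst] += amount
            return True

    def total(self):
        with self._lock:
            return sum(self.balances.values())


bank = Bank({name: 100 for name in "ABCDE"})


def worker(seed):
    rng = random.Random(seed)
    for _ in range(1000):
        src, dst = rng.sample("ABCDE", 2)  # two distinct accounts
        bank.transfer(src, dst, rng.choice([5, 10, 15]))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(bank.total())  # 500
```

A single global lock is the simplest correct choice, but it serializes every transfer. Real databases use finer-grained locking or multiversioning to allow more concurrency.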


3. Isolation

Isolation refers to the database's ability to let multiple concurrent transactions read and modify its data at the same time. It prevents the inconsistencies that interleaved execution would otherwise cause.

Concurrent transactions are isolated from one another: each runs without seeing or being disturbed by the others' work in progress. If isolation is not enforced, the following problems may arise:

  • Dirty read: transaction T1 reads data that transaction T2 has not yet committed. T2 is then rolled back, so T1 is holding data that was never committed.
  • Non-repeatable read: transaction T1 reads some data, transaction T2 then updates that data, and when T1 reads it again it gets a different value.
  • Phantom read: this typically occurs during batch modifications. For example, transaction T1 changes every value 1 to 2. Meanwhile, transaction T2 inserts a new row with value 1. When T1 checks the data afterwards, it finds a row that was apparently not modified, like a phantom.
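The next example is a toy model of the first anomaly, the dirty read. It is a deliberately simplified sketch:
- `ToyStore` is hypothetical and only models which writes a read is allowed to see.
- Real databases enforce isolation with locks or multiversion concurrency control, not with this mechanism.

At READ UNCOMMITTED, T1 can see T2's pending write, which then disappears when T2 rolls back.

```python
class ToyStore:
    """Hypothetical store that only models the visibility of uncommitted writes."""

    def __init__(self):
        self.committed = {}  # key -> value visible to every transaction
        self.pending = {}    # transaction id -> {key: value} not yet committed

    def write(self, txn, key, value):
        self.pending.setdefault(txn, {})[key] = value

    def commit(self, txn):
        self.committed.update(self.pending.pop(txn, {}))

    def rollback(self, txn):
        self.pending.pop(txn, None)

    def read(self, txn, key, level):
        own = self.pending.get(txn, {})
        if key in own:  # a transaction always sees its own writes
            return own[key]
        if level == "READ UNCOMMITTED":  # may see other transactions' pending writes
            for other, writes in self.pending.items():
                if other != txn and key in writes:
                    return writes[key]
        # Otherwise only committed data is visible (stronger levels are not modeled).
        return self.committed.get(key)


store = ToyStore()
store.write("setup", "x", 1)
store.commit("setup")

store.write("T2", "x", 99)                          # T2 has not committed yet
print(store.read("T1", "x", "READ UNCOMMITTED"))    # 99: a dirty read
print(store.read("T1", "x", "READ COMMITTED"))      # 1
store.rollback("T2")                                # the value 99 never existed
print(store.read("T1", "x", "READ UNCOMMITTED"))    # 1
```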

To address these problems, databases provide four transaction isolation levels (as defined in the SQL standard). From lowest to highest, they are Read uncommitted, Read committed, Repeatable read, and Serializable. Each level above Read uncommitted removes one more of the anomalies above: first dirty reads, then non-repeatable reads, then phantom reads.

√ means the anomaly can occur, × means it cannot

Isolation level       Dirty read   Non-repeatable read   Phantom read
Read uncommitted      √            √                     √
Read committed        ×            √                     √
Repeatable read       ×            ×                     √
Serializable          ×            ×                     ×

Read uncommitted is the lowest level and prevents none of these anomalies.


4. Durability

Durability (also called persistence) means that once a transaction completes, its changes to the database are saved permanently and will not be rolled back.
In plain terms, once a transaction is committed, its result stays in place. Even if the database crashes afterwards, the committed data must not be lost.
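A common way to provide this guarantee is a write-ahead log. A commit is acknowledged only after its record has been forced to disk, and after a crash the log is replayed. The sketch below is a deliberately small illustration and assumes the following:
- `DurableKV`, the JSON-lines log format and the file name are hypothetical.
- It skips details that a real database handles, such as fsyncing the parent directory when the log file is first created, or checkpointing the log.

```python
import json
import os
import tempfile


class DurableKV:
    """Toy store that acknowledges a commit only after its log record is fsynced."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._recover()

    def _recover(self):
        # Rebuild state by replaying the log, e.g. after a crash and restart.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "rb") as f:
            raw = f.read()
        good_end = 0
        for line in raw.splitlines(keepends=True):
            if not line.endswith(b"\n"):
                break  # torn final write: it was never acknowledged, so drop it
            try:
                self.data.update(json.loads(line))
            except ValueError:
                break
            good_end += len(line)
        if good_end < len(raw):
            with open(self.log_path, "r+b") as f:
                f.truncate(good_end)  # cut the torn tail so later appends stay well-formed

    def commit(self, changes):
        record = (json.dumps(changes) + "\n").encode("utf-8")
        with open(self.log_path, "ab") as f:
            f.write(record)
            f.flush()
            os.fsync(f.fileno())  # force the record onto stable storage before returning
        self.data.update(changes)  # only now is the commit acknowledged


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "wal.log")
    DurableKV(path).commit({"A": 800, "B": 500})
    restarted = DurableKV(path)  # simulate a restart: state comes back from the log
    print(restarted.data)  # {'A': 800, 'B': 500}
```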

 

2. CAP

1. Overview of CAP

In 1998, Eric Brewer, a computer scientist at the University of California, Berkeley, proposed that distributed systems have three properties: Consistency, Availability, and Partition tolerance, abbreviated C, A, and P.
Brewer argued that a system cannot achieve all three properties at the same time. This conclusion is known as the CAP theorem.

The CAP principle (or CAP theorem) therefore concerns Consistency, Availability, and Partition tolerance in a distributed system. A system can guarantee at most two of the three at the same time, never all three.

 

Consistency

Consistency means that "all nodes see the same data at the same time". Once an update has succeeded and been acknowledged to the client, every node holds exactly the same data at that moment. This is what distributed consistency means.

In distributed systems, consistency comes in strong and weak forms. Strong consistency means that every node sees the same data at any moment. Weak consistency is usually implemented as eventual consistency: nodes may disagree at first, but over time they converge to the same data.
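Eventual consistency can be sketched with replicas that accept writes locally and later exchange state. This minimal example assumes last-writer-wins conflict resolution, keyed on a timestamp supplied by the caller, with the replica name as a tie-breaker. `Replica`, `sync_from` and the timestamps are illustrative and do not correspond to any particular database's API.

```python
class Replica:
    """One copy of a key-value store that converges via last-writer-wins merging."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> ((timestamp, writer_name), value)

    def write(self, key, value, ts):
        # Tie-break equal timestamps by replica name so every replica picks the same winner.
        self._apply(key, (ts, self.name), value)

    def _apply(self, key, stamp, value):
        current = self.store.get(key)
        if current is None or stamp > current[0]:
            self.store[key] = (stamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        # Anti-entropy: pull every entry from another replica and keep the newer one.
        for key, (stamp, value) in other.store.items():
            self._apply(key, stamp, value)


r1, r2 = Replica("r1"), Replica("r2")
r1.write("stock", 10, ts=1)
print(r2.read("stock"))  # None: r2 has not heard about the write yet
r2.sync_from(r1)
print(r2.read("stock"))  # 10

r1.write("stock", 9, ts=2)  # concurrent writes on both replicas
r2.write("stock", 8, ts=3)
r1.sync_from(r2)
r2.sync_from(r1)
print(r1.read("stock"), r2.read("stock"))  # 8 8: the replicas have converged
```

Last-writer-wins is the simplest way to make replicas converge. Note, though, that the write of 9 is silently discarded. Systems that cannot afford lost updates use richer schemes such as version vectors or CRDTs.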

Availability

Availability means that "reads and writes always succeed". The service is always available and responds within a normal amount of time.

Partition tolerance

Partition tolerance means that "the system continues to operate despite arbitrary message loss or failure of part of the system". When a distributed system suffers a node failure or a network partition, it can still provide service externally. However, as the trade-offs below show, it must then choose between consistency and availability.

In plain terms, nodes may become unable to communicate with each other. They are then isolated and a network partition occurs, yet the system as a whole keeps working.

2. CAP trade-off strategies

A system can satisfy at most two of the three CAP properties, so there are three strategies to choose from:

CA without P: If partitions are not tolerated (no P), strong consistency (C) and availability (A) can both be guaranteed. But giving up P also means giving up scalability. The number of distributed nodes is limited and child nodes cannot be deployed, which defeats the purpose of building a distributed system. Traditional relational databases (RDBMS) such as Oracle and MySQL are CA systems.

So for a distributed system, P is a basic requirement, and the real trade-off is between C and A.

CP without A: If availability (A) is not required, every request must keep the servers strongly consistent with one another. A partition (P) can then stretch the synchronization time indefinitely, meaning the service cannot be used normally until the data has been synchronized. Once a network failure or message loss occurs, the user experience has to be sacrificed: users must wait until all data is consistent before they can access the system. Many systems are designed as CP. The most typical are distributed databases such as HBase and Redis, although Redis Cluster replicates asynchronously and does not actually guarantee strong consistency. For these distributed databases, data consistency is the most basic requirement. If they could not meet even this standard, it would be better to use a relational database directly than to spend resources deploying a distributed one.

AP without C: A system that is highly available and also tolerates partitions must give up consistency. Once a partition occurs, nodes may lose contact with each other. To stay available, each node can only serve requests from its local data, which makes the global data inconsistent. A typical example is a mobile-phone flash sale. A few seconds ago, when you browsed the product, the page said it was in stock, but when you go to place the order you are told it has sold out. The system guarantees A (availability) so that it keeps serving normally, and it sacrifices some data consistency in return. This hurts the user experience somewhat, but it does not cause serious congestion in the shopping process.
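The difference between the CP and AP strategies above shows up in a toy model with two replicas of a single value. This sketch makes simplifying assumptions:
- `TwoReplicaRegister`, `Unavailable` and the `partitioned` flag are hypothetical.
- It uses two nodes, the smallest case in which neither side of a partition holds a majority. Real CP systems usually run three or five nodes and let the majority side keep serving.

Here the CP mode refuses every request during a partition. The AP mode keeps answering from local data and lets the two copies diverge.

```python
class Unavailable(Exception):
    """Raised when the CP store refuses a request rather than risk inconsistency."""


class TwoReplicaRegister:
    """Toy register copied on two nodes; `partitioned` simulates a cut network link."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = [0, 0]  # the value held by node 0 and by node 1
        self.partitioned = False

    def write(self, node, value):
        if not self.partitioned:
            self.values = [value, value]  # link is up: replicate to both nodes synchronously
        elif self.mode == "CP":
            raise Unavailable("cannot reach the other replica")  # give up A to keep C
        else:
            self.values[node] = value  # AP: accept locally; the other node is now stale

    def read(self, node):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm this copy is current")  # give up A to keep C
        return self.values[node]


ap = TwoReplicaRegister("AP")
ap.write(0, 1)
ap.partitioned = True
ap.write(0, 2)
print(ap.read(0), ap.read(1))  # 2 1: both nodes answer, but they disagree
```

After the partition heals, an AP system still has to reconcile the divergent copies. This is the eventual-consistency problem that BASE addresses.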

There is no general answer to which strategy is better. It depends on the scenario, and the best choice is the one that fits.

3. BASE theory

Dan Pritchett, an architect at eBay, proposed BASE theory in an article published by the ACM, based on his practical experience with large-scale distributed systems. BASE extends CAP. Its core idea is that an application may be unable to achieve strong consistency (the C in CAP means strong consistency), but it can still achieve eventual consistency by suitable means.

BASE stands for Basically Available, Soft state, and Eventual consistency.
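The three BASE properties can be illustrated with the flash-sale scenario from the CAP section. In this minimal sketch, the `OrderService` class, its statuses and the `drain` step are all hypothetical:
- Orders are accepted immediately (basically available).
- They sit in a pending intermediate state (soft state).
- A background step then confirms or rejects them (eventual consistency).

`drain` is called explicitly here to keep the example deterministic. A real system would run it asynchronously, for example from a message-queue consumer.

```python
from collections import deque


class OrderService:
    """Toy BASE-style flow: orders are accepted at once and inventory catches up later."""

    def __init__(self, stock):
        self.stock = stock
        self.orders = []
        self.outbox = deque()  # pending inventory updates: the soft state

    def place_order(self, qty):
        # Basically available: accept the order without waiting for the inventory step.
        order_id = len(self.orders) + 1
        self.orders.append({"id": order_id, "qty": qty, "status": "PENDING"})
        self.outbox.append(order_id)
        return order_id

    def drain(self):
        # Runs asynchronously in a real system; called explicitly here for determinism.
        while self.outbox:
            order = self.orders[self.outbox.popleft() - 1]
            if self.stock >= order["qty"]:
                self.stock -= order["qty"]
                order["status"] = "CONFIRMED"
            else:
                order["status"] = "REJECTED"  # compensate: the item turned out to be sold out


svc = OrderService(stock=1)
svc.place_order(1)
svc.place_order(1)
print([o["status"] for o in svc.orders])  # ['PENDING', 'PENDING']
svc.drain()
print([o["status"] for o in svc.orders])  # ['CONFIRMED', 'REJECTED']
```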

 

 

 


Origin: blog.csdn.net/icanlove/article/details/117511483