Distributed transactions and the thinking behind CAP theory and BASE theory

Distributed transaction

In a stand-alone database, we can easily implement a transaction processing system that satisfies the ACID properties. In a distributed database, however, data is scattered across different machines, and processing transactions over that data becomes a major challenge.

A distributed transaction is one in which the transaction participants, the servers that support the transaction, the resource servers, and the transaction manager are located on different nodes of a distributed system. A distributed transaction usually involves operations on multiple data sources or business systems.

Imagine a typical distributed transaction scenario:

Example: A cross-bank transfer involves calls to two remote bank services: the withdrawal service provided by the local bank and the deposit service provided by the target bank. The two services are stateless and independent of each other, and together they make up one complete distributed transaction. If the withdrawal from the local bank succeeds but the deposit fails for some reason, the system must roll back to the state before the withdrawal; otherwise the user may find that the money has gone missing.

Analysis: As the example shows, a distributed transaction can be regarded as a sequence of distributed operations, such as the withdrawal and deposit services above. Each operation in the sequence is usually called a sub-transaction, so a distributed transaction can also be defined as a nested transaction.
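The transfer example above can be sketched in a few lines. This is a minimal illustration only: `Bank`, `ServiceError`, and `transfer` are hypothetical names, the "remote" services are in-memory stand-ins, and undoing the withdrawal with a compensating deposit is just one possible way to roll back.

```python
class ServiceError(Exception):
    """Raised when a (hypothetical) remote bank service call fails."""


class Bank:
    """In-memory stand-in for a remote bank service (illustrative only)."""

    def __init__(self, balances, fail_deposits=False):
        self.balances = dict(balances)
        self.fail_deposits = fail_deposits  # simulate a broken deposit service

    def withdraw(self, account, amount):
        if self.balances[account] < amount:
            raise ServiceError("insufficient funds")
        self.balances[account] -= amount

    def deposit(self, account, amount):
        if self.fail_deposits:
            raise ServiceError("deposit service unavailable")
        self.balances[account] += amount


def transfer(source_bank, source_acct, target_bank, target_acct, amount):
    """Run both sub-transactions; undo the withdrawal if the deposit fails."""
    source_bank.withdraw(source_acct, amount)       # sub-transaction 1
    try:
        target_bank.deposit(target_acct, amount)    # sub-transaction 2
    except ServiceError:
        # Compensating action: put the money back so the transfer is
        # all-or-nothing from the user's point of view.
        source_bank.deposit(source_acct, amount)
        return False
    return True


local = Bank({"alice": 100})
remote = Bank({"bob": 0}, fail_deposits=True)
ok = transfer(local, "alice", remote, "bob", 30)
print(ok, local.balances, remote.balances)  # False {'alice': 100} {'bob': 0}
```

In a real system the compensating call can itself fail or time out, which is one reason strict ACID guarantees are so hard to provide across nodes.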

Like any transaction, a distributed transaction is expected to have the ACID properties.

However, because the sub-transactions of a distributed transaction execute on different nodes, it is extremely complicated to implement a distributed transaction processing system that guarantees ACID, especially for high-traffic, high-concurrency Internet systems.

If we try to design distributed transactions that strictly satisfy ACID, availability and strict consistency are likely to conflict, because demanding strict consistency from a distributed system usually means sacrificing some availability. Yet availability is a system attribute on which consumers leave no room for bargaining: online shopping sites such as Taobao are expected to serve users 7*24 without interruption. Consistency is just as firm a requirement that consumers place on a software system. Availability and consistency therefore can never both be fully satisfied, and how to build a distributed system that balances the two has become a hard problem that countless developers have debated. This is where the classic theories of distributed systems, CAP and BASE, come in.

 

CAP

What is the CAP theorem?

In July 2000, Professor Eric Brewer of the University of California, Berkeley proposed the CAP conjecture. Two years later, Seth Gilbert and Nancy Lynch of MIT proved the conjecture formally. Since then, CAP has been an accepted theorem in the field of distributed computing and has deeply influenced its development.

The CAP theorem states that a distributed system cannot satisfy all three basic requirements of consistency (C: Consistency), availability (A: Availability), and partition tolerance (P: Partition tolerance) at the same time. At most two of them can be satisfied together.

Characteristic: Description

C (Consistency): In a distributed environment, data stays consistent (strictly consistent) across multiple copies. If the system's data is in a consistent state, it must still be in a consistent state after an update operation is performed.

A (Availability): The service provided by the system must always be available, and every request receives a response, although that response is not guaranteed to contain the latest data.

P (Partition tolerance): When the distributed system encounters any network partition failure, it can still provide external services that meet its consistency and availability requirements, unless the entire network fails.

  • Why can we choose only 2 out of 3?

First of all, can the three conditions be met at the same time?

Consider the following system:

The system consists of two nodes that cooperate over the network. Whenever node A updates its database, node B's database must be updated at the same time (as one atomic operation).

What would it take for this system to satisfy CAP? C: whenever node A is updated, node B must be updated too. A: both nodes must remain available. P: when node A or B fails or a network partition occurs, the system must still serve external requests.

Taken together, these requirements cannot all be met: as soon as a network partition occurs, node A cannot reach node B at all, so the update cannot be applied on both nodes and C is broken. If consistency is enforced anyway, A can no longer be guaranteed, because service on node B must be stopped, which gives up availability.

So at most two of the three conditions can be met.
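The two-node argument can be played out in a small simulation. This is a sketch under simplifying assumptions: the `Node` class, the `"CP"`/`"AP"` mode labels, and the `partitioned` flag are all made up for illustration, and real systems detect partitions through timeouts rather than a flag.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request to protect consistency."""


class Node:
    def __init__(self, name, mode):
        self.name = name
        self.mode = mode          # "CP" or "AP" (labels assumed for this sketch)
        self.value = None
        self.peer = None
        self.partitioned = False  # True when the link to the peer is down

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                # Cannot update the peer, so refuse: consistency over availability.
                raise Unavailable(f"{self.name} cannot reach its peer")
            # AP: accept the write locally; the peer is now stale.
            self.value = value
            return
        # No partition: update both copies together.
        self.value = value
        self.peer.value = value

    def read(self):
        return self.value


def make_pair(mode):
    a, b = Node("A", mode), Node("B", mode)
    a.peer, b.peer = b, a
    return a, b


def set_partition(a, b, down):
    a.partitioned = b.partitioned = down


# CP pair: the write is rejected during the partition.
a, b = make_pair("CP")
a.write(1)
set_partition(a, b, True)
try:
    a.write(2)
    cp_result = "accepted"
except Unavailable:
    cp_result = "rejected"

# AP pair: the write is accepted, but the two copies diverge.
c, d = make_pair("AP")
c.write(1)
set_partition(c, d, True)
c.write(2)

print(cp_result, a.read(), b.read())  # rejected 1 1
print(c.read(), d.read())             # 2 1
```

Once the link is down, the node either rejects the write (losing A) or accepts it and lets the copies differ (losing C); the sketch has no third branch because the argument above leaves none.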

Combination: Analysis

CA: Satisfies consistency and availability but gives up partition tolerance. Put bluntly, this is a monolithic application. To avoid partition problems altogether, the simplest approach is to put all the data (or at least all the data involved in a transaction) on a single node. This does not guarantee that the system never fails, but at least it will not suffer the negative effects of network partitions. Note, however, that giving up P also means giving up the scalability of the system.

CP: Satisfies consistency and partition tolerance but gives up availability. When the system encounters a network partition or another failure, the affected services must wait for a period of time in order to preserve consistency, and during that wait the system cannot provide normal service, that is, it is unavailable.

AP: Satisfies availability and partition tolerance but gives up consistency. When a network partition occurs, the nodes must keep serving external requests to remain available, which inevitably costs consistency. Giving up consistency here does not mean that data consistency is not needed at all; if it did, the system's data would be meaningless. It means giving up strong consistency while retaining eventual consistency. Such a system cannot guarantee that data is consistent in real time, but it can promise that the data will eventually reach a consistent state. This introduces the notion of a time window: how long it takes to reach consistency depends on the design of the system, including how long it takes to replicate data between nodes.

The CAP theorem shows that a distributed system cannot satisfy C, A, and P at the same time; at most two can be met. One thing must be clear, though: for a distributed system, partition tolerance is the most basic requirement. A distributed system by definition deploys its components on different nodes, otherwise it would not be distributed, so sub-networks necessarily arise. Network problems are a failure that will inevitably occur, so partition tolerance is something a distributed system has to face and solve. In practice, therefore, we spend our effort on finding a balance between C and A that suits the business scenario.

  • Can the 2-out-of-3 problem be solved?

To answer this, first ask whether partitions occur 100% of the time. When there is no partition, C, A, and P can all be satisfied. When a partition does occur, the system can adjust according to a chosen strategy. For example, C need not be strong consistency: the system can store the data first and reach "consistency" eventually.

This idea leads to the BASE theory.

BASE theory

The CAP theory shows that the three properties cannot all be satisfied at the same time, and that partition tolerance is a must for distributed systems. Ideally, though, a system would still get as close as possible to all three, which is why the BASE theory was proposed.

What is BASE theory?

BASE is an abbreviation of three phrases: Basically Available, Soft state, and Eventually consistent. It was proposed by an architect at eBay. BASE is the result of trading off consistency and availability in CAP; it was distilled from the practice of large-scale Internet distributed systems and evolved from the CAP theorem.

The core idea is: since strong consistency cannot be achieved, each application can adopt an approach suited to its own business characteristics to make the system eventually consistent.

 

  • Basically Available

What is basically available?

Basically available means that a distributed system is allowed to lose part of its availability when unpredictable failures occur. Note that this is by no means the same as the system being unavailable.

 

For example:

1. Loss in response time: Under normal circumstances, an online search engine returns query results to the user within 0.5 seconds, but because of a failure (for example, a power or network failure in part of a data center), the response time increases to 1-2 seconds. As another example, suppose some machines have higher configurations and better performance while others have lower configurations, and the higher-configured machines are given higher weights. Normally requests go to the higher-configured machines first; if those machines fail, response times inevitably get longer. The system, however, is still available.

 

2. Loss of function: Under normal circumstances, consumers on e-commerce sites such as Jingdong, Taobao, and pdd can complete almost every order successfully. During peak shopping periods such as Double 11, however, the surge in shopping demand may lead the site to direct some consumers to a degraded page in order to protect system stability or ensure consistency.
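The "loss of function" case can be sketched as a handler that falls back to a degraded response once it is at capacity. Everything here is hypothetical (`OrderPage`, the `capacity` limit, the response strings); it only illustrates the idea of degrading instead of failing.

```python
class OrderPage:
    """Serves full order pages up to a capacity, then a degraded page."""

    def __init__(self, capacity):
        self.capacity = capacity   # max concurrent full-service requests (assumed)
        self.in_flight = 0

    def handle(self, user):
        if self.in_flight >= self.capacity:
            # Loss of function: the user still gets a response,
            # just not the full ordering feature.
            return f"{user}: busy, please retry later"
        self.in_flight += 1
        return f"{user}: order page"

    def finish(self):
        """Call when a full-service request completes."""
        self.in_flight = max(0, self.in_flight - 1)


page = OrderPage(capacity=2)
responses = [page.handle(u) for u in ("u1", "u2", "u3")]
page.finish()                      # one request completes, freeing a slot
responses.append(page.handle("u4"))
print(responses)
```

The third user is degraded rather than left without a response, which is the difference between "basically available" and unavailable.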

 

  • Soft state

What is a soft state? It is defined in contrast to the "hard state" required by strong consistency, in which the data copies on all nodes must be identical.

Soft state means that data in the system is allowed to be in an intermediate state, and that this intermediate state is considered not to affect the overall availability of the system. In other words, the system is allowed a delay when synchronizing data between copies on different nodes.

  • Eventually consistent

Eventual consistency emphasizes that all data copies in the system reach a consistent state after a period of synchronization. The essence of eventual consistency is that the system must guarantee that the data ends up consistent, without having to guarantee strong consistency at every moment.
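Soft state and eventual consistency can be seen together in a toy model of asynchronous replication. This is a sketch with assumed names (`Primary`, `Replica`, `sync`), not a description of any particular database: writes are acknowledged first and shipped to the replicas later.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class Primary(Replica):
    """Accepts writes immediately and replicates them later."""

    def __init__(self, replicas):
        super().__init__()
        self.replicas = replicas
        self.pending = deque()   # updates not yet shipped to the replicas

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))   # acknowledged before replication

    def sync(self):
        """Apply all pending updates to every replica, in write order."""
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value


r1, r2 = Replica(), Replica()
primary = Primary([r1, r2])
primary.write("stock", 9)
stale = r1.read("stock")        # soft state: the replica has not caught up yet
primary.sync()
fresh = r1.read("stock")
print(stale, fresh, r2.read("stock"))   # None 9 9
```

The interval between `write` and `sync` is the time window during which the copies disagree; after `sync` all copies hold the same value.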

 

In engineering practice, there are five main variants of eventual consistency:

1. Causal consistency

If node A notifies node B after updating a piece of data, then node B's subsequent reads and modifications of that data are based on A's updated value. Data access by a node C that has no causal relationship with node A is not subject to this restriction.

2. Read your writes

After node A updates a piece of data, it can always read the value it just wrote and never sees an older value. In other words, what a node reads is never older than what it last wrote. Read-your-writes can therefore be regarded as a special case of causal consistency.

3. Session consistency

Session consistency frames access to system data within a session: the system guarantees read-your-writes consistency within the same valid session, that is, after an update the client always reads the latest value of the data item within that session.

4. Monotonic read consistency

Monotonic read consistency means that once a node has read a certain value of a data item from the system, the system should not return an older value for any subsequent read by that node.

5. Monotonic write consistency

Monotonic write consistency means that the system guarantees that write operations from the same node are executed in order.

These five are the most common variants of eventual consistency. In practice, several of them are often combined to build a distributed system with eventual consistency.
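Two of the variants above, read-your-writes and monotonic reads, can be enforced on the client side by remembering the newest version a session has seen. The sketch below is one possible approach under strong simplifications: `Replica` and `Session` are hypothetical, there is a single data item, and version numbers are derived by looking at all replicas, which a real system could not do.

```python
class Replica:
    """Holds one versioned copy of a single data item."""

    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        if version > self.version:       # ignore older updates
            self.version, self.value = version, value


class Session:
    """Client session that enforces read-your-writes and monotonic reads."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.min_version = 0   # newest version this session has written or read

    def write(self, replica_index, value):
        replica = self.replicas[replica_index]
        # Simplification: pick the next version by inspecting every replica.
        version = max(r.version for r in self.replicas) + 1
        replica.apply(version, value)
        self.min_version = max(self.min_version, version)

    def read(self, replica_index):
        replica = self.replicas[replica_index]
        if replica.version < self.min_version:
            # This replica is behind what the session has already seen.
            raise LookupError("replica too stale for this session")
        self.min_version = replica.version
        return replica.value


replicas = [Replica(), Replica()]
session = Session(replicas)
session.write(0, "v1")            # only replica 0 has the update so far
first = session.read(0)
try:
    session.read(1)               # replica 1 is stale
    outcome = "served"
except LookupError:
    outcome = "refused"
replicas[1].apply(replicas[0].version, replicas[0].value)   # replication catches up
print(first, outcome, session.read(1))   # v1 refused v1
```

Refusing the stale read trades a little availability for the session guarantee; a client could instead retry against another replica.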

In general, BASE targets large-scale, highly available, and scalable distributed systems and is the opposite of the ACID model of traditional transactions. Rather than ACID's strong consistency, it gains availability by sacrificing strong consistency: data is allowed to be inconsistent for a period of time, as long as it eventually becomes consistent.

 

Consistency and availability must be analyzed objectively, at both the machine level and the system level. There should be no fixed way of thinking, because the issues look different at different levels.

Origin blog.csdn.net/crossroads10/article/details/108167499