[Knowledge Popularization] The CAP theorem and BASE theory: what you need to understand about distributed systems

CAP theorem

The CAP theorem, also called Brewer's theorem, states that a distributed system cannot satisfy all three of the following properties at the same time:

  • Consistency (every node sees the same, most recent copy of the data)

  • Availability (every request receives a non-error response, but there is no guarantee that the data returned is the latest)

  • Partition tolerance (in practice, a partition amounts to a time limit on communication: if the system cannot reach data consistency within that limit, a partition is considered to have occurred, and the current operation must choose between C and A)

According to the theorem, a distributed system can satisfy at most two of these three properties.

Origin

The theorem originated as a conjecture proposed by Eric Brewer, a computer scientist at the University of California, Berkeley, at the 2000 Symposium on Principles of Distributed Computing (PODC). In 2002, Seth Gilbert and Nancy Lynch of the Massachusetts Institute of Technology (MIT) published a proof of Brewer's conjecture, making it a theorem.

The CAP theorem proved by Gilbert and Lynch is somewhat narrower than Brewer's conjecture. It addresses how the system should respond when two conflicting requests arrive at two different distributed nodes that cannot communicate with each other.

(Source: Wikipedia)

Partition tolerance

To understand partition tolerance, we first need to understand what a partition is. Distributed systems often deploy servers in different regions. For example, in the Taobao system, Subsystem 1 may be deployed in Guangzhou and Subsystem 2 in Xinjiang. These two regions are then two partitions. Communication between Subsystem 1 and Subsystem 2 may fail, so this situation has to be considered when designing the system.

Consistency

Consistency should be familiar to everyone. We run into it all the time: how to keep the cache consistent with the database, how to keep a middleware's master and slave nodes consistent, and so on.

As a concrete example, suppose a piece of data in the system is currently A1. If the user changes it to A2, then the next read by the user should return A2. That is read/write consistency.

Now suppose the system is a master-slave system in which System 1 handles writes and System 2 handles reads. The user writes A2 to System 1, but for some reason System 1 has not yet synchronized the update to System 2, so the user still reads the old value A1. The result is inconsistent reads and writes.
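The stale read described above can be reproduced with a toy model. This is only a sketch: the `MasterSlave` class and its `replicate` method are hypothetical names made up for illustration, and replication is triggered by hand rather than over a network.

```python
class MasterSlave:
    """Toy master-slave pair: writes hit the master, reads hit the slave."""

    def __init__(self, initial):
        self.master_value = initial
        self.slave_value = initial

    def write(self, value):
        # Only the master is updated; the slave lags until replicate() runs.
        self.master_value = value

    def read(self):
        # Reads are served by the slave, which may be behind the master.
        return self.slave_value

    def replicate(self):
        # Copy the master's current value to the slave.
        self.slave_value = self.master_value


system = MasterSlave("A1")
system.write("A2")
stale = system.read()   # the slave has not been synchronized yet
system.replicate()
fresh = system.read()   # the slave now matches the master
print(stale, fresh)     # A1 A2
```

The window between `write` and `replicate` is exactly where the user can observe A1 after having written A2.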

Availability

Availability is also easy to understand. Suppose you deploy Redis as a single stand-alone instance: when that instance goes down, the system becomes unavailable. A distributed cluster deployment is recommended instead, for example one master (redis-master) with three slaves (slave1, slave2, slave3).

In such a deployment, when redis-master goes down, one of slave1, slave2, and slave3 is elected as the new master so that the Redis cluster keeps running normally. The process is imperceptible to the user. The cluster ensures the availability of the system.
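To make the failover idea concrete, here is a minimal sketch of choosing a new master among the surviving slaves. It is not how Redis itself implements elections (the real rules in Redis Sentinel and Redis Cluster are more involved); the `Replica` type, the `offset` field, and the "most replicated data wins" rule are simplifying assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    offset: int         # assumed measure of how much master data was received
    alive: bool = True


def elect_new_master(replicas):
    """Promote the live replica with the largest offset (ties: largest name)."""
    candidates = [r for r in replicas if r.alive]
    if not candidates:
        raise RuntimeError("no live replica to promote")
    return max(candidates, key=lambda r: (r.offset, r.name))


replicas = [
    Replica("slave1", offset=120),
    Replica("slave2", offset=150),
    Replica("slave3", offset=150, alive=False),  # also down, cannot be chosen
]
new_master = elect_new_master(replicas)
print(new_master.name)  # slave2
```

Preferring the most up-to-date replica reduces the amount of data lost when the old master disappears, which is why availability-oriented failover still pays attention to consistency.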

About availability and consistency

We cannot fully guarantee both the availability and the consistency of a system at the same time. Here is a simple argument. To ensure consistency, you either lock the system while data is being written or run a stand-alone (single-node) system. With locking, users cannot write again until the lock is released; with a stand-alone system, a single crash makes the whole system unavailable. To ensure availability, on the other hand, you must use a cluster, which requires synchronization between nodes, and synchronization can fail, leaving the nodes' data inconsistent.

So in practice we have to pick a focus: either ensure availability or ensure consistency.
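The trade-off can be sketched with two replicas and a link that may be cut. The `Pair` class and its `"CP"` / `"AP"` modes are hypothetical names for illustration only; a real system would involve timeouts, quorums, and retries.

```python
class Pair:
    """Two replicas, a and b, joined by a link that may be partitioned."""

    def __init__(self, mode):
        self.mode = mode            # "CP" or "AP"
        self.a = "A1"
        self.b = "A1"
        self.partitioned = False

    def write(self, value):
        """Write through replica a; return True if the write was accepted."""
        if not self.partitioned:
            self.a = self.b = value  # healthy link: both replicas updated
            return True
        if self.mode == "CP":
            return False             # refuse the write to stay consistent
        self.a = value               # AP: accept locally, b becomes stale
        return True


cp = Pair("CP")
cp.partitioned = True
print(cp.write("A2"), cp.a, cp.b)   # False A1 A1

ap = Pair("AP")
ap.partitioned = True
print(ap.write("A2"), ap.a, ap.b)   # True A2 A1
```

During the partition the CP pair stays consistent but rejects the request, while the AP pair answers the request but lets the two replicas diverge.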

Of course, choosing availability does not mean giving up on consistency. In the end, we still have to ensure the eventual consistency of the system.

BASE theory

BASE theory consists of the following three ideas:

  • Basically Available

  • Soft state

  • Eventually consistent

From the CAP theorem we know that only two of the three properties can be chosen. For a distributed system, partition tolerance must be satisfied: communication problems between partitions will happen, and a system that cannot tolerate them would fail catastrophically.

That leaves the choice between availability and consistency, in other words the choice between AP and CP.

Basically Available

Let's illustrate basic availability with an example. In a flash-sale (seckill) system, to make sure the system can withstand the concurrency pressure, not every user is allowed to reach the real servers. Most visitors may be redirected to other pages (or servers), and the page may simply show: the product has been sold out, thank you for your patronage.

In this way we at least keep the system available. If every user were allowed to reach the real servers, the system would eventually crash.
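A minimal sketch of this kind of degradation is an admission gate in front of the real service. The `FlashSaleGate` class and its `capacity` parameter are hypothetical and only illustrate the idea; production systems typically combine rate limiting, queues, and caching.

```python
class FlashSaleGate:
    """Let at most `capacity` requests through; give the rest a fallback page."""

    SOLD_OUT = "The product has been sold out, thank you for your patronage."

    def __init__(self, capacity):
        self.capacity = capacity
        self.admitted = 0

    def handle(self, user_id):
        if self.admitted < self.capacity:
            self.admitted += 1
            return f"order page for {user_id}"   # reaches the real service
        return self.SOLD_OUT                      # degraded but still a response


gate = FlashSaleGate(capacity=2)
responses = [gate.handle(u) for u in ("u1", "u2", "u3")]
print(responses[2])  # The product has been sold out, thank you for your patronage.
```

Every user still gets a response, so the system remains basically available, even though only a bounded number of requests reach the backend.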

Soft state

Soft state means that we allow the system's data to be in an intermediate state, as long as that state does not affect the overall availability of the system.

For example, in Redis master-slave synchronization, if synchronization lags, the master's data and the slave's data may be inconsistent. If our business can tolerate that inconsistency, then the inconsistent state is a soft state.

In other words, soft state allows a delay in data synchronization between nodes.

Eventually consistent

For eventual consistency, let's look at Wikipedia's explanation:

Eventual consistency is a memory consistency model in distributed computing. It means that a read of data that has been modified will eventually return the updated value, but is not guaranteed to return it immediately. This model can usually achieve higher availability. Eventual consistency is implemented through optimistic replication, also called lazy replication. The concept was originally developed for mobile applications and is now widely used in distributed systems. A distributed system that has achieved eventual consistency is said to have "converged".

In other words, provided no new updates are made, the system guarantees that the data will eventually reach a consistent state, so every client accessing the system will eventually read the latest value.

BASE theory tells us that when a system cannot achieve strong consistency, we can use methods suited to our own business characteristics to make the system eventually consistent. BASE theory is therefore aimed at large-scale, highly available, and scalable distributed systems.
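Convergence can be sketched with a simple synchronization round. The example assumes each replica stores a `(version, value)` pair and resolves conflicts with a last-write-wins rule; both the representation and the `anti_entropy` function are illustrative assumptions, and last-write-wins is only one of several possible conflict-resolution strategies.

```python
def merge(a, b):
    """Last-write-wins: keep the (version, value) pair with the higher version."""
    return max(a, b)


def anti_entropy(replicas):
    """One synchronization round: every replica adopts the newest version."""
    newest = replicas[0]
    for state in replicas[1:]:
        newest = merge(newest, state)
    return [newest for _ in replicas]


# Replica 2 has already seen the update to A2; the others still hold A1.
replicas = [(1, "A1"), (2, "A2"), (1, "A1")]
converged = anti_entropy(replicas)
print(converged)  # [(2, 'A2'), (2, 'A2'), (2, 'A2')]
```

Before the round, different clients may read A1 or A2 depending on which replica they reach; once no new updates arrive and the round completes, every replica returns A2.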

Reference

1. CAP theorem, Wikipedia, the free encyclopedia: https://zh.wikipedia.org/wiki/CAP theorem

Origin blog.csdn.net/wujialv/article/details/115324799