An article to understand the CAP theorem for distributed systems

Author: Past memory big data
Original article: https://www.iteblog.com/archives/2390.html
This article is a summarized version of Gilbert and Lynch's specification and proof of the CAP theorem. Most of the content follows the article An Illustrated Proof of the CAP Theorem.

What is the CAP theorem

The CAP theorem is a fundamental theorem of distributed systems. It states that any distributed system can satisfy at most two of the following three properties:

  • Consistency
  • Availability
  • Partition tolerance

In other words, a distributed system cannot guarantee consistency, availability, and partition tolerance all at the same time. It sounds simple, but what do consistency, availability, and partition tolerance actually mean?

In this article, we introduce a simple distributed system and use it to explain what availability, consistency, and partition tolerance mean.

What is a distributed system

A distributed system is a group of computers that are connected through a network, exchange messages, and coordinate their behavior. The components interact with each other to achieve a common goal. In distributed computing, for example, a job that needs a large amount of computation is split into small pieces that are computed separately by multiple computers; the partial results are then uploaded and merged into the final result.

Now let us consider a very simple distributed system. The system consists of two servers, G1 and G2. Both servers track the same variable v, whose initial value is v0. G1 and G2 can communicate with each other and with external clients. The architecture of the system is:

[Figure: servers G1 and G2, connected to each other and to a client]
The client can send read and write requests to either server. When a server receives a request, it performs any required computation and then sends a response to the client. A write request looks like this:

[Figure: a client sends a write request to a server and receives a response]
A read request looks like this:

[Figure: a client sends a read request to a server and receives the value of v]
Now that we have the basic concepts of a distributed system, the following sections introduce consistency, availability, and partition tolerance in turn.
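The toy system described above can be sketched in a few lines of Python. This is only an illustrative model: the class name `Server` and its `read` and `write` methods are assumptions made for this sketch, not something defined by Gilbert and Lynch.

```python
class Server:
    """One node of the toy system; it tracks a single variable v."""

    def __init__(self, name, initial="v0"):
        self.name = name
        self.v = initial

    def write(self, value):
        # Store the new value, then acknowledge the client.
        self.v = value
        return "ok"

    def read(self):
        # Respond with whatever value this server currently holds.
        return self.v


g1, g2 = Server("G1"), Server("G2")
ack = g1.write("v1")      # a client writes v1 to G1
print(ack, g1.read())     # acknowledgement, then the value held by G1
```

Note that the two servers do not talk to each other yet, so a write to G1 is not visible on G2.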

Consistency

Gilbert and Lynch describe consistency as: "any read operation that begins after a write operation completes must return that value, or the result of a later write operation". That is, in a consistent system, once a client has written a value to any server and received a response, every later read from any server in the system returns that value (or a newer one). The following system does not have this property:

[Figure: a client writes v1 to G1, then reads v0 from G2]
The client updates v on G1 to v1, and G1 acknowledges the write. But when the client then reads v from G2, the result is still v0. The following system, by contrast, is consistent:

[Figure: G1 replicates v1 to G2 before responding to the client]
In this system, G1 copies the new value of v to G2 before responding to the client. When the client then reads v from G2, the result is v1.
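The difference between the two systems can be sketched as follows. The method names `write_no_replication` and `write_replicated` are hypothetical and exist only to contrast the two behaviours; replication is modelled as a direct assignment to the peer, which assumes a perfectly reliable network.

```python
class Server:
    def __init__(self, initial="v0"):
        self.v = initial
        self.peer = None  # the other server, set after construction

    def write_no_replication(self, value):
        # Inconsistent variant: acknowledge without telling the peer.
        self.v = value
        return "ok"

    def write_replicated(self, value):
        # Consistent variant: copy the value to the peer BEFORE acknowledging.
        self.v = value
        self.peer.v = value
        return "ok"

    def read(self):
        return self.v


def make_pair():
    g1, g2 = Server(), Server()
    g1.peer, g2.peer = g2, g1
    return g1, g2


g1, g2 = make_pair()
g1.write_no_replication("v1")
stale = g2.read()   # read from G2 after the write to G1 was acknowledged

g1, g2 = make_pair()
g1.write_replicated("v1")
fresh = g2.read()

print(stale, fresh)
```

In the first run the read from G2 misses the acknowledged write; in the second it sees it, which is what the consistency definition requires.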

Availability

Gilbert and Lynch describe availability as: "every request received by a non-failing node in the system must result in a response". That is, in an available system, if a client sends a request to a server and that server has not crashed, the server must eventually respond to the client.

Partition tolerance

Gilbert and Lynch describe partition tolerance as: "the network will be allowed to lose arbitrarily many messages sent from one node to another". This means that any message sent between G1 and G2 may be dropped. If all messages between them are dropped, the system looks like this:

[Figure: G1 and G2 with every message between them dropped]
In a distributed environment, network partitions are unavoidable. Our system must therefore be partition tolerant, so that it keeps operating correctly when a partition occurs.
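A partition can be modelled as a link that silently drops messages. The sketch below is a minimal illustration; the `Link` class and its `partitioned` flag are assumptions introduced here, not part of the original proof.

```python
class Link:
    """Hypothetical network link between G1 and G2."""

    def __init__(self):
        self.partitioned = False
        self.dropped = 0

    def send(self, deliver, message):
        # During a partition the message is lost; the sender is not told why.
        if self.partitioned:
            self.dropped += 1
            return False
        deliver(message)
        return True


received = []                      # messages that actually reach G2
link = Link()
link.send(received.append, "v1")   # delivered
link.partitioned = True
link.send(received.append, "v2")   # lost
link.send(received.append, "v3")   # lost
print(received, link.dropped)
```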

CAP proof

At this point, we understand what consistency, availability, and partition tolerance mean for a distributed system. Now we prove why a distributed system cannot satisfy all three at the same time.

We use proof by contradiction. Assume that there is a distributed system that is consistent, available, and partition tolerant. When the network between G1 and G2 is partitioned, the system looks like this:

[Figure: G1 and G2 separated by a network partition]
Now client C1 updates v on G1 to v1. Because the system is available, G1 must respond; but because the network is partitioned, G1 cannot copy the new value to G2.

[Figure: C1 writes v1 to G1; G1 cannot replicate the value to G2]
After the write completes, another client C2 sends a read request for v to G2. Because the system is available, G2 must respond; but because of the partition, v on G2 still holds the previous value, so C2 receives v0.

[Figure: C2 reads v from G2 and receives v0]
C2 does not see the value written by C1, even though its read began after C1's write completed, so consistency is violated. This contradicts the assumption, and we conclude that a distributed system cannot satisfy consistency, availability, and partition tolerance at the same time.
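The steps of the proof can be replayed in a small simulation. This is a sketch under simple assumptions: the functions `write` and `read` and the `partitioned` argument are hypothetical names, and "available" is modelled as "every call returns a response".

```python
class Server:
    def __init__(self, initial="v0"):
        self.v = initial


def write(server, peer, value, partitioned):
    server.v = value
    if not partitioned:
        peer.v = value   # replication only succeeds without a partition
    return "ok"          # availability: a non-failing server always responds


def read(server):
    return server.v      # availability: the server answers with what it has


g1, g2 = Server(), Server()

ack = write(g1, g2, "v1", partitioned=True)   # C1 writes v1 to G1
seen_by_c2 = read(g2)                         # C2 then reads v from G2

consistent = (seen_by_c2 == "v1")
print(ack, seen_by_c2, consistent)
```

Both requests received a response, so the system stayed available under the partition, and the price was a stale read.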

CP or AP

First of all, network partitions will certainly occur in a distributed system, so the system must tolerate them (P); otherwise it is not a true distributed system. The real choice is therefore between A and C.

If the system does not require high availability, that is, if it is acceptable for it to be down or unresponsive for an extended period, we can give up A. The widely used ZooKeeper is a CP system.
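A CP choice can be sketched as a server that refuses writes it cannot replicate. This toy model is not how ZooKeeper works internally (ZooKeeper uses a quorum-based protocol); the names `CPServer` and `Unavailable` are assumptions made for illustration.

```python
class Unavailable(Exception):
    """Raised when the server gives up availability to stay consistent."""


class CPServer:
    def __init__(self, initial="v0"):
        self.v = initial
        self.peer = None
        self.partitioned = False

    def write(self, value):
        # Refuse the write rather than let the two copies diverge.
        if self.partitioned:
            raise Unavailable("cannot replicate; refusing write")
        self.v = value
        self.peer.v = value
        return "ok"

    def read(self):
        return self.v


g1, g2 = CPServer(), CPServer()
g1.peer, g2.peer = g2, g1

g1.write("v1")            # no partition: both copies become v1
g1.partitioned = True

rejected = False
try:
    g1.write("v2")        # partition: the write is rejected
except Unavailable:
    rejected = True

print(rejected, g1.read(), g2.read())
```

The client gets an error instead of a response, but the two servers never disagree.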

If availability requirements are very high, we can sacrifice consistency instead. This does not mean that the system stays in an inconsistent state forever; such a system would be useless. Sacrificing consistency generally means giving up strong consistency while still guaranteeing eventual consistency. In other words, the system may be inconsistent for a short time, but it becomes consistent again after a while.
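An AP choice with eventual consistency can be sketched as a server that always accepts writes and replays them to its peer once the partition heals. The names `APServer`, `pending`, and `heal` are hypothetical, and the sketch assumes a single writer, so it needs no conflict resolution.

```python
class APServer:
    def __init__(self, initial="v0"):
        self.v = initial
        self.peer = None
        self.partitioned = False
        self.pending = []   # writes not yet delivered to the peer

    def write(self, value):
        # Always accept the write; replicate now or remember it for later.
        self.v = value
        if self.partitioned:
            self.pending.append(value)
        else:
            self.peer.v = value
        return "ok"

    def heal(self):
        # The partition is over: replay buffered writes in order.
        self.partitioned = False
        for value in self.pending:
            self.peer.v = value
        self.pending.clear()

    def read(self):
        return self.v


g1, g2 = APServer(), APServer()
g1.peer, g2.peer = g2, g1

g1.partitioned = True
ack = g1.write("v1")          # accepted despite the partition
during = g2.read()            # G2 is temporarily stale

g1.heal()
after = g2.read()             # G2 has caught up
print(ack, during, after)
```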

Therefore, for a distributed system, P is a basic requirement. Among the three CAP properties, the trade-off can only be made between C and A, according to the system's requirements, while making partition tolerance as strong as possible.

For more on eventual consistency, see the article on BASE theory on this blog: https://www.iteblog.com/archives/2352.html
