The CAP Principle in Distributed Systems

Distributed systems have come a long way, but they are not perfect and still face fundamental limits. The CAP principle describes one of them.
CAP principle: in a distributed system, Consistency, Availability, and Partition tolerance cannot all be guaranteed at the same time.

Consistency (C): whether all copies of the data in a distributed system hold the same value at the same moment. Simply put, every node sees exactly the same data at any given time. Keeping the copies identical requires synchronization, so the more nodes there are, the longer synchronization tends to take.
For example, in distributed data storage several devices store the same data, so whenever the data changes, every other copy must be updated as well. The more devices I have, the slower synchronization becomes and the more time it consumes.
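The cost of keeping copies in step can be sketched in a few lines of Python. This is a toy model, not a real storage system: `Replica`, `replicated_write`, and `is_consistent` are hypothetical names, and the replicas are updated one after another, which is the simplest possible assumption. Real systems usually replicate in parallel, so the cost grows more slowly than shown here.

```python
class Replica:
    """One storage node holding its own copy of the data."""

    def __init__(self):
        self.data = {}


def replicated_write(replicas, key, value):
    """Copy the value to every replica before returning.

    Returns the number of copy steps, a stand-in for synchronization time.
    """
    steps = 0
    for replica in replicas:
        replica.data[key] = value
        steps += 1
    return steps


def is_consistent(replicas, key):
    """True when every replica holds the same value for `key`."""
    return len({replica.data.get(key) for replica in replicas}) == 1


small = [Replica() for _ in range(3)]
large = [Replica() for _ in range(300)]
print(replicated_write(small, "x", 1))   # 3
print(replicated_write(large, "x", 1))   # 300
print(is_consistent(large, "x"))         # True
```

In this model a write is only finished once every copy has been updated, so the work per write grows with the number of replicas.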

Availability (A): the cluster as a whole can still respond to users' normal read and write requests, even under heavy load or when some nodes fail. Simply put, no matter what goes wrong inside the system, it must answer user requests within an acceptable time, so that the user experience does not suffer.

Partition tolerance (P): the system keeps working when the network splits the nodes into groups that cannot reach each other, or when individual nodes crash. It is closely related to high availability: the failure of one node does not bring down the others. To put it simply, it is fine if a few of my server nodes crash, as long as at least one healthy server remains, which pushes toward having more nodes.

Why CAP cannot satisfy all three

The CAP theorem says that a distributed system can achieve at most two of the three properties above. Let's analyze why.
First:
Why can't C be achieved when you have AP?
Suppose I have 10,000 nodes and need to keep the same data on all of them (P is satisfied). Keeping them consistent (C) means synchronizing 10,000 devices, which inevitably takes time, and while that is happening requests would have to wait, which breaks A. So if we insist on AP, nodes must answer before synchronization has finished, and data consistency cannot be guaranteed.
Second:
Why can't A be achieved when you have CP?
This is also easy to understand. With 10,000 devices, if I insist on keeping the data consistent, requests must wait until synchronization completes, so I cannot guarantee a short response time.
Third:
Why can't P be achieved when you have CA?
In the same way, if I want the data to stay consistent (that is, fully replicated) and also want requests answered within a reasonable response time, then I cannot have too many machines, so P cannot be satisfied.
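The three cases above can be sketched with a toy two-node cluster in Python. Everything here is an illustrative assumption: `Node`, `Unavailable`, the `mode` values "AP" and "CP", and the `partitioned` flag are hypothetical names, and a real system would detect partitions through timeouts rather than a flag. While the link is up, both modes behave the same (the CA case). Once the link is down, an AP node keeps answering with possibly stale data, and a CP node refuses to answer.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request to protect consistency."""


class Node:
    """Toy replica; `mode` is "AP" or "CP" (names invented for this sketch)."""

    def __init__(self, mode):
        self.mode = mode
        self.data = {}
        self.peer = None
        self.partitioned = False  # True when the link to the peer is down

    def write(self, key, value):
        if self.partitioned and self.mode == "CP":
            # CP: refuse rather than let the replicas diverge (gives up A).
            raise Unavailable("peer unreachable, write rejected")
        self.data[key] = value
        if not self.partitioned:
            # Link is up: copy the value so both replicas agree.
            self.peer.data[key] = value
        # AP during a partition: accepted locally only (gives up C).

    def read(self, key):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("peer unreachable, read rejected")
        return self.data.get(key)


def run(mode):
    """Write once normally and once during a partition; report the outcome."""
    a, b = Node(mode), Node(mode)
    a.peer, b.peer = b, a
    a.write("x", 1)                       # no partition: both nodes store 1
    a.partitioned = b.partitioned = True  # the network splits
    try:
        a.write("x", 2)
        return a.read("x"), b.read("x")
    except Unavailable:
        return "rejected"


print(run("AP"))  # (2, 1)
print(run("CP"))  # rejected
```

In AP mode the two nodes disagree after the partition (2 versus 1), and in CP mode the second write is turned away so both nodes still hold 1.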

With real network hardware, packets will inevitably be delayed or lost and individual servers will crash, yet the system must keep running normally, so **partition tolerance (P)** is indispensable. That leaves the practical choice between consistency and availability when a partition occurs.

Origin blog.csdn.net/qq_31142237/article/details/89790914