Architectural Thinking Growth Series Tutorials (2) - Application of CAP Theory in Large Internet Systems

Background

If you are new to the industry, not knowing the CAP theorem is understandable. For an experienced developer it is hard to excuse: CAP is a basic principle that every technical architect must master.

Content

CAP theorem

Today, almost any Internet project beyond a modest size adopts a distributed architecture. A system may consist of multiple nodes, and each node may need to maintain a copy of the data.

How to maintain state across those nodes, and how to keep their data synchronized, therefore becomes a question every designer has to face.

The CAP theorem is one of the most fundamental principles in distributed systems, so understanding and mastering it is crucial to designing system architecture.

The CAP theorem, also known as Brewer's theorem, began as a conjecture proposed by Eric Brewer, a computer scientist at the University of California, Berkeley, at ACM PODC in 2000. In 2002, Seth Gilbert and Nancy Lynch of the Massachusetts Institute of Technology published a proof of Brewer's conjecture, making it a recognized theorem in the field of distributed computing.

What is the CAP theorem?

It states that in a distributed system (a collection of interconnected nodes that share data), read and write operations can guarantee only two of the following three properties: consistency (Consistency), availability (Availability), and partition tolerance (Partition Tolerance). The third must be sacrificed.

CAP theory

The three letters of CAP stand for:

  • Consistency (Consistency): for a given client, a read operation is guaranteed to return the result of the latest write. Consistency can be divided into weak consistency, strong consistency, and eventual consistency; if you are interested, plenty of material on them is available online.
  • Availability (Availability): non-faulty nodes return a reasonable response (not an error and not a timeout) within a reasonable time. Availability patterns can be divided into active-passive (failover from a working node to a standby) and active-active (two working nodes that can take over for each other).
  • Partition Tolerance (Partition Tolerance): when a network partition occurs, the system continues to perform its duties.

Application of CAP theorem in large-scale distributed Internet systems:

Although CAP is defined as choosing any two of the three properties, in a distributed environment we have to choose P (partition tolerance): the network can never be 100% reliable and may fail, so partitions are unavoidable.

Suppose we choose CA and give up P. When a partition occurs, the system must reject writes in order to preserve C. A write request then receives an error, which conflicts with A, because A requires a response that is neither an error nor a timeout. Therefore a distributed system cannot, even in theory, choose a CA architecture; it can only choose CP or AP.

Consistency/Partition Tolerance

As shown in the figure above, consider what preserving consistency means when a partition occurs. The data on node N1 has been updated to y, but the replication channel between N1 and N2 is interrupted, so y cannot be synchronized to N2 and the data on N2 is still x.

If client C now accesses N2, N2 must return an error that tells client C the system is currently failing. This violates the availability (Availability) requirement, so of the three CAP properties only C and P are satisfied.
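The CP behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not a real replication protocol: the `CPNode` class, the `Unavailable` exception, and the `partitioned` flag are hypothetical names introduced for this example.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request in order to protect consistency."""


class CPNode:
    """Hypothetical replica that prefers consistency over availability."""

    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.partitioned = False  # True while the replication channel is down

    def read(self):
        # During a partition the node cannot tell whether its copy is stale,
        # so it answers with an error instead of possibly outdated data.
        if self.partitioned:
            raise Unavailable(f"{self.name}: replication channel is down")
        return self.value


n1 = CPNode("N1", "x")
n2 = CPNode("N2", "x")

# The channel between N1 and N2 breaks; N1 is updated to y, N2 still holds x.
n2.partitioned = True
n1.value = "y"

try:
    result = n2.read()
except Unavailable as exc:
    result = f"error: {exc}"

print(result)
```

Client C never sees the stale x, but it also gets no usable answer from N2, which is exactly the availability that CP gives up.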

Availability/Partition Tolerance

As shown in the figure above, consider what preserving availability means when a partition occurs. The data on node N1 has been updated to y, but the replication channel between N1 and N2 is interrupted, so y cannot be synchronized to N2 and the data on N2 is still x.

If client C now accesses N2, N2 returns its current data x to client C, even though the latest data is already y. This does not meet the consistency (Consistency) requirement, so of the three CAP properties only A and P are satisfied.

Note: although the x returned by N2 is not a "correct" result, it is a "reasonable" one. x is old data rather than the latest data, but it is not a garbage value.
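The AP case can be modeled the same way. The sketch below is a toy model under simple assumptions: `APNode`, its `peers` list, and the `partitioned` flag are hypothetical names, and replication is reduced to a direct assignment.

```python
class APNode:
    """Hypothetical replica that prefers availability over consistency."""

    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.peers = []           # nodes this node replicates to
        self.partitioned = False  # True while the replication channel is down

    def write(self, value):
        self.value = value
        # Replication only succeeds while the channel is up.
        if not self.partitioned:
            for peer in self.peers:
                peer.value = value

    def read(self):
        # Always answer from the local copy, even if it may be stale.
        return self.value


n1 = APNode("N1", "x")
n2 = APNode("N2", "x")
n1.peers.append(n2)

# The channel breaks, then N1 accepts the write y.
n1.partitioned = True
n1.write("y")

stale = n2.read()   # N2 still answers, but with the old value
latest = n1.read()
print(stale, latest)
```

N2 stays available throughout the partition, at the price of returning the old value x while N1 already holds y.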

What should you watch for when applying the CAP theorem?

Once we understand the CAP theorem, we as developers need to make trade-offs based on business characteristics when building services: which properties the current system can give up, and which must be guaranteed.

The granularity that CAP focuses on is the data, not the entire system.

The trade-off between C and A can occur repeatedly in the same system at a very fine granularity, and each decision may be different because of specific operations, or even because specific data or users are involved.

Take a merchant management system as an example. It holds merchant account data (merchant ID, password) and merchant information data (industry category, company size, revenue scale, and so on). Merchant account data usually chooses CP, while merchant information data usually chooses AP. If the entire system is restricted to CP, it does not fit the use case of merchant information data. If the entire system is restricted to AP, it does not fit the use case of merchant account data.

Therefore, when putting CAP theory into practice, we need to classify the data in the system according to its application scenarios and requirements, and choose a strategy (CP or AP) for each class of data, instead of forcing all data in the system onto the same strategy.
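One way to picture per-data strategies is a small policy table that the read path consults. This is a minimal sketch: the `Strategy` enum, the `DATA_POLICY` table, the `handle_read` function, and the data class names are all assumptions made for illustration.

```python
from enum import Enum


class Strategy(Enum):
    CP = "reject requests during a partition"
    AP = "serve possibly stale data during a partition"


# Hypothetical classification of the merchant system's data.
DATA_POLICY = {
    "merchant_account": Strategy.CP,  # ID and password must never be stale
    "merchant_info": Strategy.AP,     # industry, size, revenue may lag briefly
}


def handle_read(data_class, local_value, partitioned):
    """Return (status, value) for a read served by the local replica."""
    strategy = DATA_POLICY[data_class]
    if partitioned and strategy is Strategy.CP:
        # CP data: refuse rather than risk returning an outdated value.
        return ("error", None)
    # AP data, or no partition at all: answer from the local copy.
    return ("ok", local_value)


print(handle_read("merchant_account", "pw-hash", partitioned=True))
print(handle_read("merchant_info", "retail", partitioned=True))
```

The same node gives different answers for different data classes during one partition, which is the point: the CP or AP decision is made per class of data, not once for the whole system.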

CAP ignores network latency.

This is a very implicit assumption: Brewer does not take latency into account when defining consistency. In other words, the definition assumes that when a transaction commits, the data is copied to all nodes instantly. In reality, especially in Internet architectures, copying data from node A to node B always takes some time. Within the same data center it may take a few milliseconds; across data centers it may take tens of milliseconds. C in CAP theory therefore cannot be perfectly achieved in practice: while the data is being replicated, node A and node B are not consistent.
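The inconsistency window caused by replication delay can be made concrete with a little arithmetic. In the sketch below, `replica_read` is a hypothetical helper and the delay values of 3 ms and 40 ms are assumed figures chosen to match "a few milliseconds" and "tens of milliseconds"; they are not measurements.

```python
def replica_read(write_at_ms, delay_ms, read_at_ms, old, new):
    """Value node B returns at read_at_ms for a write committed on node A."""
    visible_at_ms = write_at_ms + delay_ms  # when the copy reaches node B
    return new if read_at_ms >= visible_at_ms else old


SAME_DC_DELAY_MS = 3    # assumed: replication within one data center
CROSS_DC_DELAY_MS = 40  # assumed: replication across data centers

# Node A commits y at t = 0 ms; a client reads node B at t = 10 ms.
same_dc = replica_read(0, SAME_DC_DELAY_MS, 10, old="x", new="y")
cross_dc = replica_read(0, CROSS_DC_DELAY_MS, 10, old="x", new="y")
print(same_dc, cross_dc)
```

With these assumed delays, the read at 10 ms sees y inside one data center but still sees x across data centers, even though no partition has occurred.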

Under normal operation there is no need to choose between CP and AP; C and A can both be satisfied.

CAP theory tells us that a distributed system can only choose CP or AP, but the premise is that a partition has occurred. If there is no partition, that is, when the network connections between nodes are normal, we do not need to give up C or A. Both should be guaranteed. When designing the architecture we therefore have to consider two things: whether to choose CP or AP when a partition occurs, and how to guarantee CA when it does not.

Take the merchant management system again. Even when implementing CA, different data may call for different methods. Merchant account data can be synchronized through a "message queue", because a message queue gives better control over timeliness, although it is more complicated to implement. Merchant information data can be synchronized through "database synchronization", which may have high latency in some scenarios but is simple to use.

Giving up does not mean doing nothing; you need to prepare for recovery from the partition.

CAP theory tells us that we can only take two of the three and must "sacrifice" the other. The word "sacrifice" is somewhat misleading, because many people read it as doing nothing at all. In fact it only means that we cannot guarantee C or A while the partition lasts. Giving up C or A during a partition does not mean giving them up forever. We can take certain actions during the partition so that the system returns to the CA state once the partition failure is resolved.

The most typical approach is to record logs during the partition. When the partition failure is resolved, the system restores the data from those logs and reaches the CA state again.
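The log-and-replay idea can be sketched as follows. This is a simplified model under a strong assumption: only one node accepts writes during the partition, so replay needs no conflict resolution. The `Node` class, its `pending` log, and the `recover` method are hypothetical names.

```python
class Node:
    """Hypothetical replica that logs writes it could not replicate."""

    def __init__(self):
        self.data = {}
        self.pending = []  # log of writes made while the peer was unreachable

    def write(self, key, value, peer, partitioned):
        self.data[key] = value
        if partitioned:
            # Cannot reach the peer: remember the write for later replay.
            self.pending.append((key, value))
        else:
            peer.data[key] = value

    def recover(self, peer):
        # Partition healed: replay the log in order, then clear it.
        for key, value in self.pending:
            peer.data[key] = value
        self.pending.clear()


n1, n2 = Node(), Node()
n1.write("k", "x", n2, partitioned=False)  # replicated normally
n1.write("k", "y", n2, partitioned=True)   # logged during the partition

during = n2.data["k"]  # N2 still holds the old value
n1.recover(n2)
after = n2.data["k"]   # N2 has caught up
print(during, after)
```

If both sides accepted writes during the partition, replay alone would not be enough and some merge or conflict rule would be needed; that case is outside this sketch.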

Finally:

CAP is a very important piece of architecture theory. If you want to become an architect, you need a deep understanding of CAP, ACID, and BASE in order to lay a solid foundation.

 

Previous Chapter Tutorial

Architectural Thinking Growth Series Tutorials (1) - The Implementation Method and Practice of Zhongtai Architecture

The series of tutorials

Architectural Thinking Growth Series Tutorials


Origin blog.csdn.net/hemin1003/article/details/114928496