Distributed systems: CAP theory and BASE theory

Before introducing consistency protocols, let's first get a feel for distributed systems. Back in school, our practice projects were invariably centralized deployments: a single Tomcat solved everything, and many small projects today still work this way:
[Figure: a centralized deployment, with one Tomcat serving the whole project]

However, as performance requirements on services grow, or simply to avoid problems such as a single point of failure, a centralized deployment may no longer meet our needs. A system whose hardware or software components are spread across different networked computers and communicate and coordinate with each other only through message passing is a distributed system.
[Figure: a distributed deployment, with components spread across multiple networked servers]
Its characteristics are as follows:

  • Distribution
  • Equivalence
  • Concurrency
  • Lack of global clock
  • Failure occurs at any time

Distribution

Since it is a distributed system, the most obvious feature is naturally distribution. Put simply, suppose a project is fairly large: we can split the whole project by function into different specialized services, such as a product microservice and an order microservice. These services are deployed in different Tomcats, on different servers, or even in different clusters. The whole architecture is spread across different places, its placement in space is essentially arbitrary, and server nodes may be added or removed at any time.

Equivalence

Equivalence is a goal of distributed design. Building a distributed architecture is certainly not just splitting a large monolithic system into microservices and deploying them on different server clusters: any individual microservice produced by the split can run into problems, and those problems can affect the entire project.

Take the order service as an example: to guard against problems in the order service, we generally need a backup that can take over when the original order service fails. This is what we meant above by avoiding a single point of failure.

This requires that the two (or more) order services be completely equal, with exactly the same functionality. This is in fact redundancy of service replicas.
The other kind is redundancy of data replicas, for example databases and caches. Just like the order service above, for safety's sake there must be an exactly identical backup. That is what equivalence means.

Concurrency

Concurrency is actually nothing new to us: when learning concurrent programming, multithreading is the foundation of concurrency. But here we are no longer looking from the multithreading perspective, because in concurrent coding everything happens on the same machine, inside the same JVM. Now we may look from the perspective of multiple JVMs: for example, multiple nodes in a distributed system may concurrently operate on shared resources, which brings in the problem of distributed locks.
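To make this concrete, below is a minimal sketch of the acquire/release pattern a distributed lock follows. The class and method names are illustrative, not a real library API, and a ConcurrentHashMap stands in for the shared store that all JVMs would reach over the network (Redis, ZooKeeper, a database, and so on):

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a distributed lock. The map simulates a store shared by all nodes;
// in a real system putIfAbsent/remove would be network calls to that store.
public class NaiveDistributedLock {
    private final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();

    /** Try to acquire the lock; returns an owner token on success, null otherwise. */
    public String tryAcquire(String lockKey) {
        String token = UUID.randomUUID().toString();      // identifies this owner
        // Only the first caller to insert the key becomes the lock owner.
        return store.putIfAbsent(lockKey, token) == null ? token : null;
    }

    /** Release only if the token still matches, so we never free someone else's lock. */
    public boolean release(String lockKey, String token) {
        return store.remove(lockKey, token);
    }

    public static void main(String[] args) {
        NaiveDistributedLock lock = new NaiveDistributedLock();
        String token = lock.tryAcquire("order:42");
        if (token != null) {
            try {
                System.out.println("got the lock, operating on the shared resource");
            } finally {
                lock.release("order:42", token);
            }
        }
    }
}
```

The token check on release matters: without it, one node could accidentally free a lock that has since been handed to another node.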

Lack of global clock

In a distributed system, nodes may be placed anywhere, and each location, each node, has its own clock. So in a distributed system it is hard to decide which of two transactions happened first, precisely because there is no global clock sequence to order them. Of course, we can mitigate this by calling a time server.
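Besides calling a time server, a standard way to order events without a global clock is a logical clock. The following Lamport-style sketch is added purely as an illustration of the idea; it is not part of the original discussion:

```java
import java.util.concurrent.atomic.AtomicLong;

// A minimal Lamport logical clock: each node keeps a counter, advances it on
// local events, and merges the counter carried on every incoming message.
public class LamportClock {
    private final AtomicLong time = new AtomicLong();

    /** A local event happened: advance our clock. */
    public long tick() {
        return time.incrementAndGet();
    }

    /** A message arrived stamped with the sender's clock: jump past it. */
    public long receive(long senderTime) {
        return time.updateAndGet(local -> Math.max(local, senderTime) + 1);
    }

    public static void main(String[] args) {
        LamportClock a = new LamportClock(), b = new LamportClock();
        long sendStamp = a.tick();             // node A acts and sends a message
        long recvStamp = b.receive(sendStamp); // node B's clock now exceeds A's stamp
        System.out.println(sendStamp + " is ordered before " + recvStamp);
    }
}
```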

Failures can happen at any time

Any node may suffer power outages, crashes, and the like. The more servers a cluster has, the greater the probability of failure; as clusters grow, failure even becomes the normal state.




We briefly introduced the characteristics of distributed systems above. So what concrete problems will a real distributed system bring? They are: communication anomalies, network partitions, the three states problem, and node failures.

Communication anomalies

A communication anomaly is really a network anomaly, and the network itself is unreliable. Since a distributed system must transmit data over the network, hardware such as fiber links and routers will inevitably have problems. As soon as the network has a problem, it affects the sending and receiving of messages, so loss or delay of data messages becomes very common.

Network partition

A network partition is essentially the split-brain phenomenon. Originally there was one Leader managing and coordinating the whole system, and everything was in order. Suddenly a network problem occurs and some nodes can no longer receive the Leader's instructions; in this situation a new Leader may emerge to coordinate those nodes. But the original Leader is still there: it has not crashed, the network is merely temporarily interrupted. Now there is a problem: within the same system, different Leaders are coordinating at the same time, which inevitably throws the system into chaos.

This split brain occurs when, due to various problems, two conflicting authorities appear in the same area (the same distributed cluster). It is called split brain, and also a network partition.

Three states

What are the three states? They are success, failure, and a third state beyond those two: the timeout state. Inside a single JVM, a call to a method gets a definite response, either success or failure. In a distributed system, although a success or failure response is received in most cases, once a network exception occurs a timeout is very likely. When such a timeout occurs, the initiator of the network communication cannot determine whether the request was processed successfully.
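Here is a minimal sketch of the three states as seen by the caller, using a Future with a timeout to simulate the remote call; the task, names, and timeouts are illustrative:

```java
import java.util.concurrent.*;

// The three states of a remote call: success, failure, and timeout.
public class ThreeStatesDemo {
    enum Outcome { SUCCESS, FAILURE, TIMEOUT }

    static Outcome callRemote(ExecutorService pool, long timeoutMs) {
        Future<String> future = pool.submit(() -> {
            Thread.sleep(200);              // pretend this is a network round trip
            return "ok";
        });
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return Outcome.SUCCESS;         // a definite answer
        } catch (TimeoutException e) {
            // We learn nothing: the request may or may not have been processed.
            return Outcome.TIMEOUT;
        } catch (InterruptedException | ExecutionException e) {
            return Outcome.FAILURE;         // a definite error
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        System.out.println(callRemote(pool, 500)); // SUCCESS: answered in time
        System.out.println(callRemote(pool, 50));  // TIMEOUT: outcome unknown to the caller
        pool.shutdownNow();
    }
}
```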

Node failure

This was actually mentioned among the characteristics above. Node failure is a fairly common problem in distributed systems: it refers to nodes in the server cluster going down or becoming "zombies", and it happens often.




CAP theory

OK, now that we understand the problems above, how do we solve them? In general we approach them from three angles, which is the CAP theory. CAP stands for Consistency, Availability, and Partition tolerance.

Consistency

Consistency is one of the ACID properties of transactions [Atomicity, Consistency, Isolation, Durability], which were introduced in detail with MySQL. The consistency here is broadly similar, except that we are now in a distributed environment, so it may no longer be a single database.

In a distributed system, consistency means whether data can be kept identical across multiple replicas; it is close to the equivalence discussed earlier. If, after a change to a data item is successfully executed, all users can immediately read the latest value, such a system is considered to have strong consistency.
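As a toy sketch of what strong consistency demands, the write below is applied to every replica before it is acknowledged, so a read from any node immediately sees the latest value. The maps merely simulate replicas inside one JVM:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Strong consistency: acknowledge a write only after every replica has it.
public class StrongConsistency {
    private final List<Map<String, String>> replicas = List.of(
            new ConcurrentHashMap<>(), new ConcurrentHashMap<>(), new ConcurrentHashMap<>());

    /** Synchronous write: all replicas are updated before the call returns. */
    public void write(String key, String value) {
        for (Map<String, String> replica : replicas) {
            replica.put(key, value);
        }
    }

    /** Any replica can serve the read and still return the latest value. */
    public String readFrom(int node, String key) {
        return replicas.get(node).get(key);
    }

    public static void main(String[] args) {
        StrongConsistency store = new StrongConsistency();
        store.write("item:1", "in stock");
        System.out.println(store.readFrom(2, "item:1")); // "in stock" on every node
    }
}
```

The price of this guarantee is visible in the write path: it blocks until every replica is updated, which is exactly the availability cost that the CAP trade-offs below describe.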

Availability

Availability means the service the system provides must always be usable: every user request should return a result within a bounded time. To achieve "within a bounded time", you may need caching and load balancing; the server nodes added here are for performance. To guarantee "return a result", you must consider master and standby servers, so that a standby can take over as quickly as possible and single points of failure are avoided.

Partition tolerance

When a distributed system encounters any network partition failure, it still needs to be able to provide services that satisfy consistency and availability, unless the entire network environment has failed.




BASE theory

So, can a distributed system satisfy all three requirements of the CAP theory at the same time? Sorry, it cannot: at most two of them can be met, so we must make some choices, as follows:

  • Give up P (satisfy C and A): put data and services on a single node, avoiding the negative impact of the network and fully guaranteeing the system's consistency and availability. But giving up P means giving up the system's scalability.
  • Give up A (satisfy C and P): when a node fails or the network fails, the affected services must wait for a period of time, during which the system cannot provide normal service to the outside, i.e. it is unavailable.
  • Give up C (satisfy A and P): the system cannot guarantee real-time data consistency, but it promises the data will eventually become consistent. This means there is a window of inconsistency, and the length of the window depends on the design of the system.

But notice: since we are building a distributed system, giving up P would take us right back to a single node, so in practice we think about giving up A or giving up C. Yet we cannot give either up completely, so we can only make trade-offs according to the needs of the business. This is where our BASE theory comes from.


Basically Available

When an unforeseen failure occurs in a distributed system, the system is allowed to lose part of its availability while the "basic availability" of the system is guaranteed. This shows up as "loss in response time" and "loss in function". For example, during the Double Eleven peak, some users' Taobao pages may be slow or served a downgraded version.
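A minimal sketch of "loss in function": if the full service cannot answer within its time budget, return a degraded fallback instead of an error. The 100 ms budget and the fallback content are illustrative assumptions:

```java
import java.util.concurrent.*;

// Basic availability via degradation: a slow dependency is cut off by a
// deadline and replaced with a cheaper, static answer.
public class BasicAvailability {
    static String recommendations(ExecutorService pool) {
        Future<String> full = pool.submit(() -> {
            Thread.sleep(300);                // pretend the recommender is overloaded
            return "personalized recommendations";
        });
        try {
            return full.get(100, TimeUnit.MILLISECONDS);
        } catch (Exception e) {
            full.cancel(true);
            return "static best-seller list"; // degraded, but still a usable page
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        System.out.println(recommendations(pool)); // prints the fallback under load
        pool.shutdownNow();
    }
}
```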

Soft state

The data in the system is allowed to exist in an intermediate state: there is a delay while data is synchronized between the replicas on different nodes, and this delay is considered not to affect the availability of the system. For example, during Spring Festival ticket sales on the 12306 website, a request may first enter a queue.
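A small sketch of soft state along the lines of the 12306 example: a request is accepted into a queue, which is the intermediate state, and a background worker applies it to the real state later:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Soft state: "queued" is an intermediate state the system tolerates while
// the durable state catches up asynchronously.
public class TicketQueue {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> pending = new LinkedBlockingQueue<>();
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String order = pending.take();   // drain the intermediate state
                    System.out.println("processed " + order);
                }
            } catch (InterruptedException ignored) {
                // shutting down
            }
        });
        worker.start();

        pending.put("order-1"); // the caller returns immediately: "you are in the queue"
        pending.put("order-2");
        Thread.sleep(100);      // give the worker time to catch up
        worker.interrupt();
    }
}
```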

Eventually consistent

After a period of data synchronization, all the data can finally reach a consistent state. For example, a transfer shown in a bank app may be temporarily inconsistent before it settles.
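A toy sketch of eventual consistency: the write lands on the primary first and reaches the replica only after a delay, so the two briefly disagree and then converge. The maps and the 50 ms lag simulate replication inside one JVM:

```java
import java.util.Map;
import java.util.concurrent.*;

// Eventual consistency: replicas lag behind the primary but converge.
public class EventualConsistency {
    public static void main(String[] args) throws Exception {
        Map<String, Integer> primary = new ConcurrentHashMap<>();
        Map<String, Integer> replica = new ConcurrentHashMap<>();
        ScheduledExecutorService sync = Executors.newSingleThreadScheduledExecutor();

        primary.put("balance", 100);  // the write is acknowledged by the primary alone
        sync.schedule(() -> replica.putAll(primary), 50, TimeUnit.MILLISECONDS);

        System.out.println("replica right away: " + replica.get("balance")); // null: stale window
        Thread.sleep(100);
        System.out.println("replica later:      " + replica.get("balance")); // 100: converged
        sync.shutdown();
    }
}
```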
