Understanding the Data Consistency Problem in Distributed Systems

Lao Liu is a second-year graduate student who is about to start looking for a job. He writes this blog partly to review and summarize the knowledge points of big data development, and partly in the hope of helping partners who, like him, are teaching themselves programming. Since Lao Liu taught himself big data development, the blog is bound to have some shortcomings; he hopes everyone will criticize and correct them so we can all improve together!
Today we will talk about data consistency in distributed systems, and that story has to start from the evolution of server architecture and deployment! The article is long, so please be patient and read to the end!

1. Background

1.1. Centralized Service

Let's start with centralized services. What is a centralized service? Simply put, one server does everything.

A centralized system is built around a central node consisting of one or more mainframe computers. All data is stored on this central node, all services of the entire system run on it, and every function of the system is performed by it.

In other words, in a centralized system each client is responsible only for inputting and outputting data, while data storage and processing are left entirely to the host.

The advantages of centralized services:

Simple structure
Simple deployment
Simple project structure

But its shortcomings are also very obvious:

The R&D and maintenance cost of mainframes is very high, and mainframes themselves are very expensive
There is a single point of failure: when the mainframe goes down, all services stop
A mainframe's performance scaling is limited by Moore's Law

What is Moore's Law?

Moore's Law was proposed by Gordon Moore, one of the founders of Intel. It states that, at constant price, the number of components that fit on an integrated circuit doubles roughly every 18 to 24 months, and performance doubles with it. In other words, the computing performance a dollar can buy more than doubles every 18 to 24 months. (Source: Baidu Encyclopedia)

Moore's Law tells us that vertical scaling has a theoretical limit, so we can only consider horizontal scaling, which in theory has no limit at all!

Since vertical scaling is limited, let's try scaling horizontally, and that is how distributed systems arise!

1.2. Distributed Services

Distributed means using more ordinary computers (as opposed to expensive mainframes) to form a cluster that provides services to the outside world. The more computers there are, the more CPU, memory, and storage resources are available, and the more concurrent requests the system can handle.

For example, an online shopping mall built as a distributed system may be split by function into multiple applications, each providing a different capability, which together form one distributed system serving the outside world.

The computers in a distributed system are therefore almost unrestricted in physical location: they may sit in different racks, in different data centers, or even in different cities.

Compared with centralized systems, distributed systems offer better cost-effectiveness, stronger processing capacity, higher reliability, and good scalability.

However, while the distributed approach solves a website's high-concurrency problem, it also introduces problems of its own.

First, distribution requires a network, which can affect performance and even the ability to serve at all. Second, the more servers in a cluster, the higher the probability that some server is down at any moment. Moreover, because the service is spread across the cluster and each user request lands on only one of the machines, mishandling this easily leads to data consistency problems.

1.3. Distributed anomalies

1. Communication failures: an unreliable network (message delay or loss) prevents the nodes of a distributed system from communicating, which can cause data loss, inconsistency across nodes, and out-of-order data.

2. Network partition: connectivity between sub-networks breaks while each sub-network keeps working normally inside, splitting the system's network into several isolated regions; the resulting local mini-clusters can produce inconsistent data.

3. Node failure: a server node goes down.

4. Loss of stored data: for a stateful node, losing data means losing state, and the lost state can usually only be read back and restored from other nodes. The remedy is a multi-replica mechanism.

1.4. Metrics for measuring distributed systems

1. Performance: this is a thorny trade-off. A system that pursues high throughput often struggles to achieve low latency, and when the average response time is long it is hard to raise QPS.

Throughput is the total amount of data the system can process in a given period, typically measured as the amount processed per second;
response latency is the time the system needs to complete a particular function;
concurrency is the system's ability to perform a function for many requests at the same time, usually measured in QPS (queries per second).

2. Availability: a system's availability refers to its ability to keep providing correct service in the face of various failures. Availability is a key metric for distributed systems; it measures robustness and reflects fault tolerance.

3. Scalability: the property that a distributed system can improve its performance (throughput, latency, concurrency), storage capacity, and computing power by enlarging the cluster.

4. Consistency: to improve availability, a distributed system almost inevitably uses replicas, and replicas bring the problem of replica consistency.

For example, suppose a piece of data is stored in a distributed system on several different nodes, each holding its own copy. If the copies on different nodes differ, two clients can see different results: the first client reads A while the second reads B. When two clients get different answers, consistency has not been handled well.
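As a tiny Python illustration of this situation, assuming a write that reaches one replica but is lost on the way to the other (the dictionaries standing in for replicas are purely illustrative):

```python
# Two replicas of the same key; initially consistent.
replica_1 = {"x": "A"}
replica_2 = {"x": "A"}

# A client updates x, but the update reaches only replica 1
# (the message to replica 2 is lost in the network).
replica_1["x"] = "B"

print(replica_1["x"])  # client 1 happens to read replica 1 -> "B"
print(replica_2["x"])  # client 2 happens to read replica 2 -> "A", inconsistent!
```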

To sum up: an excellent distributed system should have high throughput, low response latency, strong concurrency, high availability, strong scalability, and good consistency. But not all of these can be satisfied at once; several of them conflict, and we need to find ways to balance them!

What is really complicated in a distributed scenario is the issue of data consistency!

1.5. Understanding consistency

There are many kinds of consistency. Here are the three that Lao Liu knows.

Strong consistency: once a write completes, any subsequent read must return the latest data. In plain terms, as soon as a client has written a result, every later access gets the newest data. This is hard to achieve in a distributed setting; the Paxos algorithm, the Quorum mechanism, the ZAB protocol, and others discussed below can realize it!

Weak consistency: reads are not guaranteed to return the latest data and may return stale data.

Eventual consistency: intermediate states are not constrained; the only guarantee is that after some period of time the data in the system ends up correct. It is also the most widely used consistency model in high-concurrency scenarios.

1.6. The role of distributed consistency

Having said so much about distributed consistency, what is its role?

1. To improve availability, systems generally use a multi-replica mechanism, and multiple replicas raise the distributed consistency problem. Replication exists to improve the system's availability and to prevent a single node failure from making the whole system unavailable.

2. To improve the system's overall performance: data is distributed across multiple nodes in the cluster, and all of them can serve users.

Lao Liu has said so much; have you guessed what he wants to introduce?

Everything above was leading up to the data consistency problem of distributed systems! The approaches we use to solve it are as follows:

Distributed transactions (2PC and 3PC)
Distributed consensus algorithm
Quorum mechanism
CAP and BASE theory

2. Distributed transactions

In a distributed system, each node knows whether its own transaction operation succeeded, but it cannot know whether the transactions on other nodes succeeded, which can leave the nodes of the system in inconsistent states. Therefore, when a transaction spans server nodes and the ACID properties of the transaction must be preserved, a coordinator role has to be introduced; the other nodes that execute the transaction operations are called participants.

In practice there are two typical commit protocols for distributed transactions: 2PC and 3PC.

2.1. The 2PC commit process

A picture first:

[Figure: the 2PC flow between the coordinator and the participants]
Suppose I ask A to do one thing and B to do another, and within a distributed transaction the two must either both succeed or both fail. How do we achieve data consistency?

2PC is split into two phases:

Phase 1: execute the transaction, but do not commit.

Phase 2: once the coordinator has received positive feedback from every participant in phase 1 (i.e., every transaction executed successfully), it sends a command ordering all participants to commit.
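To make the flow concrete, here is a minimal Python sketch of the 2PC idea. The names (`Participant`, `two_phase_commit`) are made up for illustration, not any real library's API:

```python
class Participant:
    """A transaction participant: executes locally, commits or rolls back on command."""

    def __init__(self, name):
        self.name = name

    def prepare(self, work):
        # Phase 1: execute the transaction locally, but do NOT commit yet.
        try:
            work()
            return True      # positive vote: "ready to commit"
        except Exception:
            return False     # negative vote: "cannot commit"

    def commit(self):
        print(f"{self.name}: commit")

    def rollback(self):
        print(f"{self.name}: rollback")


def two_phase_commit(participants, work):
    # Phase 1: every participant executes without committing and votes.
    votes = [p.prepare(work) for p in participants]
    # Phase 2: commit only if ALL votes were positive, otherwise roll back.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "rolled back"


print(two_phase_commit([Participant("A"), Participant("B")], lambda: None))
```

Notice that the coordinator waits on every vote: one slow or silent participant stalls the whole transaction, which is exactly the blocking problem discussed next.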

2.2. Problems with 2PC

Looking at the two commit phases of 2PC and the diagram, an experienced reader can spot the problems at a glance.

1. The blocking problem

The coordinator sends commands to the participants over the network, so different participants receive their commands with different delays and in different orders.

For example, participant A receives the command quickly, while participant B, suffering a network problem, receives it only after a long delay. A processes it and sends its feedback promptly, while B's feedback arrives much later, forcing the coordinator to wait a particularly long time.

This is a very typical blocking problem, which wastes resources and affects performance!

2. No fault-tolerance mechanism: a single point of failure

The transaction coordinator is the core of the whole distributed transaction. Once the coordinator fails, as the figure above shows, the participants never receive the commit/rollback notification, which leaves every participant node stuck in an intermediate state, unable to complete the transaction.

3. Data inconsistency

In the second phase, a local network problem may cause one participant to receive the commit command while another does not, leaving the data on different nodes inconsistent.

2.3. 3PC

3PC stands for three-phase commit. It is an improved version of two-phase commit: it splits the "commit transaction request" step of the two-phase protocol in two, producing three phases: CanCommit, PreCommit, and DoCommit.

Besides adding the CanCommit phase on top of 2PC, 3PC also introduces a timeout mechanism: if a transaction participant does not receive a commit/rollback instruction from the coordinator within the specified time, it automatically commits locally, which addresses the coordinator's single point of failure.

2.4. 3PC execution process

Phase 1: CanCommit

In this preparation phase, the coordinator first asks each participant whether it is able to execute the transaction. The timeout mechanism applies here too: a participant that receives no instruction from the coordinator within a certain period will eventually commit automatically.

Phase 2: PreCommit

1. If every participant answers yes, the coordinator sends a pre-commit request to all participants and the protocol enters the pre-commit phase;

2. After receiving the pre-commit request, each participant executes the transaction operation.

3. After executing its local transaction, each participant sends an Ack to the coordinator to signal that it is ready to commit, then waits for the coordinator's next instruction.

4. If the coordinator receives a rejection in response, or times out waiting, it aborts the transaction and tells every participant to abort.

5. A participant aborts the transaction when it receives the abort instruction, or commits directly when its own wait times out.

Phase 3: DoCommit

1. Once the coordinator has received an Ack from every participant, it moves from pre-commit to commit and sends a commit request to each participant.

2. Each participant receives the commit request, formally commits the transaction, and reports the commit result (Y/N) back to the coordinator.

3. When the coordinator has received every participant's feedback, the distributed transaction is complete.

4. If the coordinator receives no feedback within the timeout period, it sends an abort instruction.

5. After receiving the abort instruction, participants roll back using their transaction logs.

6. Each participant reports its rollback result; when the coordinator has received the feedback (or timed out), the transaction abort is complete.
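As a rough, illustrative Python sketch of the three phases (no real timers, logging, or retries; all names are made up), the key difference from 2PC is what a participant does on timeout:

```python
class Participant3PC:
    """Illustrative 3PC participant."""

    def can_commit(self):
        return True            # Phase 1: "yes, I am able to run this transaction"

    def pre_commit(self):
        self.executed = True   # Phase 2: execute locally, hold the commit
        return "ack"

    def do_commit(self, instruction):
        # Phase 3: per the timeout rule above, silence from the
        # coordinator is treated as permission to commit.
        if instruction in ("commit", "timeout"):
            return "commit"
        return "rollback"      # roll back using the transaction log


def three_phase_commit(participants):
    if not all(p.can_commit() for p in participants):            # CanCommit
        return "aborted in CanCommit"
    if not all(p.pre_commit() == "ack" for p in participants):   # PreCommit
        return "aborted in PreCommit"
    return [p.do_commit("commit") for p in participants]         # DoCommit


print(three_phase_commit([Participant3PC(), Participant3PC()]))
```

The timeout-driven default commit in do_commit is exactly what causes the inconsistency discussed in the next section.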

2.5. Problems with 3PC

3PC can still produce inconsistent data. In the third phase, suppose the coordinator tells all participants to roll back, but one participant does not receive the instruction within the specified time: it will commit by default, and the data becomes inconsistent. Network problems between the second and third phases make this especially likely.

3. Distributed consensus algorithm

Building on the ideas of 2PC and 3PC, excellent developers have implemented distributed consensus algorithms. Here Lao Liu will only sketch the concepts behind the Paxos algorithm and the ZAB protocol; for a deeper look at both, wait until Lao Liu finishes looking for a job and writes an article on the ZooKeeper source code.

3.1. Paxos algorithm

The Paxos algorithm is usually described through a story set on the Greek island of Paxos. It has three roles:

1. Proposer: issues proposals;

2. Acceptor: may accept or reject proposals;

3. Learner: learns the chosen proposal; a proposal is approved once more than half of the Acceptors have accepted it.

Mapped onto the ZooKeeper cluster:

leader: like the chairman who drives the proposals (its single point of failure is addressed by the leader election mechanism)

follower: like a delegate to the People's Congress who takes part in the voting

observer: like an ordinary citizen who passively accepts the outcome

And there is one particularly well-known mechanism, a kind of parliamentary rule: agreement is reached once more than half of the voters approve.

To summarize the Paxos algorithm as used here: all transaction requests must be coordinated and processed by a single, globally unique server, called the leader server; all remaining servers are follower servers.

The leader server converts a client's transaction request into a transaction proposal and distributes the proposal to every follower server in the cluster. It then waits for the followers' feedback; once more than half of the followers have responded positively, the leader distributes a commit message to all followers, asking them to commit the preceding proposal.
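Here is a toy Python sketch of this leader-driven majority flow, under the big assumption that there is a single stable leader; ballot numbers, competing proposers, and everything else full Paxos needs are omitted:

```python
class Follower:
    def __init__(self):
        self.log = []

    def accept(self, proposal):
        self.log.append(proposal)   # persist the proposal, then ack
        return True

    def commit(self, proposal):
        print("committed:", proposal)


def broadcast(followers, zxid, request):
    """The leader turns a client request into a proposal and commits it
    once more than half of the cluster (leader included) has accepted it."""
    proposal = {"zxid": zxid, "op": request}
    acks = sum(1 for f in followers if f.accept(proposal))
    cluster_size = len(followers) + 1          # +1 for the leader itself
    if acks + 1 > cluster_size // 2:           # strict majority
        for f in followers:
            f.commit(proposal)
        return "committed"
    return "dropped"


print(broadcast([Follower(), Follower()], zxid=1, request="set x=1"))
```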

3.2. ZAB Protocol

ZooKeeper's underlying working mechanism is implemented by ZAB (ZooKeeper Atomic Broadcast). It provides two main functions: crash recovery and message broadcast.

The two important features of the ZAB protocol to ensure data consistency are:

1. The ZAB protocol must ensure that transactions already committed on the leader server are eventually committed by all servers.

2. The ZAB protocol must ensure that transactions that were merely proposed on the leader server, but never committed, are discarded.

To deal with the leader's single point of failure, there is a leader election algorithm. If the election guarantees that the newly elected leader holds the transaction proposal with the highest transaction id (ZXID) among all machines in the cluster, then the new leader is guaranteed to hold every proposal that was already committed.

Every executed transaction carries a number, so the highest transaction number identifies the newest transaction, i.e., the newest data. Through the ZAB protocol described above, ZooKeeper achieves data consistency for a distributed system!
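A simplified Python sketch of that election rule (real ZooKeeper elections also involve epochs and voting rounds, which are skipped here; the data layout is made up):

```python
def elect_leader(servers):
    # The server with the highest ZXID holds the newest committed proposal;
    # ties are broken by the larger server id.
    return max(servers, key=lambda s: (s["zxid"], s["id"]))


servers = [
    {"id": 1, "zxid": 5},
    {"id": 2, "zxid": 7},   # holds the newest transaction
    {"id": 3, "zxid": 6},
]
print(elect_leader(servers))  # -> {'id': 2, 'zxid': 7}
```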

4. Pigeonhole Principle

In brief: if n + 1 pigeons are placed into n cages, then at least one cage contains at least 2 pigeons.
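The principle is easy to verify by brute force; here is a small Python check for n = 3 (purely illustrative):

```python
from itertools import product

n = 3
# Enumerate every way to place n+1 pigeons into n cages: each placement
# is a tuple giving the cage index chosen for each pigeon.
for placement in product(range(n), repeat=n + 1):
    fullest = max(placement.count(cage) for cage in range(n))
    assert fullest >= 2   # some cage always holds at least 2 pigeons

print("verified: n+1 pigeons in n cages always double up somewhere")
```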


5. Quorum NWR mechanism

Quorum NWR: the Quorum mechanism is a voting algorithm commonly used in distributed settings to keep data safe and to reach eventual consistency. Its core principle comes from the pigeonhole principle. Its biggest strength is that it can not only achieve strong consistency but also let you tune the consistency level!

N: the total number of replica nodes

W: the number of replicas a write must reach to be considered successful

R: the number of replicas a read must query

When W + R > N, a read is guaranteed to see the latest data, which is strong consistency! Why is that?

Imagine 4 boxes, 3 of which have something inside. How can you be sure of getting a box with something in it? Opening any 2 boxes is enough, because at most one box is empty!

By this principle, as long as W + R > N holds, a read is guaranteed to see the latest data, and by constraining the number of replicas read and written, the consistency level can reach strong consistency!

Now consider the following three cases, with the premise that N is fixed:

W = 1, R = N: Write Once Read All

In a distributed environment, writing one replica is like having only a single box with something inside. To read the latest data, i.e., to be sure of finding that box, you must read all nodes and take the value with the newest version. Writes are efficient but reads are inefficient. Consistency is high, but partition tolerance is poor and availability is low.

W = N, R = 1: Read One Write All

In a distributed environment, a read happens only after all nodes have been synchronized, so reading any single node returns the latest data. Reads are efficient but writes are inefficient. Partition tolerance is good, consistency under failures is weaker, implementation is harder, and availability is high.

W = Q, R = Q, where Q = N/2 + 1

Put simply: write to a majority of nodes and read from a majority of nodes, balancing read and write performance. This suits general applications. For example, with N = 3, W = 2, R = 2, partition tolerance, availability, and consistency are all balanced.

And this third case is exactly what ZooKeeper does!
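Here is a minimal Python sketch of why W + R > N forces every read set to overlap the write set (the versioned-replica model is illustrative, not any particular system):

```python
class Replica:
    def __init__(self):
        self.version, self.value = 0, None


N, W, R = 4, 3, 2                       # W + R = 5 > N = 4
replicas = [Replica() for _ in range(N)]

# A write succeeds once it has reached W replicas (here: the first three).
for rep in replicas[:W]:
    rep.version, rep.value = 1, "new"

# A read queries any R replicas and keeps the highest-versioned value.
# Even the "worst" choice of R replicas must overlap the write set by the
# pigeonhole principle, so the read still sees the latest version.
read_set = [replicas[2], replicas[3]]   # replica 3 is stale
print(max(read_set, key=lambda rep: rep.version).value)   # -> "new"
```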

6. CAP theory

As we saw above, strong consistency and high availability are hard to achieve at the same time; the two pull in opposite directions. CAP theory tells us that a distributed system cannot satisfy all three of C, A, and P simultaneously.

C: Consistency, strong consistency

Keep multiple copies of data consistent in a distributed environment

A: Availability, high availability

The services provided by the system must remain available at all times: every user request returns a result within a bounded time

P: Partition Tolerance, partition fault tolerance

When the distributed system encounters any network partition failure, it must still be able to provide services that satisfy consistency and availability

Since a distributed system cannot meet the three requirements of C, A, and P at the same time, how should we choose?

CAP means choosing 2 out of 3. In a distributed system, partition tolerance (P) is a must, so only two choices remain: when a network problem occurs, the system either returns a possibly stale result or blocks and waits. The former sacrifices consistency; the latter sacrifices availability.

Stand-alone software does not face the partition problem, so it falls into the CA category, e.g., MySQL.

Distributed software must guarantee P, so it can only weigh A against C, as HBase, Redis, and the like do: the service stays basically available and the data becomes eventually consistent. Out of this trade-off, BASE theory was born.

7. BASE Theory

In most cases we do not actually need strong consistency; some businesses can tolerate a certain window of inconsistency. So, to balance efficiency, the eventual-consistency theory BASE was developed. Its core idea: even when strong consistency cannot be achieved, each application can, according to its own business characteristics, adopt suitable means to bring the system to eventual consistency.

In a word, don't go to extremes. BASE is the result of weighing C against A in CAP theory.

BASE theory pursues eventual consistency rather than strong consistency, and basic availability rather than full high availability.

Basically Available: when a distributed system fails, it is allowed to lose part of its availability in order to keep the core functions available. For example, during Taobao's Double 11, to protect system stability, placing orders works normally while some edge services may be temporarily unavailable.

Eventually Consistent: all data replicas in the system reach a consistent state after a certain period of time.

In the future, if you develop a distributed system, you can decide whether to pursue high availability or strong consistency based on your business!

8. Summary

Well, that wraps up the data consistency problem of distributed systems. Lao Liu mainly covered the background of distributed systems and how consistency is achieved. Although his current level may not match everyone else's, Lao Liu still hopes to keep getting better and to help more self-taught partners.

If you have any related questions, reach out via the official account "Lao Liu who is working hard" for a pleasant chat with Lao Liu. If this article helped you, a like and a follow would be much appreciated!
