Detailed explanation of the underlying principles of Zookeeper

Many students have told me that ZooKeeper (zk) is hard to learn and hard to understand, so in this article I will walk through it in detail.

First of all, what is zk? At its core it is a distributed coordination service framework. It started as a sub-project of Apache Hadoop and is mainly used to solve data-management problems that distributed applications constantly run into, such as unified naming services, cluster management, and management of distributed application configuration items.

Second: Zookeeper is a database 

Third: Zookeeper is a database with file-system characteristics (its data is organized as a tree of nodes)

Fourth: Zookeeper is a distributed database that solves the problem of data consistency 

Fifth: Zookeeper is a distributed database with publish/subscribe capability (the watch mechanism)

Put this way, everyone should be able to agree with these points, right?
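To make the fifth point concrete, here is a minimal sketch using the standard ZooKeeper Java client. The `ZooKeeper`, `CreateMode`, and `Watcher` classes are the real client API; the connection string `localhost:2181` and the znode path `/app-config` are assumptions for this example:

```java
import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;

// A minimal sketch: ZooKeeper as a small database with publish/subscribe.
// Assumes a ZooKeeper server is reachable at localhost:2181.
public class WatchDemo {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {});

        String path = "/app-config";
        // Write: create the znode if it does not exist yet.
        if (zk.exists(path, false) == null) {
            zk.create(path, "v1".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }

        // Read with a one-shot watch: the callback fires on the next change.
        byte[] data = zk.getData(path, event -> {
            if (event.getType() == Watcher.Event.EventType.NodeDataChanged) {
                System.out.println("znode changed: " + event.getPath());
            }
        }, new Stat());
        System.out.println("current value: " + new String(data));

        zk.setData(path, "v2".getBytes(), -1);  // triggers the watch above
        Thread.sleep(1000);                     // give the callback time to run
        zk.close();
    }
}
```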

So what is this consistency?

Consistency comes in three flavors: strong consistency, weak consistency, and eventual consistency.

If that is not clear, consider the classic two-replica example: step 1 writes the value 2 (over the old value 1) to the first database, and step 2 then reads from the second database.

If step 2 is required to read 2 and must never read 1, then either synchronization between the databases has to be extremely fast, or step 2 has to be blocked by a lock until the data synchronization completes. This is called strong consistency.

If step 2 is allowed to read 1, this is called weak consistency; in effect, the replicas are never required to become consistent at all.

If step 2 may read 1 at first but will read 2 after some period of time, this is called eventual consistency: you may have to wait a while, but the replicas do converge.
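Here is a toy sketch of the difference, with two ints standing in for the two replicated databases; everything in it (class and method names included) is invented purely for illustration:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy model of the step-1/step-2 example above; not a real database.
public class ConsistencyDemo {
    static volatile int replicaA = 1, replicaB = 1;
    // Single thread that "ships" writes from replica A to replica B.
    static final ExecutorService syncPool = Executors.newSingleThreadExecutor();

    // Strong consistency: the write does not return until replica B has
    // applied it, so a subsequent read on B can never see the old value.
    static void strongWrite(int v) throws Exception {
        replicaA = v;
        syncPool.submit(() -> { replicaB = v; }).get();  // block until replicated
    }

    // Eventual consistency: the write returns immediately; replica B
    // catches up later, so a read on B may briefly return the old value.
    static void eventualWrite(int v) {
        replicaA = v;
        syncPool.submit(() -> { replicaB = v; });        // fire and forget
    }

    public static void main(String[] args) throws Exception {
        eventualWrite(2);
        System.out.println("read B right away: " + replicaB); // may still be 1
        strongWrite(3);
        System.out.println("read B right away: " + replicaB); // always 3
        syncPool.shutdown();
    }
}
```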

For a cluster to provide strong consistency to the outside, whenever the data on one server changes, the cluster must wait for the other servers to finish synchronizing that data before it can serve requests normally again.

Guaranteeing strong consistency therefore usually means sacrificing availability.

CAP names three properties:

• Consistency (here meaning strong consistency)

• Availability

• Partition tolerance

Partition fault tolerance means that when a distributed system encounters a node or network partition failure, it can still provide external services that satisfy consistency and availability.

Partition tolerance is closely related to scalability. In a distributed application, parts of the system may stop working for all sorts of distributed-environment reasons; good partition tolerance means the system still behaves as one functioning whole. For example, if one or several machines in the system go down, the remaining machines can keep operating and meet the system's requirements; or if a network fault splits the distributed system into several independent parts, each part can keep the system running. A system with these properties has good partition tolerance.

To put it simply, a system has good partition tolerance if it can keep working normally in the face of network interruptions and message loss. "If there is anything special about Spanner, it is Google's wide-area network. Google built a private network and strong network engineering to guarantee P, and based on years of operational improvement, partitions can be minimized in the production environment, achieving high availability."

— the father of CAP (Eric Brewer)

Now here comes the question: can C, A, and P all be satisfied at the same time?

If C, A, and P could all be satisfied at once, it would mean that even when the internal network of a distributed system has problems, the system would still be available and its data would still be consistent.

In reality, because network problems are unavoidable, C, A, and P cannot all be satisfied at the same time.

CA without P hardly exists for a distributed system: in a distributed environment, network partitions are a fact of life, and since partitions are unavoidable, abandoning P effectively means abandoning the distributed system itself. CP without A, on the other hand, is possible: if a distributed system does not require strong availability, that is, it can tolerate being down or unresponsive for a while, then it can guarantee CP and give up A.

A distributed system that guarantees CP and discards A must, once a network failure or message loss occurs, sacrifice the user experience and only let users access the system again after all the data is consistent.

Many systems are in fact designed as CP; the most typical are distributed databases. When extreme situations occur, they give priority to strong data consistency at the cost of availability. Examples include Redis and HBase, as well as Zookeeper, so commonly used in distributed systems, which also puts CP first among the three CAP properties.

AP without C: to be highly available while tolerating partitions, you have to give up strong consistency. Once a network problem occurs, nodes may lose contact with one another. To remain highly available, each node must answer user requests immediately using only its local data, and that leads to globally inconsistent data.

Which combination to choose depends on the scenario, for example: money and financial safety call for CP, while user experience calls for AP (keeping partition tolerance and availability, but giving up strong consistency).

BASE theory

Basically Available: when a distributed system fails, it is allowed to lose part of its availability as long as the core remains available. During a big e-commerce promotion, to cope with the traffic surge, some users may be directed to a degraded page, or the service layer may only provide degraded service. That loss of partial availability is what "basically available" looks like.

Soft State: the system is allowed to have intermediate states, and those intermediate states do not affect overall availability. In distributed storage there are usually at least three copies of each piece of data; allowing replication between nodes to lag is one manifestation of soft state. The asynchronous replication in MySQL replication is another.

Eventual Consistency: all data copies in the system reach a consistent state after a certain period of time. Weak consistency is the opposite of strong consistency, and eventual consistency is a special case of weak consistency.

How consistency is reached in daily life

The leader says: "From 3 to 4 this afternoon we will meet in the large conference room to make plans for the future. Please reply when you receive this."

Colleagues who see the notification reply with "1".

The leader watches how many people have replied to his notification. If too few reply, he may send the notification again; once enough have replied, he is reassured. He does not keep waiting for everyone, because not everyone will reply.

Then at 3 pm, all the colleagues make plans for the future together.

The department is like a server cluster, with two roles in it: leader and colleague.

• Leaders are elected by colleagues

• The leader is responsible for receiving requests from the outside, and then synchronizing the information to colleagues during the meeting

• The leader needs to know whether his colleagues are free to meet, so he asks everyone in the group first and counts the replies. He does not need a reply from every single person; replies from a majority are enough.

• Once a majority of colleagues confirm they are free, the 3 o'clock meeting is confirmed and the information is synchronized.

Elements that ensure consistency

• Leader election mechanism

• Two-phase commit

• Over-half (majority) verification mechanism (see the sketch below)
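An illustrative sketch of how the last two elements work together; the class and method names here are invented for this example and are not ZooKeeper's real internals:

```java
import java.util.List;

// Sketch of two-phase commit guarded by the over-half check. The leader
// proposes a transaction, counts ACKs, and commits only once a majority
// (counting itself) has acknowledged.
class TwoPhaseCommitSketch {
    // Strictly more than half: 2 of 3, 3 of 5, 4 of 7 ...
    static boolean hasQuorum(int acks, int clusterSize) {
        return acks > clusterSize / 2;
    }

    static boolean propose(List<FollowerStub> followers, byte[] txn) {
        int acks = 1;                                    // the leader's own ack
        for (FollowerStub f : followers) {
            if (f.ack(txn)) acks++;                      // phase 1: proposal
        }
        if (!hasQuorum(acks, followers.size() + 1)) {
            return false;                                // not enough replies
        }
        for (FollowerStub f : followers) f.commit(txn);  // phase 2: commit
        return true;
    }

    interface FollowerStub {                             // hypothetical stand-in
        boolean ack(byte[] txn);
        void commit(byte[] txn);
    }
}
```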

Cluster node roles

• Leader: handles write requests and coordinates the cluster

• Follower: serves reads and votes in leader elections

• Observer: serves reads but does not vote

Election scene in real life

• 1. Whoever I have a good relationship with gets my vote

• 2. Whoever I think is great gets my vote

• 3. I can update my vote: after discussing with others, if I find that someone else's candidate is better than mine, I change my vote to that candidate

• 4. The votes in the ballot box are counted, and whoever gets the most votes becomes the leader

When leader election happens

• When the cluster starts up

• When the leader goes down

• When followers go down until the leader finds that fewer than half of the cluster still follows it; the leader can then no longer serve requests, so a new leader election is triggered

Leader election

The process of leader election is essentially a contest over which server is "strongest". The comparison rules are: 1. whoever has the newer data (the larger zxid) becomes the leader; 2. if the data is the same, whoever has the larger server id (myid) becomes the leader. The process runs through mutual voting among the servers: each server receives votes from the others, and each vote carries the two pieces of information above, zxid and myid. The server then "PKs" the received vote against its own current vote, and the losing side changes its vote to the side that just won. Under this rule, every server ends up with one candidate it considers the strongest. Throughout the voting process, every server also keeps a ballot box recording who each of the other servers is currently voting for, so each server can check whether more than half of the servers are voting for the same candidate it is. If so, it considers the leader elected (the candidate it currently votes for is the Leader); if a server finds that it is itself the strongest, it becomes the Leader, otherwise it becomes a Follower.
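A sketch of the PK rule just described. The `Vote` fields mirror the zxid/myid comparison; ZooKeeper's real implementation (in `FastLeaderElection`) additionally compares an election epoch first, which is omitted here:

```java
// Illustrative sketch of the vote "PK"; names are invented for this example.
class Vote {
    final long zxid;   // how up to date the candidate's data is
    final long myid;   // the candidate's server id
    Vote(long zxid, long myid) { this.zxid = zxid; this.myid = myid; }
}

class ElectionSketch {
    // Newer data (larger zxid) wins; on a tie, the larger myid wins.
    static boolean beats(Vote challenger, Vote current) {
        if (challenger.zxid != current.zxid) {
            return challenger.zxid > current.zxid;
        }
        return challenger.myid > current.myid;
    }

    // Each server keeps the winner of every PK as its current vote.
    static Vote pk(Vote mine, Vote received) {
        return beats(received, mine) ? received : mine;
    }
}
```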

Write data flow

Zookeeper satisfies CP.

ZK puts incoming requests into a queue and uses a single thread to take requests off the queue and process them. If two write operations arrive at the same time, the first is processed while the second waits in the queue; the second cannot be processed until the first completes. From the point of view of the second write, the cluster is temporarily unavailable, and the main reason is that the first write is in the middle of a two-phase commit, which keeps the data in the cluster consistent.
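A toy model of this queue-plus-single-thread behaviour; it mimics only the ordering guarantee, not ZooKeeper's actual request processor pipeline:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Writes are queued and a single worker thread applies them one at a time,
// so a second write simply waits until the first has been fully processed.
public class SerialWriteSketch {
    private final BlockingQueue<Runnable> requests = new LinkedBlockingQueue<>();

    public SerialWriteSketch() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    requests.take().run();   // one request at a time, in order
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    public void submitWrite(Runnable write) {
        requests.add(write);  // caller returns; the write commits in queue order
    }
}
```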

Can write requests be processed during leader election?

Split brain arises when some servers lose their connection to the leader while still being able to reach one another; that group of servers then holds a new election. If they elect a new Leader, the whole cluster ends up with two Leaders at once, and that is split brain.

Leader election in Zookeeper requires votes from more than half of all the servers. In a split-brain scenario, the smaller side of the partition can never gather enough votes, so the over-half verification mechanism prevents split brain.
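A quick illustration of why the over-half rule prevents two Leaders (the numbers and names here are invented for this sketch): with 5 servers split 3/2 by a partition, only the 3-server side can reach a majority:

```java
// With a 5-node cluster partitioned into groups of 3 and 2, only one side
// can ever gather more than half of the votes, so at most one Leader exists.
public class SplitBrainSketch {
    static boolean canElectLeader(int reachable, int clusterSize) {
        return reachable > clusterSize / 2;
    }

    public static void main(String[] args) {
        int clusterSize = 5;
        System.out.println(canElectLeader(3, clusterSize)); // true  -> one Leader
        System.out.println(canElectLeader(2, clusterSize)); // false -> no Leader
    }
}
```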

The ZAB protocol (Zookeeper Atomic Broadcast)

• Leader election

• Data synchronization (recovery phase)

• Receiving requests (two-phase commit)

Companion video: https://www.bilibili.com/video/av73279922
