The concepts and foundations of ZooKeeper

1.1 The mission of ZooKeeper

 

When developers build on ZooKeeper, the applications they design can usually be regarded as a group of clients that connect to the ZooKeeper servers and perform operations through the ZooKeeper client API. ZooKeeper's client API is powerful. Its features include:

  1. Strong guarantees of consistency, ordering, and durability.
  2. The ability to implement generic synchronization primitives.
  3. A simple way to handle concurrency, which in real distributed systems often leads to incorrect behavior.

 

 1.1.1 Building a distributed system through ZooKeeper

  There are many definitions of a distributed system, but for the purposes of this book we define it as a system of multiple software components running independently and concurrently across multiple physical hosts. There are several reasons to design a system in a distributed way. A distributed system can exploit the computing power of multiple processors by running components, possibly replicated, in parallel. A system may also need to be distributed across locations for strategic reasons, such as a single application being served by servers at multiple sites.

  

  Processes in a distributed system have two options for communicating: exchange messages directly over the network, or read and write some shared storage. ZooKeeper uses the shared storage model to implement coordination and synchronization primitives between applications. Shared storage itself still requires network communication between the processes and the storage. We emphasize network communication because it is the foundation of concurrent design in distributed systems.

 

In a real system, we need to pay special attention to the following issues:

Message delay

Messages can be delayed arbitrarily in transit, for example because of network congestion, and such delays can have unpredictable consequences. For example, according to a reference clock, process P may send a message before process Q sends one, yet Q's message may be delivered first.
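
The reordering described above can be illustrated with a small simulation. This is only a sketch, not ZooKeeper code: the function name `delivery_order` and the send times and delays are made up for illustration.

```python
def delivery_order(messages):
    """Return sender names in the order their messages arrive.

    Each message is (sender, send_time, network_delay); a message arrives
    at send_time + network_delay. All values are hypothetical.
    """
    arrivals = sorted(messages, key=lambda m: m[1] + m[2])
    return [sender for sender, _, _ in arrivals]


# P sends first (t=0) but hits congestion (delay 5); Q sends at t=1 (delay 1).
messages = [("P", 0, 5), ("Q", 1, 1)]
print(delivery_order(messages))  # ['Q', 'P']
```

Even though P sent first, the receiver sees Q's message first, so the arrival order says nothing reliable about the send order.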

 

Processor performance

Operating system scheduling and overload can also cause arbitrary delays in message processing. When one process sends a message to another, the total delay is roughly the sum of the processing time at the sender, the transmission time, and the processing time at the receiver. If the sending or receiving process has to wait to be scheduled before it can handle the message, the delay will be higher.
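
As a rough worked example of this sum, assuming made-up durations in milliseconds (the function and parameter names are hypothetical):

```python
def total_message_delay(sender_ms, transmission_ms, receiver_ms,
                        scheduling_wait_ms=0):
    """Approximate end-to-end delay of one message, in milliseconds."""
    return sender_ms + transmission_ms + receiver_ms + scheduling_wait_ms


print(total_message_delay(2, 10, 3))                         # 15
print(total_message_delay(2, 10, 3, scheduling_wait_ms=40))  # 55
```

In the second call the time spent waiting to be scheduled dominates the delay, even though the network itself is no slower.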

 

Clock skew

Systems commonly rely on some notion of time, for example to determine when events occurred in the system. Processor clocks are unreliable, however, and can drift arbitrarily relative to one another. Relying on processor clocks may therefore lead to wrong decisions.
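
A small sketch of how skewed clocks can produce a wrong decision. The events, times, and skew values below are invented for illustration, and `order_by_local_timestamp` is a hypothetical helper, not part of any library.

```python
def order_by_local_timestamp(events):
    """Order events by the timestamp each host's own clock reports.

    Each event is (name, true_time, clock_skew); the host stamps the
    event with true_time + clock_skew.
    """
    stamped = sorted(events, key=lambda e: e[1] + e[2])
    return [name for name, _, _ in stamped]


# A really happens first (t=10), but its host's clock runs 5 units fast.
# B happens later (t=12), on a host whose clock runs 2 units slow.
events = [("A", 10, 5), ("B", 12, -2)]
print(order_by_local_timestamp(events))  # ['B', 'A']
```

Sorting by local timestamps places B before A, the opposite of what actually happened.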

 

  ZooKeeper was designed precisely to make these problems simpler to handle. It does not eliminate them, nor does it make them completely transparent to applications, but it does make them easier to deal with. ZooKeeper implements solutions to important distributed computing problems and packages them in a way that is intuitive to developers, or at least that is what we have been hoping for.

1.2 Example: Master-Slave Application

We have covered distributed systems in theory. Now it is time to make things more concrete. Consider an architecture that is widely used in distributed system design: a master-worker (master-slave) architecture (Figure 1-1).

 

 In this architecture, the master node process is generally responsible for tracking the status of the slave nodes and the availability of tasks, and for assigning tasks to slave nodes. For ZooKeeper, this architectural style is representative of the most common tasks, such as electing a master, keeping track of available slaves, and maintaining application metadata.

 

 

To implement a master-slave system, we must address three key issues:

Master node crashes

If the master node encounters an error and fails, the system will not be able to assign new tasks or reassign tasks that have failed.

Slave node crashes

If a slave node crashes, the tasks assigned to it will not be completed.

Communication failure

If the master node and a slave node cannot exchange messages, the slave node will not learn of new tasks assigned to it.

 

1.2.1 Master node failure

  When the master node fails, we need a backup master. When the primary master crashes, the backup master takes over the role of primary and performs a failover. This is not as simple as starting to handle the requests that would have gone to the primary, however. The new primary master must be able to recover the state the system was in when the old primary crashed. To recover that state we cannot rely on the crashed master, so we need to obtain it from somewhere else, namely ZooKeeper.

  State recovery is not the only important issue. Suppose the primary master is still running, but the backup master believes it has crashed. This false assumption can arise when, for example, the primary is heavily loaded and its messages are delayed arbitrarily (see Section 1.1.4). The backup master will then take over: it runs all the necessary procedures and eventually starts acting as master, becoming a second primary master. To make matters worse, if some slave nodes cannot communicate with the primary master, for example because of a network partition, they may stop following the original primary and establish a master-slave relationship with the second primary master. This scenario is commonly referred to as split-brain: two or more parts of the system begin to work independently, resulting in inconsistent behavior. In finding a way to deal with master failure, the key is to avoid a split-brain situation.
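
One common way to keep two masters from being active at once is to have every candidate race to atomically create the same entry in shared storage, so that only the first one succeeds. The sketch below assumes that approach and uses an in-memory stand-in; `TinyStore`, `create_if_absent`, and the `/master` path are hypothetical names and are not the ZooKeeper client API.

```python
import threading


class TinyStore:
    """In-memory stand-in for a coordination service (not the ZooKeeper API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._nodes = {}

    def create_if_absent(self, path, owner):
        """Atomically create path; return True only for the first caller."""
        with self._lock:
            if path in self._nodes:
                return False
            self._nodes[path] = owner
            return True

    def owner_of(self, path):
        with self._lock:
            return self._nodes.get(path)


def run_election(store, candidates):
    """Each candidate races to create /master; return the list of winners."""
    winners = []

    def campaign(name):
        if store.create_if_absent("/master", name):
            winners.append(name)

    threads = [threading.Thread(target=campaign, args=(c,)) for c in candidates]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners


store = TinyStore()
winners = run_election(store, ["primary", "backup"])
print(len(winners))  # 1
```

Whichever candidate loses the race knows a master already exists and stays passive. In a real deployment the entry would also have to disappear when its owner fails so that a backup can take over; that part is omitted here.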

 

1.2.2 Slave node failure

  Clients submit tasks to the master node, which dispatches them to available slave nodes. The slave nodes receive the dispatched tasks, execute them, and report the execution status. The master node then informs the clients of the results.

  If a slave node crashes, all unfinished tasks that were dispatched to it need to be re-dispatched. The first requirement is therefore that the master node can detect slave node crashes and determine which slave nodes are still available to take over the crashed node's tasks. A crashed slave node may have executed some of its tasks partially, or even fully, without reporting the results. If the computation has side effects, we also need some recovery procedure to clean up the state left behind.
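
The re-dispatch step can be sketched as follows. This is a minimal illustration that assumes the master keeps a mapping from each slave node to its unfinished tasks and spreads a crashed node's tasks round-robin; the function name, the data layout, and the round-robin policy are all assumptions, not something the text prescribes.

```python
def reassign_tasks(assignments, crashed, available):
    """Move the crashed node's unfinished tasks to available nodes.

    assignments: dict mapping node name -> list of unfinished task ids.
    Returns a new dict; the input is left unmodified.
    """
    available = [w for w in available if w != crashed]
    if not available:
        raise ValueError("no available slave nodes to take over tasks")
    result = {w: list(t) for w, t in assignments.items() if w != crashed}
    for w in available:
        result.setdefault(w, [])
    # Round-robin the orphaned tasks over the surviving nodes.
    for i, task in enumerate(assignments.get(crashed, [])):
        result[available[i % len(available)]].append(task)
    return result


assignments = {"w1": ["t1", "t2", "t3"], "w2": ["t4"], "w3": []}
print(reassign_tasks(assignments, crashed="w1", available=["w2", "w3"]))
# {'w2': ['t4', 't1', 't3'], 'w3': ['t2']}
```

The sketch does not deal with tasks the crashed node had already partly executed; cleaning up after those requires the recovery procedure mentioned above.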

 1.2.3 Communication failure

  If a slave node loses its network connection to the master node, for example because of a network partition, reassigning its tasks may cause two slave nodes to execute the same task. If a task may be executed more than once, we can reassign it without verifying whether the first slave node has completed it. If a task must not be executed more than once, our application needs to accommodate the possibility that multiple slave nodes end up executing the same task.
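
One way an application might accommodate duplicate execution is to accept only the first reported result for each task. The sketch below assumes tasks carry unique ids; `ResultLog` is a hypothetical class, and it only deduplicates the recorded result. It does not undo any side effects of the second execution.

```python
class ResultLog:
    """Records each task's result once and ignores duplicate completions."""

    def __init__(self):
        self._results = {}  # task_id -> (node, result)

    def record(self, task_id, node, result):
        """Return True if this is the first result reported for task_id."""
        if task_id in self._results:
            return False
        self._results[task_id] = (node, result)
        return True

    def result_of(self, task_id):
        return self._results.get(task_id)


log = ResultLog()
# w1 was partitioned away, so t1 was reassigned to w2; both finish it.
print(log.record("t1", "w2", 42))  # True
print(log.record("t1", "w1", 42))  # False
print(log.result_of("t1"))         # ('w2', 42)
```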

  Another important issue caused by communication failures is their impact on synchronization primitives such as locks. Because nodes can crash and the network can partition, a lock held by a failed or unreachable client can prevent other tasks from making progress. ZooKeeper therefore needs mechanisms to handle these situations. First, a client can tell ZooKeeper that some data is ephemeral. Second, ZooKeeper requires clients to periodically notify it that they are alive. If a client fails to send a notification in time, all ephemeral data belonging to that client is deleted. With these two mechanisms, we can prevent individual clients from bringing the application to a halt when crashes or communication failures occur.
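
The interplay of ephemeral data and liveness notifications can be modeled with a toy in-memory store. This is a sketch under stated assumptions, not ZooKeeper's implementation or API: `SessionStore`, its methods, the paths, and the timeout value are hypothetical, and time is passed in explicitly so the example is deterministic.

```python
class SessionStore:
    """Toy model of ephemeral data tied to client liveness notifications."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._last_heartbeat = {}  # client -> time of last notification
        self._ephemeral = {}       # path -> owning client

    def heartbeat(self, client, now):
        self._last_heartbeat[client] = now

    def create_ephemeral(self, client, path, now):
        self.heartbeat(client, now)
        self._ephemeral[path] = client

    def expire(self, now):
        """Delete ephemeral data of clients whose notification is overdue."""
        dead = {c for c, t in self._last_heartbeat.items()
                if now - t > self.timeout}
        for c in dead:
            del self._last_heartbeat[c]
        self._ephemeral = {p: c for p, c in self._ephemeral.items()
                           if c not in dead}
        return sorted(dead)

    def paths(self):
        return sorted(self._ephemeral)


store = SessionStore(timeout=10)
store.create_ephemeral("w1", "/workers/w1", now=0)
store.create_ephemeral("w2", "/workers/w2", now=0)
store.heartbeat("w1", now=8)   # w1 stays in touch; w2 goes silent
print(store.expire(now=12))    # ['w2']
print(store.paths())           # ['/workers/w1']
```

Note that the store cannot tell whether w2 crashed or was merely slow; it only knows that the notification did not arrive in time.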

  Recall from the earlier discussion that if we cannot control message delays in the system, we cannot tell whether a client has crashed or is just running slowly. So when we suspect that a client has crashed, we must also allow for the possibility that it is merely slow and may still perform some operations later.

 

 

1.2.4 Task Summary
Based on the above description, the master-slave architecture has the following requirements:

Master node election

This is a key step, since a master is needed to assign tasks to the slave nodes.

Crash detection

The master node must be able to detect when slave nodes crash or lose their connection.

Group membership management

The master node must be able to determine which slave nodes are available to execute tasks.

Metadata management

The master and slave nodes must be able to store assignment and execution state in a reliable way.
