Zookeeper and Chubby (distributed coordination systems)

Foreword

Large-scale distributed systems need to address various types of coordination needs:

  • When a new process or server joins the cluster, how is its arrival detected? How can it obtain its configuration parameters automatically?

  • When a process or server changes the configuration information, how can the other machines in the cluster be notified in real time?

  • How to determine whether a machine in the cluster is still alive?

  • How to elect a master server, and when the master server goes down, how to elect a new master from the backup servers?

The above problems are, in essence, problems of coordination and management in distributed systems. The best-known coordination systems today include Google's Chubby and Yahoo's Zookeeper (the clients of such a coordination system are often themselves a distributed cluster).

Chubby

Chubby provides a coarse-grained locking service. A single Chubby cell (service unit) can provide coordination services for roughly 10,000 4-core machines. Its main functions are to achieve synchronization within a cluster and to reach consensus on the environment and resources of the entire system.

Chubby is a lock service. Machines in a distributed cluster compete for a lock in order to become the leader, and the server that obtains the lock writes its own information into the lock's data, making it visible to the other competitors. "Coarse-grained" means that locks are held for a long time: Chubby allows the server that grabs the lock to act as leader for hours or even days. Chubby emphasizes reliability and high availability rather than high throughput or the storage of large amounts of data. Its theoretical basis is Paxos: the nodes reach consensus on a decision by communicating and voting with one another.

System structure

A data center generally deploys one Chubby cell, and each cell consists of 5 servers: one master server and four backup (replica) servers, chosen through Paxos. The data saved by the backup servers is exactly the same as the master's. If the master goes down, one of the backup servers is quickly elected as the new master so that service continues normally, which improves the availability of the whole system.

During the master's term (a lease of a few seconds), the backup servers will not vote to elect a new master. When the term expires, if no failure occurred during it, the lease is renewed and the original master continues to hold the position. If a backup server fails, the system starts a new machine to replace it and updates the DNS; the master periodically checks the DNS, and once it notices that the DNS has changed, it notifies the other backup servers in the cell.

Clients interact with the servers via RPC, and read and write operations on Chubby are handled by the master server; the backup servers only synchronize data from the master to keep their copies consistent with it. If a backup server receives a read or write request from a client (itself typically a distributed cluster), it replies with the master's address so that the request is directed to the master.

Chubby mainly stores management information and small pieces of base data; its purpose is not to store data but to manage and synchronize resources, so storing large amounts of data in Chubby is not recommended. Chubby also provides a subscription mechanism: a client can subscribe to data stored in Chubby, and once the data changes, Chubby notifies the client. For example, the configuration file of a distributed cluster can be stored in Chubby and every machine in the cluster subscribes to it; once the configuration file changes, all nodes receive a notification and adjust themselves according to the new configuration.

About caching

Chubby allows clients to cache some data locally, and most client requests can be served from the local cache, which both shortens response time and reduces the load on Chubby. Chubby keeps the caches consistent with the master by maintaining a cache table: when the master receives a request to modify a piece of data, it temporarily blocks the request and notifies all clients that have cached that data; after a client receives the invalidation message it sends an acknowledgement back to Chubby, and the modification proceeds only after acknowledgements from all relevant clients have been received.

Zookeeper

Zookeeper is an open-source, scalable, high-throughput distributed coordination system with a wide range of application scenarios.
The Zookeeper service consists of multiple servers (at least 3); one master (leader) server and multiple slave (follower) servers are chosen through an election. A client can read data from any server, but writes and updates must be executed by the master. If a client connects to a slave server to write or update data, the slave forwards the request to the master; the master performs the write or update and sends the corresponding information to all slave servers, each slave updates its own data accordingly and sends an acknowledgement back to the master, and once the master has received acknowledgements from half or more of the servers it notifies the client that the write or update succeeded. A Zookeeper cluster generally consists of 2n+1 (an odd number of) servers and can tolerate the failure of at most n of them. Zookeeper achieves fault tolerance by periodically saving snapshots and transaction logs.
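To make this concrete, here is a minimal Java sketch of a client connecting to a Zookeeper ensemble through the official org.apache.zookeeper client; the host names, port, and session timeout are placeholder assumptions, not values from this article.

```java
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZkConnect {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);

        // The connection string lists several ensemble members; the client picks one
        // to talk to, and the cluster routes writes to the master (leader) internally.
        ZooKeeper zk = new ZooKeeper(
                "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181",
                15000,                 // session timeout in milliseconds (assumed value)
                event -> {             // default watcher: wait for the session to be established
                    if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                        connected.countDown();
                    }
                });

        connected.await();
        System.out.println("connected, session id = " + Long.toHexString(zk.getSessionId()));
        zk.close();
    }
}
```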

Zookeeper data consistency problem

Any of Zookeeper's slave servers can also serve read requests from clients, which is the main reason for its high throughput, but it also creates a problem: a client may read stale data. That is, when the client reads from a slave server, the master may already have modified the data without yet having propagated the change to that slave. Zookeeper provides a sync operation to solve this: the client issues a sync before reading, the slave server that receives the sync command brings itself up to date with the master, and the data the client then reads from that slave is consistent with the master.
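A minimal sketch of this pattern with the Java client, assuming a hypothetical znode path supplied by the caller. sync() in the Java API is asynchronous, so the sketch waits for its callback before reading.

```java
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class SyncThenRead {
    // Forces the server we are connected to to catch up with the master,
    // then reads the znode, so the result is not stale.
    static byte[] readLatest(ZooKeeper zk, String path) throws Exception {
        CountDownLatch synced = new CountDownLatch(1);

        // sync() is asynchronous: the callback fires once this server has applied
        // every update the master knew about when sync() was issued.
        zk.sync(path, (rc, p, ctx) -> synced.countDown(), null);
        synced.await();

        return zk.getData(path, false, new Stat());
    }
}
```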

Zookeeper data model

The in-memory data models of Zookeeper and Chubby are similar to a traditional file system: a tree-shaped hierarchical namespace whose nodes, called znodes, can act both as files and as directories. A znode's data is generally read and written as a whole and kept small; the reason is to prevent the service from being used as a distributed storage system for large data (the same applies to Chubby).

The node types of Zookeeper and Chubby are also the same, divided into persistent and ephemeral. Ephemeral nodes are deleted when the client session ends or fails; persistent nodes are removed from the Zookeeper servers only by an explicit delete.
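As an illustration, a small Java sketch creating one node of each type; the paths /app-config and /app-config/online-worker-1 are made-up examples.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class NodeTypes {
    static void createNodes(ZooKeeper zk) throws Exception {
        // Persistent znode: survives client disconnects, removed only by an explicit delete.
        zk.create("/app-config", "key=value".getBytes(),
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // Ephemeral znode: bound to this client's session, deleted automatically
        // when the session ends or fails.
        zk.create("/app-config/online-worker-1", new byte[0],
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    }
}
```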

A client can set a watch on a node; when the node changes, Zookeeper notifies the client. This property is crucial for many of the services Zookeeper provides.
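For example, a one-shot watch can be registered with exists(); the path is whatever node the caller cares about.

```java
import org.apache.zookeeper.ZooKeeper;

public class WatchExample {
    static void watchNode(ZooKeeper zk, String path) throws Exception {
        // exists() registers a one-shot watch: the callback runs the next time
        // the node is created, deleted, or its data changes.
        zk.exists(path, event ->
                System.out.println("node " + event.getPath() + " changed: " + event.getType()));
    }
}
```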

Typical application scenarios of Zookeeper

  • Election of leaders

When Zookeeper is used for leader election, an ephemeral node is created that stores the leader server's information. The other servers read this node, so the whole cluster knows who the leader is, and they set a watch on the ephemeral node. If a server cannot read any data from the node, it means there is currently no leader in the cluster: all servers are equal and compete until one of them succeeds in writing its information into the node, which marks the birth of a leader. Because the watch is set, all the followers are notified of who the new leader is; and because the node is ephemeral, it disappears when the leader's session dies, the watchers are notified, and a new round of competition begins. If a new server joins the cluster, it reads the ephemeral node and immediately knows who the leader is.
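A minimal Java sketch of this idea, assuming a hypothetical /leader znode; production systems usually use the more elaborate sequential-node recipe, but the principle is the same.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class LeaderElection {
    // Tries to become leader by creating the ephemeral /leader node.
    // Returns true if this server won the election.
    static boolean tryToLead(ZooKeeper zk, String myServerInfo) throws Exception {
        try {
            zk.create("/leader", myServerInfo.getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return true;                       // we wrote our info first: we are the leader
        } catch (KeeperException.NodeExistsException e) {
            // Someone else is already leader; watch the node so we hear about its
            // deletion (the leader's session died) and can compete again.
            zk.exists("/leader", event -> {
                if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                    System.out.println("leader is gone, start a new election round");
                }
            });
            return false;
        }
    }
}
```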

  • Configuration management

The configuration files and other information of the upper-level (application) cluster are stored in a znode of the Zookeeper cluster. All nodes in the cluster read the configuration from this znode and set a watch on it. If the configuration changes later, every node in the cluster receives a notification and makes the corresponding change in time.
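A sketch of the read-and-watch loop in Java, assuming the configuration lives in a hypothetical /config znode; note that Zookeeper watches are one-shot, so the watch is re-registered each time it fires.

```java
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class ConfigWatcher {
    // Reads /config and re-registers the watch every time it fires,
    // because Zookeeper watches are one-shot notifications.
    static void followConfig(ZooKeeper zk) throws Exception {
        byte[] config = zk.getData("/config", event -> {
            if (event.getType() == Watcher.Event.EventType.NodeDataChanged) {
                try {
                    followConfig(zk);   // re-read the new configuration and watch again
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }, new Stat());
        System.out.println("applying configuration: " + new String(config));
    }
}
```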

  • Cluster member management

When a new machine joins the upper-level cluster, or a machine leaves, the rest of the cluster must learn about it in time. With Zookeeper this is done by creating a persistent node that represents the cluster and having each member register itself as an ephemeral child node under it (ephemeral nodes cannot have children, so the parent must be persistent). Clients set watches on these nodes; as soon as a new machine joins, or a machine fails and drops out, they receive a notification from Zookeeper. In this way, dynamic management of cluster membership is achieved.
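A Java sketch, with /members as an assumed persistent parent node: each machine registers an ephemeral child, and interested clients list the children with a watch.

```java
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class Membership {
    // Each machine announces itself as an ephemeral child of the persistent
    // /members node; if the machine dies, its node disappears automatically.
    static void join(ZooKeeper zk, String memberName) throws Exception {
        zk.create("/members/" + memberName, new byte[0],
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    }

    // Lists the current members and watches for joins and failures.
    static List<String> listMembers(ZooKeeper zk) throws Exception {
        return zk.getChildren("/members", event -> {
            if (event.getType() == Watcher.Event.EventType.NodeChildrenChanged) {
                System.out.println("cluster membership changed");
            }
        });
    }
}
```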

  • Task assignment

The client creates a task node in Zookeeper, under which a child node is created whenever a new task arrives; the upper-level cluster watches this node and so learns about new tasks. The upper-level cluster in turn registers a worker node in Zookeeper, with one child node per worker machine, and sets watches on these machine nodes. When the upper-level cluster receives a task request, it selects a machine according to how busy the workers are and creates a task node under that machine's node. When the worker machine discovers this node, it knows a new task has been assigned to it and executes the task. After the task has been executed, the worker deletes the task node under its own name, and the corresponding child node under the original task node is also deleted, indicating that the task is complete. The client, which has been watching that task node, immediately learns that the task has finished.
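A compact Java sketch of the three roles, under assumed /tasks and /assign/&lt;worker&gt; paths (the names are illustrative, not a standard layout):

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class TaskAssignment {
    // Client side: submit a task as a sequential child of /tasks.
    static String submitTask(ZooKeeper zk, byte[] taskData) throws Exception {
        return zk.create("/tasks/task-", taskData,
                         ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
    }

    // Master side: assign a task to a chosen worker by creating a node
    // under that worker's assignment path.
    static void assign(ZooKeeper zk, String worker, String taskName, byte[] taskData) throws Exception {
        zk.create("/assign/" + worker + "/" + taskName, taskData,
                  ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }

    // Worker side: watch its own assignment node for newly assigned tasks.
    static void watchAssignments(ZooKeeper zk, String worker) throws Exception {
        zk.getChildren("/assign/" + worker, event -> {
            if (event.getType() == Watcher.Event.EventType.NodeChildrenChanged) {
                System.out.println("new task assigned to " + worker);
            }
        });
    }
}
```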

Similarities and differences between Zookeeper and Chubby

Similarities:

  • Both have the same data model: a tree-shaped hierarchical directory structure similar to a traditional file system.
  • Both have the same node types, divided into ephemeral and persistent nodes.
  • Chubby's subscription mechanism is similar to Zookeeper's watch mechanism.
  • In both, write and update operations must be done on the master server.

Differences:

  • Chubby emphasizes reliability and high availability and does not pursue high throughput; Zookeeper is designed for high throughput.
  • In Chubby, only the master node serves reads; in Zookeeper, slave nodes can also serve reads.
  • For the consensus protocol, Chubby uses Paxos and Zookeeper uses ZAB.
  • Chubby's master holds a lease and simply keeps renewing it when it expires; in Zookeeper, whoever is elected remains the master unless it is changed manually or a failure occurs, and there is no concept of a lease.

Author: py xiaojie

Blog address: http://www.cnblogs.com/52mm/

This article may be reprinted without the author's prior consent, provided that this statement is retained and a link to the original article is placed in a prominent position on the page.
