Distributed cluster frameworks: must-know ZooKeeper interview questions ①

1. What is ZooKeeper?

ZooKeeper is an open-source distributed coordination service that provides consistency services for distributed applications. On top of it, distributed applications can implement data publish/subscribe, load balancing, naming services, distributed coordination/notification, cluster management, Master election, distributed locks, distributed queues, and other functions.

ZooKeeper's goal is to encapsulate complex, error-prone key services and offer them to users through a simple, easy-to-use interface, backed by a high-performance and stable system.

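ZooKeeper guarantees the following distributed consistency properties:

  1. Sequential consistency: updates from a client are applied in the order in which they were sent.
  2. Atomicity: an update either succeeds completely or fails; there are no partial results.
  3. Single view: a client sees the same view of the service regardless of which server it connects to.
  4. Reliability: once an update has been applied, it persists until another update overwrites it.
  5. Real-time (eventual consistency): a client's view of the system is guaranteed to be up to date within a certain time bound.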

A client's read request can be handled by any server in the cluster. If the read request registers a listener (Watcher) on a node, that listener is also managed by the server the client is connected to. Write requests, by contrast, are forwarded to the Leader, which proposes them to the other servers; a write returns success only after a majority of the servers have agreed on it. Therefore, as the number of servers in the cluster grows, read throughput increases but write throughput decreases.
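The read/write trade-off comes from the majority rule for writes. The Java sketch below computes how many voting servers must acknowledge a write and how many may fail while the cluster stays available. The class and method names are hypothetical and not part of the ZooKeeper API.

```java
/** Hypothetical helper illustrating majority-quorum arithmetic (not part of the ZooKeeper API). */
final class QuorumMath {
    private QuorumMath() {}

    /** Smallest number of voting servers that forms a strict majority. */
    static int quorumSize(int votingServers) {
        if (votingServers < 1) {
            throw new IllegalArgumentException("need at least one voting server");
        }
        return votingServers / 2 + 1;
    }

    /** How many voting servers may fail while a majority is still reachable. */
    static int toleratedFailures(int votingServers) {
        return votingServers - quorumSize(votingServers);
    }
}
```

With 3 servers a write needs 2 acknowledgements and 1 failure is tolerated. With 5 servers it needs 3 and 2 failures are tolerated. A 4-server cluster still tolerates only 1 failure, which is one reason odd-sized clusters are the usual recommendation.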

Ordering is a very important feature of ZooKeeper. All updates are globally ordered, and each update is stamped with a unique, increasing identifier called the zxid (ZooKeeper Transaction Id). Reads are ordered only relative to updates: the result of a read carries the latest zxid known to the ZooKeeper server that served it.
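In the ZooKeeper implementation the zxid is a 64-bit number whose high 32 bits hold the leader's epoch and whose low 32 bits hold a counter within that epoch, so comparing zxids numerically yields the global update order. The helper below only mirrors that layout; its class and method names are hypothetical.

```java
/** Hypothetical helpers mirroring the zxid layout: epoch in the high 32 bits, counter in the low 32 bits. */
final class Zxids {
    private Zxids() {}

    /** Packs an epoch and a per-epoch counter into one 64-bit zxid. */
    static long make(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    /** Extracts the leader epoch (unsigned shift keeps the result non-negative). */
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    /** Extracts the counter within the epoch. */
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }
}
```

For example, make(2, 0) is greater than make(1, 5): any update proposed under a newer leader epoch is ordered after every update from an older one.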

2. What does ZooKeeper provide?

A file-system-like data model and a notification (Watcher) mechanism.

3. ZooKeeper file system

ZooKeeper provides a multi-level node namespace (the nodes are called znodes). Unlike a file system, where only files can store data and directories cannot, every znode can have data associated with it.

To ensure high throughput and low latency, ZooKeeper keeps this tree-like directory structure in memory. This is also why ZooKeeper cannot be used to store large amounts of data: each node can store at most 1 MB of data.
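A minimal sketch of the namespace described above, assuming a plain in-memory tree. It is not ZooKeeper's own data structure; the class, its methods, and the exact byte limit are hypothetical (ZooKeeper's real limit is about 1 MB by default and is controlled by the jute.maxbuffer system property).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory model of the znode namespace: every node,
 * including one that has children, can hold its own data.
 */
final class ZNodeTree {
    /** Per-node data limit for this sketch; roughly matches ZooKeeper's ~1 MB default. */
    static final int MAX_DATA_BYTES = 1024 * 1024;

    private static final class Node {
        final byte[] data;
        final Map<String, Node> children = new TreeMap<>(); // child names kept sorted

        Node(byte[] data) {
            this.data = data.clone();
        }
    }

    private final Node root = new Node(new byte[0]);

    /** Creates a node under an existing parent, as ZooKeeper requires. */
    void create(String path, byte[] data) {
        if (!path.startsWith("/") || path.length() < 2 || path.endsWith("/")) {
            throw new IllegalArgumentException("invalid path: " + path);
        }
        if (data.length > MAX_DATA_BYTES) {
            throw new IllegalArgumentException("data exceeds " + MAX_DATA_BYTES + " bytes");
        }
        int slash = path.lastIndexOf('/');
        Node parent = find(slash == 0 ? "/" : path.substring(0, slash));
        if (parent == null) {
            throw new IllegalStateException("parent does not exist: " + path);
        }
        String name = path.substring(slash + 1);
        if (parent.children.putIfAbsent(name, new Node(data)) != null) {
            throw new IllegalStateException("node already exists: " + path);
        }
    }

    byte[] getData(String path) {
        return require(path).data.clone();
    }

    List<String> getChildren(String path) {
        return new ArrayList<>(require(path).children.keySet());
    }

    private Node require(String path) {
        Node node = find(path);
        if (node == null) {
            throw new IllegalArgumentException("no such node: " + path);
        }
        return node;
    }

    /** Walks the tree one path segment at a time; returns null if any segment is missing. */
    private Node find(String path) {
        if (path.equals("/")) {
            return root;
        }
        Node current = root;
        for (String part : path.substring(1).split("/")) {
            current = current.children.get(part);
            if (current == null) {
                return null;
            }
        }
        return current;
    }
}
```

In this model, /app can hold its own data ("cfg") and have a child /app/db at the same time, which a file-system directory could not do.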

4. How does ZooKeeper keep the state of the leader and follower nodes synchronized?

The core of Zookeeper is the atomic broadcast mechanism, which ensures the synchronization between servers. The protocol that implements this mechanism is called the Zab protocol. The Zab protocol has two modes, which are recovery mode and broadcast mode.

1) Recovery mode

When the service starts or after the leader crashes, Zab enters recovery mode. Recovery mode ends once a leader has been elected and a majority of servers have completed state synchronization with it. State synchronization ensures that the leader and the other servers have the same system state.

2) Broadcast mode

Once the leader has synchronized its state with a majority of followers, it can start broadcasting messages; that is, Zab enters broadcast mode. If a server joins the ZooKeeper service at this point, it starts in recovery mode, discovers the leader, and synchronizes its state with it; when synchronization finishes, it also takes part in message broadcasting. The ZooKeeper service remains in broadcast mode until the leader crashes or loses the support of a majority of followers.
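The two modes and the conditions for switching between them can be modelled as a small state machine. This is a hypothetical illustration of the rules in the text, not Zab itself: it only tracks whether a live leader has a synchronized majority (the leader plus its synced followers).

```java
/** Hypothetical model of the two Zab modes described above; not ZooKeeper's implementation. */
final class ZabModeTracker {
    enum Mode { RECOVERY, BROADCAST }

    private final int ensembleSize;    // voting servers, leader included
    private Mode mode = Mode.RECOVERY; // the service starts in recovery mode
    private boolean leaderAlive = false;
    private int syncedFollowers = 0;

    ZabModeTracker(int ensembleSize) {
        this.ensembleSize = ensembleSize;
    }

    Mode mode() {
        return mode;
    }

    /** A new leader was elected; synchronization starts from scratch. */
    void leaderElected() {
        leaderAlive = true;
        syncedFollowers = 0;
        evaluate();
    }

    /** A follower finished synchronizing its state with the leader. */
    void followerSynced() {
        syncedFollowers++;
        evaluate();
    }

    /** A synced follower was lost (crash or network partition). */
    void followerLost() {
        if (syncedFollowers > 0) {
            syncedFollowers--;
        }
        evaluate();
    }

    /** The leader crashed: back to recovery until a new leader is elected. */
    void leaderCrashed() {
        leaderAlive = false;
        syncedFollowers = 0;
        evaluate();
    }

    private void evaluate() {
        // The leader counts toward the majority together with its synced followers.
        boolean majority = leaderAlive && (syncedFollowers + 1) > ensembleSize / 2;
        mode = majority ? Mode.BROADCAST : Mode.RECOVERY;
    }
}
```

With 5 voting servers, the tracker switches to BROADCAST after the second follower syncs (leader plus 2 followers is a majority) and drops back to RECOVERY as soon as that majority is lost.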

5. The four types of znode

1) PERSISTENT - persistent node

The node remains on ZooKeeper until it is explicitly deleted.

2) EPHEMERAL - ephemeral node

The lifetime of an ephemeral node is bound to the client session. Once the client session expires, all ephemeral nodes created by that client are removed. Note that it is session expiration that matters: a dropped connection between the client and ZooKeeper does not by itself expire the session.

3) PERSISTENT_SEQUENTIAL - persistent sequential node

Same as a persistent node, except that it has the sequential attribute: an auto-incrementing integer maintained by the parent node is appended to the node name.

4) EPHEMERAL_SEQUENTIAL - ephemeral sequential node

Same as an ephemeral node, except that it has the sequential attribute: an auto-incrementing integer maintained by the parent node is appended to the node name.
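The four node types differ in two independent attributes: whether the node's lifetime is bound to a session, and whether a sequence number is appended. The hypothetical model below captures just that. The mode names match ZooKeeper's CreateMode constants and the 10-digit zero-padded suffix matches the format ZooKeeper uses for sequential nodes; everything else (the class, the per-parent counter map, the session ids) is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical model of the four znode types. Mode names match ZooKeeper's CreateMode
 * constants; parent checks, data and ACLs are left out for brevity.
 */
final class ZNodeTypes {
    enum Mode {
        PERSISTENT(false, false),
        EPHEMERAL(true, false),
        PERSISTENT_SEQUENTIAL(false, true),
        EPHEMERAL_SEQUENTIAL(true, true);

        final boolean ephemeral;
        final boolean sequential;

        Mode(boolean ephemeral, boolean sequential) {
            this.ephemeral = ephemeral;
            this.sequential = sequential;
        }
    }

    // path -> owning session id; null marks a persistent node
    private final Map<String, Long> owners = new HashMap<>();
    // parent path -> how many sequence numbers that parent has handed out
    private final Map<String, Integer> counters = new HashMap<>();

    /** Returns the path actually created; sequential modes append a 10-digit counter. */
    String create(String path, Mode mode, long sessionId) {
        String actual = path;
        if (mode.sequential) {
            String parent = path.substring(0, Math.max(path.lastIndexOf('/'), 1));
            int seq = counters.merge(parent, 1, Integer::sum) - 1; // 0, 1, 2, ...
            actual = path + String.format(Locale.ROOT, "%010d", seq);
        }
        if (owners.containsKey(actual)) {
            throw new IllegalStateException("node already exists: " + actual);
        }
        owners.put(actual, mode.ephemeral ? Long.valueOf(sessionId) : null);
        return actual;
    }

    boolean exists(String path) {
        return owners.containsKey(path);
    }

    /** Session expiry (not a mere disconnect) removes every ephemeral node the session owns. */
    void expireSession(long sessionId) {
        owners.values().removeIf(owner -> owner != null && owner == sessionId);
    }
}
```

In this model, two EPHEMERAL_SEQUENTIAL creates of /locks/lock- yield /locks/lock-0000000000 and /locks/lock-0000000001, and expiring the owning session removes both while a PERSISTENT /config survives.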

6. Zookeeper Watcher mechanism – data change notification

ZooKeeper allows a client to register a Watcher on a znode on the server. When certain specified events on the server trigger the Watcher, the server sends an event notification to that client, which implements distributed notification. The client then adjusts its business logic according to the notification state and event type carried in the Watcher event.

Working mechanism:

  1. The client registers a Watcher.
  2. The server processes the Watcher.
  3. The client calls back the Watcher.

Summary of Watcher features:

1) One-time

Whether on the server or the client, once a Watcher is triggered, ZooKeeper removes it from the corresponding storage. This design effectively reduces pressure on the server: otherwise, for frequently updated nodes, the server would continuously send event notifications to clients, putting heavy load on both the network and the server.

2) Serial execution

The client executes Watcher callbacks serially and synchronously, one after another.

3) Lightweight

  1. A Watcher notification is very simple: it only tells the client that an event has occurred, without carrying the specific content of the event (for example, the new data).
  2. When the client registers a Watcher with the server, it does not pass the actual client-side Watcher object to the server; it only marks the request with a boolean attribute.

4) Asynchronous delivery

Watcher event notifications are sent from the server to the client asynchronously. Because different clients communicate with the server over separate network connections, they may observe the same event at different times. ZooKeeper does provide an ordering guarantee: a client sees the watch event for a znode it is watching before it sees the new data of that znode. Because watches are one-time, changes that happen between a trigger and the next registration are not notified, so we cannot expect to observe every change of a node through ZooKeeper. ZooKeeper guarantees eventual consistency, not strong consistency.

5) Registration and triggering

  1. Watchers are registered by getData, exists, and getChildren.
  2. Watchers are triggered by create, delete, and setData.

6) Reconnection

When a client connects to a new server, its watches will be triggered for any session event. Watch events cannot be received while the connection to a server is lost. When the client reconnects, all previously registered watches are re-registered and triggered if necessary; usually this is completely transparent. There is only one special case in which a watch can be lost: an exists watch on a znode that has not yet been created is missed if that znode is created and then deleted while the client is disconnected.
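Which write fires which event can be summarized in a small lookup. The event type names are the ones ZooKeeper uses in Watcher.Event.EventType; the helper itself, the string operation names, and the "Type path" output format are hypothetical.

```java
import java.util.List;

/** Hypothetical helper: which watch events a write operation fires, and on which path. */
final class WatchEvents {
    enum EventType { NodeCreated, NodeDeleted, NodeDataChanged, NodeChildrenChanged }

    private WatchEvents() {}

    /** Returns "EventType path" strings for the events produced by one write. */
    static List<String> firedBy(String op, String path) {
        String parent = parentOf(path);
        switch (op) {
            case "create":
                // exists() watches on the path; getChildren() watches on the parent
                return List.of(EventType.NodeCreated + " " + path,
                        EventType.NodeChildrenChanged + " " + parent);
            case "delete":
                // exists()/getData()/getChildren() watches on the path; getChildren() on the parent
                return List.of(EventType.NodeDeleted + " " + path,
                        EventType.NodeChildrenChanged + " " + parent);
            case "setData":
                // exists()/getData() watches on the path
                return List.of(EventType.NodeDataChanged + " " + path);
            default:
                throw new IllegalArgumentException("not a write operation: " + op);
        }
    }

    private static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }
}
```

So a setData on /app/db only reaches watchers of /app/db, while creating /app/db also notifies whoever called getChildren("/app") with a watch.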

7. How the client registers a Watcher

  1. Call one of the three APIs getData()/getChildren()/exists() and pass in a Watcher object.
  2. Mark the request and wrap the Watcher in a WatchRegistration.
  3. Wrap everything into a Packet object and send the request to the server.
  4. After receiving the server's response, register the Watcher with ZKWatchManager, which manages it.
  5. The request returns, completing the registration.
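The five registration steps can be sketched as follows. This is a deliberately simplified stand-in: a Consumer<String> plays the role of a Watcher, and the nested classes only echo the names WatchRegistration and ZKWatchManager from the text; they are not ZooKeeper's classes. What it shows is that the watcher object stays on the client, only a boolean flag travels with the request, and the watcher is stored only after a successful response.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical, heavily simplified stand-in for the client-side registration steps. */
final class ClientWatchRegistry {
    /** Pairs a watcher with the path it watches (cf. WatchRegistration). */
    static final class WatchRegistration {
        final String path;
        final Consumer<String> watcher;

        WatchRegistration(String path, Consumer<String> watcher) {
            this.path = path;
            this.watcher = watcher;
        }
    }

    /** Simplified request packet: only the boolean watch flag would be sent to the server. */
    static final class Packet {
        final String path;
        final boolean watch;
        final WatchRegistration registration; // kept on the client, never sent

        Packet(String path, WatchRegistration registration) {
            this.path = path;
            this.watch = registration != null;
            this.registration = registration;
        }
    }

    // Registered data watchers by path (cf. ZKWatchManager)
    private final Map<String, List<Consumer<String>>> dataWatches = new HashMap<>();

    /** Steps 1-3: wrap the watcher in a registration and build the packet to send. */
    Packet getData(String path, Consumer<String> watcher) {
        WatchRegistration registration = watcher == null ? null : new WatchRegistration(path, watcher);
        return new Packet(path, registration);
    }

    /** Steps 4-5: the watcher is stored only once the server has answered successfully. */
    void onResponse(Packet packet, boolean success) {
        if (success && packet.registration != null) {
            dataWatches.computeIfAbsent(packet.path, p -> new ArrayList<>())
                    .add(packet.registration.watcher);
        }
    }

    int watcherCount(String path) {
        return dataWatches.getOrDefault(path, List.of()).size();
    }
}
```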

8. How the server handles Watchers

1) The server receives the Watcher and stores it

The server receives the client request, processes it, and decides whether a Watcher needs to be registered. If so, it stores the path of the data node together with the ServerCnxn in the watchTable and watch2Paths of the WatchManager. (ServerCnxn represents a connection between the client and the server; because it implements the process interface of Watcher, it can be treated as a Watcher object here.)

2) Watcher trigger

Take the example where the server receives a setData() transaction request and triggers the NodeDataChanged event:

2.1) Encapsulate WatchedEvent

Encapsulate the notification state (SyncConnected), the event type (NodeDataChanged), and the node path into a WatchedEvent object.

2.2) Query Watcher

Look up the Watchers in watchTable by node path:

  1. Not found: no client has registered a Watcher on this data node.
  2. Found: extract the Watchers and delete them from watchTable and watch2Paths. (This shows that a Watcher is one-time on the server side: once triggered, it becomes invalid.)

3) Call the process method to trigger the Watcher

Here, process essentially sends the Watcher event notification over the TCP connection represented by ServerCnxn.
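The server-side flow (store the watch in two indexes, then remove it when it fires) is sketched below. Connection ids stand in for ServerCnxn objects and an in-memory outbox stands in for the TCP connection; the class and method names are hypothetical, and only the watchTable/watch2Paths idea comes from the text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical simplified server-side watch manager (cf. watchTable / watch2Paths). */
final class ServerWatchManager {
    // path -> connections watching it
    private final Map<String, Set<String>> watchTable = new HashMap<>();
    // connection -> paths it watches (reverse index, e.g. for cleanup when a connection closes)
    private final Map<String, Set<String>> watch2Paths = new HashMap<>();
    // events "sent" per connection; stands in for writing to the TCP connection
    private final Map<String, List<String>> outbox = new HashMap<>();

    /** Step 1: store the watch in both indexes. */
    void addWatch(String path, String cnxn) {
        watchTable.computeIfAbsent(path, p -> new HashSet<>()).add(cnxn);
        watch2Paths.computeIfAbsent(cnxn, c -> new HashSet<>()).add(path);
    }

    /** Steps 2-3: build the event, remove the watches (one-time), then notify each connection. */
    int triggerWatch(String path, String eventType) {
        String event = "SyncConnected " + eventType + " " + path; // state, type, path
        Set<String> watchers = watchTable.remove(path);
        if (watchers == null) {
            return 0; // no client registered a Watcher on this node
        }
        for (String cnxn : watchers) {
            Set<String> paths = watch2Paths.get(cnxn);
            if (paths != null) {
                paths.remove(path);
                if (paths.isEmpty()) {
                    watch2Paths.remove(cnxn);
                }
            }
            outbox.computeIfAbsent(cnxn, c -> new ArrayList<>()).add(event); // "process"
        }
        return watchers.size();
    }

    List<String> sentTo(String cnxn) {
        return outbox.getOrDefault(cnxn, List.of());
    }
}
```

Triggering the same path twice notifies the registered connections only once, because the first trigger removes the entries from both indexes.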

9. The client callback Watcher

The client's SendThread receives the event notification, and the EventThread calls back the Watcher.

The client-side Watcher is also one-time: once triggered, it becomes invalid.
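The split between receiving notifications and running callbacks can be sketched with a queue and a single consumer thread. This is a hypothetical model, not ZooKeeper's SendThread/EventThread code: onNotification plays the receiving side, and one event thread runs the callbacks, which is what makes client callbacks serial.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical model: the receiving side enqueues, a single event thread runs callbacks in order. */
final class ClientEventDispatcher {
    private static final Runnable STOP = () -> { }; // sentinel that ends the event loop

    private final Map<String, List<Consumer<String>>> watches = new HashMap<>();
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final Thread eventThread = new Thread(this::runLoop, "event-thread");

    void start() {
        eventThread.setDaemon(true);
        eventThread.start();
    }

    synchronized void register(String path, Consumer<String> watcher) {
        watches.computeIfAbsent(path, p -> new ArrayList<>()).add(watcher);
    }

    /** Receiving side (cf. SendThread): watchers are removed (one-time) and their callbacks queued. */
    synchronized void onNotification(String path, String eventType) {
        List<Consumer<String>> fired = watches.remove(path);
        if (fired == null) {
            return; // already fired or never registered
        }
        for (Consumer<String> watcher : fired) {
            queue.add(() -> watcher.accept(eventType + " " + path));
        }
    }

    /** Lets already-queued callbacks finish, then stops the event thread. */
    void shutdownAndWait() {
        queue.add(STOP);
        try {
            eventThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Event side (cf. EventThread): one callback at a time, in arrival order. */
    private void runLoop() {
        try {
            while (true) {
                Runnable task = queue.take();
                if (task == STOP) {
                    return;
                }
                try {
                    task.run();
                } catch (RuntimeException e) {
                    // a failing callback must not stop the ones queued after it
                    System.err.println("watcher callback failed: " + e);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A slow callback in this model delays every callback queued behind it, which is the practical consequence of serial execution: Watcher callbacks should return quickly.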

10. ACL permission control mechanism

UGO(User/Group/Others)

This is the model used by Linux/Unix file systems and the most widely used permission control method, but it is a coarse-grained form of file system permission control.

ZooKeeper uses an ACL (Access Control List) instead, which consists of three parts:

Permission Mode (Scheme)

  1. IP: permission control at IP-address granularity
  2. Digest: the most commonly used scheme; it uses username:password-style identifiers for permission configuration, which makes it easy to separate permissions between different applications
  3. World: the most open scheme; it is a special scheme with only one identifier, "world:anyone"
  4. Super: the super-user scheme, whose users can perform any operation on any node

Authorization object (ID)

The authorization object is the user or entity to which permissions are granted, such as an IP address or a machine.

Permission

  1. CREATE: Data node creation permission, allowing authorized objects to create child nodes under this Znode
  2. DELETE: Child node deletion permission, allowing the authorized object to delete the child nodes of the data node
  3. READ: The read permission of the data node, allowing the authorized object to access the data node and read its data content or child node list, etc.
  4. WRITE: Data node update permission, allowing authorized objects to update the data node
  5. ADMIN: Data node administration permission, allowing the authorized object to perform ACL-related setting operations on the data node
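Putting scheme, ID and permission together, an ACL check can be sketched as below. The permission bit values (READ=1, WRITE=2, CREATE=4, DELETE=8, ADMIN=16) match ZooKeeper's ZooDefs.Perms; the check itself is a hypothetical simplification that handles only the world scheme and exact scheme:id matches (no IP ranges, no digest hashing, no super user).

```java
import java.util.List;

/** Hypothetical ACL check; the permission bit values match ZooKeeper's ZooDefs.Perms. */
final class AclCheck {
    static final int READ = 1, WRITE = 1 << 1, CREATE = 1 << 2, DELETE = 1 << 3, ADMIN = 1 << 4;
    static final int ALL = READ | WRITE | CREATE | DELETE | ADMIN;

    /** One ACL entry, scheme:id:perms, for example world:anyone:READ. */
    static final class Acl {
        final String scheme;
        final String id;
        final int perms;

        Acl(String scheme, String id, int perms) {
            this.scheme = scheme;
            this.id = id;
            this.perms = perms;
        }
    }

    private AclCheck() {}

    /**
     * authIds holds the caller's authenticated identities as "scheme:id" strings
     * (a hypothetical representation; real digest ids store a hashed password).
     */
    static boolean allowed(List<Acl> acls, List<String> authIds, int needed) {
        for (Acl acl : acls) {
            if ((acl.perms & needed) == 0) {
                continue; // this entry does not grant the needed permission
            }
            boolean matches = (acl.scheme.equals("world") && acl.id.equals("anyone"))
                    || authIds.contains(acl.scheme + ":" + acl.id);
            if (matches) {
                return true;
            }
        }
        return false;
    }
}
```

With the ACL list [world:anyone:READ, digest:app1:ALL], anonymous callers can read but not write, while a caller authenticated as digest:app1 can also perform ADMIN operations.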

11. Chroot feature

Since version 3.2.0, ZooKeeper has offered the Chroot feature, which allows each client to set a namespace for itself. If a client sets a Chroot, all of its operations on the server are limited to its own namespace.

By setting a Chroot, a client application can be mapped to a subtree of the ZooKeeper server. When multiple applications share one ZooKeeper cluster, this is very helpful for isolating the applications from each other.
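The effect of a Chroot is a path translation in both directions. In the Java client the Chroot is appended to the connection string, for example 127.0.0.1:2181/app1; the helper below is a hypothetical sketch of the translation only, not the client's own code.

```java
/** Hypothetical helper showing how a Chroot maps client paths to server paths and back. */
final class Chroot {
    private final String root; // "" means no Chroot; otherwise e.g. "/app1"

    Chroot(String root) {
        if (!root.startsWith("/") || (root.length() > 1 && root.endsWith("/"))) {
            throw new IllegalArgumentException("invalid chroot: " + root);
        }
        this.root = root.equals("/") ? "" : root;
    }

    /** Client path -> server path. */
    String toServer(String clientPath) {
        if (clientPath.equals("/")) {
            return root.isEmpty() ? "/" : root;
        }
        return root + clientPath;
    }

    /** Server path -> client path; anything outside the Chroot is invisible to the client. */
    String toClient(String serverPath) {
        if (root.isEmpty()) {
            return serverPath;
        }
        if (serverPath.equals(root)) {
            return "/";
        }
        if (serverPath.startsWith(root + "/")) {
            return serverPath.substring(root.length());
        }
        throw new IllegalArgumentException("outside chroot: " + serverPath);
    }
}
```

A client chrooted to /app1 that reads /config is really reading /app1/config, and it has no path that reaches /app2.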

12. Session management

Bucketing strategy: sessions that expire at around the same time are managed in the same bucket, so ZooKeeper can isolate sessions in different buckets and process all sessions in one bucket together. Sessions are assigned to buckets by their "next timeout time point" (ExpirationTime).

Calculation formula:

ExpirationTime_ = currentTime + sessionTimeout

ExpirationTime = (ExpirationTime_ / ExpirationInterval + 1) * ExpirationInterval

Here ExpirationInterval is the interval at which ZooKeeper checks for session timeouts; by default it equals tickTime. The division is integer division, so ExpirationTime is the next multiple of ExpirationInterval after ExpirationTime_.
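The formula can be checked with a small helper; the class, method, and parameter names are hypothetical, and all times are in milliseconds.

```java
/** Hypothetical helper applying the session bucketing formula above. */
final class SessionBuckets {
    private SessionBuckets() {}

    static long expirationTime(long currentTime, long sessionTimeout, long expirationInterval) {
        long raw = currentTime + sessionTimeout;                     // ExpirationTime_
        return (raw / expirationInterval + 1) * expirationInterval; // integer division
    }
}
```

With a 2000 ms interval and a 5000 ms timeout, sessions touched at 1000 ms and 1500 ms both get an ExpirationTime of 8000 ms, so a single check at 8000 ms can expire the whole bucket.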

13. Server roles

Leader

  1. The sole scheduler and processor of transactional requests, which guarantees the order in which the cluster processes transactions
  2. Coordinates the other servers within the cluster

Follower

  1. Processes clients' non-transactional requests and forwards transactional requests to the Leader
  2. Votes on transaction Proposals
  3. Votes in Leader elections

Observer

1) A server role introduced in version 3.3.0 that improves the cluster's non-transactional processing capability without affecting its transaction processing capability

2) Process the client's non-transactional request and forward the transactional request to the Leader server

3) Do not participate in any form of voting
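The division of labour between the three roles can be summarized in code. This enum is a hypothetical summary of the text above, not a ZooKeeper type.

```java
/** Hypothetical summary of the three server roles described above. */
enum ServerRole {
    LEADER(true, true),
    FOLLOWER(false, true),
    OBSERVER(false, false);

    final boolean processesTransactions; // only the Leader schedules and processes transactions
    final boolean votes;                 // Proposal votes and Leader election votes

    ServerRole(boolean processesTransactions, boolean votes) {
        this.processesTransactions = processesTransactions;
        this.votes = votes;
    }

    /** Where a request arriving at a server with this role is handled. */
    String route(boolean isTransaction) {
        if (!isTransaction) {
            return "local"; // every role serves non-transactional requests itself
        }
        return processesTransactions ? "local" : "forward-to-leader";
    }
}
```

Followers and Observers route writes identically; the only difference between them in this summary is the votes flag, which is exactly why adding Observers scales reads without enlarging the write quorum.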

14. ZooKeeper server states

The server has four states, namely LOOKING, FOLLOWING, LEADING, and OBSERVING.

1) LOOKING: looking for a Leader. A server in this state believes there is currently no Leader in the cluster, so it starts a Leader election.

2) FOLLOWING: follower status. Indicates that the current server role is Follower.

3) LEADING: leader status. Indicates that the current server role is Leader.

4) OBSERVING: Observer state. Indicates that the current server role is Observer.
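The states and the transitions implied by their descriptions can be sketched as an enum. The constant names match ZooKeeper's QuorumPeer.ServerState; the transition rules and the afterElection helper are a simplified, hypothetical model.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical model of the four server states; names match ZooKeeper's QuorumPeer.ServerState. */
enum ServerState {
    LOOKING, FOLLOWING, LEADING, OBSERVING;

    /** States reachable from this one in the simplified model: elect, or fall back to LOOKING. */
    Set<ServerState> next() {
        return this == LOOKING ? EnumSet.of(FOLLOWING, LEADING, OBSERVING) : EnumSet.of(LOOKING);
    }

    /** State a server takes once an election has produced a leader. */
    static ServerState afterElection(long myId, long leaderId, boolean observer) {
        if (observer) {
            return OBSERVING; // observers do not vote; they attach to the elected leader
        }
        return myId == leaderId ? LEADING : FOLLOWING;
    }
}
```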

 


Origin: blog.csdn.net/qq_53142796/article/details/132610559