Big Data Offline Phase 01: Apache Zookeeper

1. Basic knowledge of Zookeeper

ZooKeeper overview

ZooKeeper is an open-source framework that provides distributed coordination services. It is mainly used to solve consistency problems for applications running in a distributed cluster.

ZooKeeper is essentially a distributed storage system for small files. It stores data in a directory tree similar to a file system and manages the nodes in that tree, so it can maintain the data you store and monitor changes to its state. By watching these state changes, data-driven cluster management can be achieved.

ZooKeeper characteristics

  1. Global data consistency: each server in the cluster keeps the same copy of the data, so a client sees consistent data no matter which server it connects to. This is the most important feature.
  2. Reliability: if a message is accepted by one of the servers, it will be accepted by all of them.
  3. Ordering, both global and partial: global order means that if message a is published before message b on one server, a is published before b on all servers; partial order means that if message b is published after message a by the same sender, a is ordered before b.
  4. Atomicity of data updates: a data update either succeeds (more than half of the nodes apply it) or fails; there is no intermediate state.
  5. Timeliness: ZooKeeper guarantees that a client obtains server updates, or learns of a server failure, within a bounded time interval.

ZooKeeper cluster roles

Leader:

The core of the ZooKeeper cluster;

The only scheduler and processor of transaction requests (write operations), which guarantees the order in which the cluster processes transactions;

The scheduler of the servers in the cluster.

Write requests such as create, setData, and delete must be forwarded to the Leader for processing. The Leader assigns each request a sequence number and executes the operation. This process is called a transaction.

Follower:

Processes clients' non-transactional (read) requests and forwards transactional requests to the Leader;

Participates in the voting for cluster Leader election.

In addition, for ZooKeeper clusters with relatively heavy traffic, an Observer role can also be added.

Observer:

The Observer watches the latest state changes of the ZooKeeper cluster and synchronizes them. It can process non-transactional requests independently and forwards transactional requests to the Leader for processing.

It does not take part in any form of voting and only provides non-transactional services. It is usually used to increase the cluster's non-transactional (read) capacity without affecting its transactional capacity.


ZooKeeper cluster construction

ZooKeeper cluster construction means installing ZooKeeper in distributed mode. A cluster usually consists of 2n+1 servers. An odd number is preferred because decisions such as the Leader election need the support of a majority (more than half) of the servers, and a cluster of 2n+1 servers can still form a majority after n servers fail. The protocol ZooKeeper uses for this, ZAB, is similar in spirit to the Paxos algorithm.
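The majority arithmetic behind the 2n+1 rule can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example and are not part of ZooKeeper.

```python
def quorum_size(servers: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return servers // 2 + 1


def tolerated_failures(servers: int) -> int:
    """How many servers can fail while a majority is still available."""
    return servers - quorum_size(servers)
```

Under this arithmetic, a cluster of 3 servers and a cluster of 4 servers both tolerate only one failure, while 5 servers tolerate two, which is why an even-sized cluster brings no extra fault tolerance.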

ZooKeeper needs a Java environment to run, so a JDK must be installed in advance. For a cluster installed in Leader + Follower mode, the general process is as follows:

  • Configure the host name to IP address mapping
  • Modify the ZooKeeper configuration file
  • Remotely copy distribution installation files
  • Set myid
  • Start the ZooKeeper cluster

To use Observer mode, first add the following setting to the configuration file of each node that should act as an Observer:

peerType=observer

Second, the configuration file must also state which nodes are Observers, for example:

server.1:node1:2181:3181:observer

For detailed steps, please refer to the attached installation documentation.


2. ZooKeeper shell

client connection

Run zkCli.sh -server ip to enter the command line tool.

Enter help to print the list of commands that the zk shell supports.

Basic operation of the shell

create node

create [-s] [-e] path data acl

Here -s and -e specify the node's characteristics: -s creates a sequential node and -e creates a temporary node. If neither is given, a persistent node is created. acl is used for permission control.

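Create a sequential node (the paths and data in these examples are only illustrative):

create -s /test-seq 123

Create a temporary node:

create -e /test-tmp 123

Create a permanent node:

create /test 123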

read node

Reading uses the ls and get commands. The ls command lists the child nodes directly under the specified node (first level only); the get command returns the data content and attribute information of the specified node; ls2 lists the child nodes together with the node's attribute information.

  ls path [watch]

  get path [watch]

  ls2 path [watch]

update node

set path data [version]

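data is the new content to write, and version is the expected data version of the node. If version is given, the update succeeds only when it matches the node's current dataVersion.

After the first successful update of a newly created node, dataVersion changes from 0 to 1, indicating that the data has been updated.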
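The version check can be pictured with a small in-memory model. This is a toy sketch, not ZooKeeper code: the class and exception names are invented for illustration, and it assumes that -1 stands for "no version check", which is how the client API treats an unspecified version.

```python
class BadVersionError(Exception):
    """Raised when the expected version does not match the stored one."""


class VersionedNode:
    """Hypothetical stand-in for a single znode's data and dataVersion."""

    def __init__(self, data: str) -> None:
        self.data = data
        self.data_version = 0  # a freshly created node starts at version 0

    def set(self, data: str, version: int = -1) -> int:
        # -1 skips the check; any other value must equal the current version.
        if version != -1 and version != self.data_version:
            raise BadVersionError(
                f"expected version {version}, found {self.data_version}"
            )
        self.data = data
        self.data_version += 1  # incremented even if the data is unchanged
        return self.data_version
```

A client that read the node at version 1 and then calls set with version 1 succeeds only if nobody else has updated the node in the meantime; otherwise the call fails and the client can read again and retry.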

delete node

delete path [version]

A node that has child nodes cannot be deleted with this command; the child nodes must be deleted first, and then the parent node.

rmr path

This command removes a node recursively, together with all of its child nodes.

quota

setquota -n|-b val path sets a quota (limit) on a node.

n: Indicates the maximum number of child nodes

b: Indicates the maximum length of the data value

val: the maximum number of child nodes or the maximum length of data values

path: node path

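listquota path lists the quota of the specified node.

In the output, a value of -1 means no limit. For example, a child-node count of 2 with a data length of -1 means that at most 2 child nodes are allowed and the data length is unlimited.

delquota [-n|-b] path deletes the quota.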

other commands

history: lists the command history

redo: re-executes the history command with the specified command number; the command numbers can be viewed with history


3. ZooKeeper data model

ZooKeeper's data model is structurally very similar to a standard file system: it has a hierarchical namespace organized as a tree. Each node in the ZooKeeper tree is called a Znode and, as in a file system's directory tree, each node can have children. But there are also differences:

  1. A Znode has the characteristics of both a file and a directory. Like a file, it maintains data, meta information, an ACL, timestamps and so on; like a directory, it can be part of a path and can have child Znodes. Users with the right permissions can create, delete, modify, and query Znodes.
  2. Operations on a Znode are atomic: a read returns all the data of the node, and a write replaces all the data of the node. In addition, each node has its own ACL (Access Control List), which defines user permissions, that is, which operations a specific user may perform on the node.
  3. The amount of data a Znode can store is limited. Although ZooKeeper can attach data to nodes, it is not designed as a conventional database or a big data store. Instead, it manages coordination data in distributed applications, such as configuration information, status information, and location information. What such data has in common is that it is very small, usually measured in KB. Both the ZooKeeper server and the client check and limit the data of each Znode to at most 1 MB, and in normal use it should be much smaller than that.
  4. Znodes are referenced by paths, like file paths in Unix. Paths must be absolute, so they must begin with a slash. They must also be unique: each path has only one representation, and paths cannot be changed. In ZooKeeper, paths consist of Unicode strings, with some restrictions. The path "/zookeeper" is reserved for management information, such as quota information.

Data Structure Diagram

Each node in the tree is called a Znode. Each Znode consists of 3 parts:

① stat: status information describing the Znode's version, permissions and so on

② data: the data associated with the Znode

③ children: child nodes under the Znode
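The three parts can be modelled roughly as follows. This is a minimal in-memory sketch for illustration only; the class names, the two stat fields shown, and the lookup helper are assumptions of this example, not ZooKeeper's internal data structures.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Stat:
    data_version: int = 0  # bumped when the node's data is set
    cversion: int = 0      # bumped when the node's children change


@dataclass
class Znode:
    data: bytes = b""                                            # data part
    stat: Stat = field(default_factory=Stat)                     # stat part
    children: Dict[str, "Znode"] = field(default_factory=dict)   # children part


def lookup(root: Znode, path: str) -> Znode:
    """Walk an absolute path such as /app/config starting from the root."""
    if not path.startswith("/"):
        raise ValueError("paths must be absolute")
    node = root
    for part in filter(None, path.split("/")):
        node = node.children[part]  # KeyError if the node does not exist
    return node
```

For example, after root.children["app"] = Znode(data=b"x"), the call lookup(root, "/app") returns that child, and lookup(root, "/") returns the root itself.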


node type

There are two basic types of Znodes: temporary (ephemeral) nodes and permanent (persistent) nodes.

A node's type is determined when it is created and cannot be changed.

Ephemeral nodes: the lifetime of the node depends on the session that created it. Once the session ends, the temporary node is deleted automatically; it can also be deleted manually. Ephemeral nodes are not allowed to have child nodes.

Permanent nodes: the lifetime of the node does not depend on the session; it is deleted only when a client explicitly performs a delete operation.

A Znode can also be sequential. If this is requested when the node is created, a continuously increasing sequence number is automatically appended to the name of the Znode. The sequence number is unique with respect to the node's parent, so it records the order in which the child nodes were created. Its format is "%010d" (10 digits, left-padded with 0, such as "0000000001").
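The zero-padded 10-digit suffix corresponds to the printf-style format %010d. A short sketch, where the helper name and the prefix are hypothetical:

```python
def sequential_name(prefix: str, counter: int) -> str:
    """Append a 10-digit, zero-padded counter to the requested name."""
    return "%s%010d" % (prefix, counter)
```

Because every suffix has the same width, sorting the resulting names as plain strings gives the same order as sorting them by counter, as long as they share the same prefix.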

In this way, there are four types of Znodes:

PERSISTENT: permanent node

EPHEMERAL: ephemeral node

PERSISTENT_SEQUENTIAL: permanent node, sequential

EPHEMERAL_SEQUENTIAL: ephemeral node, sequential


node properties

Each Znode has a set of attributes, which can be viewed with the get command.

dataVersion: the data version number. Every set operation on a node increases dataVersion by 1 (even if the same data is written), which makes it possible to detect conflicting updates that arrive out of order.

cversion: the version number of the children. Whenever the children of the Znode change, cversion increases by 1.

cZxid: the id of the transaction that created the Znode.

mZxid: the id of the transaction that last modified the Znode; it is updated every time the Znode is modified.

In zk, every change produces a unique transaction id, the zxid (ZooKeeper Transaction Id). The zxid determines the order of update operations: if zxid1 is smaller than zxid2, the zxid1 operation happened before the zxid2 operation. A zxid is unique across the whole zk cluster, even when the operations touch different Znodes.
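As background that goes slightly beyond the text above: ZooKeeper's documentation describes the zxid as a 64-bit number whose high 32 bits hold the Leader epoch and whose low 32 bits hold a counter within that epoch. Under that assumption, a zxid can be taken apart as sketched below; the helper name is made up for this example.

```python
from typing import Tuple


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Return (epoch, counter) for a 64-bit zxid."""
    epoch = zxid >> 32            # high 32 bits
    counter = zxid & 0xFFFFFFFF   # low 32 bits
    return epoch, counter
```

With this layout, comparing two zxids as plain integers orders them first by epoch and then by counter, which matches the "smaller zxid happened first" rule.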

ctime: The timestamp when the node was created.

mtime: The timestamp when the latest update of the node occurred.

ephemeralOwner: if the node is a temporary node, ephemeralOwner is the id of the session bound to the node; otherwise it is 0.

Before a client and the server can communicate, a connection must be established; this connection is called a session. If the connection times out, authorization fails, or the connection is closed explicitly, the connection enters the CLOSED state and the session ends.


4. ZooKeeper Watcher (monitoring mechanism)

ZooKeeper provides distributed data publish/subscribe. A typical publish/subscribe system defines a one-to-many subscription relationship in which multiple subscribers watch one topic object at the same time. When the topic object changes, all subscribers are notified so that they can act accordingly.

ZooKeeper introduces the Watcher mechanism to implement this distributed notification. A client registers a Watcher with the server; when an event on the server triggers the Watcher, an event notification is sent to that client.

There are many types of trigger events, such as node creation, node deletion, node data change, and child node change.

In general, the Watcher mechanism can be summarized as three steps: the client registers a Watcher with the server, a server-side event triggers the Watcher, and the client's Watcher callback receives the triggering event.

Features of Watch Mechanism

one-time trigger

When an event triggers a watch, a watcher event is sent to the client that set the watch. The watch fires only once: if the same event occurs again later, no notification is sent unless the client has registered the watch again.

Event encapsulation

ZooKeeper uses WatchedEvent objects to encapsulate server-side events and deliver them.

WatchedEvent contains three basic properties of each event:

Notification state (keeperState), event type (EventType) and node path (path)

event sent asynchronously

The watcher's notification event is sent from the server to the client asynchronously.

Register before triggering

With the watch mechanism in ZooKeeper, the client must register a watch with the server first; only then does a later event trigger the watch and notify the client.
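The one-time trigger and register-before-trigger behaviour can be imitated with a small registry. This is a toy model under stated assumptions: the class names are invented, the callbacks run synchronously for simplicity (real notifications are delivered asynchronously), and the notification state is hard-coded.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class WatchedEvent:
    state: str  # notification state, e.g. "SyncConnected"
    type: str   # event type, e.g. "NodeDataChanged"
    path: str   # path of the node the event refers to


Callback = Callable[[WatchedEvent], None]


class WatchRegistry:
    """Hypothetical server-side table of one-shot watches."""

    def __init__(self) -> None:
        self._watchers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def register(self, path: str, callback: Callback) -> None:
        self._watchers[path].append(callback)

    def fire(self, path: str, event_type: str) -> int:
        # pop() removes the watches, so each one is notified at most once.
        callbacks = self._watchers.pop(path, [])
        event = WatchedEvent("SyncConnected", event_type, path)
        for callback in callbacks:
            callback(event)
        return len(callbacks)
```

Firing the same path twice notifies a registered callback only the first time; the second call finds no watch and returns 0 until the client registers again.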


Notification status and event type

The same event type has different meanings in different notification states. The following table lists common notification states and event types.

Connection state events (type=None, path=null) do not need to be registered by the client; the client simply handles them when they arrive.


Shell client setting watcher

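Set a watch for data changes on a node by using the watch option of get, for example get /test watch (the path /test is only an example).

Change the node's data from another client, for example with set /test 456.

The client that set the watch then receives a notification that the node's data has changed.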


5. Typical applications of Zookeeper

Data publish/subscribe

A data publish/subscribe system is what is usually called a configuration center: a publisher writes data to a ZooKeeper node and subscribers subscribe to it, so that data can be updated dynamically and configuration information can be managed centrally.

ZooKeeper combines push and pull: the client registers with the server the nodes it is interested in. Once the data of such a node changes, the server sends a Watcher event notification to the client; after receiving the notification, the client must actively fetch the latest data from the server.
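The push-then-pull cycle can be sketched with a toy in-memory store. Everything here is hypothetical (class names, method names, the path used in the usage note); the point is only that the notification carries no data, so the subscriber reads again and registers its watch again in the same step.

```python
from typing import Callable, Dict, List


class ConfigStore:
    """Toy stand-in for a server that holds one value per path."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: Dict[str, List[Callable[[str], None]]] = {}

    def get(self, path: str, watcher: Callable[[str], None]) -> str:
        # Reading the value and registering a one-shot watcher go together.
        self._watchers.setdefault(path, []).append(watcher)
        return self._data.get(path, "")

    def set(self, path: str, value: str) -> None:
        self._data[path] = value
        # Push: notify with the path only, not with the new value.
        for watcher in self._watchers.pop(path, []):
            watcher(path)


class Subscriber:
    def __init__(self, store: ConfigStore, path: str) -> None:
        self.store = store
        self.current = store.get(path, self.on_change)

    def on_change(self, path: str) -> None:
        # Pull: fetch the latest value and register the watcher again.
        self.current = self.store.get(path, self.on_change)
```

A subscriber created on a path such as /cfg keeps following later updates because every notification leads to a fresh read that also re-registers the watch.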

Mainly used: monitoring mechanism.


Provide cluster election

In a distributed environment, whether the cluster uses a master-slave or an active-standby architecture, exactly one node must be serving external requests at any time; we call this node the master.

When the master fails, a new master must be elected to keep the service continuously available. ZooKeeper can provide this function.

Mainly used: Znode uniqueness, the ephemerality of temporary nodes, and the watch mechanism.
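How uniqueness and ephemerality combine into an election can be shown with a toy model of a single temporary node. This is only a sketch: the class, the function, and the session names are invented, and the watch that would wake up the standby nodes is left out.

```python
from typing import List, Optional


class ElectionNode:
    """Toy model of one temporary znode that marks the current master."""

    def __init__(self) -> None:
        self.owner: Optional[str] = None  # session that created the node

    def try_create(self, session: str) -> bool:
        # Uniqueness: only the first create succeeds.
        if self.owner is None:
            self.owner = session
            return True
        return False

    def session_closed(self, session: str) -> None:
        # Ephemerality: the node disappears together with its owner's session.
        if self.owner == session:
            self.owner = None


def elect(node: ElectionNode, candidates: List[str]) -> Optional[str]:
    """Every candidate tries to create the node; return whoever holds it."""
    for session in candidates:
        node.try_create(session)
    return node.owner
```

When the master's session ends, the node vanishes and the next round of create attempts produces a new master from the remaining candidates.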

distributed lock

ZooKeeper represents a lock with a data node. For example, the node /itcast/lock can represent a lock. All clients call the create() interface to try to create the lock child node under /itcast, but ZooKeeper's strong consistency guarantees that only one client succeeds. That client is considered to hold the lock, and the other clients set Watchers on the child nodes, waiting for the lock to be released so that they can compete for it again.

In addition, the sequential feature of Znodes can be used to number the clients that create Znodes automatically, which implements a so-called sequential lock.
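One common way to build such a sequential lock is for every client to create a sequential child node and treat the smallest sequence number as the lock holder, with each waiter watching only the node just ahead of it. The watch-the-predecessor detail is an assumption taken from the commonly described lock recipe, not from the text above, and the helper names and child names are hypothetical.

```python
from typing import List, Optional


def sequence_of(name: str) -> int:
    """Extract the 10-digit sequence suffix from a child name."""
    return int(name[-10:])


def lock_holder(children: List[str]) -> Optional[str]:
    """The child with the smallest sequence number holds the lock."""
    return min(children, key=sequence_of) if children else None


def node_to_watch(children: List[str], mine: str) -> Optional[str]:
    """Return the child just ahead of mine, or None if mine holds the lock."""
    ordered = sorted(children, key=sequence_of)
    index = ordered.index(mine)
    return ordered[index - 1] if index > 0 else None
```

Watching only the predecessor means that releasing the lock wakes up a single waiter instead of every client at once.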


Origin blog.csdn.net/Blue92120/article/details/132203849