ZooKeeper core principles in detail

  For the Java clients of zookeeper and their practical use, see the companion article "Curator, the zookeeper java client, in detail".

  The official documents http://zookeeper.apache.org/doc/current/zookeeperOver.html and http://zookeeper.apache.org/doc/current/zookeeperInternals.html describe some of zk's inner workings, but they are neither very approachable nor very detailed.

About zookeeper

  According to the official website, ZooKeeper is a centralized service for maintaining configuration information, naming, distributed coordination and group services, all of which distributed applications need. In practice, zookeeper is the most widely used distributed coordination service: dubbo, kafka, hadoop, es-job and others depend on the coordination and registration services it provides. Other middleware that provides registration services includes consul, etcd and eureka, but none is as widely used as zookeeper.

  Official website: https://zookeeper.apache.org/, https://zookeeper.apache.org/doc/r3.4.14/

The core mechanism

zookeeper node role

  In zookeeper, servers take one of the following roles:

  • Leader: initiates votes and makes resolutions, and updates the system state. A Zookeeper cluster has only one Leader node.
  • Learner: covers both followers and observers.
  • Follower: accepts client requests and returns results to the client, and votes during leader election. A Zookeeper cluster may have several followers.
  • Observer: accepts client connections and forwards write requests to the leader, but does not take part in voting and only synchronizes the leader's state. Observers exist to scale the system out and improve read speed (keeping them out of the vote reduces the complexity of the election).

Within a zookeeper cluster, the nodes interact as follows:

 Note: almost all modern distributed middleware, such as kafka and es, follows a similar design.

 As shown above, all requests are initiated by a client, which may be the local zkCli or a java client. The responsibilities of each role are described in detail below.

Leader

  The Leader's responsibilities include:

    • Data recovery;
    • Maintaining heartbeats with the Learners, receiving Learner requests and determining the message type of each request.

  The Leader workflow is shown schematically below; in the actual implementation, three threads are started to carry out these functions.

 

Follower

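  The Follower's main responsibilities are:

  • Sending requests to the Leader;
  • Receiving and processing messages from the Leader;
  • Receiving requests from Zookeeper clients and, if a request is a write, forwarding it to the Leader for processing.

  The Follower workflow is shown schematically below; in the actual implementation, the Follower's functions are carried out by five threads.

  The meaning of each message is as follows:

PING: heartbeat message.
PROPOSAL: a proposal initiated by the Leader, on which the Followers must vote.
COMMIT: information about the latest proposal on the server side.
UPTODATE: indicates that synchronization is complete.
REVALIDATE: depending on the Leader's REVALIDATE result, either closes the session awaiting revalidation or allows it to accept messages.
SYNC: returns the SYNC result to the client. This message is originally initiated by the client and is used to force it to get the latest updates.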

zookeeper data storage mechanism

  Although zookeeper presents a file-system-like storage model, all data is held in memory. Externally it exposes a view similar to a Unix file system: the root of the znode tree corresponds to the root path of a Unix file system.

  

Node Type

  A node in zk is called a znode (also known as a data register, roughly a folder that can also store data). By life cycle, znodes are either persistent (PERSISTENT) or ephemeral (EPHEMERAL). At creation time you can also choose to have a sequence number appended to the path, to distinguish multiple nodes created under the same parent node.
  Combining these gives four znode types (persistent, ephemeral, persistent sequential and ephemeral sequential):

  • persistent: a permanent znode.
  • ephemeral: deleted automatically when the client that created it closes, but visible to all clients while it exists. An ephemeral node may not have children. It is the core mechanism for implementing distributed coordination.
  • sequential: a property attached to either of the two types above. On creation, zookeeper appends a sequence number to the node's name. This can be used to build a global distributed queue, as follows:
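  As a rough illustration of the naming scheme, the sketch below imitates how a sequential child gets its name: ZooKeeper appends a 10-digit, zero-padded counter to the requested prefix. SequentialNames and the /queue/item- prefix are hypothetical names for this in-memory example; nothing here talks to a real server.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// In-memory illustration only; SequentialNames is a hypothetical helper, not a ZooKeeper class.
class SequentialNames {
    private int counter = 0; // stands in for the counter kept per parent node

    // Mimics creating a sequential child: a 10-digit, zero-padded counter is appended.
    String create(String prefix) {
        return prefix + String.format("%010d", counter++);
    }

    // A queue consumer takes the child with the smallest sequence number first.
    // Zero padding makes plain string order equal to numeric order for a shared prefix.
    static String head(List<String> children) {
        return Collections.min(children);
    }

    public static void main(String[] args) {
        SequentialNames parent = new SequentialNames();
        List<String> children = new ArrayList<>();
        children.add(parent.create("/queue/item-"));
        children.add(parent.create("/queue/item-"));
        System.out.println(children);
        System.out.println("next to consume: " + head(children));
    }
}
```

  Because the suffix is fixed-width, consumers can sort the child names as strings and always process the oldest item first.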

         

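How zookeeper ensures consistency

  zookeeper ensures consistency through the following mechanisms:

  » All update requests are applied in order; updates from the same client are executed in the order in which they were sent
  » Updates are atomic: a data update either succeeds or fails
  » Single global data view: whichever server a client connects to, it sees the same view of the data, because all writes are performed by the leader and then synchronized
  » Timeliness: within a certain time bound, a client can read the latest data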

Read and write mechanism

  » Zookeeper is a cluster made up of multiple servers
  » One leader, multiple followers
  » Each server keeps a copy of the data
  » Data is globally consistent
  » Reads and writes are distributed
  » All update requests are forwarded to the leader, which synchronizes them to the followers after they succeed

  The process for a client write request is as follows:

  • 1. All transaction requests are handed to the cluster's Leader server. The Leader converts each transaction request into a Proposal and assigns it a globally unique, monotonically increasing ID, the transaction ID or ZXID. The Leader orders and processes Proposals by their ZXID.
  • 2. The Leader places the Proposal into the queue of each Follower (it keeps a separate queue per Follower) and sends it to the Followers in FIFO order.
  • 3. When a Follower receives the Proposal, it first writes it to the transaction log on its local disk and, once that succeeds, returns an ACK response to the Leader.
  • 4. As soon as the Leader has received ACK responses from more than half of the servers, it broadcasts a Commit message to the Followers telling them to commit the Proposal, and the Leader itself commits the Proposal as well.

  Because every write request must be forwarded to the leader and go through the voting process, zookeeper is not suited to write-intensive scenarios such as sequence generators or distributed locks. The tps of zk for different numbers of nodes and different read/write ratios is as follows:
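  Step 4 can be sketched as a simple majority check. This is a minimal in-memory model, not ZooKeeper's own code: ProposalAcks is a hypothetical class, server ids are plain integers, and it assumes the Leader counts its own acknowledgement toward the majority.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory model of the ACKs collected for one Proposal on the Leader.
class ProposalAcks {
    private final int clusterSize;                     // number of voting servers
    private final Set<Integer> acks = new HashSet<>(); // ids of servers that have ACKed

    ProposalAcks(int clusterSize, int leaderId) {
        this.clusterSize = clusterSize;
        acks.add(leaderId); // assumption: the Leader's own ACK counts
    }

    // Records an ACK (duplicates are ignored) and reports whether the Proposal can commit.
    boolean ack(int serverId) {
        acks.add(serverId);
        return hasQuorum();
    }

    // True once strictly more than half of the voting servers have acknowledged.
    boolean hasQuorum() {
        return acks.size() > clusterSize / 2;
    }

    public static void main(String[] args) {
        ProposalAcks p = new ProposalAcks(5, 1);
        System.out.println("after follower 2: commit=" + p.ack(2));
        System.out.println("after follower 3: commit=" + p.ack(3));
    }
}
```

  With five voting servers the Proposal becomes committable on the third distinct ACK; the remaining Followers do not have to answer before the Commit is broadcast.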

 

   These figures come from the official benchmark, run on release 3.2 with 2Ghz Xeon servers and two SATA 15K RPM drives: the log on its own dedicated disk, snapshots written to the OS system disk, reads and writes of 1K each, and clients not connected directly to the leader. As the figures show, the more nodes there are, the slower writes become and the faster reads become.

zxid

  A znode's status information includes czxid, so what is a zxid? In zk, every state change is stamped with a monotonically increasing transaction id, called the zxid. Because zxids only increase, if zxid1 is less than zxid2 then zxid1 happened before zxid2. Creating a node, updating a node's data, or deleting a node each changes Zookeeper's state and therefore increases the zxid.

znode node information

  A znode consists mainly of two parts: the data stored in it and its status information. Fetching a znode with the get command returns the following:

  The first line is the data stored in the znode; the status information starts at cZxid. There are many status fields; the main ones are:

    • czxid:
      Created ZXID: the ID of the transaction that created the znode.

    • mzxid:
      Modified ZXID: the ID of the transaction that last updated the znode.

    • version
      The version number of the znode. Every znode starts at version 0 when it is created, and each update increments the version by 1, even if the stored value is the same before and after the update. The version can therefore be read as the number of times the znode has been updated. It lets the server apply concurrency control when multiple clients update the same znode. The scheme resembles CAS in java: it is an optimistic-locking strategy in which the version acts as the conflict detector. The client reads the znode's version and attaches it to its update request; the server compares the client's version with the znode's actual version and applies the update only when the two match.
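  The version check can be sketched with a small in-memory stand-in. VersionedNode is a hypothetical class that models only the data and version fields; in the real client the corresponding call is ZooKeeper.setData(path, data, version), which fails with KeeperException.BadVersionException when the versions differ.

```java
// Hypothetical in-memory znode that models only the data and its version.
class VersionedNode {
    private byte[] data;
    private int version = 0; // a freshly created node starts at version 0

    VersionedNode(byte[] data) {
        this.data = data.clone();
    }

    synchronized int version() {
        return version;
    }

    // Applies the update only if the caller saw the current version.
    // synchronized makes "compare, then set" a single step, like a CAS.
    synchronized boolean setData(byte[] newData, int expectedVersion) {
        if (expectedVersion != version) {
            return false; // another client updated the node first
        }
        data = newData.clone();
        version++; // incremented even if the bytes are unchanged
        return true;
    }

    public static void main(String[] args) {
        VersionedNode node = new VersionedNode("v0".getBytes());
        int seen = node.version();
        System.out.println("first writer:  " + node.setData("a".getBytes(), seen));
        System.out.println("second writer: " + node.setData("b".getBytes(), seen));
    }
}
```

  The second writer still holds the stale version and is rejected; it has to re-read the node and retry, which is the usual optimistic-locking loop.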

Other core mechanisms of zookeeper

  • At the core of Zookeeper is atomic broadcast, the mechanism that keeps the servers synchronized. The protocol that implements it is called the Zab protocol. Zab has two modes: recovery mode (leader election) and broadcast mode (synchronization). When the service starts, or after the leader crashes, Zab enters recovery mode; recovery mode ends once a leader has been elected and a majority of servers have synchronized their state with the leader. State synchronization ensures that the leader and the servers have the same system state.
  • To guarantee the ordering of transactions, zookeeper identifies each transaction with an increasing transaction id (zxid). Every proposal carries a zxid when it is issued. A zxid is a 64-bit number: the high 32 bits are the epoch, which identifies the leader's term and changes every time a new leader is elected; the low 32 bits are an incrementing counter.
  • Each Server is in one of three states while it runs:
    LOOKING: the Server does not know who the leader is and is searching for one
    LEADING: the Server is the elected leader
    FOLLOWING: a leader has been elected and the Server is synchronizing with it
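  The 64-bit zxid layout can be illustrated with a few lines of bit arithmetic. Zxid here is a hypothetical helper written for this example, and the epoch and counter values are made up.

```java
// Hypothetical helper showing the epoch/counter layout of a 64-bit zxid.
class Zxid {
    // Packs the epoch into the high 32 bits and the counter into the low 32 bits.
    static long make(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xFFFFFFFFL);
    }

    // High 32 bits: the leader's term.
    static long epoch(long zxid) {
        return zxid >>> 32;
    }

    // Low 32 bits: the position of the transaction within that term.
    static long counter(long zxid) {
        return zxid & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        long zxid = make(2, 5);
        System.out.println(Long.toHexString(zxid)); // 200000005
        System.out.println("epoch=" + epoch(zxid) + " counter=" + counter(zxid));
    }
}
```

  Since the epoch occupies the high bits, any zxid issued under a newer leader compares greater than every zxid from an earlier term, whatever the counter values are.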

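The paxos algorithm for consistency in distributed systems

  The paxos algorithm is based on this principle:

  • In a distributed database system, if every node starts from the same initial state and executes the same sequence of operations, all nodes end up in the same state.
  • What problem does Paxos solve? It guarantees that every node executes the same sequence of operations. That sounds simple enough: have a master maintain a global write queue and number every write as it enters the queue; then, however many nodes we write to, consistency holds as long as the writes are applied in numbered order. True, but what if the master goes down?
  • Paxos numbers writes globally by voting. At any moment only one write is approved; concurrent writes compete for votes, and only a write that wins more than half of the votes is approved (so only one write can ever be approved at a time). Writes that lose must start another round of voting. Round after round, all writes end up strictly numbered and ordered. The numbers increase strictly: if a node accepts write number 100 and later receives write number 99 (because of network delay or other unpredictable causes), it immediately knows its data is inconsistent, stops serving and restarts the synchronization process. The failure of any single node does not affect the data consistency of the cluster (with 2n+1 nodes in total, unless more than n fail).

  For this reason, production deployments of zookeeper should be clusters of 3 (single data center), 5 (single data center) or 7 (across data centers) nodes.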
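  The 2n+1 rule can be checked with a few lines of arithmetic. QuorumMath is a hypothetical helper for this example; it only restates the majority rule described above.

```java
// Hypothetical helper: majority size and tolerated failures for a cluster of n voting nodes.
class QuorumMath {
    // Smallest number of nodes that is strictly more than half of n.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    // Nodes that may fail while a majority is still reachable.
    static int tolerated(int n) {
        return n - quorum(n);
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 7; n++) {
            System.out.println(n + " nodes: quorum=" + quorum(n) + ", tolerated failures=" + tolerated(n));
        }
    }
}
```

  A 4-node cluster tolerates one failure, the same as a 3-node cluster, and 6 nodes tolerate no more than 5 do, which is why odd sizes are recommended.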

Core packages of the official zookeeper java client

  • org.apache.zookeeper: mainly contains the ZooKeeper client class and defines the ZooKeeper Watch callback interfaces.
  • org.apache.zookeeper.data: defines the data register and its related attributes.
  • org.apache.zookeeper.server, org.apache.zookeeper.server.quorum, org.apache.zookeeper.server.upgrade: the core interfaces of the server implementation.
  • org.apache.zookeeper.client: mainly defines the Four Letter Word classes.

  Because the official zookeeper java client is rather unfriendly, in practice the third-party client Curator is generally used instead. The official client is therefore not analyzed in detail here; see the Curator article referenced at the start of this post.

watch mechanism

  A watch in zookeeper is a one-shot observer on a node (it becomes invalid after firing once and must be re-created manually), similar in behavior to a database trigger.

  When the data monitored by a watch changes, the client that set the watch, that is, the watcher, is notified. A watcher monitors particular changes to the data, so every notification carries an event type and a corresponding state. A client can monitor several nodes; in code this shows up as several new watchers, and whenever a watched node changes the watcher's process callback is invoked. The diagram is as follows:

  In zookeeper, a watch can listen for two kinds of events: those related to a znode and those related to the client instance. They are as follows:

  • Event type (znode related) [what was observed happening to a node]
    • EventType.NodeCreated [node created]
    • EventType.NodeDataChanged [node data changed]
    • EventType.NodeChildrenChanged [children of this node changed]
    • EventType.NodeDeleted [node deleted]
  • State type (client instance related) [a change in the connection state between the ZooKeeper cluster and the application]
    • KeeperState.Disconnected [not connected]
    • KeeperState.SyncConnected [connected]
    • KeeperState.AuthFailed [authentication failed]
    • KeeperState.Expired [session expired]

  To sum up, the features of a zk watch are:

  • One-shot: the one thing to remember about ZooKeeper watchers is that a watch event fires only once. When the data monitored by a watch changes, the client that set the watch (the watcher) is notified, and because every watch in ZooKeeper is one-shot, the watch must be set again each time.
  • Serial execution on the client: watcher callbacks run synchronously and serially on the client, which guarantees ordering. Developers must take care that slow processing logic in one watcher does not hold up the callbacks of all the others.
  • Lightweight: WatchedEvent is the smallest unit of notification in ZooKeeper's watcher mechanism and contains only three parts: the notification state, the event type and the node path. A notification is therefore very simple: it tells the client that an event occurred, not what its content is, and the client has to fetch the details itself. For a NodeDataChanged event, for example, ZooKeeper tells the client that the data of the given node has changed but does not supply the new data.
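  The one-shot and lightweight properties can be sketched with an in-memory stand-in for the server's watch table. OneShotWatches and the /cfg path are hypothetical; a real client registers watches through calls such as getData or exists on the ZooKeeper class, which this sketch does not use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory watch table: each watch fires at most once.
class OneShotWatches {
    private final Map<String, List<Consumer<String>>> watches = new HashMap<>();

    // Registers a watcher on a path; it stays armed until the next change.
    void watch(String path, Consumer<String> watcher) {
        watches.computeIfAbsent(path, p -> new ArrayList<>()).add(watcher);
    }

    // Signals a data change: fires and removes every watch on the path.
    // Only the path is delivered, never the new data. Returns how many watchers fired.
    int dataChanged(String path) {
        List<Consumer<String>> fired = watches.remove(path);
        if (fired == null) {
            return 0; // nobody is watching any more
        }
        for (Consumer<String> w : fired) {
            w.accept(path);
        }
        return fired.size();
    }

    public static void main(String[] args) {
        OneShotWatches zk = new OneShotWatches();
        zk.watch("/cfg", p -> System.out.println("changed: " + p));
        System.out.println(zk.dataChanged("/cfg")); // watcher fires once
        System.out.println(zk.dataChanged("/cfg")); // second change goes unnoticed
    }
}
```

  To keep observing a node, the watcher has to register again, typically inside its own callback and together with a fresh read of the data.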

Leader election

  Leader election happens in two situations: first, when the cluster is initially built; second, whenever the leader goes down, which can happen at any time. The election process in Zookeeper is as follows:

Each server in the cluster sends a message to all the other servers; this message can be thought of as a ballot. A ballot carries two main pieces of information: the ID of the server being voted for as Leader (the number configured in the myid file) and that server's transaction ID. A transaction is an operation that changes the server's state, and the larger a server's transaction ID, the newer its data. The whole process is as follows:

  • 1. Each Follower casts a ballot (SID, ZXID). In the first round every Follower votes for itself as Leader, so the first ballot each Follower casts carries its own server ID and transaction ID.
  • 2. Each Follower receives ballots from the other Followers and regenerates its ballot by the following rules: it compares the ZXID of each received ballot with its own and picks the largest; if the ZXIDs are equal, it picks the ballot with the largest SID. Each server ends up regenerating a ballot and casting it again.

  After several rounds of voting, a server that has received more than half of the votes is elected Leader. As this analysis shows, Zookeeper's choice of Leader is biased toward servers with a larger ZXID, that is, the machines with the newest data.
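  The ballot comparison rule in step 2 can be sketched as follows. Vote is a hypothetical class for this example and follows the simplified rule in the text (ZXID first, then SID); the real election also takes the election epoch into account, which is left out here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical ballot: the proposed leader's server id and its latest transaction id.
class Vote {
    final long sid;
    final long zxid;

    Vote(long sid, long zxid) {
        this.sid = sid;
        this.zxid = zxid;
    }

    // True if the candidate ballot should replace the current one:
    // the larger ZXID wins, and on a tie the larger SID wins.
    static boolean beats(Vote candidate, Vote current) {
        if (candidate.zxid != current.zxid) {
            return candidate.zxid > current.zxid;
        }
        return candidate.sid > current.sid;
    }

    // Regenerates a server's ballot after seeing the ballots of the other servers.
    static Vote pick(Vote mine, List<Vote> received) {
        Vote best = mine;
        for (Vote v : received) {
            if (beats(v, best)) {
                best = v;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Vote mine = new Vote(1, 100);
        Vote next = pick(mine, Arrays.asList(new Vote(2, 100), new Vote(3, 90)));
        System.out.println("vote for sid=" + next.sid + " zxid=" + next.zxid);
    }
}
```

  In this example server 1 switches its ballot to server 2: both have ZXID 100, so the larger SID decides, while server 3 loses despite its higher SID because its data is older.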

  The whole process is shown below:

 

   This description is simplified; the details of how the final agreement is reached need further elaboration (to be added later).

Recovery

Zookeeper uses a transaction log and data snapshots to avoid data loss caused by server failure. Every storage system built on a transaction mechanism works the same way on this point: a WAL plus a replay mechanism.

  • Transaction log: the server writes each operation to the transaction log on disk before it updates the in-memory data. Both Leader and Follower servers record the transaction log.
  • Data snapshot: periodically, the in-memory tree is traversed depth-first and dumped to external storage as a snapshot. Note that this snapshot is "fuzzy", because the in-memory data may change while the snapshot is being taken. Since Zookeeper makes its transactions idempotent, however, loading the snapshot into memory and then replaying the transaction log restores the latest state.
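  The sketch below shows why replaying the log on top of a fuzzy snapshot is safe when transactions are idempotent. It is an in-memory model under simplifying assumptions: ReplayDemo and Txn are hypothetical names, every transaction is a plain "set path to value", and the snapshot and log are ordinary collections rather than files.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of recovery: load a snapshot, then replay the transaction log.
class ReplayDemo {
    // One logged transaction: an idempotent "set path to value", stamped with its zxid.
    static final class Txn {
        final long zxid;
        final String path;
        final String value;

        Txn(long zxid, String path, String value) {
            this.zxid = zxid;
            this.path = path;
            this.value = value;
        }
    }

    // Rebuilds the state from the snapshot plus every logged transaction
    // whose zxid is at or after the point where the snapshot started.
    static Map<String, String> recover(Map<String, String> snapshot, List<Txn> log, long fromZxid) {
        Map<String, String> state = new HashMap<>(snapshot);
        for (Txn t : log) {
            if (t.zxid >= fromZxid) {
                state.put(t.path, t.value); // applying it twice gives the same result
            }
        }
        return state;
    }

    public static void main(String[] args) {
        List<Txn> log = new ArrayList<>();
        log.add(new Txn(1, "/a", "1"));
        log.add(new Txn(2, "/b", "1"));
        log.add(new Txn(3, "/a", "2"));

        // Fuzzy snapshot: the dump started before zxid 2 was applied, yet it
        // already contains the effect of zxid 3 on /a and misses /b entirely.
        Map<String, String> fuzzy = new HashMap<>();
        fuzzy.put("/a", "2");

        System.out.println(recover(fuzzy, log, 2));
    }
}
```

  Replaying from zxid 2 adds the missing /b and applies zxid 3 to /a a second time, which changes nothing, so the recovered state matches what the server held before the failure.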

 

Authority system

  For zookeeper's acl authentication mechanism and related integrations, see the article "zookeeper acl authentication mechanism and dubbo, kafka integration, zooviewer / idea zk plugin configuration".

 

Origin www.cnblogs.com/zhjh256/p/11033043.html