Typical Application Scenarios of ZooKeeper (2)

This article is based on Chapter 6 of "From Paxos to ZooKeeper: Distributed Consistency Principles and Practice".

  1. Cluster management (child node lists)
  2. Master election (exclusive node creation)
  3. Distributed locks (exclusive node creation)
  4. Distributed queues (node creation order)

5.1 Cluster Management

As distributed systems grow, the clusters behind them grow as well, so managing those clusters effectively becomes increasingly important.

Cluster management covers two areas: cluster monitoring and cluster control. The former focuses on collecting the runtime state of the cluster, while the latter focuses on operating on and controlling the cluster. In day-to-day operations and development we frequently run into
requirements such as the following.

  • Knowing how many machines in the cluster are currently working.
  • Collecting runtime state data from every machine in the cluster.
  • Bringing cluster machines online and taking them offline.

Traditional Agent-based distributed cluster management systems deploy an Agent on every machine in the cluster. The Agent actively reports the state of its machine to a central monitoring system (a control center
that aggregates and processes all the data, produces reports, and handles real-time alerting; hereinafter the "monitoring center"). For clusters of moderate size this is indeed
a widely used solution in production practice and can quickly and efficiently implement monitoring of a distributed cluster, but once the system's operating scenarios multiply and the cluster grows, the drawbacks of this approach become apparent:

  • Large-scale upgrades are difficult: because the Agent runs on every machine as a client, once it has been deployed at scale, any large-scale upgrade becomes very troublesome, and controlling the upgrade cost and schedule is a major challenge.
  • A unified Agent cannot meet diverse needs: for basic physical metrics such as CPU usage, load, memory usage, network throughput and disk capacity, a unified Agent
    may be sufficient. However, if you need to go deeper into an application and monitor business-level state (for example, the consumption status of each consumer in a distributed messaging middleware,
    or the execution of tasks on each machine in a distributed task scheduling system), then such tightly business-coupled monitoring requirements are clearly not a good fit for a unified Agent.
  • Programming-language diversity: with more and more programming languages in use, heterogeneous systems keep multiplying. The traditional Agent approach would require an Agent client for every language, and
    the "monitoring center" would face enormous challenges integrating data from these heterogeneous systems.

ZooKeeper has the following two characteristics:

  • If a client registers a Watcher on a data node in ZooKeeper, then when the content of that node or its child node list changes, the ZooKeeper server sends a
    change notification to the subscribed client.
  • An ephemeral node created on ZooKeeper is removed automatically once the session between the client and the server ends.

Using these two features, we can implement another kind of cluster machine liveness monitoring. For example, the monitoring system registers a Watcher on the /clusterServers node;
then, whenever a machine is dynamically added, it creates an ephemeral node /clusterServers/[Hostname] under /clusterServers (a minimal sketch of both sides follows). The monitoring system can thus
detect changes in the machine list in real time, and what it does next is up to its own logic. Below we walk through two typical examples, a distributed log collection system and online cloud host management, to see how cluster management
can be implemented with ZooKeeper.
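To make this concrete, here is a minimal Java sketch against the standard org.apache.zookeeper client. It assumes a connected ZooKeeper handle and that the parent node /clusterServers already exists; the hostname parameter and the reaction to changes are placeholders.

    import java.util.List;
    import org.apache.zookeeper.*;
    import org.apache.zookeeper.ZooDefs.Ids;

    public class ClusterMonitorSketch {
        private static final String ROOT = "/clusterServers";

        // Called on each cluster machine at startup: announce liveness with an
        // ephemeral node that disappears automatically when the session dies.
        static void register(ZooKeeper zk, String hostname) throws Exception {
            zk.create(ROOT + "/" + hostname, new byte[0],
                      Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        }

        // Called by the monitoring system: watch the child list and react
        // whenever machines join or leave.
        static void watchCluster(ZooKeeper zk) throws Exception {
            List<String> alive = zk.getChildren(ROOT, event -> {
                if (event.getType() == Watcher.Event.EventType.NodeChildrenChanged) {
                    try {
                        watchCluster(zk);   // re-read the list and re-register the watcher
                    } catch (Exception ignored) { }
                }
            });
            System.out.println("machines currently alive: " + alive);
        }
    }

Because the per-machine node is ephemeral, a machine that crashes or loses its session simply disappears from the child list, which is exactly the change the monitoring side is notified about.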

5.1.1 Distributed log collection system

The core job of a distributed log collection system is to collect logs that are scattered across different machines. Here we focus on the distributed log collection module of such a system.

In a typical log system architecture, all the machines whose logs need to be collected (hereinafter "log source machines") are divided into several groups, and each group corresponds to a collector,
which is a background machine dedicated to collecting logs (hereinafter a "collector machine"). For a large-scale distributed log collection system, two problems usually have to be solved.

  • Changes in the log source machines: in a production environment, machines change almost every day for every application (hardware failures, capacity expansion, or data center migration can all cause an application's machines to change),
    so each group of log source machines is effectively in constant flux.
  • Changes in the collector machines: the log collection system itself also undergoes machine changes or expansion, so new collector machines join and old ones exit.

Both problems, whether it is the log source machines or the collector machines that change, ultimately come down to one thing: how to quickly, reasonably and dynamically assign the log source machines to each collector. This is the
prerequisite for the whole log system to run stably and correctly, and it is also the biggest technical challenge in log collection. Introducing ZooKeeper is a good choice here, so let's look at this ZooKeeper
use case.

.1 Collector machine registration

When using ZooKeeper to register the log collection system, a typical approach is to create a node on ZooKeeper that serves as the root node for collectors, for example /logs/collector (hereinafter the "collector
node"). When each collector machine starts, it creates its own node under the collector node, for example /logs/collector/[Hostname].

.2 Task distribution

Once all collector machines have created their corresponding nodes, the system divides all log source machines into groups according to the number of child nodes under the collector node, and then writes each group's machine list onto
the corresponding collector child node (for example /logs/collector/host1). In this way, every collector machine can obtain its own list of log source machines from its collector node
and start collecting logs.

.3 Status Reporting

With registration and task distribution in place, we still have to consider that these collector machines may die at any time. For this problem we need a status reporting mechanism for collectors:
after creating its own node, each collector machine also creates a status child node under it, for example /logs/collector/host1/status, and periodically
writes its status information into that node. You can think of this as a heartbeat mechanism; a collector machine usually writes its log collection progress into this node. The log system then uses the last update time of the status node
to decide whether the corresponding collector machine is still alive.
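A sketch of the reporting side and the liveness check, under the same assumptions as above. The node mode, the progress string format and the timeout value are assumptions; the liveness check simply follows the "last update time" rule described here.

    import org.apache.zookeeper.*;
    import org.apache.zookeeper.ZooDefs.Ids;
    import org.apache.zookeeper.data.Stat;

    public class CollectorStatusReport {
        // Each collector periodically overwrites its status node with the
        // current collection progress; version -1 means "any version".
        static void reportStatus(ZooKeeper zk, String hostname, String progress) throws Exception {
            String statusPath = "/logs/collector/" + hostname + "/status";
            if (zk.exists(statusPath, false) == null) {
                // Node mode is an assumption: liveness is judged from the last
                // update time below, not from the node's existence.
                zk.create(statusPath, progress.getBytes(),
                          Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            } else {
                zk.setData(statusPath, progress.getBytes(), -1);
            }
        }

        // The log system reads the Stat of the status node and uses its last
        // modification time (mtime) to judge whether the collector is alive.
        static boolean isAlive(ZooKeeper zk, String hostname, long timeoutMs) throws Exception {
            Stat stat = zk.exists("/logs/collector/" + hostname + "/status", false);
            return stat != null && System.currentTimeMillis() - stat.getMtime() < timeoutMs;
        }
    }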

.4 Dynamic allocation

When collector machines die or the collector fleet is expanded, collection tasks have to be reassigned dynamically. At runtime, the log system keeps watching all child nodes of /logs/collector;
as soon as it detects that a collector machine has stopped reporting, or that a new collector machine has joined, it starts reassigning tasks. Whether a collector stopped reporting or a new machine joined,
the log system needs to transfer the tasks that were previously assigned. There are usually two approaches to this.

.4.1 Global dynamic allocation

This is the simple, brute-force approach: whenever a collector machine dies or a new machine joins, the log system regroups all log source machines according to the new list of collectors
and assigns the groups to the remaining collector machines.

.4.2 Local dynamic allocation

The global dynamic allocation strategy is simple, but it has a problem: a change to one or a few collector machines triggers a global reallocation of tasks, which has a large impact and therefore carries a large risk.
Local dynamic allocation, as the name suggests, reallocates tasks within a small scope. In this strategy, each collector machine reports its own load along with its log collection status.
Note that the load here is not simply the machine's CPU load, but a comprehensive assessment of the collector's current task burden.

With this strategy, if a collector machine dies, the log system reassigns the tasks previously allocated to that machine to the collectors with lower load. Likewise, if a new collector machine joins,
part of the tasks of the high-load machines is transferred to the new machine.

.5 Notes

.5.1 Node type

Let's first look at the node type of the child nodes under /logs/collector. Each child node under this node represents a collector machine, so at first glance these child nodes should be ephemeral nodes,
because the log system could then determine collector liveness from them directly. But there is something else to consider: in this distributed log collection scenario, the collector child node holds
the list of log source machines assigned to that collector. If we relied solely on ZooKeeper's ephemeral node mechanism, then as soon as a collector machine died, or its "heartbeat report" was interrupted
long enough for its session to expire, ZooKeeper would immediately remove the node, and the list of log source machines recorded on it would be wiped out along with it.

From the description above, ephemeral nodes clearly cannot satisfy this business requirement, so we choose persistent nodes to identify each collector machine, and under each persistent node we create a
/logs/collector/[Hostname]/status node to represent the status of that collector machine. This way the log system can still monitor every collector, and when a collector machine dies,
its previous task assignment can still be recovered and handed over accurately.

.5.2 Node monitoring by the log system

In an actual production deployment, each collector machine may update its status node very frequently (for example once a second or faster), and the number of collectors may be large, so if the log system watched all of
these nodes, the volume of notifications would be huge. On the other hand, when the collector machines are working normally, the log system does not need to receive every status change in real time, so most
of those notifications are useless. We therefore consider giving up the Watcher and instead have the log system actively poll the collector nodes, which saves a lot of network traffic. The only drawback is
a certain delay, which is acceptable given what a distributed log collection system is for.

5.1.2 Online cloud host management

The online cloud host management scenario typically occurs at hosting providers. In this type of cluster management, machine monitoring is a very important part. Such a scenario usually has high requirements on machine state,
especially on the statistics of the machine online rate, and needs to respond quickly to changes in the cluster machines.

In a traditional implementation, the monitoring system periodically probes each machine by some means (for example checking a designated port on the host), or each machine regularly reports "I'm alive" to the monitoring system.
However, this approach forces every business system developer to deal with many tedious problems such as network communication, protocol design, scheduling and disaster recovery. Let's look at another way to implement
cluster machine liveness monitoring, using ZooKeeper. For such a system, the main requirements are roughly as follows.

  • How do we quickly count how many machines are in the current production environment?
  • How do we quickly learn that machines have come online or gone offline?
  • How do we monitor the runtime state of every host in the cluster in real time?

.1 Machine online/offline

To automate operations, we must have global monitoring of machines coming online and going offline. Usually, when new machines are added, a designated Agent is deployed to them.
After the Agent is deployed and started, it first registers itself with a designated node on ZooKeeper; concretely, it creates an ephemeral child node under the machine list node, for example /XAE/machine/[Hostname]
(hereinafter the "host node"), as shown below:


When the Agent has created this ephemeral child node on ZooKeeper, the monitoring center, which is watching /XAE/machine, receives a "child node changed" event, that is, an online notification, and can then
start the corresponding management logic for the newly added machine. Conversely, the monitoring center is also notified when a machine goes offline. This implements detection of machines going online and offline, while also
making it easy to obtain the list of machines currently online, which is a great help for large-scale capacity expansion and capacity evaluation (a small sketch of both sides follows).
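A minimal sketch of both sides, assuming a connected ZooKeeper handle and an existing /XAE/machine parent node; how the monitoring center reacts to the child-list diff is left as a comment.

    import java.util.List;
    import org.apache.zookeeper.*;
    import org.apache.zookeeper.ZooDefs.Ids;

    public class MachineOnOffline {
        private static final String MACHINE_ROOT = "/XAE/machine";

        // Agent side: announce this host by creating an ephemeral child node.
        static void agentRegister(ZooKeeper zk, String hostname) throws Exception {
            zk.create(MACHINE_ROOT + "/" + hostname, new byte[0],
                      Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        }

        // Monitoring-center side: watch the child list; every change is an
        // online or offline notification for some machine.
        static List<String> watchMachines(ZooKeeper zk) throws Exception {
            return zk.getChildren(MACHINE_ROOT, event -> {
                if (event.getType() == Watcher.Event.EventType.NodeChildrenChanged) {
                    // diff the new child list against the previous one to find
                    // which machines came online or went offline
                }
            });
        }
    }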

 

.2 Machine monitoring

For an online cloud host system, it is not enough to detect whether machines are online; their runtime state also needs to be monitored. During operation, the Agent periodically writes the host's runtime state information
into its host node on ZooKeeper, and the monitoring center obtains the hosts' runtime information indirectly by subscribing to data change notifications on these nodes.
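A sketch of the reporting and subscribing calls, with the same assumptions as above; what the monitoring center does with the new data is left as a comment.

    import org.apache.zookeeper.*;
    import org.apache.zookeeper.data.Stat;

    public class HostStateMonitor {
        // Agent side: periodically write the host's runtime state
        // (load, memory, etc.) into its own host node.
        static void reportState(ZooKeeper zk, String hostname, byte[] state) throws Exception {
            zk.setData("/XAE/machine/" + hostname, state, -1);
        }

        // Monitoring-center side: subscribe to data changes on one host node.
        static byte[] readState(ZooKeeper zk, String hostname) throws Exception {
            String path = "/XAE/machine/" + hostname;
            return zk.getData(path, event -> {
                if (event.getType() == Watcher.Event.EventType.NodeDataChanged) {
                    // fetch the new state and re-register the watcher
                }
            }, new Stat());
        }
    }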

As distributed systems grow larger and larger, monitoring and managing cluster machines becomes increasingly important. With the ZooKeeper-based approach described above, we can not only detect machines going online and
offline in the cluster in real time, but also obtain the hosts' runtime information in real time, which makes it possible to build a host profile for a large cluster.

6.1 Master election

Master election is a very common scenario in distributed systems. A distributed system's core characteristic is that system units with independent computing capability can be deployed on different machines and together constitute a complete
distributed system. In practice we often need to pick a "leader" among these system units distributed across different machines, which in computer science we call the Master.

In distributed systems, the Master is often used to coordinate the other system units in the cluster and has the authority to decide on changes to distributed system state. For example, in read/write-separation scenarios, clients' write requests
are usually handled by the Master; in other scenarios, the Master is often responsible for some complex logic and synchronizes the processing result to the other system units in the cluster. Master election can be said to be one of ZooKeeper's
most typical application scenarios, and in this section we use a concrete case, "a system for processing and sharing massive amounts of data", to look at Master election in a cluster using ZooKeeper.

In distributed environments we often face scenarios like this: every system unit in the cluster needs to provide the same data to the front end, such as a product ID, or the ID of a carousel advertisement on a website (common
in ad systems). This product or advertisement ID often has to be computed from massive amounts of data, a process that is usually very expensive in both I/O and CPU.
Given the complexity of the computation, if every machine in the cluster executed this logic, it would cost a great deal of resources. A better approach is to let only some of the machines in the cluster, or even just
a single machine, carry out the data computation; once the result is computed, it can be shared with all other client machines in the cluster, which greatly reduces duplicated work and improves performance.

Here we use a simple advertisement system back end as the example scenario. The whole system can roughly be divided into four parts: the client cluster, a distributed cache system, a massive data processing bus, and ZooKeeper,
as shown below:


The client cluster performs a Master election via ZooKeeper on a daily schedule. After a client is elected Master, it is responsible for the whole series of massive data processing and finally computes
a result, which it places into a shared memory store or database. The Master then notifies all the other clients in the cluster to fetch the shared result from that memory store or database.

 

Next, let's focus on the Master election process. First, the requirement: elect one machine from all machines in the cluster to be the Master. To satisfy this requirement we can usually
use the primary key feature of a relational database: all machines in the cluster insert a record with the same primary key ID into the database, and the database checks primary key conflicts
for us automatically. That is, of all the client machines performing the insert, only one can succeed, and we regard the machine whose insert succeeded as the Master.

At first glance this scheme does work: relying on the relational database's primary key constraint guarantees that only one Master is elected in the cluster. But there is another issue to consider: what if the currently
elected Master dies? How do we deal with that? Who tells us that the Master has died? Obviously, the relational database cannot notify us of this event.

ZooKeeper's strong consistency guarantees that node creation is globally unique even under highly concurrent distributed access: ZooKeeper will not allow a client to create a data node that already
exists. In other words, if multiple clients request the creation of the same node at the same time, only one of those requests can ultimately succeed. With this property, Master election in a distributed
environment becomes straightforward.

In this system, a date node is first created on ZooKeeper, as shown below:


Every day, the machines in the client cluster attempt at a scheduled time to create an ephemeral node on ZooKeeper, for example /master_election/2017-09-03/binding. Only one client can create
this node successfully, and the machine running that client becomes the Master. Meanwhile, all clients that failed to create the node register a child-node-change Watcher on /master_election/2017-09-03
to monitor whether the current Master machine is still alive; once the current Master is found to have died, the remaining clients re-run the Master election.
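A minimal Java sketch of this election flow, assuming a connected ZooKeeper handle and an existing /master_election/2017-09-03 node; the data written into the binding node and the error handling are simplified assumptions.

    import org.apache.zookeeper.*;
    import org.apache.zookeeper.ZooDefs.Ids;

    public class MasterElection implements Watcher {
        private static final String ELECTION_PATH = "/master_election/2017-09-03";
        private final ZooKeeper zk;

        public MasterElection(ZooKeeper zk) { this.zk = zk; }

        // Try to become Master by creating the ephemeral "binding" node.
        public boolean electMaster() throws KeeperException, InterruptedException {
            try {
                zk.create(ELECTION_PATH + "/binding", "hostname".getBytes(),
                          Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                return true;                       // this client is the Master
            } catch (KeeperException.NodeExistsException e) {
                // Someone else won; watch the election node for child changes.
                zk.getChildren(ELECTION_PATH, this);
                return false;
            }
        }

        @Override
        public void process(WatchedEvent event) {
            if (event.getType() == Event.EventType.NodeChildrenChanged) {
                try {
                    electMaster();                 // Master died, re-elect
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }
    }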

 

From the discussion above, if all we want is Master election, then any component that can guarantee uniqueness is sufficient; the primary key model of a relational database, for example,
is a fine choice. But if we want fast, dynamic Master election in a cluster, implementing it on top of ZooKeeper is a good new approach.

7.1 Distributed locks

A distributed lock is a way of controlling synchronized access to shared resources across distributed systems. If different systems, or different hosts of the same system, share a resource or a group of resources,
then access to those resources usually needs some mutual exclusion mechanism to prevent interference and guarantee consistency. This is where distributed locks are needed.

In everyday project development we rarely pay much attention to distributed locks; instead we rely on the inherent exclusivity of relational databases to achieve mutual exclusion between different processes. This is indeed a very simple
and widely used way to implement a distributed lock. It is an undeniable fact, however, that the performance bottleneck of most large distributed systems today lies in database operations. If the upper-layer business adds
extra locks on top of the database, such as row locks, table locks or even heavy transactions, won't the database become even more overloaded? Below we look at how to implement distributed locks with ZooKeeper,
covering two kinds: exclusive locks and shared locks.

7.1.1 Exclusive locks

An exclusive lock (X lock), also called a write lock or mutual-exclusion lock, is a basic lock type. If transaction T1 places an exclusive lock on data object O1, then for the entire duration of the lock only
transaction T1 is allowed to read and update O1, and no other transaction may perform any kind of operation on that data object until T1 releases the exclusive lock.

From this basic definition we can see that the core of an exclusive lock is how to guarantee that exactly one transaction holds the lock at any time, and that once the lock is released, all transactions waiting to acquire it
can be notified. Let's see how to implement an exclusive lock with ZooKeeper.

.1 Defining the lock

In Java there are two common ways to define a lock: the synchronized mechanism and the ReentrantLock provided since JDK 5. In ZooKeeper, however, there is no such API to use directly;
instead a data node on ZooKeeper represents the lock. For example, the node /exclusive_lock/lock can be defined as a lock, as shown below:

 

.2 Acquiring the lock

To acquire the exclusive lock, every client tries to create the ephemeral child node /exclusive_lock/lock under the /exclusive_lock node by calling create(). ZooKeeper
guarantees that among all the clients, only one can create it successfully, and that client is considered to have acquired the lock. At the same time, all the clients that failed to acquire the lock register a
child-node-change Watcher on the /exclusive_lock node so they learn about changes to the lock node in real time.
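A minimal sketch of the acquire and release flow described above, assuming a connected ZooKeeper handle and an existing /exclusive_lock parent node; the retry on notification is only indicated by a comment.

    import org.apache.zookeeper.*;
    import org.apache.zookeeper.ZooDefs.Ids;

    public class ExclusiveLock implements Watcher {
        private static final String LOCK_PATH = "/exclusive_lock/lock";
        private final ZooKeeper zk;

        public ExclusiveLock(ZooKeeper zk) { this.zk = zk; }

        // Try to acquire the lock; only one client can create the ephemeral node.
        public boolean tryLock() throws KeeperException, InterruptedException {
            try {
                zk.create(LOCK_PATH, new byte[0], Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                return true;
            } catch (KeeperException.NodeExistsException e) {
                // The lock is held by someone else: watch the parent for child changes.
                zk.getChildren("/exclusive_lock", this);
                return false;
            }
        }

        // Release the lock explicitly (it also disappears if our session dies).
        public void unlock() throws KeeperException, InterruptedException {
            zk.delete(LOCK_PATH, -1);
        }

        @Override
        public void process(WatchedEvent event) {
            if (event.getType() == Event.EventType.NodeChildrenChanged) {
                // the lock node was removed or re-created: retry tryLock()
            }
        }
    }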

.3 Releasing the lock

Because the lock node is ephemeral, the lock may be released in either of the following two situations:

  • The client machine currently holding the lock crashes.
  • After finishing its business logic normally, the client actively deletes the ephemeral node.

Whichever situation removes the lock node, ZooKeeper notifies all clients that registered a child-node-change Watcher on /exclusive_lock. After receiving the notification, these clients
initiate distributed lock acquisition again, that is, they repeat the "acquiring the lock" step, as shown below:

 

7.1.2 Shared locks

A shared lock (S lock), also called a read lock, is likewise a basic lock type. If transaction T1 places a shared lock on data object O1, then the current transaction may only read O1,
and other transactions may only place shared locks on this data object, until all shared locks on it have been released.

The fundamental difference between a shared lock and an exclusive lock is that with an exclusive lock the data object is visible to only one transaction, while with a shared lock the data is visible to all transactions.

.1 Defining the lock

As with the exclusive lock, a data node on ZooKeeper represents the lock; here it is an ephemeral sequential node of the form /shared_lock/[Hostname]-requestType-sequenceNumber, for example
/shared_lock/192.168.0.1-R-0000000001. That node then represents a shared lock, as shown below:

 

.2 Acquiring the lock

To acquire a shared lock, every client creates an ephemeral sequential node under the /shared_lock node. For a read request it creates a node such as /shared_lock/192.168.0.1-R-0000000001;
for a write request it creates a node such as /shared_lock/192.168.0.1-W-0000000001.

.3 Determining the read/write order

According to the definition of a shared lock, different transactions may read the same data object at the same time, while an update may only proceed when no transaction is currently reading or writing it. Based on this principle,
let's see how ZooKeeper nodes can determine the distributed read/write order, in roughly the following four steps (a sketch follows the list).

  1. After creating its node, the client fetches all child nodes of /shared_lock and registers a child-node-change Watcher on that node.
  2. It determines the position of its own sequence number among all the child nodes.
  3. For a read request: if there is no child node with a smaller sequence number, or all child nodes with smaller sequence numbers are read requests, then the client has acquired the shared lock and
    can start its read logic; if any child node with a smaller sequence number is a write request, it has to wait.
    For a write request: if the client's node is not the one with the smallest sequence number, it has to wait.
  4. On receiving the Watcher notification, repeat step 1.
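A sketch of node creation and of the ordering check in steps 1-3, assuming a connected ZooKeeper handle and an existing /shared_lock node; the sequence-number helper and the method names are illustrative.

    import java.util.List;
    import org.apache.zookeeper.*;
    import org.apache.zookeeper.ZooDefs.Ids;

    public class SharedLock {
        private static final String ROOT = "/shared_lock";

        // Create our lock node, e.g. /shared_lock/192.168.0.1-R-0000000003;
        // the full path (with the assigned sequence number) is returned.
        static String createLockNode(ZooKeeper zk, String host, boolean read) throws Exception {
            return zk.create(ROOT + "/" + host + "-" + (read ? "R" : "W") + "-", new byte[0],
                             Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        }

        // Steps 1-3: fetch all children (registering the child-change Watcher)
        // and decide whether this node may proceed. ownName is the node name
        // relative to ROOT, e.g. "192.168.0.1-R-0000000003".
        static boolean canProceed(ZooKeeper zk, String ownName, boolean read, Watcher watcher) throws Exception {
            List<String> children = zk.getChildren(ROOT, watcher);
            int ownSeq = seq(ownName);
            for (String child : children) {
                if (seq(child) < ownSeq) {
                    if (!read) return false;                 // a write must have the smallest number
                    if (child.contains("-W-")) return false; // a read waits for earlier writes only
                }
            }
            return true;
        }

        // ZooKeeper appends a 10-digit counter after the final '-'.
        private static int seq(String name) {
            return Integer.parseInt(name.substring(name.lastIndexOf('-') + 1));
        }
    }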

.4 Releasing the lock

The logic for releasing the lock is the same as for the exclusive lock.

.5 The herd effect

The shared lock implementation described above can broadly satisfy the lock-contention needs of an ordinary distributed cluster, and its performance is acceptable. "Ordinary" here means a cluster that is not especially large,
typically within about ten machines. But what happens once the cluster grows? Let's focus on step 3 of the "determining the read/write order" process above and see what happens at runtime, as shown below.

 

  1. The machine 192.168.0.1 performs its read operation first; when it finishes, it deletes its node /192.168.0.1-R-0000000001.
  2. The remaining four machines all receive the notification that this node has been removed, and each fetches a fresh child node list from /shared_lock/.
  3. Every machine re-evaluates its read/write order. 192.168.0.2 finds that it is now the machine with the smallest sequence number and starts its write operation, while the remaining machines find that
    it is still not their turn to read or update, so they keep waiting.
  4. And so on...

That is essentially how the shared lock behaves at runtime. Look again at the observation in step 3: "the remaining machines find that it is still not their turn to read or update, so they keep waiting."
Clearly, after 192.168.0.1 removed its shared lock, ZooKeeper sent a child-node-change Watcher notification to every machine, yet apart from having a real effect on 192.168.0.2,
the notification was useless to all the other machines.

The reader has probably already realized that throughout this lock contention process, large numbers of "Watcher notifications" and "child list fetches" are executed repeatedly, and in the vast majority of cases the result is simply
the conclusion that the client's node is not the one with the smallest sequence number, so it goes back to waiting for the next notification. This obviously is not very sensible. Clients receive far too many event notifications that have nothing to do with them; if the cluster
is large, this not only puts enormous performance pressure and network load on the ZooKeeper servers, but, more seriously, if the clients of several nodes complete or abort their transactions at the same time (causing their nodes to disappear),
the ZooKeeper servers will send a flood of event notifications to the remaining clients in a short period of time. This is the so-called herd effect.

The root cause of the herd effect in the ZooKeeper shared lock implementation above is that it does not target what each client actually cares about. Reviewing the lock contention process, its core
logic is: determine whether my node has the smallest sequence number among all child nodes. It is then easy to see that each client only needs to watch the one relevant node just ahead of it in sequence,
rather than watching changes to the whole child list.

.6 An improved distributed lock implementation

Now let's see how to improve the distributed lock implementation above. First, to be clear, the overall idea of the shared lock implementation described above is entirely correct. The main change is this:
each lock contender only needs to watch whether the node just ahead of it in sequence under /shared_lock/ still exists. The concrete steps are as follows (a sketch follows the list):

  1. The client calls create() to create an ephemeral sequential node of the form /shared_lock/[Hostname]-requestType-sequenceNumber.
  2. The client calls getChildren() to fetch the list of all created child nodes; note that no Watcher is registered here.
  3. If the shared lock cannot be acquired, the client calls exists() to register a Watcher on the node "smaller" than itself. Note that "the node smaller than itself" is a loose description; it differs for read requests and write requests.
    Read request: register a Watcher on the last write-request node with a sequence number smaller than its own.
    Write request: register a Watcher on the last node with a sequence number smaller than its own.
  4. Wait for the Watcher notification, then go back to step 2.
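A sketch of the improved check, with the same assumptions as before: the child list is fetched without a Watcher, and exists() registers a Watcher only on the one predecessor node that matters; the helper names are illustrative.

    import java.util.Comparator;
    import java.util.List;
    import org.apache.zookeeper.*;

    public class ImprovedSharedLock {
        private static final String ROOT = "/shared_lock";

        // ownName is this client's node name relative to ROOT,
        // e.g. "192.168.0.1-R-0000000003".
        static boolean tryAcquire(ZooKeeper zk, String ownName, boolean read, Watcher watcher)
                throws KeeperException, InterruptedException {
            List<String> children = zk.getChildren(ROOT, false);      // step 2: no Watcher here
            children.sort(Comparator.comparingInt(ImprovedSharedLock::seq));

            String target = null;                                      // the one node we must wait for
            for (String child : children) {
                if (child.equals(ownName)) break;
                if (read) {
                    if (child.contains("-W-")) target = child;         // last earlier write request
                } else {
                    target = child;                                    // last earlier node of any kind
                }
            }
            if (target == null) return true;                           // lock acquired

            // Step 3: watch only that predecessor; retry when it disappears.
            if (zk.exists(ROOT + "/" + target, watcher) == null) {
                return tryAcquire(zk, ownName, read, watcher);         // it vanished in the meantime
            }
            return false;
        }

        // ZooKeeper appends a 10-digit counter after the final '-'.
        private static int seq(String name) {
            return Integer.parseInt(name.substring(name.lastIndexOf('-') + 1));
        }
    }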

The flow chart is as follows:

 

.7 Notes

Having read this far, many readers will probably feel that the improved distributed lock implementation is relatively cumbersome. That is true; just as in multithreaded programming practice we try to narrow the scope of a lock,
improving the distributed lock implementation follows the same line of thinking. So must developers design their own distributed locks along the improved lines? No. In real development
we advocate choosing the distributed lock implementation that fits the specific business scenario and cluster scale: when the cluster is small and network resources are plentiful, the first implementation is
the simple and practical choice; when the cluster reaches a certain size and you want fine-grained control of the distributed lock mechanism, the improved implementation is worth trying.

8.1 Distributed queues

There are quite a few distributed queue products in the industry, but most of them are messaging middleware such as ActiveMQ and Kafka. In this section we focus on distributed queues implemented with ZooKeeper.
Distributed queues, simply put, fall into two categories: the ordinary first-in-first-out queue, and the Barrier model, in which execution is arranged only after the queue's elements have all gathered.

8.1.1 FIFO: first in, first out

Implementing a FIFO queue with ZooKeeper is very similar to the shared lock implementation. A FIFO queue is like a shared lock model in which every request is a write. The overall design is very simple: all clients
create an ephemeral sequential node under the /queue_fifo node, for example /queue_fifo/192.168.0.1-0000000001, as shown below:


After creating its node, a client determines its execution order in the following four steps (a sketch follows the list).

 

  1. Call getChildren() to fetch all child nodes of /queue_fifo, i.e. all elements of the queue.
  2. Determine the position of the client's own sequence number among all child nodes.
  3. If the client's node does not have the smallest sequence number, it waits, and registers a Watcher on the last node whose sequence number is smaller than its own.
  4. On receiving the Watcher notification, repeat step 1.
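A minimal sketch of the enqueue step and the ordering check, assuming a connected ZooKeeper handle and an existing /queue_fifo node; the sequence-number helper is illustrative.

    import java.util.Comparator;
    import java.util.List;
    import org.apache.zookeeper.*;
    import org.apache.zookeeper.ZooDefs.Ids;

    public class FifoQueue {
        private static final String ROOT = "/queue_fifo";

        // Enqueue: create an ephemeral sequential node,
        // e.g. /queue_fifo/192.168.0.1-0000000001.
        static String enqueue(ZooKeeper zk, String host) throws Exception {
            return zk.create(ROOT + "/" + host + "-", new byte[0],
                             Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        }

        // Steps 1-3: is this element the one with the smallest sequence number?
        // If not, watch the node immediately in front of it.
        static boolean isMyTurn(ZooKeeper zk, String ownName, Watcher watcher) throws Exception {
            List<String> children = zk.getChildren(ROOT, false);
            children.sort(Comparator.comparingInt(FifoQueue::seq));
            int idx = children.indexOf(ownName);
            if (idx == 0) return true;                    // head of the queue: proceed
            zk.exists(ROOT + "/" + children.get(idx - 1), watcher);
            return false;                                 // wait for the predecessor to be removed
        }

        // ZooKeeper appends a 10-digit counter after the final '-'.
        private static int seq(String name) {
            return Integer.parseInt(name.substring(name.lastIndexOf('-') + 1));
        }
    }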

The overall workflow of the FIFO queue is shown below:

 

8.1.2 Barrier: a distributed barrier

Barrier literally means an obstacle or barricade; in distributed systems it refers specifically to a coordination condition between systems, which stipulates that the elements of a queue must all have gathered before processing can be arranged, and that everyone waits until then.
This typically appears in large-scale distributed parallel computing scenarios, where the final merge computation has to be based on the sub-results of many parallel computations. Such queues are an enhancement of the FIFO queue,
and the rough design is as follows: at the start, /queue_barrier already exists as a default node, and its data content is set to a number n representing the Barrier value;
for example n=10 means the Barrier opens only once the number of child nodes under /queue_barrier reaches 10. After that, every client creates an
ephemeral node under /queue_barrier, for example /queue_barrier/192.168.0.1, as shown below:


After creating its node, a client determines the execution order in the following five steps (a sketch follows the list).

 

  1. Call getData() to read the data content of the /queue_barrier node: 10.
  2. Call getChildren() to fetch all child nodes of /queue_barrier, i.e. all elements of the queue, and register a Watcher for changes to the child node list.
  3. Count the child nodes.
  4. If there are fewer than 10 child nodes, wait.
  5. On receiving the Watcher notification, repeat step 2.
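A sketch of these steps, assuming a connected ZooKeeper handle, an existing /queue_barrier node whose data is the Barrier value, and that the client has already created its own ephemeral child node.

    import java.util.List;
    import org.apache.zookeeper.*;
    import org.apache.zookeeper.data.Stat;

    public class BarrierQueue {
        private static final String ROOT = "/queue_barrier";

        // Steps 1-4: read the barrier size n from the node data, then count the
        // children while watching the child list; proceed once count >= n.
        static boolean barrierOpen(ZooKeeper zk, Watcher watcher) throws Exception {
            byte[] data = zk.getData(ROOT, false, new Stat());        // step 1: e.g. "10"
            int n = Integer.parseInt(new String(data));
            List<String> children = zk.getChildren(ROOT, watcher);    // step 2: watch child changes
            return children.size() >= n;                              // steps 3-4
        }
    }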


    Blogger's note: my understanding is that if more than 10 business machines create ephemeral nodes within a very short time, the processing speed is not constant,
    because one batch might be handled by 11 machines and the next by 12?



Author: 李文文丶
Link: https://www.jianshu.com/p/bd01abf2eaae
Source: 简书 (Jianshu)
Copyright belongs to the author. For any form of reproduction, please contact the author for authorization and cite the source.
