ZooKeeper: Distributed System Coordination

Introduction

ZooKeeper is a centralized service for maintaining configuration information, naming, distributed synchronization, and group services. Distributed applications use all of these services in some form, and each time they are implemented a great deal of effort goes into fixing the bugs and race conditions that inevitably appear. Because such services are hard to implement, applications tend to skimp on them at first, which makes them brittle and hard to manage. Even when they are done correctly, different implementations of the same services make deployment more complex.

Apache ZooKeeper started as a sub-project of Hadoop (it is now a top-level Apache project) and is an open source framework dedicated to highly reliable distributed coordination. Its functions include configuration maintenance, naming, distributed synchronization, and group services. The goal of ZooKeeper is to encapsulate these complex and error-prone services and to give users an easy-to-use interface backed by a fast and stable system.

See the ZooKeeper Wiki for more information.

Design goals


  • Ease of use: ZooKeeper lets distributed processes coordinate with each other through a shared hierarchical namespace that is organized much like a standard file system. The namespace consists of data registers (znodes), which are similar to files and directories. Unlike a typical file system, which is designed for storage on disk, ZooKeeper keeps its data in memory, which is how it achieves high throughput and low latency. ZooKeeper puts a premium on high performance, high availability, and strictly ordered access. High performance means it can be used in large distributed systems, high availability keeps it from becoming a single point of failure, and strict ordering lets clients implement sophisticated synchronization logic.
  • Replicated: Like the distributed processes it coordinates, ZooKeeper itself is replicated over a set of hosts called an ensemble. The servers that make up the service must all be able to communicate with each other. Together they maintain an in-memory image of the state, along with transaction logs and snapshots in persistent storage. ZooKeeper is available as long as a majority of the servers are available, which is why an odd number of servers is recommended: adding one server to an odd-sized ensemble does not let it survive any additional failure.

    A client connects to a single server. It maintains a TCP connection through which it sends requests, receives responses, gets event notifications, and sends heartbeats. If the connection breaks, the client connects to a different server.

  • Ordered: ZooKeeper stamps each update with a number that reflects the order of all transactions. Subsequent operations can use this order to implement higher-level abstractions, such as synchronization primitives.
  • Fast: ZooKeeper is especially fast in read-dominant workloads. ZooKeeper applications run on thousands of machines, and it performs best when reads are more common than writes, at a ratio of around 10:1.
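
The majority rule in the replication point above comes down to simple arithmetic. The following is a minimal sketch of that arithmetic only; the function names are made up for illustration and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers may be down while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Print ensemble size, majority size, and tolerated failures side by side.
    for size in range(3, 8):
        print(size, quorum_size(size), tolerated_failures(size))
```

With these definitions, ensembles of 3 and 4 servers both tolerate one failure, and ensembles of 5 and 6 both tolerate two, which is the usual argument for choosing an odd size.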

Data Models and Hierarchical Namespaces


ZooKeeper's namespace is much like that of a standard file system. A name is a sequence of path elements separated by a slash ("/"), and every znode in the namespace is identified by its path.
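
To make the path structure concrete, here is a small sketch that splits a znode path into its elements and derives the parent path. The helper names are hypothetical, and the validation is deliberately simplified (absolute paths only, no empty elements); the real service applies additional rules.

```python
from typing import List


def split_znode_path(path: str) -> List[str]:
    """Split an absolute path such as "/app/config" into its elements."""
    if not path.startswith("/"):
        raise ValueError("znode paths are absolute and start with '/'")
    if path == "/":
        return []  # the root has no elements
    if path.endswith("/"):
        raise ValueError("only the root path may end with '/'")
    parts = path[1:].split("/")
    if any(part == "" for part in parts):
        raise ValueError("empty path element in " + repr(path))
    return parts


def parent_path(path: str) -> str:
    """Return the path of the znode that contains the given one."""
    parts = split_znode_path(path)
    if not parts:
        raise ValueError("the root has no parent")
    return "/" + "/".join(parts[:-1])
```

For example, `split_znode_path("/app/config")` yields the elements `app` and `config`, and `parent_path("/app/config")` is `/app`.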

Nodes and Ephemeral Nodes

Unlike in a standard file system, each node in the ZooKeeper namespace can have data associated with it as well as children. It is like a file system that allows a file to also be a directory. (ZooKeeper is designed to store coordination data: status information, configuration, location information, and so on, so the data stored at each node is usually small, from a few bytes to a few kilobytes.) We use the term znode to make it clear that we are talking about ZooKeeper data nodes.

Each znode maintains a stat structure that includes version numbers for data changes and ACL changes, as well as timestamps, so that caches can be validated and updates coordinated. Every time a znode's data changes, the version number increases. For example, whenever a client retrieves data, it also receives the version of that data.
The data stored at each znode is read and written atomically. A read gets all the data associated with the znode, and a write replaces all of it. Each node has an access control list (ACL) that restricts who can do what.
ZooKeeper also has the notion of ephemeral nodes. An ephemeral node exists only as long as the session that created it is active, and it is deleted when that session ends.
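
The version number and the session-bound lifetime are easier to see in a toy model. The sketch below is a hypothetical in-memory stand-in, not the ZooKeeper client API: `ToyStore`, `BadVersionError`, and the `ephemeral_owner` parameter are names invented for this example. It borrows one convention from the real API, namely that passing -1 as the expected version skips the check.

```python
class BadVersionError(Exception):
    """Raised when the caller's expected version is stale."""


class ToyStore:
    """Hypothetical in-memory model of versioned and ephemeral znodes."""

    def __init__(self):
        # path -> {"data": bytes, "version": int, "owner": session id or None}
        self._nodes = {}

    def create(self, path, data=b"", ephemeral_owner=None):
        if path in self._nodes:
            raise KeyError(f"{path} already exists")
        self._nodes[path] = {"data": data, "version": 0, "owner": ephemeral_owner}

    def exists(self, path):
        return path in self._nodes

    def get(self, path):
        """Return the data together with the version it was read at."""
        node = self._nodes[path]
        return node["data"], node["version"]

    def set(self, path, data, expected_version=-1):
        """Replace the whole payload if the expected version still matches."""
        node = self._nodes[path]
        if expected_version != -1 and expected_version != node["version"]:
            raise BadVersionError(path)
        node["data"] = data
        node["version"] += 1
        return node["version"]

    def close_session(self, session_id):
        """Delete every ephemeral node that the session created."""
        owned = [p for p, n in self._nodes.items() if n["owner"] == session_id]
        for path in owned:
            del self._nodes[path]
```

A client that reads `(data, version)` and then calls `set(path, new_data, version)` succeeds only if nobody else wrote in between, which is how the version supports coordinated updates.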

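Watches

ZooKeeper supports the concept of watches. A client can set a watch on a znode. When the znode changes, the watch is triggered and removed. When a watch is triggered, the client receives a packet that describes the change to the znode. If the connection between the client and the ZooKeeper server is broken, the client receives a local notification.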
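
The one-shot behaviour (a watch is removed as soon as it fires) is the part that most often surprises newcomers, so here is a minimal sketch of it. `WatchedNode` and the shape of the event dictionary are assumptions made for illustration; they do not mirror a real client library.

```python
class WatchedNode:
    """Hypothetical single znode with one-shot watches."""

    def __init__(self, path, data=b""):
        self.path = path
        self.data = data
        self._watchers = []

    def get(self, watch=None):
        """Read the data and optionally register a one-shot watch callback."""
        if watch is not None:
            self._watchers.append(watch)
        return self.data

    def set(self, data):
        """Change the data and notify each registered watcher exactly once."""
        self.data = data
        # Clear the list before notifying, so a fired watch is gone.
        fired, self._watchers = self._watchers, []
        for callback in fired:
            callback({"type": "changed", "path": self.path})
```

In this model a client that wants to keep following a znode has to register a new watch each time it reads the data after a notification.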

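Guarantees

ZooKeeper is fast and simple. Because it is meant to be the basis for more complex services, such as synchronization, it provides the following guarantees:

  • Sequential Consistency: updates from a client are applied in the order in which they were sent.
  • Atomicity: an update either succeeds or fails; there are no partial results.
  • Single System Image: a client sees the same view of the service regardless of the server it connects to.
  • Reliability: once an update has been applied, it persists until a later update overwrites it.
  • Timeliness: the client's view of the system is guaranteed to be up to date within a certain time bound.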

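Simple API

One of ZooKeeper's design goals is to provide a simple programming interface. As a result, it supports only the following operations:

  • create: creates a node at a location in the tree.
  • delete: deletes a node.
  • exists: tests whether a node exists at a location.
  • get data: reads the data from a node.
  • set data: writes data to a node.
  • get children: retrieves the list of a node's children.
  • sync: waits for data to be propagated.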
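
To show how these operations fit together on a hierarchical tree, the sketch below models them in a single process. `ToyTree` is a hypothetical class written for this article, not a ZooKeeper client; it assumes that a parent must exist before a child is created and that a node with children cannot be deleted. `sync` is left out because there is nothing to propagate in a single-process model.

```python
class ToyTree:
    """Hypothetical in-memory tree exposing the operations listed above."""

    def __init__(self):
        self._data = {"/": b""}  # the root always exists

    @staticmethod
    def _parent(path):
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def create(self, path, data=b""):
        if path in self._data:
            raise KeyError("node exists: " + path)
        if self._parent(path) not in self._data:
            raise KeyError("no parent for: " + path)
        self._data[path] = data

    def delete(self, path):
        if self.get_children(path):
            raise ValueError("node has children: " + path)
        del self._data[path]

    def exists(self, path):
        return path in self._data

    def get_data(self, path):
        return self._data[path]

    def set_data(self, path, data):
        if path not in self._data:
            raise KeyError("no such node: " + path)
        self._data[path] = data

    def get_children(self, path):
        """Return the names of the direct children only, sorted."""
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self._data
            if p != "/" and p.startswith(prefix) and "/" not in p[len(prefix):]
        )
```

A typical sequence is `create("/app")`, `create("/app/config", b"x")`, then `get_children("/app")`, which returns the single child name `config`.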

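Implementation


The ZooKeeper components diagram shows the high-level components of the ZooKeeper service. With the exception of the request processor, each server that makes up the ZooKeeper service replicates its own copy of each component.
The replicated database is an in-memory database that contains the entire data tree. Updates are logged to disk for recoverability, and a write is serialized to disk before it is applied to the in-memory database.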
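
The "log to disk first, then apply in memory" order described above is the classic write-ahead pattern. The sketch below illustrates that pattern in general terms; it is not ZooKeeper's actual log format or code, and `LoggedTree` and its JSON-lines log file are assumptions made for the example.

```python
import json
import os
import tempfile


class LoggedTree:
    """Hypothetical in-memory map whose updates are logged to disk first."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.tree = {}
        if os.path.exists(log_path):
            self._replay()

    def _replay(self):
        """Rebuild the in-memory state by re-applying every logged update."""
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.tree[record["path"]] = record["data"]

    def set(self, path, data):
        record = json.dumps({"path": path, "data": data})
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(record + "\n")
            log.flush()
            os.fsync(log.fileno())  # force the record to disk before applying it
        self.tree[path] = data  # only now touch the in-memory state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_file = os.path.join(tmp, "txn.log")
        first = LoggedTree(log_file)
        first.set("/config", "v1")
        recovered = LoggedTree(log_file)  # simulates a restart
        print(recovered.tree)
```

Because every update reaches the log before it reaches memory, a fresh instance that replays the log ends up with the same tree as the one that was running before the restart.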

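Performance


ZooKeeper is designed to be highly performant. But is it? Results from the ZooKeeper development team at Yahoo! Research indicate that it is. It performs especially well in applications where reads outnumber writes, because writes involve synchronizing state across all servers. (Reads outnumbering writes is the typical case for a coordination service.)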

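Reliability


A few important observations can be drawn from this graph.

  • First, if followers fail and recover quickly, ZooKeeper sustains high throughput despite the failure.
  • More importantly, the leader election algorithm lets the system recover fast enough to prevent an overall drop in throughput. In these observations, ZooKeeper took less than 200 ms to elect a new leader.
  • Finally, as followers recover and start processing requests again, ZooKeeper's throughput rises back up.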

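Resources

ZooKeeper can run on a single machine or on a small cluster.

  1. Getting started: a guide to installing and running ZooKeeper
  2. Developer documentation:
  3. Administration and operations
    • Administration – a guide to managing and deploying ZooKeeper
    • JMX – JMX integration
    • Observers – an easy way to improve ZooKeeper's scalability
    • Dynamic configuration – reconfiguring ZooKeeper dynamically
  4. Downloads
  5. More

Official website: http://zookeeper.apache.org/
Source code: https://github.com/apache/zookeeper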

