6.824 Distributed Systems [4] - ZooKeeper: replicated state machines in practice

Preparation

Why read this paper

  • ZooKeeper is a widely used replicated state machine service

  • Inspired by Chubby (Google's global lock service)

  • First used at Yahoo!, later widely adopted by Mesos, HBase, and others

  • An Apache open-source project

  • Project links

  • A case study in primary-backup replication

  • Its API supports a wide range of use cases

  • Excellent performance

Motivation for developing ZooKeeper

  • Many applications in a server cluster need to coordinate with each other

  • For example, GFS needs the master to store the list of servers holding each chunk, the master decides which replica of a chunk is primary, and so on

  • Applications need to find each other

  • A MapReduce deployment needs to know the IP and port of the GFS master

  • Excellent performance

  • In the Raft implementation of Lab 3, reaching agreement among 3 nodes takes one message round trip and two disk writes; with spinning disks that is about 50 msg/sec, and with SSDs about 200 msg/sec

  • But ZooKeeper can handle about 21,000 msg/sec, because clients may issue operations asynchronously and the server pipelines message processing
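A back-of-the-envelope model (my own illustration with made-up numbers, not figures from the paper) shows why asynchronous, pipelined clients lift the throughput ceiling: one outstanding synchronous request caps the rate at 1/latency, while W requests in flight raise the cap to roughly W/latency.

```python
# Illustrative throughput model: with synchronous one-at-a-time requests,
# throughput is capped at 1/latency; pipelining `outstanding` async requests
# raises that cap to roughly outstanding/latency.

def max_throughput(latency_s: float, outstanding: int) -> float:
    """Upper bound on ops/sec with `outstanding` requests in flight."""
    return outstanding / latency_s

disk_latency = 0.020                      # assume ~20 ms per synchronous disk write
print(max_throughput(disk_latency, 1))    # ~50 ops/sec, the spinning-disk figure
print(max_throughput(disk_latency, 420))  # ~21,000 ops/sec with deep pipelining
```

The point is only that pipelining hides per-operation latency; the real ZooKeeper numbers also depend on batching and read locality.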

The alternative to ZooKeeper: develop a fault-tolerant master service for each application

  • Locate the IP and port via DNS

  • Handle fault tolerance per application

  • High performance

ZooKeeper's design: a general coordination service

  • Design challenges

  • API design

  • How to make the master fault tolerant

  • How to get good performance

  • Basic design

  • The master is a replicated state machine

  • The objects being replicated are znodes

  • Client processes manipulate znode data objects through the ZooKeeper API; znodes are named by path and organized in a hierarchical namespace, similar to the Unix file system

znode hierarchical namespace

  • A hierarchical namespace is a desirable way to organize data objects, because users are accustomed to this abstraction and it organizes application metadata well.

  • znodes hold application metadata (configuration information, timestamps, version numbers)

  • znode types: regular (clients manipulate regular znodes by creating and deleting them explicitly) and ephemeral (clients create such znodes and either delete them explicitly or let the system remove them automatically when the session that created them terminates, deliberately or because of a failure)

  • To refer to a given znode, we use the standard UNIX notation for file-system paths. For example, /A/B/C denotes the path to znode C, where C's parent is B and B's parent is A. All znodes except ephemeral ones may have children

  • znode naming rule: name + sequence number. If n is a new znode and p is its parent, then n's sequence number is never smaller than the sequence number in the name of any other znode ever created under p

  • ZooKeeper's data model is essentially a file system with a simple API that only reads and writes data items in full, or equivalently a key/value table with hierarchical keys. The hierarchical namespace is useful for allocating subtrees to the namespaces of different applications and for setting access rights on those subtrees.

  • Sessions

  • A client initializes a session when it connects to ZooKeeper

  • Sessions let a client move its requests to another server when a failure occurs (the client knows the term and index of its last completed operation)

  • Sessions have a timeout, so the client must continuously keep the session alive (via heartbeats)

  • Operations on znodes

  • create(path, data, flags)

  • delete(path, version): if znode.version == version, then delete

  • exists(path, watch)

  • getData(path, watch)

  • setData(path, data, version): if znode.version == version, then update

  • getChildren(path, watch)

  • sync(): every operation except this one is asynchronous, and all operations from one client execute in FIFO order; sync waits until all of the client's preceding operations have been applied

  • Ordering guarantees

  • All write operations are totally ordered

  • ZooKeeper imposes a single global order on write operations from all clients

  • Every client's own operations execute in FIFO order

  • A read observes the preceding writes of the same client

  • A read observes the preceding writes to the same znode
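As a rough illustration of the data model and version-checked updates described above, here is a tiny in-memory sketch (my own toy, not the real ZooKeeper client API):

```python
# Toy znode tree: Unix-style paths, each znode carries data plus a version,
# and setData is conditional on the caller's expected version (compare-and-swap).

class ZNode:
    def __init__(self, data=b""):
        self.data = data
        self.version = 0
        self.children = {}

class ZNodeTree:
    def __init__(self):
        self.root = ZNode()

    def _walk(self, path):
        node = self.root
        for part in path.strip("/").split("/"):
            node = node.children[part]      # KeyError if a component is missing
        return node

    def create(self, path, data=b""):
        parent_path, _, name = path.rpartition("/")
        parent = self._walk(parent_path) if parent_path else self.root
        parent.children[name] = ZNode(data)

    def get_data(self, path):
        node = self._walk(path)
        return node.data, node.version

    def set_data(self, path, data, version):
        node = self._walk(path)
        if version != -1 and version != node.version:
            return False                    # stale version: caller must re-read
        node.data = data
        node.version += 1
        return True

tree = ZNodeTree()
tree.create("/A")
tree.create("/A/B")
tree.create("/A/B/C", b"old")
print(tree.set_data("/A/B/C", b"new", 0))   # True: version matched, applied
print(tree.set_data("/A/B/C", b"xxx", 0))   # False: stale version, rejected
print(tree.get_data("/A/B/C"))              # (b'new', 1)
```

The version check is what lets concurrent updaters detect that they lost a race and retry, instead of silently overwriting each other.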

ZooKeeper use case: the ready znode and configuration changes

  • In ZooKeeper, a new leader can designate some path as the ready znode; other nodes use the configuration only while that znode exists.

  • After the leader rebuilds the configuration, it notifies the other replicas to reload it and creates a new ready znode.

  • To avoid inconsistency, a replica must apply all of its earlier transactions before reloading the configuration, keeping every server's state consistent.

  • If any replica fails to apply an update, the update must not take effect anywhere and must be retried.
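The ready-znode dance can be sketched as follows (an illustrative toy where a plain dict stands in for the znode tree and there are no real watches; the paths /ready and /config/* are hypothetical names):

```python
# Sketch of the ready-znode pattern: the leader deletes /ready, rewrites the
# configuration znodes, then recreates /ready; readers only trust a
# configuration they read while /ready existed.

store = {}  # path -> data, standing in for the znode tree

def leader_reconfigure(new_config):
    store.pop("/ready", None)              # step 1: hide the config from readers
    for path, data in new_config.items():  # step 2: rewrite the config znodes
        store[path] = data
    store["/ready"] = b"ok"                # step 3: publish the new config

def reader_load_config(paths):
    if "/ready" not in store:              # config is being rebuilt: do not use it
        return None
    return {p: store[p] for p in paths}

leader_reconfigure({"/config/a": b"1", "/config/b": b"2"})
print(reader_load_config(["/config/a", "/config/b"]))
```

In real ZooKeeper the reader also sets a watch on /ready, so if the znode disappears while the reader is fetching config znodes, the watch fires and the reader starts over.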

ZooKeeper use case: locks

  • The pseudocode below shows a lock implementation: a client tries to acquire the lock with create, and if the lock is already held by another client it uses a watch to monitor the lock's release.

acquire lock:
   retry:
     r = create("app/lock", "", ephemeral)
     if r:
       return
     else:
       getData("app/lock", watch=True)

    watch_event:
       goto retry

  release lock:
    delete("app/lock")
  • Because the pseudocode above can suffer from the herd effect, try the approach below instead

  • Among the children of the lock znode, the one with the lowest sequence number holds the lock

  • Every other waiting client watches only the immediately preceding znode, which avoids the herd effect

  acquire lock:
     n = create("app/lock/request-", "", ephemeral|sequential)
   retry:
     requests = getChildren("app/lock", false)
     if n is lowest znode in requests:
       return
     p = znode in requests ordered just before n
     if exists(p, watch=True)
       goto retry

    watch_event:
       goto retry
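The predecessor-watching rule above can be modeled in a few lines (an illustrative toy with no real watches or RPCs; the request-N names follow the pseudocode):

```python
# Toy model of the herd-free lock queue: the child with the lowest sequence
# number holds the lock, and every other waiter tracks only its immediate
# predecessor, so a release wakes exactly one client.

def sequence_number(name: str) -> int:
    return int(name.rsplit("-", 1)[1])

def lock_holder(requests):
    """The child with the lowest sequence number holds the lock."""
    return min(requests, key=sequence_number)

def watch_target(requests, mine):
    """The znode this waiter should watch: its immediate predecessor, if any."""
    earlier = [r for r in requests if sequence_number(r) < sequence_number(mine)]
    return max(earlier, key=sequence_number) if earlier else None

requests = ["request-3", "request-1", "request-7"]
print(lock_holder(requests))                # request-1 holds the lock
print(watch_target(requests, "request-7"))  # request-7 watches request-3
print(watch_target(requests, "request-1"))  # None: request-1 already holds it
```

With the naive scheme every waiter watches the single lock znode, so one delete wakes all N waiters; here each delete wakes only the successor.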

ZooKeeper simplifies building applications, but it is not a complete solution

  • Applications still have many problems to solve on their own

  • For example, to use ZooKeeper with GFS we would still need

    • a replication scheme for chunks

    • a protocol for handling primary failure

  • But with ZooKeeper, at least the master is fault tolerant and will not suffer split brain under network partitions

ZooKeeper implementation details

  • Like Lab 3, it has two layers

    • the ZooKeeper service layer (the K/V layer)

    • the ZAB layer (the Raft layer)

  • Start() inserts operations at the bottom layer

  • Later, operations pop out of the bottom layer on each replica server and are committed (applied) in the order they pop out: via the apply channel in Lab 3, and via the ZAB deliver callback in the ZAB layer
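A minimal sketch of the two-layer flow, with a plain queue standing in for the apply channel / ZAB deliver callback (illustrative only, not the real interface):

```python
# Two-layer structure: the ordering layer (ZAB/Raft) fixes a single order of
# operations; the service (K/V) layer then applies them in exactly that order
# on every replica, so all replicas converge to the same state.

import queue

class ReplicaServer:
    def __init__(self):
        self.log = queue.Queue()   # stands in for the apply channel / ZAB deliver
        self.state = {}

    def start(self, op):
        """Top layer submits an operation to the ordering layer."""
        self.log.put(op)           # real ZAB runs atomic broadcast before this point

    def apply_committed(self):
        """Service layer pops committed ops and applies them in order."""
        while not self.log.empty():
            key, value = self.log.get()
            self.state[key] = value

s = ReplicaServer()
s.start(("x", 1))
s.start(("x", 2))
s.apply_committed()
print(s.state)                     # {'x': 2}: the later op wins because order is fixed
```

Because every replica pops the same operations in the same order, applying them deterministically yields identical state everywhere.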

Challenge: duplicate client requests

  • Scenario: the primary receives a client request and then fails; the request returns a failure, and the client retries

  • In Lab 3 we used a map of each client's latest request to filter duplicates, but each client blocks, waiting for one operation to complete before issuing the next

  • In ZooKeeper, operations are designed to be idempotent, so re-applying the most recent operation leaves the state unchanged
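A toy example of why idempotence makes retries safe (illustrative; the version check stands in for ZooKeeper's conditional setData, and an increment-style RPC is shown only as the contrasting non-idempotent design):

```python
# A retried setData(value, expected_version) either fails the version check
# (because the first attempt already applied) or rewrites the same value, so a
# duplicate delivery cannot corrupt the state the way a retried "increment" would.

counter = {"value": 0, "version": 0}

def set_value(value, expected_version):
    if expected_version != counter["version"]:
        return False                     # duplicate of an already-applied write
    counter["value"] = value
    counter["version"] += 1
    return True

# Client computes the new value locally, then retries the write after a timeout.
print(set_value(counter["value"] + 1, 0))  # True: first attempt applies
print(set_value(1, 0))                     # False: duplicate retry is rejected
print(counter["value"])                    # 1, not 2
```

A raw "increment" RPC applied twice would leave the counter at 2; the idempotent formulation makes at-least-once delivery harmless.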

Challenge: efficient read operations

  • Most operations are reads, and reads do not modify state

  • Must reads go through the ZAB layer?

  • Can any replica server execute a read?

  • If reads go through the Raft/ZAB layer, performance drops

  • If reads do not go through the Raft/ZAB layer, they may return stale data

ZooKeeper's solution: allow reads to return stale data

  • A read may be executed by any replica

  • Read throughput increases as servers are added

  • A read returns the last zxid the server has seen

  • Only a sync-read() guarantees up-to-date data

  • Because a read returns a zxid identifying the write that the returned state reflects, the client can tell whether the server's state lags behind the client's own

  • Heartbeats and session-establishment responses also carry zxids; when a client connects to a server, comparing zxids ensures the server's state is sufficiently close to the client's
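The zxid comparison can be sketched like this (an illustrative model; the class, field, and method names are my own, not ZooKeeper's):

```python
# The client remembers the highest zxid it has observed; when reconnecting it
# refuses any replica whose applied state is behind that zxid, so this client's
# reads never move backwards in time even though replicas may serve stale data.

class Server:
    def __init__(self, last_zxid):
        self.last_zxid = last_zxid       # highest transaction this replica applied

class Client:
    def __init__(self):
        self.last_seen_zxid = 0

    def connect(self, server):
        # Reject replicas whose state lags what this client already observed.
        return server.last_zxid >= self.last_seen_zxid

    def read(self, server):
        self.last_seen_zxid = max(self.last_seen_zxid, server.last_zxid)

c = Client()
up_to_date, laggard = Server(last_zxid=100), Server(last_zxid=90)
c.read(up_to_date)                       # client has now observed zxid 100
print(c.connect(laggard))                # False: replica is behind the client
print(c.connect(up_to_date))             # True
```

This gives per-client monotonicity, not global freshness: the laggard's data may still be stale relative to other clients, which is why sync-read() exists.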

Summary

  • ZooKeeper solves process-coordination problems in distributed systems by exposing wait-free objects (znodes) to clients

  • ZooKeeper guarantees FIFO ordering of each client's operations and linearizability of write operations

  • By allowing reads to return stale data, ZooKeeper achieves read throughput of hundreds of thousands of operations per second in read-heavy workloads

  • ZooKeeper still provides the sync operation for reads that require consistency

  • With a powerful API suited to many application scenarios and built-in primary-backup fault tolerance, ZooKeeper is widely used at many companies, including Yahoo!

Origin blog.51cto.com/13784902/2471798