Learning distributed systems theory: the lease mechanism

1. Introduction to the lease mechanism

In a distributed system there is often a central server node that stores and maintains the system's metadata. If every operation in the system depends on metadata held by that central server, it easily becomes both a performance bottleneck and a single point of failure. A lease mechanism lets the central server delegate some of its "authority" to other machines, which reduces the load on the central server. Leases have many other uses as well: for example, determining the state of nodes in a cluster, or implementing distributed read-write locks.

For example, in GFS the master grants a lease to one chunk server, which then becomes the primary replica. When multiple clients update a chunk concurrently, the primary decides the order in which the concurrent updates are applied. By delegating this authority to the chunk server, the GFS master relieves some of its own load.

 

Of course, the central server (the metadata server) can also be built as a cluster, which likewise keeps it from becoming a bottleneck. Take the message server RocketMQ as an example: both producers and consumers rely on the NameServer to determine subscription relationships, and the NameServer cluster is designed to be stateless, so no lease mechanism is needed there.

 

A lease involves two parties: the issuer and the holder. What a lease promises can vary widely (which is also why leases apply to so many scenarios). The issuer is usually the central server described above, and it guarantees that the promise stated in the lease holds for the lease's entire validity period.

For example, in a distributed caching system the metadata server (the central server) issues a lease to each Client, promising not to change the metadata during the lease period. Each Client then only has to check that its lease has not expired before reading the metadata directly from its local cache, instead of contacting the metadata server on every access.
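
The client-side check described above can be sketched roughly as follows. This is a minimal illustration, not code from any real system: `MetadataServer`, `LeaseCachingClient`, `read_with_lease` and the 10-second lease length are all hypothetical names and values, and the clock is injectable so the behavior can be exercised without waiting.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Lease:
    value: str         # metadata covered by the lease
    expires_at: float  # time after which the server's promise no longer holds


class MetadataServer:
    """Hypothetical server that promises not to change data while a lease is live."""

    def __init__(self, clock: Callable[[], float] = time.monotonic,
                 lease_seconds: float = 10.0):
        self.clock = clock
        self.lease_seconds = lease_seconds
        self.data: Dict[str, str] = {}
        self.requests = 0  # counts round trips, for illustration only

    def read_with_lease(self, key: str) -> Lease:
        self.requests += 1
        return Lease(self.data[key], self.clock() + self.lease_seconds)


class LeaseCachingClient:
    """Hypothetical client that reads from its cache while the lease is valid."""

    def __init__(self, server: MetadataServer,
                 clock: Callable[[], float] = time.monotonic):
        self.server = server
        self.clock = clock
        self.cache: Dict[str, Lease] = {}

    def get(self, key: str) -> str:
        lease = self.cache.get(key)
        # Contact the server only when there is no lease or it has expired.
        if lease is None or self.clock() >= lease.expires_at:
            lease = self.server.read_with_lease(key)
            self.cache[key] = lease
        return lease.value
```

The sketch assumes the client and server clocks agree; a real system would have to allow for clock drift, for instance by having the client treat the lease as expiring slightly early.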

 

2. Analysis of the lease mechanism

① The lease mechanism guarantees cache consistency

Once the server has issued a lease, it guarantees not to change the data during the lease's validity period. A Client that has received the lease can therefore safely use the data until the lease expires; during that period the data cached on the Client is identical to the data on the server.

Problems:

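1) While the server is modifying metadata it has to block all read requests, and it cannot issue new leases during that time. Otherwise the data guaranteed by a newly issued lease could be inconsistent with the data the server has just modified.

Solution: when a read request arrives, return the data directly without issuing a lease.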

 

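2) The server has to wait until every Client's lease has expired before it can issue leases for the modified data. In other words, once the data on the server has changed, a new lease version exists, but it can only be issued to Clients after all of the old leases held by Clients have expired.

Solution: the server actively notifies the Clients that hold a lease to give up their current lease and request a new one.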
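
Both solutions can be combined in one small server-side sketch. Everything here is an assumption made for illustration (the `LeaseServer` class, its method names, the single stored value and the 10-second lease length): reads that arrive while a write is pending get the data without a lease, and the write commits once every outstanding lease has either expired or been given up.

```python
import time


class LeaseServer:
    """Hypothetical single-value server illustrating the two fixes."""

    def __init__(self, clock=time.monotonic, lease_seconds=10.0):
        self.clock = clock
        self.lease_seconds = lease_seconds
        self.value = "v1"
        self.lease_expiry = {}     # client id -> lease expiry time
        self.pending_value = None  # write waiting for old leases to end

    def read(self, client_id):
        """Return (value, lease expiry), or (value, None) if no lease is granted."""
        if self.pending_value is not None:
            # Fix for problem 1: answer the read, but grant no lease.
            return self.value, None
        expiry = self.clock() + self.lease_seconds
        self.lease_expiry[client_id] = expiry
        return self.value, expiry

    def begin_write(self, new_value):
        self.pending_value = new_value

    def release(self, client_id):
        # Fix for problem 2: a notified client gives up its lease early.
        self.lease_expiry.pop(client_id, None)

    def try_commit(self):
        """Apply the pending write once no unexpired lease remains."""
        now = self.clock()
        live = [c for c, t in self.lease_expiry.items() if t > now]
        if self.pending_value is None or live:
            return False
        self.value, self.pending_value = self.pending_value, None
        self.lease_expiry.clear()
        return True
```

If a client cannot be reached to release its lease, `try_commit` simply keeps returning `False` until that lease has expired, which is the fallback the lease mechanism always offers.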

 

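Another way to guarantee cache consistency resembles the lease mechanism. When a Client requests data from the Server, the Server gives the Client a "callback promise", which guarantees that the Client will be notified when another Client modifies that data. Both the callback promise and the requested data are stored on the Client. A callback promise has two states: valid and cancelled.

When the Server executes a request that updates the data, it notifies every Client to which it has sent a callback promise; that is, it sends a callback (a remote procedure call from server to client) to the process on the Client. When the Client process receives the callback, it sets the callback promise flag of the affected data to "cancelled".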
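
A rough sketch of the callback promise scheme, with the server-to-client RPC replaced by a direct method call. The class and method names (`CallbackServer`, `CallbackClient`, `on_callback`) are invented for this illustration and do not come from any particular file system.

```python
class CallbackServer:
    """Hypothetical server that remembers which clients hold a callback promise."""

    def __init__(self):
        self.data = {}
        self.promised = {}  # key -> set of clients holding a promise for it

    def fetch(self, client, key):
        # Hand out the data together with a callback promise.
        self.promised.setdefault(key, set()).add(client)
        return self.data[key]

    def update(self, key, value):
        self.data[key] = value
        # Notify every client that was promised a callback for this key.
        for client in self.promised.pop(key, set()):
            client.on_callback(key)  # stands in for the server-to-client RPC


class CallbackClient:
    VALID, CANCELLED = "valid", "cancelled"

    def __init__(self, server):
        self.server = server
        self.cache = {}  # key -> [value, promise state]

    def read(self, key):
        entry = self.cache.get(key)
        # Refetch when nothing is cached or the promise was cancelled.
        if entry is None or entry[1] == self.CANCELLED:
            entry = [self.server.fetch(self, key), self.VALID]
            self.cache[key] = entry
        return entry[0]

    def on_callback(self, key):
        if key in self.cache:
            self.cache[key][1] = self.CANCELLED
```

Unlike a lease, a callback promise carries no expiry time, so this scheme depends on the callback actually reaching the client.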

 

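② The lease mechanism tolerates network errors well

 1) Issuing a lease depends only on one-way network communication

After the server issues a lease, even if the Client never receives it (because the Client has crashed or the network is faulty), the server only needs to wait until the lease times out to be sure that the Client no longer caches the data. It can then modify the data safely without breaking cache consistency.

2) Once the Client has received the lease, the lease mechanism no longer depends on network communication.

3) It tolerates crashed nodes well

If the node that issued the lease crashes, the crashed issuer cannot change the promise of a lease it has already issued, so the correctness of the lease is unaffected.

If the node that holds the lease crashes, the issuer needs no special fault handling either: it just waits for the lease to expire, after which it can withdraw the promise and move on.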

 

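③ The lease mechanism determines node state

 How can the state of a node in a network be determined? Because network failures (network partitions) exist, using a "heartbeat" mechanism to determine node state has some shortcomings.

For example, nodes A, B and C are replicas of one another, A is the primary, and Q is responsible for judging the state of A, B and C. If A is working normally but the network between A and Q fails, Q will still conclude that A has a problem and will choose B as the new primary, which leads to a "dual primary" problem.

The essence of the problem is that Q considers A to have failed while A does not consider itself failed. That is, the network partition makes the system's views of "node state" inconsistent.

 There are two solutions: 1) have all nodes agree on who is primary through negotiation (the Paxos algorithm), which is a decentralized protocol

 2) use a lease mechanism

 After Q receives heartbeats from A, B and C, it issues each of them a lease, indicating that it knows their state, so A, B and C can work normally during the validity period. In addition, Q can give A a special lease indicating that A may work as primary. When the primary needs to be switched, Q only has to wait until A's lease expires and then issue the primary lease to another node.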
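
The primary-lease idea can be sketched as below. This is a simplified illustration under stated assumptions: `Coordinator` plays the role of Q, only the special primary lease is modeled, the lease length is an arbitrary 10 seconds, and all nodes share one clock (a real deployment would have to account for clock drift).

```python
import time


class Coordinator:
    """Plays the role of Q: hands out the primary lease."""

    def __init__(self, clock=time.monotonic, lease_seconds=10.0):
        self.clock = clock
        self.lease_seconds = lease_seconds
        self.primary = None
        self.primary_expiry = 0.0

    def heartbeat(self, node_name):
        """Handle a heartbeat; return the primary lease expiry, or None."""
        now = self.clock()
        if self.primary is None or now >= self.primary_expiry:
            # The previous primary lease is over, so a new primary is safe.
            self.primary = node_name
        if node_name == self.primary:
            self.primary_expiry = now + self.lease_seconds
            return self.primary_expiry
        return None


class Node:
    def __init__(self, name, clock=time.monotonic):
        self.name = name
        self.clock = clock
        self.primary_until = 0.0

    def send_heartbeat(self, coordinator):
        expiry = coordinator.heartbeat(self.name)
        if expiry is not None:
            self.primary_until = expiry

    def is_primary(self):
        # A node acts as primary only while its own lease is unexpired,
        # even if it cannot reach the coordinator.
        return self.clock() < self.primary_until
```

If A is cut off from Q, A stops renewing its lease and stops acting as primary once the lease expires, and Q grants the primary lease to another node only after that moment, so the two never overlap.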

 

3. The role of the lease mechanism in GFS write operations


We designed the system to minimize the master’s involvement in all operations.
....
A mutation is an operation that changes the contents or metadata of a chunk such as a write or an append operation.
Each mutation is performed at all the chunk’s replicas.
We use leases to maintain a consistent mutation order across replicas. 
The master grants a chunk lease to one of the replicas, which we call the primary. The primary picks a serial
order for all mutations to the chunk. All replicas follow this order when applying mutations.


To reduce the master's load, the master issues a lease to one chunk server, making it the primary, and the primary then determines the mutation order.
 

The master may sometimes try to revoke a lease before it expires (e.g., when the master wants to disable mutations on a file that is being renamed). 
Even if the master loses communication with a primary, it can safely grant a new lease to another replica after the old lease expires.

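This matches what the analysis section above described: ① the master can try to revoke a lease before it expires; ② it can also wait until the lease expires and then issue the primary lease to another chunk server (changing the primary).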

 

The lease mechanism is designed to minimize management overhead at the master. A lease has an initial timeout of 60 seconds.

This sentence shows that leases reduce the master's load and that a lease has an initial timeout of 60 seconds.
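
The two options the master has (revoke early, or wait out the 60-second timeout) might look roughly like this. The sketch is not GFS code: `ChunkLeaseTable`, its methods and the `primary_reachable` flag are hypothetical, and only the 60-second initial timeout comes from the quoted text.

```python
import time

LEASE_SECONDS = 60.0  # initial lease timeout quoted above


class ChunkLeaseTable:
    """Hypothetical master-side record of which replica holds each chunk lease."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.leases = {}  # chunk id -> (primary replica, expiry time)

    def grant(self, chunk, replica):
        """Grant the lease unless another replica still holds an unexpired one."""
        holder = self.leases.get(chunk)
        if holder and holder[1] > self.clock() and holder[0] != replica:
            return False
        self.leases[chunk] = (replica, self.clock() + LEASE_SECONDS)
        return True

    def revoke(self, chunk, primary_reachable):
        """Try to revoke early; this only works if the primary can be told."""
        if not primary_reachable:
            return False  # fall back to waiting for the lease to expire
        self.leases.pop(chunk, None)
        return True
```

When the master has lost contact with the primary, `revoke` fails and the only safe path is the second one: wait until the old lease has expired, after which `grant` succeeds for another replica.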

The steps of a write operation are as follows:

1)The client asks the master which chunkserver holds the current lease for the chunk and the locations of the other replicas.

The Client asks the master for the address of the primary and of the other chunk servers.

2)The master replies with the identity of the primary and the locations of the other (secondary) replicas.

The master returns the address of the primary and of the other chunk servers. The Client can cache this information while the lease is valid.

 3)The client pushes the data to all the replicas

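The Client sends the data to be written to every chunk server (replica). This separates the control flow from the data flow (the Client obtained the control information in step 2).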

 4)Once all the replicas have acknowledged receiving the data, the client sends a write request to the primary.

The Client first sends the data to be written to each chunk server and, once all chunk servers have received it, sends a write request to the primary.

5)The primary forwards the write request to all secondary replicas.

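The primary decides the order of the write requests, and all replicas write the data in that order. Note that deciding the order in a "centralized" way makes it easy to guarantee a single, unique order.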

6)The secondaries all reply to the primary indicating that they have completed the operation

After all the replicas (secondaries) have written the data in the same order, they report completion to the primary.

7)The primary replies to the client.

The primary reports the result of the write operation to the Client.

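Looking at the whole flow above, the master is rarely involved. Most of the work is done by the primary, to which the master issued the lease.

In my view, mutation operations have two core concerns: ❶ reducing the master's load, and ❷ guaranteeing a consistent order of modifications. The lease mechanism addresses both.

The master issues a lease to one chunk server, making it the primary. The primary receives the Client's write requests and tracks whether the other replicas have completed their writes, all of which effectively reduces the master's load. In addition, the primary assigns the serial numbers of the write operations; the order is decided by the primary rather than negotiated, and this "centralized" mechanism makes a consistent order easy to guarantee.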
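
The seven-step write flow can be condensed into a toy model. It is a sketch under heavy simplification, not GFS itself: the classes `Replica`, `Primary`, `Master` and the function `client_write` are invented here, replicas live in one process, and failures and lease expiry are ignored. It only shows the primary assigning serial numbers that every replica follows.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.buffer = {}  # data id -> bytes pushed by clients, not yet applied
        self.log = []     # (serial, data) pairs in the order they were applied

    def push(self, data_id, data):
        self.buffer[data_id] = data
        return True  # acknowledge receipt of the data

    def apply(self, serial, data_id):
        self.log.append((serial, self.buffer.pop(data_id)))
        return True  # report completion


class Primary(Replica):
    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id):
        # The primary alone picks the serial order for mutations.
        serial = self.next_serial
        self.next_serial += 1
        self.apply(serial, data_id)
        # Step 5: forward the request; step 6: collect the replies.
        return all(s.apply(serial, data_id) for s in self.secondaries)


class Master:
    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = secondaries

    def locate(self, chunk):
        return self.primary, self.secondaries


def client_write(master, chunk, data_id, data):
    primary, secondaries = master.locate(chunk)                      # steps 1-2
    acks = [r.push(data_id, data) for r in [primary, *secondaries]]  # step 3
    if not all(acks):
        return False
    return primary.write(data_id)                                    # steps 4-7
```

After several calls to `client_write`, every replica's `log` holds the same mutations in the same order, and the master was consulted only to locate the replicas.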

 


 

Original article: http://www.cnblogs.com/hapjin/p/5620542.html


Origin blog.csdn.net/hellozhxy/article/details/93130153