Paxos implementation: a source-code analysis (using Keyspace's PaxosLease implementation as the example)

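I have read far too many introductions to the Paxos algorithm. Personally I don't think Paxos is all that hard, so why is it so hard to understand? Because everyone discusses it purely in theory. As a programmer, I believe there are no secrets in source code, so I will analyze the Paxos algorithm through the source of Keyspace's PaxosLease implementation. (A direct implementation of Paxos can livelock, so most implementations first run a Paxos lease algorithm to elect a single master proposer; the lease algorithm itself can be viewed as one round of Paxos.)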

1. Phase 1: prepare -> promise

The proposer chooses a proposal number proposalID and sends a Prepare request carrying proposalID to every member of the acceptor set.

    

void PLeaseProposer::StartPreparing()
{
	Log_Trace();

	// The proposer starts timer Timer1 (acquireLeaseTimeout), which times out after T seconds
	EventLoop::Reset(&acquireLeaseTimeout);

	state.proposing = false;

	state.preparing = true;
	
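	// By default this node proposes itself as the lease owner (the value V)
	state.leaseOwner = RCONF->GetNodeID();
	
	state.highestReceivedProposalID = 0;

	// Choose a proposal number greater than highestProposalID
	state.proposalID = RCONF->NextHighest(highestProposalID);
		
	if (state.proposalID > highestProposalID)
		highestProposalID = state.proposalID;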
	
	msg.PrepareRequest(RCONF->GetNodeID(),
	state.proposalID, RLOG->GetPaxosID());
	
	// Broadcast the Prepare request to all acceptors
	BroadcastMessage();
}
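StartPreparing relies on RCONF->NextHighest(highestProposalID) to obtain a number larger than any proposal number seen so far, but its body is not shown above. A common scheme, assumed here and not taken from Keyspace, packs a round counter into the high bits and the node ID into the low bits, so proposals from different nodes can never collide. The names nextHighest and kNodeBits are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout (assumption, not Keyspace's): the low kNodeBits bits
// hold the node ID, the remaining high bits hold a round counter.
constexpr unsigned kNodeBits = 8;  // assumption: at most 256 nodes

// Return the smallest proposal number owned by nodeId that is strictly
// greater than `highest` (a sketch of what a NextHighest-style helper could do).
uint64_t nextHighest(uint64_t highest, unsigned nodeId) {
    assert(nodeId < (1u << kNodeBits));
    uint64_t round = highest >> kNodeBits;           // round of the highest number seen
    uint64_t candidate = (round << kNodeBits) | nodeId;
    if (candidate <= highest)                        // not larger: move to the next round
        candidate = ((round + 1) << kNodeBits) | nodeId;
    return candidate;
}
```

For example, after seeing proposal 5 (round 0, node 5), node 3 would pick 259 (round 1, node 3). Because the low bits identify the node, two proposers never choose the same number, so acceptors can order proposals by comparing proposalID alone.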

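When an acceptor receives a Prepare request numbered proposalID and that number is not lower than any Prepare request it has already answered (the code rejects only when msg.proposalID < state.promisedProposalID), it replies to the proposer with the highest-numbered proposal it has already accepted, if any, and promises not to accept any proposal numbered lower than proposalID.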

    

void PLeaseAcceptor::OnPrepareRequest()
{
	Log_Trace("msg.paxosID: %" PRIu64 ", my.paxosID: %" PRIu64 "",
	msg.paxosID, RLOG->GetPaxosID());

	if (msg.paxosID < RLOG->GetPaxosID() && (int) msg.nodeID != RLOG->GetMaster())
		return; // only up-to-date nodes can become masters

	RLOG->OnPaxosLeaseMsg(msg.paxosID, msg.nodeID);
	
	unsigned senderID = msg.nodeID;
	
	// If the previously accepted lease has already expired, drop it first
	if (state.accepted && state.acceptedExpireTime < Now())
	{
		EventLoop::Remove(&leaseTimeout);
		OnLeaseTimeout();
	}
	
	// Failure: if msg.proposalID < state.promisedProposalID, send a rejection
	if (msg.proposalID < state.promisedProposalID)
		msg.PrepareRejected(RCONF->GetNodeID(), msg.proposalID);
	else
	{
		// Success: otherwise promise msg.proposalID and reply with it,
		// plus the accepted proposal and its duration T if one exists
		state.promisedProposalID = msg.proposalID;

		if (!state.accepted)
			msg.PrepareCurrentlyOpen(RCONF->GetNodeID(),
			msg.proposalID);
		else // reply with the previously accepted proposal
			msg.PreparePreviouslyAccepted(RCONF->GetNodeID(),
			msg.proposalID, state.acceptedProposalID,
			state.acceptedLeaseOwner, state.acceptedDuration);
	}
	
	SendReply(senderID);
}
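The acceptor's decision in OnPrepareRequest boils down to one comparison plus a choice of reply type. The sketch below isolates just that rule; the type and function names are hypothetical, and the paxosID freshness check and lease-expiry handling from the original are left out.

```cpp
#include <cstdint>

enum class PrepareReply { Rejected, CurrentlyOpen, PreviouslyAccepted };

// Hypothetical, stripped-down acceptor state mirroring the fields used above.
struct AcceptorState {
    uint64_t promisedProposalID = 0;
    bool accepted = false;
    uint64_t acceptedProposalID = 0;
    unsigned acceptedLeaseOwner = 0;
};

// Phase 1 on the acceptor: reject lower numbers, otherwise promise and
// report whether a proposal was already accepted.
PrepareReply onPrepare(AcceptorState& s, uint64_t proposalID) {
    if (proposalID < s.promisedProposalID)
        return PrepareReply::Rejected;          // corresponds to PrepareRejected
    s.promisedProposalID = proposalID;          // promise: refuse anything lower from now on
    return s.accepted ? PrepareReply::PreviouslyAccepted
                      : PrepareReply::CurrentlyOpen;
}
```

As in the original, an equal number (for example a retransmitted Prepare) is not rejected; only strictly lower numbers are.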

 

Next, the proposer decides, based on the acceptors' responses, whether to start phase 2.

 

// Phase 1: the proposer handles the responses to its Prepare request
void PLeaseProposer::OnPrepareResponse()
{
	Log_Trace();

	if (!state.preparing || msg.proposalID != state.proposalID)
		return;
		
	numReceived++;
	
	if (msg.type == PLEASE_PREPARE_REJECTED) // (1) rejected
		numRejected++;
	else if (msg.type == PLEASE_PREPARE_PREVIOUSLY_ACCEPTED && 
			 msg.acceptedProposalID >= state.highestReceivedProposalID)
	{ // (2) a value was already accepted: adopt the highest-numbered one
		state.highestReceivedProposalID = msg.acceptedProposalID;
		state.leaseOwner = msg.leaseOwner;
	}

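	// Failure: at least half of the acceptors rejected, so no majority is possible;
	// restart prepare with a higher number (the textbook scheme first waits a random interval, this code restarts immediately)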
	if (numRejected >= ceil((double)(RCONF->GetNumNodes()) / 2))
	{
		StartPreparing();
		return;
	}
	
	// Success: if a majority promised, move on to the propose phase
	// see if we have enough positive replies to advance
	if ((numReceived - numRejected) >= RCONF->MinMajority())
		StartProposing();	
}
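OnPrepareResponse uses two thresholds: it restarts once numRejected reaches ceil(N/2), and it advances once positive replies reach RCONF->MinMajority(). Assuming MinMajority() is the usual strict majority floor(N/2) + 1 (its body is not shown here), the two thresholds are complementary: after ceil(N/2) rejections at most floor(N/2) positive replies remain possible, one short of a majority. The helper names below are hypothetical.

```cpp
#include <cmath>

// Assumed definition of a strict majority of n nodes.
unsigned minMajority(unsigned n) { return n / 2 + 1; }

// Rejection threshold used in OnPrepareResponse: ceil(n / 2).
unsigned rejectThreshold(unsigned n) {
    return static_cast<unsigned>(std::ceil(static_cast<double>(n) / 2));
}

// True if, after `rejected` rejections out of n replies, a majority of
// positive replies can still be reached (requires rejected <= n).
bool majorityStillPossible(unsigned n, unsigned rejected) {
    return n - rejected >= minMajority(n);
}
```

With 5 nodes, for instance, the third rejection leaves at most 2 positive replies against a majority of 3, so restarting right away wastes nothing; one rejection fewer would still leave a majority reachable.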

 

2. Phase 2: propose -> accept

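If the proposer receives responses to its Prepare request numbered proposalID from more than half of the acceptors, it sends an Accept request for the proposal [proposalID, V] to the acceptors. Note that V is the value of the highest-numbered proposal among those responses (a later proposal follows an earlier one: once a value may already have been chosen, new proposals stay consistent with it). If none of the responses contains a proposal, V can be any value; in Keyspace it is the proposer's own node ID, i.e. the proposer asks for the lease for itself.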

    

void PLeaseProposer::StartProposing()
{
	Log_Trace();
	
	state.preparing = false;
        
	// If the prepare phase reported a previously accepted lease owned by another node, do not propose
	if (state.leaseOwner != RCONF->GetNodeID())
		return; // no point in getting someone else a lease,
				// wait for OnAcquireLeaseTimeout

	// Otherwise, broadcast (proposalID, lease owner = this node, duration T)
	state.proposing = true;

	state.duration = MAX_LEASE_TIME;
	state.expireTime = Now() + state.duration;
	msg.ProposeRequest(RCONF->GetNodeID(), state.proposalID,
		state.leaseOwner, state.duration);

	BroadcastMessage();
}
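The value rule of phase 2 is spread across OnPrepareResponse (which tracks highestReceivedProposalID and leaseOwner) and StartProposing (which checks the result). Pulled out into one standalone function with hypothetical types, it looks roughly like this:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical summary of one positive Prepare reply.
struct Promise {
    bool previouslyAccepted = false;
    uint64_t acceptedProposalID = 0;
    unsigned leaseOwner = 0;
};

// Pick the value V for phase 2: the lease owner of the highest-numbered
// previously accepted proposal, or this node's own ID if none was reported.
// Starts from 0 and uses >= like OnPrepareResponse does.
unsigned chooseLeaseOwner(const std::vector<Promise>& promises, unsigned myNodeID) {
    uint64_t highest = 0;
    unsigned owner = myNodeID;
    for (const Promise& p : promises) {
        if (p.previouslyAccepted && p.acceptedProposalID >= highest) {
            highest = p.acceptedProposalID;
            owner = p.leaseOwner;
        }
    }
    return owner;
}
```

StartProposing then continues only if the chosen owner equals its own node ID. Otherwise it deliberately does not campaign for another node's lease and waits for acquireLeaseTimeout, which is a lease-specific shortcut rather than part of basic Paxos.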

 

When an acceptor receives an Accept request for [proposalID, V], it accepts the proposal as long as it has not already responded to a Prepare request numbered higher than proposalID.

   

void PLeaseAcceptor::OnProposeRequest()
{
	Log_Trace();
	
	unsigned senderID = msg.nodeID;
	
	// If the previously accepted lease has already expired, drop it first
	if (state.accepted && state.acceptedExpireTime < Now())
	{
		EventLoop::Remove(&leaseTimeout);
		OnLeaseTimeout();
	}
	
	// Failure: when the proposal arrives, check its number against the promise again
	if (msg.proposalID < state.promisedProposalID)
		msg.ProposeRejected(RCONF->GetNodeID(), msg.proposalID);
	else
	{   // Success: accept the lease and start the lease timer
		state.accepted = true;
		state.acceptedProposalID = msg.proposalID;
		state.acceptedLeaseOwner = msg.leaseOwner;
		state.acceptedDuration = msg.duration;
		state.acceptedExpireTime = Now() + state.acceptedDuration;
		
		leaseTimeout.Set(state.acceptedExpireTime);
		EventLoop::Reset(&leaseTimeout);
		
		msg.ProposeAccepted(RCONF->GetNodeID(), msg.proposalID);
	}
	
	SendReply(senderID);
}
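OnProposeRequest applies the same number check a second time and, on success, stores the lease with an absolute expiry time. The sketch below isolates that logic with hypothetical names; time is passed in explicitly in milliseconds instead of calling Now(), and it assumes that OnLeaseTimeout simply clears the accepted lease.

```cpp
#include <cstdint>

// Hypothetical acceptor state for phase 2.
struct AcceptorLeaseState {
    uint64_t promisedProposalID = 0;
    bool accepted = false;
    uint64_t acceptedProposalID = 0;
    unsigned acceptedLeaseOwner = 0;
    uint64_t acceptedExpireTime = 0;  // absolute time in ms
};

// Phase 2 on the acceptor: accept unless a higher-numbered Prepare was promised.
// Returns true for ProposeAccepted, false for ProposeRejected.
bool onPropose(AcceptorLeaseState& s, uint64_t proposalID, unsigned leaseOwner,
               uint64_t durationMs, uint64_t nowMs) {
    if (s.accepted && s.acceptedExpireTime < nowMs)
        s.accepted = false;                     // assumption: expiry just forgets the old lease
    if (proposalID < s.promisedProposalID)
        return false;
    s.accepted = true;
    s.acceptedProposalID = proposalID;
    s.acceptedLeaseOwner = leaseOwner;
    s.acceptedExpireTime = nowMs + durationMs;  // the lease timer would be armed for this time
    return true;
}
```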

   

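When the proposer receives Accept responses from more than half of the acceptors, the proposal is chosen. In the code below the proposer additionally requires more than 500 ms of the lease to remain; it then broadcasts the result to the learners (LearnChosen) and schedules the lease extension.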

  

void PLeaseProposer::OnProposeResponse()
{
	Log_Trace();

	if (state.expireTime < Now())
	{
		Log_Trace("already expired, wait for timer");
		return; // already expired, wait for timer
	}
	
	if (!state.proposing || msg.proposalID != state.proposalID)
	{
		Log_Trace("not my proposal");
		return;
	}
	
	numReceived++;
	
	if (msg.type == PLEASE_PROPOSE_ACCEPTED)
		numAccepted++;
	
	Log_Trace("numAccepted: %d", numAccepted);

	// Success: see if we have enough positive replies to advance
	if (numAccepted >= RCONF->MinMajority() &&
	state.expireTime - Now() > 500 /*msec*/)
	{ // accepted by a majority
		// a majority have accepted our proposal, we have consensus
		state.proposing = false;
		msg.LearnChosen(RCONF->GetNodeID(),
		state.leaseOwner, state.expireTime - Now(), state.expireTime);
		
		EventLoop::Remove(&acquireLeaseTimeout);
		
		extendLeaseTimeout.Set(Now() + (state.expireTime - Now()) / 7);
		EventLoop::Reset(&extendLeaseTimeout);
	
		BroadcastMessage();
		return;
	}
	
	// Failure: every node has replied but no majority accepted; restart prepare
	if (numReceived == RCONF->GetNumNodes())
		StartPreparing();
}
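The timing in OnProposeResponse is easy to miss: the lease is only announced if more than 500 ms of it remain, and extendLeaseTimeout is set to fire after one seventh of the remaining time, so the owner starts renewing well before expiry. The arithmetic, isolated with explicit millisecond timestamps and a hypothetical helper name:

```cpp
#include <cstdint>
#include <optional>

constexpr uint64_t kSafetyMarginMs = 500;  // value hard-coded in OnProposeResponse

// Returns the time at which to start extending the lease, or std::nullopt if
// there is no majority yet or the lease is too close to expiry to announce.
std::optional<uint64_t> extendAt(unsigned numAccepted, unsigned minMajority,
                                 uint64_t expireTime, uint64_t now) {
    if (numAccepted < minMajority || expireTime <= now + kSafetyMarginMs)
        return std::nullopt;
    return now + (expireTime - now) / 7;   // renew after 1/7 of the remaining lease
}
```

For example, with 7000 ms left at time 1000, the extension is scheduled for time 2000. The 500 ms margin presumably leaves room for the LearnChosen broadcast to reach other nodes before the lease they learn about has already expired.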

 


Reposted from jimmee.iteye.com/blog/2313581