Previous article: Leader election in the Raft algorithm
The previous article covered the Leader election process of the Raft algorithm. Building on it, this article describes log replication.
Log replication in the Raft algorithm
First, look at what each log entry contains:
- Command: a command that can be executed by the replicated state machine
- Term number: the term of the Leader that created the entry
- Index: an integer that identifies the position of the entry in the log
A log entry is in one of two states: uncommitted or committed. For safety, a committed entry is never deleted or overwritten.
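The three fields above can be pictured as a small record type. The following is a minimal sketch, not taken from any particular Raft implementation: the class name `LogEntry`, the string command format, and the use of a single `commit_index` to separate committed from uncommitted entries are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    index: int     # position of the entry in the log (1-based in this sketch)
    term: int      # term of the Leader that created the entry
    command: str   # command for the state machine (hypothetical string format)


# A tiny log: two entries created in term 1, one in term 2.
log = [
    LogEntry(index=1, term=1, command="set x 1"),
    LogEntry(index=2, term=1, command="set y 2"),
    LogEntry(index=3, term=2, command="set x 3"),
]

# Assumption: the committed/uncommitted state is not stored in the entry.
# A single commit_index marks every entry up to that index as committed.
commit_index = 2
committed = [e for e in log if e.index <= commit_index]
uncommitted = [e for e in log if e.index > commit_index]
```

Making the entry immutable (`frozen=True`) mirrors the rule that an entry is never modified once it has been created.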
1 Normal operation
- When the Leader receives a request from a client (the request contains a command to be executed by the replicated state machine), it appends the request to its log as a new entry. The entry's term number is the Leader's current term, and its index is the highest index in the Leader's local log plus 1. A Leader creates at most one entry with a given index in a given term; it never creates two or more entries with the same index in the same term.
- The Leader then sends AppendEntries RPC messages to the other servers in the cluster (hereinafter called Followers) to replicate the entry.
- A Follower that receives the message and successfully replicates the entry returns a success reply.
- Once the entry has been replicated on a majority of the servers in the cluster (the Leader counts itself), the Leader considers the entry safe to commit. At that point the Leader does three things:
  - It applies the entry to its local state machine.
  - It notifies all Followers that the entry is committed, so that each Follower applies it to its own local state machine.
  - It returns the execution result to the client.
Once an entry has been successfully replicated to the local logs of a majority of the servers, it is considered committed. Committing an entry also commits every earlier entry in the Leader's log that had not yet been committed.
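The majority rule can be sketched as a small calculation. This is only an illustration under stated assumptions: the function names and the `follower_match_index` mapping (the highest index known to be stored on each Follower) are hypothetical, and the sketch deliberately leaves out the additional Raft rule that a Leader only commits entries from its own term by counting replicas.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of servers that forms a majority."""
    return cluster_size // 2 + 1


def highest_majority_index(leader_last_index: int,
                           follower_match_index: dict[str, int]) -> int:
    """Largest log index that is stored on a majority of servers.

    follower_match_index is a hypothetical map from Follower name to the
    highest index known to be replicated on that Follower. The Leader's
    own log counts as one replica.
    """
    stored = sorted([leader_last_index, *follower_match_index.values()],
                    reverse=True)
    # The k-th largest value is stored on at least k servers.
    return stored[majority(len(stored)) - 1]


# Five servers: the Leader holds entries up to index 5.
replicated = {"S2": 5, "S3": 3, "S4": 2, "S5": 0}
commit_index = highest_majority_index(5, replicated)
```

In this example indexes 1 to 3 are stored on three of five servers (the Leader, S2 and S3), so everything up to index 3 can be committed together, which matches the statement that earlier uncommitted entries are committed along with the new one.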
Some Followers may respond slowly or crash, for example because of network problems. In that case the Leader retries the AppendEntries RPC to that Follower indefinitely until it succeeds.
1.1 Log consistency check
Above we said that a Follower returns a success reply after receiving an AppendEntries RPC message. In fact, the Follower first runs a consistency check on the message. (In normal operation the logs of the Leader and the Followers stay consistent, so the check never fails.) The check works as follows:
- When the Leader creates an AppendEntries RPC message, it includes the index and term number of the log entry that immediately precedes the new entries.
- When a Follower receives the AppendEntries RPC message, it checks whether its own log contains an entry with that index and term number.
  - If it does, the Follower's log is consistent with the Leader's log up to that entry, and the Follower accepts the new entries.
  - If it does not, the Follower rejects the AppendEntries RPC message.
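The Follower's side of this check can be sketched as follows. This is a simplified, in-memory illustration rather than a real RPC handler: the function name `handle_append_entries`, the parameter names `prev_index` and `prev_term`, and the representation of entries as `(term, command)` tuples are assumptions. Indexes are 1-based, and `prev_index == 0` means that the new entries start at the beginning of the log.

```python
def handle_append_entries(log, prev_index, prev_term, entries):
    """Follower-side consistency check and append.

    log        : list of (term, command) tuples; index i lives at log[i - 1]
    prev_index : index of the entry immediately before the new entries
    prev_term  : term of that entry
    entries    : new (term, command) tuples to store after prev_index
    Returns True if the message is accepted, False if it is rejected.
    """
    # Consistency check: the Follower must hold the preceding entry.
    if prev_index > 0:
        if len(log) < prev_index or log[prev_index - 1][0] != prev_term:
            return False

    # Accept: store the new entries, removing conflicting ones first.
    for offset, entry in enumerate(entries):
        pos = prev_index + offset          # zero-based position in the list
        if pos < len(log):
            if log[pos][0] != entry[0]:    # same index, different term
                del log[pos:]              # drop the conflict and all after it
                log.append(entry)
        else:
            log.append(entry)
    return True


follower_log = [(1, "set x 1"), (1, "set y 2")]
accepted = handle_append_entries(follower_log, 2, 1, [(2, "set x 3")])
rejected = handle_append_entries(follower_log, 5, 2, [(2, "set z 9")])
```

The first call is accepted because the Follower holds an entry with index 2 and term 1. The second call is rejected because the Follower has no entry at index 5.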
The consistency check is an inductive argument. In normal operation, the first entry in the log trivially passes the check. The message carrying the second entry includes the index and term number of the first entry, so as long as the first entry is the same on the Leader and the Follower, the second entry passes the check as well, and so does every entry after it.
This gives the Log Matching property:
- If two entries in different logs have the same index and term number, they store the same command.
- If two entries in different logs have the same index and term number, all preceding entries in the two logs are identical. (This follows from the consistency check.)
2 Special cases
The cluster is not always in the normal state. The Leader or a Follower may crash, so the logs cannot always stay consistent. There are three cases:
- The Follower is missing entries that are present in the current Leader's log.
- The Follower has entries that are not present in the current Leader's log. (For example, the old Leader sent the AppendEntries RPC message to only some of the Followers before crashing, and the newly elected Leader happens to be a server that did not receive that message.)
- Both: the Follower is missing entries that the current Leader has, and it also has entries that the current Leader does not have.
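The three cases can be told apart by comparing the two logs entry by entry. The sketch below is illustrative only: the function name `classify` and the sample logs are made up, and each log is reduced to a list of term numbers, which is enough because of the Log Matching property (the same index and term imply the same command).

```python
def classify(leader_terms, follower_terms):
    """Classify a Follower's log against the current Leader's log.

    Both arguments are lists of term numbers, one per log entry.
    """
    # Length of the common prefix of the two logs.
    common = 0
    for leader_term, follower_term in zip(leader_terms, follower_terms):
        if leader_term != follower_term:
            break
        common += 1

    missing = len(leader_terms) > common     # Leader has entries Follower lacks
    extra = len(follower_terms) > common     # Follower has entries Leader lacks
    if missing and extra:
        return "missing and extra entries"
    if missing:
        return "missing entries"
    if extra:
        return "extra entries"
    return "consistent"


leader = [1, 1, 4, 4]                       # hypothetical Leader log
case_1 = classify(leader, [1, 1])           # Follower stopped early
case_2 = classify(leader, [1, 1, 4, 4, 5])  # Follower has an extra entry
case_3 = classify(leader, [1, 1, 2, 2, 2])  # Follower diverged after index 2
```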
In the figure, the numbers along the top are log indexes (1-12). Each box is a log entry, and the number in the box is the term in which the entry was created. The top row is the log of the current Leader, whose term is 8. The figure shows how each of the three cases above can arise:
- Followers A and B (which crashed and did not receive the AppendEntries RPC messages sent by the Leader) are in the first case.
- Followers C and D (C was Leader in term 6 and D was Leader in term 7, and each crashed before it finished replicating its log) are in the second case.
- Followers E and F (E was Leader in term 4 and F was Leader in term 3; each crashed before it finished replicating its log, and after other servers were elected Leader neither received the AppendEntries RPC messages sent by the new Leaders) are in the third case.
2.1 Resolving log inconsistencies
The Leader handles inconsistencies by forcing the Follower's log to duplicate its own. This means that conflicting entries in the Follower's log are overwritten with entries from the Leader's log. The Leader must therefore find the position where the Follower's log starts to conflict with its own, have the Follower delete all entries from that position on, and then send its own entries from that position on to the Follower.
The Leader never deletes or overwrites entries in its own log.
All of these steps are driven by the consistency check described earlier:
- When the logs conflict, the Follower rejects the AppendEntries RPC message sent by the Leader and returns a response that tells the Leader about the conflict.
- The Leader maintains a nextIndex value for each Follower: the index of the next log entry to send to that Follower. (When a server is first elected Leader, it resets each value to the last index in its local log plus 1.)
- When the Leader learns of a conflict, it decrements nextIndex and resends the AppendEntries RPC to that Follower. It repeats this until the Follower accepts the message.
- Once the Follower accepts the AppendEntries RPC message, nextIndex tells the Leader where the conflict began, and the Follower's log is forced to duplicate the Leader's log from that position on, which resolves the conflict.
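The back-off loop in these steps can be simulated in a few lines. This is a sketch under simplifying assumptions, not a real implementation: both logs live in the same process, each log is a list of term numbers, the helper names `follower_accepts` and `sync_follower` are hypothetical, and on acceptance the Follower simply replaces everything after the matching entry with the Leader's entries.

```python
def follower_accepts(follower_terms, prev_index, prev_term):
    """Consistency check: does the Follower hold (prev_index, prev_term)?"""
    if prev_index == 0:
        return True
    return (len(follower_terms) >= prev_index
            and follower_terms[prev_index - 1] == prev_term)


def sync_follower(leader_terms, follower_terms):
    """Bring follower_terms in line with leader_terms.

    Returns the number of AppendEntries attempts that were needed.
    """
    # A newly elected Leader starts at its own last index plus 1.
    next_index = len(leader_terms) + 1
    attempts = 0
    while True:
        attempts += 1
        prev_index = next_index - 1
        prev_term = leader_terms[prev_index - 1] if prev_index > 0 else 0
        if follower_accepts(follower_terms, prev_index, prev_term):
            # Everything after prev_index is replaced by the Leader's entries.
            del follower_terms[prev_index:]
            follower_terms.extend(leader_terms[prev_index:])
            return attempts
        next_index -= 1   # rejected: step back one entry and retry


leader = [1, 1, 4, 4]
follower = [1, 1, 2, 2, 2]
attempts = sync_follower(leader, follower)
```

Here the first two attempts are rejected (the Follower holds term 2 at indexes 4 and 3), and the third is accepted because both logs hold term 1 at index 2. The loop always terminates, since a message with `prev_index == 0` is accepted unconditionally.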
- Case a: as shown in the figure, server S1 is Leader in term 2 and crashes after sending entry <index:2,term:2> only to server S2.
- Case b: server S5 is elected Leader in term 3 (S5's election timer expires first and it increments its term to 3, which is higher than the terms of S3 and S4, so it can be elected), but it crashes before it can send its log entry.
- Case c: server S1 is re-elected Leader in term 4. (After restarting, S1 is still in term 2. It receives a heartbeat from the new Leader S5 and updates its term to 3. After S5 crashes, S1's election timer expires first, so it starts an election in term 4, which is higher than the term of every other server in the cluster, and it is elected Leader.) S1 sends entry <index:2,term:2> to servers S2 and S3, but crashes before it notifies the servers that the entry is committed.
- Case d (a -> d): if S1 crashes as Leader in term 2 and S5 is elected Leader in term 3, then, because entry <index:2,term:2> has not been replicated on a majority of servers and is not committed, S5 overwrites <index:2,term:2> with its own entry <index:2,term:3>.
- Case e (a -> e): if instead S1, as Leader in term 2, sends <index:2,term:2> to S2 and S3, so that the entry is replicated on a majority of servers and committed, then even if S1 crashes, S5 cannot be elected Leader, because S5 does not have the latest committed entry. (This is a requirement on Leader election that the previous article on Leader election did not cover.)
2.2 Log requirements for Leader election
- Raft uses the voting process to prevent a Candidate from winning an election unless its log contains all committed entries. A Candidate must contact a majority of the cluster to be elected, which means that every committed entry is present on at least one of the servers it contacts. If the Candidate's log is at least as up to date as the log of every server in that majority ("up to date" is defined precisely below), it holds all committed entries.
- Raft decides which of two logs is more up to date by comparing the index and term of the last entry in each log. If the last entries have different terms, the log with the later term is more up to date. If the last entries have the same term, the longer log (the one with the larger last index) is more up to date.
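The comparison rule fits in a few lines. The sketch below shows the check a voter could run before granting its vote; the function name and parameter names are assumptions chosen for readability.

```python
def candidate_is_up_to_date(candidate_last_term: int,
                            candidate_last_index: int,
                            voter_last_term: int,
                            voter_last_index: int) -> bool:
    """True if the Candidate's log is at least as up to date as the voter's."""
    if candidate_last_term != voter_last_term:
        # Different last terms: the later term wins, regardless of length.
        return candidate_last_term > voter_last_term
    # Same last term: the log with the larger last index wins (ties are fine).
    return candidate_last_index >= voter_last_index


# A shorter log with a later last term still counts as more up to date.
grant_1 = candidate_is_up_to_date(3, 1, 2, 2)
# Same last term, but the Candidate's log is shorter: vote is refused.
grant_2 = candidate_is_up_to_date(2, 1, 2, 2)
# Identical last entries: vote can be granted.
grant_3 = candidate_is_up_to_date(2, 2, 2, 2)
```

Comparing the term before the index matters: a long log whose last entry comes from an old term can still lack entries that were committed later.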
Optimization
When a Follower rejects an AppendEntries RPC message, it can include in the rejection the term of the conflicting entry and the first index it stores for that term, so that the Leader can locate the conflict quickly. With this information, the Leader can decrement nextIndex past all the conflicting entries of that term. One AppendEntries RPC is then needed for each term with conflicting entries, rather than one for each conflicting entry.
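The effect of this optimization can be seen in a small simulation. It is a sketch under assumptions: both logs are in-memory lists of term numbers, the names `rejection_hint` and `sync_with_hints` are hypothetical, and the handling of a Follower whose log is too short (it reports its own last index plus 1) is a choice made for this sketch rather than something stated above.

```python
def rejection_hint(follower_terms, prev_index, prev_term):
    """Return None if the check passes, else (conflict_term, first_index).

    first_index is the first index the Follower stores for conflict_term.
    """
    if prev_index == 0:
        return None
    if len(follower_terms) < prev_index:
        # Assumption for this sketch: no entry at prev_index, so point the
        # Leader just past the end of the Follower's log.
        return (0, len(follower_terms) + 1)
    term = follower_terms[prev_index - 1]
    if term == prev_term:
        return None
    first = prev_index
    while first > 1 and follower_terms[first - 2] == term:
        first -= 1
    return (term, first)


def sync_with_hints(leader_terms, follower_terms):
    """Synchronize the Follower; returns the number of attempts needed."""
    next_index = len(leader_terms) + 1
    attempts = 0
    while True:
        attempts += 1
        prev_index = next_index - 1
        prev_term = leader_terms[prev_index - 1] if prev_index > 0 else 0
        hint = rejection_hint(follower_terms, prev_index, prev_term)
        if hint is None:
            del follower_terms[prev_index:]
            follower_terms.extend(leader_terms[prev_index:])
            return attempts
        next_index = hint[1]   # skip every entry of the conflicting term


leader = [1, 1, 4, 4, 4, 4]
follower = [1, 1, 2, 2, 2, 3, 3, 3]
attempts = sync_with_hints(leader, follower)
```

With these sample logs the Follower has conflicting entries from two terms (2 and 3), so two rejections are followed by one accepted message. Decrementing nextIndex by one each time would need five messages for the same logs.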
3 Log replication safety
Raft guarantees that each of the following properties holds at all times:
- Leader Append-Only: a Leader never overwrites or deletes entries in its log; it only appends new ones.
- Log Matching: if two logs contain an entry with the same index and term, then the logs are identical in all entries up through that index.
- Leader Completeness: if a log entry is committed in a given term, then that entry is present in the logs of the Leaders of all higher terms.
- State Machine Safety: if a server has applied the log entry at a given index to its state machine, no other server will ever apply a different log entry for the same index.
3.1 Proof of Leader Completeness
The proof is by contradiction: assume that Leader Completeness does not hold, and derive a contradiction.
Suppose the Leader of term T commits a log entry from its own term, but that entry is not stored by the Leader of some later term. Let U be the smallest term greater than T whose Leader does not store the entry.
- The committed entry must have been absent from the log of the Leader of term U at the time of its election (because a Leader never overwrites or deletes its log entries).
- The Leader of term T replicated the entry to the local logs of a majority of the cluster, and the Leader of term U received votes from a majority of the cluster during its election. So at least one server (hereinafter called the voter) both accepted the entry from the Leader of term T and voted for the Leader of term U. The voter is the key to reaching the contradiction.
- The voter must have accepted the committed entry from the Leader of term T before it voted for the Leader of term U. Otherwise it would have rejected the AppendEntries RPC request from the Leader of term T (because once the voter has received the vote request from the Leader of term U, its term is higher than T).
- The voter still stored the entry when it voted for the Leader of term U. By assumption every Leader between terms T and U contains the entry, a Leader never deletes log entries, and a Follower deletes entries only when they conflict with the Leader.
- The voter granted its vote to the Leader of term U, so the log of the Leader of term U must have been at least as up to date as the voter's log. This leads to one of two contradictions.
- First, suppose the voter and the Leader of term U have the same last log term. Then the log of the Leader of term U must be at least as long as the voter's log, so it contains every entry in the voter's log. This is a contradiction, because the voter contains the entry committed in term T and the Leader of term U was assumed not to.
- Otherwise, the last log term of the Leader of term U must be greater than the last log term of the voter. It is also greater than T, because the voter's last log term is at least T (the voter contains the committed entry from term T). The earlier Leader that created the last log entry of the Leader of term U must have contained the committed entry in its log (by assumption). Then, by the Log Matching property, the log of the Leader of term U must also contain the committed entry, which is a contradiction.
- This completes the contradiction, so the Leaders of all terms greater than T must contain all entries from term T that were committed in term T.
- The Log Matching property guarantees that future Leaders also contain entries that are committed indirectly.
Next article: cluster membership changes in the Raft algorithm