Raft no-op log

no-op log

Why can't the Leader directly commit log entries from a previous term? Why can it only commit entries from its own term, and thereby commit the earlier-term entries indirectly?

picture

First, consider the incorrect design, in which the Leader is allowed to commit entries from a previous term. The process in the figure above then goes as follows:

  • (a) s1 is the Leader in term 2 (look closely: it is marked with a black box) and has replicated the entry index=2 & term=2 to s2.
  • (b) s1 crashes. s5 gets the votes of s3, s4 and itself, becomes Leader, and writes an entry index=2 & term=3.
  • (c) s5 crashes right after writing that entry. s1 is re-elected Leader with currentTerm = 4, and no new requests have arrived yet. s1 replicates the entry index=2 & term=2 to s3. The entry is now on a majority, so s1 commits it. (Note that term=2 is not the current term; this is the incorrect design under discussion.) Then a request arrives, and s1 crashes right after writing index=3 & term=4 to its local log.
  • (d) s5 can now become Leader again (currentTerm >= 5) with votes from s2, s3, s4 and itself, replicate the entry index=2 & term=3 to all other nodes, and commit it. At this point **the entry at index=2 has been committed twice!** Once with term=2 and once with term=3. This must never happen: a committed entry cannot be overwritten.
  • (e) The alternative is that s1 replicated its term=4 entry to a majority before crashing, in which case s5 cannot win the election. This is the case where replication by s1 succeeds.
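
The steps above can be replayed in a few lines of Python. This is only a sketch under simplifying assumptions: each log is reduced to a list of entry terms, the helper names (`last_entry`, `grants_vote`, `replicas_holding`) are made up for illustration, and the "commit" is the unsafe rule from the incorrect design, not real Raft.

```python
def last_entry(log):
    """Return (term, index) of the last entry; the log is a list of terms, index 1 first."""
    return (log[-1], len(log)) if log else (0, 0)


def grants_vote(voter_log, candidate_log):
    """Raft's election restriction: vote only for a candidate whose log is at least
    as up to date (compare the last term first, then the log length)."""
    return last_entry(candidate_log) >= last_entry(voter_log)


def replicas_holding(logs, index, term):
    """Count the servers that store an entry with this term at this index."""
    return sum(1 for log in logs.values() if len(log) >= index and log[index - 1] == term)


# State at the end of step (c).
logs = {
    "s1": [1, 2, 4],
    "s2": [1, 2],
    "s3": [1, 2],
    "s4": [1],
    "s5": [1, 3],
}
majority = len(logs) // 2 + 1

# Unsafe rule: s1 (term 4) treats index=2 & term=2 as committed once a majority stores it.
committed_under_unsafe_rule = replicas_holding(logs, 2, 2) >= majority  # True

# Step (d): s5 asks for votes. Only s1 would refuse (and s1 is down anyway).
votes = [name for name, log in logs.items() if grants_vote(log, logs["s5"])]
# votes == ["s2", "s3", "s4", "s5"]

# s5 wins and overwrites its followers' logs with its own.
for name in votes:
    logs[name] = list(logs["s5"])

print(replicas_holding(logs, 2, 2))  # 1: the "committed" entry survives only on s1
print(replicas_holding(logs, 2, 3))  # 4: index=2 now holds term=3 on a majority
```

The vote check is what makes (d) possible: the last entry of s5 has term 3, which beats the term=2 entry that s2 and s3 hold, even though that entry already sits on a majority.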

The problem shows up mainly in (c) and (d). The figure is actually easier to follow in the version from the full-length Raft paper, where (d) and (e) correspond to the term=4 entry not having been replicated to a majority and having been replicated to a majority, respectively.

picture

Therefore, we need a commit constraint that prevents (d) from happening: the Leader may only commit entries from its own term.
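
As a rough illustration, the constraint amounts to one extra term check when the Leader advances its commitIndex. The function below is a minimal sketch, not code from etcd or any other implementation; `advance_commit_index` and its parameters are names chosen for this example, and index=1 is assumed to be committed already.

```python
def advance_commit_index(log_terms, match_index, commit_index, current_term):
    """Return the Leader's new commitIndex.

    log_terms:    terms of the Leader's entries, index 1 first
    match_index:  highest index known to be stored on each server (Leader included)
    """
    majority = len(match_index) // 2 + 1
    # Try the highest index first; everything below a committed index is committed too.
    for n in range(len(log_terms), commit_index, -1):
        stored_on = sum(1 for m in match_index.values() if m >= n)
        # The extra condition: the entry at n must come from the Leader's own term.
        if stored_on >= majority and log_terms[n - 1] == current_term:
            return n
    return commit_index


# (c): index=2 & term=2 is on s1, s2, s3, but s1 is in term 4, so nothing new commits.
case_c = advance_commit_index([1, 2, 4], {"s1": 3, "s2": 2, "s3": 2, "s4": 1, "s5": 1}, 1, 4)

# (e): the term=4 entry reaches a majority, so indexes 2 and 3 commit together.
case_e = advance_commit_index([1, 2, 4], {"s1": 3, "s2": 3, "s3": 3, "s4": 1, "s5": 1}, 1, 4)

print(case_c, case_e)  # 1 3
```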

Now let's see what happens with the constraint in place. (a) and (b) are unchanged, so we start with (c).

  • (c) s1 still replicates index=2 & term=2 to a majority, but since currentTerm = 4, it cannot commit this entry. If s1 then replicates its term=4 entry to a majority, it can commit that entry, and index=2 & term=2 is committed indirectly along with it. This is in fact case (e): 1-2-4 are all committed.
  • (d) I think this case is the key to understanding the problem. Suppose s1 writes the term=4 entry only to its own log and then crashes; s5 wins the election, becomes Leader, and replicates the entry index=2 & term=3 to all nodes. At this point index=2 has not been committed. Can s5 commit index=2 & term=3?

The answer is no. s5 is elected after s1's term 4, so its currentTerm is at least 5 (it could also be 6, 7, 8...). Assume it is 5. The entry has term=3, and **the Leader cannot commit entries from a previous term, so this entry cannot be committed.** Only when a new request arrives and more than half of the nodes hold 1-3-5 can the term=3 entry be committed together with the term=5 entry.

Adding this constraint prevents the double commit, but if no new request arrives, will index=2 & term=3 never be committed? Is the system stuck here? For a kv database the problem is obvious. Suppose the command in the index=2 entry in (c) or (d) is Set("k", "1"). After s5 is elected Leader, a client issues Get("k"). The Leader finds the record in its log but cannot reply 1 to the client, because under the constraint this entry has not been committed. Linearizability forbids returning stale data, so the Leader urgently needs to know whether this entry can be committed.

So the Raft paper introduces the no-op log entry to solve this problem. etcd implements it.

Introducing the no-op log

A no-op entry carries only an index and a term; its command is empty. It still has to be written to disk like any other entry.

Concretely, as soon as a Leader wins the election, it appends a no-op entry and replicates it to the other nodes right away. Once the no-op entry is committed, every uncommitted entry before it in the Leader's log is committed indirectly, and the problem is solved. In the kv database above, the no-op entry lets the Leader respond to client queries quickly.
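
The process can be sketched with a toy Leader that follows case (d): s5 is elected in term 5 with the log 1-3. Everything here is hypothetical and heavily simplified (the `Leader` class, the `NOOP` marker, the `"init"` command at index 1, the assumption that index 1 is already committed). In particular, `can_serve_reads` checks only that the Leader has committed an entry from its own term; a real linearizable read needs more than that, such as confirming that it is still the Leader.

```python
from dataclasses import dataclass, field

NOOP = None  # hypothetical marker for an entry with an empty command


@dataclass
class Entry:
    term: int
    command: object


@dataclass
class Leader:
    term: int
    log: list
    cluster_size: int
    commit_index: int = 0
    match_index: dict = field(default_factory=dict)

    def become_leader(self, name):
        # Append a no-op entry for the new term right after winning the election.
        self.log.append(Entry(self.term, NOOP))
        self.match_index = {name: len(self.log)}

    def on_ack(self, follower, index):
        # A follower reports that it stores the log up to `index`.
        self.match_index[follower] = index
        majority = self.cluster_size // 2 + 1
        for n in range(len(self.log), self.commit_index, -1):
            stored_on = sum(1 for m in self.match_index.values() if m >= n)
            if stored_on >= majority and self.log[n - 1].term == self.term:
                self.commit_index = n  # commits every earlier entry as well
                return

    def can_serve_reads(self):
        # True once the newest committed entry belongs to the Leader's own term.
        return self.commit_index > 0 and self.log[self.commit_index - 1].term == self.term


leader = Leader(term=5, log=[Entry(1, "init"), Entry(3, 'Set("k", "1")')],
                cluster_size=5, commit_index=1)
leader.become_leader("s5")                                    # log is now 1-3-5
leader.on_ack("s2", 3)
before = (leader.commit_index, leader.can_serve_reads())      # (1, False)
leader.on_ack("s3", 3)                                        # no-op is on a majority
after = (leader.commit_index, leader.can_serve_reads())       # (3, True)
print(before, after)
```

No client request was needed: the no-op entry alone moved commitIndex from 1 to 3, which commits Set("k", "1") at index 2 along the way.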

In essence, the no-op entry lets the Leader implicitly and quickly commit the entries left uncommitted from previous terms and establish the current commitIndex, so that the system can quickly resume normal service to the outside world.

Origin blog.csdn.net/weixin_46645965/article/details/135421720