[Distributed] VMware FT Overview

This overview covers the simple case of one primary plus one backup, which is the easiest to understand. MIT 6.824 uses the VMware FT paper as an example to introduce the concepts and methods of replication in distributed systems. The key points below are drawn from the MIT 6.824 course (translated version: https://mit-public-courses-cn-translatio.gitbook.io/mit6-824/lecture-04-vmware-ft).

1. Two replication methods

  • State transfer
  • Replicated state machine

The difference is that state transfer sends the system's internal state, while a replicated state machine sends external events. A single statement may change much (or even all) of the internal state, for example a schema change in a database. Transferring the internal state after every such change moves far too much data. Transmitting the operations themselves works at a much finer granularity.

People tend to prefer replicated state machines because external operations or events are generally much smaller than the state of the service. For a database, the state may be the entire database, which can reach gigabytes, while the operations are just client requests, such as reading the value of key27. Operations are usually small and state is usually large, so a replicated state machine is generally more attractive. Its disadvantage is that it is more complex and makes more assumptions about how the computer operates. State transfer is simple and crude by comparison: the primary sends its entire state to the backup, and nothing else needs to be considered.
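The contrast between the two methods can be sketched with a toy key-value service. This is a minimal illustration, not anything from the paper: the `KVStore` class, the operation tuples, and both sync functions are hypothetical names chosen for the example.

```python
import copy


class KVStore:
    """A toy key-value service whose internal state is a dictionary."""

    def __init__(self):
        self.data = {}

    def apply(self, op):
        """Apply one external operation, e.g. ("put", "key27", "v1")."""
        kind, key, *rest = op
        if kind == "put":
            self.data[key] = rest[0]
            return None
        if kind == "get":
            return self.data.get(key)
        raise ValueError(f"unknown operation: {kind}")


def sync_by_state_transfer(primary, backup):
    """State transfer: ship a full copy of the primary's internal state."""
    backup.data = copy.deepcopy(primary.data)


def sync_by_replicated_state_machine(backup, ops):
    """Replicated state machine: ship only the operations and re-execute them."""
    for op in ops:
        backup.apply(op)


ops = [("put", "key27", "v1"), ("put", "key28", "v2"), ("put", "key27", "v3")]

primary = KVStore()
for op in ops:
    primary.apply(op)

backup_a = KVStore()
sync_by_state_transfer(primary, backup_a)

backup_b = KVStore()
sync_by_replicated_state_machine(backup_b, ops)
```

Both backups end up with the same state as the primary. The first approach pays a cost proportional to the size of the state, the second a cost proportional to the size of the operations, and the second only works because `apply` is deterministic.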

2. Working principle

The FT in VMware FT stands for fault tolerance. To tolerate faults you need replication, so that state is not lost when a machine goes down.

VMware FT requires two servers (two virtual machines on two physical machines): a primary and a backup. The virtual machine monitor (VMM) allocates a region of memory to each virtual machine, and the two memory images must stay completely identical.
The client sends requests to the primary, and the primary forwards the same input to the backup. Both the primary and the backup execute the instructions and generate responses, but only the primary replies to the client; the backup discards its responses. In the VMware FT paper, the channel that carries this synchronization data from the primary to the backup is called the Log Channel.

Normally, the primary receives a command and passes it to the backup through the Log Channel. If the backup receives nothing from the primary within a specified time, it concludes that the primary is down or otherwise faulty. The backup then stops waiting for events from the Log Channel and is no longer driven by the primary's events. Clients are told to send subsequent requests to the backup instead of the primary. From this point the backup no longer discards its responses, and it becomes the new primary.
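The normal path and the failover path can be sketched as follows. This is a simplified model under stated assumptions: the `Replica` class, the `backup_step` function, and the use of an in-process `queue.Queue` as the Log Channel are all hypothetical, and the sketch omits the test-and-set check described in section 5.

```python
import queue


class Replica:
    """A toy replica whose service state is a single counter."""

    def __init__(self, name, is_primary):
        self.name = name
        self.is_primary = is_primary
        self.counter = 0

    def execute(self, command):
        """Run a command; only a primary returns the reply to the client."""
        self.counter += command
        reply = f"{self.name}: counter={self.counter}"
        return reply if self.is_primary else None  # the backup discards its output


def backup_step(backup, log_channel, timeout):
    """Process one log entry, or go live if the primary stays silent too long."""
    try:
        command = log_channel.get(timeout=timeout)
    except queue.Empty:
        backup.is_primary = True  # stop following the primary and take over
        return None
    return backup.execute(command)


log_channel = queue.Queue()
primary = Replica("primary", is_primary=True)
backup = Replica("backup", is_primary=False)

# Normal operation: the primary executes and forwards the same command.
reply = primary.execute(5)
log_channel.put(5)
discarded = backup_step(backup, log_channel, timeout=0.05)

# The primary fails: nothing more arrives on the log channel.
backup_step(backup, log_channel, timeout=0.05)
new_reply = backup.execute(2)
```

After the timeout the backup holds the same counter value the primary had, so the client sees a consistent service when its requests are redirected.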

3. Non-deterministic events

These are events that can produce different results on different machines. They are the special case that the replicated state machine approach has to handle.

  1. Client input
    In a distributed system, input here means network packets. A network packet has two parts: the data it carries, and the interrupt that signals its arrival.
    The interrupt must be delivered at the same point in the instruction stream on the primary and the backup, otherwise their states will diverge.

  2. "Weird" instructions
    These instructions produce different results on different machines, such as generating a random number, reading the current time, or reading the computer's unique ID.

  3. Multi-CPU concurrency
    When the service runs on multiple CPUs, instructions on different CPUs are interleaved, so the order of execution is unpredictable.
    Suppose two cores request a lock on the same data at the same time: on the primary, CPU core 1 obtains the lock; on the backup, because of subtle timing differences, CPU core 2 obtains it. The execution results may then be very different. The FT paper avoids this case by supporting only uniprocessor virtual machines.
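For the "weird" instructions in item 2, the usual remedy is that only the primary really executes the instruction and the backup reuses the recorded result. The sketch below illustrates that idea; the `Primary` and `Backup` classes and the `next_id` method are hypothetical names, and a plain Python list stands in for the Log Channel.

```python
import random


class Primary:
    """Executes the non-deterministic instruction and records its result."""

    def __init__(self, rng):
        self.rng = rng
        self.log = []  # stands in for the Log Channel

    def next_id(self):
        value = self.rng.randint(0, 10**9)  # the weird instruction runs for real
        self.log.append(value)              # its result is forwarded to the backup
        return value


class Backup:
    """Never runs the weird instruction; it replays the primary's results."""

    def __init__(self, log):
        self.log = log
        self.position = 0

    def next_id(self):
        value = self.log[self.position]
        self.position += 1
        return value


primary = Primary(random.Random())
primary_ids = [primary.next_id() for _ in range(3)]

backup = Backup(primary.log)
backup_ids = [backup.next_id() for _ in range(3)]
```

If the backup called its own random number generator instead, the two lists would almost certainly differ and the replicas would diverge from that point on.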

4. A guess at the log format

Professor Robert Morris guessed that each log entry (the information passed over the Log Channel) contains three things:

  • Event number: the position in the instruction stream at which the event occurred
  • Log entry type: ordinary network data or a weird instruction
  • Data: the content of the network packet for ordinary input, or the execution result for a weird instruction
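The three fields above can be written down as a small record type, together with a toy backup loop that delivers each entry when it reaches the recorded position. This is only a sketch of the guessed format: the names `LogEntry`, `EntryType`, and `run_backup`, and the idea of counting instructions with a simple loop, are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class EntryType(Enum):
    NETWORK_INPUT = "network_input"
    WEIRD_INSTRUCTION = "weird_instruction"


@dataclass(frozen=True)
class LogEntry:
    event_number: int      # position in the instruction stream
    entry_type: EntryType  # ordinary network data or a weird instruction
    data: bytes            # packet content, or the recorded result


def run_backup(entries, total_instructions):
    """Step a toy instruction counter and deliver each entry at its position."""
    pending = sorted(entries, key=lambda e: e.event_number)
    delivered = []
    index = 0
    for instruction in range(1, total_instructions + 1):
        while index < len(pending) and pending[index].event_number == instruction:
            entry = pending[index]
            delivered.append((instruction, entry.entry_type, entry.data))
            index += 1
    return delivered


entries = [
    LogEntry(7, EntryType.WEIRD_INSTRUCTION, b"\x00\x2a"),
    LogEntry(3, EntryType.NETWORK_INPUT, b"GET key27"),
]
trace = run_backup(entries, total_instructions=10)
```

Delivering the packet at instruction 3 on both replicas, rather than whenever it happens to arrive, is what keeps the interrupt position consistent between primary and backup.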

5. A third-party arbiter to prevent split-brain: the test-and-set service

A network partition can cause split-brain. The solution is a third-party authority that decides whether the primary or the backup is allowed to go live. This is the test-and-set service.

It works like a lock. The service keeps a flag in memory, and both the primary and the backup send test-and-set requests to it.

When the first request arrives, the Test-and-Set service replies that the flag was 0 and is now 1. When the second request arrives, the service replies that the flag is already 1, so the requester is not allowed to become the primary. We can think of the Test-and-Set service as running on a single server. When a network failure occurs and each replica believes the other is down, the service acts as an arbiter and decides which of the two should go live.

To become the primary, a node must ask this third party and obtain permission, that is, get back 0.
If the flag is currently 0, the node may become the primary, and the flag is set to 1. If the flag is already 1, the node may not become the primary. In essence this is a simplified lock service: whichever machine gets the "lock" becomes the primary.
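The arbitration described here can be sketched as an atomic test-and-set on a single flag. This is a minimal in-process model, assuming a hypothetical `ArbitrationService` class and `try_go_live` helper; in the real system the service is a separate server reached over the network.

```python
import threading


class ArbitrationService:
    """A toy test-and-set service holding one flag in memory."""

    def __init__(self):
        self._flag = 0
        self._lock = threading.Lock()

    def test_and_set(self):
        """Atomically set the flag to 1 and return its previous value."""
        with self._lock:
            previous = self._flag
            self._flag = 1
            return previous


def try_go_live(service):
    """A replica may go live only if it saw the flag as 0."""
    return service.test_and_set() == 0


service = ArbitrationService()
results = {}


def contend(name):
    results[name] = try_go_live(service)


# Both replicas believe the other is down and ask at the same time.
threads = [threading.Thread(target=contend, args=(n,)) for n in ("primary", "backup")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever request reaches the service first wins; the other replica is refused, so at most one of them serves clients.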

Origin blog.csdn.net/qq_39679772/article/details/132484352