SecondaryNamenode persistence

Today's topic is the SecondaryNameNode.

The SecondaryNameNode (SNN) exists to help with persistence: its job is to merge the NameNode's edit logs into the fsimage file.

The SecondaryNameNode can never take over the NameNode's role; it is not a hot standby for the NameNode.
First, some background: the NameNode (NN) manages the metadata (data that describes data), and it keeps that metadata in memory.
Persistence: to keep the metadata safe, the data held in memory is written to disk.

A situation that comes up often: the power goes out while the computer is in use.

When the cluster fails because of a power outage or some other exceptional cause, then once the problem is resolved and the machine reboots, it reads the metadata on disk and restores the state from before the failure. This recovery process relies on persistence.
Why the NameNode does not handle persistence itself:
1: In principle it can, when demand is small: the memory footprint is small and computational efficiency is not affected.
2: In practice it cannot: the NameNode already carries a heavy workload and could freeze while persisting.

After startup, the NameNode begins writing edits.log, which records the operations performed while the system is running, alongside the fsimage. The fsimage and its corresponding edits.log are copied to the SNN together; the SNN merges them into a single new fsimage and sends it back to the NN. In the meantime the NN starts a new edits.log, and this cycle repeats.
[Figure: the merge process]
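The merge cycle described above can be sketched as a toy model. This is only an illustration: the namespace is a plain dictionary, the edit log is a list of tuples, and the names `ToyNameNode`, `merge_checkpoint`, and `apply_edit` are hypothetical and are not part of Hadoop.

```python
# Toy model of the fsimage + edits merge; not Hadoop's real on-disk format.

def apply_edit(namespace, edit):
    """Apply one logged operation to an in-memory namespace (path -> size)."""
    op, path, *rest = edit
    if op == "create":
        namespace[path] = rest[0]
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")


def merge_checkpoint(fsimage, edits):
    """What the SNN does here: replay edits on a copy of fsimage."""
    new_image = dict(fsimage)
    for edit in edits:
        apply_edit(new_image, edit)
    return new_image


class ToyNameNode:
    def __init__(self):
        self.fsimage = {}  # last persisted snapshot of the metadata
        self.edits = []    # operations logged since that snapshot

    def log(self, *edit):
        self.edits.append(edit)

    def roll_edits(self):
        """Hand over the current edits and start a new, empty log."""
        finished, self.edits = self.edits, []
        return finished

    def install_image(self, new_image):
        self.fsimage = new_image


nn = ToyNameNode()
nn.log("create", "/a.txt", 128)
nn.log("create", "/b.txt", 64)
nn.log("delete", "/a.txt")

old_edits = nn.roll_edits()             # NN starts a new edits log
nn.log("create", "/c.txt", 32)          # recorded in the new log only
nn.install_image(merge_checkpoint(nn.fsimage, old_edits))

print(nn.fsimage)  # {'/b.txt': 64}
print(nn.edits)    # [('create', '/c.txt', 32)]
```

The point of the sketch is that the NN keeps accepting operations in the new log while the merge of the old log happens elsewhere.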
If edits grows beyond 64 MB while the SNN is merging:
1: If it happens only occasionally:
a new edits file is started, so two edits files exist for a while.
2: If it happens regularly:
the cluster needs tuning; raise the edits size limit.

Persistence trigger conditions: 3600 s have elapsed, or the edits file exceeds 64 MB.
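As a minimal sketch, the two trigger conditions can be written as a single check. The function name `should_checkpoint` and the constant names are hypothetical; the thresholds are the ones stated in the text, and the actual Hadoop configuration properties that hold them depend on the Hadoop version and are not shown here.

```python
# Thresholds taken from the text; names are illustrative only.
CHECKPOINT_PERIOD_S = 3600                 # one hour
CHECKPOINT_SIZE_BYTES = 64 * 1024 * 1024   # 64 MB


def should_checkpoint(seconds_since_last, edits_size_bytes,
                      period_s=CHECKPOINT_PERIOD_S,
                      size_limit=CHECKPOINT_SIZE_BYTES):
    """Return True if either trigger condition is met."""
    return seconds_since_last >= period_s or edits_size_bytes > size_limit


print(should_checkpoint(600, 10 * 1024 * 1024))    # False: neither limit hit
print(should_checkpoint(3600, 10 * 1024 * 1024))   # True: period elapsed
print(should_checkpoint(600, 65 * 1024 * 1024))    # True: edits too large
```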
Summary: persistence means the NN writes its metadata to disk for storage. When the NN goes down and restarts, it reads that metadata back from disk to recover the state of the cluster, because whatever was in memory is lost when the power goes off.

After a power outage:
Before persistence has happened: on restart, read the system log.
After persistence: read the data on disk to restore the state.

Replica offline
The NN and the DataNodes (DNs) communicate through a heartbeat mechanism: every three seconds each DN sends a heartbeat to the NN, and a DN that sends no heartbeat for one minute is considered down.
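The heartbeat rule can be sketched as a small tracker on the NN side. This is an assumption-laden illustration: the class `HeartbeatMonitor` and its methods are hypothetical, timestamps are passed in explicitly as seconds so the example is deterministic, and the interval and timeout are the values given in the text (real clusters set these through configuration).

```python
HEARTBEAT_INTERVAL_S = 3   # from the text: one heartbeat every three seconds
DEAD_AFTER_S = 60          # from the text: one minute of silence means down


class HeartbeatMonitor:
    """Tracks the last heartbeat time per DataNode (illustrative only)."""

    def __init__(self, dead_after_s=DEAD_AFTER_S):
        self.dead_after_s = dead_after_s
        self.last_seen = {}  # datanode id -> time of last heartbeat

    def heartbeat(self, datanode_id, now):
        self.last_seen[datanode_id] = now

    def dead_nodes(self, now):
        """Return ids of DataNodes silent for longer than the timeout."""
        return sorted(
            dn for dn, seen in self.last_seen.items()
            if now - seen > self.dead_after_s
        )


monitor = HeartbeatMonitor()
monitor.heartbeat("dn1", now=0)
monitor.heartbeat("dn2", now=0)
monitor.heartbeat("dn1", now=57)   # dn1 keeps reporting, dn2 goes silent

print(monitor.dead_nodes(now=30))  # []
print(monitor.dead_nodes(now=61))  # ['dn2']
```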

What happens in safe mode:
1: Restore the system state (persistence)

2: Check the DNs' information (heartbeat mechanism)

3: Repair faulty DNs
After a faulty DN has been repaired, whether it receives new work depends on whether new files are uploaded.
Data that was stored on it before the failure is not written back to this DN; to find that data, check the replicas. When new tasks are assigned, the repaired DN can work again.

Origin blog.csdn.net/sincere_love/article/details/91469306