Learning Hadoop: NameNode persistence and understanding the DataNode

 

 

The fsimage is a checkpoint, also called a disk image: it is a snapshot of the NameNode's metadata.

Persistence method 1 (fsimage snapshot): its drawback is that serializing the in-memory data takes a long time

Specific process: the NameNode keeps its metadata in memory and does not touch the disk on every operation, much like an in-memory non-relational database such as Redis. But data in memory is not durable, so how is it persisted? On the NameNode the process is as follows: first the in-memory data is serialized into a binary byte stream, then that stream is written to the file system through ordinary file IO, which completes the persistence. When the NameNode process needs the data again, it reads the binary file from disk, deserializes it, and loads it back into memory, where the NameNode can use it.
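The serialize, write, read, deserialize round trip described above can be sketched in a few lines of Python. This is only an illustration of the idea: HDFS uses its own on-disk format, and `pickle` stands in for it here. The names `save_snapshot`, `load_snapshot`, the file name `fsimage.bin` and the sample metadata are all hypothetical.

```python
import os
import pickle
import tempfile


def save_snapshot(namespace, path):
    """Serialize in-memory metadata to a binary file (the role fsimage plays)."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(namespace, f)   # memory -> binary byte stream -> file
        f.flush()
        os.fsync(f.fileno())        # make sure the bytes reach the disk
    os.replace(tmp_path, path)      # swap in the new snapshot in one step


def load_snapshot(path):
    """Read the binary file and deserialize it back into memory."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical metadata: file path -> replication factor and block list.
namespace = {"/user/a.txt": {"replication": 3, "blocks": ["blk_1", "blk_2"]}}

snapshot_path = os.path.join(tempfile.mkdtemp(), "fsimage.bin")
save_snapshot(namespace, snapshot_path)
restored = load_snapshot(snapshot_path)
```

After the round trip, `restored` holds the same contents as `namespace` but is a separate object rebuilt from the file, which is what happens when the NameNode reloads its snapshot.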

 

Note on snapshots: a snapshot is only written to disk at prescribed intervals, for example at 13:00, 15:00, 17:00 and so on, rather than every second. Persisting every second would mean memory and disk interact constantly, which would make data access very slow.
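A minimal sketch of that interval check, assuming a two-hour interval to match the 13:00 / 15:00 / 17:00 example. The function name `snapshot_due` and the dates are made up for illustration and do not come from Hadoop.

```python
from datetime import datetime, timedelta

# Assumed interval, chosen to match the 13:00 / 15:00 / 17:00 example.
SNAPSHOT_INTERVAL = timedelta(hours=2)


def snapshot_due(last_snapshot, now, interval=SNAPSHOT_INTERVAL):
    """Return True once at least one full interval has passed."""
    return now - last_snapshot >= interval


last = datetime(2019, 9, 17, 13, 0)  # hypothetical time of the last snapshot
due_at_1430 = snapshot_due(last, datetime(2019, 9, 17, 14, 30))
due_at_1500 = snapshot_due(last, datetime(2019, 9, 17, 15, 0))
```

Everything that changes between two snapshots exists only in memory, which is the gap the second persistence method is meant to cover.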

 

 

 

Persistence method 2 (edits log): writing the data to disk is not slow, but recovering it from disk back into memory is slower

edits: an operation log that records every operation on the metadata, similar to the append-only file (AOF) in Redis

How the edit log works: 1) Saving data to disk: every instruction a client sends to the server is written to the operation log file. 2) Loading data from disk into memory: simply replay the instructions in the log, one by one.
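The two steps above, append on write and replay on load, might look like this in Python. The log format (one JSON object per line), the command names `mkdir`, `create`, `delete`, and the function names are assumptions made for this sketch; they are not the real HDFS edits format.

```python
import json
import os
import tempfile


def apply_op(namespace, op):
    """Apply one logged instruction to the in-memory namespace."""
    if op["cmd"] == "mkdir":
        namespace[op["path"]] = {"type": "dir"}
    elif op["cmd"] == "create":
        namespace[op["path"]] = {"type": "file"}
    elif op["cmd"] == "delete":
        namespace.pop(op["path"], None)
    else:
        raise ValueError("unknown command: " + op["cmd"])


def log_op(log_path, op):
    """Step 1: append the client's instruction to the operation log."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(op) + "\n")


def replay(log_path):
    """Step 2: rebuild the namespace by re-executing every logged instruction."""
    namespace = {}
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            apply_op(namespace, json.loads(line))
    return namespace


edits_path = os.path.join(tempfile.mkdtemp(), "edits.log")
live = {}  # the state the server holds in memory
for op in [
    {"cmd": "mkdir", "path": "/data"},
    {"cmd": "create", "path": "/data/a.txt"},
    {"cmd": "create", "path": "/data/b.txt"},
    {"cmd": "delete", "path": "/data/b.txt"},
]:
    log_op(edits_path, op)
    apply_op(live, op)

recovered = replay(edits_path)
```

Writing is a cheap append, but `replay` has to re-execute every instruction ever logged, including the create and delete of `/data/b.txt` that cancel each other out. That is why recovery from the log alone gets slower as the log grows.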

 

Like the first method, this one also persists at specified intervals rather than in real time

 

In practice, the two persistence methods are used together

But how are the two combined? First you need to know when the fsimage file and the edits file are produced. The fsimage file is generated when the Hadoop cluster is first set up, and at that point it is empty. The edits file is generated when the cluster starts, and it is also empty at that moment. Once startup completes, the edits log and the fsimage file are merged. From then on the log keeps growing, because after the cluster starts, clients continually send commands to the cluster through the NameNode, and every one of them is recorded in the log. This creates a problem: the log grows without bound, so recovering the data from edits alone would take a very long time. For that reason Hadoop periodically merges the edits and fsimage files, using the SecondaryNameNode.
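To make the benefit of combining the two concrete, here is a small sketch of recovery: load the snapshot, then replay only the instructions logged since it was taken. The data structures are simplified and hypothetical (plain dicts and lists instead of files), and `recover` is not a Hadoop function.

```python
import copy


def apply_op(namespace, op):
    """Apply one logged instruction to the namespace."""
    if op["cmd"] == "mkdir":
        namespace[op["path"]] = "dir"
    elif op["cmd"] == "create":
        namespace[op["path"]] = "file"
    elif op["cmd"] == "delete":
        namespace.pop(op["path"], None)
    else:
        raise ValueError("unknown command: " + op["cmd"])


def recover(fsimage, edits):
    """Start from the last snapshot, then replay only the newer edits."""
    namespace = copy.deepcopy(fsimage)  # leave the stored snapshot untouched
    for op in edits:
        apply_op(namespace, op)
    return namespace


# Snapshot taken at the last merge, plus the instructions logged since then.
fsimage = {"/data": "dir", "/data/a.txt": "file"}
edits = [
    {"cmd": "create", "path": "/data/b.txt"},
    {"cmd": "delete", "path": "/data/a.txt"},
]

state = recover(fsimage, edits)
```

Recovery time now depends on the number of edits since the last merge, not on the whole history of the cluster, which is why the edits file has to be merged into fsimage regularly.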

The SecondaryNameNode: it is not a backup of the NameNode; it exists only to merge the fsimage file and the edits log

 

 

The merge process in detail

Immediately after the cluster is set up, an empty fsimage is generated, and once the system starts there is an fsimage file on the NameNode. The cycle then runs as follows: 1) Clients send instructions to the NameNode, which records them in edits, so edits keeps growing. 2) The SecondaryNameNode checks on the NameNode; when edits reaches a certain size, that edits file stops accepting new instructions. 3) The node hosting the NameNode (the master node) transfers edits, together with fsimage, to the SecondaryNameNode, where they are merged. Meanwhile the edits on the master node is cleared, so new instructions are recorded into a log that grows again from size zero. 4) The merged result is sent back to the NameNode, where it replaces the original fsimage. The master node keeps recording client instructions in the new edits, and the whole process repeats. This is the NameNode persistence mechanism of Hadoop 1.0. In 2.x, when high availability (HA) is enabled, the Standby NameNode takes over this merging work and the SecondaryNameNode is no longer used.
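The merge cycle can be simulated in memory to see how fsimage and edits change over time. This is a toy model under stated assumptions: the threshold `EDITS_LIMIT` counts instructions rather than bytes, and the class `MiniNameNode` and the function `checkpoint` are invented for this sketch. They are not Hadoop classes.

```python
import copy

# Hypothetical threshold, counted in instructions instead of bytes.
EDITS_LIMIT = 3


def apply_op(namespace, op):
    """Apply one logged instruction to a namespace."""
    if op["cmd"] == "create":
        namespace[op["path"]] = "file"
    elif op["cmd"] == "delete":
        namespace.pop(op["path"], None)
    else:
        raise ValueError("unknown command: " + op["cmd"])


class MiniNameNode:
    def __init__(self):
        self.fsimage = {}    # last merged snapshot (empty after setup)
        self.edits = []      # instructions recorded since the last merge
        self.namespace = {}  # live in-memory state

    def handle(self, op):
        """Record the client's instruction, then apply it in memory."""
        self.edits.append(op)
        apply_op(self.namespace, op)

    def roll_edits(self):
        """Hand over the current log and start a new, empty one."""
        rolled, self.edits = self.edits, []
        return rolled


def checkpoint(namenode):
    """The role the SecondaryNameNode plays: merge edits into fsimage."""
    if len(namenode.edits) < EDITS_LIMIT:
        return False                       # log still small, nothing to do
    image = copy.deepcopy(namenode.fsimage)
    for op in namenode.roll_edits():       # NameNode now logs to a fresh edits
        apply_op(image, op)
    namenode.fsimage = image               # merged image replaces the old one
    return True


nn = MiniNameNode()
results = []
for path in ["/a", "/b", "/c", "/d"]:
    nn.handle({"cmd": "create", "path": path})
    results.append(checkpoint(nn))
```

In this run the merge fires after the third instruction: fsimage then contains `/a`, `/b` and `/c`, while the new edits holds only the instruction for `/d`. A restart at that point would load the fsimage and replay a single instruction.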

 

 

 

Understanding the DataNode

 


Origin www.cnblogs.com/isme-zjh/p/11532639.html