Hadoop HA: introduction and implementation process

Hadoop HA introduction

     HA (High Availability) is an effective solution for business continuity. An HA setup generally has two or more nodes, divided into an active node (Active) and a standby node (Standby). The node that is currently running the service is called the active node, and the node kept as its backup is called the standby node. When the active node has a problem and the service (task) it runs can no longer run correctly, the standby node detects this and immediately takes over the active node's work, so that the service is either not interrupted at all or interrupted only briefly.

Hadoop HA details

     In Hadoop 1.x, the NameNode (NN) of an HDFS cluster is a single point of failure: each cluster has only one NN, and if that machine or process becomes unavailable, the entire cluster cannot be used. To solve this problem, a number of HDFS HA solutions appeared (e.g. Linux HA, VMware FT, shared NAS + NFS, BookKeeper, QJM / Quorum Journal Manager, BackupNode, etc.).

     From Hadoop 2.x on, Cloudera proposed QJM (Quorum Journal Manager), an HDFS HA solution implemented on the basis of the Paxos algorithm (a distributed consensus algorithm). It offers a better design and implementation. The main advantages of QJM are:
No additional high-end shared storage needs to be configured, which reduces complexity and maintenance costs.
It eliminates the SPOF (single point of failure).
The degree of system robustness (Robust) is configurable and extensible.

     The basic principle is to store the EditLog on 2N+1 JournalNodes. A write is considered successful once at least N+1 of them (a majority) have returned success, so no successfully written edit is lost as long as no more than N JournalNodes fail.
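The majority rule above can be sketched in Python. This is a minimal sketch under a simplified model in which each JournalNode either appends an edit or is down; the names `JournalNode`, `majority`, `tolerated_failures` and `quorum_write` are hypothetical and are not the QJM API. The last line also shows why an odd node count is used: a fourth node raises the required majority without tolerating any extra failure.

```python
from dataclasses import dataclass, field


@dataclass
class JournalNode:
    """Hypothetical in-memory stand-in for one JournalNode."""
    name: str
    alive: bool = True
    edits: list = field(default_factory=list)

    def append(self, edit: str) -> bool:
        # A node that is down does not acknowledge the write.
        if not self.alive:
            return False
        self.edits.append(edit)
        return True


def majority(total: int) -> int:
    """Minimum number of acks needed: N+1 out of 2N+1 nodes."""
    return total // 2 + 1


def tolerated_failures(total: int) -> int:
    """How many nodes may fail while a majority can still acknowledge."""
    return total - majority(total)


def quorum_write(nodes: list, edit: str) -> bool:
    """Send the edit to every node; the write succeeds only if a majority acked it."""
    acks = sum(1 for node in nodes if node.append(edit))
    return acks >= majority(len(nodes))


# 2N+1 = 5 JournalNodes, so N = 2 failures are tolerated.
cluster = [JournalNode(f"jn{i}") for i in range(5)]
cluster[0].alive = False
cluster[1].alive = False
print(quorum_write(cluster, "mkdir /a"))  # True: 3 of 5 acked

cluster[2].alive = False
print(quorum_write(cluster, "mkdir /b"))  # False: only 2 of 5 acked

print([tolerated_failures(n) for n in (3, 4, 5)])  # [1, 1, 2]
```

Real QJM sends edits to all JournalNodes in parallel and deals with edits that reached only a minority during recovery; the sketch skips both and only checks the ack count.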

Hadoop HA execution process

   The HA scheme has two NNs. One is in the Active state: it provides the service and is the working NN.
The other is in the StandBy state: it does not provide the service and stays ready to replace the Active NN.
ZKFC (ZKFailoverController): monitors the node where its NN runs, covering the hardware, the software (the NN process) and the operating system, while maintaining a session with ZK (ZooKeeper). In this scheme there are only two NNs, and each NN has its own ZKFC.
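The ZKFC's role can be modelled as a check that probes its NN and then joins or leaves the ZK election accordingly. This is a minimal sketch: `ZkfcSketch` and the `nn_is_healthy` probe are hypothetical stand-ins, not Hadoop classes, and a real ZKFC runs its health checks periodically in the background instead of when called. In this model, leaving the election means giving up the ephemeral znode, which is what lets the other NN take over.

```python
from typing import Callable


class ZkfcSketch:
    """Hypothetical ZKFC model: stays in the ZK election only while its NN is healthy."""

    def __init__(self, nn_is_healthy: Callable[[], bool]) -> None:
        self.nn_is_healthy = nn_is_healthy  # hypothetical probe of the local NN
        self.in_election = False  # True while competing for / holding the ephemeral znode

    def check_once(self) -> bool:
        """Run one monitoring round, join or quit the election, return the probe result."""
        healthy = self.nn_is_healthy()
        if healthy and not self.in_election:
            self.in_election = True   # join: try to register the ephemeral znode
        elif not healthy and self.in_election:
            self.in_election = False  # quit: release the znode so the other NN can take over
        return healthy


probe_results = iter([True, True, False, True])
zkfc = ZkfcSketch(lambda: next(probe_results))
history = []
for _ in range(4):
    zkfc.check_once()
    history.append(zkfc.in_election)
print(history)  # [True, True, False, True]
```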
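How the Active and StandBy states of the NNs are decided:
    Both NNs of the cluster try to register the same ephemeral (temporary) znode in ZK; whichever registers first becomes Active, and the other becomes StandBy.
When the Active NN fails: the Active ZKFC notifies ZK to delete the ephemeral znode (if the whole machine is down, ZK deletes it anyway once that ZKFC's session expires) → the StandBy ZKFC, which has subscribed to changes on this znode, sees it disappear and immediately notifies the StandBy NN → the StandBy side logs in remotely (e.g. over SSH) to the Active NN's machine and runs kill -9 on the Active NN process, so the old Active cannot keep serving (this step is called fencing) → the StandBy ZKFC registers the ephemeral znode in ZK, and the StandBy NN becomes the new Active NN.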
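The registration and failover steps above can be simulated with an in-memory stand-in for ZK. Everything here is hypothetical: `FakeZooKeeper` only mimics ephemeral-node creation and one-shot deletion watches, `NameNodeSketch` is not a Hadoop class, `LOCK_PATH` is a made-up znode path, and "fencing" merely sets the old Active's state to `Killed` where a real cluster would run kill -9 on it.

```python
from typing import Callable

LOCK_PATH = "/hadoop-ha/lock"  # hypothetical znode path used as the Active lock


class FakeZooKeeper:
    """Hypothetical in-memory ZK: ephemeral nodes plus one-shot deletion watches."""

    def __init__(self) -> None:
        self.nodes: dict = {}    # path -> name of the NN that created it
        self.watches: dict = {}  # path -> callbacks to run when the node is deleted

    def create_ephemeral(self, path: str, owner: str) -> bool:
        # Creation fails if the node exists, so only the first registrant wins.
        if path in self.nodes:
            return False
        self.nodes[path] = owner
        return True

    def watch_delete(self, path: str, callback: Callable[[], None]) -> None:
        self.watches.setdefault(path, []).append(callback)

    def delete(self, path: str) -> None:
        # Models both an explicit delete and the owner's session expiring.
        self.nodes.pop(path, None)
        for callback in self.watches.pop(path, []):
            callback()


class NameNodeSketch:
    """Hypothetical NN + ZKFC pair competing for the same ephemeral znode."""

    def __init__(self, name: str, zk: FakeZooKeeper, cluster: dict) -> None:
        self.name = name
        self.zk = zk
        self.cluster = cluster  # all NNs by name, used to find the old Active
        self.state = "StandBy"
        cluster[name] = self

    def register(self) -> None:
        """First NN to create the znode becomes Active; the other watches it."""
        if self.zk.create_ephemeral(LOCK_PATH, self.name):
            self.state = "Active"
        else:
            self.state = "StandBy"
            self.zk.watch_delete(LOCK_PATH, self.on_lock_deleted)

    def on_lock_deleted(self) -> None:
        """Fence the old Active (stands in for kill -9), then register again."""
        for other in self.cluster.values():
            if other is not self and other.state == "Active":
                other.state = "Killed"
        self.register()


zk = FakeZooKeeper()
cluster = {}
nn1 = NameNodeSketch("nn1", zk, cluster)
nn2 = NameNodeSketch("nn2", zk, cluster)
nn1.register()
nn2.register()
print(nn1.state, nn2.state)  # Active StandBy

zk.delete(LOCK_PATH)  # nn1 fails: its ZKFC deletes the znode or its session expires
print(nn1.state, nn2.state)  # Killed Active
```

Fencing before re-registering follows the order given in the text: the old Active is stopped before the new one starts serving, so the two NNs never both act as Active.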
How is the metadata of the two NNs kept in sync quickly?
JournalNode (JN): an efficient storage system that can write and read data very quickly. The JNs form a small cluster (a small file system) and need an odd number of nodes (1, 3, 5, ...).
The Active NN writes its edit log (the record of metadata changes) to the JNs in real time. The StandBy NN reads those edits from the JNs in real time and applies them to its own copy of the metadata (FSImage), so the metadata of the two nodes stays synchronized.
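The StandBy side of this synchronization can be sketched as a routine that replays every edit it has not yet applied, in transaction-id order. This is a simplified model: the `Edit` and `StandbySketch` types, the `mkdir`/`delete` operations, and reading the whole journal as a Python list are assumptions for illustration; the real StandBy NN reads edit-log segments from the JNs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    txid: int  # transaction id, increasing by 1 for each edit
    op: str    # hypothetical operation names: "mkdir" or "delete"
    path: str


class StandbySketch:
    """Hypothetical StandBy NN: replays edits it has not applied yet, in txid order."""

    def __init__(self) -> None:
        self.namespace: set = set()
        self.last_applied_txid = 0

    def tail(self, journal: list) -> int:
        """Apply every edit newer than last_applied_txid; return how many were applied."""
        applied = 0
        for edit in sorted(journal, key=lambda e: e.txid):
            if edit.txid <= self.last_applied_txid:
                continue  # already applied in an earlier round
            if edit.op == "mkdir":
                self.namespace.add(edit.path)
            elif edit.op == "delete":
                self.namespace.discard(edit.path)
            self.last_applied_txid = edit.txid
            applied += 1
        return applied


journal = [Edit(1, "mkdir", "/a"), Edit(2, "mkdir", "/b")]
standby = StandbySketch()
print(standby.tail(journal), sorted(standby.namespace))  # 2 ['/a', '/b']

journal.append(Edit(3, "delete", "/a"))  # the Active NN writes one more edit
print(standby.tail(journal), sorted(standby.namespace))  # 1 ['/b']
```

Tracking the last applied txid is what makes repeated reads safe: edits already replayed are skipped, so the StandBy only does work proportional to what is new.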

The ResourceManager also has a single point of failure

Solution: ResourceManager HA

Two ResourceManagers: one Active, one StandBy.

Switching between the two ResourceManagers is coordinated through an ephemeral node registered in ZK, in the same way as for the NNs.

Source: blog.csdn.net/qq_43791724/article/details/104801303