HDFS HA architecture and principles

  • HDFS HA architecture
  • HDFS HA principle
    • QJM

    • QJM uses 2N + 1 JournalNodes (JNs) to store the EditLog. A write counts as successful once a majority (>= N + 1) of the JNs have acknowledged it, and a successful write will not be lost. The algorithm tolerates the failure of up to N machines; if more than N fail, it no longer works. The principle is based on the Paxos algorithm.
    • In the HA architecture the SecondaryNameNode cold-standby role does not exist. To keep the Standby NN's metadata consistent with the Active NN at all times, the two communicate through a group of lightweight daemon processes, the JournalNodes. Whenever a modification is performed on the Active NN, the corresponding log entry is also written to at least a majority of the JNs. The Standby NN watches the JNs for changes, reads the modified log entries, and applies them to its own directory-tree image. When a failure occurs and the Active NN goes down, the Standby NN reads all remaining edit log entries from the JNs before it becomes Active. This guarantees that its directory-tree image is consistent with that of the failed NN, so it can take over seamlessly and serve client requests, which is how high availability is achieved.
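The majority rule described above can be sketched in a few lines of Python. This is only an in-memory illustration, not Hadoop code: `JournalNode` and `quorum_write` are hypothetical names made up for the example, and the real QJM protocol additionally deals with epochs and recovery of partially written edits, which the sketch leaves out.

```python
from dataclasses import dataclass, field


@dataclass
class JournalNode:
    """Toy stand-in for a JournalNode (hypothetical, keeps edits in memory)."""
    alive: bool = True
    edits: list = field(default_factory=list)

    def append(self, txid, op):
        """Store one edit and acknowledge it; a dead node never acknowledges."""
        if not self.alive:
            return False
        self.edits.append((txid, op))
        return True


def quorum_write(journal_nodes, txid, op):
    """Send the edit to every JN; succeed only if a majority acknowledged.

    With 2N + 1 nodes the majority is N + 1, so up to N nodes may be down.
    """
    acks = sum(1 for jn in journal_nodes if jn.append(txid, op))
    majority = len(journal_nodes) // 2 + 1
    return acks >= majority
```

With five such nodes (N = 2), `quorum_write` keeps succeeding while any two nodes are marked `alive = False`, and starts returning `False` as soon as a third one fails.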

 

    • ZKFC

    • Hadoop provides the ZKFailoverController (zkfc) role, deployed as a daemon process on each NameNode node. It consists of three components:
      • HealthMonitor: monitors whether the NameNode is unavailable or unhealthy. It does this by calling the corresponding NN methods over RPC
      • ActiveStandbyElector: manages and monitors the node's own state in ZK
      • ZKFailoverController: subscribes to events from HealthMonitor and ActiveStandbyElector and manages the state of the NameNode
    • Health monitoring: zkfc periodically sends a health probe command to the NN it monitors to determine whether that NameNode is healthy. If the machine is down or the heartbeat fails, zkfc marks it as unhealthy
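    • Session management: if the NN is healthy, zkfc keeps a session open in ZooKeeper. If the NameNode is also in the Active state, zkfc additionally holds an ephemeral znode in ZooKeeper. When this NN goes down, the znode is deleted; the standby NN then acquires the lock, is promoted to the primary NN, and marks its state as Active
    • When the failed NN starts up again, it registers with ZooKeeper once more, finds that the znode lock is already held, and automatically switches to the Standby state. This cycle repeats and ensures high reliability. Note that at the time of writing (Hadoop 2.x) at most 2 NNs can be configured
    • Master election: as described above, a preemptive lock is implemented by maintaining an ephemeral znode in ZooKeeper, and this lock determines which NameNode is in the Active state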
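The interplay of health monitoring, session management and master election can be illustrated with a small simulation. It is a sketch under stated assumptions: `ToyZooKeeper` and `ToyZkfc` are hypothetical in-memory classes invented for this example, not the ZooKeeper or Hadoop APIs, and the "Unhealthy" label is just the example's way of recording the mark zkfc sets.

```python
class ToyZooKeeper:
    """In-memory stand-in for ZooKeeper with one ephemeral lock znode."""

    def __init__(self):
        self.lock_owner = None  # session that currently holds the lock znode

    def try_create_ephemeral(self, session):
        """Create the lock znode if it is absent; True if this session owns it."""
        if self.lock_owner is None:
            self.lock_owner = session
        return self.lock_owner == session

    def close_session(self, session):
        """Closing a session deletes the ephemeral znode it owns, if any."""
        if self.lock_owner == session:
            self.lock_owner = None


class ToyZkfc:
    """Toy failover controller watching one NameNode (hypothetical)."""

    def __init__(self, name, zk):
        self.name = name
        self.zk = zk
        self.nn_healthy = True  # result of the last health probe
        self.state = "Standby"

    def tick(self):
        """One round: act on the health probe, then take part in the election."""
        if not self.nn_healthy:
            # Unhealthy NN: drop the session, which releases the lock znode.
            self.zk.close_session(self.name)
            self.state = "Unhealthy"
        elif self.zk.try_create_ephemeral(self.name):
            self.state = "Active"
        else:
            # Lock already held by the other NN, so stay (or become) Standby.
            self.state = "Standby"
```

Running two controllers against one `ToyZooKeeper` reproduces the cycle from the list: the first to call `tick()` becomes Active, the other stays Standby; once the Active one is marked unhealthy and ticks, the lock is released and the other takes it on its next tick; when the failed one recovers, it finds the lock held and comes back as Standby.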

Origin www.cnblogs.com/xiangyuguan/p/11362497.html