Introduction to Hadoop: HA and Federation

In Hadoop 1.0, both HDFS and MapReduce had problems with high availability and scalability:

- HDFS problems

    • The NameNode is a single point of failure, which makes HDFS hard to use in online scenarios (addressed by HA)

    • The NameNode is under heavy load and limited by its memory, which hurts scalability (addressed by Federation)

- MapReduce problems

    • The JobTracker is under heavy access load, which limits system scalability

    • It is hard to support computing frameworks other than MapReduce, such as Spark and Storm

[Figure: Hadoop 1.x and 2.x]

- Hadoop 2.x consists of three branches: HDFS, MapReduce, and YARN

    • HDFS: NameNode (NN) Federation and HA. 2.x supports HA with only two NameNodes; 3.0 supports one active NameNode with multiple standbys.

    • MapReduce: MR runs on YARN. It is an offline (batch) computing framework whose computation is based on disk I/O.

    • YARN: the resource management system.

HA (High Availability) solves the single-point-of-failure problem by running an active and a standby NameNode: when the active NameNode fails, the cluster switches to the standby NameNode.

Federation solves the problem of limited NameNode memory through horizontal scaling: it supports multiple NameNodes, each in charge of part of the directory tree, and all NameNodes share the storage resources of all DataNodes.
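
The sharing of DataNode storage can be pictured with a small model. This is only a sketch, not Hadoop code: the `DataNode` class, the pool IDs `BP-ns1`/`BP-ns2`, and the block IDs are made up for illustration. The one assumption taken from HDFS Federation itself is that each namespace owns its own block pool and every DataNode stores blocks for all pools.

```python
from collections import defaultdict


class DataNode:
    """Toy DataNode that stores blocks for every namespace in the cluster."""

    def __init__(self, name):
        self.name = name
        # block pool ID -> set of block IDs; one pool per NameNode/namespace
        self.block_pools = defaultdict(set)

    def store(self, pool_id, block_id):
        self.block_pools[pool_id].add(block_id)

    def block_report(self, pool_id):
        """Return only the blocks that belong to one namespace's pool."""
        return sorted(self.block_pools[pool_id])


dn = DataNode("dn1")
dn.store("BP-ns1", "blk_1001")
dn.store("BP-ns1", "blk_1002")
dn.store("BP-ns2", "blk_2001")

print(dn.block_report("BP-ns1"))  # blocks reported to the ns1 NameNode
print(dn.block_report("BP-ns2"))  # blocks reported to the ns2 NameNode
```

One physical DataNode serves both namespaces, but each NameNode only sees the blocks of its own pool.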

[Figure: HDFS 2.x High Availability]

HA process:

- Active and standby NameNodes

- Solving the single point of failure (for both metadata attributes and block locations)

    • The active NameNode serves clients; the standby NameNode synchronizes the active NameNode's metadata and waits to take over

    • All DataNodes report block information (locations) to both NameNodes at the same time

    • JNN (JournalNode): a cluster through which the two NameNodes share the edit log (attributes), keeping their data consistent

    • Standby: the standby node merges the edits.log file into a new image and pushes it back to the active NameNode (ANN)

- Two ways to switch

    • Manual switching: switch between active and standby with a command; this can be used for situations such as upgrading HDFS

    • Automatic switching: implemented with ZooKeeper (a distributed coordination service)

- Automatic switching scheme based on ZooKeeper

    • ZooKeeper Failover Controller (ZKFC): monitors the health of its NameNode

    • and registers the NameNode with ZooKeeper

    • After the active NameNode goes down, the ZKFCs compete for the lock on behalf of their NameNodes; the NameNode whose ZKFC gets the lock becomes active.
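
The lock competition described above can be sketched as a small simulation. It is a toy model under stated assumptions, not the real ZKFC or ZooKeeper client API: `LockService` is a hypothetical in-memory stand-in for a ZooKeeper lock node, and `Zkfc`, `tick`, and the names `nn1`/`nn2` are illustrative only.

```python
class LockService:
    """In-memory stand-in for a ZooKeeper lock node (not the real API)."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, owner):
        # Only one owner can hold the lock at a time.
        if self.holder is None:
            self.holder = owner
            return True
        return self.holder == owner

    def release(self, owner):
        # Models the lock disappearing when its holder is no longer alive.
        if self.holder == owner:
            self.holder = None


class Zkfc:
    """Toy failover controller paired with one NameNode."""

    def __init__(self, nn_name, lock):
        self.nn_name = nn_name
        self.lock = lock
        self.healthy = True
        self.state = "standby"

    def tick(self):
        """One monitoring round: check health, then compete for the lock."""
        if not self.healthy:
            self.lock.release(self.nn_name)
            self.state = "down"
        elif self.lock.try_acquire(self.nn_name):
            self.state = "active"
        else:
            self.state = "standby"
        return self.state


lock = LockService()
zkfc1, zkfc2 = Zkfc("nn1", lock), Zkfc("nn2", lock)
print(zkfc1.tick(), zkfc2.tick())  # nn1 gets the lock first

zkfc1.healthy = False              # nn1 goes down
print(zkfc1.tick(), zkfc2.tick())  # nn2 now obtains the lock
```

The point of routing everything through a single lock is that at most one NameNode can consider itself active at any moment.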

In Hadoop 2.0, an HA setup generally consists of two NameNodes, one in the active state and the other in the standby state. The active NameNode serves clients, while the standby NameNode provides no service and only synchronizes the active NameNode's state so that it can take over quickly on failure. The two NameNodes synchronize metadata through a group of JournalNodes; a write is considered successful once a majority of the JournalNodes have accepted it, which is why an odd number of JournalNodes is typically configured. A ZooKeeper cluster is also deployed for ZKFC (DFSZKFailoverController) failover: when the active NameNode goes down, the standby NameNode is automatically switched to the active state.
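
The majority rule and the preference for an odd number of JournalNodes come down to simple arithmetic. The helper names below are illustrative, not part of Hadoop.

```python
def majority(n_journal_nodes):
    """Smallest number of acknowledgements that forms a majority."""
    return n_journal_nodes // 2 + 1


def write_committed(acks, n_journal_nodes):
    """An edit counts as written once a majority of JournalNodes accept it."""
    return acks >= majority(n_journal_nodes)


def tolerated_failures(n_journal_nodes):
    """How many JournalNodes can be down while writes still succeed."""
    return n_journal_nodes - majority(n_journal_nodes)


for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
```

With 3 JournalNodes one failure is tolerated, and with 4 it is still only one. The extra even-numbered node adds cost without adding fault tolerance.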

[Figure: Federation architecture]

HDFS 2.x Federation:

- Multiple NameNodes/namespaces distribute the storage and management of metadata across multiple nodes, so that NameNode/namespace capacity can be scaled horizontally by adding machines.

- The load of a single NameNode is spread across multiple nodes, so HDFS performance does not degrade as the amount of data grows. Multiple namespaces can also isolate different types of applications, by assigning the storage and management of each application type's HDFS metadata to a different NameNode.
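
One way to picture this partitioning is a mount table that maps directory prefixes to namespaces, similar in spirit to the client-side mount table Hadoop provides with ViewFs. The sketch below is an illustration only: the table contents (`/user`, `/hbase`, `/logs`, `ns1` to `ns3`) and the function name are assumptions, not configuration from the original article.

```python
# Hypothetical mount table: directory prefix -> namespace (NameNode) in charge
MOUNT_TABLE = {
    "/user": "ns1",
    "/hbase": "ns2",
    "/logs": "ns3",
}


def resolve_namespace(path, mount_table=MOUNT_TABLE):
    """Return the namespace whose mount point is the longest prefix of path."""
    best = None
    for mount_point in mount_table:
        # Match whole path components only, so "/userdata" is not under "/user".
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        raise KeyError(f"no namespace mounted for {path}")
    return mount_table[best]


print(resolve_namespace("/user/alice/data.csv"))
print(resolve_namespace("/hbase/table1"))
```

Each application type only touches the NameNode behind its own prefix, so metadata load from one application does not land on the others.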


Origin blog.csdn.net/weixin_34088598/article/details/91013996