Hadoop 2.0 Background
- In Hadoop 1.0, both HDFS and MapReduce had problems with high availability and scalability
- HDFS problems (two)
- The NameNode is a single point of failure, which makes HDFS hard to use in online scenarios (addressed by HA)
- The NameNode is under heavy load and its memory is limited, which affects scalability (addressed by Federation)
- MapReduce problems
- The JobTracker is under heavy access load, which limits system scalability
- It is difficult to support computing frameworks other than MapReduce, such as Spark, Storm, etc.
Hadoop 1.x and Hadoop 2.x
- Hadoop 2.x consists of three branches: HDFS, MapReduce, and YARN
- HDFS: NameNode (NN) Federation and HA
- 2.x: HA supports only two NameNodes (one active, one standby); 3.0 supports one active NameNode with multiple standbys
- MapReduce: MR runs on YARN
- Offline (batch) computation, based on disk I/O
- YARN: Resource Management System
HDFS 2.x
- Solves the single point of failure and limited memory problems of HDFS 1.0
- HDFS HA: the single point of failure is solved with an active/standby NameNode pair
- If the active NameNode fails, service fails over to the standby NameNode
- HDFS Federation: solves the limited-memory problem
- 2.x changes only the architecture; usage stays the same
- The HDFS changes are transparent to the user
- The commands and APIs from HDFS 1.x can still be used
HDFS 2.x HA Architecture
- Active and standby NameNodes
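- Resolves the single point of failure (covers both kinds of metadata: attributes and block locations)
- The active NameNode serves clients; the standby NameNode synchronizes the active NameNode's metadata so that it is ready to take over
- All DataNodes report block information (locations) to both NameNodes at the same time
- JNN (JournalNode): a cluster that shares the attribute metadata (the edit log) between the NameNodes
- Standby: the backup; it merges the edits log into a new fsimage and pushes it back to the active NameNode (ANN)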
- Two failover options
- Manual failover: switch between active and standby with a command; useful for cases such as HDFS upgrades
- Automatic failover: implemented with ZooKeeper
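- Automatic failover scheme based on ZooKeeper
- ZooKeeper Failover Controller (ZKFC): monitors the health of its NameNode and registers the NameNode with ZooKeeper
- When the active NameNode goes down, the ZKFCs compete for a lock on behalf of their NameNodes; the NameNode whose ZKFC obtains the lock becomes active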
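
The lock competition described above can be sketched as a small in-memory simulation. This is only a toy model under stated assumptions, not the real ZKFC or ZooKeeper API: `LockService`, `NameNode`, `FailoverController`, and `run_round` are hypothetical names made up for illustration, the lock is a plain Python attribute standing in for a ZooKeeper lock, and details of a real deployment such as fencing the old active NameNode are left out.

```python
class LockService:
    """Hypothetical stand-in for ZooKeeper: a lock held by at most one owner."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, owner):
        # Only succeeds when nobody currently holds the lock.
        if self.holder is None:
            self.holder = owner
            return True
        return False

    def release_if_held_by(self, owner):
        if self.holder == owner:
            self.holder = None


class NameNode:
    """Toy NameNode: just a name, a health flag, and an HA state."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.state = "standby"


class FailoverController:
    """One controller per NameNode, loosely modelled on the ZKFC role."""

    def __init__(self, namenode, lock):
        self.namenode = namenode
        self.lock = lock

    def release_if_unhealthy(self):
        # Health monitoring: a failed NameNode gives up the lock.
        if not self.namenode.healthy:
            self.lock.release_if_held_by(self.namenode.name)
            self.namenode.state = "down"

    def compete(self):
        # Healthy NameNodes compete; the lock holder becomes active.
        nn = self.namenode
        if not nn.healthy:
            return
        if self.lock.holder == nn.name or self.lock.try_acquire(nn.name):
            nn.state = "active"
        else:
            nn.state = "standby"


def run_round(controllers):
    """One monitoring round: failed nodes release first, then the rest compete."""
    for controller in controllers:
        controller.release_if_unhealthy()
    for controller in controllers:
        controller.compete()


if __name__ == "__main__":
    lock = LockService()
    nn1, nn2 = NameNode("nn1"), NameNode("nn2")
    controllers = [FailoverController(nn1, lock), FailoverController(nn2, lock)]

    run_round(controllers)
    print(nn1.state, nn2.state)

    nn1.healthy = False  # simulate the active NameNode going down
    run_round(controllers)
    print(nn1.state, nn2.state)
```

In this sketch the first round makes `nn1` active because its controller competes first; after `nn1` is marked unhealthy, its controller releases the lock and `nn2` takes over in the next round. Splitting each round into a release pass and a compete pass keeps the outcome independent of the order in which the controllers are listed.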