New Features of HDFS 2.0: Federation, High Availability (HA), and the HA Implementation

Federation

The word "federation" calls to mind a country like the United States: a union of states, each with its own constitution and laws, each exercising its own powers. Federation in HDFS is similar. With this mechanism, an HDFS cluster can use multiple independent NameNodes, allowing the HDFS namespace to scale horizontally. Each NameNode manages a portion of the data, and all of them share the storage resources of the cluster's DataNodes.

In plain terms, each NameNode manages part of the filesystem namespace. For example, NameNode1 might manage all files under the /usr directory while NameNode2 manages all files under the /share directory.
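To make this concrete, here is a minimal client-side sketch in Java, assuming two hypothetical NameNode hosts, namenode1 and namenode2, on the default RPC port 8020. Each FileSystem instance talks to a single NameNode and therefore sees only that NameNode's slice of the namespace:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FederationClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Each federated NameNode exposes its own namespace; the client
        // chooses one by URI. The hostnames here are hypothetical.
        FileSystem usrFs = FileSystem.get(URI.create("hdfs://namenode1:8020"), conf);
        FileSystem shareFs = FileSystem.get(URI.create("hdfs://namenode2:8020"), conf);

        // /usr is resolved by NameNode1, /share by NameNode2.
        usrFs.listStatus(new Path("/usr"));
        shareFs.listStatus(new Path("/share"));
    }
}
```

In practice, a client-side mount table (ViewFs) is usually layered on top of the federated NameNodes so that applications still see a single unified namespace.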

In a federated environment, each NameNode maintains a namespace volume, which consists of the namespace metadata and a block pool containing all the blocks of all files in that namespace. Namespace volumes are independent of one another: the failure of one NameNode does not affect the availability of the namespaces maintained by the others. Block pools are not partitioned across DataNodes, however, so every DataNode in the cluster must register with every NameNode and may store blocks from multiple block pools.

Introducing federation solves the following problems of a single-NameNode architecture:

1) Cluster scalability. All metadata is held in the NameNode's memory, so a single NameNode is limited by its memory ceiling. With federation, each of several NameNodes manages only part of the namespace.

2) Isolation. One program can affect others that are running; for example, a program that consumes too many resources can keep other programs from running smoothly. With federation, data belonging to different businesses can be managed by different NameNodes as needed, so different workloads have little impact on one another.

3) Higher performance. Multiple NameNodes serve requests concurrently, providing higher aggregate read and write throughput.

High Availability

Backing up the NameNode's filesystem metadata and creating checkpoints with the SecondaryNameNode can prevent data loss, but it does not make the filesystem highly available: the NameNode is still a single point of failure.

In this case, recovering from a failed NameNode requires a system administrator to start a new NameNode with a copy of the filesystem metadata and configure the DataNodes and clients to use it. The recovery time is long enough to seriously disrupt routine operations: on restart the NameNode must load the fsimage, replay the edit log, and then receive enough block reports from the DataNodes (while in safe mode) before it can serve requests, a cold start that can take 30 minutes or even longer.

To solve this problem, Hadoop 2 supports HDFS high availability through a pair of NameNodes configured in an active-standby arrangement. When the active NameNode fails, the standby NameNode takes over its duties and keeps serving client requests without any noticeable interruption. This requires a few architectural changes:

1) The NameNodes must share the edit log through highly available shared storage. When the standby takes over, it reads the shared edit log through to the end to synchronize its state with the active NameNode, and then keeps reading new entries as the active NameNode writes them.

2) DataNodes must send block reports to both NameNodes, because the block-to-DataNode mapping is kept in each NameNode's memory, not on disk.

3) Clients must use a mechanism to handle NameNode failover that is transparent to the user (see the configuration sketch after this list).

4) The standby NameNode subsumes the role of the SecondaryNameNode by periodically checkpointing the active NameNode's namespace, so there is no need to run a SecondaryNameNode, CheckpointNode, or BackupNode for fault tolerance.

5) After the active NameNode fails, the standby can take over quickly because the latest state is already in its memory: the latest edit log entries and an up-to-date block mapping. In practice the observed failover time is somewhat longer, because the system must conservatively determine that the active NameNode has really failed.
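The transparent failover in point 3 is driven almost entirely by client configuration. Below is a minimal sketch using the standard HDFS HA client keys, assuming a hypothetical nameservice named mycluster with hypothetical hosts namenode1 and namenode2; in a real deployment these keys would live in hdfs-site.xml rather than being set in code:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HaClientConfigSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // A logical nameservice backed by two NameNodes. The names
        // "mycluster", "nn1" and "nn2" are hypothetical.
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "namenode1:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "namenode2:8020");

        // Proxy provider that retries against the other NameNode when the
        // one currently in use turns out to be standby or unreachable.
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        // Clients address the nameservice, never a concrete NameNode host,
        // which is what makes failover invisible to them.
        FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);
        System.out.println(fs.getUri());
    }
}
```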

High Availability Implementation

The Quorum Journal Manager (QJM) is designed to provide a highly available edit log and is recommended for most HDFS deployments. QJM runs as a group of JournalNodes (JNs), and every edit must be written to a majority of them. With the default of three JNs, the system can tolerate the loss of any one. QJM allows only one NameNode at a time to write to the edit log; in addition, an SSH fencing method is typically configured to kill the previously active NameNode's process, since that node may still be serving stale read requests from clients.
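For reference, here is a minimal sketch of the server-side keys involved, again assuming the hypothetical mycluster nameservice, hypothetical JournalNode hosts jn1 through jn3 (8485 is the default JournalNode port), and a hypothetical SSH key path; these would normally be set in hdfs-site.xml:

```java
import org.apache.hadoop.conf.Configuration;

public class QjmFencingConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Both NameNodes write their edits to the JournalNode quorum.
        conf.set("dfs.namenode.shared.edits.dir",
                "qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster");

        // Fence the previously active NameNode over SSH before the standby
        // takes over, so it can no longer write edits or answer stale reads.
        conf.set("dfs.ha.fencing.methods", "sshfence");
        conf.set("dfs.ha.fencing.ssh.private-key-files",
                "/home/hdfs/.ssh/id_rsa"); // hypothetical key path
    }
}
```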

The system also has an entity called the failover controller, which manages the transition from the active NameNode to the standby. There are various failover controllers, but the default one uses ZooKeeper to ensure that exactly one NameNode is active. Each NameNode runs a lightweight failover controller process whose job is to monitor its NameNode for failure and trigger a failover when one occurs.

NameNode active/standby switching is implemented mainly by three components: ZKFailoverController, HealthMonitor, and ActiveStandbyElector.

ZKFailoverController: started as an independent process on each NameNode machine, it creates two internal components at startup, HealthMonitor and ActiveStandbyElector. HealthMonitor is responsible for probing the NameNode's health, while ActiveStandbyElector carries out the automatic active/standby election and internally encapsulates the interaction with ZooKeeper.

NameNode switching process:

① HealthMonitor periodically checks the health of the NameNode.

② If HealthMonitor detects that the NameNode's health has changed, it calls back into ZKFailoverController for handling.

③ If ZKFailoverController decides that an active/standby switchover is required, it first asks ActiveStandbyElector to run an automatic election.

④ ActiveStandbyElector interacts with ZooKeeper to carry out the election; once it completes, it notifies ZKFailoverController whether the current NameNode has become active or standby (a minimal sketch of this election follows the list).

⑤ ZKFailoverController transitions the current NameNode to the Active or Standby state.
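At the heart of steps ③ and ④ is a competition for an ephemeral znode in ZooKeeper. The sketch below illustrates only that core idea using the plain ZooKeeper client API; the real ActiveStandbyElector adds session handling, watch-based retries, and fencing metadata. The connect string is hypothetical, and the lock path mirrors the /hadoop-ha/<nameservice> layout that HDFS uses:

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ElectorSketch {
    // Lock znode under the nameservice's HA directory.
    private static final String LOCK =
            "/hadoop-ha/mycluster/ActiveStandbyElectorLock";

    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 5000,
                event -> { /* session events ignored in this sketch */ });
        try {
            // Whoever creates the ephemeral znode first wins the election.
            zk.create(LOCK, "nn1".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("won election -> transition to Active");
        } catch (KeeperException.NodeExistsException e) {
            // Another NameNode holds the lock: watch it and remain Standby.
            zk.exists(LOCK, true);
            System.out.println("lost election -> remain Standby");
        }
    }
}
```

Because the znode is ephemeral, it disappears automatically when the active NameNode's ZooKeeper session expires, which is what lets the standby's elector win the next round and trigger step ⑤.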
