HDFS (Part 2): High Availability Architecture

Introduction

In the Hadoop 1.x HDFS framework there is only a single NameNode. If that NameNode hits a memory overflow, crashes, or suffers some other unforeseen problem, the entire system is out of service until the NameNode is restarted. To solve this single point of failure, the Hadoop 2.x HDFS framework introduces an HA (high availability) mechanism. Let's look at how this mechanism in 2.x keeps the system highly available.

Architecture

Since 1.x has only one NameNode, the system cannot guarantee that it keeps running indefinitely, so the first key point is to add more NameNode nodes: after one NameNode goes down, another can take over and continue to provide service. But having multiple NameNodes increases the complexity of the system, so the 2.x HDFS high availability architecture is equipped with exactly two NameNode nodes, one active and one standby. While the system is running, only the active node serves requests; the standby node is responsible for synchronizing the edit log and merging it with its local fsimage file, which keeps the data stored on the two nodes consistent.
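As a minimal, illustrative sketch of this two-NameNode layout (the nameservice name mycluster, NameNode IDs nn1/nn2, and hostnames are all placeholders), here is how such a nameservice is typically declared, expressed through Hadoop's Configuration API rather than hdfs-site.xml:

```java
import org.apache.hadoop.conf.Configuration;

public class HaConfigSketch {
    public static Configuration haConf() {
        Configuration conf = new Configuration();
        // One logical nameservice backed by two NameNodes (active + standby).
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        // RPC addresses of the two NameNodes (hostnames are placeholders).
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "nn1-host:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "nn2-host:8020");
        return conf;
    }
}
```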

Shared storage

Synchronizing the edit log raises a question of its own: how frequently should changes be synchronized? If the standby had to synchronize after every single write on the active node, system performance would drop sharply; if it instead synchronized at fixed time intervals, the system would risk losing data. The second key point is therefore to design a shared storage system for the edit log: the active node is responsible for writing edit log data into this shared system, and the standby node is responsible for synchronizing the edit log from it.

Since this system stores the edit log, we need to ensure that it is itself highly available. Hadoop offers two options, QJM (Quorum Journal Manager) and NFS. I will focus on QJM; readers who want to learn about the NFS option can find other blog posts covering it. A QJM cluster consists of 2N+1 journal nodes, each of which exposes a simple RPC interface through which the NameNode can read and write edit log data. When the NameNode writes the edit log, it sends the write request to the journal nodes in the cluster; the write counts as successful for the client once N+1 of the nodes have written it successfully.
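The majority rule is easy to express in code. Below is a simplified sketch, not the actual QJM implementation (which, among other things, sends the requests to all journal nodes in parallel): a write succeeds once N+1 of the 2N+1 journal nodes acknowledge it.

```java
import java.util.List;

/** Simplified quorum-write sketch (not the real QJM code). */
public class QuorumWriteSketch {
    interface JournalNode {
        boolean write(long txId, byte[] editRecord); // true on success
    }

    /** Succeeds once a majority (N+1 of 2N+1) of journal nodes accept the record. */
    static boolean quorumWrite(List<JournalNode> journals, long txId, byte[] record) {
        int acks = 0;
        int majority = journals.size() / 2 + 1; // N+1 out of 2N+1
        for (JournalNode jn : journals) {       // real QJM fans out in parallel
            try {
                if (jn.write(txId, record)) {
                    acks++;
                }
            } catch (RuntimeException rpcFailure) {
                // A minority of failed or slow nodes does not block the write.
            }
            if (acks >= majority) {
                return true;
            }
        }
        return false;
    }
}
```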

Split brain

The architecture of the system is now taking shape: two NameNode nodes, a shared edit log storage system, the active node writing data and the standby node synchronizing the edit log. The system can serve for a long time this way, but one problem remains. Consider a scenario in which the active node's network breaks during operation. The system may mistakenly conclude that the active node can no longer serve and promote the standby node to active. But once its network recovers, the old active node can still provide service, so the system ends up with two active NameNodes, which causes all kinds of problems during operation. This condition is called split-brain, and the third key point of the HA mechanism is how to prevent it from occurring.

In QJM, the split-brain problem is solved with epoch numbers. When a NameNode becomes active, it is assigned an epoch number that is unique and higher than every epoch number held by any NameNode before it. Whenever the NameNode sends a message to a journal node, it attaches this number. When a journal node receives a message, it processes the request only if the attached epoch is not smaller than the epoch it has stored locally, and refuses the request otherwise.
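In code form, the epoch check on the journal-node side amounts to a single comparison. This is a hypothetical sketch, not the actual JournalNode source (which, for one thing, persists the promised epoch to disk):

```java
/** Hypothetical sketch of a journal node's epoch fencing check. */
public class JournalEpochSketch {
    private long promisedEpoch = 0; // highest epoch seen so far (real code persists this)

    /** A NameNode presents its epoch with every request. */
    synchronized void checkRequest(long requestEpoch) {
        if (requestEpoch < promisedEpoch) {
            // Request from a stale (previously active) NameNode: refuse it.
            throw new IllegalStateException(
                "epoch " + requestEpoch + " < promised epoch " + promisedEpoch);
        }
        // Accept the request and remember the newest epoch.
        promisedEpoch = requestEpoch;
    }
}
```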

Epoch numbers only cut off communication between the stale NameNode and the journal node cluster; clients can still talk to that NameNode. Because the NameNode cannot reach the journal nodes, it cannot actually handle client requests, so those clients are unable to operate on the system. To solve this problem, HDFS provides a fencing mechanism: when the standby node becomes active, it sends a command over ssh to kill the NameNode process on the other machine, making sure that only one active NameNode exists in the system. This resolves the split-brain problem.
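The ssh-based fencing method is configured in hdfs-site.xml; the sketch below sets the equivalent keys through the Configuration API. The key names are the standard Hadoop ones, while the private key path is a placeholder:

```java
import org.apache.hadoop.conf.Configuration;

public class FencingConfigSketch {
    public static void configureSshFencing(Configuration conf) {
        // Kill the old active NameNode's process over ssh before promoting the standby.
        conf.set("dfs.ha.fencing.methods", "sshfence");
        // Private key used to log in to the other NameNode host (path is a placeholder).
        conf.set("dfs.ha.fencing.ssh.private-key-files", "/home/hdfs/.ssh/id_rsa");
    }
}
```

Once the cluster is running, you can check which NameNode is currently active with the hdfs haadmin -getServiceState command.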

Failover

The content above only covered what to do when a NameNode can no longer serve; it did not cover how we detect that a NameNode has gone down, or how the standby node becomes the active node. This is the fourth key point of the HA mechanism: when and how to fail over.

The following sections reference https://blog.csdn.net/Androidlushangderen/article/details/53148213

To detect whether a NameNode has gone down, HDFS introduces ZooKeeper. When a NameNode successfully switches to the active state, it creates an ephemeral znode in ZooKeeper, and information about the current active node, such as its host name, is kept in this znode. When the active NameNode fails, the corresponding ephemeral znode is deleted; a monitoring program in ZooKeeper watches for this, and the znode's deletion automatically triggers the election of the next active NameNode.
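This ephemeral-znode election pattern can be sketched with the plain ZooKeeper client API. The znode path below is illustrative (the real lock znode lives under a configurable parent path such as /hadoop-ha/&lt;nameservice&gt;), and error handling is pared down:

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

/** Illustrative sketch of the ephemeral-znode election pattern (not HDFS's actual code). */
public class ActiveLockSketch {
    private static final String LOCK_PATH = "/hadoop-ha/mycluster/ActiveStandbyElectorLock";

    static void tryBecomeActive(ZooKeeper zk, byte[] myNodeInfo) throws Exception {
        try {
            // EPHEMERAL: the znode disappears automatically if this NameNode's session dies.
            zk.create(LOCK_PATH, myNodeInfo, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            // We created the lock znode: this node is now the active NameNode.
        } catch (KeeperException.NodeExistsException e) {
            // Someone else is active: watch the znode so its deletion triggers a new election.
            zk.exists(LOCK_PATH, event -> {
                if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                    // The old active is gone: re-run the election here.
                }
            });
        }
    }
}
```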

To solve the failover problem, HDFS introduces the ZKFC (ZKFailoverController) component, which is the core object of HDFS HA automatic switchover. It is the ZKFC process we usually start on each NameNode node, and inside this process run three service objects:

  • HealthMonitor: monitors whether the NameNode has become unavailable or entered an unhealthy state
  • ActiveStandbyElector: controls and monitors the node's status in ZooKeeper
  • ZKFailoverController: coordinates the HealthMonitor and ActiveStandbyElector objects, handles the change events they send, and completes the automatic switchover process

When the HealthMonitor detects that the current NameNode is unhealthy, the ZKFailoverController calls the quitElection method of the ActiveStandbyElector so the node withdraws from the active state; when the NameNode is healthy, joinElection of the ActiveStandbyElector is called to take part in the election for active NameNode. That is roughly the failover process; for a detailed source-code analysis, refer to the article above.
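A heavily simplified sketch of this coordination is shown below. The method names quitElection and joinElection appear in the actual ActiveStandbyElector; everything else (the Elector interface, the callback shape) is illustrative:

```java
/** Highly simplified sketch of ZKFC's coordination between its services (illustrative only). */
public class ZkfcSketch {
    interface Elector {
        void joinElection(byte[] nodeInfo);   // compete for the active lock in ZooKeeper
        void quitElection(boolean needFence); // drop out, optionally fencing the old active
    }

    private final Elector elector;
    private final byte[] localNodeInfo;

    ZkfcSketch(Elector elector, byte[] localNodeInfo) {
        this.elector = elector;
        this.localNodeInfo = localNodeInfo;
    }

    /** Called by the health monitor whenever the local NameNode's state changes. */
    void onHealthChange(boolean healthy) {
        if (healthy) {
            // Healthy NameNodes take part in the election.
            elector.joinElection(localNodeInfo);
        } else {
            // Unhealthy NameNodes withdraw so the other node can take over.
            elector.quitElection(true);
        }
    }
}
```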

With these four key points resolved, we have a complete picture of the HDFS HA mechanism. Its overall architecture diagram is given below:

(Figure: overall architecture of the HDFS HA mechanism)

The above is my summary of the HDFS HA mechanism; if there are any errors, please point them out!
