NameNode / YARN HA and shared DataNodes

NameNode + SecondaryNameNode: the NameNode persists its namespace as an fsimage plus an edit log, and every metadata operation is appended to the edit log. After a while the edit files become huge and need to be merged, which happens on the SecondaryNameNode (SNN): the SNN downloads the fsimage and the edits to its local disk, merges them into one full, up-to-date fsimage, and when the merge finishes ships it back to the NameNode to replace the old image.
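As a rough illustration of when that merge (a checkpoint) is triggered, here is a minimal sketch using the standard HDFS property names with their usual default values; in practice these live in hdfs-site.xml rather than code:

```java
import org.apache.hadoop.conf.Configuration;

public class CheckpointConfSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Checkpoint at least once per hour ...
        conf.set("dfs.namenode.checkpoint.period", "3600");
        // ... or sooner, once this many un-checkpointed transactions accumulate in the edit log.
        conf.set("dfs.namenode.checkpoint.txns", "1000000");
        System.out.println("checkpoint every " + conf.get("dfs.namenode.checkpoint.period") + "s");
    }
}
```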

However, once that single NameNode goes down it can no longer provide service, so two NameNodes are needed: one active and one standby.

Data synchronization: keepalived-style failover alone cannot solve the problem of keeping the two NameNodes' metadata in sync, and replicating it directly between them is troublesome, so a shared third party is used (and, as discussed below, more machines are added to make that third party reliable).

So the active NameNode's edit log is placed on the third party. The fsimage is too big to keep shipping around and updating (it has to be loaded into memory anyway), so only the edit log goes to the shared storage, where both NameNodes can reach it.

This solves data synchronization: the standby replays the shared edits to stay current; only the tail of the in-progress edit segment may not yet be synchronized. Once the active NameNode dies, the standby can take over.

But if the third party itself goes down, the NameNodes keep working while their metadata quietly stops being synchronized.

Therefore the third party is itself made a cluster, with every machine keeping a copy of the log. This relaxes consistency a little: with more machines, synchronization lags slightly, but not by much.

This log-management cluster is the quorum journal: a Quorum Journal Manager in the NameNode writing to a set of JournalNodes (ZooKeeper coordinates the rest of the HA setup, as described below, while the journal itself runs as its own small cluster).

The active NameNode still keeps its own copy of the edits locally, which is more reliable: each edit is written synchronously both to local storage and to the third party. Since the third party consists of several machines, a write is considered successful as long as a majority of the JournalNodes acknowledge it.
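A minimal sketch of how the shared edit log is wired up, assuming a nameservice called ns1, three JournalNodes on placeholder hosts jn1/jn2/jn3, and placeholder local paths; the property names are the standard HDFS ones, normally set in hdfs-site.xml:

```java
import org.apache.hadoop.conf.Configuration;

public class SharedEditsSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // The NameNode's own local copy of the namespace and edits.
        conf.set("dfs.namenode.name.dir", "/data/hadoop/name");
        // Shared edit log: the active NN writes every edit to the JournalNode quorum;
        // the write counts as successful once a majority (2 of 3 here) acknowledge it.
        conf.set("dfs.namenode.shared.edits.dir",
                 "qjournal://jn1:8485;jn2:8485;jn3:8485/ns1");
        // Where each JournalNode keeps its copy of the edits on local disk.
        conf.set("dfs.journalnode.edits.dir", "/data/hadoop/journal");
    }
}
```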

Active/standby state management and switchover are handled by ZKFC (ZooKeeper Failover Controller): a ZKFC process runs alongside each NameNode and depends on ZooKeeper, the same ZooKeeper ensemble used throughout this HA setup alongside the quorum JournalNodes.

Each ZKFC calls its local NameNode over RPC and judges its health from the return value. The active NameNode's ZKFC creates a node in ZooKeeper representing the active state. When the active hangs, the standby is not notified directly; instead that node is deleted, and because the other ZKFC is watching it through ZooKeeper's watch mechanism, it sees the deletion, transitions its NameNode from standby to active over RPC, and then registers its own lock with ZooKeeper to declare it active.
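A minimal sketch of switching this automatic failover on, assuming a placeholder ZooKeeper ensemble zk1/zk2/zk3; both properties are standard (hdfs-site.xml and core-site.xml respectively):

```java
import org.apache.hadoop.conf.Configuration;

public class AutoFailoverSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Let the ZKFCs elect the active NameNode instead of relying on manual failover.
        conf.set("dfs.ha.automatic-failover.enabled", "true");
        // ZooKeeper ensemble where the ZKFCs keep the ephemeral "active" lock node.
        conf.set("ha.zookeeper.quorum", "zk1:2181,zk2:2181,zk3:2181");
    }
}
```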

But before the standby actually becomes active, it checks whether the original active NameNode is really dead. It may only be suspended (a long pause or a network problem); if it came back there would be two actives, so the new active must first make sure the old one is truly down.

This is done by fencing: SSH to the old active and kill -9 its process, judging from the return value whether the kill succeeded. But the network may time out and no return value arrives for a long time, so as a fallback a user-supplied script force-kills (or isolates) the old node, and only then does the standby switch to active.
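A sketch of the corresponding fencing configuration; sshfence and shell(...) are the two built-in fencing methods, while the script path and SSH key below are placeholders:

```java
import org.apache.hadoop.conf.Configuration;

public class FencingConfSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Try sshfence first (SSH in and kill the old active's process);
        // if SSH hangs or fails, fall back to a user-supplied script.
        conf.set("dfs.ha.fencing.methods",
                 "sshfence\nshell(/opt/hadoop/bin/force-fence.sh)");
        // Private key the ZKFC uses for the SSH connection.
        conf.set("dfs.ha.fencing.ssh.private-key-files", "/home/hdfs/.ssh/id_rsa");
    }
}
```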


YARN, by contrast, does not need high availability so urgently, because the ResourceManager has no metadata-synchronization problem; if it hangs, it can simply be restarted.


One active ResourceManager can have multiple standbys, because no data synchronization or metadata management is involved; the active simply registers a lock with ZooKeeper to declare itself active.
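A minimal sketch of ResourceManager HA configuration with placeholder hostnames; the property names are the standard yarn-site.xml ones:

```java
import org.apache.hadoop.conf.Configuration;

public class YarnHaSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("yarn.resourcemanager.ha.enabled", "true");
        conf.set("yarn.resourcemanager.cluster-id", "yarn-cluster");
        // Unlike the HDFS NameNode pair, more ids can simply be listed here.
        conf.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
        conf.set("yarn.resourcemanager.hostname.rm1", "node03");
        conf.set("yarn.resourcemanager.hostname.rm2", "node04");
        // ZooKeeper ensemble used for the active-RM election (the "lock").
        conf.set("yarn.resourcemanager.zk-address", "zk1:2181,zk2:2181,zk3:2181");
    }
}
```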



Horizontal scaling of NameNode capacity: with a lot of data, the NameNode's memory and disk fill up, so capacity must be expanded. That does not mean adding a disk or more RAM; the practical heap ceiling seems to be around 128 GB, and in any case a single machine cannot hold it all, so the NameNode has to be scaled out horizontally.

That requires multiple NameNodes. The HA pair described above can effectively be treated as a single NameNode, because its two nodes hold exactly the same data.


If multiple NameNodes served the same data at the same time, it would fall out of sync, so HDFS is instead split by directory (partitioned), and requests for a given directory go to the NameNode responsible for it.

Multiple NameNodes arranged this way form a federation.

The federated NameNodes do not affect each other, and each of them can itself run in HA.
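A minimal federation sketch, assuming two nameservices: ns1 as an HA pair and ns2 as a single NameNode, with placeholder hostnames:

```java
import org.apache.hadoop.conf.Configuration;

public class FederationConfSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Two independent nameservices sharing one set of DataNodes.
        conf.set("dfs.nameservices", "ns1,ns2");
        // ns1 is itself an HA pair.
        conf.set("dfs.ha.namenodes.ns1", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.ns1.nn1", "node01:8020");
        conf.set("dfs.namenode.rpc-address.ns1.nn2", "node02:8020");
        // ns2 shown as a single NameNode for brevity.
        conf.set("dfs.namenode.rpc-address.ns2", "node05:8020");
    }
}
```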



A NameNode pair under HA is addressed by a logical name, such as ns1. The client accesses ns1 and resolves it from its configuration: its failover proxy provider tries the configured NameNodes until it reaches the active one (ZooKeeper is used by the ZKFCs for the election, not by the client's lookup).
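A minimal client-side sketch of this, with placeholder hostnames (node01/node02); the property names and the failover proxy provider class are the standard ones shipped with HDFS, and would normally come from hdfs-site.xml / core-site.xml rather than code:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HaClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The client only ever sees the logical nameservice, never a fixed NameNode host.
        conf.set("fs.defaultFS", "hdfs://ns1");
        conf.set("dfs.nameservices", "ns1");
        conf.set("dfs.ha.namenodes.ns1", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.ns1.nn1", "node01:8020");
        conf.set("dfs.namenode.rpc-address.ns1.nn2", "node02:8020");
        // Proxy provider that tries the listed NameNodes until it finds the active one.
        conf.set("dfs.client.failover.proxy.provider.ns1",
                 "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        // With real hosts behind the placeholders, this resolves ns1 to the active NameNode.
        try (FileSystem fs = FileSystem.get(URI.create("hdfs://ns1"), conf)) {
            System.out.println(fs.exists(new Path("/")));
        }
    }
}
```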

In a federation there are multiple NameNodes, and they are accessed through a layer on top of them: ViewFs, with paths of the form viewfs://cluster/path routed to the right nameservice.
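A minimal ViewFs sketch, reusing the nameservices ns1/ns2 from the federation example and a hypothetical mount table named cluster:

```java
import org.apache.hadoop.conf.Configuration;

public class ViewFsSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Mount table "cluster": each top-level directory is linked to one nameservice.
        conf.set("fs.viewfs.mounttable.cluster.link./user", "hdfs://ns1/user");
        conf.set("fs.viewfs.mounttable.cluster.link./logs", "hdfs://ns2/logs");
        // Clients then see one namespace, e.g. viewfs://cluster/user/alice,
        // and ViewFs routes each path to the NameNode that owns that directory.
        conf.set("fs.defaultFS", "viewfs://cluster");
        System.out.println(conf.get("fs.defaultFS"));
    }
}
```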


Federation and shared DataNodes

The DataNodes do not need to be split. Splitting them would only make sense if you wanted two separate clusters; within one federated cluster the DataNodes can be shared.

Each DataNode carries a clusterID that says which cluster it belongs to. The two NameNodes of an HA pair share the same clusterID, and the NameNodes in a federation must also share the same clusterID for the DataNodes to be shared. In addition, a DataNode groups blocks into block pools; the block pool ID embeds the owning NameNode's IP address, declaring which NameNode in the federation a block belongs to.
