The role of ZooKeeper in a Hadoop cluster (Part 1)

First, what is ZooKeeper?


As its name suggests, ZooKeeper is the zookeeper: it looks after the elephant (Hadoop), the bee (Hive), and the pig (Pig), and projects such as Apache HBase, Apache Solr, and LinkedIn's Sensei also rely on it. ZooKeeper is an open-source, distributed coordination service for distributed applications. It keeps its replicas in sync with ZAB (ZooKeeper Atomic Broadcast), a Paxos-like consensus protocol, and provides configuration maintenance, naming, distributed synchronization, and group services.
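To make "coordination service" a little more concrete: ZooKeeper exposes a small hierarchical namespace of znodes, addressed by slash-separated paths much like a file system, where each znode holds a small piece of data such as a configuration value or a server address. The sketch below models only that data model in a single process (no replication, sessions, or watches). The class and method names are hypothetical and are not the ZooKeeper client API.

```python
class ZNodeTree:
    """Toy, single-process model of ZooKeeper's hierarchical znode namespace (hypothetical API)."""

    def __init__(self):
        # Map each absolute path to its data; "/" is the root.
        self.nodes = {"/": b""}

    def create(self, path, data=b""):
        # Like ZooKeeper, a node can only be created under an existing parent.
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = data

    def get(self, path):
        return self.nodes[path]

    def set(self, path, data):
        if path not in self.nodes:
            raise KeyError(f"{path} does not exist")
        self.nodes[path] = data

    def children(self, path):
        # Direct children only, returned as bare names (not full paths).
        prefix = "/" if path == "/" else path + "/"
        names = []
        for p in self.nodes:
            rest = p[len(prefix):]
            if p.startswith(prefix) and rest and "/" not in rest:
                names.append(rest)
        return sorted(names)


tree = ZNodeTree()
tree.create("/config")
tree.create("/config/hbase", b"zk1:2181,zk2:2181")
print(tree.get("/config/hbase"))
print(tree.children("/"))
```

Using paths as names is what lets unrelated processes agree on where a piece of shared configuration lives without talking to each other directly.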

Second, how HDFS HA works

The NameNode is a single point of failure: if the NameNode becomes unavailable, the entire HDFS file system becomes unavailable. HDFS high availability (Hadoop HA) is designed to remove this single point of failure, and the solution is to run more than one NameNode in the cluster. Once there are several NameNodes, however, new problems have to be solved.
· HDFS HA must solve four problems:

  • Keeping the in-memory metadata of the NameNodes consistent, and keeping the edit log safe.
  • How multiple NameNodes coordinate with each other.
  • How clients find and reach the NameNode that is currently available.
  • How to guarantee that at any moment only one NameNode is serving clients.
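· Solutions
  • To keep the NameNode metadata consistent and the edit log safe, the edit log is written to a shared, highly available store (a quorum of JournalNodes, described below) rather than only to one NameNode's local disk.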
  • There are two NameNodes, one in the Active state and one in the Standby state, and at any point in time only the single Active NameNode serves requests. The metadata on the two NameNodes is kept in sync in near real time. When the Active NameNode fails, the ZooKeeper-based failover mechanism promptly switches the Standby NameNode to Active, and the failed NameNode is demoted to Standby.
    o The client determines which NameNode is currently serving through a client-side failover proxy that knows the addresses of both NameNodes.
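On the client side, stock HDFS clients locate the serving NameNode through a failover proxy provider configured with both NameNode addresses (for example `ConfiguredFailoverProxyProvider`): a call that reaches the Standby is rejected with a `StandbyException`, and the client retries against the other NameNode. The sketch below models only that retry loop with plain Python objects; `FakeNameNode`, `StandbyError`, and `FailoverClient` are hypothetical stand-ins, not Hadoop classes, and real clients also add retry limits and backoff.

```python
class StandbyError(Exception):
    """Raised by a NameNode that is not Active (modelled on Hadoop's StandbyException)."""


class FakeNameNode:
    """Hypothetical NameNode stub that only answers while it is Active."""

    def __init__(self, name, active):
        self.name = name
        self.active = active

    def get_file_info(self, path):
        if not self.active:
            raise StandbyError(f"{self.name} is in Standby state")
        return f"{self.name}:{path}"


class FailoverClient:
    """Tries the configured NameNodes in turn, starting from the last one that worked."""

    def __init__(self, namenodes):
        self.namenodes = list(namenodes)
        self.current = 0  # index of the NameNode tried first

    def call(self, method, *args):
        for attempt in range(len(self.namenodes)):
            idx = (self.current + attempt) % len(self.namenodes)
            try:
                result = getattr(self.namenodes[idx], method)(*args)
            except StandbyError:
                continue  # reached the Standby: fail over to the next NameNode
            self.current = idx  # remember the Active for subsequent calls
            return result
        raise RuntimeError("no Active NameNode available")


nn1 = FakeNameNode("nn1", active=False)
nn2 = FakeNameNode("nn2", active=True)
client = FailoverClient([nn1, nn2])
print(client.call("get_file_info", "/user"))
```

Remembering the last successful index means that, in the common case, each call goes straight to the Active NameNode and only pays the retry cost right after a failover.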

a · In the HDFS HA architecture there are two NameNodes: one is Active and serves clients, and the other is a hot Standby.
b · The metadata lives in two kinds of files, fsimage and edits, so backing up the metadata means backing up these two files. The Active NameNode writes its edits to the JournalNodes in real time; there are three JournalNodes (in general an odd number, at least three) so that this layer is highly available as well.
c · The Standby NameNode does not serve metadata to clients. It copies the fsimage file from the Active NameNode and the edits from the JournalNodes, then merges fsimage and edits, which is the job the SecondaryNameNode does in a non-HA cluster. The goal is to keep the metadata on the Standby NameNode consistent with the metadata on the Active NameNode, so that the Standby is a true hot backup.
d · ZooKeeper ensures that, when the Active NameNode fails, the Standby NameNode is switched to the Active state without delay.
e · ZKFC (ZKFailoverController, the failure detection and control component) is Hadoop's ZooKeeper client. A ZKFC process runs on each NameNode host; it monitors the health of its local NameNode and reports that state to the ZooKeeper ensemble by creating a znode that holds the NameNode's state information. When the Active NameNode fails, its ZKFC detects this and the corresponding znode is deleted from ZooKeeper. The ZKFC on the Standby side, which is watching that znode, sees that there is no longer an Active NameNode; it fences the old Active (for example with an ssh or shell-script fencing method), switches its own NameNode to Active, and records its own NameNode in the znode.
The znode is ephemeral: an ephemeral znode is deleted automatically when the session of the client that created it ends. Consequently, if the ZKFC process itself fails, its znode disappears as well, which also triggers a NameNode failover.
f · DataNodes send heartbeats and block reports to both NameNodes, but they accept commands (such as replicating or deleting blocks) only from the Active NameNode.
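Item b relies on a majority quorum: with three JournalNodes, an edit counts as durable once at least two of them have acknowledged it, so one JournalNode can fail without losing edits or blocking writes. The sketch below shows only that counting rule; `JournalNode` and `write_edit` are hypothetical names, and the real Quorum Journal Manager also handles writer epochs, asynchronous RPC, and recovery of partially written segments.

```python
class JournalNode:
    """Hypothetical JournalNode stub that appends edits while it is reachable."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.edits = []

    def journal(self, txid, op):
        if not self.up:
            raise ConnectionError(f"{self.name} unreachable")
        self.edits.append((txid, op))


def write_edit(journal_nodes, txid, op):
    """Send one edit to every JournalNode; succeed only if a majority acknowledged it."""
    acks = 0
    for jn in journal_nodes:
        try:
            jn.journal(txid, op)
            acks += 1
        except ConnectionError:
            pass  # tolerated as long as a majority still acknowledges
    quorum = len(journal_nodes) // 2 + 1  # 2 out of 3, 3 out of 5, ...
    if acks < quorum:
        # A failed write may still sit on a minority of nodes; the real QJM
        # reconciles such leftovers during recovery, which this sketch omits.
        raise RuntimeError(f"only {acks} of {len(journal_nodes)} acks, need {quorum}")
    return acks


jns = [JournalNode("jn1"), JournalNode("jn2"), JournalNode("jn3", up=False)]
print(write_edit(jns, 1, "mkdir /a"))
```

This is also why an odd number of JournalNodes is used: going from three to four raises the quorum from two to three without tolerating any additional failure.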
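The merge in item c (a checkpoint) amounts to replaying the edit log on top of the last namespace image and producing a new image. A toy version follows, assuming the namespace is a dict from path to "dir" or "file" and each edit is a `(txid, op, path)` tuple; that record format and the op names are invented for illustration, since the real fsimage and edits files are binary formats with many more operation types.

```python
def checkpoint(fsimage, last_txid, edits):
    """Apply edits newer than last_txid to a copy of fsimage; return (new_image, new_txid)."""
    image = dict(fsimage)  # never modify the previous image in place
    txid = last_txid
    for edit_txid, op, path in sorted(edits):  # replay in transaction-ID order
        if edit_txid <= txid:
            continue  # already contained in the image
        if op == "mkdir":
            image[path] = "dir"
        elif op == "create":
            image[path] = "file"
        elif op == "delete":
            # Remove the path and everything beneath it.
            image = {p: kind for p, kind in image.items()
                     if p != path and not p.startswith(path + "/")}
        else:
            raise ValueError(f"unknown op {op!r}")
        txid = edit_txid
    return image, txid


old_image = {"/": "dir"}
edits = [(1, "mkdir", "/a"), (2, "create", "/a/f"), (3, "create", "/b"), (4, "delete", "/a")]
new_image, new_txid = checkpoint(old_image, 0, edits)
print(new_image, new_txid)
```

Tracking the last applied transaction ID is what makes the merge safe to repeat: edits already folded into an image are skipped instead of being applied twice.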
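Items d and e describe leader election on an ephemeral znode. The single-process simulation below models only the ZooKeeper behaviour that matters here: creating a node that already exists fails, an ephemeral node disappears when its owner's session ends, and watchers are told about the deletion. `FakeZooKeeper` and `Zkfc` are hypothetical classes, "mycluster" is a placeholder nameservice, the lock path mirrors the `ActiveStandbyElectorLock` znode that Hadoop's ZKFC uses, and fencing is reduced to a comment.

```python
class NodeExistsError(Exception):
    pass


class FakeZooKeeper:
    """Single-process stand-in for the ZooKeeper features ZKFC relies on (hypothetical)."""

    def __init__(self):
        self.nodes = {}     # path -> (data, owning session)
        self.watchers = []  # callbacks fired when a node is deleted

    def create_ephemeral(self, path, data, session):
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = (data, session)

    def close_session(self, session):
        # Ephemeral nodes owned by the session are deleted and watchers notified.
        for path in [p for p, (_, s) in self.nodes.items() if s == session]:
            del self.nodes[path]
            for callback in list(self.watchers):
                callback(path)


LOCK = "/hadoop-ha/mycluster/ActiveStandbyElectorLock"  # "mycluster" is a placeholder


class Zkfc:
    def __init__(self, zk, nn_name):
        self.zk = zk
        self.nn_name = nn_name
        self.state = "standby"
        zk.watchers.append(self.on_deleted)

    def try_become_active(self):
        try:
            # The session id is simply the NameNode name in this sketch.
            self.zk.create_ephemeral(LOCK, self.nn_name, session=self.nn_name)
        except NodeExistsError:
            return False  # another NameNode holds the lock; stay Standby
        # A real ZKFC would fence the old Active here before transitioning.
        self.state = "active"
        return True

    def on_deleted(self, path):
        if path == LOCK and self.state == "standby":
            self.try_become_active()

    def fail(self):
        # The NameNode or its ZKFC dies, so its ZooKeeper session ends.
        self.state = "failed"
        self.zk.close_session(self.nn_name)


zk = FakeZooKeeper()
zkfc1, zkfc2 = Zkfc(zk, "nn1"), Zkfc(zk, "nn2")
zkfc1.try_become_active()
zkfc2.try_become_active()
zkfc1.fail()
print(zkfc2.state, zk.nodes[LOCK][0])
```

Because the lock is ephemeral, the same code path handles both a crashed NameNode and a crashed ZKFC: either way the session ends, the znode vanishes, and the watching Standby takes over.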
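Item f means a DataNode talks to both NameNodes but obeys only one. In the simplified sketch below, the DataNode sends each heartbeat to every NameNode; each reply says whether that NameNode claims to be Active, carries the transaction ID of the claim, and may include commands. The DataNode executes commands only from the NameNode it currently regards as Active and ignores an Active claim with an older transaction ID, so a stale former Active cannot take over again. All class and field names are hypothetical; in HDFS this bookkeeping lives inside the DataNode's per-block-pool service.

```python
class NameNodeStub:
    """Hypothetical NameNode that queues commands and answers heartbeats."""

    def __init__(self, name, active, txid=0):
        self.name, self.active, self.txid = name, active, txid
        self.pending = []  # commands waiting for the next heartbeat
        self.heartbeats = 0

    def heartbeat(self, dn_name):
        self.heartbeats += 1
        commands, self.pending = self.pending, []
        return {"active": self.active, "txid": self.txid, "commands": commands}


class DataNode:
    def __init__(self, name, namenodes):
        self.name = name
        self.namenodes = namenodes
        self.active_nn = None
        self.last_claim_txid = -1
        self.executed = []

    def heartbeat_all(self):
        for nn in self.namenodes:
            reply = nn.heartbeat(self.name)  # every NameNode receives the heartbeat
            if reply["active"]:
                if reply["txid"] >= self.last_claim_txid:
                    # Accept an Active claim that is at least as new as the last one.
                    self.active_nn, self.last_claim_txid = nn, reply["txid"]
            elif nn is self.active_nn:
                self.active_nn = None  # the NameNode we followed stepped down
            if nn is self.active_nn:
                self.executed.extend(reply["commands"])
            # Commands from any other NameNode are dropped.


nn1 = NameNodeStub("nn1", active=True, txid=10)
nn2 = NameNodeStub("nn2", active=False, txid=10)
nn1.pending = ["replicate blk_1"]
nn2.pending = ["delete blk_2"]
dn = DataNode("dn1", [nn1, nn2])
dn.heartbeat_all()
print(dn.executed)
```

Sending heartbeats and block reports to both NameNodes is what keeps the Standby's block map warm, so that after a failover it can serve requests almost immediately.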


Source: www.cnblogs.com/dangyichen/p/12066565.html