Introduction to HDFS and YARN HA

HDFS:


Infrastructure
1. NameNode (Master)

1) Namespace management: the namespace supports basic file system operations on the directories, files, and blocks in HDFS, such as creating, modifying, deleting, and listing files and directories.

2) Block storage management.
Running two NameNodes, one Active and one Standby, removes the single point of failure. The two nodes share state through the JournalNodes, and ZKFC monitors their health, elects the Active node, and fails over automatically.

1. Active NameNode

The Active NameNode accepts and processes client RPC requests, writes each change to its own Editlog and to the Editlog on shared storage, and receives block reports, block location updates, and heartbeats from the DataNodes.

2. Standby NameNode

The Standby NameNode also receives block reports, block location updates, and heartbeats from the DataNodes. It reads the Editlog on shared storage and replays the logged operations, so the metadata it holds (namespace information plus the block location map) stays synchronized with the metadata in the Active NameNode. The Standby is therefore a hot standby: once it is switched to Active mode, it can provide NameNode services immediately.
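The way the two NameNodes stay in sync through the shared Editlog can be illustrated with a minimal sketch. The class and method names below (SharedEditLog, NameNode, handle_client_request, tail_edits) are hypothetical and are not Hadoop APIs; the sketch only models a namespace as a set of paths and ignores block reports and heartbeats.

```python
from dataclasses import dataclass, field


@dataclass
class SharedEditLog:
    """Stand-in for the shared Editlog storage (hypothetical, not a Hadoop class)."""
    entries: list = field(default_factory=list)

    def append(self, op):
        self.entries.append(op)
        return len(self.entries)  # transaction id of the entry just written


@dataclass
class NameNode:
    name: str
    namespace: set = field(default_factory=set)
    last_applied_txid: int = 0

    def apply(self, op):
        kind, path = op
        if kind == "create":
            self.namespace.add(path)
        elif kind == "delete":
            self.namespace.discard(path)

    def handle_client_request(self, op, shared):
        """Active role: log the operation to shared storage, then apply it."""
        self.last_applied_txid = shared.append(op)
        self.apply(op)

    def tail_edits(self, shared):
        """Standby role: replay every logged operation not yet applied."""
        for op in shared.entries[self.last_applied_txid:]:
            self.apply(op)
        self.last_applied_txid = len(shared.entries)


shared = SharedEditLog()
active = NameNode("nn1")
standby = NameNode("nn2")

for op in [("create", "/a"), ("create", "/b"), ("delete", "/a")]:
    active.handle_client_request(op, shared)

standby.tail_edits(shared)
print(standby.namespace == active.namespace)
```

Because the Standby has already replayed the log, it holds the same namespace as the Active node at the moment of a switch, which is what makes it a hot standby.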

3. JournalNode

JournalNodes carry the Editlog data that the Standby NameNode synchronizes from the Active NameNode. They are deployed as a group, and the group has an odd number of nodes.
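The reason for an odd group size follows from majority arithmetic: an edit counts as written once a majority of JournalNodes acknowledge it, so adding a fourth node to a group of three raises the majority without raising the number of failures the group can survive. The helper names below are hypothetical and only illustrate the arithmetic.

```python
def majority(num_journal_nodes):
    """Smallest number of nodes that forms a majority of the group."""
    return num_journal_nodes // 2 + 1


def tolerated_failures(num_journal_nodes):
    """How many nodes can fail while a majority is still reachable."""
    return num_journal_nodes - majority(num_journal_nodes)


def write_committed(acks, num_journal_nodes):
    """An edit is durable once a majority of the group has acknowledged it."""
    return acks >= majority(num_journal_nodes)


for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
```

Groups of three and four nodes both tolerate a single failure, while five nodes tolerate two, so the even-sized group adds cost without adding fault tolerance.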

4. ZKFC

ZKFC (ZKFailoverController) monitors the NameNode process and triggers failover automatically.
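A rough sketch of the monitoring and election loop, assuming a simple exclusive lock in place of ZooKeeper. ElectionLock and FailoverController are hypothetical names, and the sketch leaves out parts of a real failover such as fencing the previous Active node.

```python
class ElectionLock:
    """In-memory stand-in for the lock held in ZooKeeper (hypothetical)."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, candidate):
        if self.holder is None:
            self.holder = candidate
        return self.holder == candidate

    def release(self, candidate):
        if self.holder == candidate:
            self.holder = None


class FailoverController:
    """One controller runs next to each NameNode and decides its role."""

    def __init__(self, node_name, lock):
        self.node_name = node_name
        self.lock = lock
        self.state = "standby"

    def tick(self, healthy):
        """One monitoring round: check health, then join or leave the election."""
        if not healthy:
            # An unhealthy node gives up the lock so the peer can take over.
            self.lock.release(self.node_name)
            self.state = "standby"
        elif self.lock.try_acquire(self.node_name):
            self.state = "active"
        else:
            self.state = "standby"
        return self.state


lock = ElectionLock()
zkfc1 = FailoverController("nn1", lock)
zkfc2 = FailoverController("nn2", lock)

states = [zkfc1.tick(healthy=True), zkfc2.tick(healthy=True)]
# nn1 becomes unhealthy; on the next round nn2 wins the election.
states += [zkfc1.tick(healthy=False), zkfc2.tick(healthy=True)]
print(states)
```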

YARN:



Infrastructure
1. ResourceManager (RM)

The ResourceManager receives client task requests, receives and monitors resource status reports from the NodeManagers (NM), is responsible for resource allocation and scheduling, and starts and monitors the ApplicationMaster (AM).

2. NodeManager (NM)

The NodeManager manages the resources on its node, starts Containers to run task computations, reports resource and Container status to the RM, and reports task processing status to the AM.

3. ApplicationMaster (AM)

The ApplicationMaster manages and schedules the tasks of a single Application (Job), requests resources from the RM, sends Container launch commands to the NM, and receives task processing status from the NM.
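The division of work between the three roles can be sketched as follows. This is a toy model under stated assumptions: every class, method, and the "slot" unit of capacity is hypothetical, and real YARN scheduling, heartbeats, and status reporting are far richer.

```python
class NodeManager:
    def __init__(self, name, total_slots):
        self.name = name
        self.free_slots = total_slots
        self.running = []

    def report(self):
        """Resource status report sent to the ResourceManager."""
        return {"node": self.name, "free_slots": self.free_slots}

    def launch_container(self, task):
        """Start a Container for one task, as requested by an ApplicationMaster."""
        self.free_slots -= 1
        self.running.append(task)


class ResourceManager:
    def __init__(self, node_managers):
        self.node_managers = node_managers

    def allocate(self, count):
        """Grant up to `count` containers on nodes that report free capacity."""
        free = {nm.name: nm.report()["free_slots"] for nm in self.node_managers}
        grants = []
        for nm in self.node_managers:
            while free[nm.name] > 0 and len(grants) < count:
                grants.append(nm)
                free[nm.name] -= 1
        return grants


class ApplicationMaster:
    def __init__(self, tasks, rm):
        self.tasks = tasks
        self.rm = rm

    def run(self):
        """Ask the RM for resources, then tell each granted NM to launch a task."""
        grants = self.rm.allocate(len(self.tasks))
        for task, nm in zip(self.tasks, grants):
            nm.launch_container(task)
        return len(grants)


nm1, nm2 = NodeManager("nm1", 2), NodeManager("nm2", 1)
rm = ResourceManager([nm1, nm2])
am = ApplicationMaster(["t1", "t2", "t3", "t4"], rm)
launched = am.run()
print(launched, nm1.running, nm2.running)
```

With three slots in the cluster and four tasks, only three containers are launched; in this sketch the fourth task simply waits, whereas a real ApplicationMaster would keep requesting resources.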

4. Web Application Proxy

The Web Application Proxy reduces the risk of web-based attacks through YARN. By default it runs as part of the ResourceManager, and it can also be configured as an independent process. Users treat the ResourceManager web interface, and the links it shows, as trusted. An ApplicationMaster, however, runs as an untrusted user, so the links it provides to the ResourceManager may be untrusted. The Web Application Proxy mitigates the risk posed by such links.

5. Job History Server

When a NodeManager starts, it initializes the LogAggregationService, which collects the logs of the containers run on that machine and, when a container ends, stores them in the configured HDFS directory. The ApplicationMaster writes job history information to a temporary job history directory in HDFS and moves it to the final directory when the job ends, which supports job recovery. The History Server starts web and RPC services, and users can obtain job information through web pages or RPC.

HA Architecture
ResourceManager HA consists of a pair of Active and Standby nodes, which persist the RM's internal state and the data and tokens of applications through RMStateStore. The available RMStateStore implementations are the memory-based MemoryRMStateStore, the file-system-based FileSystemRMStateStore, and the ZooKeeper-based ZKRMStateStore. The architecture of ResourceManager HA is basically the same as that of NameNode HA: state is shared through RMStateStore, and the role played by ZKFC is embedded in the ResourceManager process as a service instead of running as an independent process.
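The idea of recovering state through a pluggable store can be sketched as below. StateStore, InMemoryStateStore, and ResourceManagerNode are hypothetical names that do not mirror the real RMStateStore API, and sharing one in-memory object between two nodes only works here because both run in the same process; it stands in for a store that both ResourceManagers can reach.

```python
from abc import ABC, abstractmethod


class StateStore(ABC):
    """Minimal interface that any backing store would implement (assumed shape)."""

    @abstractmethod
    def store_application(self, app_id, state):
        ...

    @abstractmethod
    def load_applications(self):
        ...


class InMemoryStateStore(StateStore):
    def __init__(self):
        self._apps = {}

    def store_application(self, app_id, state):
        self._apps[app_id] = dict(state)

    def load_applications(self):
        return {app_id: dict(state) for app_id, state in self._apps.items()}


class ResourceManagerNode:
    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.apps = {}
        self.active = False

    def transition_to_active(self):
        """On becoming Active, rebuild application state from the store."""
        self.apps = self.store.load_applications()
        self.active = True

    def submit(self, app_id, state):
        if not self.active:
            raise RuntimeError(f"{self.name} is standby and cannot accept applications")
        self.apps[app_id] = dict(state)
        self.store.store_application(app_id, state)


store = InMemoryStateStore()
rm1 = ResourceManagerNode("rm1", store)
rm2 = ResourceManagerNode("rm2", store)

rm1.transition_to_active()
rm1.submit("app_1", {"user": "alice", "status": "RUNNING"})

# rm1 fails; rm2 takes over and recovers what rm1 had persisted.
rm1.active = False
rm2.transition_to_active()
print(rm2.apps)
```

Swapping in a file-system or ZooKeeper backed implementation would only require another subclass of the same interface, which is the point of keeping the store pluggable.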

In principle, the DataNodes and NodeManagers in the cluster run on the same machines.

