1. Overview of Distributed Computing
- Computing and Distributed Computing
- Computing: processing data, using statistical analysis and other means, to obtain the required results
- Distributed computing: multiple servers working together to complete one computing task
- Two working modes of distributed computing
- Scatter → aggregate (MapReduce)
- Central scheduling → step-by-step execution (Spark, Flink)
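As a rough single-machine illustration of the scatter → aggregate mode, the Python sketch below splits a list of numbers into chunks, lets each simulated "server" compute a partial sum, and then aggregates the partial sums. Threads stand in for servers only to keep the example self-contained (an assumption; real frameworks ship the chunks to different machines), and the names scatter, local_compute and scatter_aggregate are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter(data, n_servers):
    """Split the input into at most n_servers contiguous chunks."""
    if not data:
        return []
    size = -(-len(data) // n_servers)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def local_compute(chunk):
    """Work one server does on its own chunk: here, a partial sum."""
    return sum(chunk)


def scatter_aggregate(data, n_servers=3):
    chunks = scatter(data, n_servers)
    # Threads stand in for separate servers in this sketch.
    with ThreadPoolExecutor(max_workers=n_servers) as pool:
        partials = list(pool.map(local_compute, chunks))
    # Aggregate: merge the partial results into the final answer.
    return sum(partials)


if __name__ == "__main__":
    print(scatter_aggregate(list(range(1, 101))))  # 5050
```

The central-scheduling mode differs in that a scheduler drives a chain of steps rather than one scatter followed by one aggregate.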
2. Overview of MapReduce
- MapReduce
- The distributed computing component of Hadoop
- Works in the scatter → aggregate mode
- Main interfaces
- map interface: the "scatter" function
- reduce interface: the "aggregate" function
- How it runs
- The job is broken down into multiple Map Tasks and Reduce Tasks
- The Map Tasks and Reduce Tasks are assigned to the corresponding servers for execution
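The map interface, the reduce interface and the decomposition into Map Tasks and Reduce Tasks can be sketched with the classic word count. This is a plain-Python simulation, not Hadoop's Java Mapper/Reducer API, and map_fn, reduce_fn, partition and run_job are hypothetical names. Keys are routed to Reduce Tasks by hashing, similar in spirit to Hadoop's default hash partitioning; zlib.crc32 is used instead of Python's hash() because the latter varies between runs for strings.

```python
import zlib
from collections import defaultdict


def map_fn(line):
    """'Scatter' step: emit (word, 1) for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(word, counts):
    """'Aggregate' step: combine all values emitted for one key."""
    return word, sum(counts)


def partition(key, num_reducers):
    """Pick the Reduce Task for a key; crc32 keeps the choice deterministic."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(lines, num_maps=2, num_reducers=2):
    # 1. Decompose the input into splits (dealt round-robin), one per Map Task.
    splits = [lines[i::num_maps] for i in range(num_maps)]
    # 2. Run every Map Task and shuffle its output to the Reduce Tasks.
    reducer_inputs = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for line in split:
            for key, value in map_fn(line):
                reducer_inputs[partition(key, num_reducers)][key].append(value)
    # 3. Run every Reduce Task on the keys it received.
    result = {}
    for grouped in reducer_inputs:
        for key, values in grouped.items():
            word, total = reduce_fn(key, values)
            result[word] = total
    return result


if __name__ == "__main__":
    text = ["hello hadoop", "hello yarn", "hadoop mapreduce"]
    print(sorted(run_job(text).items()))
    # [('hadoop', 2), ('hello', 2), ('mapreduce', 1), ('yarn', 1)]
```

Because every occurrence of a word is routed to the same Reduce Task, each reduce call sees all the counts for its key, no matter which Map Task produced them.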
3. Overview of YARN
- YARN
- A component of Hadoop
- Schedules resources for the cluster
- Relationship between MapReduce and YARN
- YARN schedules resources: it allocates and manages the runtime resources that MapReduce programs use
- MapReduce needs YARN in order to run
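One way to picture YARN allocating running resources for MapReduce is the simplified model below: each Map Task needs one container, and a task can only run on a container YARN has granted, so a job with more tasks than granted containers runs in several waves, and a job with no containers cannot start at all. The strict wave grouping and the name plan_waves are assumptions for illustration; real YARN hands out and reclaims containers individually.

```python
def plan_waves(num_tasks, containers_granted):
    """Group task ids into waves that each fit in the granted containers."""
    if containers_granted <= 0:
        raise ValueError("no container granted: the job cannot start yet")
    tasks = list(range(num_tasks))
    return [tasks[i:i + containers_granted]
            for i in range(0, num_tasks, containers_granted)]


if __name__ == "__main__":
    # 5 Map Tasks but only 2 containers granted -> 3 waves.
    print(plan_waves(5, 2))  # [[0, 1], [2, 3], [4]]
```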
4. YARN Architecture
4.1 Core Architecture
- Core roles
- Master: ResourceManager
- Worker: NodeManager
- Responsibilities
- ResourceManager: manages, coordinates and allocates the resources of the entire cluster
- NodeManager: manages and allocates the resources of a single server; it creates and manages containers, and each container provides resources for a running program
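The split of responsibilities between the two roles can be modelled in a few lines: each NodeManager knows only its own server's memory and vcores and launches containers on it, while the ResourceManager sees all NodeManagers and decides where each container request goes. The class layout, the "most free memory" placement rule and the node sizes in the usage example are hypothetical; YARN's real schedulers are far more involved.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """One server: its total resources and the containers it has launched."""
    host: str
    memory_mb: int
    vcores: int
    containers: list = field(default_factory=list)  # (memory_mb, vcores) pairs

    def free(self):
        """Return (free memory in MB, free vcores) on this server."""
        used_mem = sum(mem for mem, _ in self.containers)
        used_cpu = sum(cpu for _, cpu in self.containers)
        return self.memory_mb - used_mem, self.vcores - used_cpu

    def launch_container(self, memory_mb, vcores):
        self.containers.append((memory_mb, vcores))


class ResourceManager:
    """Cluster-wide view: decides which NodeManager hosts each container."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, memory_mb, vcores):
        fits = [n for n in self.nodes
                if n.free()[0] >= memory_mb and n.free()[1] >= vcores]
        if not fits:
            return None  # the request has to wait until resources are freed
        # Toy placement policy (assumption): most free memory wins;
        # ties go to the node listed first.
        node = max(fits, key=lambda n: n.free()[0])
        node.launch_container(memory_mb, vcores)
        return node.host


if __name__ == "__main__":
    rm = ResourceManager([NodeManager("node1", 4096, 4),
                          NodeManager("node2", 8192, 8),
                          NodeManager("node3", 8192, 8)])
    print([rm.allocate(2048, 1) for _ in range(4)])
```

With these hypothetical sizes, four 2048 MB requests alternate between node2 and node3, because node1 never has the most free memory.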
4.2 Auxiliary Architecture
- ProxyServer: secures access to the web UI
- JobHistoryServer: records the running information and logs of programs that have already run
5. Deployment of MapReduce & YARN
5.1 Cluster Planning
- node1: ResourceManager, NodeManager, ProxyServer, JobHistoryServer
- node2: NodeManager
- node3: NodeManager
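The cluster plan is just a host-to-roles mapping, so it can be written down as data and queried. The sketch below (PLAN, hosts_running and workers_file are hypothetical names) derives the host list for Hadoop's $HADOOP_HOME/etc/hadoop/workers file, which start-yarn.sh reads to decide where to start NodeManagers; since all three nodes run a NodeManager, all three are listed.

```python
# Role placement from the cluster plan above.
PLAN = {
    "node1": ["ResourceManager", "NodeManager", "ProxyServer", "JobHistoryServer"],
    "node2": ["NodeManager"],
    "node3": ["NodeManager"],
}


def hosts_running(role, plan=PLAN):
    """Return the hosts that run a given role, in plan order."""
    return [host for host, roles in plan.items() if role in roles]


def workers_file(plan=PLAN):
    """Render a workers file: one NodeManager host per line."""
    return "\n".join(hosts_running("NodeManager", plan)) + "\n"


if __name__ == "__main__":
    # Sanity check: exactly one ResourceManager in the plan.
    assert len(hosts_running("ResourceManager")) == 1
    print(workers_file(), end="")
```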
5.2 MapReduce & YARN configuration files
- In the $HADOOP_HOME/etc/hadoop directory, modify the following files:
- mapred-env.sh file
- mapred-site.xml file
- yarn-env.sh file
- yarn-site.xml file
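Both *-site.xml files use Hadoop's layout of a <configuration> root holding <property> elements, each with a <name> and a <value>. The Python sketch below renders a minimal version of each file from a dict. The property names are standard Hadoop settings and the values follow the cluster plan (all services on node1), except the ProxyServer port 8089, which is an assumption. A real deployment usually sets more properties, and mapred-env.sh and yarn-env.sh (shell scripts that typically export variables such as JAVA_HOME) are not covered here; render_site_xml is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# mapred-site.xml: run MapReduce on YARN; JobHistoryServer on node1
# (10020 and 19888 are Hadoop's default JobHistory ports).
MAPRED_SITE = {
    "mapreduce.framework.name": "yarn",
    "mapreduce.jobhistory.address": "node1:10020",
    "mapreduce.jobhistory.webapp.address": "node1:19888",
}

# yarn-site.xml: ResourceManager and ProxyServer on node1 (port 8089 assumed).
YARN_SITE = {
    "yarn.resourcemanager.hostname": "node1",
    "yarn.nodemanager.aux-services": "mapreduce_shuffle",
    "yarn.web-proxy.address": "node1:8089",
    "yarn.log-aggregation-enable": "true",
}


def render_site_xml(props):
    """Render a dict as <configuration><property><name/><value/>... XML."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_site_xml(YARN_SITE))
```

Of these settings, mapreduce.framework.name=yarn is the one that makes MapReduce jobs run on YARN rather than locally.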
5.3 Distribute the configuration files
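Distributing means copying the files modified on node1 to node2 and node3 so that every node runs with the same configuration. A minimal sketch, assuming $HADOOP_HOME is the same path on all nodes and the hadoop user has passwordless SSH between them: scp_commands and distribute are hypothetical helpers, the /opt/hadoop path in the example is a placeholder, and the default dry run only prints the scp commands instead of running them.

```python
import shlex
import subprocess

CONF_FILES = ["mapred-env.sh", "mapred-site.xml", "yarn-env.sh", "yarn-site.xml"]


def scp_commands(conf_dir, hosts, files=CONF_FILES):
    """Build one scp command per target host (nothing is executed here)."""
    sources = [f"{conf_dir}/{name}" for name in files]
    return [["scp", *sources, f"{host}:{conf_dir}/"] for host in hosts]


def distribute(conf_dir, hosts, dry_run=True):
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    for cmd in scp_commands(conf_dir, hosts):
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder path: substitute the real $HADOOP_HOME/etc/hadoop.
    distribute("/opt/hadoop/etc/hadoop", ["node2", "node3"])
```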
5.4 Cluster start and stop commands
- Start the whole YARN cluster with one command: $HADOOP_HOME/sbin/start-yarn.sh
- Stop the whole YARN cluster with one command: $HADOOP_HOME/sbin/stop-yarn.sh
5.5 Start the YARN cluster
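On the node1 server, run as the hadoop user:
- start-yarn.sh (starts the ResourceManager, the NodeManagers and, if configured, the ProxyServer)
- mapred --daemon start historyserver (starts the JobHistoryServer)
Check how YARN is running in the ResourceManager web UI: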
- http://node1:8088
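Besides the web page, the ResourceManager serves a REST API on the same port. The sketch below reads two of its endpoints, /ws/v1/cluster/info and /ws/v1/cluster/nodes, and reports the ResourceManager state and which NodeManagers are RUNNING; after a healthy start all three nodes should appear. fetch_json, summarize and check_cluster are hypothetical helpers, and only the fields read here are assumed to be present in the responses.

```python
import json
import urllib.request

RM_URL = "http://node1:8088"  # ResourceManager web UI address from the text


def fetch_json(path, base=RM_URL, timeout=5):
    """GET a ResourceManager REST endpoint and decode its JSON body."""
    req = urllib.request.Request(base + path,
                                 headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def summarize(info, nodes):
    """Return (RM state, sorted hosts whose NodeManager reports RUNNING)."""
    state = info["clusterInfo"]["state"]
    node_list = (nodes.get("nodes") or {}).get("node", [])
    running = sorted(n["nodeHostName"] for n in node_list
                     if n.get("state") == "RUNNING")
    return state, running


def check_cluster():
    """Query the live cluster; needs network access to node1:8088."""
    return summarize(fetch_json("/ws/v1/cluster/info"),
                     fetch_json("/ws/v1/cluster/nodes"))
```

Calling check_cluster() from any machine that can reach node1 gives a scriptable version of the check done by eye in the browser.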
6. First experience with MapReduce & YARN
6.1 Cluster start and stop commands
- Start
- start-yarn.sh
- mapred --daemon start historyserver
- Stop
- stop-yarn.sh
- mapred --daemon stop historyserver
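A quick way to confirm the start commands worked is the JDK's jps tool, which lists running Java processes as "<pid> <name>". The sketch below compares jps output on node1 against the daemons the cluster plan puts there. It assumes the ProxyServer shows up as WebAppProxyServer and the other daemons under their role names; the helper names are hypothetical. On node2 and node3 only NodeManager is expected.

```python
import subprocess

# Expected on node1 per the cluster plan; the jps names are assumed to be the
# daemons' main-class names (WebAppProxyServer is the ProxyServer).
EXPECTED_ON_NODE1 = {"ResourceManager", "NodeManager",
                     "WebAppProxyServer", "JobHistoryServer"}


def running_daemons(jps_output):
    """Parse jps output (one "<pid> <name>" per line) into a set of names."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output, expected=EXPECTED_ON_NODE1):
    """Sorted list of expected daemons that do not appear in the output."""
    return sorted(expected - running_daemons(jps_output))


def check_node1():
    """Run jps locally; meant to be executed on node1 as the hadoop user."""
    out = subprocess.run(["jps"], capture_output=True, text=True,
                         check=True).stdout
    return missing_daemons(out)
```

An empty list from check_node1() means every planned daemon on node1 is up; after the stop commands, all four should be reported missing.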
6.2 Submit MapReduce tasks to YARN for execution
(Omitted.)