Dark Horse Programmer, Big Data from Introduction to Practice: Introduction to MapReduce & YARN

1. Overview of Distributed Computing

  1. Computing and distributed computing
  • Computing: processing data with statistical analysis and similar means to obtain the required results
  • Distributed computing: multiple servers working together to complete one computing task
  2. Two working modes of distributed computing
  • Scatter → aggregate (MapReduce)
  • Central scheduling → step-by-step execution (Spark, Flink)
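The scatter → aggregate mode can be illustrated with a minimal Python sketch, assuming that threads in a single process stand in for separate servers: the input is split into chunks, each worker sums its own chunk, and the partial results are combined. The function names are hypothetical and not part of any Hadoop API.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter(data, n_workers):
    """Split the input into up to n_workers roughly equal chunks (the 'scatter' step)."""
    size = max(1, (len(data) + n_workers - 1) // n_workers)  # ceiling division, never 0
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_sum(chunk):
    """Work done independently by one simulated 'server'."""
    return sum(chunk)


def scatter_aggregate_sum(data, n_workers=3):
    chunks = scatter(data, n_workers)
    # Each thread plays the role of one server computing on its own chunk.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # The 'aggregate' step: combine the partial results into the final answer.
    return sum(partials), partials


if __name__ == "__main__":
    total, partials = scatter_aggregate_sum(list(range(1, 101)), n_workers=3)
    print(partials, total)  # prints [595, 1751, 2704] 5050
```

In a real cluster the chunks would live on different machines and only the small partial results would travel over the network, which is what makes this mode scale.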

2. Overview of MapReduce

  1. MapReduce
  • The distributed computing component of Hadoop
  • Works in the scatter → aggregate mode
  2. Main interfaces
  • map interface: the "scatter" function
  • reduce interface: the "aggregate" function
  3. Operating mechanism
  • A job is decomposed into multiple Map Tasks and Reduce Tasks
  • The Map Tasks and Reduce Tasks are assigned to servers in the cluster for execution
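To make the map and reduce interfaces concrete, here is a single-process Python sketch of a word count, assuming each input split becomes one Map Task and keys are hash-partitioned across Reduce Tasks. It only mimics the data flow; real Hadoop jobs are written against Hadoop's Mapper and Reducer classes and run distributed across the cluster. All names below are hypothetical.

```python
from collections import defaultdict


def map_task(split):
    """Map ('scatter'): emit a (word, 1) pair for every word in one input split."""
    for line in split:
        for word in line.split():
            yield word, 1


def partition(key, num_reducers):
    """Decide which Reduce Task receives a key (hash partitioning)."""
    return hash(key) % num_reducers


def reduce_task(grouped):
    """Reduce ('aggregate'): sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def run_job(splits, num_reducers=2):
    # One Map Task per input split; each output pair is routed to a reducer partition.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for word, one in map_task(split):
            partitions[partition(word, num_reducers)][word].append(one)
    # One Reduce Task per partition; merge their outputs into the final result.
    result = {}
    for grouped in partitions:
        result.update(reduce_task(grouped))
    return result


if __name__ == "__main__":
    splits = [["hello hadoop", "hello yarn"], ["hadoop mapreduce"]]
    print(sorted(run_job(splits).items()))
    # prints [('hadoop', 2), ('hello', 2), ('mapreduce', 1), ('yarn', 1)]
```

Because every occurrence of a word hashes to the same partition, each Reduce Task sees all counts for its words, so the per-partition results can simply be merged.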

3. Overview of YARN

  1. YARN
  • A component of Hadoop
  • Handles resource scheduling for the cluster
  2. Relationship between MapReduce and YARN
  • YARN schedules the cluster's resources, allocating and managing the resources that MapReduce programs run on
  • MapReduce relies on YARN to run

4. YARN Architecture

4.1 Core Architecture

  1. Core architecture roles
  • Master: ResourceManager
  • Worker (slave): NodeManager
  2. Functions
  • ResourceManager: manages, coordinates and allocates the resources of the entire cluster
  • NodeManager: manages and allocates the resources of a single server by creating and managing containers; each container provides resources to a running program
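The master/worker split can be pictured with a toy Python model. This is not YARN's actual scheduler: the class names mirror the two roles, while the placement policy (most free memory first) and the resource figures are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """One server: tracks its own free memory and CPU cores."""
    host: str
    free_mem_mb: int
    free_vcores: int
    containers: list = field(default_factory=list)

    def launch_container(self, task, mem_mb, vcores):
        # A container reserves part of this server's resources for one task.
        self.free_mem_mb -= mem_mb
        self.free_vcores -= vcores
        self.containers.append(task)


class ResourceManager:
    """Cluster-wide view: picks a NodeManager with enough free resources."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, task, mem_mb, vcores):
        candidates = [n for n in self.nodes
                      if n.free_mem_mb >= mem_mb and n.free_vcores >= vcores]
        if not candidates:
            return None  # no capacity; a real scheduler would queue the request
        # Hypothetical policy: place the container on the node with the most free memory.
        node = max(candidates, key=lambda n: n.free_mem_mb)
        node.launch_container(task, mem_mb, vcores)
        return node.host


if __name__ == "__main__":
    nodes = [NodeManager(h, 4096, 4) for h in ("node1", "node2", "node3")]
    rm = ResourceManager(nodes)
    print([rm.allocate(f"map-{i}", 2048, 1) for i in range(4)])
    # prints ['node1', 'node2', 'node3', 'node1']
```

The point of the split is that only the ResourceManager needs the global picture, while each NodeManager just enforces the containers placed on its own machine.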

4.2 Auxiliary Architecture

  • ProxyServer: a web application proxy that makes access to the web UI more secure
  • JobHistoryServer: records the run history and logs of completed programs

5. Deployment of MapReduce & YARN

5.1 Cluster Planning

  • node1: ResourceManager, NodeManager, ProxyServer, JobHistoryServer
  • node2: NodeManager
  • node3: NodeManager
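start-yarn.sh starts a NodeManager on every host listed in $HADOOP_HOME/etc/hadoop/workers (the file is named slaves in Hadoop 2.x), so this plan implies that the workers file lists all three nodes. The small Python sketch below, assuming the workers file from the earlier HDFS setup is reused, derives it from the plan; the helper names are hypothetical.

```python
CLUSTER_PLAN = {
    "node1": ["ResourceManager", "NodeManager", "ProxyServer", "JobHistoryServer"],
    "node2": ["NodeManager"],
    "node3": ["NodeManager"],
}


def hosts_with(role, plan=CLUSTER_PLAN):
    """Return the hosts assigned a given role, in plan order."""
    return [host for host, roles in plan.items() if role in roles]


def workers_file(plan=CLUSTER_PLAN):
    """Content for the workers file: one NodeManager host per line."""
    return "\n".join(hosts_with("NodeManager", plan)) + "\n"


if __name__ == "__main__":
    print(workers_file(), end="")  # node1, node2, node3 on separate lines
```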

5.2 MapReduce & YARN configuration files

  1. In the $HADOOP_HOME/etc/hadoop folder, modify the following files:
  • mapred-env.sh file
  • mapred-site.xml file
  • yarn-env.sh file
  • yarn-site.xml file
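The screenshots of the file contents did not survive, so here is a hedged Python sketch that renders the two XML files. The property names are standard Hadoop keys; the values (for example proxy port 8089 and history server ports 10020/19888) are assumptions chosen to match the cluster plan, not necessarily what the original notes used. The two *-env.sh files are shell scripts of environment variables and are not sketched here.

```python
import xml.etree.ElementTree as ET

# Standard Hadoop property names; host/port values are assumptions for this plan.
MAPRED_SITE = {
    "mapreduce.framework.name": "yarn",  # run MapReduce jobs on YARN
    "mapreduce.jobhistory.address": "node1:10020",
    "mapreduce.jobhistory.webapp.address": "node1:19888",
}

YARN_SITE = {
    "yarn.resourcemanager.hostname": "node1",
    "yarn.nodemanager.aux-services": "mapreduce_shuffle",  # shuffle service for MapReduce
    "yarn.web-proxy.address": "node1:8089",  # assumed ProxyServer port
    "yarn.log-aggregation-enable": "true",
    "yarn.log.server.url": "http://node1:19888/jobhistory/logs",
}


def to_hadoop_xml(props):
    """Render a dict as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


if __name__ == "__main__":
    print(to_hadoop_xml(MAPRED_SITE))
    print(to_hadoop_xml(YARN_SITE))
```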

5.3 Distribute the configuration files

Copy the modified configuration files from node1 to the same folder on node2 and node3.
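Distribution can be sketched in Python as a list of scp commands, one per target host, assuming passwordless SSH between the nodes and the same $HADOOP_HOME path on every node. It defaults to a dry run that only prints the commands; the function names and the /opt/hadoop fallback path are hypothetical.

```python
import os
import subprocess

CONF_FILES = ["mapred-env.sh", "mapred-site.xml", "yarn-env.sh", "yarn-site.xml"]


def scp_commands(conf_dir, targets=("node2", "node3")):
    """Build one scp command per target host, copying into the same directory."""
    sources = [os.path.join(conf_dir, name) for name in CONF_FILES]
    return [["scp", *sources, f"{host}:{conf_dir}/"] for host in targets]


def distribute(conf_dir, targets=("node2", "node3"), dry_run=True):
    """Print the commands; only execute them when dry_run is False."""
    for cmd in scp_commands(conf_dir, targets):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError if a copy fails


if __name__ == "__main__":
    hadoop_home = os.environ.get("HADOOP_HOME", "/opt/hadoop")  # fallback is an assumption
    distribute(os.path.join(hadoop_home, "etc", "hadoop"))
```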

5.4 Introduction to cluster startup commands

  • One-click start of the YARN cluster: $HADOOP_HOME/sbin/start-yarn.sh (starts the ResourceManager, the NodeManagers and, if configured, the ProxyServer)
  • One-click stop of the YARN cluster: $HADOOP_HOME/sbin/stop-yarn.sh

5.5 Start the YARN cluster

On the node1 server, run the following as the hadoop user:

  • start-yarn.sh
  • mapred --daemon start historyserver
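One way to confirm the daemons came up is the JDK's jps tool, which prints one "<pid> <MainClass>" line per Java process. The Python sketch below compares that output with the cluster plan; the ProxyServer appears in jps as WebAppProxyServer. The helper names are hypothetical.

```python
import shutil
import subprocess

# Java main-class names that jps reports for each planned YARN-side daemon.
EXPECTED = {
    "node1": {"ResourceManager", "NodeManager", "WebAppProxyServer", "JobHistoryServer"},
    "node2": {"NodeManager"},
    "node3": {"NodeManager"},
}


def parse_jps(output):
    """Keep only the class names from jps output lines like '1234 NodeManager'."""
    names = set()
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(host, jps_output):
    """Planned daemons for this host that do not appear in the jps output."""
    return EXPECTED[host] - parse_jps(jps_output)


if __name__ == "__main__":
    if shutil.which("jps") is None:
        print("jps not found on PATH")
    else:
        out = subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
        print("missing on node1:", missing_daemons("node1", out) or "none")
```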

Check the state of the YARN cluster in the web UI:

  • http://node1:8088
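The same ResourceManager also serves a REST API on port 8088. The Python sketch below reads the cluster metrics endpoint (/ws/v1/cluster/metrics) with only the standard library, assuming node1 resolves from the machine running it; the helper names are hypothetical.

```python
import json
import urllib.request


def metrics_url(host="node1", port=8088):
    """ResourceManager REST endpoint for cluster-wide metrics."""
    return f"http://{host}:{port}/ws/v1/cluster/metrics"


def fetch_cluster_metrics(host="node1", port=8088, timeout=5):
    """GET the metrics JSON and return the inner 'clusterMetrics' object."""
    with urllib.request.urlopen(metrics_url(host, port), timeout=timeout) as resp:
        return json.load(resp)["clusterMetrics"]


if __name__ == "__main__":
    try:
        metrics = fetch_cluster_metrics()
        print("active NodeManagers:", metrics["activeNodes"])
    except OSError as err:  # URLError and timeouts are OSError subclasses
        print("ResourceManager not reachable:", err)
```

With the three-node plan above, a healthy cluster should report three active NodeManagers.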

6. Getting Started with MapReduce & YARN

6.1 Cluster start and stop commands

  1. Start
  • start-yarn.sh
  • mapred --daemon start historyserver
  2. Stop
  • stop-yarn.sh
  • mapred --daemon stop historyserver

6.2 Submit MapReduce tasks to YARN for execution

(Omitted.)
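As a hedged illustration only: Hadoop ships an examples jar under $HADOOP_HOME/share/hadoop/mapreduce that includes a wordcount program, and submitting it with hadoop jar sends the job to YARN once mapreduce.framework.name is set to yarn. The Python sketch below locates the jar and builds the command; the HDFS input and output paths are hypothetical, and the output directory must not exist yet.

```python
import glob
import os


def examples_jar(hadoop_home):
    """Locate the bundled examples jar; its version suffix differs per release."""
    pattern = os.path.join(hadoop_home, "share", "hadoop", "mapreduce",
                           "hadoop-mapreduce-examples-*.jar")
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"no examples jar matches {pattern}")
    return matches[0]


def wordcount_command(jar, input_path, output_path):
    """Command that submits the example wordcount job."""
    return ["hadoop", "jar", jar, "wordcount", input_path, output_path]


if __name__ == "__main__":
    home = os.environ.get("HADOOP_HOME")
    if home:
        # Hypothetical HDFS paths; print the command so it can be reviewed before running.
        print(" ".join(wordcount_command(examples_jar(home), "/input/wordcount", "/output/wc")))
```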


Source: blog.csdn.net/m0_68111267/article/details/131736590