Big Data (5) --- Distributed Resource Scheduling with YARN

As mentioned earlier, YARN is the resource-scheduling platform of the Hadoop system, so it ships inside the Hadoop package itself. Here we give a brief introduction and then set up and configure a YARN cluster.

First of all, YARN has two core roles: the ResourceManager and the NodeManager.

 

The ResourceManager receives the distributed computing programs/tasks submitted by users, allocates resources to them, and manages and monitors each NodeManager.

 

The NodeManager receives the tasks assigned by the ResourceManager and performs the computation.


Put simply: a computing program is packaged as a jar, which is then distributed to each NodeManager, so every NodeManager executes the same code, but possibly against a different slice of the data.
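As a sketch of this "same jar, every node" model, a job is handed to the cluster with a single command; YARN then ships the jar to the NodeManagers. The jar name, main class, and paths below are illustrative, not from the original post:

```shell
# Submit a packaged computing program to the cluster; YARN distributes
# this same jar to every NodeManager that runs a piece of the job.
# (wordcount.jar, WordCount, /input and /output are hypothetical names.)
hadoop jar wordcount.jar WordCount /input /output
```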

 


Cluster configuration:

 

 

The NodeManager should be deployed on the same physical machines as the DataNodes, so that tasks can read data locally.

 

 

 

The YARN software already ships with Hadoop, right alongside HDFS; we only need to configure it and then start it.

 

Configure etc/hadoop/yarn-site.xml on each machine:


<property><!-- configure the ResourceManager -->
  <name>yarn.resourcemanager.hostname</name>
  <value>nijunyang68</value>
</property>
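Since the same yarn-site.xml must exist on every machine, one way is to edit it once and copy it out. A minimal sketch, assuming two worker hosts (nijunyang69 and nijunyang70 are hypothetical names; the post only names nijunyang68) and a placeholder Hadoop install path:

```shell
# Push the shared yarn-site.xml to each worker node.
# Host names and the install path are assumptions for illustration.
for host in nijunyang69 nijunyang70; do
  scp etc/hadoop/yarn-site.xml "$host":/path/to/hadoop/etc/hadoop/
done
```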


Because the worker IPs were already written into the slaves file when the HDFS cluster was configured, we now only need to run a single script: start-yarn.sh.

Note: start-yarn.sh must be executed on the machine that is to be the ResourceManager. The configuration above only tells the cluster which machine the ResourceManager is; the script itself has to be run on that configured machine. From the logs you can see that the ResourceManager starts on this machine, while the NodeManagers start on the other machines.
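The start-and-verify sequence might look like this (run on the ResourceManager machine; jps is the usual way to confirm which daemons came up where):

```shell
# Run on the ResourceManager host (nijunyang68 in this post).
start-yarn.sh

# Verify the roles: jps should list ResourceManager here,
# and NodeManager on each of the worker machines.
jps
```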

 

 

By default, the YARN cluster information can be viewed in the web UI on port 8088.
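Besides the browser UI, the ResourceManager exposes a REST API on the same port; as a sketch (host name taken from this post's configuration), cluster-wide memory and vcore totals can be fetched with curl:

```shell
# Query the ResourceManager REST API for cluster metrics
# (total/allocated memory and vcores, node counts, etc.).
curl http://nijunyang68:8088/ws/v1/cluster/metrics
```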


As noted above, the displayed memory size is wrong: because we did not configure it, the default value is used rather than my machine's actual value. My virtual machine really has only 1 GB of memory in total.


Configuration details: https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

 

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>1024</value>
</property>

 

There is a minimum memory allocation limit of 1024 MB; if less than that is configured, the cluster cannot start.
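That 1024 MB floor corresponds to the scheduler's minimum container allocation. As a sketch, it is controlled by the following yarn-site.xml property (the property name is from the yarn-default.xml reference linked above; 1024 is its default value):

```xml
<property><!-- smallest container the scheduler will allocate; 1024 MB is the default -->
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
```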

 

The vcore count likewise does not have to match the actual number of CPU cores; my virtual machine has only one core. The idea is this: if a machine has 200 MB of memory and each task needs 100 MB, then the machine can run two tasks at once, so vcores can be set to 2; if it were set to 1, only one task could run at a time. In other words, even though the CPU has just one physical core, with 200 MB of memory and 100 MB per task I can run two tasks concurrently, and the CPU's computing power is simply shared between them.
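The arithmetic above can be written out directly (the 200 MB figure is the hypothetical number from the text, not the VM's real memory):

```shell
# How many 100 MB tasks fit in 200 MB of NodeManager memory?
node_mem_mb=200
task_mem_mb=100
echo $(( node_mem_mb / task_mem_mb ))  # prints 2, matching vcores=2 below
```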

 

<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>2</value>
</property>

 

At this point the YARN cluster setup is complete; it now just waits for MapReduce tasks to be submitted and run.
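To smoke-test the new cluster, one common approach is to submit the example job that ships with Hadoop. A sketch only: the examples-jar path varies by Hadoop version and install location (an assumption here, not from the post), and /input must already exist in HDFS:

```shell
# Submit the bundled wordcount example to YARN as a first test job.
# The jar path below depends on the Hadoop version/layout (assumed).
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
  wordcount /input /output
```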

 


Origin www.cnblogs.com/nijunyang/p/12147635.html