Hadoop cluster configuration files and the functions they correspond to: an analysis

Take a cluster of three nodes as an example:

To summarize:

NodeManager, DataNode  -->  slaves file
ResourceManager        -->  yarn-site.xml
NameNode               -->  core-site.xml

Detailed analysis:

Host name   Role     IP address         Daemons
hadoop01    Master   192.168.211.134    NameNode, DataNode, NodeManager, ResourceManager
hadoop02    Slave    192.168.211.129    SecondaryNameNode, DataNode, NodeManager
hadoop03    Slave    192.168.211.140    DataNode, NodeManager
All machines need to be configured with:
1. JDK   2. Passwordless SSH login   3. The Hadoop installation

Gateway address: 192.168.211.1

 


One: NameNode (core-site.xml: whichever host is configured here is where the NameNode starts). The value
under dfs.http.address in hdfs-site.xml should name the same host, with port 50070 appended,
e.g. hadoop01:50070.

<name>fs.defaultFS</name>
<!--Configure the address of the hdfs system-->
<value>hdfs://hadoop01:8020</value>   (the NameNode starts on whichever host is named here)
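
For context, a minimal sketch of how that property sits inside core-site.xml (hadoop01 assumed as the NameNode host, per the table above):

<configuration>
  <property>
    <!-- the NameNode starts on the host named here -->
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop01:8020</value>
  </property>
</configuration>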
Two:
ResourceManager (yarn-site.xml: whichever host is configured here is where the ResourceManager starts).

<name>yarn.resourcemanager.hostname</name>
<value>hadoop01</value>
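
Likewise, a sketch of that setting wrapped in the usual yarn-site.xml structure (hadoop01 again assumed as the ResourceManager host):

<configuration>
  <property>
    <!-- the ResourceManager starts on the host named here -->
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop01</value>
  </property>
</configuration>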

Three:
DataNode and NodeManager depend on the slaves file. (It contains localhost by default; just delete that line.)
Whichever hosts should run a DataNode are the ones written into the slaves file.

When the NameNode starts, it locates the slaves file through the configuration and scans it; every host listed there starts a DataNode.
When YARN starts, it scans the slaves file in the same way; every host listed there starts a NodeManager.
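
For the three-node cluster in the table above, the slaves file (etc/hadoop/slaves under the Hadoop install directory) simply lists every host that should run a DataNode and NodeManager, one per line:

hadoop01
hadoop02
hadoop03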

Four:
SecondaryNameNode (hdfs-site.xml): whichever host is written as the secondary address runs the SecondaryNameNode.
<name>dfs.secondary.http.address</name>
<value>hadoop02:50090</value>
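
Putting sections One and Four together, a sketch of the relevant part of hdfs-site.xml (hadoop01 as the NameNode and hadoop02 as the SecondaryNameNode, per the table above):

<configuration>
  <property>
    <!-- NameNode web address: same host as fs.defaultFS, port 50070 -->
    <name>dfs.http.address</name>
    <value>hadoop01:50070</value>
  </property>
  <property>
    <!-- the SecondaryNameNode runs on the host named here -->
    <name>dfs.secondary.http.address</name>
    <value>hadoop02:50090</value>
  </property>
</configuration>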


Five: Startup process
[hadoop@hadoop01 hadoop]$ start-dfs.sh
Starts DFS: core-site.xml is read to start the NameNode; once the NameNode is up, the slaves file is scanned
and ./hadoop-daemon.sh start datanode is run from the sbin directory to start a DataNode (every machine listed there does this).


sbin directory: cd /home/hadoop/hadoop-2.6.1/sbin/

[hadoop@hadoop01 sbin]$ ./yarn-daemon.sh start resourcemanager
The ResourceManager is started with the yarn-daemon.sh script; after it is up, the slaves file is scanned and ./yarn-daemon.sh start nodemanager is run from the sbin directory on each listed host to start its NodeManager.
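
Put together, a minimal sketch of the whole start-and-verify sequence (paths and user taken from above; jps is the standard JDK tool for listing running Java daemons):

# on hadoop01 (NameNode / ResourceManager host)
cd /home/hadoop/hadoop-2.6.1/sbin/
./start-dfs.sh                           # starts the NameNode here, a DataNode on every host in slaves, and the SecondaryNameNode on hadoop02
./yarn-daemon.sh start resourcemanager   # starts the ResourceManager on this host
./yarn-daemon.sh start nodemanager       # starts this host's NodeManager

# on hadoop02 and hadoop03
cd /home/hadoop/hadoop-2.6.1/sbin/
./yarn-daemon.sh start nodemanager

# on any host: check which daemons are running
jps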

Six: The passwordless-login process
Who sends the key, and between which hosts does it travel?
When configuring a distributed Hadoop installation, you need to set up passwordless SSH login.
In a Hadoop cluster, multiple physical machines communicate over SSH (sending commands or reading data,
for example between the NameNode and the DataNodes). It is impractical for the operator to type a
password for every connection, so passwordless SSH login is required.
Summary: set up passwordless SSH on the host that acts as the NameNode and send its public key to every
DataNode host, including itself if it also runs a DataNode.
The relationship between ResourceManager and NodeManager is the same.
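
A minimal sketch of that key distribution on the NameNode host (hadoop01, as the hadoop user; ssh-copy-id assumed to be available):

# on hadoop01: generate a key pair (accept the defaults, empty passphrase)
ssh-keygen -t rsa
# send the public key to every DataNode host, including hadoop01 itself
ssh-copy-id hadoop01
ssh-copy-id hadoop02
ssh-copy-id hadoop03
# should now log in without being asked for a password
ssh hadoop02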

 
