Building a Hadoop distributed cluster on physical machines (HDFS and YARN), version: hadoop-3.3.0

Preparation:
Server configuration (6 machines), using a 1+1+4 cluster layout; the details are as follows:
168.61.1.11 (used as NameNode and ResourceManager); 168.61.1.12, 168.61.1.13, 168.61.1.14, 168.61.1.15, 168.61.1.16 (used as DataNodes and NodeManagers).
Note: HDFS HA with QJM will be installed later. JournalNodes are planned for nodes 11, 12, and 13, and node 12 will also serve as a NameNode.

The general steps are as follows (the configuration is done on the master first, then copied to the other servers with scp):
1) Download the latest Hadoop release, hadoop-3.3.0, and unpack it to the target directory;
2) Configure environment variables in /etc/hosts, /etc/profile, hadoop-env.sh, yarn-env.sh, and mapred-env.sh;
3) Set up passwordless SSH login;
4) Configure the corresponding files: workers, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml;
5) Start the cluster.

1. Modify /etc/hosts

168.61.1.11 hadoop01
168.61.1.12 hadoop02
168.61.1.13 hadoop03
168.61.1.14 hadoop04
168.61.1.15 hadoop05
168.61.1.16 hadoop06
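After editing /etc/hosts, a quick sanity check that every hostname resolves (a minimal sketch; run it from any of the six machines):

# each line should print "<host> OK"
for h in hadoop01 hadoop02 hadoop03 hadoop04 hadoop05 hadoop06; do
    ping -c 1 "$h" > /dev/null && echo "$h OK" || echo "$h unreachable"
done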

2. Install jdk8 (omitted)

3. Modify /etc/profile

JAVA_HOME=/usr/local/tools/jdk1.8.0_231
CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export JAVA_HOME CLASSPATH

The Hadoop daemons may not pick up the JAVA_HOME configured in /etc/profile,
so also set JAVA_HOME in hadoop-env.sh, yarn-env.sh, and mapred-env.sh so that the daemons can read it:

# variable is REQUIRED on ALL platforms except OS X!
export JAVA_HOME=/usr/local/tools/jdk1.8.0_231
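One way to apply this export to all three scripts at once (a sketch assuming Hadoop is unpacked at /opt/hadoop-3.3.0, as in the rest of this article):

# append the JAVA_HOME export to hadoop-env.sh, yarn-env.sh and mapred-env.sh
for f in hadoop-env.sh yarn-env.sh mapred-env.sh; do
    echo 'export JAVA_HOME=/usr/local/tools/jdk1.8.0_231' >> /opt/hadoop-3.3.0/etc/hadoop/"$f"
done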

Transfer /etc/profile, hadoop-env.sh, yarn-env.sh, and mapred-env.sh from the master to the other servers (one possible loop is sketched below),
then log in to each server and run source /etc/profile; the three *-env.sh files are read automatically when the Hadoop scripts start.
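A possible distribution loop (it assumes root can scp to every host and the directory layout is identical everywhere):

# push the environment files to the other five servers
for h in hadoop02 hadoop03 hadoop04 hadoop05 hadoop06; do
    scp /etc/profile root@"$h":/etc/profile
    scp /opt/hadoop-3.3.0/etc/hadoop/hadoop-env.sh \
        /opt/hadoop-3.3.0/etc/hadoop/yarn-env.sh \
        /opt/hadoop-3.3.0/etc/hadoop/mapred-env.sh \
        root@"$h":/opt/hadoop-3.3.0/etc/hadoop/
done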

4. Set up SSH mutual trust among the six machines

Generate the SSH key pair: the id_rsa private key and the id_rsa.pub public key:

ssh-keygen -t rsa -P ''

Note: use ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa to avoid the interactive prompts.

By default the key pair is written to the current user's ~/.ssh directory (/root/.ssh or /home/<user>/.ssh).
Append the public key to the authorized_keys file on the master node:

cat id_rsa.pub >> authorized_keys
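sshd is strict about permissions; if the passwordless login below still prompts for a password, tighten them (standard OpenSSH requirements):

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys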

Try whether the local password-free login setting is successful:

ssh localhost  # the first login asks for host confirmation; type yes

If that works, configure the other servers; in fact you only need to copy the master's id_rsa.pub to each of them. Here the ssh-copy-id command is used to push the key to the other servers:

# ssh-copy-id appadmin@hadoop02
# ssh-copy-id appadmin@hadoop03
# ssh-copy-id appadmin@hadoop04
# ssh-copy-id appadmin@hadoop05
# ssh-copy-id appadmin@hadoop06
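A quick check that the trust works (same hostnames and appadmin user as above):

# each command should print the remote hostname without asking for a password
for h in hadoop02 hadoop03 hadoop04 hadoop05 hadoop06; do
    ssh appadmin@"$h" hostname
done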

5. Turn off the firewall or add the IP addresses of all six machines to the whitelist

[root@hadoop01 opt]# systemctl stop firewalld
[root@hadoop01 opt]# systemctl disable firewalld
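If the firewall must stay on, one option is to trust the cluster subnet instead (a sketch; adjust the CIDR to your actual network):

firewall-cmd --permanent --zone=trusted --add-source=168.61.1.0/24
firewall-cmd --reload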

6. Configure hadoop environment

Mainly modify 4 configuration files: core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml

Configure core-site.xml

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://hadoop01:9000</value>
        </property>
        <property>
                <name>io.file.buffer.size</name>
                <value>131072</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/opt/hadoop-3.3.0/tmp</value>
        </property>
</configuration>
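hadoop.tmp.dir is the parent of the HDFS and YARN data directories configured below, so create it on every node before the first start (a sketch using the SSH trust set up earlier):

for h in hadoop01 hadoop02 hadoop03 hadoop04 hadoop05 hadoop06; do
    ssh "$h" mkdir -p /opt/hadoop-3.3.0/tmp
done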

Configure hdfs-site.xml

<configuration>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file://${hadoop.tmp.dir}/dfs/nn,file://${hadoop.tmp.dir}/dfs/nn2</value>
        </property>
        <property>
                <name>dfs.namenode.edits.dir</name>
                <value>file://${hadoop.tmp.dir}/dfs/edits</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file://${hadoop.tmp.dir}/dfs/dn</value>
        </property>
        <property>
                <name>dfs.blocksize</name>
                <value>268435456</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
</configuration>
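For reference, 268435456 bytes is a 256 MB block size. After the cluster is up (step 7), hdfs getconf can confirm the values Hadoop actually resolves:

hdfs getconf -confKey dfs.blocksize      # expect 268435456
hdfs getconf -confKey dfs.replication    # expect 3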

Configure yarn-site.xml; note that the configuration differs between machines.
The following is the configuration on the ResourceManager:

<configuration>

<!-- Site specific YARN configuration properties -->
        <property>
                <name>yarn.nodemanager.resource.memory-mb</name>
                <value>40960</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <property>
                <name>yarn.log-aggregation-enable</name>
                <value>true</value>
        </property>
        <property>
                <name>yarn.nodemanager.remote-app-log-dir</name>
                <value>${hadoop.tmp.dir}/tmp/logs</value>
        </property>
        <property>
                <name>yarn.nodemanager.log-dirs</name>
                <value>/opt/hadoop-3.3.0/tmp/yarndata/logs</value>
        </property>
        <property>
                <name>yarn.nodemanager.local-dirs</name>
                <value>${hadoop.tmp.dir}/nm-local-dir</value>
        </property>
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>hadoop01</value>
        </property>
        <property>
                <name>yarn.scheduler.maximum-allocation-vcores</name>
                <value>16</value>
                <description>The maximum number of CPU cores a single submitted task may request</description>
        </property>
        <property>
                <name>yarn.scheduler.maximum-allocation-mb</name>
                <value>40960</value>
                <description>The maximum memory (MB) a single submitted task may request (keep it no larger than yarn.nodemanager.resource.memory-mb on the NodeManager nodes)
                </description>
        </property>
</configuration>

The following is the configuration on the NodeManagers (after configuring one machine, copy it to the other NodeManager machines):

<configuration>

<!-- Site specific YARN configuration properties -->
        <property>
                <name>yarn.nodemanager.resource.memory-mb</name>
                <value>40960</value>
                <description>
                    The maximum memory (MB) this node offers to containers (keep it consistent with yarn.scheduler.maximum-allocation-mb on the ResourceManager)
                </description>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <property>
                <name>yarn.log-aggregation-enable</name>
                <value>true</value>
        </property>
        <property>
                <name>yarn.nodemanager.remote-app-log-dir</name>
                <value>${hadoop.tmp.dir}/tmp/logs</value>
        </property>
        <property>
                <name>yarn.nodemanager.log-dirs</name>
                <value>/opt/hadoop-3.3.0/tmp/yarndata/logs</value>
        </property>
        <property>
                <name>yarn.nodemanager.local-dirs</name>
                <value>${hadoop.tmp.dir}/nm-local-dir</value>
        </property>
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>hadoop01</value>
        </property>
        <property>
                <name>yarn.nodemanager.hostname</name>
                <value>hadoop06</value> <!-- change this hostname on each NodeManager machine -->
        </property>
        <property>
                <name>yarn.nodemanager.resource.cpu-vcores</name>
                <value>16</value>
                <description>The number of CPU cores this node offers to containers</description>
        </property>
</configuration>

Configure mapred-site.xml

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>hadoop01:10020</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>hadoop01:19888</value>
        </property>
</configuration>
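Note: on Hadoop 3.x, MapReduce jobs submitted to YARN often fail with "Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster" unless the MapReduce classpath is also declared. A commonly used addition to mapred-site.xml (as in the Hadoop single-cluster guide; verify the paths against your installation):

        <property>
                <name>mapreduce.application.classpath</name>
                <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
        </property>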

Configure the workers file (it lists the DataNode hostnames; the start scripts on the master read it, and it is copied to the other nodes along with the rest of the configuration)

hadoop02
hadoop03
hadoop04
hadoop05
hadoop06
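With every file edited, one way to push the finished configuration directory to the other nodes (assuming identical install paths and the appadmin user):

for h in hadoop02 hadoop03 hadoop04 hadoop05 hadoop06; do
    scp /opt/hadoop-3.3.0/etc/hadoop/* appadmin@"$h":/opt/hadoop-3.3.0/etc/hadoop/
done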

Modify /etc/profile

JAVA_HOME=/usr/local/tools/jdk1.8.0_231
CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
HADOOP_HOME=/opt/hadoop-3.3.0
PATH=.:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH

export JAVA_HOME
export HADOOP_HOME
export CLASSPATH
export PATH

Modify ~/.bash_profile (this part can also be merged into /etc/profile)

JAVA_HOME=/usr/local/tools/jdk1.8.0_231
CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
HADOOP_HOME=/opt/hadoop-3.3.0
PATH=.:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH

export JAVA_HOME
export HADOOP_HOME
export CLASSPATH
export PATH
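Reload the profile and confirm the binaries are on the PATH:

source /etc/profile
hadoop version   # should report Hadoop 3.3.0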

7. Start the cluster

1) Format the NameNode (run this once, on hadoop01; formatting an existing cluster wipes the HDFS metadata):

hdfs namenode -format

2) Start the hdfs cluster:

/opt/hadoop-3.3.0/sbin/start-dfs.sh

3) Start the yarn cluster:

/opt/hadoop-3.3.0/sbin/start-yarn.sh

4) Check the startup result.
On hadoop01, jps shows:

[appadmin@168-61-1-11 sbin]$ jps
16610 Jps
28931 NameNode
30503 ResourceManager

On the other nodes, jps shows:

[appadmin@168-61-1-16 hadoop]$ jps
17556 DataNode
2983 Jps
18379 NodeManager
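Beyond jps, the standard CLI reports give a fuller picture of cluster health:

hdfs dfsadmin -report | head -n 20   # cluster summary; expect 5 live datanodes
yarn node -list                      # expect 5 running NodeManagers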

5) Access the web management pages:

HDFS WEB management: http://168.61.1.11:9870/

Yarn WEB management: http://168.61.1.11:8088/

Origin: blog.csdn.net/sinat_39809957/article/details/112990935