Building a Distributed Cluster Environment for the Hadoop Big Data Platform

1 Overview

This article walks through building a distributed environment for the Hadoop big data platform. The node layout is as follows: the NameNode is deployed on master1, the SecondaryNameNode on master2, and one DataNode on each of slave1, slave2, and slave3.

NN = NameNode

SND = SecondaryNameNode (the NameNode's checkpoint node)

DN = DataNode
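
Combined with the YARN settings configured later in this article, the planned daemon layout across the five servers is roughly:

Host       HDFS role            YARN role
master1    NameNode             ResourceManager
master2    SecondaryNameNode    -
slave1     DataNode             NodeManager
slave2     DataNode             NodeManager
slave3     DataNode             NodeManager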

2 Preliminary preparation

(1) Prepare five servers

For example: master1, master2, slave1, slave2, slave3

(2) Turn off the firewall of all servers

$ systemctl stop firewalld
$ systemctl disable firewalld
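
As a quick sanity check, the firewall state can be confirmed before continuing (both commands should report that firewalld is off):

$ systemctl is-active firewalld    # expect "inactive"
$ systemctl is-enabled firewalld   # expect "disabled"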

(3) Modify the /etc/hosts file on each server so that it contains the following entries:

192.168.56.132 master1
192.168.56.133 master2
192.168.56.134 slave1
192.168.56.135 slave2
192.168.56.136 slave3

Note: Also set the hostname of each server accordingly (master1, master2, slave1, slave2, slave3), for example by editing its /etc/hostname file.
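
One way to do this is with hostnamectl, followed by a ping to confirm that the /etc/hosts entries resolve (run as root, substituting the correct name on each server):

$ hostnamectl set-hostname master1   # use master2, slave1, etc. on the other servers
$ ping -c 1 slave1                   # should resolve to 192.168.56.134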

(4) Create a regular user and group on each server

$ groupadd hadoop               # create the new group
$ useradd hadoop -m -g hadoop   # create the new user with a home directory
$ passwd hadoop                 # set the hadoop user's password

Switch to the hadoop user: su hadoop

(5) Configure passwordless SSH login between the servers; run the following once on each server

$ ssh-keygen -t rsa              # press Enter at every prompt to generate the key pair
$ ssh-copy-id hadoop@master1     # copy the public key to master1
$ ssh-copy-id hadoop@master2     # copy the public key to master2
$ ssh-copy-id hadoop@slave1      # copy the public key to slave1
$ ssh-copy-id hadoop@slave2      # copy the public key to slave2
$ ssh-copy-id hadoop@slave3      # copy the public key to slave3

Note: The above commands must be run as the hadoop user.
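
A quick way to confirm passwordless login from any one server is to loop over all hosts; each command should print the remote hostname without prompting for a password:

$ for host in master1 master2 slave1 slave2 slave3; do ssh hadoop@$host hostname; done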

(6) Download the Hadoop package hadoop-2.7.5.tar.gz

Official website address: https://archive.apache.org/dist/hadoop/common/hadoop-2.7.5/
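
For example, the archive can be fetched directly on master1 with wget (assuming the server has Internet access):

$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.5/hadoop-2.7.5.tar.gz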

3 Installation and deployment

(1) Create hadoop installation directory

$ mkdir -p /home/hadoop/app/hadoop/{tmp,hdfs/{data,name}}
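
The brace expansion creates the temporary directory and the HDFS metadata/data directories in one command; these paths are referenced later by hadoop.tmp.dir, dfs.namenode.name.dir, and dfs.datanode.data.dir:

/home/hadoop/app/hadoop/tmp
/home/hadoop/app/hadoop/hdfs/name
/home/hadoop/app/hadoop/hdfs/data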

(2) Extract the installation package into /home/hadoop/app/hadoop

$ tar -zxf hadoop-2.7.5.tar.gz -C /home/hadoop/app/hadoop

(3) Configure the Hadoop environment variables by appending the following to /etc/profile:

JAVA_HOME=/usr/java/jdk1.8.0_131
JRE_HOME=/usr/java/jdk1.8.0_131/jre
HADOOP_HOME=/home/hadoop/app/hadoop/hadoop-2.7.5
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export JAVA_HOME JRE_HOME HADOOP_HOME PATH

(4) Refresh environment variables

$ source /etc/profile
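
If the variables took effect, the hadoop and java commands should now be on the PATH; a quick sanity check:

$ hadoop version   # should report Hadoop 2.7.5
$ java -version    # should report 1.8.0_131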

4 Configure Hadoop

(1) Configure core-site.xml

$ vi /home/hadoop/app/hadoop/hadoop-2.7.5/etc/hadoop/core-site.xml
<configuration>
    <property>
	    <!-- The server node where the HDFS NameNode runs -->
        <name>fs.defaultFS</name>
        <value>hdfs://master1:9000</value>
    </property>

    <property>
	    <!-- Hadoop temporary directory -->
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/app/hadoop/tmp</value>
    </property>
</configuration>

Default configuration address: http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-common/core-default.xml

(2) Configure hdfs-site.xml

$ vi /home/hadoop/app/hadoop/hadoop-2.7.5/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
	    <!-- Number of replicas for each HDFS block -->
        <name>dfs.replication</name>
        <value>3</value>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop/app/hadoop/hdfs/name</value>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/app/hadoop/hdfs/data</value>
    </property>
   
    <property>
        <!-- Whether HDFS permission checking is enabled -->
	    <name>dfs.permissions.enabled</name>
	    <value>false</value>
    </property>

    <property>
        <!-- HTTP address of the SecondaryNameNode -->
        <name>dfs.namenode.secondary.http-address</name>
        <value>master2:50090</value>
    </property>
</configuration>

Default configuration address: http://hadoop.apache.org/docs/r2.7.5/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

(3) Configure mapred-site.xml

$ cp /home/hadoop/app/hadoop/hadoop-2.7.5/etc/hadoop/mapred-site.xml.template /home/hadoop/app/hadoop/hadoop-2.7.5/etc/hadoop/mapred-site.xml
$ vi /home/hadoop/app/hadoop/hadoop-2.7.5/etc/hadoop/mapred-site.xml
<configuration>
    <property>
	    <!-- The framework MapReduce jobs run on (YARN) -->
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Default configuration address: http://hadoop.apache.org/docs/r2.7.5/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

(4) Configure yarn-site.xml

$ vi /home/hadoop/app/hadoop/hadoop-2.7.5/etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    
    <property>
	    <!-- The node that runs the ResourceManager -->
        <name>yarn.resourcemanager.hostname</name>
        <value>master1</value>
    </property>
    
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master1:8032</value>
    </property>
    
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master1:8088</value>
    </property>
</configuration>

Default configuration address: http://hadoop.apache.org/docs/r2.7.5/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

(5) Configure slaves

$ vi /home/hadoop/app/hadoop/hadoop-2.7.5/etc/hadoop/slaves
slave1
slave2
slave3

The slaves file lists the worker nodes; the start scripts launch a DataNode (and a NodeManager) on each host listed here.

(6) Configure hadoop-env

Modify the JAVA_HOME environment variable of the hadoop-env.sh file as follows:

$ vi /home/hadoop/app/hadoop/hadoop-2.7.5/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_131

(7) Configure yarn-env

Modify the JAVA_HOME environment variable of the yarn-env.sh file as follows:

$ vi /home/hadoop/app/hadoop/hadoop-2.7.5/etc/hadoop/yarn-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_131

(8) Configure mapred-env

Modify the JAVA_HOME environment variable of the mapred-env.sh file as follows:

$ vi /home/hadoop/app/hadoop/hadoop-2.7.5/etc/hadoop/mapred-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_131

(9) Copy the configured Hadoop directory from master1 to the master2, slave1, slave2, and slave3 servers

$ scp -r /home/hadoop/app/hadoop hadoop@master2:/home/hadoop/app/
$ scp -r /home/hadoop/app/hadoop hadoop@slave1:/home/hadoop/app/
$ scp -r /home/hadoop/app/hadoop hadoop@slave2:/home/hadoop/app/
$ scp -r /home/hadoop/app/hadoop hadoop@slave3:/home/hadoop/app/
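
Note that the /etc/profile changes from step 3(3) exist only on master1 so far; the remotely started daemons will still find Java through the JAVA_HOME set in hadoop-env.sh, but repeat the profile change on the other nodes if you want to run hadoop commands there directly. A quick check that the copy completed, run from master1, might look like:

$ ssh hadoop@slave1 ls /home/hadoop/app/hadoop   # should list hadoop-2.7.5, hdfs and tmp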

5 Start and verify the cluster

(1) Initialize (format) the NameNode on the master1 node

$ hdfs namenode -format   # 'hadoop namenode -format' also works but is deprecated

(2) Start the Hadoop cluster

$ start-dfs.sh
$ start-yarn.sh
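
Once both scripts finish, running jps on each node should show the expected daemons:

$ jps   # on master1: NameNode and ResourceManager
$ jps   # on master2: SecondaryNameNode
$ jps   # on slave1/2/3: DataNode and NodeManager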

(3) Verify that the cluster is running

Open http://master1:50070 in a browser; if the NameNode web UI loads and shows three live DataNodes, the cluster deployment was successful.
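
The YARN web UI and an HDFS report from the command line provide additional checks:

http://master1:8088       # YARN ResourceManager web UI (port configured in yarn-site.xml)

$ hdfs dfsadmin -report   # should list slave1, slave2 and slave3 as live DataNodes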

 

 

 
