Building a Hadoop 2.8.4 Distributed Cluster on CentOS 7

Due to hardware constraints I only have three machines, so one serves as the NameNode and the other two as DataNodes.

The management node and the data nodes of a Hadoop cluster are best deployed separately: data nodes carry a relatively heavy load when running tasks, which can easily affect the stability of the management node.

The steps to build the cluster are as follows:

1. Modify the /etc/hosts file

A note on hostnames: many guides rename the machines to something like master, slave1, slave2, or namenode, datanode1, datanode2. I could not modify the hostnames on my machines, so I used the existing ones directly. As long as the mapping between IP and hostname is correct, either approach works fine, so don't get too hung up on this.

 

Configure the same mappings on all three machines, adjusting for your actual environment. I will use the following three names as the example throughout:

192.168.9.1 namenode
192.168.9.2 datanode1
192.168.9.3 datanode2
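The hosts entries above can be appended idempotently with a small script. A minimal sketch: HOSTS points at a scratch file here so it can run anywhere; on the real machines set HOSTS=/etc/hosts (and run as root).

```shell
# Append each cluster mapping to the hosts file unless it is already present.
# HOSTS is a scratch file for illustration; use HOSTS=/etc/hosts on the cluster.
HOSTS=$(mktemp)
for entry in "192.168.9.1 namenode" "192.168.9.2 datanode1" "192.168.9.3 datanode2"; do
    grep -qF "$entry" "$HOSTS" || echo "$entry" >> "$HOSTS"
done
cat "$HOSTS"
```

Running the script twice leaves the file unchanged, so it is safe to re-run on a machine that is already configured.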

2. After all three machines have been modified, check whether they can ping each other. The command is as follows:

ping -c 3 datanode1

Run this test on all three machines to confirm that each one can reach the other two.

3. Passwordless SSH login. Passwordless login is needed later when starting the Hadoop cluster, so set it up between the three machines now.

First, generate a key pair on each of the three machines:

ssh-keygen -t rsa -P ''

4. Create the authorized_keys file on each of the three machines:

touch /root/.ssh/authorized_keys

5. Check that the authorized_keys file was created successfully:

ls /root/.ssh/

6. Copy the id_rsa.pub key from each of the three machines, append all three keys to authorized_keys, and then make the contents of the authorized_keys file identical on all three machines.

The resulting file should contain three lines, one public key per machine.
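A sketch of the merge step, using local files to stand in for the three machines' public keys (the key strings below are made-up placeholders; on a real cluster you would scp each machine's id_rsa.pub to one node, concatenate them, and copy the merged file back):

```shell
# Simulate the three machines' public keys with local files, then merge them.
WORK=$(mktemp -d)
echo "ssh-rsa FAKEKEY1 root@namenode"  > "$WORK/id_rsa.pub.namenode"
echo "ssh-rsa FAKEKEY2 root@datanode1" > "$WORK/id_rsa.pub.datanode1"
echo "ssh-rsa FAKEKEY3 root@datanode2" > "$WORK/id_rsa.pub.datanode2"
cat "$WORK"/id_rsa.pub.* > "$WORK/authorized_keys"
chmod 600 "$WORK/authorized_keys"   # sshd ignores key files with loose permissions
wc -l < "$WORK/authorized_keys"
```

In practice, `ssh-copy-id root@datanode1` run from each machine toward each other machine achieves the same result with less manual copying.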

7. Check that passwordless login works:

ssh datanode1 

Test this between all three machines.

8. Check the JDK environment

Check the JDK on all three machines; the version I use is jdk1.8.0_65.

If the JDK needs to be upgraded, the steps are as follows:

Download the JDK installation package  

Create the /opt/java directory, place the installation package in it, and extract it:

mkdir /opt/java

cd /opt/java

tar -zxvf jdk-8u65-linux-x64.tar.gz

Modify the configuration file:

vim /etc/profile

Add the following lines to the file:

export JAVA_HOME=/opt/java/jdk1.8.0_65

export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib/

export PATH=$PATH:$JAVA_HOME/bin

Run the following command to make the configuration take effect:

source /etc/profile
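To illustrate what `source` does (it runs the file in the current shell so the exported variables stick), here is a tiny self-contained sketch using a scratch file in place of /etc/profile:

```shell
# Write a scratch profile and load it into the current shell.
# The real file is /etc/profile; a temp file is used here for illustration.
PROFILE=$(mktemp)
echo 'export JAVA_HOME=/opt/java/jdk1.8.0_65' > "$PROFILE"
. "$PROFILE"    # "." is the POSIX spelling of "source"
echo "$JAVA_HOME"
```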

Test whether the installation succeeded:

java -version

If java -version still reports the previous version, fix it as follows:

First check which binaries are currently in use:

which java

which javac

Then update the corresponding symbolic links:

rm -rf /usr/bin/java

rm -rf /usr/bin/javac

ln -s $JAVA_HOME/bin/java /usr/bin/java

ln -s $JAVA_HOME/bin/javac /usr/bin/javac

After that, check again:

java -version

javac -version

9. Create a folder for Hadoop and download the Hadoop package from a mirror

mkdir /opt/hadoop

10. Then extract the installation package into the hadoop folder:

tar -xvf hadoop-2.8.4.tar.gz

11. Create the following folders on all three machines:

mkdir /root/hadoop

mkdir /root/hadoop/tmp

mkdir /root/hadoop/var

mkdir /root/hadoop/dfs

mkdir /root/hadoop/dfs/name

mkdir /root/hadoop/dfs/data
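The six mkdir calls above can be collapsed into one command with mkdir -p. A sketch, run under a scratch directory so it works anywhere; on the real machines use BASE=/root/hadoop:

```shell
# Create the whole Hadoop data layout in one mkdir -p call.
# BASE is a scratch directory for illustration; use BASE=/root/hadoop on the cluster.
BASE=$(mktemp -d)
mkdir -p "$BASE/tmp" "$BASE/var" "$BASE/dfs/name" "$BASE/dfs/data"
ls "$BASE"
```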

The configuration files edited in the following steps must be identical on all three machines, so you can edit them once and then copy the files to the other machines.

12. Modify the core-site.xml file

Add the following between the <configuration> and </configuration> nodes:

vim /opt/hadoop/hadoop-2.8.4/etc/hadoop/core-site.xml
<property>
 <name>hadoop.tmp.dir</name>
 <value>/root/hadoop/tmp</value>
 <description>A base for other temporary directories.</description>
</property>
<property>
 <name>fs.default.name</name>
 <value>hdfs://namenode:9000</value>
</property>

13. Modify hadoop-env.sh, the file that configures the Java environment Hadoop runs with

vim /opt/hadoop/hadoop-2.8.4/etc/hadoop/hadoop-env.sh

Change

export JAVA_HOME=${JAVA_HOME}

to

export JAVA_HOME=/opt/java/jdk1.8.0_65

(use your own JDK path)

14. Modify the hdfs-site.xml file, which configures the storage paths

Add the following between the <configuration> and </configuration> nodes:

vim /opt/hadoop/hadoop-2.8.4/etc/hadoop/hdfs-site.xml
<property>
<name>dfs.name.dir</name>
<value>/root/hadoop/dfs/name</value>
<description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
</property>
<property>
<name>dfs.data.dir</name>
<value>/root/hadoop/dfs/data</value>
<description>Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
<description>Do not check permissions.</description>
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>

Note: dfs.permissions is set to false here, which disables permission checks on files in HDFS. That is convenient, but if you want to guard against accidental deletion, set it to true, or simply delete this property node, since the default is true.

15. Create and modify the mapred-site.xml file

cp /opt/hadoop/hadoop-2.8.4/etc/hadoop/mapred-site.xml.template /opt/hadoop/hadoop-2.8.4/etc/hadoop/mapred-site.xml
vim /opt/hadoop/hadoop-2.8.4/etc/hadoop/mapred-site.xml

In the new mapred-site.xml file, add the following between the <configuration> and </configuration> nodes:

<property>
<name>mapred.job.tracker</name>
<value>namenode:49001</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/root/hadoop/var</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

16. Modify the slaves file, which stores the hostnames of the DataNode machines

vim /opt/hadoop/hadoop-2.8.4/etc/hadoop/slaves

Delete the localhost line inside it and add the following:

datanode1

datanode2
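The edit above can also be scripted. A sketch: SLAVES points at a scratch file here; on the real master it is /opt/hadoop/hadoop-2.8.4/etc/hadoop/slaves:

```shell
# Overwrite the slaves file with the two DataNode hostnames, one per line.
# SLAVES is a scratch file for illustration; point it at the real slaves file.
SLAVES=$(mktemp)
printf '%s\n' datanode1 datanode2 > "$SLAVES"
cat "$SLAVES"
```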

17. Modify the yarn-site.xml file, which mainly configures the YARN services

Add the following between the <configuration> and </configuration> nodes (note: set the memory values according to your machines; mine have only 2 GB, hence the small values below):

vim /opt/hadoop/hadoop-2.8.4/etc/hadoop/yarn-site.xml
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>namenode</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<description>The http address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<description>The https address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<description>The address of the RM admin interface.</description>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
<description>The maximum allocation for every container request, in MB.</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>

18. Perform the initialization on the namenode machine

cd /opt/hadoop/hadoop-2.8.4/bin

19. Run the initialization (format) command:

./hadoop namenode -format

20. Execute the startup command on the namenode machine

Enter the directory:

cd /opt/hadoop/hadoop-2.8.4/sbin/

Start command:

./start-all.sh

21. Before testing Hadoop, turn off the firewall:

systemctl stop firewalld.service

22. Check whether the Hadoop cluster has started

Enter the following command:

jps

On the namenode you should see the NameNode, SecondaryNameNode, and ResourceManager processes; on each datanode, DataNode and NodeManager.

23. View the Hadoop overview page

Visit the namenode machine at ip:50070. I have a port mapping set up here, but the actual port is still 50070.

The page redirects automatically.

You can then visit ip:8088 to view the cluster's DataNodes in the YARN web UI.


Origin blog.csdn.net/qq_27575895/article/details/90054967