In the previous articles we covered running HDFS standalone and a MapReduce computation example. In practice, big data components are deployed as a cluster to improve performance and stability; this article explains how to build a hadoop cluster.
1 Systems, software, and prerequisites
- CentOS 7
(1) Passwordless (SSH key) login has been configured among the three CentOS machines:
https://www.jianshu.com/p/0cc72b228647
(2) The JDK has been installed on all three CentOS machines and the JAVA_HOME environment variable has been configured:
https://www.jianshu.com/p/826dc5eca7cb
(3) The three CentOS machines are configured as follows; adjust to your actual environment:
| Hostname | IP | Account/Password | Roles |
| --- | --- | --- | --- |
| master | 192.168.79.128 | root/zhangli | resourcemanager, namenode |
| slave1 | 192.168.79.129 | root/zhangli | nodemanager, datanode |
| slave2 | 192.168.79.130 | root/zhangli | nodemanager, datanode |
Execute the following command as root on all three CentOS machines to make sure the firewall is turned off:
systemctl stop firewalld
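The command above stops the firewall only for the current session; on CentOS 7 firewalld comes back after a reboot. A minimal sketch (assuming systemd-based firewalld, run as root on each machine) that also disables it on boot and verifies the result:

```shell
# Stop firewalld for the current session
systemctl stop firewalld
# Prevent firewalld from starting again on boot
systemctl disable firewalld
# Verify: prints "inactive" once firewalld is stopped
systemctl is-active firewalld || true
```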
- hadoop 2.5.2
Hadoop download link: https://pan.baidu.com/s/1c_skDYabCRSkS5hRUB6lFQ
Extraction code: a00t
2 Operation
2.1 Upload hadoop-2.5.2.zip to the master node, extract it, and then make the following changes:
(1) Modify core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/root/hadoop-2.5.2/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
</configuration>
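Since fs.defaultFS refers to the NameNode as hdfs://master:9000, the hostnames master, slave1, and slave2 must resolve on every machine. If this was not already done as part of the passwordless-login setup, a sketch using the IPs from the table above (run as root on all three machines):

```shell
# Append host mappings so the hostnames used in the configuration
# files resolve on every machine (IPs taken from the table above).
cat >> /etc/hosts <<'EOF'
192.168.79.128 master
192.168.79.129 slave1
192.168.79.130 slave2
EOF
```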
(2) Modify hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/root/hadoop-2.5.2/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/root/hadoop-2.5.2/dfs/data</value>
</property>
</configuration>
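With dfs.replication set to 2, each HDFS block is stored on two of the datanodes. The tmp, name, and data directories referenced in core-site.xml and hdfs-site.xml can be created ahead of time on each machine as a sanity check of the paths (the format step later will also create what it needs):

```shell
# Pre-create the local directories referenced by the configuration above
mkdir -p /root/hadoop-2.5.2/tmp \
         /root/hadoop-2.5.2/dfs/name \
         /root/hadoop-2.5.2/dfs/data
```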
(3) Modify mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
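The two jobhistory addresses above only take effect while the MapReduce JobHistory Server is running, and it is not started by start-all.sh. A sketch using the standard Hadoop 2.x daemon script (run on master after the cluster is up):

```shell
# Start the MapReduce JobHistory Server on master
# (start-all.sh does not start this daemon)
/root/hadoop-2.5.2/sbin/mr-jobhistory-daemon.sh start historyserver
# Its web UI is then reachable at http://master:19888
```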
(4) Modify yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
(5) Modify the contents of the slaves file to:
slave1
slave2
(6) Add to /root/hadoop-2.5.2/etc/hadoop/hadoop-env.sh:
export JAVA_HOME=/root/jdk1.8.0_162
Add to /root/hadoop-2.5.2/etc/hadoop/yarn-env.sh:
export JAVA_HOME=/root/jdk1.8.0_162
2.2 Copy hadoop-2.5.2 from master to slave1 and slave2, and format
# Format the namenode on master (run once before the first start;
# without this step the NameNode will refuse to start)
/root/hadoop-2.5.2/bin/hdfs namenode -format
# Archive the configured hadoop-2.5.2
tar -cvf hadoop.tar hadoop-2.5.2
# Copy the archive to slave1
scp hadoop.tar root@slave1:/root/
# Copy the archive to slave2
scp hadoop.tar root@slave2:/root/
# Log in to slave1 without a password
ssh slave1
# Extract hadoop.tar
tar -xvf hadoop.tar
# Format the namenode
/root/hadoop-2.5.2/bin/hdfs namenode -format
# Log out of slave1
exit
# Log in to slave2 without a password
ssh slave2
# Extract hadoop.tar
tar -xvf hadoop.tar
# Format the namenode
/root/hadoop-2.5.2/bin/hdfs namenode -format
# Log out of slave2
exit
Then configure hadoop on slave1 and slave2 as follows:
Add to /root/hadoop-2.5.2/etc/hadoop/hadoop-env.sh:
export JAVA_HOME=/root/jdk1.8.0_162
Add to /root/hadoop-2.5.2/etc/hadoop/yarn-env.sh:
export JAVA_HOME=/root/jdk1.8.0_162
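The per-slave steps above can also be condensed into a loop run from master, a sketch assuming the same paths and the passwordless SSH set up in the prerequisites:

```shell
# Run on master: copy, extract, and set JAVA_HOME on both slaves in one pass
for h in slave1 slave2; do
  scp hadoop.tar root@$h:/root/
  ssh root@$h 'cd /root && tar -xvf hadoop.tar'
  ssh root@$h 'echo "export JAVA_HOME=/root/jdk1.8.0_162" >> /root/hadoop-2.5.2/etc/hadoop/hadoop-env.sh'
  ssh root@$h 'echo "export JAVA_HOME=/root/jdk1.8.0_162" >> /root/hadoop-2.5.2/etc/hadoop/yarn-env.sh'
done
```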
2.3 Start hadoop on the master
# Make sure you are on the master host, then go to the home directory
cd
# Enter the sbin directory
cd hadoop-2.5.2/sbin
# Start the cluster
./start-all.sh
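Once start-all.sh returns, the daemons can be verified with jps (part of the JDK). Given the role assignment in the table above, master should show NameNode, SecondaryNameNode, and ResourceManager, while each slave should show DataNode and NodeManager. The full jps path below assumes the JDK location from the JAVA_HOME setting:

```shell
# On master: expect NameNode, SecondaryNameNode, ResourceManager (plus Jps)
jps
# On the slaves (non-interactive shells may lack PATH, so use the full path):
# expect DataNode and NodeManager on each
ssh slave1 /root/jdk1.8.0_162/bin/jps
ssh slave2 /root/jdk1.8.0_162/bin/jps
```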
2.4 Test
# Make sure you are on the master host, then go to the home directory
cd
# Enter the bin directory
cd hadoop-2.5.2/bin
# Upload the local yarn file to hdfs
./hdfs dfs -put yarn /yarn
# Check the upload; if it succeeded you will see /yarn
./hdfs dfs -ls /
# Log in to slave2 without a password
ssh slave2
# Check the upload from slave2; if the cluster is working you will see /yarn
/root/hadoop-2.5.2/bin/hdfs dfs -ls /
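To exercise MapReduce end to end as well, you can run the bundled wordcount example against the uploaded file, back on master (after exiting slave2) from hadoop-2.5.2/bin. The jar path follows the standard Hadoop 2.5.2 binary layout, and the output directory /out must not exist beforehand:

```shell
# Run the bundled wordcount example on the uploaded /yarn file
./hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.2.jar \
    wordcount /yarn /out
# Inspect the result
./hdfs dfs -cat /out/part-r-00000
```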
At this point, we have installed and tested a hadoop cluster on three CentOS machines.