This article introduces building a Hadoop 3.2.4 cluster environment. Before reading it, it is best to read the pseudo-distributed setup article linked below, because several problems encountered there come up again here and their solutions are not repeated.
Link: Pseudo-distributed construction
Foreword
In actual use, Hadoop is almost always deployed as a cluster, so this article builds the cluster deployment and walks through the process, which is also a good way to get familiar with Hadoop cluster setup.
1. Prepare the machine
For this build, three virtual machines were prepared, namely
hadoop1 192.168.184.129
hadoop2 192.168.184.130
hadoop3 192.168.184.131
The three virtual machines must be able to ping each other. I use NAT networking for the virtual machines here; see my other article on configuring a NAT virtual network. The node deployment plan is as follows.
| | hadoop1 | hadoop2 | hadoop3 |
|---|---|---|---|
| hdfs | NameNode, DataNode | SecondaryNameNode, DataNode | DataNode |
| yarn | NodeManager | NodeManager, ResourceManager | NodeManager |
2. Linux environment preparation
The following operations need to be performed on all three machines.
2.1 Modify the host name
vi /etc/hostname
Set the host name to hadoop1, hadoop2, and hadoop3 on the respective machines, then add the IP-to-hostname mappings:
vi /etc/hosts
192.168.184.129 hadoop1
192.168.184.130 hadoop2
192.168.184.131 hadoop3
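The three mappings above can also be appended in one shot. A minimal sketch (the `add_host_mappings` helper is my own naming; run it as root and pass `/etc/hosts` on each node):

```shell
# Sketch: append the cluster host mappings to a hosts file.
# Wrapped in a function so the target file is explicit (normally /etc/hosts).
add_host_mappings() {
  cat >> "$1" <<'EOF'
192.168.184.129 hadoop1
192.168.184.130 hadoop2
192.168.184.131 hadoop3
EOF
}
# Typical use on each node (as root): add_host_mappings /etc/hosts
```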
2.2 Stop and disable the firewall
systemctl stop firewalld.service
systemctl disable firewalld.service
2.3 Configure password-free login between machines
Password-free login works by generating a local key pair and appending the public key to the remote machine's authorized_keys file; afterwards SSH authenticates with the key instead of asking for a password.
2.3.1 Generate public key and private key
ssh-keygen -t rsa
2.3.2 Copy the public key to the machine that needs password-free login
Then change into the .ssh directory:
cd ~/.ssh
You will see two files, id_rsa (the private key) and id_rsa.pub (the public key). Copy the public key to hadoop2 and hadoop3 by executing:
ssh-copy-id hadoop2
ssh-copy-id hadoop3
2.3.3 Test password-free login
ssh hadoop2
ssh hadoop3
If no password is required, the setup succeeded. Similarly, set up password-free login to the other two machines on hadoop2 and hadoop3; the steps are identical and not shown here.
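The login test above can be done for all nodes in one pass. A small sketch (`check_ssh` is my own helper name; `SSH_CMD` is overridable for dry runs):

```shell
# Sketch: check passwordless login to a node.
# BatchMode=yes makes ssh fail instead of prompting for a password,
# so a hung password prompt cannot stall the check.
check_ssh() {
  if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$1" hostname >/dev/null 2>&1; then
    echo "$1: passwordless login OK"
  else
    echo "$1: passwordless login FAILED"
  fi
}
# Typical use on each node:
# for node in hadoop1 hadoop2 hadoop3; do check_ssh "$node"; done
```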
3. Hadoop configuration file modification
Modify the following files in the etc/hadoop directory of the installation directory.
3.1 Modify core-site.xml
<!-- Set the default file system; Hadoop supports file, HDFS, GFS, Ali/Amazon cloud, and other file systems -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.184.129:8020</value>
</property>
<!-- Set the local data directory for Hadoop; note that if this directory does not exist, startup will fail -->
<property>
<name>hadoop.tmp.dir</name>
<value>/root/tools/hadoop-3.2.4/data</value>
</property>
<!-- Set the static user for the HDFS web UI -->
<property>
<name>hadoop.http.staticuser.user</name>
<value>root</value>
</property>
<!-- Proxy user settings for Hive integration -->
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
<!-- Trash retention time for the file system (minutes) -->
<property>
<name>fs.trash.interval</name>
<value>1440</value>
</property>
3.2 Modify the hdfs-site.xml file
<!-- Set the machine on which the SecondaryNameNode process runs -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>192.168.184.130:9868</value>
</property>
3.3 Modify the mapred-site.xml file
<!-- Default execution mode for MR programs: yarn = cluster mode, local = local mode -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- MapReduce JobHistory server address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>192.168.184.129:10020</value>
</property>
<!-- MapReduce JobHistory server web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>192.168.184.129:19888</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
3.4 Modify the yarn-site.xml file
<!-- Set the machine on which the YARN master role (ResourceManager) runs -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>192.168.184.130</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Whether to enforce physical memory limits on containers -->
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<!-- Whether to enforce virtual memory limits on containers -->
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Set the YARN log server (JobHistory) address -->
<property>
<name>yarn.log.server.url</name>
<value>http://192.168.184.129:19888/jobhistory/logs</value>
</property>
<!-- Retention time for aggregated logs: 7 days -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
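A stray character while editing these four XML files will stop the daemons from starting, so before distributing the configuration it is worth checking that each file is still well-formed. A minimal sketch (`check_xml` is my own helper name; it uses python3's stdlib parser, so no extra tools are needed):

```shell
# Sketch: verify that a config file is well-formed XML.
# Returns 0 if the file parses, non-zero otherwise.
check_xml() {
  python3 -c 'import sys, xml.dom.minidom; xml.dom.minidom.parse(sys.argv[1])' "$1" 2>/dev/null
}
# Typical use from the etc/hadoop directory:
# for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
#   check_xml "$f" && echo "$f OK" || echo "$f BROKEN"
# done
```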
3.5 Modify the workers file
192.168.184.129
192.168.184.130
192.168.184.131
3.6 Copy configuration files to other machines
scp -r /root/tools/hadoop-3.2.4/etc/hadoop root@hadoop2:/root/tools/hadoop-3.2.4/etc/hadoop
scp -r /root/tools/hadoop-3.2.4/etc/hadoop root@hadoop3:/root/tools/hadoop-3.2.4/etc/hadoop
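The two scp commands above can be wrapped in a small loop, so adding a node only means extending the list. A sketch (`sync_conf` is my own helper name; `COPY_CMD` is overridable for dry runs):

```shell
# Sketch: push the local Hadoop config directory to the other nodes.
# HADOOP_CONF matches the installation path used in this article.
HADOOP_CONF=${HADOOP_CONF:-/root/tools/hadoop-3.2.4/etc/hadoop}
sync_conf() {
  # $1 = target node; copies the whole config directory as root
  ${COPY_CMD:-scp} -r "$HADOOP_CONF" "root@$1:$HADOOP_CONF"
}
# Typical use: for node in hadoop2 hadoop3; do sync_conf "$node"; done
```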
4. Start the cluster
Note: if this is the first startup, the NameNode must be formatted first. This only needs to be done once, on the NameNode machine (hadoop1):
hdfs namenode -format
4.1 Start hdfs on hadoop1
My installation directory is /root/tools/hadoop-3.2.4; execute the following from there.
./sbin/start-dfs.sh
Starting dfs reported the following error:
dfs/name is in an inconsistent state: storage directory does not exist or is not accessible
Solution: reformat the namenode:
hdfs namenode -format
In my case the real cause was an extra space in front of the configured path; removing the space fixed it.
After a successful start, visit the NameNode web UI at
http://192.168.184.169:9870 and you can see that all three DataNodes are up.
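Besides the web UI, the DataNode count can be confirmed from the command line with `hdfs dfsadmin -report`. A small sketch that extracts the count from the report (`live_datanodes` is my own helper; the exact report wording may vary between Hadoop versions):

```shell
# Sketch: pull the live-DataNode count out of an `hdfs dfsadmin -report`
# read on stdin (report lines look like "Live datanodes (3):").
live_datanodes() {
  grep -o 'Live datanodes ([0-9][0-9]*)' | grep -o '[0-9][0-9]*'
}
# Typical use: hdfs dfsadmin -report | live_datanodes   # expect 3 here
```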
4.2 Start yarn on hadoop2
./sbin/start-yarn.sh
Visit the resourcemanager management page
http://192.168.184.130:8088
5. Run the test demo
5.1 Calculate pi demo
Go to the share/hadoop/mapreduce directory under the installation directory and execute
hadoop jar hadoop-mapreduce-examples-3.2.4.jar pi 2 4
The result is shown in the figure
6. Startup script
Starting the cluster by hand is a bit troublesome, so I wrote a script that starts or stops the whole cluster with a single command. Save it as hadoop.sh; after creating the file, grant it execute permission:
chmod 777 hadoop.sh
#!/bin/bash
# Check the argument count; bail out if it is wrong
if [ $# -ne 1 ]; then
echo "need one param, but given $#"
exit 1
fi
# Operate on the hadoop cluster
case $1 in
"start")
echo " ========== starting hadoop cluster ========== "
echo ' ---------- starting hdfs ---------- '
ssh hadoop1 "/root/tools/hadoop-3.2.4/sbin/start-dfs.sh"
echo ' ---------- starting yarn ---------- '
ssh hadoop2 "/root/tools/hadoop-3.2.4/sbin/start-yarn.sh"
;;
"stop")
echo " ========== stopping hadoop cluster ========== "
echo ' ---------- stopping yarn ---------- '
ssh hadoop2 "/root/tools/hadoop-3.2.4/sbin/stop-yarn.sh"
echo ' ---------- stopping hdfs ---------- '
ssh hadoop1 "/root/tools/hadoop-3.2.4/sbin/stop-dfs.sh"
;;
*)
echo "Input Param Error ..."
;;
esac
Start the cluster, shut down the cluster
./hadoop.sh start
./hadoop.sh stop
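A handy companion to the script above is a helper that shows the Java processes on every node, to confirm that each daemon landed where the deployment plan says it should (`cluster_jps` is my own helper name; `SSH_CMD` is overridable for dry runs):

```shell
# Sketch: run jps on every cluster node over SSH.
cluster_jps() {
  for node in hadoop1 hadoop2 hadoop3; do
    echo "---- $node ----"
    ${SSH_CMD:-ssh} "$node" jps
  done
}
# Typical use after ./hadoop.sh start: cluster_jps
```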
7. Key configuration instructions
The key per-node configuration is as follows.
7.1 yarn-site.xml, makes sure that YARN (the ResourceManager) starts on hadoop2
<property>
<name>yarn.resourcemanager.hostname</name>
<value>192.168.184.130</value>
</property>
7.2 hdfs-site.xml, determines where the SecondaryNameNode runs
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>192.168.184.130:9868</value>
</property>
7.3 core-site.xml, determines the location of the NameNode, that is, HDFS starts on hadoop1
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.184.129:8020</value>
</property>
7.4 The workers file ensures that a DataNode and a NodeManager run on every node.
8. Summary
I did not create a dedicated user to run Hadoop here. Strictly speaking you should not run Hadoop directly as root; I run it as root for convenience, and the previous article explains the extra settings that requires, which you can refer to. This article shows the cluster-building process from beginning to end and records the problems encountered along the way. If it helped you, please give it a like.