Hadoop 3.2.4 cluster environment construction

This article walks through building a Hadoop 3.2.4 cluster. Before reading it, it is best to read the pseudo-distributed setup article linked below, because several problems encountered there also apply here and their solutions will not be repeated.
Link: Pseudo-distributed construction


Foreword

In real use, Hadoop is always deployed as a cluster, so this article builds it that way; it is also a good exercise for getting familiar with Hadoop cluster setup.


1. Prepare the machines

Three virtual machines were prepared for this build:

hadoop1 192.168.184.129
hadoop2 192.168.184.130
hadoop3 192.168.184.131

The three machines must be able to ping each other. I use NAT networking for the virtual machines here; see my other article for how to configure a NAT virtual network. The node deployment plan is as follows:

        hadoop1             hadoop2                       hadoop3
hdfs    NameNode, DataNode  SecondaryNameNode, DataNode   DataNode
yarn    NodeManager         ResourceManager, NodeManager  NodeManager
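
Before going further, it is worth confirming that the machines can actually reach each other. A quick check, assuming the IPs above (run from each of the three machines):

ping -c 3 192.168.184.129
ping -c 3 192.168.184.130
ping -c 3 192.168.184.131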

2. Linux environment preparation

The following operations need to be performed on all three machines.

2.1 Modify the host name

vi /etc/hostname


Set the hostname to hadoop1, hadoop2, and hadoop3 respectively, then add the IP-to-hostname mappings:

vi /etc/hosts
192.168.184.129 hadoop1
192.168.184.130 hadoop2
192.168.184.131 hadoop3
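
As an aside, on systemd-based distributions (CentOS 7, for example) the hostname can also be set without editing /etc/hostname by hand; on the first machine:

hostnamectl set-hostname hadoop1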

2.2 Stop and disable the firewall

systemctl stop firewalld.service
systemctl disable firewalld.service
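
To confirm the firewall is really off, check its state; it should report inactive (dead):

systemctl status firewalld.service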

2.3 Configure password-free login between machines

The principle of SSH password-free login is shown in the diagram below.
[Figure: SSH public/private key login flow]

2.3.1 Generate public key and private key

ssh-keygen -t rsa
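
Press Enter through all the prompts to accept the defaults. Alternatively, assuming there is no existing key you care about overwriting, the key pair can be generated non-interactively:

ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa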

2.3.2 Copy the public key to the machines you want to log in to without a password

Then change into the ~/.ssh directory, where you will see two files:

cd ~/.ssh

id_rsa is the private key and id_rsa.pub is the public key. Copy the public key to hadoop2 and hadoop3 by executing:

ssh-copy-id hadoop2
ssh-copy-id hadoop3
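
If ssh-copy-id is not available, the same effect can be achieved manually; a sketch, assuming root login is permitted on the target machine:

cat ~/.ssh/id_rsa.pub | ssh root@hadoop2 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'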

2.3.3 Test password-free login

ssh hadoop2
ssh hadoop3


If no password is requested, the setup succeeded. In the same way, configure password-free login to the other two machines on hadoop2 and hadoop3; the steps are identical and are not shown here.

3. Hadoop configuration file modification

Modify the following files in the etc/hadoop directory of the installation directory.

3.1 Modify core-site.xml

<!-- Default file system to use; Hadoop supports file, HDFS, GFS, Alibaba/Amazon cloud, and other file systems -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.184.129:8020</value>
</property>

<!-- Local path where Hadoop stores data; note that startup fails if this directory does not exist -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/root/tools/hadoop-3.2.4/data</value>
</property>

<!-- User identity for the HDFS web UI -->
<property>
    <name>hadoop.http.staticuser.user</name>
    <value>root</value>
</property>

<!-- Proxy user settings for Hive integration -->
<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>

<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>

<!-- Filesystem trash retention time (minutes; 1440 = one day) -->
<property>
    <name>fs.trash.interval</name>
    <value>1440</value>
</property>
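
Since startup fails when the hadoop.tmp.dir directory is missing (as the comment above warns), it is safest to create it up front on all three machines:

mkdir -p /root/tools/hadoop-3.2.4/data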

3.2 Modify the hdfs-site.xml file

<!-- Machine where the SecondaryNameNode process runs -->
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>192.168.184.130:9868</value>
</property>

3.3 Modify the mapred-site.xml file

<!-- Default execution mode for MR jobs: yarn = cluster mode, local = local mode -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

<!-- MapReduce JobHistory server address -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>192.168.184.129:10020</value>
</property>
 
<!-- MapReduce JobHistory server web UI address -->
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>192.168.184.129:19888</value>
</property>

<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>

<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>

<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>

3.4 Modify the yarn-site.xml file

<!-- Machine where the YARN master role (ResourceManager) runs -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>192.168.184.130</value>
</property>

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>

<!-- Whether to enforce physical memory limits on containers -->
<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>

<!-- Whether to enforce virtual memory limits on containers -->
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>

<!-- Enable log aggregation -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>

<!-- YARN log/history server URL -->
<property>
    <name>yarn.log.server.url</name>
    <value>http://192.168.184.129:19888/jobhistory/logs</value>
</property>

<!-- How long to retain aggregated logs (604800 seconds = 7 days) -->
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>
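
With log aggregation enabled, the logs of a finished application can also be pulled from the command line; the application ID below is a placeholder, substitute a real one:

yarn logs -applicationId application_1234567890123_0001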

3.5 Modify the workers file

192.168.184.129
192.168.184.130
192.168.184.131

3.6 Copy configuration files to other machines

scp -r /root/tools/hadoop-3.2.4/etc/hadoop root@hadoop2:/root/tools/hadoop-3.2.4/etc/
scp -r /root/tools/hadoop-3.2.4/etc/hadoop root@hadoop3:/root/tools/hadoop-3.2.4/etc/

Note that the destination ends at etc/, not etc/hadoop: with scp -r, copying onto an existing etc/hadoop directory would nest the files one level too deep (etc/hadoop/hadoop).

4. Start the cluster

Note: if this is the first startup, the NameNode must be formatted first. Run the following once on hadoop1 (the NameNode host):

hdfs namenode -format

4.1 Start HDFS on hadoop1

My installation directory is /root/tools/hadoop-3.2.4; from there, execute the following.

./sbin/start-dfs.sh

Starting dfs may report the following error:

dfs/name is in an inconsistent state: storage directory does not exist or is not accessible

The usual fix is to reformat the NameNode:

hdfs namenode -format

In my case, however, the real cause was an extra leading space in the configured path; removing the space fixed it.

After a successful start, visit the NameNode web UI at http://192.168.184.129:9870; you should see all three DataNodes up.
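
You can also verify from the command line: jps on each node should show the expected daemons, and an HDFS report should list three live DataNodes:

jps
hdfs dfsadmin -report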

4.2 Start YARN on hadoop2

./sbin/start-yarn.sh

Visit the ResourceManager web UI at
http://192.168.184.130:8088
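
The NodeManagers can likewise be checked from the command line; all three should be listed as RUNNING:

yarn node -list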

5. Run the test demo

5.1 Pi calculation demo

Go to the share/hadoop/mapreduce directory under the installation directory and execute the following (the arguments 2 and 4 are the number of map tasks and the number of samples per map):

hadoop jar hadoop-mapreduce-examples-3.2.4.jar pi 2 4

The result of the run is shown below.
[Figure: output of the pi example job]
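
Another classic smoke test from the same examples jar is wordcount; a minimal sketch, assuming the HDFS paths below do not already exist:

hdfs dfs -mkdir -p /input
hdfs dfs -put /root/tools/hadoop-3.2.4/etc/hadoop/core-site.xml /input
hadoop jar hadoop-mapreduce-examples-3.2.4.jar wordcount /input /output
hdfs dfs -cat /output/part-r-00000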

6. Startup script

Starting the cluster this way is a bit cumbersome, so I wrote a script, hadoop.sh, that starts and stops the whole cluster with a single command. After saving the file, give it execute permission:

chmod 777 hadoop.sh

#!/bin/bash
# Check the number of arguments
if [ $# -ne 1 ];then
 echo "need one param, but given $#"
 exit 1
fi
 
# Operate the hadoop cluster
case $1 in
"start")
	echo " ========== Starting the hadoop cluster ========== "
	echo ' ---------- starting hdfs ---------- '
	ssh hadoop1 "/root/tools/hadoop-3.2.4/sbin/start-dfs.sh"
	echo ' ---------- starting yarn ---------- '
	ssh hadoop2 "/root/tools/hadoop-3.2.4/sbin/start-yarn.sh"
	;;
"stop")
	echo " ========== Stopping the hadoop cluster ========== "
	echo ' ---------- stopping yarn ---------- '
	ssh hadoop2 "/root/tools/hadoop-3.2.4/sbin/stop-yarn.sh"
	echo ' ---------- stopping hdfs ---------- '
	ssh hadoop1 "/root/tools/hadoop-3.2.4/sbin/stop-dfs.sh"
	;;
*)
	echo "Input Param Error ..."
	;;
esac

Start or stop the cluster:

./hadoop.sh start
./hadoop.sh stop
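
Note that the JobHistory server configured in mapred-site.xml is not handled by this script; if you want job history, start it separately on hadoop1:

mapred --daemon start historyserver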

7. Key configuration instructions

The key node-placement settings are the following.

7.1 yarn-site.xml determines the ResourceManager host, i.e. the node that starts YARN is hadoop2

<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>192.168.184.130</value>
</property>

7.2 hdfs-site.xml determines where the SecondaryNameNode process runs

<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>192.168.184.130:9868</value>
</property>

7.3 core-site.xml determines the NameNode location, i.e. the node that starts HDFS is hadoop1

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.184.129:8020</value>
</property>

7.4 The workers file

It ensures that a DataNode and a NodeManager run on every node.

8. Summary

I did not create a dedicated user to run Hadoop here. Strictly speaking, Hadoop should not be run directly as root, but for convenience I use root anyway; the previous article explains what is needed to run it as root. This article has shown the cluster build process from beginning to end and recorded the problems encountered along the way. If it helped you, please give it a like.
