Introduction
If the stand-alone Hadoop environment from the previous installation guide is not enough for you, this clustered version of Hadoop should suit your appetite, and it is easy to get started with.
Table of Contents
- Cluster planning
- Prerequisites
- Configure password-free login
  - 3.1 Generate key
  - 3.2 Password-free login
  - 3.3 Verify password-free login
- Cluster construction
  - 4.1 Download and unzip
  - 4.2 Configure environment variables
  - 4.3 Modify configuration
  - 4.4 Distribute the program
  - 4.5 Initialization
  - 4.6 Start the cluster
  - 4.7 View the cluster
- Submit a service to the cluster
1. Cluster Planning
This guide builds a three-node Hadoop cluster. The DataNode and NodeManager services are deployed on all three hosts, while the NameNode and ResourceManager services are deployed only on hadoop001.
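The planned service layout described above can be summarized as follows:

```
Host        HDFS services           YARN services
hadoop001   NameNode + DataNode     ResourceManager + NodeManager
hadoop002   DataNode                NodeManager
hadoop003   DataNode                NodeManager
```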
2. Prerequisites
Hadoop depends on the JDK, which needs to be installed in advance. The installation steps are as follows:
2.1 Download and unzip
Download the required version of JDK 1.8 from the official website, and unzip it after downloading:
[root@ java]# tar -zxvf jdk-8u201-linux-x64.tar.gz
2.2 Set environment variables
[root@ java]# vi /etc/profile
Add the following configuration:
export JAVA_HOME=/usr/java/jdk1.8.0_201
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
Execute the source command to make the configuration take effect immediately:
[root@ java]# source /etc/profile
2.3 Check whether the installation is successful
[root@ java]# java -version
If the corresponding version information is displayed, the installation is successful.
java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
3. Configure password-free login
3.1 Generate Key
Use the ssh-keygen command on each host to generate a public/private key pair:
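One way to script this step (a sketch; run it as the same user on every host) is:

```shell
# Make sure the .ssh directory exists with the permissions sshd expects.
mkdir -p ~/.ssh && chmod 700 ~/.ssh

# Generate an RSA key pair only if one does not already exist.
# -P "" sets an empty passphrase so the key works non-interactively;
# -f writes the key to ssh's default path.
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
```

The empty passphrase is what allows the later ssh/scp commands to run without prompting; if your security policy forbids that, use ssh-agent instead.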
3.2 Password-free login
Write the public key of hadoop001 into the ~/.ssh/authorized_keys file of the local machine and the remote machines:
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop001
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop002
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop003
3.3 Verify password-free login
ssh hadoop002
ssh hadoop003
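To check all nodes in one pass, a small loop can be used (a sketch; it assumes the host names resolve and the keys from step 3.2 are in place). BatchMode makes ssh fail immediately instead of falling back to a password prompt:

```shell
# Attempt a password-free login to every node and print its hostname.
# BatchMode=yes disables password prompts, so a misconfigured node
# fails fast instead of hanging on interactive input.
for host in hadoop001 hadoop002 hadoop003; do
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" hostname \
    || echo "$host: password-free login NOT working"
done
```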
4. Cluster construction
4.1 Download and unzip
Download Hadoop. Here I downloaded the CDH version of Hadoop. The download address is:
http://archive.cloudera.com/cdh5/cdh/5/
# tar -zvxf hadoop-2.6.0-cdh5.15.2.tar.gz
4.2 Configure environment variables
Edit the profile file:
# vi /etc/profile
Add the following configuration:
export HADOOP_HOME=/usr/app/hadoop-2.6.0-cdh5.15.2
export PATH=${HADOOP_HOME}/bin:$PATH
Execute the source command to make the configuration take effect immediately:
# source /etc/profile
4.3 Modify configuration
Enter the ${HADOOP_HOME}/etc/hadoop directory and modify the configuration files. The contents of each file are as follows:
- hadoop-env.sh
# Specify the JDK installation directory
export JAVA_HOME=/usr/java/jdk1.8.0_201/
- core-site.xml
<configuration>
    <property>
        <!-- Communication address of the NameNode's HDFS file system -->
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop001:8020</value>
    </property>
    <property>
        <!-- Directory where the Hadoop cluster stores temporary files -->
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
    </property>
</configuration>
- hdfs-site.xml
<configuration>
    <property>
        <!-- Where the NameNode stores its data (i.e. metadata); multiple
             comma-separated directories can be specified for fault tolerance -->
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop/namenode/data</value>
    </property>
    <property>
        <!-- Where the DataNode stores its data (i.e. data blocks) -->
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/datanode/data</value>
    </property>
</configuration>
- yarn-site.xml
<configuration>
    <property>
        <!-- Auxiliary service running on the NodeManager; it must be set to
             mapreduce_shuffle before MapReduce programs can run on YARN -->
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <!-- Hostname of the ResourceManager -->
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop001</value>
    </property>
</configuration>
- mapred-site.xml
<configuration>
    <property>
        <!-- Run MapReduce jobs on YARN -->
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
- slaves
Configure the host names or IP addresses of all slave nodes, one per line; the DataNode and NodeManager services on all of them will be started:
hadoop001
hadoop002
hadoop003
4.4 Distribute the program
Distribute the Hadoop installation package to the other two servers. After distribution, it is recommended to configure the Hadoop environment variables on those two servers as well.
# Distribute the installation package to hadoop002
scp -r /usr/app/hadoop-2.6.0-cdh5.15.2/ hadoop002:/usr/app/
# Distribute the installation package to hadoop003
scp -r /usr/app/hadoop-2.6.0-cdh5.15.2/ hadoop003:/usr/app/
4.5 Initialization
Execute the NameNode format command on hadoop001:
hdfs namenode -format
4.6 Start the cluster
Go to the ${HADOOP_HOME}/sbin directory on hadoop001 and start Hadoop. The related services on hadoop002 and hadoop003 will be started as well:
# Start the HDFS service
start-dfs.sh
# Start the YARN service
start-yarn.sh
4.7 View the cluster
Use the jps command on each server to view the service processes, or open the Web UI directly on port 50070. You can see that there are three available DataNodes at this time:
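On hadoop001, which runs every service in this layout, the jps output should list roughly the following process names (jps prefixes each with its process ID, which will differ; a SecondaryNameNode is typically started alongside the NameNode by default). On hadoop002 and hadoop003, only DataNode and NodeManager should appear:

```
NameNode
SecondaryNameNode
DataNode
ResourceManager
NodeManager
Jps
```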
Click Live Nodes to see the details of each DataNode:
You can then check the YARN status on port 8088:
5. Submit a service to the cluster
Jobs are submitted to the cluster in exactly the same way as in the stand-alone environment. Here is an example of submitting Hadoop's built-in Pi calculation program; the command can be executed on any node (the two trailing arguments are the number of map tasks and the number of samples per map):
hadoop jar /usr/app/hadoop-2.6.0-cdh5.15.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.15.2.jar pi 3 3