Cluster version of Hadoop installation, written for busy people~

Introduction

If the previous stand-alone Hadoop environment installation is not enough for you, the clustered version of Hadoop will definitely suit your appetite. Let's get started right away.

Table of Contents

  1. Cluster planning
  2. Prerequisites
  3. Configure password-free login

    3.1 Generate Key

    3.2 Password-free login

    3.3 Verify password-free login

  4. Cluster construction

    4.1 Download and unzip

    4.2 Configure environment variables

    4.3 Modify configuration

    4.4 Distribute the program

    4.5 Initialization

    4.6 Start the cluster

    4.7 View cluster

  5. Submit jobs to the cluster

1. Cluster Planning

This guide sets up a three-node Hadoop cluster. The DataNode and NodeManager services are deployed on all three hosts, while the NameNode and ResourceManager services are deployed only on hadoop001.
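Summarized as a table, the planned layout is:

Host        HDFS services          YARN services
hadoop001   NameNode, DataNode     ResourceManager, NodeManager
hadoop002   DataNode               NodeManager
hadoop003   DataNode               NodeManager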

2. Prerequisites

Hadoop requires the JDK, which needs to be installed in advance. The installation steps are as follows:

2.1 Download and unzip

Download the required version of JDK 1.8 from the official website and unzip it:

[root@ java]# tar -zxvf jdk-8u201-linux-x64.tar.gz
2.2 Set environment variables
[root@ java]# vi /etc/profile

Add the following configuration:

export JAVA_HOME=/usr/java/jdk1.8.0_201  
export JRE_HOME=${JAVA_HOME}/jre  
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib  
export PATH=${JAVA_HOME}/bin:$PATH

Execute the source command to make the configuration take effect immediately:

[root@ java]# source /etc/profile
2.3 Check whether the installation is successful
[root@ java]# java -version

If the corresponding version information is displayed, the installation is successful.

java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)

3. Configure password-free login

3.1 Generate Key

Use the ssh-keygen command on each host to generate a public/private key pair:
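A typical invocation looks like this (press Enter to accept the default key location; leaving the passphrase empty keeps the login fully password-free):

ssh-keygen -t rsa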

3.2 Password-free login

Write the public key of hadoop001 to the ~/.ssh/authorized_keys file of the local machine and the remote machines:

ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop001
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop002
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop003
3.3 Verify password-free login
ssh hadoop002
ssh hadoop003

4. Cluster construction

4.1 Download and unzip

Download Hadoop. Here I downloaded the CDH version of Hadoop; the download address is:
http://archive.cloudera.com/cdh5/cdh/5/

# tar -zvxf hadoop-2.6.0-cdh5.15.2.tar.gz 
4.2 Configure environment variables

Edit the /etc/profile file:
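# vi /etc/profile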

Add the following configuration:

export HADOOP_HOME=/usr/app/hadoop-2.6.0-cdh5.15.2
export PATH=${HADOOP_HOME}/bin:$PATH

Execute the source command to make the configuration take effect immediately:
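# source /etc/profile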

4.3 Modify configuration

Enter the ${HADOOP_HOME}/etc/hadoop directory and modify the configuration files. The contents of each configuration file are as follows:

  1. hadoop-env.sh
    # Specify the JDK installation location
    export JAVA_HOME=/usr/java/jdk1.8.0_201/
  2. core-site.xml
    <configuration>
    <property>
        <!-- Communication address of the NameNode's HDFS file system -->
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop001:8020</value>
    </property>
    <property>
        <!-- Directory where the Hadoop cluster stores temporary files -->
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
    </property>
    </configuration>
  3. hdfs-site.xml
    <configuration>
    <property>
        <!-- Storage location of NameNode data (i.e. metadata); multiple comma-separated directories can be specified for fault tolerance -->
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop/namenode/data</value>
    </property>
    <property>
        <!-- Storage location of DataNode data (i.e. data blocks) -->
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/datanode/data</value>
    </property>
    </configuration>
  4. yarn-site.xml
    <configuration>
    <property>
        <!-- Auxiliary service that runs on the NodeManager; it must be set to mapreduce_shuffle before MapReduce programs can run on YARN -->
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <!-- Hostname of the ResourceManager -->
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop001</value>
    </property>
    </configuration>
  5. mapred-site.xml
    <configuration>
    <property>
        <!-- Specify that MapReduce jobs run on YARN -->
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    </configuration>
  6. slaves
    Configure the host names or IP addresses of all slave nodes, one per line. The DataNode and NodeManager services on all slave nodes will be started.
    hadoop001
    hadoop002
    hadoop003
4.4 Distribute the program

Distribute the Hadoop installation package to the other two servers. After distribution, it is recommended to configure the Hadoop environment variables on those two servers as well.

# Distribute the installation package to hadoop002
scp -r /usr/app/hadoop-2.6.0-cdh5.15.2/  hadoop002:/usr/app/
# Distribute the installation package to hadoop003
scp -r /usr/app/hadoop-2.6.0-cdh5.15.2/  hadoop003:/usr/app/
4.5 Initialization

Execute the NameNode initialization command on hadoop001:

hdfs namenode -format
4.6 Start the cluster

Go to the ${HADOOP_HOME}/sbin directory on hadoop001 and start Hadoop. The related services on hadoop002 and hadoop003 will be started as well:

# Start the HDFS service
start-dfs.sh
# Start the YARN service
start-yarn.sh
4.7 View cluster

Use the jps command on each server to check the service processes, or open the web UI on port 50070, where you can see that three DataNodes are available.
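For example, running jps on hadoop001 should list roughly the following processes (PIDs will differ, and a SecondaryNameNode may also appear depending on defaults):

NameNode
DataNode
ResourceManager
NodeManager
Jps

On hadoop002 and hadoop003, only DataNode, NodeManager and Jps should appear.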

Click Live Nodes to see the details of each DataNode.

You can also check the YARN status in the web UI on port 8088.

5. Submit jobs to the cluster

Jobs are submitted to the cluster in exactly the same way as in the stand-alone environment. Here is an example of running Hadoop's built-in Pi estimation program; it can be executed on any node. The command is as follows (the two arguments are the number of map tasks and the number of samples per map):

hadoop jar /usr/app/hadoop-2.6.0-cdh5.15.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.15.2.jar  pi  3  3
