A complete Hadoop standalone installation guide, worth bookmarking

Introduction

A while back, a back-end developer friend messaged me privately asking about setting up Hadoop in a standalone environment, so today I've put the steps together for everyone. Without further ado, straight to the practical content.

Table of Contents

  1. Prerequisites
  2. Configure SSH password-free login
  3. Hadoop (HDFS) environment construction
  4. Hadoop (YARN) environment construction

1. Prerequisites

Hadoop depends on the JDK, so the JDK needs to be installed in advance. The installation steps are as follows:

1.1 Download and unzip

Download the required JDK version from the official website; the version I downloaded here is JDK 1.8. After downloading, unpack it:

[root@ java]# tar -zxvf jdk-8u201-linux-x64.tar.gz
1.2 Set environment variables
[root@ java]# vi /etc/profile
Add the following configuration:

export JAVA_HOME=/usr/java/jdk1.8.0_201  
export JRE_HOME=${JAVA_HOME}/jre  
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib  
export PATH=${JAVA_HOME}/bin:$PATH
Run the source command so the configuration takes effect immediately:

[root@ java]# source /etc/profile
1.3 Check whether the installation is successful
[root@ java]# java -version
If the corresponding version information is displayed, the installation succeeded:

java version "1.8.0_201"
Java(TM) SE Runtime Environment (build 1.8.0_201-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
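You can also confirm that the environment variables resolved correctly (a quick sanity check; the paths assume the install location above):

```bash
# Confirm the shell picked up the new variables (paths assume the install above)
echo $JAVA_HOME   # expect /usr/java/jdk1.8.0_201
which java        # expect ${JAVA_HOME}/bin/java
```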

2. Configure SSH password-free login

Hadoop's control scripts manage the cluster's daemons over SSH, so password-free login needs to be configured first.

2.1 Configure hostname mapping

Configure the IP address to hostname mapping:

vim /etc/hosts
# Add at the end of the file
192.168.43.202  hadoop001
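To verify the mapping, resolve the hostname (a quick check; the IP is the one added above):

```bash
# The hostname should now resolve to the address added above
ping -c 1 hadoop001
```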
2.2 Generate public and private keys

Run the following command to generate the public and private keys:

ssh-keygen -t rsa
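Press Enter at each prompt to accept the default file location and an empty passphrase. Alternatively, a non-interactive form with the same defaults (a sketch):

```bash
# Non-interactive equivalent: default key path, empty passphrase
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
```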
2.3 Authorization

Go to the ~/.ssh directory, check the generated public and private keys, and append the public key to the authorization file:

[root@hadoop001 sbin]#  cd ~/.ssh
[root@hadoop001 .ssh]# ll
-rw-------. 1 root root 1675 Mar 15 09:48 id_rsa
-rw-r--r--. 1 root root  388 Mar 15 09:48 id_rsa.pub

# Append the public key to the authorization file
[root@hadoop001 .ssh]# cat id_rsa.pub >> authorized_keys
[root@hadoop001 .ssh]# chmod 600 authorized_keys
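You can then confirm that password-free login works (the first connection may ask you to confirm the host key):

```bash
# Should print the hostname without prompting for a password
ssh hadoop001 hostname
```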

3. Hadoop (HDFS) environment construction

3.1 Download and unzip

Download the Hadoop installation package. I downloaded the CDH release here; the download address is: http://archive.cloudera.com/cdh5/cdh/5/

# Unpack
tar -zvxf hadoop-2.6.0-cdh5.15.2.tar.gz 
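The environment variable in the next step assumes the package lives under /usr/app, so move it there first (a sketch; adjust if you prefer a different location):

```bash
# Move the unpacked distribution to the path assumed by HADOOP_HOME below
mkdir -p /usr/app
mv hadoop-2.6.0-cdh5.15.2 /usr/app/
```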
3.2 Configure environment variables
# vi /etc/profile
Configure the environment variables:

export HADOOP_HOME=/usr/app/hadoop-2.6.0-cdh5.15.2
export PATH=${HADOOP_HOME}/bin:$PATH
Run the source command so the configured environment variables take effect immediately:

# source /etc/profile
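A quick check that the variables took effect:

```bash
# The hadoop command should now be on the PATH and report its version
hadoop version
```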
3.3 Modify Hadoop configuration

Enter the ${HADOOP_HOME}/etc/hadoop/ directory and modify the following configuration files (a note on the temporary directory follows the list):

  1. hadoop-env.sh
# JDK installation path
export  JAVA_HOME=/usr/java/jdk1.8.0_201/
  2. core-site.xml
    <configuration>
    <property>
        <!-- Communication address of the NameNode's HDFS file system -->
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop001:8020</value>
    </property>
    <property>
        <!-- Directory where Hadoop stores temporary files -->
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
    </property>
    </configuration>
  3. hdfs-site.xml

    Specify the replication factor:
    <configuration>
    <property>
        <!-- Since this is a single-node setup, the dfs replication factor is set to 1 -->
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    </configuration>
  4. slaves
    Configure the hostnames or IP addresses of all slave nodes. Since this is a single-node setup, just specify the local machine:
    hadoop001
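One note on the configuration above: core-site.xml points hadoop.tmp.dir at /home/hadoop/tmp. The format step will normally create it, but pre-creating it avoids permission surprises (a sketch, assuming the path from the config):

```bash
# Pre-create the directory referenced by hadoop.tmp.dir in core-site.xml
mkdir -p /home/hadoop/tmp
```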
3.4 Turn off the firewall
If the firewall is not turned off, you may be unable to access Hadoop's Web UI:

# Check the firewall status
sudo firewall-cmd --state
# Stop the firewall
sudo systemctl stop firewalld.service
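To keep the firewall off across reboots, you can optionally disable the service as well:

```bash
# Prevent firewalld from starting again on boot
sudo systemctl disable firewalld.service
```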
3.5 Initialization
Hadoop needs to be initialized the first time it is started. Go to the ${HADOOP_HOME}/bin/ directory and run the following command:

[root@hadoop001 bin]# ./hdfs namenode -format
3.6 Start HDFS
Go to the ${HADOOP_HOME}/sbin/ directory and start HDFS:

[root@hadoop001 sbin]# ./start-dfs.sh

3.7 Verify whether the startup is successful

Method 1: run jps to check whether the NameNode and DataNode services have started:

[root@hadoop001 hadoop-2.6.0-cdh5.15.2]# jps
9137 DataNode
9026 NameNode
9390 SecondaryNameNode
Method 2: open the Web UI in a browser at http://hadoop001:50070.

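Beyond the UI, a quick command-line smoke test also confirms HDFS is answering (a minimal sketch; the directory name is arbitrary):

```bash
# Create a directory in HDFS and list the root to confirm the NameNode responds
hdfs dfs -mkdir -p /tmp/smoke-test
hdfs dfs -ls /
```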

4. Hadoop (YARN) environment construction

4.1 Modify configuration

Enter the ${HADOOP_HOME}/etc/hadoop/ directory and modify the following configuration:

  1. mapred-site.xml
    # If there is no mapred-site.xml, copy the sample file and then modify it
    cp mapred-site.xml.template mapred-site.xml
    <configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    </configuration>
  2. yarn-site.xml
    <configuration>
    <property>
        <!-- Configure the auxiliary service running on the NodeManager. MapReduce programs can run on YARN only after this is set to mapreduce_shuffle. -->
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    </configuration>

4.2 Start the service

Go to the ${HADOOP_HOME}/sbin/ directory and start YARN:

./start-yarn.sh

4.3 Verify whether the startup succeeded
Method 1: run the jps command to check whether the NodeManager and ResourceManager services have started:

[root@hadoop001 hadoop-2.6.0-cdh5.15.2]# jps
9137 DataNode
9026 NameNode
12294 NodeManager
12185 ResourceManager
9390 SecondaryNameNode
Method 2: open the Web UI in a browser at http://hadoop001:8088.

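To confirm that MapReduce jobs actually run on YARN, you can submit the bundled pi example (a sketch; the examples jar ships with the CDH tarball, hence the glob):

```bash
# Submit the example pi job to YARN; it should appear in the 8088 UI
hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10
```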

For more practical content, follow: [Data is great]

