Ubuntu Server 18.04: install a Hadoop 3.2.1 cluster

This document describes how to build a fully distributed Hadoop cluster with one master node and two data nodes.

Configuration Environment

1. System environment

The nodes are three local Ubuntu Server 18.04 virtual machines.
The IPs of the three machines are as follows:

  • 192.168.1.113
  • 192.168.1.114
  • 192.168.1.115

2. Install the Java environment

sudo apt install openjdk-8-jdk-headless

Configure the Java environment variables by adding the following lines at the bottom of the .profile file in the current user's home directory:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

Note: the current user's home directory is /home/<your username>. If there is no .profile file there, just create one with vim (the default template new users receive is copied from /etc/skel, but we won't go into that here).

Use the source command to make it take effect immediately:

source .profile

You can use the following command to test whether the configuration is successful

echo $JAVA_HOME
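
As an extra check (not part of the original steps), you can confirm that the java binary picked up on the PATH comes from the same JDK:

which java       # expected to point under /usr/lib/jvm/java-8-openjdk-amd64/bin
java -version    # expected to report an OpenJDK 1.8.0 build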

3. Configure the hosts file

sudo vim /etc/hosts

Add the following entries, adjusted to your own server IPs:

# Note: all three entries must be configured on every machine

192.168.1.113 master
192.168.1.114 slave1
192.168.1.115 slave2
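
Optionally, you can verify from each node that the names resolve and the machines are reachable:

ping -c 1 master
ping -c 1 slave1
ping -c 1 slave2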

4. Configure password-free login

Generate the key

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

Allow the master to log in to the slaves without a password

# Run this once per node; just add the other two nodes in the same way

ssh-copy-id -i ~/.ssh/id_rsa.pub master
ssh-copy-id -i ~/.ssh/id_rsa.pub slave1
ssh-copy-id -i ~/.ssh/id_rsa.pub slave2

Test password-free login

ssh master 
ssh slave1
ssh slave2
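
To confirm that none of the logins still prompt for a password, a quick loop such as the following (a convenience sketch, not part of the original steps) can be run from the master:

for host in master slave1 slave2; do ssh -o BatchMode=yes "$host" hostname; done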

Hadoop node construction

1. Download the installation package and create a Hadoop directory

Configure the master node first

# Download
wget http://apache.claz.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
# Extract to the /usr/local directory
sudo tar -xzvf hadoop-3.2.1.tar.gz -C /usr/local
# Rename the folder
cd /usr/local/
sudo mv hadoop-3.2.1 hadoop

The fastest approach is to download the archive with a download manager such as Thunder and upload it to the server with WinSCP; downloading directly with wget is quite slow.
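
Depending on how the archive was extracted, /usr/local/hadoop may be owned by root. A common follow-up step (assumed here, not shown in the original) is to hand ownership to your own user:

# replace spark with your username
sudo chown -R spark:spark /usr/local/hadoop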

2. Configure the Hadoop environment variables of the Master node

Just as with the JDK environment variables, edit the .profile file in the user's home directory and add the Hadoop environment variables:

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin 
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

Run source .profile for the changes to take effect immediately.
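
Once the variables are loaded, the hadoop command should be on the PATH; a quick check:

hadoop version    # should report Hadoop 3.2.1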

3. Configure the Master node Hadoop configuration file

The various Hadoop components are configured through XML files, which live in the /usr/local/hadoop/etc/hadoop directory:

  • core-site.xml: common properties, such as I/O settings shared by HDFS and MapReduce
  • hdfs-site.xml: HDFS daemon configuration, covering the namenode, secondary namenode, and datanodes
  • mapred-site.xml: MapReduce daemon configuration
  • yarn-site.xml: resource scheduling (YARN) configuration

Note: indentation tends to get mangled when pasting into vim, so it is easier to copy and paste the configuration with the WinSCP editor.

a. Edit the core-site.xml file and modify the content as follows:

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
</configuration>

Parameter description:

  • fs.defaultFS: The default file system. HDFS clients need this parameter to access HDFS
  • hadoop.tmp.dir: Specify the temporary directory for Hadoop data storage. Other directories will be based on this path. It is recommended to set it to a place with enough space instead of the default /tmp

If the hadoop.tmp.dir parameter is not configured, the system uses the default temporary directory /tmp/hadoop-${user.name}. That directory is wiped on every restart, after which the namenode must be formatted again or errors will occur.
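
Since hadoop.tmp.dir points at /usr/local/hadoop/tmp, it does no harm to create that directory up front (a small addition to the original steps; assumes your user can write to /usr/local/hadoop, see the ownership note above):

mkdir -p /usr/local/hadoop/tmp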

b. Edit hdfs-site.xml and modify the content as follows:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/local/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/local/hadoop/hdfs/data</value>
    </property>
</configuration>

Parameter description:

  • dfs.replication: the number of block replicas; set to 2 here to match the two datanodes
  • dfs.namenode.name.dir: the storage directory for the namenode's metadata
  • dfs.datanode.data.dir: the storage directory for the datanode's blocks
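
As with the temporary directory, the two storage paths can be created ahead of time on each node (optional; HDFS will create them on first start if the parent directory is writable):

mkdir -p /usr/local/hadoop/hdfs/name /usr/local/hadoop/hdfs/data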

c. Edit mapred-site.xml and modify the content as follows:

<configuration>
  <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_HOME/share/hadoop/mapreduce/*:$HADOOP_HOME/share/hadoop/mapreduce/lib/*</value>
  </property>
</configuration>
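
mapreduce.application.classpath assumes the jar directories exist under $HADOOP_HOME; an optional sanity check (not in the original steps):

ls $HADOOP_HOME/share/hadoop/mapreduce | head
ls $HADOOP_HOME/share/hadoop/mapreduce/lib | head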

d. Edit yarn-site.xml and modify the content as follows:

<configuration>
<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME</value>
    </property>
</configuration>

e. Edit the workers file (in the same directory) and modify the content as follows:

slave1
slave2
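
If you prefer not to open an editor, the same workers file can be written from the shell (equivalent to the edit above, assuming your user can write to the Hadoop directory):

cat > /usr/local/hadoop/etc/hadoop/workers <<EOF
slave1
slave2
EOF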

4. Configure the other two slave nodes

Package the configured Hadoop directory on the Master node and send it to the other two nodes:

# Package the hadoop directory
tar -czvf hadoop.tar.gz /usr/local/hadoop

Note: replace spark with your own username.

scp ./hadoop.tar.gz  spark@slave1:~
scp ./hadoop.tar.gz  spark@slave2:~

On the other two nodes, extract the Hadoop package to the /usr/local directory:

sudo tar -xzvf hadoop.tar.gz -C /usr/local/

Configure the Hadoop environment variables on the slave1 and slave2 nodes, appending to .profile just as on the master:

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin 
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

Make the environment variables take effect:

source .profile
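
An optional check that both slaves received the full configuration tree (uses the password-free SSH set up earlier):

ssh slave1 'ls /usr/local/hadoop/etc/hadoop | wc -l'
ssh slave2 'ls /usr/local/hadoop/etc/hadoop | wc -l'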

5. Supplementary configuration

On all three nodes, modify /usr/local/hadoop/etc/hadoop/hadoop-env.sh and add the JAVA_HOME variable:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

Assign 777 permissions to the whole Hadoop directory (done here for simplicity):

# run from /usr/local
sudo chmod -R 777 ./hadoop/

In the /usr/local/hadoop/sbin directory, add the following parameters at the top of the start-dfs.sh and stop-dfs.sh files:

#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Similarly, start-yarn.sh and stop-yarn.sh need the following added at the top:

#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
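
If you would rather not edit the four scripts by hand, a sketch like the following inserts the same variables right after the existing shebang line (shown for the two DFS scripts; repeat with the YARN variables for start-yarn.sh and stop-yarn.sh):

cd /usr/local/hadoop/sbin
for f in start-dfs.sh stop-dfs.sh; do
  { head -n 1 "$f"                       # keep the existing shebang on line 1
    printf '%s\n' HDFS_DATANODE_USER=root HADOOP_SECURE_DN_USER=hdfs HDFS_NAMENODE_USER=root HDFS_SECONDARYNAMENODE_USER=root
    tail -n +2 "$f"                      # then the rest of the original script
  } > "$f.tmp" && mv "$f.tmp" "$f"
done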

Start the cluster

1. Format the HDFS file system

Enter the Hadoop directory of the Master node and perform the following operations:

bin/hdfs namenode -format

2. Start the cluster

# run from /usr/local/hadoop/sbin
./start-all.sh

Web UIs (replace the IP with your own master node's address):

http://192.168.1.113:9870/ (HDFS NameNode UI)
http://192.168.1.113:8088/ (YARN ResourceManager UI)
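
After start-all.sh finishes, jps on each node should list the expected daemons (NameNode, SecondaryNameNode and ResourceManager on the master; DataNode and NodeManager on the slaves), and the HDFS report should show two live datanodes:

jps
hdfs dfsadmin -report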

Origin blog.csdn.net/wy_97/article/details/104792965