Detailed installation of a Hadoop 3.1.1 cluster on Ubuntu 18.04.1

First of all, we need to create three virtual machines for the cluster: a Hadoop Master server with 4 vCPUs, 4 GB of memory, and 40 GB of hard disk space, and two Hadoop nodes, each with 4 vCPUs, 8 GB of memory, and 40 GB of hard disk space.

The three servers in this article run Ubuntu Server 18.04.1 with all updates installed and have been rebooted. Be sure to configure each server with a static IP address and internal DNS resolution, or add each server to the /etc/hosts file.
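If you use the /etc/hosts approach, the entries on each server might look like the following. The 192.168.1.35 address for hadoop1 matches the address that appears later in the yarn output; the other two addresses are placeholders, so substitute your own:

```
192.168.1.35 hadoop1.admintome.lab hadoop1
192.168.1.36 hadoop2.admintome.lab hadoop2   # placeholder address
192.168.1.37 hadoop3.admintome.lab hadoop3   # placeholder address
```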

Preparing the servers to run Hadoop

First, install Oracle Java 8:

# add-apt-repository ppa:webupd8team/java
# apt update
# apt install -y oracle-java8-set-default

Accept the license terms when prompted, then download the Hadoop binary release:

# wget http://apache.claz.org/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz

Extract the archive and move it to /usr/local/hadoop:

# tar -xzvf hadoop-3.1.1.tar.gz
# mv hadoop-3.1.1 /usr/local/hadoop

Update the default environment variables to include JAVA_HOME and the Hadoop binary directories.

First, we need to know where Java is installed; run the following command to find out:

# update-alternatives --display java
java - manual mode
  link best version is /usr/lib/jvm/java-8-oracle/jre/bin/java
  link currently points to /usr/lib/jvm/java-8-oracle/jre/bin/java
  link java is /usr/bin/java
  slave java.1.gz is /usr/share/man/man1/java.1.gz
/usr/lib/jvm/java-8-oracle/jre/bin/java - priority 1081
  slave java.1.gz: /usr/lib/jvm/java-8-oracle/man/man1/java.1.gz

As indicated above, JAVA_HOME should be set to /usr/lib/jvm/java-8-oracle/jre.
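If you prefer to derive that directory from the update-alternatives output rather than copying it by hand, stripping the trailing /bin/java gives the same result. A small illustration on the path shown above:

```shell
# Strip the trailing /bin/java from the alternative's path to get JAVA_HOME
java_path="/usr/lib/jvm/java-8-oracle/jre/bin/java"
echo "$java_path" | sed 's:/bin/java$::'   # prints: /usr/lib/jvm/java-8-oracle/jre
```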

Open /etc/environment and update the PATH line to include the Hadoop binary directories:

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/local/hadoop/bin:/usr/local/hadoop/sbin"

Add a line for the JAVA_HOME variable and a line for the YARN_RESOURCEMANAGER_OPTS variable:

JAVA_HOME="/usr/lib/jvm/java-8-oracle/jre"
YARN_RESOURCEMANAGER_OPTS="--add-modules=ALL-SYSTEM"

Make sure the JAVA_HOME directory matches the update-alternatives output above, minus the trailing bin/java portion.

Next, we will add a hadoop user and give it the correct permissions:

# adduser hadoop
# usermod -aG hadoop hadoop
# chown hadoop:root -R /usr/local/hadoop
# chmod g+rwx -R /usr/local/hadoop
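As a quick illustration of what the recursive chmod above grants (using a scratch directory here rather than the real Hadoop install):

```shell
# Demonstrate chmod g+rwx: the group permission bits become rwx
mkdir -p /tmp/hadoop-perms-demo
chmod g+rwx /tmp/hadoop-perms-demo
stat -c '%A' /tmp/hadoop-perms-demo | cut -c5-7   # prints: rwx
```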

Log in as the hadoop user and generate an SSH key. You only need to complete this step on the Hadoop Master:

# su - hadoop
$ ssh-keygen -t rsa

Accept all the defaults of ssh-keygen.

Now copy the hadoop user's SSH key to all the Hadoop nodes. Again, you only need to complete this step on the Hadoop Master:

# su - hadoop
$ ssh-copy-id hadoop@hadoop1.admintome.lab
$ ssh-copy-id hadoop@hadoop2.admintome.lab
$ ssh-copy-id hadoop@hadoop3.admintome.lab

Configuring Hadoop master server

Open /usr/local/hadoop/etc/hadoop/core-site.xml file and enter the following:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop1.admintome.lab:9000</value>
  </property>
</configuration>

Save and exit.

Next, open /usr/local/hadoop/etc/hadoop/hdfs-site.xml file and add the following:

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop/data/nameNode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/data/dataNode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>

Save and exit.

Open the /usr/local/hadoop/etc/hadoop/workers file and add the following two lines (one for each Hadoop node):

hadoop2.admintome.lab
hadoop3.admintome.lab

Save and exit.

Copy the configuration file from the Hadoop Master to each Hadoop node.

# scp /usr/local/hadoop/etc/hadoop/* hadoop2.admintome.lab:/usr/local/hadoop/etc/hadoop/
# scp /usr/local/hadoop/etc/hadoop/* hadoop3.admintome.lab:/usr/local/hadoop/etc/hadoop/

Format the HDFS file system

$ source /etc/environment
$ hdfs namenode -format

You can now start HDFS:

hadoop@hadoop1:~$ start-dfs.sh
Starting namenodes on [hadoop1.admintome.lab]
Starting datanodes
Starting secondary namenodes [hadoop1]
hadoop@hadoop1:~$

To verify that everything started correctly, run the jps command as the hadoop user on all the Hadoop servers.

On the Hadoop Master you should see the following results:

hadoop@hadoop1:~$ jps
13634 Jps
13478 SecondaryNameNode
13174 NameNode

On each Hadoop node, you should see:

hadoop@hadoop2:~$ jps
8672 Jps
8579 DataNode

HDFS Web UI

Now we can browse to port 9870 on the Hadoop Master to access the HDFS Web UI:

http://hadoop1.admintome.lab:9870

In the UI we can see that there is nearly 60 GB of free space on our HDFS file system.
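Note that this is raw capacity: with dfs.replication set to 2 (as in hdfs-site.xml above), every block is stored twice, so the effectively usable space is roughly the raw figure divided by the replication factor. A back-of-the-envelope check:

```shell
# Effective HDFS capacity ≈ raw free space / replication factor
raw_gb=60        # approximate raw free space reported by the UI
replication=2    # dfs.replication from hdfs-site.xml
echo $((raw_gb / replication))   # prints: 30  (GB of effectively usable space)
```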

Starting YARN

Now that HDFS is running, we are ready to start the YARN scheduler. YARN is what actually runs and schedules tasks on the Hadoop cluster, so first export the following environment variables:

export HADOOP_HOME="/usr/local/hadoop"
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
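These variables need to be set in the hadoop user's environment before any YARN commands are run; appending them to the hadoop user's ~/.bashrc is one common approach, though the original does not specify where. All of them derive from HADOOP_HOME, which makes a quick sanity check easy:

```shell
# All the other variables derive from HADOOP_HOME
export HADOOP_HOME="/usr/local/hadoop"
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
echo "$HADOOP_CONF_DIR"   # prints: /usr/local/hadoop/etc/hadoop
```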

Run the following command to start Yarn:

$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers

We can verify that it started correctly with the following command:

$ yarn node -list
2018-08-15 04:40:26,688 INFO client.RMProxy: Connecting to ResourceManager at hadoop1.admintome.lab/192.168.1.35:8032
Total Nodes:2
        Node-Id      Node-State  Node-Http-Address  Number-of-Running-Containers
hadoop3.admintome.lab:35337          RUNNING  hadoop3.admintome.lab:8042                            0
hadoop2.admintome.lab:38135          RUNNING  hadoop2.admintome.lab:8042                            0

No containers are running yet, because we have not started any jobs.

Hadoop Web UI

We can view the Hadoop Web UI via the following URL:

http://hadoop1.admintome.lab:8088/cluster

Replace the hostname with that of your own Hadoop Master.

Example run Hadoop

We can now run a Hadoop job and have the cluster schedule it. The example we will run uses MapReduce to calculate PI.

Run the following command to run the job:

yarn jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar pi 16 1000
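The two trailing arguments are the number of map tasks (16) and the number of samples each map computes (1000), so the job draws 16,000 samples in total:

```shell
# pi <nMaps> <nSamples>: total samples = maps × samples per map
maps=16
samples_per_map=1000
echo $((maps * samples_per_map))   # prints: 16000
```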

The whole process takes a few minutes to complete. When it is finished, you should see that it has calculated PI:

Job Finished in 72.973 seconds
Estimated value of Pi is 3.1425000000000000000

Origin www.linuxidc.com/Linux/2019-07/159623.htm