Hadoop installation and configuration steps

Hadoop installation

This installation builds on the earlier installations of the JDK and MySQL; if you want to see those steps, check the previous posts.
This installation uses the MobaXterm helper tool. First, upload the Hadoop tarball to the /root/software folder for later use.

1. Extract the Hadoop binary tarball

After entering the /root/software folder, enter the extraction command:

tar -zxvf hadoop-2.6.0-cdh5.14.2.tar.gz


2. Rename the folder

To keep later commands short, first rename the extracted Hadoop directory to something simpler. This step is not essential; it is a matter of personal habit and can be skipped.

mv hadoop-2.6.0-cdh5.14.2 hadoop


3. Delete the tarball

After extraction, the original tarball is no longer needed, so it is recommended to delete it:

rm -f hadoop-2.6.0-cdh5.14.2.tar.gz

4. Prepare the configuration environment

Enter the hadoop folder that was just renamed; inside it, the configuration files live under the etc/hadoop directory. Enter it to prepare for configuration:

cd hadoop/etc/hadoop
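For reference, a quick listing shows the files edited in the steps below (the exact contents can vary slightly between Hadoop builds):

ls
# expect to see, among others:
# core-site.xml  hadoop-env.sh  hdfs-site.xml
# mapred-site.xml.template  yarn-site.xml  slaves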

Environment configuration

1. Configure the hadoop-env.sh file to point to Java

Open the hadoop-env.sh file and enter edit mode. Comment out the original JAVA_HOME line, then add the actual path of JAVA_HOME:

vi hadoop-env.sh

The original JAVA_HOME line cannot be used as-is (it relies on the ${JAVA_HOME} variable already being set in the daemon's environment, which is often not the case). Comment it out and manually add the real path:
export JAVA_HOME=/root/software/jdk1.8.0_221
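The edited section of hadoop-env.sh then looks roughly like this (a sketch; the exact wording of the stock line varies between Hadoop builds):

# export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/root/software/jdk1.8.0_221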


If you don’t remember the path of JAVA_HOME, you can exit and enter:

echo $JAVA_HOME

to view the actual path of JAVA_HOME.

2. Configure the core-site.xml file (the core settings)

Open the core-site.xml file:

vi core-site.xml

Add the following between the <configuration> tags at the end of the file, then save and exit:

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop102:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/root/software/hadoop/tmp</value>
</property>
<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>

Here hadoop102 is the hostname of the virtual machine (replace it with your own), and the second value points to a new tmp folder under the actual installation directory. The last two properties, the root proxy-user settings, can simply be copied as-is.
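Hadoop normally creates the hadoop.tmp.dir directory itself when the NameNode is formatted, but creating it up front avoids permission surprises (the path is the one from the config above):

mkdir -p /root/software/hadoop/tmp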

3. Configure the hdfs-site.xml file (the HDFS settings)

Open the hdfs-site.xml file:

vi hdfs-site.xml

Because a single virtual machine is used here to simulate the distributed structure (pseudo-distributed mode), the replication factor is set to 1; a fully distributed setup with more nodes can be added later when needed.
As before, add the following between the <configuration> tags at the end, then save and exit:

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
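Once the environment variables from step 6 below are in place, the effective value can be sanity-checked (an optional quick check):

hdfs getconf -confKey dfs.replication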


4. Configure the mapred-site.xml file

Hadoop only reads mapred-site.xml, not the shipped template, so first copy the template to the real name and then open it:

cp mapred-site.xml.template mapred-site.xml

vi mapred-site.xml

As before, add the following between the <configuration> tags at the end, then save and exit:

<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>


5. Configure the yarn-site.xml file

Open the yarn-site.xml file:

vi yarn-site.xml

As before, add the following between the <configuration> tags at the end, then save and exit:

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop102</value>
</property>

For yarn.resourcemanager.hostname, fill in the hostname of your own machine (hadoop102 here).
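If you do not remember the hostname, it can be checked with:

hostname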

6. Configure hadoop environment variables

Open the /etc/profile file:

vi /etc/profile

Add the following at the end:

export HADOOP_HOME=/root/software/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

Here, the first line is the installation directory of Hadoop (fill it in according to your actual setup), and the last line adds the $HADOOP_HOME/bin and $HADOOP_HOME/sbin paths to PATH so the Hadoop commands can be run from any directory. Then save and exit.


7. Make the environment variables take effect

Enter the command to make the newly configured environment variables take effect:

source /etc/profile
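At this point a quick sanity check confirms the PATH change worked (optional):

hadoop version

If the installation and the variables are correct, this prints the Hadoop version banner rather than a command-not-found error.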

8. Initialize (format) the NameNode

Enter the command:

hadoop namenode -format

If the output contains “successfully formatted” and “Exiting with status 0”, the initialization succeeded and there is no problem with the previous configuration. (In Hadoop 2.x, hdfs namenode -format is the preferred form of this command, but the older hadoop namenode -format still works.)

9. Start the services

First enter

cd ../..

to return to the hadoop main directory, then enter

start-all.sh

to start the services; start-all.sh is equivalent to running start-dfs.sh followed by start-yarn.sh.

After answering yes twice (first-time SSH host-key confirmations), the startup is complete. You can then enter jps to check whether the services started successfully.

Six Java processes should be listed (NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, plus jps itself), which means the startup was successful.
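On a healthy pseudo-distributed node the output looks roughly like this (the PIDs will differ):

jps
# 2130 NameNode
# 2262 DataNode
# 2441 SecondaryNameNode
# 2594 ResourceManager
# 2701 NodeManager
# 3021 Jps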

10. Test

At the same time, you can open the NameNode web UI in a browser on the host machine:

http://<IP address>:50070

This gives a more intuitive view of the files in the Hadoop distributed file system.

11. Upload test

First create a /test folder in HDFS (without it, -put would create a file named /test instead of placing the file inside a folder), then upload READ.txt into it:

hdfs dfs -mkdir /test

hdfs dfs -put READ.txt /test

Next, enter the command to run the calculation (a wordcount over the READ.txt file):

hadoop jar share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.6.0-cdh5.14.2.jar wordcount /test/READ.txt /output


After the job finishes successfully, enter the command:

hdfs dfs -cat /output/part-r-00000

to view the count for each word.
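The exact words and counts depend on the contents of READ.txt; the output format is one word per line followed by a tab and its count, along these hypothetical lines:

hadoop	2
hello	3
world	1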

You can also view the job in the browser, via the YARN ResourceManager web UI:

http://<IP address>:8088
That’s all for today’s sharing, thank you.

Origin blog.csdn.net/giantleech/article/details/114761077