Hadoop 3.2.4 Pseudo-Distributed Environment Setup



Foreword

While learning big data with Hadoop, you can see on the official website that there are three ways to set it up:

  • Standalone
  • Pseudo-distributed
  • Cluster deployment

This article describes a pseudo-distributed deployment, that is, all the necessary daemons run on a single machine. It is the easiest way to get started. The main components of Hadoop are:
1. HDFS, the distributed file system. After it starts, there are two Java processes: the DataNode and the NameNode.
2. YARN, which handles resource management and task scheduling for the cluster. Its main processes are the NodeManager and the ResourceManager. MapReduce is only a computation framework, not a service started by Hadoop itself.

1. Download the installation package

The download address for the installation package is:
https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.2.4/hadoop-3.2.4.tar.gz
Precondition: JDK 8 is installed. I won't expand on that here; there are many tutorials available.
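
A quick sanity check that the JDK is usable (assuming JDK 8 is installed under /root/tools/jdk/jdk1.8.0_144, as used throughout this article):

java -version
# should print something like: java version "1.8.0_144"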

2. Installation steps

2.1 Unzip the Hadoop installation package

Copy the installation package to the directory where you want to install it. Here I created a new installation directory,
/root/tools
and executed the command:

tar -zxvf hadoop-3.2.4.tar.gz

After decompression, you get the following directory
/root/tools/hadoop-3.2.4
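To confirm the extraction, you can list the directory; the exact contents may vary slightly, but a Hadoop binary distribution contains at least bin, etc, sbin, and share:

ls /root/tools/hadoop-3.2.4
# bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share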

2.2 Modify environment variables

vi /etc/profile

Add at the end of the file:

export JAVA_HOME=/root/tools/jdk/jdk1.8.0_144
export HADOOP_HOME=/root/tools/hadoop-3.2.4
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CLASSPATH=`hadoop classpath`
export HADOOP_CONF_DIR=/root/tools/hadoop-3.2.4/etc/hadoop

Here, the path where I installed Hadoop is /root/tools/hadoop-3.2.4,
and the path where the JDK is installed is /root/tools/jdk/jdk1.8.0_144.
After editing, save the file and execute the following command to make the changes take effect.

source /etc/profile

Then execute the command

hadoop version

It should print the Hadoop 3.2.4 version information.

2.3 Passwordless local login

2.3.1 Execute the command

  ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  chmod 0600 ~/.ssh/authorized_keys

2.3.2 Test whether it works

After executing the commands above, run ssh localhost; you should be able to log in without being prompted for a password.

ssh localhost

2.4 Modify the configuration file

2.4.1 Modify core-site.xml in the /root/tools/hadoop-3.2.4/etc/hadoop directory

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
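
Optionally, you can confirm that the setting is picked up (this assumes HADOOP_CONF_DIR points at the directory just edited):

bin/hdfs getconf -confKey fs.defaultFS
# expected output: hdfs://localhost:9000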

2.4.2 Modify hdfs-site.xml in the /root/tools/hadoop-3.2.4/etc/hadoop directory

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

2.5 Start HDFS

2.5.1 Format the file system

Enter the /root/tools/hadoop-3.2.4 directory. The first startup requires executing the command

bin/hdfs namenode -format

You may see the following error:
[root@localhost hadoop-3.2.4]# bin/hdfs namenode -format
ERROR: JAVA_HOME is not set and could not be found.
Solution: in etc/hadoop/hadoop-env.sh under the extracted directory, set the JAVA_HOME path:
export JAVA_HOME=/root/tools/jdk/jdk1.8.0_144
If you do not remember the path, you can check JAVA_HOME with vi /etc/profile.
After the error is resolved, the format command completes successfully.

2.5.2 Start HDFS

Execute the command

./sbin/start-dfs.sh

You may see the following error:
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation
Solution:
Add the following to the etc/hadoop/hadoop-env.sh file:

export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

2.5.3 Open the NameNode web UI in the browser

http://localhost:9870/ is the NameNode web address.
If the page loads as expected, HDFS has started successfully.
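You can also verify from the command line with jps, the JDK's Java process listing tool (the process IDs below are illustrative):

jps
# 12345 NameNode
# 12457 DataNode
# 12683 SecondaryNameNode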

3. Test with a demo example

3.1 Create an execution directory

bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/root

3.2 Copy the input files

bin/hdfs dfs -mkdir input
bin/hdfs dfs -put etc/hadoop/*.xml input
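
Optionally, confirm the files landed in HDFS (the relative path input resolves to /user/root/input):

bin/hdfs dfs -ls input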

3.3 Execute the MapReduce jar

Execute the command

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.4.jar grep input output 'dfs[a-z.]+'

The job runs and prints the MapReduce progress and counters to the console.

3.4 Copy the execution result out of HDFS and view it

bin/hdfs dfs -get output output
cat output/*

The matched strings and their counts are printed.
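
The examples jar contains other demos as well. For instance, a word count over the same input can be run like this (wordcount is one of the standard examples in hadoop-mapreduce-examples; the output directory name wc-output is just an example and must not already exist):

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.4.jar wordcount input wc-output
bin/hdfs dfs -cat wc-output/*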

4. Run the computation on YARN

Running on YARN still requires HDFS to be up, so the steps above must still be performed.

4.1 Modify the configuration file

4.1.1 Configuration file etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>

4.1.2 etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>

4.2 Start YARN

sbin/start-yarn.sh

Enter the following address in the browser to see the ResourceManager page:
http://localhost:8088/
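Optionally, check that the NodeManager has registered with the ResourceManager (yarn node -list is a standard YARN CLI command; the exact output depends on your machine):

bin/yarn node -list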

4.3 Execute the computation

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.4.jar grep input output 'dfs[a-z.]+'

You may see the following error:

org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /user/root/grep-temp-1038537

Solution:
Add the following to the hdfs-site.xml file:

<property>
    <name>dfs.safemode.threshold.pct</name>
    <value>0f</value>
    <description>
        Specifies the percentage of blocks that should satisfy
        the minimal replication requirement defined by dfs.replication.min.
        Values less than or equal to 0 mean not to wait for any particular
        percentage of blocks before exiting safemode.
        Values greater than 1 will make safe mode permanent.
    </description>
</property>
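
An alternative that avoids changing the configuration is to check and leave safe mode manually with the HDFS admin command (a one-off fix rather than a permanent setting):

bin/hdfs dfsadmin -safemode get
bin/hdfs dfsadmin -safemode leave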

4.4 Restart HDFS and re-run the jar command

The error is as follows:
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/user/roo
Solution: delete the output directory:

./bin/hdfs dfs -rm -r output

4.5 Re-execute the jar command

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.4.jar grep input output 'dfs[a-z.]+'

To view the output, execute the command bin/hdfs dfs -cat output/*

At this point, HDFS is running the NameNode, DataNode, and SecondaryNameNode processes, and YARN is running the NodeManager and ResourceManager processes.
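
When you are finished, the daemons can be shut down with the corresponding stop scripts, run from the Hadoop installation directory:

sbin/stop-yarn.sh
sbin/stop-dfs.sh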

Summary

This article follows the examples on the official website and records the problems encountered along the way. If it helped you, please give it a thumbs up; if anything is wrong, you are welcome to point it out.

Origin blog.csdn.net/qq_34526237/article/details/129931596