Level 2: Configure the development environment - Hadoop installation and pseudo-distributed cluster setup

1 Download Hadoop

Official website: http://hadoop.apache.org/

2 Install Hadoop

Official documentation: http://hadoop.apache.org/docs/r3.1.0/hadoop-project-dist/hadoop-common/SingleCluster.html
Extract the Hadoop archive into the /app folder:

cd /opt
tar -zxvf hadoop-3.1.0.tar.gz -C /app

Switch to the /app directory and rename the Hadoop folder. (This step is optional.)

mv hadoop-3.1.0 hadoop3.1
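
A quick sanity check that the extraction and rename worked (assuming the paths used above):

ls /app/hadoop3.1        # should list bin, etc, sbin, share, among others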

2.1 Configure Hadoop environment

Next we begin to configure the Hadoop development environment.
We will build a single-node cluster configured in pseudo-distributed mode. Why not fully distributed?
A fully distributed configuration is essentially the same as a pseudo-distributed one, just with more machines, so pseudo-distributed mode is a better fit for learning Hadoop. We will build a real distributed environment later.

2.1.1 Set up SSH password-free login

We will need to log in to the master and slave nodes frequently when operating the cluster later, so it is necessary to set up passwordless SSH login.
Enter the following command:

 ssh-keygen -t rsa -P ''

This generates a passphrase-less key pair. When asked for the save path, just press Enter to accept the default; the key pair, id_rsa and id_rsa.pub, is stored in the ~/.ssh directory by default.
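
If you prefer to skip the save-path prompt entirely, you can pass the output file explicitly; this is just a variation of the command above, with ~/.ssh/id_rsa being the default path:

mkdir -p ~/.ssh
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa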
Next, append id_rsa.pub to the authorized keys file:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Then modify the permissions:

chmod 600 ~/.ssh/authorized_keys

Then you need to enable RSA authentication and the public/private key pair authentication method.
Open the SSH daemon configuration with vim /etc/ssh/sshd_config (if it complains about insufficient permissions, prefix the command with sudo)
and modify the SSH configuration as follows:

RSAAuthentication yes # enable RSA authentication
PubkeyAuthentication yes # enable public/private key pair authentication
AuthorizedKeysFile %h/.ssh/authorized_keys # path to the authorized public keys file
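
After saving, you can quickly confirm that the three options appear in the file (commented-out defaults may also show up in the output; make sure the uncommented lines are present):

grep -E 'RSAAuthentication|PubkeyAuthentication|AuthorizedKeysFile' /etc/ssh/sshd_config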

Restart SSH. (You can restart it in a local virtual machine of your own, but do not restart it on the platform; you don't need to, and after a restart you would no longer be able to connect to the command line!)
service ssh restart
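
You can now check that passwordless login works. The first connection may ask you to confirm the host key; after that, no password should be required:

ssh localhost    # should log in without a password prompt; type exit to return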

2.1.2 Configure Hadoop files

There are 6 files in total:

hadoop-env.sh
yarn-env.sh
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
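
All six files live under /app/hadoop3.1/etc/hadoop/. An optional quick check that they are all present (if mapred-site.xml is missing in your distribution, simply create it when you reach section 2.1.7):

cd /app/hadoop3.1/etc/hadoop/
ls hadoop-env.sh yarn-env.sh core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml
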
2.1.3 hadoop-env.sh configuration

The two env.sh files mainly configure the location of the JDK.

Tip: if you have forgotten where the JDK is, run echo $JAVA_HOME to see it.

First, switch to the Hadoop configuration directory:

cd /app/hadoop3.1/etc/hadoop/

Edit hadoop-env.sh and insert the following code into the file:

# The java implementation to use.  
#export JAVA_HOME=${JAVA_HOME}  
export JAVA_HOME=/app/jdk1.8.0_171

2.1.4 yarn-env.sh configuration

Edit yarn-env.sh and insert the following code:

export JAVA_HOME=/app/jdk1.8.0_171
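
If you would rather make both env.sh edits from the shell instead of an editor, a minimal sketch (assuming the JDK path used above) is:

cd /app/hadoop3.1/etc/hadoop/
echo 'export JAVA_HOME=/app/jdk1.8.0_171' >> hadoop-env.sh
echo 'export JAVA_HOME=/app/jdk1.8.0_171' >> yarn-env.sh
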
2.1.5 core-site.xml configuration

This is the core configuration file. In it we set the URI of HDFS and the location of the NameNode's temporary folder (this temporary folder is created below). Note that fs.default.name is the legacy key name; in Hadoop 3.x it is a deprecated alias for fs.defaultFS, but it still works.
Add the following code inside the configuration tag at the end of the file:

<configuration>
<property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
    <description>URI of HDFS, in the form filesystem://namenode-host:port</description>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/hadoop/tmp</value>
    <description>Local Hadoop temporary directory on the NameNode</description>
</property>
</configuration>

2.1.6 hdfs-site.xml file configuration

replication refers to the number of block replicas; since we are running on a single node, it is set to 1.

<configuration>
<property>
    <name>dfs.name.dir</name>
    <value>/usr/hadoop/hdfs/name</value>
    <description>Where the NameNode stores HDFS namespace metadata</description>
</property>
<property>
    <name>dfs.data.dir</name>
    <value>/usr/hadoop/hdfs/data</value>
    <description>Physical storage location of data blocks on the DataNode</description>
</property>
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
</configuration>

2.1.7 mapred-site.xml file configuration

This tells MapReduce to run on the YARN framework:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

2.1.8 yarn-site.xml configuration

<configuration>
<property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
</property>
<property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>192.168.2.10:8099</value>
        <description>Address of the MapReduce management web UI; replace 192.168.2.10 with your own machine's IP address</description>
</property>
</configuration>
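
Before moving on, it can help to catch XML typos in the four *-site.xml files. If xmllint happens to be installed on your system, a quick check is:

cd /app/hadoop3.1/etc/hadoop/
xmllint --noout core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml   # prints nothing if the XML is well formed
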
2.1.9 Create folders

We referenced several folder paths in the configuration files above, and now we need to create them. Under /usr/hadoop/, operating as the hadoop user, create the tmp, hdfs/name, and hdfs/data directories by executing the following commands:

mkdir -p /usr/hadoop/tmp 
mkdir /usr/hadoop/hdfs 
mkdir /usr/hadoop/hdfs/data 
mkdir /usr/hadoop/hdfs/name
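
Equivalently, a single command creates all of them, since the -p flag also creates any missing parent directories:

mkdir -p /usr/hadoop/tmp /usr/hadoop/hdfs/name /usr/hadoop/hdfs/data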

2.2 Add Hadoop to environment variables

vim /etc/profile

Insert the following code at the end of the file:

export HADOOP_HOME=/app/hadoop3.1
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Finally, make the changes take effect: source /etc/profile
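
A quick way to confirm that the PATH change took effect is to ask Hadoop for its version:

hadoop version    # should report Hadoop 3.1.0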

2.3 Verification

Now that the configuration work is basically done, three steps remain:

  1. Format HDFS;
  2. Start Hadoop;
  3. Verify that Hadoop works.

2.4 Format

Before using Hadoop for the first time, we need to format the NameNode, which initializes some basic HDFS metadata.
Use the following command:

hadoop namenode -format
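
In Hadoop 3.x this command still works but prints a deprecation warning; the current equivalent is:

hdfs namenode -format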

2.5 Start up

Next we start Hadoop:

start-dfs.sh

Enter the command and an interface like the one shown in the figure below appears:
[Figure: error output from start-dfs.sh]
This means that the startup was unsuccessful, because the root user is not yet allowed to start Hadoop. Let's fix that.

Change to the /app/hadoop3.1/sbin directory: cd /app/hadoop3.1/sbin.
At the top of both the start-dfs.sh and stop-dfs.sh files, add the following parameters:

#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Similarly, the following needs to be added to the top of start-yarn.sh and stop-yarn.sh:

#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

Run start-dfs.sh again, and finally enter the jps command to verify; output like the following indicates a successful start.
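
A successful start of HDFS leaves the NameNode, DataNode, and SecondaryNameNode daemons running, so the listing should look roughly like this (jps prints each process ID followed by the process name, and Jps itself also appears):

jps
# expect: NameNode, DataNode, SecondaryNameNode, plus Jps itself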
If your virtual machine has a graphical interface, you can open the Firefox browser inside it and enter http://localhost:9870/, or enter http://<virtual machine IP address>:9870/ on your Windows machine, to access the Hadoop management page.
