Hadoop fully distributed environment configuration

1 Clone a virtual machine

Shut down the virtual machine that will be cloned, then clone it.
Modify the hostname

vi /etc/sysconfig/network

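A rough sketch of what the file might contain on the clone that becomes host2 (the hostname value is an assumption based on the names used later in this guide):

NETWORKING=yes
HOSTNAME=host2.chybinmy.com
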
Check the current IP configuration

ip addr

Modify the IP address in the interface configuration file; you only need to change the last octet of the IP to a different, unused value.

vi /etc/sysconfig/network-scripts/ifcfg-ens33

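A sketch of the fields that usually matter in ifcfg-ens33 for a static address; the values below are assumptions based on the addresses used in this guide (keep the other generated lines as they are):

BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.159.158
NETMASK=255.255.255.0
GATEWAY=192.168.159.2
DNS1=192.168.159.2
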

vi /etc/hosts

Note: the /etc/hosts entries need to be added on both the source virtual machine and the cloned virtual machine.
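Based on the IP addresses used in this guide, the /etc/hosts entries would look like:

192.168.159.159 host1.chybinmy.com
192.168.159.158 host2.chybinmy.com
192.168.159.157 host3.chybinmy.com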

Restart the virtual machine.
At this point the host2 (IP: 192.168.159.158) virtual machine has been cloned. Clone a host3 (IP: 192.168.159.157) virtual machine in the same way.

2 Server function planning

Determine the function of each server

              host1            host2            host3
              NameNode         ResourceManager
              DataNode         DataNode         DataNode
              NodeManager      NodeManager      NodeManager
              HistoryServer                     SecondaryNameNode

3 Install a new Hadoop on the first machine

3.1 Preparation

To keep it separate from the pseudo-distributed Hadoop installed earlier on the host1 machine, we stop all of host1's Hadoop services and then install another Hadoop under a new directory, /opt/modules/app.
We set up the cluster by first extracting and configuring Hadoop on the first machine, and then distributing it to the other two machines.
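A sketch of creating the new installation directory on host1 (assuming the hadoop user can write under /opt/modules; otherwise create it as root and chown it to hadoop):

[hadoop@host1 ~]$ mkdir -p /opt/modules/app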

3.2 Extract the Hadoop archive:

tar -zxf /opt/hadoop/hadoop-2.10.1.tar.gz -C /opt/modules/app/

3.3 Configure the JDK: modify hadoop-env.sh, mapred-env.sh, and yarn-env.sh

Open hadoop-env.sh, mapred-env.sh, and yarn-env.sh and change the JAVA_HOME path in each of them to: JAVA_HOME=/opt/modules/jdk1.8.0_171

[hadoop@host1 ~]$ vi /opt/modules/app/hadoop-2.10.1/etc/hadoop/hadoop-env.sh


[hadoop@host1 ~]$ vi /opt/modules/app/hadoop-2.10.1/etc/hadoop/mapred-env.sh


[hadoop@host1 ~]$ vi /opt/modules/app/hadoop-2.10.1/etc/hadoop/yarn-env.sh

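In each of the three files, the JAVA_HOME line should end up looking like this (the surrounding comments differ from file to file):

export JAVA_HOME=/opt/modules/jdk1.8.0_171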

3.4 Configure core-site.xml

[hadoop@host1 ~]$ cd /opt/modules/app/hadoop-2.10.1
[hadoop@host1 hadoop-2.10.1]$ vi etc/hadoop/core-site.xml

Add the following between <configuration> and </configuration>:

 <property>
   <name>fs.defaultFS</name>
   <value>hdfs://host1.chybinmy.com:8020</value>
 </property>
 <property>
   <name>hadoop.tmp.dir</name>
   <value>/opt/modules/app/hadoop-2.10.1/data/tmp</value>
 </property>

Explanation:

  • fs.defaultFS is the address of the NameNode.
  • hadoop.tmp.dir is Hadoop's temporary directory. By default, the data files of the NameNode and DataNode are stored in subdirectories of this directory.

3.5 Configure hdfs-site.xml

[hadoop@host1 hadoop-2.10.1]$ vi etc/hadoop/hdfs-site.xml

Add the following between <configuration> and </configuration>:

 <property>
   <name>dfs.namenode.secondary.http-address</name>
   <value>host3.chybinmy.com:50090</value>
 </property>
 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/opt/modules/app/hadoop-2.10.1/data/tmp/dfs/name</value>
 </property>
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/opt/modules/app/hadoop-2.10.1/data/tmp/dfs/data</value>
 </property>

Explanation:

  • dfs.namenode.secondary.http-address specifies the HTTP address and port of the SecondaryNameNode. Because the plan places the SecondaryNameNode on host3, it is set to host3.chybinmy.com:50090.
  • dfs.namenode.name.dir and dfs.datanode.data.dir specify where the NameNode metadata and the DataNode blocks are stored.

3.6 Configure slaves

[hadoop@host1 hadoop-2.10.1]$ vi etc/hadoop/slaves

Add the following three hostnames to the file:

host1.chybinmy.com
host2.chybinmy.com
host3.chybinmy.com

The slaves file specifies which nodes run HDFS DataNodes.

3.7 Configure yarn-site.xml

[hadoop@host1 hadoop-2.10.1]$ vi etc/hadoop/yarn-site.xml

Add the following between <configuration> and </configuration>:

   <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>host2.chybinmy.com</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>106800</value>
    </property>

Explanation:

  • yarn.resourcemanager.hostname specifies the ResourceManager server; according to the plan it points to host2.chybinmy.com.
  • yarn.log-aggregation-enable configures whether the log aggregation feature is enabled.
  • yarn.log-aggregation.retain-seconds configures how long (in seconds) the aggregated logs are kept on HDFS.

3.8 Configure mapred-site.xml

Copy mapred-site.xml.template to a new mapred-site.xml file:

[hadoop@host1 hadoop-2.10.1]$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
[hadoop@host1 hadoop-2.10.1]$ vi etc/hadoop/mapred-site.xml

Add the following between <configuration> and </configuration>:

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>192.168.159.159:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>192.168.159.159:19888</value>
    </property>

Explanation:

  • mapreduce.framework.name sets MapReduce jobs to run on YARN.
  • mapreduce.jobhistory.address sets the address of the MapReduce history server; here it is the host1 machine (192.168.159.159).
  • mapreduce.jobhistory.webapp.address sets the web UI address and port of the history server.

4 Set up passwordless SSH login

The machines in a Hadoop cluster access each other over SSH. Entering a password for every connection is impractical, so passwordless SSH login must be configured between the machines.

4.1 Generate public key

First switch to the root user:

su root


vi /etc/ssh/sshd_config

Find the following three lines and uncomment them. I could not find RSAAuthentication yes in my file (newer OpenSSH versions no longer include this option), so I added it directly.

RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile        .ssh/authorized_keys

Restart the SSH service, switch back to the hadoop user, and generate the public/private key pair:

service sshd restart
su hadoop
ssh-keygen -t dsa

This creates a .ssh directory under /home/hadoop/ (it is hidden, so use ll -a to see it). Inside .ssh you will find two files, the public key and the private key.
Append the public key to the authorized_keys file, change that file's permissions, and then ssh to the local machine; if no password is required, passwordless login to this machine works.

cd .ssh
cat id_dsa.pub >> authorized_keys    (or: cat id_rsa.pub >> authorized_keys, depending on the key type)
chmod 600 authorized_keys
ssh localhost

If ssh localhost still asks for a password, try the following commands; replace /home/hadoop/ with your own user's home directory.

chmod 755 /home/hadoop/
chmod 700 ~/.ssh
chmod 644 ~/.ssh/authorized_keys

Perform all of the above operations on host1, host2, and host3. Once done, continue to the next step.

4.2 Distribute the public keys

In the following commands, id_dsa.pub may instead be id_rsa.pub, depending on which key type you generated.

cat ~/.ssh/id_dsa.pub | ssh hadoop@host2.chybinmy.com 'cat - >> ~/.ssh/authorized_keys'
cat ~/.ssh/id_dsa.pub | ssh hadoop@host3.chybinmy.com 'cat - >> ~/.ssh/authorized_keys'

Do the same key distribution from host2 and host3 as well, as sketched below.
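For example, on host2 the corresponding commands would look like this (a sketch assuming the same dsa key; use id_rsa.pub instead if you generated an rsa key):

cat ~/.ssh/id_dsa.pub | ssh hadoop@host1.chybinmy.com 'cat - >> ~/.ssh/authorized_keys'
cat ~/.ssh/id_dsa.pub | ssh hadoop@host3.chybinmy.com 'cat - >> ~/.ssh/authorized_keys'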

Verify that passwordless SSH login works by simply running ssh <hostname>.

5 Distribute Hadoop files

First create a directory to store Hadoop on the other two machines

[hadoop@host2 ~]$ mkdir /opt/modules/app
[hadoop@host3 ~]$ mkdir /opt/modules/app


Distribute Hadoop with scp. The share/doc directory under the Hadoop root holds the Hadoop documentation and is quite large; it is recommended to delete it before distributing, which saves disk space and speeds up the copy.
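If you choose to remove the documentation, a sketch of the command (run on host1 before the scp below):

[hadoop@host1 hadoop-2.10.1]$ rm -rf /opt/modules/app/hadoop-2.10.1/share/doc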

[hadoop@host1 hadoop-2.10.1]$ du -sh /opt/modules/app/hadoop-2.10.1/share/doc
[hadoop@host1 hadoop-2.10.1]$ scp -r /opt/modules/app/hadoop-2.10.1/ 192.168.159.158:/opt/modules/app
[hadoop@host1 hadoop-2.10.1]$ scp -r /opt/modules/app/hadoop-2.10.1/ 192.168.159.157:/opt/modules/app

6 Format NameNode

Perform formatting on the NameNode machine (host1):

[hadoop@host1 hadoop-2.10.1]$ /opt/modules/app/hadoop-2.10.1/bin/hdfs namenode -format

Note:
If you need to reformat the NameNode, first delete the relevant old directories on every host — in particular the data folder, which is the hadoop.tmp.dir configured in core-site.xml — so that all of the old NameNode and DataNode files are removed; otherwise an error will be reported.

The reason is that every format creates a new cluster ID and writes it into the VERSION files of the NameNode and the DataNode (these files live under dfs/name/current and dfs/data/current respectively). When reformatting, a new cluster ID is generated by default; if the old directories are not deleted, the NameNode's VERSION file holds the new cluster ID while the DataNode's VERSION file still holds the old one, and the mismatch causes an error.

Another approach is to pass the cluster ID as a parameter when formatting and set it to the old cluster ID.
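A sketch of that alternative; the old cluster ID can be read from an existing VERSION file, and the value below is only a placeholder:

[hadoop@host1 hadoop-2.10.1]$ bin/hdfs namenode -format -clusterid CID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx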

7 Start the cluster

7.1 Start HDFS

[hadoop@host1 ~]$ /opt/modules/app/hadoop-2.10.1/sbin/start-dfs.sh
[hadoop@host1 ~]$ jps

Run jps on each of the three hosts; if the HDFS processes planned for that host are shown, the startup was successful.
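Based on the planning table, roughly the following processes are expected at this point (only HDFS has been started so far; process IDs will vary):

host1: NameNode, DataNode, Jps
host2: DataNode, Jps
host3: SecondaryNameNode, DataNode, Jps
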
To stop HDFS, use the following command:

[hadoop@host1 ~]$ /opt/modules/app/hadoop-2.10.1/sbin/stop-dfs.sh

7.2 Start YARN

Start YARN on host1:

[hadoop@host1 ~]$ /opt/modules/app/hadoop-2.10.1/sbin/start-yarn.sh

Start the ResourceManager on host2 (start-yarn.sh only starts a ResourceManager on the machine where it is run, and the plan places the ResourceManager on host2):

[hadoop@host2 hadoop-2.10.1]$ /opt/modules/app/hadoop-2.10.1/sbin/yarn-daemon.sh start resourcemanager
[hadoop@host2 hadoop-2.10.1]$ jps

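After this, jps on host2 is expected to list roughly the following processes (a sketch; process IDs omitted):

ResourceManager
NodeManager
DataNode
Jps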

7.3 Start the log server

Because the plan is to run the MapReduce log service on host3, we need to start it on host3:

[hadoop@host3 hadoop-2.10.1]$ /opt/modules/app/hadoop-2.10.1/sbin/mr-jobhistory-daemon.sh start historyserver

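To confirm the daemon is running, a quick check on host3 (the process shows up in jps as JobHistoryServer):

[hadoop@host3 hadoop-2.10.1]$ jps | grep JobHistoryServer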

7.4 View HDFS web page

http://192.168.159.159:50070/ (change the URL to your own IP; here it is host1's IP)

7.5 View YARN Web page

http://192.168.159.158:8088/cluster (change the URL to your own IP; here it is host2's IP)

8 Test Job

Here we use the wordcount example that ships with Hadoop to test running a MapReduce job on the cluster.

8.1 Prepare the MapReduce input file wc.input

[hadoop@host1 ~]$ cat /opt/data/wc.input

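If the file does not yet exist, it can be prepared with a few lines of space-separated words; a minimal sketch (the original file's exact contents are not shown here):

[hadoop@host1 ~]$ mkdir -p /opt/data
[hadoop@host1 ~]$ printf 'hadoop mapreduce yarn\nhadoop hdfs hadoop\n' > /opt/data/wc.input
[hadoop@host1 ~]$ cat /opt/data/wc.input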

8.2 Create an input directory input in HDFS

[hadoop@host1 ~]$ cd /opt/modules/app/hadoop-2.10.1
[hadoop@host1 hadoop-2.10.1]$ bin/hdfs dfs -mkdir /input

Upload wc.input to HDFS

[hadoop@host1 hadoop-2.10.1]$ bin/hdfs dfs -put /opt/data/wc.input /input/wc.input

8.3 Run the mapreduce Demo that comes with hadoop

[hadoop@host1 hadoop-2.10.1]$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.1.jar wordcount /input/wc.input /output


8.4 View output files

[hadoop@host1 hadoop-2.10.1]$ bin/hdfs dfs -ls /output

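To also print the word counts themselves, view the reducer output file (for this job it is typically named part-r-00000):

[hadoop@host1 hadoop-2.10.1]$ bin/hdfs dfs -cat /output/part-r-00000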


Origin: blog.csdn.net/qq_42946328/article/details/113496798