Hadoop fully distributed cluster construction (super detailed) - big data cluster construction

This walkthrough builds the fully distributed cluster with the following software:
jdk1.8.0
hadoop-2.7.7
The packages used here are available for download at: https://share.weiyun.com/dk7WgaVk
Password: 553ubk

This setup requires three virtual machines created in advance. Their host names here are master, slave1, and slave2

1. Configure password-free login
First, set up mutual password-free SSH login among the three virtual machines. Start on the master virtual machine:

1. Generate an SSH key pair (a public key and a private key) with the following command, pressing Enter at each of the three prompts

ssh-keygen -t rsa

2. Copy the public key to this machine itself to enable password-free login. Enter yes at the first prompt, and enter your virtual machine's login password at the second

ssh-copy-id master

In the same way, still on master, set up password-free login from master to slave1 and slave2

ssh-copy-id slave1

ssh-copy-id slave2

Then log in to the slave1 and slave2 virtual machines in turn and repeat the steps above on each: generate a key pair with ssh-keygen -t rsa, then copy the public key to all three hosts

ssh-copy-id master

ssh-copy-id slave1

ssh-copy-id slave2

In total, the three virtual machines perform 3*3=9 password-free login operations (one ssh-copy-id run per source and target pair)
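The nine runs can be enumerated with a small loop. The sketch below is a dry run: it only prints which command to run on which host and executes nothing. The `HOSTS` variable and the function name `list_copy_id_pairs` are illustrative names, not part of the original steps.

```shell
#!/bin/sh
# Host names used in this article.
HOSTS="master slave1 slave2"

# Print every (source host, target host) pair that needs an ssh-copy-id run.
list_copy_id_pairs() {
    for src in $HOSTS; do
        for dst in $HOSTS; do
            echo "on $src: ssh-copy-id $dst"
        done
    done
}

list_copy_id_pairs   # prints 9 lines, 3 per source host
```

Ticking off each printed line makes it easy to confirm that no pair was missed.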

Next, edit the /etc/hosts file on the master virtual machine to map IP addresses to host names (if the host names did not resolve during the ssh-copy-id steps above, do this step first)

vi /etc/hosts

Add the IP addresses and host names of the three virtual machines, one "IP hostname" pair per line, then type :wq to save and exit.
Then switch to the slave1 and slave2 virtual machines and make the same change to /etc/hosts on each, saving and exiting the same way.
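As a rough sketch of the same mapping done without an editor, the helper below appends an entry only if the host name is not mapped yet. The function name `add_host_entry` and the 192.168.56.x addresses are placeholders invented for illustration; substitute the real IP addresses of your virtual machines. The demo writes to a scratch file rather than /etc/hosts.

```shell
#!/bin/sh
# Append "IP hostname" to a hosts file unless the host name is already mapped.
# $1 = IP address, $2 = host name, $3 = hosts file (defaults to /etc/hosts)
add_host_entry() {
    hosts_file="${3:-/etc/hosts}"
    if ! grep -qE "[[:space:]]$2([[:space:]]|\$)" "$hosts_file" 2>/dev/null; then
        printf '%s %s\n' "$1" "$2" >> "$hosts_file"
    fi
}

# Demo against a scratch file; the IP addresses are placeholders.
demo_hosts="$(mktemp)"
add_host_entry 192.168.56.101 master "$demo_hosts"
add_host_entry 192.168.56.102 slave1 "$demo_hosts"
add_host_entry 192.168.56.103 slave2 "$demo_hosts"
add_host_entry 192.168.56.101 master "$demo_hosts"   # skipped: already mapped
cat "$demo_hosts"
```

Dropping the third argument targets /etc/hosts itself, which requires root privileges.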

2. Configure the java environment

1. Configure the Java environment on the master host first.
First, extract the JDK archive into the /opt directory.

tar -zxvf ~/ruanjian/jdk1.8.0_221.tar.gz -C /opt

Go to the /opt directory and rename the extracted directory to java

cd /opt
mv jdk1.8.0_221 java

Next, modify the environment variables

vi /etc/profile

In the file, press G (Shift+g) to jump to the last line, press i to enter insert mode, and add the following:

export JAVA_HOME=/opt/java  # set this to your own Java install path
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

Press Esc, then type :wq to save the configuration

Then make the configuration take effect in the current shell

source /etc/profile

Finally, verify whether the java environment is configured successfully

java -version

If the Java version number is displayed correctly, the configuration succeeded~
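If java -version fails instead, a quick way to narrow it down is to check whether the directory that JAVA_HOME points at actually contains an executable bin/java. This is only a sketch; the function name `check_java_home` is made up for illustration, and /opt/java is the path used in this article.

```shell
#!/bin/sh
# Report whether a candidate JAVA_HOME contains an executable bin/java.
# $1 = directory to check
check_java_home() {
    if [ -x "$1/bin/java" ]; then
        echo "ok: $1/bin/java is executable"
    else
        echo "missing: $1/bin/java" >&2
        return 1
    fi
}

# Falls back to the article's path when JAVA_HOME is not set.
check_java_home "${JAVA_HOME:-/opt/java}" || echo "check the JAVA_HOME line in /etc/profile"
```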
Then distribute the Java environment to the slave1 and slave2 virtual machines. Both the JDK directory and /etc/profile need to be copied

scp -r /opt/java root@slave1:/opt/
scp -r /opt/java root@slave2:/opt/
scp /etc/profile root@slave1:/etc/profile
scp /etc/profile root@slave2:/etc/profile

Apply the configuration on the slave1 and slave2 virtual machines respectively

ssh slave1
source /etc/profile
exit
ssh slave2
source /etc/profile
exit


3. Hadoop installation

1. Extract the hadoop-2.7.7.tar.gz archive under /root/ruanjian into the /opt directory, and rename the extracted directory to hadoop

tar -zxvf /root/ruanjian/hadoop-2.7.7.tar.gz -C /opt
mv /opt/hadoop-2.7.7 /opt/hadoop


2. Modify environment variables

vim /etc/profile

3. Press Shift+g to navigate to the last line, press i to switch to input mode, and add the following code:

export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

4. Press Esc, then type :wq to save and exit

5. Make the new configuration take effect

source /etc/profile

6. Similarly, distribute the master's /etc/profile to slave1 and slave2, then apply it on each

scp /etc/profile root@slave1:/etc/profile
scp /etc/profile root@slave2:/etc/profile
ssh slave1
source /etc/profile
exit
ssh slave2
source /etc/profile
exit

4. Hadoop configuration
Six files in /opt/hadoop/etc/hadoop need to be configured:
hadoop-env.sh
core-site.xml
yarn-site.xml
hdfs-site.xml
mapred-site.xml
slaves
1. First, edit the hadoop-env.sh file on the master virtual machine

vi /opt/hadoop/etc/hadoop/hadoop-env.sh

In the file, type 25gg to jump to line 25, set JAVA_HOME there to the Java path you configured (/opt/java in this article), then type :wq to save and exit
2. Modify the core-site.xml file

vi /opt/hadoop/etc/hadoop/core-site.xml

Open the file, type G to jump to the last line, press i to enter insert mode, and add the following properties inside the configuration tag:

<configuration>
 <property>
     <name>fs.defaultFS</name>
     <value>hdfs://master:9000</value>
 </property>
 <property>
     <name>hadoop.tmp.dir</name>
     <value>/opt/hadoop/tmp</value>
 </property>
</configuration>

3. Modify the hdfs-site.xml file

vi /opt/hadoop/etc/hadoop/hdfs-site.xml

Open the file, type G to jump to the last line, press i to enter insert mode, and add the following properties inside the configuration tag:

<configuration>
 <property>
      <name>dfs.namenode.http-address</name>
      <value>master:50070</value>
 </property>
 <property>
      <name>dfs.namenode.secondary.http-address</name>
      <value>master:50090</value>
 </property>
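 <!-- this cluster has only two DataNodes (slave1 and slave2), so 2 replicas is the most it can hold -->
 <property>
      <name>dfs.replication</name>
      <value>2</value>
 </property>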
 <property>
      <name>dfs.namenode.name.dir</name>
      <value>/opt/hadoop/tmp/dfs/name</value>
 </property>
 <property>
      <name>dfs.datanode.data.dir</name>
      <value>/opt/hadoop/tmp/dfs/data</value>
 </property>
</configuration>

4. Modify the yarn-site.xml file

vi /opt/hadoop/etc/hadoop/yarn-site.xml

Open the file, type G to jump to the last line, press i to enter insert mode, and add the following properties inside the configuration tag:

<configuration>
 <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
 </property>
 <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
 </property>
 <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
 </property>
 <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
 </property>
 <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
 </property>
 <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>106800</value>
 </property>
 <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/user/container/logs</value>
 </property>
</configuration>

5. In the configuration directory, copy the template file mapred-site.xml.template to mapred-site.xml

cp /opt/hadoop/etc/hadoop/mapred-site.xml.template /opt/hadoop/etc/hadoop/mapred-site.xml

Modify the mapred-site.xml file

vi /opt/hadoop/etc/hadoop/mapred-site.xml

Open the file, type G to jump to the last line, press i to enter insert mode, and add the following properties inside the configuration tag:

<configuration>
 <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
 </property>
 <property>
     <name>mapreduce.jobhistory.address</name>
     <value>master:10020</value>
 </property>
 <property>
     <name>mapreduce.jobhistory.webapp.address</name>
     <value>master:19888</value>
 </property>
</configuration>
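
Since a single misspelled property name is silently ignored by Hadoop, it can help to read a value back out of a file after editing it. The sketch below is a deliberately simple reader, not a real XML parser: it assumes each name and value sits on its own line, as in the snippets in this article. The function name `get_hadoop_property` is illustrative, and the demo runs against a scratch copy of the core-site.xml content shown earlier.

```shell
#!/bin/sh
# Print the value of one property from a Hadoop *-site.xml file.
# $1 = property name, $2 = path to the XML file
# Assumes <name> and <value> are each on their own line.
get_hadoop_property() {
    awk -v want="$1" '
        /<name>/  { name = $0; sub(/.*<name>/, "", name); sub(/<\/name>.*/, "", name) }
        /<value>/ { value = $0; sub(/.*<value>/, "", value); sub(/<\/value>.*/, "", value)
                    if (name == want) { print value; exit } }
    ' "$2"
}

# Demo on a scratch file holding the fs.defaultFS property from this article.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
<configuration>
 <property>
     <name>fs.defaultFS</name>
     <value>hdfs://master:9000</value>
 </property>
</configuration>
EOF
get_hadoop_property fs.defaultFS "$sample"   # prints hdfs://master:9000
```

An empty result for a property you expected to find usually means the name is misspelled in the file. On a machine where Hadoop is already installed, hdfs getconf -confKey fs.defaultFS reports the value Hadoop itself resolved.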

6. Modify the slaves file

vi /opt/hadoop/etc/hadoop/slaves

In the file, press dd to delete the first line (localhost by default), then press i to enter insert mode and enter slave1 and slave2, one host name per line
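
The slaves file is plain text with one host name per line, so it can also be written without an editor. A minimal sketch, assuming the helper name `write_host_list` (made up here) and demonstrated on a scratch file instead of the real /opt/hadoop/etc/hadoop/slaves:

```shell
#!/bin/sh
# Overwrite a file with the given host names, one per line.
# $1 = target file, remaining arguments = host names
write_host_list() {
    target="$1"
    shift
    printf '%s\n' "$@" > "$target"
}

# Demo on a scratch file; point it at the real slaves file when ready.
demo_slaves="$(mktemp)"
write_host_list "$demo_slaves" slave1 slave2
cat "$demo_slaves"
```

Because the file is overwritten, the default localhost line disappears, which matches the dd step above.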

7. Distribute the configured hadoop directory to the /opt directory on the slave1 and slave2 virtual machines (the -r flag is required because /opt/hadoop is a directory)

scp -r /opt/hadoop root@slave1:/opt/
scp -r /opt/hadoop root@slave2:/opt/
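
With more than a couple of slaves, the copy is easier to express as a loop. The sketch below defaults to a dry run that only prints the scp commands; the names `SLAVES`, `DRY_RUN`, `run`, and `distribute_dir` are illustrative assumptions, and the root user and host names are the ones used in this article.

```shell
#!/bin/sh
SLAVES="slave1 slave2"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy a directory into the same parent directory on every slave.
# $1 = absolute path of the directory to copy
distribute_dir() {
    for host in $SLAVES; do
        run scp -r "$1" "root@$host:$(dirname "$1")/"
    done
}

distribute_dir /opt/hadoop
```

Running the script with DRY_RUN=0 in the environment executes the copies instead of printing them.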

8. Format the namenode in the master virtual machine

hdfs namenode -format

9. Start the cluster after formatting

start-all.sh

On the master, slave1, and slave2 virtual machines, run the jps command to check whether the cluster started successfully

jps

After the cluster starts successfully, jps should list the following processes on each node:
Master: NameNode, ResourceManager, SecondaryNameNode
slave1: DataNode, NodeManager
slave2: DataNode, NodeManager
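
Comparing jps output against this list by eye gets tedious, so here is a small sketch that reports which expected daemons are absent. The function name `missing_daemons` is made up for illustration, and the sample text (including its process IDs) is invented to show the format that jps prints, one "PID ClassName" pair per line; it is not output captured from a real cluster.

```shell
#!/bin/sh
# Print each expected daemon that does not appear in the given jps output.
# $1 = text printed by jps, remaining arguments = expected daemon names
missing_daemons() {
    jps_output="$1"
    shift
    for daemon in "$@"; do
        # Match the whole class name so NameNode does not match SecondaryNameNode.
        printf '%s\n' "$jps_output" | grep -qE "^[0-9]+ ${daemon}\$" || echo "$daemon"
    done
}

# Invented sample: a master where SecondaryNameNode failed to start.
sample_master='2101 NameNode
2345 ResourceManager
2789 Jps'
missing_daemons "$sample_master" NameNode ResourceManager SecondaryNameNode
# prints: SecondaryNameNode
```

On a real node, pass the live output instead, for example missing_daemons "$(jps)" DataNode NodeManager on slave1; no output means every expected daemon is running.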

That completes the fully distributed Hadoop cluster setup. If you found this article helpful, remember to like and follow~