Hadoop cluster deployment (version 2.9.2)

Nanny level installation tutorial series - Hadoop cluster installation (version 2.9.2)

1. Software versions

jdk1.8.0_131, hadoop-2.9.2, CentOS-7-x86_64-DVD-2009.iso

2. Create 3 virtual machines

1 master node: master
2 slave nodes: slave1, slave2

3. Configure the network

1. Configuration file

vim /etc/sysconfig/network-scripts/ifcfg-ens33

**Do not set your values exactly the same as mine!**

Notes:
① IPADDR is the IP address
② GATEWAY is the gateway
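
The original screenshot of this file is not reproduced here. As a reference, a minimal static-IP configuration might look like the following; all addresses are examples (chosen to match the 192.168.148.x segment used later in this article), so substitute your own:

TYPE=Ethernet
BOOTPROTO=static
NAME=ens33
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.148.170    # example: master's address
NETMASK=255.255.255.0
GATEWAY=192.168.148.2     # example gateway; must match the VMnet8 NAT gateway
DNS1=114.114.114.114      # any reachable DNS server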

2. Configure the VMware Network Adapter VMnet8 network

This step is performed on the Windows host!
Open the network connections panel → VMware Network Adapter VMnet8 → Internet Protocol Version 4 (TCP/IPv4) → Properties
**Do not set your values exactly the same as mine!**
Note: The network segment and gateway configured on the physical machine must match those of the virtual machines, and the IP addresses (the last octet) must not conflict with one another.

3. Restart the network

service network restart
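
On CentOS 7 the equivalent systemd form also works, if you prefer it:

systemctl restart network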

4. Open a Windows terminal (Win+R, then cmd) and check whether the external network can be pinged

5. Check whether CentOS can ping www.baidu.com

Note:
① If the network still cannot be pinged after the above steps, open VMware's Virtual Network Editor and check whether the NAT network segment matches your configuration.
② Perform the same operations on all 3 machines: configure IP addresses in the same network segment (with different last octets) and the same gateway.

4. Modify the hostname

1. Modify the hostname

vim /etc/hostname


2. Reboot

reboot

3. Check again if the hostname has changed

hostname

Note: The hostnames correspond to the 3 machines: master, slave1, and slave2. All three hostnames must be changed, otherwise the mapping in the next step will not work.
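
As a side note, on CentOS 7 the hostname can also be set without editing the file and rebooting (optional, not required by this tutorial):

hostnamectl set-hostname master    # run on master; use slave1 / slave2 on the other machines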

5. Hostname mapping

1. Set up the mapping

vim /etc/hosts

Add the IP addresses and hostnames of all 3 machines.
Note: Do this on all 3 machines.
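
The entries look like the following; the master address matches the one used later in core-site.xml, while the slave addresses are examples, so substitute your own:

192.168.148.170 master
192.168.148.171 slave1
192.168.148.172 slave2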

2. Check whether the 3 machines can ping each other

Note: The original post only showed master pinging slave1. Be sure to test that each machine can ping the other two, as well as itself.

6. SSH password-free login

1. Check if ssh is installed

rpm -qa | grep ssh

If the packages are listed, ssh is already installed and can be used directly; if not, install it with:

yum -y install openssh
yum -y install openssh-server
yum -y install openssh-clients

2. Configure ssh configuration file

 vim /etc/ssh/sshd_config

① Modify line 43: remove the leading # (on CentOS 7 this uncomments PubkeyAuthentication yes)
② Add the line RSAAuthentication yes above it

3. Restart the sshd service

systemctl restart sshd.service

4. Use the command ssh-keygen to generate the public and private keys (press Enter 3 times at the prompts)
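
The command, accepting all defaults:

ssh-keygen -t rsa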

5. Copy the public key into the key file

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

6. Modify the key file permissions

chmod 0600 ~/.ssh/authorized_keys

7. Share the public key

ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave1
# The password required here is the root user's password

8. After each machine has run the ssh-copy-id command for each of the other machines, check whether they can log in to one another.

Note: The original post only showed master logging in to slave1 without a password. Be sure to test that each machine can log in to the other two, and to itself, without a password.
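
A quick check from master (the session should open without prompting for a password):

ssh slave1
exit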

7. Install Java

1. Create two directories, one for the compressed packages and one for the unpacked software:

mkdir /opt/module      # holds the compressed packages
mkdir /opt/software    # holds the unpacked software

2. Install the rz command: yum install lrzsz

3. Switch to the directory /opt/module, where the packages will be uploaded: cd /opt/module

4. Enter the command: rz (select the packages to upload, click Add, and confirm)

5. Unzip the package

tar -zxvf jdk-8u131-linux-x64.tar.gz -C /opt/software/   # adjust the filename to match your JDK download
tar -zxvf hadoop-2.9.2.tar.gz -C /opt/software/

6. Configuration file: vim /etc/profile

Add in the last line of the file:

export JAVA_HOME=/opt/software/jdk1.8.0_131
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

7. Reload the configuration file: source /etc/profile

8. Verify that java is installed: java -version
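
If the installation succeeded, the output should report version 1.8.0_131, along the lines of:

java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)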


8. Install Hadoop

1. Configuration file:

① vim /etc/profile
Add in the last line:

export HADOOP_HOME=/opt/software/hadoop-2.9.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Then reload the file: source /etc/profile

② vim /etc/profile.d/hadoop.sh
This is a new file, add content:

export HADOOP_HOME=/opt/software/hadoop-2.9.2
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

Then reload the file: source /etc/profile.d/hadoop.sh
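
To confirm that the variables took effect, you can also print the Hadoop version:

hadoop version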

The remaining configuration files are edited from this path: cd /opt/software/hadoop-2.9.2/etc/hadoop

③ vim hadoop-env.sh

# Modify line 25
export JAVA_HOME=/opt/software/jdk1.8.0_131
# Add at line 26
export HADOOP_SSH_OPTS='-o StrictHostKeyChecking=no'
# Modify line 113
export HADOOP_PID_DIR=${HADOOP_PID_DIR}/pids

④ vim mapred-env.sh

# Modify line 16
export JAVA_HOME=/opt/software/jdk1.8.0_131
# Modify line 28
export HADOOP_MAPRED_PID_DIR=${HADOOP_HOME}/pids

⑤ vim yarn-env.sh

# Modify line 23
export JAVA_HOME=/opt/software/jdk1.8.0_131
# Add at the last line
export YARN_PID_DIR=${HADOOP_HOME}/pids

⑥ vim core-site.xml

Enter:
<configuration>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://192.168.148.170:9000</value>
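		<!-- Use your own master IP here; hdfs://master:9000 also works once /etc/hosts is configured -->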
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/opt/software/hadoop-2.9.2/hdfsdata</value>
	</property>
	<property>
		<name>io.file.buffer.size</name>
		<value>131072</value>
	</property>
</configuration>

⑦ vim hdfs-site.xml

Enter:
<configuration>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>file:/opt/software/hadoop-2.9.2/hdfsdata/dfs/name</value>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>file:/opt/software/hadoop-2.9.2/hdfsdata/dfs/data</value>
	</property>
	<property>
		<name>dfs.namenode.checkpoint.dir</name>
	   <value>file:/opt/software/hadoop-2.9.2/hdfsdata/dfs/namesecondary</value>
	</property>
	<property>
		<name>dfs.blocksize</name>
		<value>134217728</value>
	</property>
	<property>
        <name>dfs.replication</name>
        <value>3</value>
 </property>
	<property>
		<name>dfs.namenode.http-address</name>
		<value>0.0.0.0:50070</value>
	</property>
</configuration>

⑧ vim mapred-site.xml
First create mapred-site.xml by copying the template file:
cp mapred-site.xml.template mapred-site.xml

Enter:
<configuration>
<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
	<property>
		<name>mapreduce.jobhistory.webapp.address</name>
		<value>master:19888</value>
	</property>
	<property>
		<name>mapreduce.job.maps</name>
		<value>2</value>
	</property>
	<property>
		<name>mapreduce.job.reduces</name>
		<value>1</value>
	</property>
</configuration>

⑨ vim yarn-site.xml

Enter:
<configuration>
<!-- Site specific YARN configuration properties -->
	<property>
		<name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
	</property>
	<property>
        <name>yarn.resourcemanager.scheduler.class</name>
       <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
	</property>
	<property>
		<name>yarn.resourcemanager.webapp.address</name>
		<value>${yarn.resourcemanager.hostname}:8088</value>
	</property>
	<property>
		<name>yarn.nodemanager.local-dirs</name>
		<value>${hadoop.tmp.dir}/nm-local-dir</value>
	</property>
</configuration>

⑩ vim slaves

# Delete localhost
# Enter:
slave1
slave2

2. Copy all the files after configuration to the other two machines:

scp /opt/software/hadoop-2.9.2/etc/hadoop/* root@slave1:/opt/software/hadoop-2.9.2/etc/hadoop/
scp /opt/software/hadoop-2.9.2/etc/hadoop/* root@slave2:/opt/software/hadoop-2.9.2/etc/hadoop/
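
Note: the steps above installed the JDK and Hadoop on master only. If you have not repeated sections 7 and 8 on slave1 and slave2, copy the unpacked software and the environment files over as well (a sketch, run from master; afterwards run source /etc/profile on each slave):

scp -r /opt/software root@slave1:/opt/
scp -r /opt/software root@slave2:/opt/
scp /etc/profile root@slave1:/etc/profile
scp /etc/profile root@slave2:/etc/profile
scp /etc/profile.d/hadoop.sh root@slave1:/etc/profile.d/
scp /etc/profile.d/hadoop.sh root@slave2:/etc/profile.d/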

3. Turn off the firewall permanently, on all 3 machines: systemctl disable firewalld.service

4. Reboot: reboot

5. Check that the firewall is off: systemctl status firewalld.service


9. Start Hadoop

1. Format the file system (on master only, and only once): hdfs namenode -format

2. Start Hadoop (run on master):

① start-dfs.sh
② start-yarn.sh
③ mr-jobhistory-daemon.sh start historyserver

3. Verify that the processes are running: jps

① Master node (master)
② Slave nodes (slave1, slave2)
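
The original screenshots are not reproduced here; with the configuration above, jps should show roughly the following (order and process IDs will vary):

master: NameNode, SecondaryNameNode, ResourceManager, JobHistoryServer, Jps
slave1 / slave2: DataNode, NodeManager, Jps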

4. Open a browser and check whether the web UIs are reachable

Note: The IP in each URL is the master IP you configured yourself. With the ports configured above, the pages are:
http://192.168.148.170:50070 (HDFS NameNode)
http://192.168.148.170:8088 (YARN ResourceManager)
http://192.168.148.170:19888 (MapReduce JobHistory)
