Preparing a Hadoop node requires changes in three places: the hostname, the IP address, and the MAC/UUID.
1. Change the hostname:
hostnamectl set-hostname hadoop1
Install the nano editor:
yum install nano
2. Change the IP address (network configuration):
Command:
nano /etc/sysconfig/network-scripts/ifcfg-ens33 #ens33 is the NIC name; modify BOOTPROTO and IPADDR, and add NETMASK and DNS1
Modified content:
BOOTPROTO=static #static addressing
IPADDR=<IP address> #IP address of this node
NETMASK=255.255.255.0 #subnet mask
DNS1=192.168.91.2 #DNS server, set to the same address as the gateway
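A gateway entry is usually needed as well; an example, assuming 192.168.91.2 is the NAT gateway (the same address used for DNS1 above):
GATEWAY=192.168.91.2 #default gateway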
Verify the configuration: cat /etc/sysconfig/network-scripts/ifcfg-ens33
Save: Ctrl+O, then Enter
Exit: Ctrl+X
3. Change the MAC/UUID (regenerate the interface UUID, e.g. after cloning a VM): sed -i '/UUID=/c\UUID='`uuidgen`'' /etc/sysconfig/network-scripts/ifcfg-ens33
Update the virtual machine packages and kernel: yum -y update
yum -y upgrade
Password-free login:
Generate an SSH key pair: ssh-keygen -t rsa
Enter the key directory: cd /root/.ssh #(the leading . marks it as hidden)
List all files, including hidden ones: ll -a
Password-free login (works once the public key has been copied to the target node; see the sketch below): ssh root@hadoop2
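Before the login above works, the public key must be copied to every target node. A minimal sketch, assuming the hostname mapping below is already in place (otherwise use IP addresses):
ssh-copy-id root@hadoop1
ssh-copy-id root@hadoop2
ssh-copy-id root@hadoop3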
Configure the IP-to-hostname mapping: after configuration you can connect by hostname, such as ssh root@hadoop2
View the hosts file: cat /etc/hosts
Edit the file using the nano tool: nano /etc/hosts
Add the following:
192.168.91.129 hadoop1
192.168.91.130 hadoop2
192.168.91.131 hadoop3
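A quick check that the mapping resolves (the hostnames above are assumed to match your actual IPs):
ping -c 3 hadoop2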
Install the time synchronization tool:
yum install chrony -y #-y answers yes to all prompts
Check that the package is installed: rpm -qa | grep chrony
Start the service: systemctl start chronyd
Check the service status: systemctl status chronyd
Enable the service at boot (and start it now): systemctl enable chronyd --now
Firewall configuration:
Check the status: systemctl status firewalld
Stop the firewall: systemctl stop firewalld
Disable the firewall at boot: systemctl disable firewalld
Configure time synchronization:
Edit time synchronization: nano /etc/chrony.conf
Comment out the four default server entries (server 0 through 3),
then add the time synchronization server: server hadoop1 iburst #added on the other two nodes so they sync from hadoop1
Restart chronyd: systemctl restart chronyd
View time synchronization status: chronyc sources -v
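For a closer look at how far the clock is from the selected source, chronyc also provides a tracking report; run it on hadoop2/hadoop3:
chronyc tracking #shows the reference source, stratum, and current system clock offset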
Configure the JDK
Configure the environment variables in /etc/profile using the command: nano /etc/profile
export JAVA_HOME=~/export/servers/jdk1.8.0_202
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
Execute the "source /etc/profile" command to initialize the system environment variables to make the configuration take effect.
#Copy the environment variables
scp /etc/profile root@hadoop2:/etc
#Copy the JDK to the other node
scp -r export/servers/ root@hadoop2:export/ #-r means recursive, so nested folders are copied; this copies servers into export
#Make the environment variables take effect (run on each node)
source /etc/profile
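A quick verification that the JDK and environment variables took effect (run on each node after source /etc/profile):
java -version #should report version 1.8.0_202
echo $JAVA_HOME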
Configure zookeeper
Copy zookeeper to /export/servers/
Enter the conf directory under the ZooKeeper installation directory.
Copy the template file zoo_sample.cfg to the zoo.cfg configuration file
cp zoo_sample.cfg zoo.cfg
Edit the contents of zoo.cfg:
vi zoo.cfg
dataDir=/export/data/zookeeper/zkdata #Remember to create this folder: /export/data/zookeeper/zkdata
server.1=spark1:2888:3888
server.2=spark2:2888:3888
server.3=spark3:2888:3888
(pwd shows the current path)
In the zkdata directory of hadoop1, execute echo 1 > myid
In the zkdata directory of hadoop2, execute echo 2 > myid
In the zkdata directory of hadoop3, execute echo 3 > myid
Configure the zookeeper environment variable
export ZK_HOME=/export/servers/zookeeper-3.4.10 #Note the ZooKeeper version
export PATH=$PATH:$ZK_HOME/bin
Then copy the zookeeper directory and /etc/profile to hosts 2 and 3, as sketched below
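A minimal sketch of that copy, assuming the ZooKeeper directory and layout shown above:
scp -r /export/servers/zookeeper-3.4.10/ root@hadoop2:/export/servers/
scp -r /export/servers/zookeeper-3.4.10/ root@hadoop3:/export/servers/
scp /etc/profile root@hadoop2:/etc/
scp /etc/profile root@hadoop3:/etc/
#then run source /etc/profile on hosts 2 and 3, and remember that each host's myid value is different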
zkServer.sh status #View the status of zk
zkServer.sh start #Start zk
ps #View processes
jps #View Java-related processes
Install Hadoop:
Copy the hadoop package to /export/software/
Use the command to extract to /export/servers/
tar -zxvf /export/software/hadoop-2.7.4.tar.gz -C /export/servers/
Configure the environment scripts (.sh files)
Enter the etc/hadoop/ directory under the Hadoop installation directory (e.g. /export/servers/hadoop-2.7.4/etc/hadoop/); all of the following commands are run in this directory. Edit the hadoop-env.sh file:
vi hadoop-env.sh #nano can also be used
Change the default JAVA_HOME parameter in the file to the path where the JDK is installed locally.
export JAVA_HOME=/export/servers/jdk1.8.0
Enter the /etc/hadoop/ directory of the Hadoop installation package and edit the yarn-env.sh file
vi yarn-env.sh
Change the default JAVA_HOME parameter in the file to the path where the JDK is installed locally, same as the previous step
Command to edit Hadoop's core configuration file core-site.xml
vi core-site.xml
Modify the following configuration:
<property>
<name>fs.defaultFS</name>
<value>hdfs://master</value><!--master is the cluster (nameservice) name-->
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/export/servers/hadoop-2.7.4/tmp</value><!--tmp temporary directory; create it manually if it does not exist-->
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>spark01:2181,spark02:2181,spark03:2181</value>
</property>
Command to edit the core configuration file hdfs-site.xml of HDFS
vi hdfs-site.xml
The modifications are as follows:
<property>
<name>dfs.replication</name>
<value>3</value><!--number of replicas-->
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/export/data/hadoop/namenode</value><!--create this path manually if it does not exist; see the mkdir commands after this file-->
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/export/data/hadoop/datanode</value><!--create this path manually if it does not exist-->
</property>
<!---->
<property>
<name>dfs.nameservices</name>
<value>master</value>
</property>
<property>
<name>dfs.ha.namenodes.master</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.master.nn1</name>
<value>spark01:9000</value>
</property>
<!---->
<property>
<name>dfs.namenode.rpc-address.master.nn2</name>
<value>spark02:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.master.nn1</name>
<value>spark01:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.master.nn2</name>
<value>spark02:50070</value>
</property>
<!---->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://spark01:8485;spark02:8485;spark03:8485/master</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/export/data/hadoop/journaldata</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.master</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!---->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!---->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value><!--timeout in milliseconds-->
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
Enter the /etc/hadoop/ directory of the Hadoop installation package and create the MapReduce core configuration file mapred-site.xml by copying the template file:
cp mapred-site.xml.template mapred-site.xml
Run the command to edit the configuration file mapred-site.xml and specify the MapReduce runtime framework.
vi mapred-site.xml
amend as below:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value><!--resource scheduler-->
</property>
Execute the command to edit the core configuration file yarn-site.xml of YARN
vi yarn-site.xml
amend as below:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarncluster</value>
</property>
<!---->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>spark01</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>spark02</value>
</property>
<!---->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>spark01:2181,spark02:2181,spark03:2181</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!---->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!---->
Modify the slaves file
Execute the command to edit the file slaves that records the host names of all DataNode nodes and NodeManager nodes in the Hadoop cluster
vi slaves
The content is as follows:
spark01
spark02
spark03
Configure Hadoop environment variables
Run the command to edit the system environment variable file profile and configure the Hadoop system environment variables
vi /etc/profile
Add the following content:
export HADOOP_HOME=/export/servers/hadoop-2.7.4
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Execute the command to initialize the system environment variables to make the configuration take effect
source /etc/profile
Distribute the Hadoop installation directory and the system environment variable file to the other two virtual machines
#Distribute the Hadoop installation directory to the virtual machines Spark02 and Spark03
$ scp -r /export/servers/hadoop-2.7.4/ root@spark02:/export/servers/
$ scp -r /export/servers/hadoop-2.7.4/ root@spark03:/export/servers/
#Distribute the system environment variable file to the virtual machines Spark02 and Spark03
$ scp /etc/profile root@spark02:/etc/
$ scp /etc/profile root@spark03:/etc/
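The copied profile still has to be loaded on the receiving nodes; on Spark02 and Spark03 run:
source /etc/profile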
Execute the command to view the Hadoop version of the current system environment
hadoop version
Start the Hadoop service, a total of 5 steps
1. Start ZooKeeper
Execute the following command in the virtual machines Spark01, Spark02, and Spark03 respectively to start the ZooKeeper service on each virtual machine:
zkServer.sh start
2. Start JournalNode
Run the following command in the virtual machines Spark01, Spark02, and Spark03 to start the JournalNode service on each virtual machine:
hadoop-daemon.sh start journalnode
Note: the following commands are executed only on the first startup:
Initialize NameNode (only for initial startup)
Run the "hdfs namenode -format" command on the virtual machine Spark01 of the master node of the Hadoop cluster to initialize the NameNode operation.
Initialize ZooKeeper (only for initial startup)
On the NameNode master node virtual machine Spark01, execute the "hdfs zkfc -formatZK" command to initialize the HA state in ZooKeeper.
NameNode synchronization (only for initial startup execution)
After the initialization command has been executed on the NameNode master node (virtual machine Spark01), the contents of the metadata directory need to be copied to the other, unformatted standby NameNode (virtual machine Spark02) so that the NameNode data on the master and standby nodes is consistent:
scp -r /export/data/hadoop/namenode/ root@spark02:/export/data/hadoop/
3. Start HDFS
In the virtual machine Spark01, execute the one-click startup script command to start the HDFS of the Hadoop cluster. This starts the NameNode and ZKFC on the virtual machines Spark01 and Spark02 and the DataNodes on the virtual machines Spark01, Spark02, and Spark03. Run the following command:
start-dfs.sh
4. Start YARN
In the virtual machine Spark01, execute the one-click startup script command to start the YARN of the Hadoop cluster, as follows:
start-yarn.sh
At this point, the ResourceManager on the virtual machine Spark01 and the NodeManagers on the virtual machines Spark01, Spark02, and Spark03 will be started.
5. Start ResourceManager
The standby ResourceManager on the virtual machine Spark02 needs to be started separately. Run the following command:
yarn-daemon.sh start resourcemanager
Run jps to check whether the Hadoop high-availability cluster processes have started successfully.
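Roughly, assuming the layout described above, jps should list processes along these lines:
Spark01: NameNode, DFSZKFailoverController, DataNode, JournalNode, QuorumPeerMain, ResourceManager, NodeManager
Spark02: NameNode, DFSZKFailoverController, DataNode, JournalNode, QuorumPeerMain, ResourceManager (after step 5), NodeManager
Spark03: DataNode, JournalNode, QuorumPeerMain, NodeManager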
Quick reference:
Start ZooKeeper on all nodes:
zkServer.sh start
Start HDFS on the master node:
start-dfs.sh
Start YARN on the master node:
start-yarn.sh
Start the standby ResourceManager on the standby node:
yarn-daemon.sh start resourcemanager
Stop all services:
stop-all.sh
Start all services:
start-all.sh
View data
#View in a browser
<ip>:50070 #HDFS NameNode web UI
<ip>:8088 #YARN ResourceManager web UI
To access by hostname instead of IP, open the following file on the Windows host
C:\Windows\System32\drivers\etc\hosts
Add the following configuration
192.168.8.134 hadoop1
192.168.8.135 hadoop2
192.168.8.136 hadoop3
Operations in HDFS:
Delete a folder:
hadoop fs -rm -r /<folder name>
Create a folder:
hadoop fs -mkdir /<folder name>
List the root directory:
hadoop fs -ls /
Modify HDFS operation permissions:
hadoop fs -chmod 777 /input
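A short usage sketch of moving data in and out of HDFS (test.txt is a hypothetical local file; /input is the directory used above):
hadoop fs -put test.txt /input #upload a local file
hadoop fs -cat /input/test.txt #print its contents
hadoop fs -get /input/test.txt ./test_copy.txt #download it back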
To leave Safe Mode manually:
//If the environment variables are configured, use the following command
hadoop dfsadmin -safemode leave
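To check the current safe mode state first (hdfs dfsadmin is the newer equivalent of hadoop dfsadmin):
hdfs dfsadmin -safemode get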
Exception handling:
If the following error appears when browsing HDFS in the web UI:
Failed to retrieve data from /webhdfs/v1/data/clickLog/2022_04_24?op=LISTSTATUS:
switch to the Google Chrome browser (or try a different browser)