[Dark Horse 2023 Big Data Practical Tutorial] Detailed Process of Deploying an HDFS Cluster on VMware Virtual Machines

Video: Dark Horse 2023 VMWare virtual machine deploys HDFS cluster
Attention! These operations assume that the preparation steps have already been completed: server creation, static IP configuration, firewall shutdown, hadoop user creation, passwordless SSH, JDK deployment, and so on!!!

Those operations are recorded in the post on preparing the big data cluster environment (3 virtual machines).

Deploy HDFS cluster

1. Upload the Hadoop installation package to node1
rz -bey
2. Unzip the installation package to /export/server/
tar -zxvf hadoop-3.3.4.tar.gz -C /export/server
3. Create a soft link
cd /export/server
ln -s /export/server/hadoop-3.3.4 hadoop
4. Enter the Hadoop installation directory
cd hadoop
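As a quick sanity check of my own (not from the tutorial), you can confirm the soft link points at the unpacked release before going further:

ls -l /export/server    # expect to see: hadoop -> /export/server/hadoop-3.3.4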

ls -l to view the internal structure of the folder:
the meaning of each folder is as follows:

bin, stores the various Hadoop programs (commands)
etc, stores the Hadoop configuration files
include, some C-language header files
lib, stores the dynamic link libraries (.so files) of the Linux system
libexec, stores script files (.sh and .cmd)
licenses-binary, stores the license files
sbin, administrator programs (super bin)
share, stores binary source code (Java jar packages)


The main configuration is a few files under etc!!!
cd /export/server/hadoop/etc/hadoop

Note that at first I put all the configuration directly in /export/server/hadoop, and in the end the nodes would not start. It took a long time of troubleshooting to find out >_<!!

1. Configure workers:

vim workers
first delete the built-in localhost
input:

node1
node2
node3
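If you prefer to skip the interactive editor, a hedged equivalent of my own (assuming the configuration directory used above) is to write the file in one go:

cat > /export/server/hadoop/etc/hadoop/workers <<EOF
node1
node2
node3
EOF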

2. Configure hadoop-env.sh file

vim hadoop-env.sh
fill in the following content

export JAVA_HOME=/export/server/jdk
export HADOOP_HOME=/export/server/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_LOG_DIR=$HADOOP_HOME/logs

JAVA_HOME, indicating the location of the JDK environment
HADOOP_HOME, indicating the location of the Hadoop installation
HADOOP_CONF_DIR, indicating the location of the Hadoop configuration file directory
HADOOP_LOG_DIR, indicating the location of the Hadoop log directory
Recording these environment variables tells Hadoop where to find this important information at runtime.
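A quick way to catch typos here, as my own hedged addition (assuming the jdk soft link created in the preparation steps), is to confirm that the paths you just wrote actually exist:

ls -l /export/server/jdk/bin/java         # JAVA_HOME should contain bin/java
ls -d /export/server/hadoop/etc/hadoop    # HADOOP_CONF_DIR should exist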

3. Configure the core-site.xml file

vim core-site.xml
Fill in the following content inside the file

<configuration>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://node1:8020</value>
	</property>
	<property>
		<name>io.file.buffer.size</name>
		<value>131072</value>
	</property>
</configuration>

key: fs.defaultFS, meaning: the network communication path of the HDFS file system, value: hdfs://node1:8020
(the protocol is hdfs://, the namenode is node1, and the namenode communication port is 8020)
key: io.file.buffer.size, meaning: the buffer size for file io operations, value: 131072 bytes

hdfs://node1:8020 is the internal communication address of the entire HDFS, and the protocol used is hdfs:// (Hadoop's built-in protocol).
This means the DataNodes will communicate with port 8020 of node1, and node1 is the machine where the NameNode runs.
This configuration pins node1 as the node that must start the NameNode process.
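As a side note of my own (not from the tutorial): once the hadoop command is on your PATH (configured later), you can confirm which value is actually being read from this file:

hdfs getconf -confKey fs.defaultFS    # expect hdfs://node1:8020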

4. Configure the hdfs-site.xml file

vim hdfs-site.xml
Fill in the following content inside the file

<configuration>
	<property>
		<name>dfs.datanode.data.dir.perm</name>
		<value>700</value>
	</property>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>/data/nn</value>
	</property>
	<property>
		<name>dfs.namenode.hosts</name>
		<value>node1,node2,node3</value>
	</property>

	<property>
		<name>dfs.blocksize</name>
		<value>268435456</value>
	</property>
	<property>
		<name>dfs.namenode.handler.count</name>
		<value>100</value>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>/data/dn</value>
	</property>
</configuration>
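As another hedged check of my own, hdfs getconf simply reads the configuration files, so once the PATH is set up it can confirm these values without any daemons running:

hdfs getconf -confKey dfs.namenode.name.dir    # expect /data/nn
hdfs getconf -confKey dfs.blocksize            # 268435456 bytes = 256 MB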


Here I ran into a situation where the HDFS deployment succeeded but the web page could not be opened. It was solved by explicitly specifying the port, that is, adding to hdfs-site.xml:

	<property>
		<name>dfs.namenode.http-address</name>
		<value>node1:9870</value>
	</property>

The detailed problem is recorded in this blog: How to solve the problem that the HDFS cluster is successfully deployed but the webpage cannot be opened .
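If you hit the same symptom, a quick hedged check of my own is to confirm on node1 that something is actually listening on the web port after HDFS has been started:

ss -lntp | grep 9870    # a listening socket here means the NameNode web UI is up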

Prepare the data directory

The NameNode data is stored in /data/nn on node1
The DataNode data is stored in /data/dn on node1, node2, and node3, so:
on the node1 node:
mkdir -p /data/nn
mkdir /data/dn
on the node2 and node3 nodes:
mkdir -p /data/dn
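Since passwordless SSH was set up in the preparation steps, a hedged one-liner of my own that creates the node2/node3 directories from node1 would be:

for h in node2 node3; do ssh $h "mkdir -p /data/dn"; done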

Distribute the Hadoop folder

Distribution

Execute the following command on node1: cd /export/server
(or cd back up from the current directory to return to the server directory)

scp -r hadoop-3.3.4 node2:`pwd`/
scp -r hadoop-3.3.4 node3:`pwd`/

Execute on node2 to configure the hadoop soft link:
ln -s /export/server/hadoop-3.3.4 /export/server/hadoop
Execute on node3 to configure the hadoop soft link:
ln -s /export/server/hadoop-3.3.4 /export/server/hadoop
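As a quick hedged check of my own, both links can be verified from node1 over SSH:

for h in node2 node3; do ssh $h "ls -l /export/server | grep hadoop"; done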

Configure environment variables

vim /etc/profile
Append the following content at the bottom of the /etc/profile file:

export HADOOP_HOME=/export/server/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Note: PATH is appended to, so it will not conflict with the existing value

Make the environment variables take effect: source /etc/profile
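To make sure the PATH change really took effect (my own check, not in the tutorial):

hadoop version        # should report Hadoop 3.3.4
which start-dfs.sh    # should point into /export/server/hadoop/sbin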

Authorize as hadoop user

The preparations for the hadoop deployment are basically complete.
For security reasons the hadoop system is not started as the root user; we start the entire Hadoop service as the ordinary user hadoop. Therefore we need to grant file permissions now.


As root, execute the following commands on the three servers node1, node2, and node3:
su - root
cd /data/

chown -R hadoop:hadoop /data
chown -R hadoop:hadoop /export
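Since the same two commands are needed on all three machines, a hedged sketch of my own that runs them from node1 (assuming root SSH between the nodes is available in your setup, which may not be the case) would be:

for h in node1 node2 node3; do ssh root@$h "chown -R hadoop:hadoop /data /export"; done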

ll to check that ownership has been granted to hadoop

Format the file system

The preliminary preparations are all complete; now initialize the entire file system.

1. Format the namenode. Make sure to do this as the hadoop user:
su - hadoop
hadoop namenode -format
2. Start HDFS:
start-dfs.sh
(stop it later with stop-dfs.sh)
If start-dfs.sh or stop-dfs.sh reports that the command is not found, the environment variables are not configured properly; you can run the scripts with their absolute paths instead:
/export/server/hadoop/sbin/start-dfs.sh
/export/server/hadoop/sbin/stop-dfs.sh

jps
View the running Java processes
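With the configuration above (node1 holding the NameNode, all three nodes listed in workers, and start-dfs.sh run from node1), you would typically expect jps to show something like the following; this is my own illustration rather than output from the tutorial:

# on node1 (besides Jps itself):
#   NameNode, DataNode, SecondaryNameNode
# on node2 and node3:
#   DataNode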


Error troubleshooting method!!

Understand what start-dfs.sh does:
it starts the SecondaryNameNode on the current machine, starts the NameNode according to the record in core-site.xml,
and starts a DataNode on each machine according to the records in the workers file.
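A quick hedged way of my own to see exactly which hosts the script will act on:

hdfs getconf -namenodes                         # NameNode host taken from the configuration
cat /export/server/hadoop/etc/hadoop/workers    # hosts that will run DataNodes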

Executing the script does not report an error, but the process does not exist:
check the log:

cd /export/server/hadoop/logs
ll    # see which logs are available for troubleshooting
tail -100 hadoop-hadoop-namenode-node3.log    # this is the log you want to check

Cleanup:
rm -rf /export/server/hadoop/logs/*
rm -rf /data/nn/;rm -rf /data/dn/
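One caveat of my own: rm -rf on /data/nn/ and /data/dn/ removes the directories themselves, so before the next attempt you generally need to recreate them and re-format the NameNode, roughly:

mkdir -p /data/nn /data/dn    # /data/nn only on node1, /data/dn on every node
hadoop namenode -format       # run as the hadoop user on node1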

In case of permission issues:
chown -R hadoop:hadoop /data
chown -R hadoop:hadoop /export

Go back to the previous level: cd ..


Origin blog.csdn.net/weixin_43629813/article/details/130253438