Table of contents
- Environment deployment
- hadoop-3.3.4.tar.gz
- build soft link
- Configure the workers file
- Configure hadoop-env.sh file
- Configure the core-site.xml file
- Configure the hdfs-site.xml file
- Prepare the data directory
- Distribute the Hadoop folder
- Configure some scripts and programs of Hadoop into PATH
- Authorize as hadoop user
- Format the entire file system
- View HDFS WEBUI
- save snapshot
https://www.bilibili.com/video/BV1WY4y197g7?p=22
Environment deployment
hadoop-3.3.4.tar.gz
The roles of Hadoop HDFS include:
NameNode, master node manager
DataNode, slave node worker
SecondaryNameNode, master node assistant
node | CPU | Memory | Services
---|---|---|---
node1 | 1 core | 4GB | NameNode, DataNode, SecondaryNameNode
node2 | 1 core | 2GB | DataNode
node3 | 1 core | 2GB | DataNode
Execute on node1, logged in as root
Upload Hadoop installation package
Drag and drop the package onto the server directly through the FinalShell client
Unzip the installation package to /export/server/
tar -zxvf hadoop-3.3.4.tar.gz -C /export/server
build soft link
cd /export/server
ln -s /export/server/hadoop-3.3.4 hadoop
Enter the Hadoop installation directory
cd hadoop
Folder meaning:
- bin, stores various Hadoop programs (commands)
- etc, stores Hadoop configuration files
- include, C header files used by Hadoop's native code
- lib, stores native dynamic link libraries (.so files) for Linux
- libexec, stores script files (.sh and .cmd) that configure the Hadoop system
- licenses-binary, stores license files
- sbin, administrator programs ("super bin")
- share, stores Hadoop's compiled Java code (jar packages) and documentation
Configure the workers file
The workers file records which servers in the cluster act as slave (DataNode) nodes.
Enter the configuration file directory
cd etc/hadoop
Edit the workers file
vim workers
Remove localhost and add the following
node1
node2
node3
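The edit above can also be done in one step with a here-document (a sketch; on node1 you would run it in /export/server/hadoop/etc/hadoop — here a scratch directory is used so the snippet is self-contained):

```shell
# Overwrite the workers file with the three worker hostnames in one command,
# replacing the default localhost entry.
cd "$(mktemp -d)"
cat > workers <<'EOF'
node1
node2
node3
EOF
cat workers
```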
Configure hadoop-env.sh file
vim hadoop-env.sh
Add the following
export JAVA_HOME=/export/server/jdk
export HADOOP_HOME=/export/server/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_LOG_DIR=$HADOOP_HOME/logs
JAVA_HOME, JDK environment location
HADOOP_HOME, Hadoop installation location
HADOOP_CONF_DIR, Hadoop configuration file directory location
HADOOP_LOG_DIR, Hadoop operation log directory location
Configure the core-site.xml file
vim core-site.xml
Add the content in the configuration
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://node1:8020</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
</configuration>
- key:fs.defaultFS
- Meaning: The network communication path of the HDFS file system
- Value: hdfs://node1:8020
- The protocol is hdfs://
- The NameNode runs on node1
- The NameNode communication port is 8020
- key:io.file.buffer.size
- Meaning: I/O operation file buffer size
- Value: 131072 bytes (128 KB)
- hdfs://node1:8020 is the internal communication address of the entire HDFS, and the application protocol is hdfs:// (Hadoop built-in protocol)
- Indicates that DataNode will communicate with port 8020 of node1, and node1 is the machine where NameNode is located
- This configuration therefore pins the NameNode process to node1
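As an illustration of the points above, the fs.defaultFS value decomposes into a protocol, a NameNode host, and an RPC port (the variable names below are just local shell variables for this sketch):

```shell
# Split hdfs://node1:8020 into its parts using shell parameter expansion.
fs_defaultFS="hdfs://node1:8020"
hostport="${fs_defaultFS#hdfs://}"   # strip the hdfs:// protocol prefix
nn_host="${hostport%:*}"             # machine where the NameNode runs
nn_port="${hostport#*:}"             # RPC port the DataNodes connect to
echo "$nn_host $nn_port"             # prints "node1 8020"
```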
Configure the hdfs-site.xml file
vim hdfs-site.xml
Add the following content
<configuration>
<property>
<name>dfs.datanode.data.dir.perm</name>
<value>700</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/nn</value>
</property>
<property>
<name>dfs.namenode.hosts</name>
<value>node1,node2,node3</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>268435456</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>100</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/data/dn</value>
</property>
</configuration>
- key:dfs.datanode.data.dir.perm
- Meaning: permission setting for the DataNode's local data directories
- Value: 700, namely: rwx------
- key:dfs.namenode.name.dir
- Meaning: storage location of NameNode metadata
- Value: /data/nn, in the /data/nn directory of the node1 node
- key:dfs.namenode.hosts
- Meaning: which DataNode hosts the NameNode allows to connect (that is, which nodes are allowed to join the cluster)
- Values: node1, node2, node3
- key:dfs.blocksize
- Meaning: hdfs default block size
- Value: 268435456 (256MB)
- key:dfs.namenode.handler.count
- Meaning: the number of handler threads the NameNode uses
- Value: 100, i.e. file system management requests are processed with 100 concurrent threads
- key:dfs.datanode.data.dir
- Meaning: the data storage directory of the slave node DataNode
- Value: /data/dn, i.e. data is stored under /data/dn on each of node1, node2, and node3
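The two numeric values above (plus io.file.buffer.size from core-site.xml) are plain byte counts, which is easy to sanity-check:

```shell
# Verify the unit conversions quoted in the notes.
echo $((268435456 / 1024 / 1024))   # dfs.blocksize in MB -> 256
echo $((131072 / 1024))             # io.file.buffer.size in KB -> 128
```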
Prepare the data directory
In node1 node:
mkdir -p /data/nn
mkdir -p /data/dn
On node2 and node3 nodes:
mkdir -p /data/dn
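Since password-free ssh from node1 was configured earlier, the node2/node3 step can also be done remotely from node1 in one loop (an optional sketch, not a required step):

```shell
# Create the DataNode data directory on node2 and node3 from node1.
for h in node2 node3; do
  ssh "$h" 'mkdir -p /data/dn'
done
```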
Distribute the Hadoop folder
Remotely copy the Hadoop installation folder from node1 to node2 and node3
Execute on node1
cd /export/server
scp -r hadoop-3.3.4 node2:`pwd`/
scp -r hadoop-3.3.4 node3:`pwd`/
Execute on node2 and node3 to build the soft link
cd /export/server
ln -s /export/server/hadoop-3.3.4 hadoop
ll
Configure some scripts and programs of Hadoop into PATH
Operate on node1, node2, and node3
vim /etc/profile
Add the following at the bottom
export HADOOP_HOME=/export/server/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Make the changes take effect
source /etc/profile
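What the PATH entry achieves: the shell can now resolve the programs in bin/ and sbin/ by name. The sketch below simulates this with a throwaway stub HADOOP_HOME (the real one is /export/server/hadoop; the stub binary and its output are illustrative only):

```shell
# Build a fake HADOOP_HOME with a stub hadoop command, then show that
# appending its bin/ and sbin/ to PATH makes the command resolvable by name.
HADOOP_HOME="$(mktemp -d)"
mkdir -p "$HADOOP_HOME/bin" "$HADOOP_HOME/sbin"
printf '#!/bin/sh\necho Hadoop 3.3.4\n' > "$HADOOP_HOME/bin/hadoop"
chmod +x "$HADOOP_HOME/bin/hadoop"
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
command -v hadoop   # resolves inside $HADOOP_HOME/bin
hadoop              # prints "Hadoop 3.3.4"
```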
Authorize as hadoop user
To ensure security, the Hadoop services are not started as the root user; instead we start them as the ordinary user hadoop.
(The hadoop user was created in the previous chapters, and password-free SSH login between the hadoop users has already been configured.)
Execute the following commands on the three servers as root
chown -R hadoop:hadoop /data
chown -R hadoop:hadoop /export
Format the entire file system
Execute on node1
switch to hadoop user
su - hadoop
format namenode
hadoop namenode -format
verify
cd /data/
ll -h
cd nn
ll
cd current/
ll
If /data/nn/current now contains metadata files such as fsimage and VERSION, the formatting succeeded.
Start
One-click start of the HDFS cluster
start-dfs.sh
View the java process currently running on the system
jps
View HDFS WEBUI
After startup completes, open http://node1:9870 in a browser to view the HDFS management page.
While Hadoop HDFS is running, it provides this management page on port 9870 of the server where the NameNode runs.
save snapshot
One-click stop of the HDFS cluster
stop-dfs.sh
log out of the hadoop user
exit
Shut down all three servers
init 0