Dark Horse Big Data Study Notes 2-HDFS Environment Deployment

https://www.bilibili.com/video/BV1WY4y197g7?p=22

Environment deployment

hadoop-3.3.4.tar.gz

The roles in Hadoop HDFS include:

  • NameNode, the master node manager
  • DataNode, the slave node worker
  • SecondaryNameNode, the master node's assistant

Node    CPU     Memory   Services
node1   1 core  4 GB     NameNode, DataNode, SecondaryNameNode
node2   1 core  2 GB     DataNode
node3   1 core  2 GB     DataNode

Execute the following on the node1 node, logged in as root.
Upload the Hadoop installation package

Drag and drop the package onto the server directly through the FinalShell client.

Unzip the installation package to /export/server/

tar -zxvf hadoop-3.3.4.tar.gz -C /export/server

Create a soft link

cd /export/server
ln -s /export/server/hadoop-3.3.4 hadoop

Enter the Hadoop installation directory

cd hadoop

Folder meaning:

  • bin, stores the various Hadoop programs (commands)
  • etc, stores the Hadoop configuration files
  • include, C header files
  • lib, stores the Linux dynamic link libraries (.so files)
  • libexec, stores the script files (.sh and .cmd) used to configure the Hadoop system
  • licenses-binary, stores the license files
  • sbin, administrator programs (super bin)
  • share, stores the compiled code (Java JAR packages)

Configure the workers file

The workers file records which servers in the cluster act as slave (worker) nodes.
Enter the configuration file directory

cd etc/hadoop

Edit the workers file

vim workers

Remove localhost and add the following

node1
node2
node3
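
These names rely on hostname resolution that should already be in place from the earlier network setup (an assumption here, e.g. entries in /etc/hosts). A quick way to confirm all three resolve:

# Optional sanity check: each worker hostname should resolve to an IP
for h in node1 node2 node3; do getent hosts "$h"; done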

Configure hadoop-env.sh file

vim hadoop-env.sh

Add the following

export JAVA_HOME=/export/server/jdk
export HADOOP_HOME=/export/server/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_LOG_DIR=$HADOOP_HOME/logs

JAVA_HOME, JDK environment location
HADOOP_HOME, Hadoop installation location
HADOOP_CONF_DIR, Hadoop configuration file directory location
HADOOP_LOG_DIR, Hadoop operation log directory location
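
Before moving on, it is worth confirming that the paths referenced above actually exist. This assumes the JDK was installed and soft-linked to /export/server/jdk in the earlier chapters, as the JAVA_HOME line implies:

ls -ld /export/server/jdk /export/server/hadoop   # both paths should exist
/export/server/jdk/bin/java -version              # the JDK should run from that path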

Configure the core-site.xml file

vim core-site.xml

Add the following content

<configuration>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://node1:8020</value>
	</property>
	
	<property>
		<name>io.file.buffer.size</name>
		<value>131072</value>
	</property>
</configuration>
  • key: fs.defaultFS
  • Meaning: the network communication path of the HDFS file system
  • Value: hdfs://node1:8020
    • The protocol is hdfs://
    • The NameNode is on node1
    • The NameNode communication port is 8020

  • key: io.file.buffer.size
  • Meaning: buffer size for IO operations on files
  • Value: 131072 bytes (128 KB)

  • hdfs://node1:8020 is the internal communication address of the entire HDFS cluster; the protocol is hdfs:// (Hadoop's built-in protocol)
  • It means that the DataNodes will communicate with port 8020 on node1, and node1 is the machine where the NameNode runs
  • This configuration therefore fixes node1 as the host that must start the NameNode process (a quick way to read the value back is shown below)
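
Once the hadoop commands are on the PATH (configured later in this section), the value can be read back to confirm the configuration is being picked up; hdfs getconf is a standard Hadoop utility:

hdfs getconf -confKey fs.defaultFS   # should print hdfs://node1:8020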

Configure the hdfs-site.xml file

vim hdfs-site.xml

Add the following content

<configuration>
	<property>
		<name>dfs.datanode.data.dir.perm</name>
		<value>700</value>
	</property>
	
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>/data/nn</value>
	</property>
	
	<property>
		<name>dfs.namenode.hosts</name>
		<value>node1,node2,node3</value>
	</property>
		
	<property>
		<name>dfs.blocksize</name>
		<value>268435456</value>
	</property>
		
	<property>
		<name>dfs.namenode.handler.count</name>
		<value>100</value>
	</property>
		
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>/data/dn</value>
	</property>	
</configuration>
  • key: dfs.datanode.data.dir.perm
  • Meaning: default permissions for the data directories created by the DataNode
  • Value: 700, i.e. rwx------

  • key: dfs.namenode.name.dir
  • Meaning: storage location of the NameNode metadata
  • Value: /data/nn, i.e. the /data/nn directory on the node1 node

  • key: dfs.namenode.hosts
  • Meaning: which DataNode hosts the NameNode allows to connect (that is, which nodes may join the cluster)
  • Values: node1, node2, node3

  • key: dfs.blocksize
  • Meaning: default HDFS block size
  • Value: 268435456 (256 MB; a quick check of this value is shown after this list)

  • key: dfs.namenode.handler.count
  • Meaning: the number of handler threads the NameNode uses to process requests
  • Value: 100, i.e. file system management tasks are handled with 100 concurrent threads

  • key: dfs.datanode.data.dir
  • Meaning: data storage directory of the slave node DataNode
  • Value: /data/dn, i.e. data is stored in /data/dn on each of node1, node2, and node3
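
The block size value is simply 256 MB expressed in bytes, and it can also be read back with hdfs getconf once the PATH is set up later in this section:

echo $((256 * 1024 * 1024))            # 268435456, the value used above
hdfs getconf -confKey dfs.blocksize    # should print 268435456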

Prepare the data directory

On the node1 node:

mkdir -p /data/nn
mkdir -p /data/dn

On node2 and node3 nodes:

mkdir -p /data/dn
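
If passwordless SSH for root between the nodes was configured in the earlier chapters (an assumption here), the same directories can instead be created remotely from node1:

# Equivalent to logging in to node2 / node3 and running mkdir there
ssh node2 "mkdir -p /data/dn"
ssh node3 "mkdir -p /data/dn"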

Distribute the Hadoop folder

Remotely copy the Hadoop installation folder from node1 to node2 and node3

Execute on node1

cd /export/server
scp -r hadoop-3.3.4 node2:`pwd`/
scp -r hadoop-3.3.4 node3:`pwd`/
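
Equivalently, the two copies can be written as a small loop (functionally identical to the scp commands above):

cd /export/server
for host in node2 node3; do
  scp -r hadoop-3.3.4 "${host}:/export/server/"
done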

On node2 and node3, create the soft link

cd /export/server
ln -s /export/server/hadoop-3.3.4 hadoop
ll

Add Hadoop's scripts and programs to the PATH

Do this on node1, node2, and node3

vim /etc/profile

Add the following at the bottom

export HADOOP_HOME=/export/server/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Make the change take effect

source /etc/profile
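
A quick check that the PATH change took effect:

which hadoop      # should point to /export/server/hadoop/bin/hadoop
hadoop version    # should report Hadoop 3.3.4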

Grant ownership to the hadoop user

For security, the Hadoop services are not started as root; instead, the entire Hadoop system is started as the ordinary user hadoop.
(The hadoop user was created in the previous chapters, and password-free SSH login between the hadoop users on the three machines has already been configured.)
Execute the following commands on the three servers as root

chown -R hadoop:hadoop /data
chown -R hadoop:hadoop /export
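
To confirm ownership was changed on each machine:

ls -ld /data /export /export/server/hadoop-3.3.4   # owner and group should be hadoop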


Format the entire file system

Execute the following on node1.
Switch to the hadoop user

su - hadoop

Format the NameNode

hadoop namenode -format

Verify the result

cd /data/
ll -h
cd nn
ll
cd current/
ll
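
What the listing should show, roughly (assuming the format command completed without errors):

ls /data/nn/current
# Expect something like a VERSION file, a seen_txid file, and an initial
# fsimage_0000000000000000000 (with its .md5 checksum)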


If these metadata files are present, the formatting was successful.
Start the cluster
Start the HDFS cluster with one command

start-dfs.sh

View the Java processes currently running on the system

jps
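
Based on the role assignment in the table at the beginning of this section, the output should roughly be:

# node1:           NameNode, DataNode, SecondaryNameNode (plus Jps itself)
# node2 and node3: DataNode (plus Jps itself)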

View the HDFS Web UI

After startup completes, open http://node1:9870 in your browser to view the management page of the HDFS file system.
While Hadoop HDFS is running, it provides this management page on port 9870 of the server where the NameNode is located (a command-line check is shown below).
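
If no browser is handy, a command-line check from any node should also get a response from the NameNode web server (this assumes curl is installed):

curl -sI http://node1:9870   # an HTTP response indicates the web UI is up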

Save a snapshot (of the virtual machines)

Stop the cluster with one command

stop-dfs.sh

Log out of the hadoop user

exit

Shut down all three servers

init 0


Origin blog.csdn.net/weixin_45735391/article/details/131627832