HDFS Common Command Usage Tutorial and Architecture Introduction

HDFS (Hadoop Distributed File System) is the distributed file system implemented by Hadoop, derived from Google's GFS paper. Its design goals are:

  • A very large distributed file system.
  • Runs on ordinary, inexpensive hardware — commodity PCs, as opposed to minicomputers or other specialized machines.
  • Easy to scale, providing users with a file storage service with good performance.

Architecture of HDFS

Schematic diagram of HDFS architecture

1. HDFS adopts an architecture of 1 Master (NameNode) and N Slaves (DataNodes)

An HDFS cluster includes one NameNode, whose main responsibility is to manage the file system's metadata and control client access to files.

An HDFS cluster contains multiple DataNodes; usually each node runs one DataNode, which is responsible for storing file data on that node.

The role of NameNode

  • Responds to client requests
  • Maintains the directory tree of the entire file system (recording file create, delete, update, and lookup operations) and manages metadata (file names, replication factors, file-to-block mappings, DataNode-to-block mappings, etc.)

The role of DataNodes

  • Stores the data blocks that make up files; storing data is its core function
  • Periodically sends heartbeats to the NameNode, reporting itself, all of its block information, and its health status
  • Executes instructions from the NameNode, such as block creation, deletion, and replication, and serves file read and write requests

A typical deployment architecture is 1 NameNode + N DataNodes.

2. Namespace and multi-replica mechanism of the HDFS file system

Like existing Linux file systems, HDFS has a hierarchical directory structure, and files and directories can be created, deleted, moved, and renamed.

A file is split into multiple blocks (default block size: 128 MB). These blocks are replicated and stored on multiple DataNodes. Except for the last one, all blocks of a file are the same size. The application can specify a file's replication factor, either when the file is created or later. The figure below illustrates multi-replica storage.
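As a rough sanity check on what replication costs in raw capacity (the numbers below are illustrative, not taken from a real cluster): with the default 128 MB block size and a replication factor of 3, each block occupies three times its size across the cluster.

```shell
# Illustrative arithmetic: raw storage consumed by one fully replicated block.
BLOCK_MB=128      # default HDFS block size, in MB
REPLICATION=3     # default replication factor
RAW_MB=$(( BLOCK_MB * REPLICATION ))
echo "$RAW_MB"    # MB of cluster storage consumed by a single 128 MB block
```

This is also why raising a file's replication factor trades storage for availability and read parallelism.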

HDFS multi-copy storage mechanism

Taking part-1 as an example: it is divided into three blocks with block ids 2, 4, and 5, and its replication factor is 3. As the figure shows, blocks 2, 4, and 5 are each stored on three DataNodes, so if one node fails, the file remains available.

Block ids are necessary so that when a client operates on a file, the corresponding blocks can be reassembled in the correct order.
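The reassembly idea can be sketched locally with ordinary files (this is an analogy only, not actual HDFS internals — the file name and sizes are made up): split a file into fixed-size pieces, then concatenate the pieces in id order to recover the original.

```shell
# Local analogy: split a file into 3-byte "blocks", then rebuild it
# by concatenating the blocks in order of their numeric suffixes.
tmp=$(mktemp -d)
printf 'abcdefgh' > "$tmp/part-1"
split -b 3 -d "$tmp/part-1" "$tmp/blk_"   # creates blk_00 blk_01 blk_02 (GNU split)
cat "$tmp"/blk_* > "$tmp/rebuilt"         # the shell glob sorts suffixes in order
cmp -s "$tmp/part-1" "$tmp/rebuilt" && echo match
```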

Operation of HDFS Shell

Let's operate on HDFS with commands. Most common commands mirror their Linux shell counterparts; the format is hadoop fs -[linux shell command].

 Command-line operations come in two types:

	(1) Ordinary operation commands: hdfs dfs ******
		Commands:
		-mkdir: create a directory on HDFS
		        hdfs dfs -mkdir /aaa
		        hdfs dfs -mkdir /bbb/ccc
		        If the parent directory does not exist, use the -p option to create it first

		-ls      list an HDFS directory
		-ls -R   list an HDFS directory, including subdirectories
		         shorthand: -lsr

		-put            upload data
		-copyFromLocal  upload data
		-moveFromLocal  upload data, equivalent to cut (Ctrl+X)

		-copyToLocal   download data
		-get           download data
		     Example: hdfs dfs -get /input/data.txt .

		-rm: delete a file
		-rmr: delete a directory, including its subdirectories
		       hdfs dfs -rmr /bbb
		       Log output:
		       17/12/08 20:32:10 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
		       Deleted /bbb

		-getmerge: merge the files under a directory, then download the result
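A local analogy for what -getmerge produces (the paths and contents below are made up for illustration): it concatenates the files under a directory into a single local file.

```shell
# Local analogy of -getmerge: concatenate a directory's files into one file.
# On a cluster this would be: hdfs dfs -getmerge <hdfs-dir> <local-file>
tmp=$(mktemp -d)
printf 'line from part-0\n' > "$tmp/part-0"
printf 'line from part-1\n' > "$tmp/part-1"
cat "$tmp"/part-* > "$tmp/merged"
wc -l < "$tmp/merged"          # two input files -> two lines in the merged file
```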
		
		-cp:拷贝   hdfs dfs -cp /input/data.txt /input/data2.txt
		-mv:移动   hdfs dfs -cp /input/data.txt /aaa/a.txt
				
		-count: 举例:hdfs dfs -count /students
		
		-du: 类似-count,信息更详细
		     hdfs dfs -du /students
		
		例子:
			[root@bigdata11 ~]# hdfs dfs -count /students
					   1            2                 29 /students
			[root@bigdata11 ~]# hdfs dfs -ls /students
			Found 2 items
			-rw-r--r--   1 root supergroup         19 2017-12-08 20:35 /students/student01.txt
			-rw-r--r--   1 root supergroup         10 2017-12-08 20:35 /students/student02.txt
			[root@bigdata11 ~]# hdfs dfs -du /students
			19  /students/student01.txt
			10  /students/student02.txt			
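The -count output above reads as DIR_COUNT, FILE_COUNT, CONTENT_SIZE, PATH: one directory (/students itself), two files, and 29 bytes in total, which matches the per-file sizes that -du lists.

```shell
# CONTENT_SIZE (29) from -count is the sum of the sizes -du reports per file:
TOTAL=$(( 19 + 10 ))
echo "$TOTAL"
```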
		
		-text, -cat: view the contents of a text file
		          hdfs dfs -cat /input/data.txt

		balancer: rebalance data blocks across DataNodes
		          hdfs balancer

  (2) Administration commands: hdfs dfsadmin ******

		-report: print an HDFS status report
		         Example: hdfs dfsadmin -report

		-safemode: safe mode
			hdfs dfsadmin -safemode
			Usage: hdfs dfsadmin [-safemode enter | leave | get | wait]

			[root@bigdata11 ~]# hdfs dfsadmin -safemode get
			Safe mode is OFF
			[root@bigdata11 ~]# hdfs dfsadmin -safemode enter
			Safe mode is ON
			[root@bigdata11 ~]# hdfs dfs -mkdir /dddd
			mkdir: Cannot create directory /dddd. Name node is in safe mode.
			[root@bigdata11 ~]# hdfs dfsadmin -safemode leave
			Safe mode is OFF

1.ls

usage:

hadoop fs -ls directory — lists the files under directory on HDFS

Hands-on:

hadoop fs -ls /    # list the files under the HDFS root; if no file has been uploaded to HDFS yet, nothing is printed

HDFS ls command

2.put

usage:

hadoop fs -put file directory — uploads the local file to the HDFS directory

Hands-on:

# upload the local hello.txt to the HDFS root directory, then check that the upload succeeded
hadoop fs -put hello.txt /
# hello.txt is an existing file in the current local directory
hadoop fs -ls /

HDFS put command

3.cat

usage:

hadoop fs -cat filepath/file — views the content of file under filepath on HDFS. hadoop fs -text filepath/file achieves the same effect.

Hands-on:

hadoop fs -cat /hello.txt    # view the content of hello.txt under the HDFS root

HDFS cat command

4.mkdir

usage:

hadoop fs -mkdir directory — creates directory on HDFS.

hadoop fs -mkdir -p directory1/directory2 — recursively creates directory1/directory2 on HDFS.

Hands-on:

hadoop fs -mkdir /test    # create the test directory under the HDFS root
hadoop fs -ls /

HDFS mkdir command

hadoop fs -ls /test
hadoop fs -mkdir -p /test/a/b    # recursively create a/b under /test; -p enables recursive creation
hadoop fs -ls /test
hadoop fs -ls /test/a

HDFS mkdir -p command

hadoop fs -ls -R /    # recursively list the contents of the HDFS root

HDFS ls -R command

5.copyFromLocal

usage:

hadoop fs -copyFromLocal file1 filepath/file2 — copies the local file file1 to filepath on HDFS, naming it file2

Hands-on:

hadoop fs -copyFromLocal hello.txt /test/a/b/h.txt    # copy the local hello.txt to HDFS as /test/a/b/h.txt
hadoop fs -ls -R

HDFS copyFromLocal command

hadoop fs -text /test/a/b/h.txt    # -cat works here as well

HDFS text command

6.get

usage:

hadoop fs -get filepath/file1 — copies file1 under filepath on HDFS to the local machine; an optional second argument gives the local copy a different name

Hands-on:

hadoop fs -get /test/a/b/h.txt

HDFS get command

7.rm

usage:

hadoop fs -rm filepath/file — deletes file under filepath on HDFS

hadoop fs -rm -R directory — deletes directory on HDFS

Hands-on:

hadoop fs -rm /hello.txt    # delete hello.txt under the HDFS root
hadoop fs -ls -R /
hadoop fs -rm -R /test      # delete the test directory under the HDFS root
hadoop fs -ls -R /

The rm command of HDFS

Tip: run hadoop fs and press Enter to view the command's help information; consult it whenever you are unsure.

Visually view HDFS files through a browser

In the environment-setup section, we verified through the browser that HDFS was built successfully. In fact, we can also browse the specific files in HDFS from the browser. Since we deleted everything on HDFS in the previous part, let's first create a few directories and files on HDFS.

hadoop fs -put hello.txt /
hadoop fs -mkdir /dir1
hadoop fs -put ../software/hadoop-2.6.0-cdh5.7.0.tar.gz /
hadoop fs -ls /

Use the command to view HDFS files

View HDFS files through a browser

Visit http://localhost:50070 to verify that HDFS has started successfully. Click "Browse the file system" to see the two files and one folder we uploaded.

Click a file name to see its block information. The block size is 128 MB, so this 297 MB file is divided into 3 blocks.
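The block count can be checked with a little integer arithmetic: every block except the last is a full 128 MB, so the count is the file size divided by the block size, rounded up.

```shell
FILE_MB=297       # file size seen in the browser view
BLOCK_MB=128      # HDFS block size
# Block count = ceiling(FILE_MB / BLOCK_MB); all blocks but the last are full.
BLOCKS=$(( (FILE_MB + BLOCK_MB - 1) / BLOCK_MB ))
# The final block holds whatever remains after the full blocks:
LAST_MB=$(( FILE_MB - (FILE_MB / BLOCK_MB) * BLOCK_MB ))
echo "$BLOCKS blocks, last block $LAST_MB MB"
```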

Summary

In this article, we have learned the architecture of HDFS and the command operation of HDFS.

Architecturally, HDFS uses 1 NameNode + N DataNodes; each performs its own duties, and together they implement a distributed file system that is easy to scale. By splitting files into blocks and storing multiple replicas, HDFS achieves load balancing and reliability, and facilitates parallel processing, improving computing efficiency.

For HDFS operations, hadoop fs followed by a common Linux command manipulates HDFS, which is easy to pick up. If you run into difficulty, run hadoop fs and press Enter to view the help.


Origin blog.csdn.net/lomodays207/article/details/107255314