HDFS (Hadoop Distributed File System) is the distributed file system implemented by Hadoop; its design comes from Google's GFS paper. Its design goals are:
- A very large distributed file system.
- Runs on ordinary, cheap hardware such as commodity PCs (as opposed to minicomputers).
- Easy to scale out, providing users with a well-performing file storage service.
Architecture of HDFS
Schematic diagram of HDFS architecture
1. HDFS adopts an architecture of 1 master (NameNode) and N slaves (DataNodes)
An HDFS cluster contains a single NameNode, whose main responsibility is to manage the metadata of the file system and control client access to files.
An HDFS cluster contains multiple DataNodes, usually one per node, each responsible for storing the file data on its node.
The role of the NameNode
- Responds to client requests
- Maintains the directory tree of the entire file system (recording add, delete, modify, and query operations on files) and manages metadata (file names, replication factors, the mapping from files to blocks, the mapping from DataNodes to blocks, etc.)
The role of the DataNodes
- Store the data blocks that make up each file; storing data is their core function
- Regularly send heartbeats to the NameNode, reporting themselves, all of their block information, and their health status
- Execute instructions from the NameNode, such as creating, deleting, and replicating blocks, and serve file read and write requests
A typical deployment architecture is 1 NameNode + N DataNodes.
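The division of labor above can be pictured with a toy model of the NameNode's metadata. This is a minimal sketch, not HDFS internals: the names `file_to_blocks`, `block_to_datanodes`, and `locate` are illustrative inventions, and real HDFS persists this state via the fsimage and edit log rather than plain dictionaries.

```python
# Toy model of the metadata a NameNode keeps (illustrative only).

# File path -> ordered list of block IDs making up the file.
file_to_blocks = {
    "/input/part-1": [2, 4, 5],
}

# Block ID -> DataNodes currently holding a replica of that block.
block_to_datanodes = {
    2: ["dn1", "dn2", "dn3"],
    4: ["dn1", "dn3", "dn4"],
    5: ["dn2", "dn3", "dn4"],
}

def locate(path):
    """For each block of the file, list the DataNodes a client could read from."""
    return [(b, block_to_datanodes[b]) for b in file_to_blocks[path]]

print(locate("/input/part-1"))
```

The NameNode answers "where is this file?" queries from maps like these, while the actual bytes never pass through it; clients read blocks directly from the DataNodes.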
2. Namespace and multi-copy mechanism of HDFS file system
Like familiar Linux file systems, HDFS has a hierarchical directory structure, and files and directories can be created, deleted, moved, and renamed.
A file is split into multiple blocks (default block size: 128 MB), and these blocks are replicated and stored on multiple DataNodes. Except for the last block, all blocks of a file are the same size. The application can specify the file's replication factor, either when the file is created or later. The figure below illustrates multi-replica storage.
HDFS multi-copy storage mechanism
Taking part-1 as an example, it is divided into three blocks with block IDs 2, 4, and 5, and its replication factor is 3. As the figure shows, blocks 2, 4, and 5 are each stored on three DataNodes, so if one node fails the file remains available.
Block IDs are necessary so that when a client operates on the file, the corresponding blocks can be reassembled in order.
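The splitting rule described above can be sketched in a few lines, assuming the default 128 MB block size (`split_into_blocks` is a hypothetical helper for illustration, not an HDFS API):

```python
BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size: 128 MB

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes is split into.
    Every block is block_size bytes except possibly the last one."""
    sizes = []
    remaining = file_size
    while remaining > 0:
        sizes.append(min(block_size, remaining))
        remaining -= block_size
    return sizes

# A 300 MB file becomes two full 128 MB blocks plus one 44 MB block.
print([s // (1024 * 1024) for s in split_into_blocks(300 * 1024 * 1024)])
# → [128, 128, 44]
```

Each of those blocks would then be replicated (3 copies by default) across different DataNodes, as in the part-1 example.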
Operation of HDFS Shell
Let's operate HDFS from the command line. The common commands have meanings similar to their Linux shell counterparts; the format is hadoop fs -[linux shell command].
Command-line operations come in two types:
(1) Ordinary operation commands: hdfs dfs ******
Commands:
-mkdir: create a directory on HDFS
hdfs dfs -mkdir /aaa
hdfs dfs -mkdir /bbb/ccc
If the parent directory does not exist, use the -p option to create the parent directories first.
-ls: list an HDFS directory
-ls -R: list an HDFS directory recursively, including subdirectories
Shorthand: -lsr
-put: upload data
-copyFromLocal: upload data
-moveFromLocal: upload data and remove the local copy (like cut, Ctrl+X)
-copyToLocal: download data
-get: download data
Example: hdfs dfs -get /input/data.txt .
-rm: delete a file
-rmr: delete a directory recursively, including its subdirectories (shorthand for -rm -r)
hdfs dfs -rmr /bbb
Log output:
17/12/08 20:32:10 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /bbb
-getmerge: merge the files under a directory and download the result
-cp: copy, e.g. hdfs dfs -cp /input/data.txt /input/data2.txt
-mv: move, e.g. hdfs dfs -mv /input/data.txt /aaa/a.txt
-count: count directories, files, and bytes, e.g. hdfs dfs -count /students
-du: similar to -count, with more detailed per-file information
hdfs dfs -du /students
Example:
[root@bigdata11 ~]# hdfs dfs -count /students
1 2 29 /students
[root@bigdata11 ~]# hdfs dfs -ls /students
Found 2 items
-rw-r--r-- 1 root supergroup 19 2017-12-08 20:35 /students/student01.txt
-rw-r--r-- 1 root supergroup 10 2017-12-08 20:35 /students/student02.txt
[root@bigdata11 ~]# hdfs dfs -du /students
19 /students/student01.txt
10 /students/student02.txt
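The -count columns above read as: directory count, file count, total bytes, path (here 29 = 19 + 10 bytes from the two student files). A tiny parser makes the layout concrete; this is a sketch based only on the output format shown above, and `parse_count_line` is an illustrative name, not a Hadoop API:

```python
def parse_count_line(line):
    """Parse one line of `hdfs dfs -count` output into a dict.
    Columns: DIR_COUNT FILE_COUNT CONTENT_SIZE PATH."""
    dirs, files, size, path = line.split()
    return {"dirs": int(dirs), "files": int(files),
            "bytes": int(size), "path": path}

print(parse_count_line("1 2 29 /students"))
# {'dirs': 1, 'files': 2, 'bytes': 29, 'path': '/students'}
```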
-text, -cat: view the contents of a text file
hdfs dfs -cat /input/data.txt
balancer: rebalance data across DataNodes
hdfs balancer
(2) Administration commands: hdfs dfsadmin ******
-report: print an HDFS status report
Example: hdfs dfsadmin -report
-safemode: safe mode
hdfs dfsadmin -safemode
Usage: hdfs dfsadmin [-safemode enter | leave | get | wait]
[root@bigdata11 ~]# hdfs dfsadmin -safemode get
Safe mode is OFF
[root@bigdata11 ~]# hdfs dfsadmin -safemode enter
Safe mode is ON
[root@bigdata11 ~]# hdfs dfs -mkdir /dddd
mkdir: Cannot create directory /dddd. Name node is in safe mode.
[root@bigdata11 ~]# hdfs dfsadmin -safemode leave
Safe mode is OFF
1.ls
usage:
hadoop fs -ls directory
, which lists the file information under directory on HDFS
In practice:
hadoop fs -ls /  # list the files under the HDFS root directory; if no files have been uploaded yet, nothing is printed
HDFS ls command
2.put
usage:
hadoop fs -put file directory
, which uploads the local file to directory on HDFS
In practice:
# Upload the local hello.txt to the HDFS root directory, then check that the upload succeeded.
# hello.txt is an existing file in the current local directory.
hadoop fs -put hello.txt /
hadoop fs -ls /
HDFS put command
3.cat
usage:
hadoop fs -cat filepath/file
, which prints the contents of file under filepath on HDFS. hadoop fs -text filepath/file
achieves the same effect.
In practice:
hadoop fs -cat /hello.txt  # view the contents of hello.txt under the HDFS root directory
HDFS cat command
4.mkdir
usage:
hadoop fs -mkdir directory
, which creates directory on HDFS.
hadoop fs -mkdir -p directory1/directory2
, which recursively creates the directory1/directory2 directories on HDFS.
In practice:
hadoop fs -mkdir /test  # create the test directory under the HDFS root directory
hadoop fs -ls /
HDFS mkdir command
hadoop fs -ls /test
hadoop fs -mkdir -p /test/a/b  # recursively create a/b under /test; -p creates missing parent directories
hadoop fs -ls /test
hadoop fs -ls /test/a
HDFS mkdir -p command
hadoop fs -ls -R /  # recursively list the contents of the HDFS root directory
HDFS ls -R command
5.copyFromLocal
usage:
hadoop fs -copyFromLocal file1 filepath/file2
, which copies the local file1 to filepath on HDFS and names it file2
In practice:
hadoop fs -copyFromLocal hello.txt /test/a/b/h.txt  # copy the local hello.txt to HDFS as /test/a/b/h.txt
hadoop fs -ls -R
HDFS copyFromLocal command
hadoop fs -text /test/a/b/h.txt  # -cat works as well
HDFS text command
6.get
usage:
hadoop fs -get filepath/file1 file2
, which copies file1 under filepath on HDFS to the local machine and names it file2
In practice:
hadoop fs -get /test/a/b/h.txt
HDFS get command
7.rm
usage:
hadoop fs -rm filepath/file
, which deletes the file file under filepath on HDFS
hadoop fs -rm -R directory
, which recursively deletes the directory directory on HDFS
In practice:
hadoop fs -rm /hello.txt  # delete hello.txt under the HDFS root directory
hadoop fs -ls -R /
hadoop fs -rm -R /test  # delete the test directory under the HDFS root directory
hadoop fs -ls -R /
The rm command of HDFS
Tip: run hadoop fs and press Enter to print the help information for these commands; you can consult it whenever you are unsure.
Visually view HDFS files through a browser
In the environment-setup section, we verified through the browser that HDFS was running. We can also browse the specific files in HDFS in the browser. Since we deleted everything on HDFS in the previous part, first create a few directories and files on HDFS.
hadoop fs -put hello.txt /
hadoop fs -mkdir /dir1
hadoop fs -put ../software/hadoop-2.6.0-cdh5.7.0.tar.gz /
hadoop fs -ls /
Use the command to view HDFS files
View HDFS files through a browser
Visit http://localhost:50070 to verify that HDFS has started successfully, then click Browse the file system
to see the two files and one directory we just uploaded.
Click a file name to see its block information. The block size is 128 MB, so the 297 MB file is divided into 3 blocks.
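The block count follows directly from the block size; a one-line sketch of the arithmetic (`num_blocks` is an illustrative helper, not an HDFS API):

```python
import math

BLOCK_MB = 128  # default HDFS block size in megabytes

def num_blocks(file_mb, block_mb=BLOCK_MB):
    """Number of HDFS blocks needed for a file of file_mb megabytes."""
    return math.ceil(file_mb / block_mb)

# The 297 MB tarball from the example: 128 + 128 + 41 MB → 3 blocks.
print(num_blocks(297))
# → 3
```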
Summary
In this article, we studied the architecture of HDFS and its command-line operations.
Architecturally, HDFS uses 1 NameNode + N DataNodes, each with its own responsibilities, to jointly realize a distributed file system that is easy to scale out. By splitting files into blocks and storing multiple replicas, HDFS achieves load balancing and reliability, and facilitates parallel processing, improving computing efficiency.
For operations, hadoop fs followed by a familiar Linux command name operates on HDFS, which is easy to pick up. If you get stuck, type hadoop fs and press Enter to view the help information.