Article directory
- 1. Background
- 2. What are the HDFS file system shell commands?
- 3. Determine which file system the shell operates on
- 4. Prepare the following files locally
- 5. HDFS file system shell
  - 5.1 mkdir to create a directory
  - 5.2 put to upload files
  - 5.3 ls to view directories or files
  - 5.4 cat to view file content
  - 5.5 head to view the first kilobyte of a file
  - 5.6 tail to view the last kilobyte of a file
  - 5.7 appendToFile to append data to an HDFS file
  - 5.8 get to download files
  - 5.9 getmerge to merge and download
  - 5.10 cp to copy files
  - 5.11 mv to move files
  - 5.12 setrep to change the replication factor of a file
  - 5.13 df to show free space
  - 5.14 du to show directory or file sizes
  - 5.15 chgrp, chmod, chown to change the group, permissions, and owner of a file
  - 5.16 rm to delete files or directories
- 6. Web interface operations
- 7. Reference link
1. Background
Here we use the command line to walk through some common operations of the HDFS file system shell.
2. What are the HDFS file system shell commands?
The supported commands are listed at https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#appendToFile . Most of them are used in much the same way as their Linux counterparts.
3. Determine which file system the shell operates on
# Operate on the local file system
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls file:///
Found 19 items
dr-xr-xr-x - root root 24576 2023-02-18 14:47 file:///bin
dr-xr-xr-x - root root 4096 2022-06-13 10:41 file:///boot
drwxr-xr-x - root root 3140 2023-02-28 20:17 file:///dev
......
# Operate on the HDFS file system
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls hdfs://hadoop01:8020/
Found 1 items
drwxrwx--- - hadoopdeploy supergroup 0 2023-02-19 17:20 hdfs://hadoop01:8020/tmp
# No scheme given: operate on the file system configured by fs.defaultFS (HDFS here)
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /
Found 1 items
drwxrwx--- - hadoopdeploy supergroup 0 2023-02-19 17:20 /tmp
[hadoopdeploy@hadoop01 ~]$
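The three listings above differ only in how the path argument is written. As a rough mental model (not Hadoop's actual implementation), a path that already carries a scheme is used as-is, while an absolute path without a scheme is resolved against fs.defaultFS. The function name resolve_uri and the variable DEFAULT_FS are hypothetical, and the default value is simply the address used in this article.

```shell
#!/bin/sh
# Simplified model of path resolution; resolve_uri and DEFAULT_FS are
# illustrative names, not part of Hadoop.
DEFAULT_FS="hdfs://hadoop01:8020"   # assumed value of fs.defaultFS

resolve_uri() {
    path=$1
    default_fs=${2:-$DEFAULT_FS}
    case "$path" in
        *://*)
            # A scheme such as file:// or hdfs:// is present: use the URI as-is.
            printf '%s\n' "$path"
            ;;
        /*)
            # Absolute path without a scheme: prepend the default file system.
            printf '%s%s\n' "${default_fs%/}" "$path"
            ;;
        *)
            # Relative paths are outside the scope of this sketch.
            echo "resolve_uri: relative paths are not handled here" >&2
            return 1
            ;;
    esac
}

resolve_uri file:///bin   # prints file:///bin
resolve_uri /tmp          # prints hdfs://hadoop01:8020/tmp
```

On a real cluster, the configured default can be read with hdfs getconf -confKey fs.defaultFS.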
4. Prepare the following files locally
file name | content |
---|---|
1.txt | aaa |
2.txt | bbb |
3.txt | ccc |
5. HDFS file system shell
5.1 mkdir to create a directory
Usage: hadoop fs -mkdir [-p] <paths>
-p: Create parent directories along the path if they do not already exist.
[hadoopdeploy@hadoop01 sbin]$ hadoop fs -mkdir -p /bigdata/hadoop
[hadoopdeploy@hadoop01 sbin]$
5.2 put to upload files
Usage: hadoop fs -put [-f] [-p] [-d] [-t <thread count>] [-q <thread pool queue size>] [ - | <localsrc> ...] <dst>
-f: Overwrite the destination if it already exists.
-p: Preserve access and modification times, ownership and permissions.
-d: Skip creation of the temporary file with the suffix ._COPYING_.
-t: Number of threads to use; defaults to 1. Useful when uploading directories containing more than one file.
-q: Thread pool queue size to use; defaults to 1024. Takes effect only when the thread count is greater than 1.
# Create three files: 1.txt, 2.txt and 3.txt
[hadoopdeploy@hadoop01 ~]$ echo aaa > 1.txt
[hadoopdeploy@hadoop01 ~]$ echo bbb > 2.txt
[hadoopdeploy@hadoop01 ~]$ echo ccc > 3.txt
# Upload the local 1.txt to the /bigdata/hadoop directory in HDFS
[hadoopdeploy@hadoop01 ~]$ hadoop fs -put -p 1.txt /bigdata/hadoop
# The upload fails because 1.txt already exists in /bigdata/hadoop
[hadoopdeploy@hadoop01 ~]$ hadoop fs -put -p 1.txt /bigdata/hadoop
put: `/bigdata/hadoop/1.txt': File exists
# With -f, the target file is overwritten if it already exists
[hadoopdeploy@hadoop01 ~]$ hadoop fs -put -p -f 1.txt /bigdata/hadoop
# List the files in the /bigdata/hadoop directory
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/hadoop
Found 1 items
-rw-rw-r-- 2 hadoopdeploy hadoopdeploy 4 2023-02-28 12:31 /bigdata/hadoop/1.txt
# Upload multiple files using several threads and a wildcard
[hadoopdeploy@hadoop01 ~]$ hadoop fs -put -p -f -t 3 *.txt /bigdata/hadoop
# List the files in the /bigdata/hadoop directory
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/hadoop
Found 3 items
-rw-rw-r-- 2 hadoopdeploy hadoopdeploy 4 2023-02-28 12:31 /bigdata/hadoop/1.txt
-rw-rw-r-- 2 hadoopdeploy hadoopdeploy 4 2023-02-28 12:31 /bigdata/hadoop/2.txt
-rw-rw-r-- 2 hadoopdeploy hadoopdeploy 4 2023-02-28 12:31 /bigdata/hadoop/3.txt
5.3 ls to view directories or files
Usage: hadoop fs -ls [-h] [-R] <paths>
-h: Format file sizes in a human-readable way (for example in K or M instead of raw bytes).
-R: List subdirectories recursively.
# List the directories and files under /bigdata
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/
Found 1 items
drwxr-xr-x - hadoopdeploy supergroup 0 2023-02-28 12:37 /bigdata/hadoop
# -R lists recursively
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls -R /bigdata/
drwxr-xr-x - hadoopdeploy supergroup 0 2023-02-28 12:37 /bigdata/hadoop
-rw-rw-r-- 2 hadoopdeploy hadoopdeploy 4 2023-02-28 12:31 /bigdata/hadoop/1.txt
-rw-rw-r-- 2 hadoopdeploy hadoopdeploy 4 2023-02-28 12:31 /bigdata/hadoop/2.txt
-rw-rw-r-- 2 hadoopdeploy hadoopdeploy 4 2023-02-28 12:31 /bigdata/hadoop/3.txt
# -h shows sizes in human-readable form, such as K or M
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls -R -h /bigdata/
drwxr-xr-x - hadoopdeploy supergroup 0 2023-02-28 12:37 /bigdata/hadoop
-rw-rw-r-- 2 hadoopdeploy hadoopdeploy 4 2023-02-28 12:31 /bigdata/hadoop/1.txt
-rw-rw-r-- 2 hadoopdeploy hadoopdeploy 4 2023-02-28 12:31 /bigdata/hadoop/2.txt
-rw-rw-r-- 2 hadoopdeploy hadoopdeploy 4 2023-02-28 12:31 /bigdata/hadoop/3.txt
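Because the listing is plain text with fixed columns (permissions, replication, owner, group, size, date, time, path), it can be post-processed with ordinary tools. The sketch below sums the size column for regular files. The function name ls_total_bytes is made up for this example, and it assumes the output was produced without -h, since human-readable sizes would no longer be plain byte counts.

```shell
#!/bin/sh
# Sum the size column (field 5) of "hadoop fs -ls" output read from stdin.
# ls_total_bytes is an illustrative helper, not a Hadoop command.
ls_total_bytes() {
    # Lines for regular files start with "-"; directory lines ("d...")
    # and the "Found N items" header are ignored.
    awk '$1 ~ /^-/ { total += $5 } END { print total + 0 }'
}

# Demo on lines copied from the listing above; on a cluster this would be
#   hadoop fs -ls -R /bigdata/ | ls_total_bytes
printf '%s\n' \
  'drwxr-xr-x - hadoopdeploy supergroup 0 2023-02-28 12:37 /bigdata/hadoop' \
  '-rw-rw-r-- 2 hadoopdeploy hadoopdeploy 4 2023-02-28 12:31 /bigdata/hadoop/1.txt' \
  '-rw-rw-r-- 2 hadoopdeploy hadoopdeploy 4 2023-02-28 12:31 /bigdata/hadoop/2.txt' \
  '-rw-rw-r-- 2 hadoopdeploy hadoopdeploy 4 2023-02-28 12:31 /bigdata/hadoop/3.txt' \
  | ls_total_bytes   # prints 12
```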
5.4 cat to view file content
Usage: hadoop fs -cat [-ignoreCrc] URI [URI ...]
-ignoreCrc: Disable checksum verification.
Note: cat prints the entire content of the file, so use it with care on large files.
# View the contents of 1.txt and 2.txt
[hadoopdeploy@hadoop01 ~]$ hadoop fs -cat -ignoreCrc /bigdata/hadoop/1.txt /bigdata/hadoop/2.txt
aaa
bbb
[hadoopdeploy@hadoop01 ~]$
5.5 head to view the first kilobyte of a file
Usage: hadoop fs -head URI
Displays the first kilobyte of the file to stdout.
# View the first kilobyte of 1.txt
[hadoopdeploy@hadoop01 ~]$ hadoop fs -head /bigdata/hadoop/1.txt
aaa
[hadoopdeploy@hadoop01 ~]$
5.6 tail to view the last kilobyte of a file
Usage: hadoop fs -tail [-f] URI
Displays the last kilobyte of the file to stdout.
-f: Output appended data as the file grows, as in Unix.
# View the last kilobyte of 1.txt
[hadoopdeploy@hadoop01 ~]$ hadoop fs -tail /bigdata/hadoop/1.txt
aaa
[hadoopdeploy@hadoop01 ~]$
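For readers more familiar with the Linux tools, head and tail on HDFS behave roughly like byte-limited head -c and tail -c on a local file. The sketch below is a local analogue only, assuming that "kilobyte" means 1024 bytes; first_kb and last_kb are hypothetical helper names.

```shell
#!/bin/sh
# Local analogue of "hadoop fs -head" and "hadoop fs -tail" (assumption:
# one kilobyte = 1024 bytes). These helpers only read local files.
first_kb() {
    head -c 1024 "$1"
}

last_kb() {
    tail -c 1024 "$1"
}

# Demo: a file shorter than one kilobyte is printed in full by both helpers.
demo_file=$(mktemp)
printf 'aaa\n' > "$demo_file"
first_kb "$demo_file"   # prints aaa
last_kb "$demo_file"    # prints aaa
rm -f "$demo_file"
```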
5.7 appendToFile to append data to an HDFS file
Usage: hadoop fs -appendToFile <localsrc> ... <dst>
Appends one or more sources from the local file system to the destination file. Input can also be read from standard input (when localsrc is -) and appended to the destination file.
# View the content of 1.txt
[hadoopdeploy@hadoop01 ~]$ hadoop fs -cat /bigdata/hadoop/1.txt
aaa
# View the content of 2.txt
[hadoopdeploy@hadoop01 ~]$ hadoop fs -cat /bigdata/hadoop/2.txt
bbb
# Append the content of the local 1.txt to 2.txt in HDFS
[hadoopdeploy@hadoop01 ~]$ hadoop fs -appendToFile 1.txt /bigdata/hadoop/2.txt
# View the content of 2.txt again
[hadoopdeploy@hadoop01 ~]$ hadoop fs -cat /bigdata/hadoop/2.txt
bbb
aaa
[hadoopdeploy@hadoop01 ~]$
5.8 get to download files
Usage: hadoop fs -get [-ignorecrc] [-crc] [-p] [-f] [-t <thread count>] [-q <thread pool queue size>] <src> ... <localdst>
Copies files to the local file system. Files that fail the CRC check can be copied with the -ignorecrc option. Files and CRCs can be copied with the -crc option.
-f: Overwrite the destination if it already exists.
-p: Preserve access and modification times, ownership and permissions.
-t: Number of threads to use; defaults to 1. Useful when downloading directories containing more than one file.
-q: Thread pool queue size to use; defaults to 1024. Takes effect only when the thread count is greater than 1.
# Download 1.txt from HDFS to the file 1.txt.download in the current local directory
[hadoopdeploy@hadoop01 ~]$ hadoop fs -get /bigdata/hadoop/1.txt ./1.txt.download
# Check that 1.txt.download exists
[hadoopdeploy@hadoop01 ~]$ ls
1.txt 1.txt.download 2.txt 3.txt
# Download again; this fails because 1.txt.download already exists locally
[hadoopdeploy@hadoop01 ~]$ hadoop fs -get /bigdata/hadoop/1.txt ./1.txt.download
get: `./1.txt.download': File exists
# Use -f to overwrite the existing file
[hadoopdeploy@hadoop01 ~]$ hadoop fs -get -f /bigdata/hadoop/1.txt ./1.txt.download
# Multi-threaded download (fails here: with several sources the destination must be an existing directory)
[hadoopdeploy@hadoop01 ~]$ hadoop fs -get -f -t 3 /bigdata/hadoop/*.txt ./123.txt.download
get: `./123.txt.download': No such file or directory
# Multi-threaded download into the current directory
[hadoopdeploy@hadoop01 ~]$ hadoop fs -get -f -t 3 /bigdata/hadoop/*.txt .
[hadoopdeploy@hadoop01 ~]$
5.9 getmerge to merge and download
Usage: hadoop fs -getmerge [-nl] [-skip-empty-file] <src> <localdst>
Merges the contents of multiple source files into the local file localdst.
-nl: Add a newline character at the end of each file.
-skip-empty-file: Skip empty files.
# Content of 1.txt on HDFS
[hadoopdeploy@hadoop01 ~]$ hadoop fs -cat /bigdata/hadoop/1.txt
aaa
# Content of 3.txt on HDFS
[hadoopdeploy@hadoop01 ~]$ hadoop fs -cat /bigdata/hadoop/3.txt
ccc
# Download 1.txt and 3.txt from HDFS into the local file merge.txt; -nl adds a newline after each file, -skip-empty-file skips empty files
[hadoopdeploy@hadoop01 ~]$ hadoop fs -getmerge -nl -skip-empty-file /bigdata/hadoop/1.txt /bigdata/hadoop/3.txt ./merge.txt
# View merge.txt
[hadoopdeploy@hadoop01 ~]$ cat merge.txt
aaa
ccc
[hadoopdeploy@hadoop01 ~]$
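To make the effect of the two options concrete, here is a local analogue of getmerge written with plain shell tools. It is only a sketch of the behaviour described above, not how Hadoop implements it. merge_nl is a hypothetical name, and unlike getmerge it writes to stdout rather than taking the destination as its last argument.

```shell
#!/bin/sh
# Local analogue of "hadoop fs -getmerge -nl -skip-empty-file".
# merge_nl is an illustrative helper that concatenates local files to stdout.
merge_nl() {
    for f in "$@"; do
        # Like -skip-empty-file: ignore files with no content.
        [ -s "$f" ] || continue
        cat "$f"
        # Like -nl: write one extra newline after each file.
        printf '\n'
    done
}

# Demo with the same contents as 1.txt and 3.txt, plus one empty file.
demo_dir=$(mktemp -d)
printf 'aaa\n' > "$demo_dir/1.txt"
: > "$demo_dir/empty.txt"
printf 'ccc\n' > "$demo_dir/3.txt"
merge_nl "$demo_dir/1.txt" "$demo_dir/empty.txt" "$demo_dir/3.txt" > "$demo_dir/merge.txt"
cat "$demo_dir/merge.txt"
rm -rf "$demo_dir"
```

Since each source file here already ends with a newline, the extra newline shows up as a blank line after each file's content.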
5.10 cp to copy files
Usage: hadoop fs -cp [-f] [-p | -p[topax]] [-t <thread count>] [-q <thread pool queue size>] URI [URI ...] <dest>
-f: Overwrite the destination if it already exists.
-t: Number of threads to use; defaults to 1. Useful when copying directories containing more than one file.
-q: Thread pool queue size to use; defaults to 1024. Takes effect only when the thread count is greater than 1.
# List the files under /bigdata
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata
Found 1 items
drwxr-xr-x - hadoopdeploy supergroup 0 2023-02-28 12:55 /bigdata/hadoop
# List the files under /bigdata/hadoop
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/hadoop
Found 3 items
-rw-rw-r-- 2 hadoopdeploy hadoopdeploy 4 2023-02-28 12:31 /bigdata/hadoop/1.txt
-rw-rw-r-- 2 hadoopdeploy hadoopdeploy 8 2023-02-28 12:55 /bigdata/hadoop/2.txt
-rw-rw-r-- 2 hadoopdeploy hadoopdeploy 4 2023-02-28 12:31 /bigdata/hadoop/3.txt
# Copy all files under /bigdata/hadoop to /bigdata
[hadoopdeploy@hadoop01 ~]$ hadoop fs -cp /bigdata/hadoop/* /bigdata
# List the files under /bigdata
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata
Found 4 items
-rw-r--r-- 2 hadoopdeploy supergroup 4 2023-02-28 13:17 /bigdata/1.txt
-rw-r--r-- 2 hadoopdeploy supergroup 8 2023-02-28 13:17 /bigdata/2.txt
-rw-r--r-- 2 hadoopdeploy supergroup 4 2023-02-28 13:17 /bigdata/3.txt
drwxr-xr-x - hadoopdeploy supergroup 0 2023-02-28 12:55 /bigdata/hadoop
[hadoopdeploy@hadoop01 ~]$
5.11 mv to move files
Usage: hadoop fs -mv URI [URI ...] <dest>
Moves files from source to destination. Multiple sources are allowed, in which case the destination must be a directory. Moving files across file systems is not permitted.
# List the files under /bigdata/hadoop
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/hadoop
Found 3 items
-rw-rw-r-- 2 hadoopdeploy hadoopdeploy 4 2023-02-28 12:31 /bigdata/hadoop/1.txt
-rw-rw-r-- 2 hadoopdeploy hadoopdeploy 8 2023-02-28 12:55 /bigdata/hadoop/2.txt
-rw-rw-r-- 2 hadoopdeploy hadoopdeploy 4 2023-02-28 12:31 /bigdata/hadoop/3.txt
# Rename 1.txt to 1-new-name.txt
[hadoopdeploy@hadoop01 ~]$ hadoop fs -mv /bigdata/hadoop/1.txt /bigdata/hadoop/1-new-name.txt
# List the files under /bigdata/hadoop again; 1.txt has been renamed
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/hadoop
Found 3 items
-rw-rw-r-- 2 hadoopdeploy hadoopdeploy 4 2023-02-28 12:31 /bigdata/hadoop/1-new-name.txt
-rw-rw-r-- 2 hadoopdeploy hadoopdeploy 8 2023-02-28 12:55 /bigdata/hadoop/2.txt
-rw-rw-r-- 2 hadoopdeploy hadoopdeploy 4 2023-02-28 12:31 /bigdata/hadoop/3.txt
[hadoopdeploy@hadoop01 ~]$
5.12 setrep to change the replication factor of a file
Usage: hadoop fs -setrep [-R] [-w] <numReplicas> <path>
Changes the replication factor of a file. If path is a directory, the command recursively changes the replication factor of all files under the directory tree rooted at path. EC files are ignored by this command.
-R: Accepted for backward compatibility; it has no effect.
-w: Wait for the replication to complete. This can take a long time.
# Set the replication factor of 1-new-name.txt to 3
[hadoopdeploy@hadoop01 ~]$ hadoop fs -setrep -w 3 /bigdata/hadoop/1-new-name.txt
Replication 3 set: /bigdata/hadoop/1-new-name.txt
Waiting for /bigdata/hadoop/1-new-name.txt .... done
[hadoopdeploy@hadoop01 ~]$
5.13 df to show free space
Usage: hadoop fs -df [-h] URI [URI ...]
[hadoopdeploy@hadoop01 ~]$ hadoop fs -df /bigdata/hadoop
Filesystem Size Used Available Use%
hdfs://hadoop01:8020 27697086464 1228800 17716019200 0%
# -h shows sizes in human-readable form
[hadoopdeploy@hadoop01 ~]$ hadoop fs -df -h /bigdata/hadoop
Filesystem Size Used Available Use%
hdfs://hadoop01:8020 25.8 G 1.2 M 16.5 G 0%
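The two outputs above show the same numbers in different units. The conversion can be reproduced by dividing repeatedly by 1024, as in the following sketch. to_human is a hypothetical helper that only approximates the formatting seen above; it is not taken from Hadoop.

```shell
#!/bin/sh
# Convert a byte count into a human-readable size using 1024-based units.
# to_human is an illustrative helper that approximates the -h formatting.
to_human() {
    awk -v b="$1" 'BEGIN {
        split("B K M G T P", unit, " ")
        i = 1
        # Divide by 1024 until the value fits the current unit.
        while (b >= 1024 && i < 6) { b /= 1024; i++ }
        if (i == 1) printf "%d\n", b          # plain bytes, no suffix
        else        printf "%.1f %s\n", b, unit[i]
    }'
}

to_human 27697086464   # prints 25.8 G  (Size)
to_human 1228800       # prints 1.2 M   (Used)
to_human 17716019200   # prints 16.5 G  (Available)
```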
5.14 du to show directory or file sizes
Usage: hadoop fs -du [-s] [-h] [-v] [-x] URI [URI ...]
[hadoopdeploy@hadoop01 ~]$ hadoop fs -du /bigdata/hadoop
4 12 /bigdata/hadoop/1-new-name.txt
8 16 /bigdata/hadoop/2.txt
4 8 /bigdata/hadoop/3.txt
[hadoopdeploy@hadoop01 ~]$ hadoop fs -du -s /bigdata/hadoop
16 36 /bigdata/hadoop
[hadoopdeploy@hadoop01 ~]$ hadoop fs -du -s -h /bigdata/hadoop
16 36 /bigdata/hadoop
# 16 is the total size of all files under /bigdata/hadoop
# 36 is the total disk space consumed by all replicas of the files under /bigdata/hadoop
[hadoopdeploy@hadoop01 ~]$ hadoop fs -du -s -h -v /bigdata/hadoop
SIZE DISK_SPACE_CONSUMED_WITH_ALL_REPLICAS FULL_PATH_NAME
16 36 /bigdata/hadoop
[hadoopdeploy@hadoop01 ~]$
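The second column can be checked by hand: for a replicated (non-EC) file it is the file size multiplied by its replication factor. 1-new-name.txt was set to 3 replicas in section 5.12, while the other two files still have 2. The sketch below redoes that arithmetic; disk_consumed is a hypothetical helper name.

```shell
#!/bin/sh
# Disk space consumed by a replicated file = size in bytes * replication factor.
# disk_consumed is an illustrative helper, not a Hadoop command.
disk_consumed() {
    echo $(( $1 * $2 ))
}

# "size replication" pairs for 1-new-name.txt, 2.txt and 3.txt
total_size=0
total_consumed=0
while read -r size repl; do
    total_size=$(( total_size + size ))
    total_consumed=$(( total_consumed + $(disk_consumed "$size" "$repl") ))
done <<'EOF'
4 3
8 2
4 2
EOF

echo "$total_size $total_consumed"   # prints 16 36
```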
5.15 chgrp, chmod, chown to change the group, permissions, and owner of a file
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/hadoop/2.txt
-rw-rw-r-- 2 hadoopdeploy hadoopdeploy 8 2023-02-28 12:55 /bigdata/hadoop/2.txt
# Add execute permission to 2.txt
[hadoopdeploy@hadoop01 ~]$ hadoop fs -chmod +x /bigdata/hadoop/2.txt
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/hadoop/2.txt
-rwxrwxr-x 2 hadoopdeploy hadoopdeploy 8 2023-02-28 12:55 /bigdata/hadoop/2.txt
[hadoopdeploy@hadoop01 ~]$
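In octal terms, the change from -rw-rw-r-- to -rwxrwxr-x seen above corresponds to setting the execute bit for user, group and others, that is, OR-ing the mode with 111. The following sketch only reproduces that arithmetic locally; add_exec_bits is a hypothetical helper and does not talk to HDFS.

```shell
#!/bin/sh
# Compute the octal mode that results from adding execute permission
# for user, group and others. add_exec_bits is an illustrative helper.
add_exec_bits() {
    # A leading 0 makes the shell read the operands as octal numbers.
    printf '%o\n' $(( 0$1 | 0111 ))
}

add_exec_bits 664   # prints 775, i.e. rw-rw-r-- becomes rwxrwxr-x
```

The same result could be requested directly with an octal mode, for example hadoop fs -chmod 775 /bigdata/hadoop/2.txt.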
5.16 rm to delete files or directories
Usage: hadoop fs -rm [-f] [-r |-R] [-skipTrash] [-safely] URI [URI ...]
If trash is enabled, the file system moves deleted files to a trash directory instead of removing them immediately.
Currently, the trash feature is disabled by default. It can be enabled by setting the parameter fs.trash.interval (in core-site.xml) to a value greater than zero.
-f: Do not display a diagnostic message or modify the exit status to reflect an error if the file does not exist.
-R: Recursively delete the directory and any content under it.
-r: Equivalent to -R.
-skipTrash: Bypass the trash, if enabled, and delete the specified files immediately. This can be useful when files need to be deleted from an over-quota directory.
-safely: Require a safety confirmation before deleting a directory whose total number of files is greater than hadoop.shell.delete.limited.num.files (in core-site.xml, default: 100).
# Delete 2.txt; trash is enabled on my cluster, so the deleted file is moved to the trash
[hadoopdeploy@hadoop01 ~]$ hadoop fs -rm /bigdata/hadoop/2.txt
2023-02-28 22:04:51,302 INFO fs.TrashPolicyDefault: Moved: 'hdfs://hadoop01:8020/bigdata/hadoop/2.txt' to trash at: hdfs://hadoop01:8020/user/hadoopdeploy/.Trash/Current/bigdata/hadoop/2.txt
[hadoopdeploy@hadoop01 ~]$
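The log line above shows where the deleted file ended up: under the user's .Trash/Current directory, followed by the original absolute path. The sketch below derives that location from a user name and a path. It assumes the default trash policy and a home directory of /user/<user>, as seen in the log; trash_path is a hypothetical helper name.

```shell
#!/bin/sh
# Derive the trash location of a deleted HDFS path.
# Assumptions: default trash policy, home directory /user/<user>.
# trash_path is an illustrative helper, not a Hadoop command.
trash_path() {
    # $1 = user name, $2 = absolute HDFS path of the deleted file
    printf '/user/%s/.Trash/Current%s\n' "$1" "$2"
}

trash_path hadoopdeploy /bigdata/hadoop/2.txt
# prints /user/hadoopdeploy/.Trash/Current/bigdata/hadoop/2.txt

# On a cluster, the file could then be moved back, for example:
#   hadoop fs -mv "$(trash_path hadoopdeploy /bigdata/hadoop/2.txt)" /bigdata/hadoop/
```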
6. Web interface operations
If remembering this many commands feels daunting, many of the same operations can also be performed through the HDFS web interface.