Simple use of hdfs file system shell

1. Background

Here we use the command line to briefly learn hdfs file system shell some operations of .

2. What are the hdfs file system shell commands?

We can see the supported command operations through the following URL https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#appendToFile . Most of these commands linuxare similar to the command usage of .

What are the hdfs file system shells?

3. Determine which file system the shell operates on

How do we know which file system we are operating through hadoop fs manipulation

# 操作本地文件系统
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls file:///
Found 19 items
dr-xr-xr-x   - root root      24576 2023-02-18 14:47 file:///bin
dr-xr-xr-x   - root root       4096 2022-06-13 10:41 file:///boot
drwxr-xr-x   - root root       3140 2023-02-28 20:17 file:///dev
......
# 操作hdfs 文件系统
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls hdfs://hadoop01:8020/
Found 1 items
drwxrwx---   - hadoopdeploy supergroup          0 2023-02-19 17:20 hdfs://hadoop01:8020/tmp
# 操作hdfs 文件系统 fs.defaultFS
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /
Found 1 items
drwxrwx---   - hadoopdeploy supergroup          0 2023-02-19 17:20 /tmp
[hadoopdeploy@hadoop01 ~]$

4. Prepare the following files locally

file name content
1.txt aaa
2.txt bbb
3.txt ccc

5、hdfs file system shell

5.1 mkdir to create a directory

Syntax: Usage: hadoop fs -mkdir [-p] <paths>
-p Indicates that if the parent directory does not exist, create the parent directory.

[hadoopdeploy@hadoop01 sbin]$ hadoop fs -mkdir -p /bigdata/hadoop
[hadoopdeploy@hadoop01 sbin]$

5.2 put to upload files

Syntax: Usage: hadoop fs -put [-f] [-p] [-d] [-t <thread count>] [-q <thread pool queue size>] [ - | <localsrc> ...] <dst>
-f Overwrite if target file already exists Preserve
-paccess and modification times, ownership and permissions Temporary files to
-dskip Number of threads to use, defaults to 1. Useful when uploading directories with more than 1 file The thread pool queue size to use, defaults to 1024. It only takes effect when the number of threads is greater than 1._COPYING_
-t
-q

# 创建3个文件 1.txt 2.txt 3.txt
[hadoopdeploy@hadoop01 ~]$ echo aaa > 1.txt
[hadoopdeploy@hadoop01 ~]$ echo bbb > 2.txt
[hadoopdeploy@hadoop01 ~]$ echo ccc > 3.txt
# 上传本地的 1.txt 到hdfs的 /bigdata/hadoop 目录中
[hadoopdeploy@hadoop01 ~]$ hadoop fs -put -p 1.txt /bigdata/hadoop
# 因为 /bigdata/hadoop 中已经存在了 1.txt 所有上传失败
[hadoopdeploy@hadoop01 ~]$ hadoop fs -put -p 1.txt /bigdata/hadoop
put: `/bigdata/hadoop/1.txt': File exists
# 通过 -f 参数,如果目标文件已经存在,则进行覆盖操作
[hadoopdeploy@hadoop01 ~]$ hadoop fs -put -p -f 1.txt /bigdata/hadoop
# 查看 /bigdata/hadoop 目录中的文件
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/hadoop
Found 1 items
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/1.txt
# 通过多线程和 通配符 上传多个文件
[hadoopdeploy@hadoop01 ~]$ hadoop fs -put -p -f -t 3 *.txt /bigdata/hadoop
# 查看 /bigdata/hadoop 目录中的文件
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/hadoop
Found 3 items
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/1.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/2.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/3.txt

5.3 ls to view directories or files

Syntax: Usage: hadoop fs -ls [-h] [-R] <paths>
-h display human-readable, such as the size of the file, how many M are displayed, etc.
-RShow recursively.

# 列出/bigdata 目录和文件
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/
Found 1 items
drwxr-xr-x   - hadoopdeploy supergroup          0 2023-02-28 12:37 /bigdata/hadoop
# -R 递归展示
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls -R /bigdata/
drwxr-xr-x   - hadoopdeploy supergroup          0 2023-02-28 12:37 /bigdata/hadoop
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/1.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/2.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/3.txt
# -h 展示成人类可读的,比如多少k,多少M等
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls -R -h /bigdata/
drwxr-xr-x   - hadoopdeploy supergroup          0 2023-02-28 12:37 /bigdata/hadoop
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/1.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/2.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/3.txt

5.4 cat view file content

Syntax: Usage: hadoop fs -cat [-ignoreCrc] URI [URI ...]
-ignoreCrc Disable checkshum verification.
注意: If the file is relatively large, you need to read it carefully, because this is to view the entire content of the file

# 查看 1.txt 和 2.txt 的文件内容
[hadoopdeploy@hadoop01 ~]$ hadoop fs -cat -ignoreCrc /bigdata/hadoop/1.txt /bigdata/hadoop/2.txt
aaa
bbb
[hadoopdeploy@hadoop01 ~]$

5.5 head View the first 1000 bytes of the file

Syntax: Usage: hadoop fs -head URI
Displays first kilobyte of the file to stdout ( displays the first kilobyte of the file )

# 查看1.txt的前1000字节
[hadoopdeploy@hadoop01 ~]$ hadoop fs -head /bigdata/hadoop/1.txt
aaa
[hadoopdeploy@hadoop01 ~]$

5.6 tail View the content of the last 1000 bytes of the file

Syntax: Usage:hadoop fs -tail [-f] URI
Displays last kilobyte of the file to stdout. ( Displays the last kilobyte of the file to stdout )
-f: Indicates that additional data will be output as the file grows, just like in Unix.

# 查看1.txt的后1000字节
[hadoopdeploy@hadoop01 ~]$ hadoop fs -tail /bigdata/hadoop/1.txt
aaa
[hadoopdeploy@hadoop01 ~]$

5.7 appendToFile append data to hdfs file

Syntax: Usage: hadoop fs -appendToFile <localsrc> ... <dst>
Append single src or multiple src from local file system to target file system. Input can also be read from standard input ( localsrc是-) and appended to the target file system.

# 查看1.txt文件的内容
[hadoopdeploy@hadoop01 ~]$ hadoop fs -cat /bigdata/hadoop/1.txt
aaa
# 查看2.txt文件的内容
[hadoopdeploy@hadoop01 ~]$ hadoop fs -cat /bigdata/hadoop/2.txt
bbb
# 将1.txt文件的内容追加到2.txt文件中
[hadoopdeploy@hadoop01 ~]$ hadoop fs -appendToFile 1.txt  /bigdata/hadoop/2.txt
# 再次查看2.txt文件的内容
[hadoopdeploy@hadoop01 ~]$ hadoop fs -cat /bigdata/hadoop/2.txt
bbb
aaa
[hadoopdeploy@hadoop01 ~]$

5.8 get download file

grammar: Usage: hadoop fs -get [-ignorecrc] [-crc] [-p] [-f] [-t <thread count>] [-q <thread pool queue size>] <src> ... <localdst>

Copy the file to the local file system. Files that fail the CRC check can be copied using the -gnrecrc option. Files and CRCs can be copied using the -crc option.

-fOverwrite if target file already exists Preserve
-paccess and modification times, ownership and permissions
-tNumber of threads to use, defaults to 1. Useful when downloading a directory with multiple files
-qThe thread pool queue size to use, defaults to 1024. It only takes effect when the number of threads is greater than 1

# 下载hdfs文件系统的1.txt 到本地当前目录下的1.txt.download文件 
[hadoopdeploy@hadoop01 ~]$ hadoop fs -get /bigdata/hadoop/1.txt ./1.txt.download
# 查看 1.txt.download是否存在
[hadoopdeploy@hadoop01 ~]$ ls
1.txt  1.txt.download  2.txt  3.txt
# 再次下载,因为本地已经存在1.txt.download文件,所有报错
[hadoopdeploy@hadoop01 ~]$ hadoop fs -get /bigdata/hadoop/1.txt ./1.txt.download
get: `./1.txt.download': File exists
# 通过 -f 覆盖已经存在的文件
[hadoopdeploy@hadoop01 ~]$ hadoop fs -get -f /bigdata/hadoop/1.txt ./1.txt.download
# 多线程下载
[hadoopdeploy@hadoop01 ~]$ hadoop fs -get -f -t 3 /bigdata/hadoop/*.txt ./123.txt.download
get: `./123.txt.download': No such file or directory
# 多线程下载
[hadoopdeploy@hadoop01 ~]$ hadoop fs -get -f -t 3 /bigdata/hadoop/*.txt .
[hadoopdeploy@hadoop01 ~]$

5.9 getmerge combined download

grammar: Usage: hadoop fs -getmerge [-nl] [-skip-empty-file] <src> <localdst>

Merge the contents of multiple src files into the localdst file

-nlIndicates adding a newline at the end of each file
-skip-empty-fileand skipping empty files

# hdfs上1.txt文件的内容
[hadoopdeploy@hadoop01 ~]$ hadoop fs -cat /bigdata/hadoop/1.txt
aaa
# hdfs上3.txt文件的内容
[hadoopdeploy@hadoop01 ~]$ hadoop fs -cat /bigdata/hadoop/3.txt
ccc
# 将hdfs上1.txt 3.txt下载到本地 merge.txt 文件中 -nl增加换行符 -skip-empty-file跳过空文件
[hadoopdeploy@hadoop01 ~]$ hadoop fs -getmerge -nl -skip-empty-file /bigdata/hadoop/1.txt /bigdata/hadoop/3.txt ./merge.txt
# 查看merge.txt文件
[hadoopdeploy@hadoop01 ~]$ cat merge.txt
aaa

ccc

[hadoopdeploy@hadoop01 ~]$

5.10 cp copy files

grammar: Usage: hadoop fs -cp [-f] [-p | -p[topax]] [-t <thread count>] [-q <thread pool queue size>] URI [URI ...] <dest>

-fOverwrite if the target file exists.
-tThe number of threads to use, defaults to 1. Useful when copying directories containing multiple files
-qThe thread pool queue size to use, defaults to 1024. It only takes effect when the number of threads is greater than 1

# 查看 /bigdata目录下的文件
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata
Found 1 items
drwxr-xr-x   - hadoopdeploy supergroup          0 2023-02-28 12:55 /bigdata/hadoop
# 查看/bigdata/hadoop目录下的文件
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/hadoop
Found 3 items
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/1.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          8 2023-02-28 12:55 /bigdata/hadoop/2.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/3.txt
# 将 /bigdata/hadoop 目录下所有的文件 复制到 /bigdata 目录下
[hadoopdeploy@hadoop01 ~]$ hadoop fs -cp /bigdata/hadoop/* /bigdata
# 查看 /bigdata/ 目录下的文件
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata
Found 4 items
-rw-r--r--   2 hadoopdeploy supergroup          4 2023-02-28 13:17 /bigdata/1.txt
-rw-r--r--   2 hadoopdeploy supergroup          8 2023-02-28 13:17 /bigdata/2.txt
-rw-r--r--   2 hadoopdeploy supergroup          4 2023-02-28 13:17 /bigdata/3.txt
drwxr-xr-x   - hadoopdeploy supergroup          0 2023-02-28 12:55 /bigdata/hadoop
[hadoopdeploy@hadoop01 ~]$

5.11 mv moving files

Syntax: Usage: hadoop fs -mv URI [URI ...] <dest>
Move files from source to destination. This command also allows multiple sources, in which case the target needs to be a directory. Moving files across filesystems is not allowed.

# 列出 /bigdata/hadoop 目录下的文件
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/hadoop
Found 3 items
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/1.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          8 2023-02-28 12:55 /bigdata/hadoop/2.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/3.txt
# 将 1.txt 重命名为 1-new-name.txt
[hadoopdeploy@hadoop01 ~]$ hadoop fs -mv /bigdata/hadoop/1.txt /bigdata/hadoop/1-new-name.txt
# 列出 /bigdata/hadoop 目录下的文件,可以看到1.txt已经改名了
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/hadoop
Found 3 items
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/1-new-name.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          8 2023-02-28 12:55 /bigdata/hadoop/2.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          4 2023-02-28 12:31 /bigdata/hadoop/3.txt
[hadoopdeploy@hadoop01 ~]$

5.12 setrep modify the number of copies of the specified file

Syntax: Usage: hadoop fs -setrep [-R] [-w] <numReplicas> <path>
Change the number of copies of a file. If path is a directory, the command recursively changes the number of copies of all files under the directory tree rooted at path. EC files will be ignored when executing this command.
-RThe -R flag is for backward compatibility. It has no effect.
-wThe -w flag asks the command to wait for the copy to complete. This can take a long time.

# 修改1-new-name.txt文件为3个副本
[hadoopdeploy@hadoop01 ~]$ hadoop fs -setrep -w 3 /bigdata/hadoop/1-new-name.txt
Replication 3 set: /bigdata/hadoop/1-new-name.txt
Waiting for /bigdata/hadoop/1-new-name.txt .... done
[hadoopdeploy@hadoop01 ~]$

5.13 df shows free space

grammar: Usage: hadoop fs -df [-h] URI [URI ...]

[hadoopdeploy@hadoop01 ~]$ hadoop fs -df /bigdata/hadoop
Filesystem                   Size     Used    Available  Use%
hdfs://hadoop01:8020  27697086464  1228800  17716019200    0%
# -h 显示人类可读的
[hadoopdeploy@hadoop01 ~]$ hadoop fs -df -h /bigdata/hadoop
Filesystem              Size   Used  Available  Use%
hdfs://hadoop01:8020  25.8 G  1.2 M     16.5 G    0%

5.14 du statistics folder or file size

grammar: Usage: hadoop fs -df [-h] URI [URI ...]

[hadoopdeploy@hadoop01 ~]$ hadoop fs -du /bigdata/hadoop
4  12  /bigdata/hadoop/1-new-name.txt
8  16  /bigdata/hadoop/2.txt
4  8   /bigdata/hadoop/3.txt
[hadoopdeploy@hadoop01 ~]$ hadoop fs -du -s /bigdata/hadoop
16  36  /bigdata/hadoop
[hadoopdeploy@hadoop01 ~]$ hadoop fs -du -s -h /bigdata/hadoop
16  36  /bigdata/hadoop
# 16 表示/bigdata/hadoop目录下所有文件的总大小
# 36 表示/bigdata/hadoop目录下所有文件占据所有副本的总大小
[hadoopdeploy@hadoop01 ~]$ hadoop fs -du -s -h -v /bigdata/hadoop
SIZE  DISK_SPACE_CONSUMED_WITH_ALL_REPLICAS  FULL_PATH_NAME
16    36                                     /bigdata/hadoop
[hadoopdeploy@hadoop01 ~]$

5.15 chgrp chmod chown to change the permissions of the file

[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/hadoop/2.txt
-rw-rw-r--   2 hadoopdeploy hadoopdeploy          8 2023-02-28 12:55 /bigdata/hadoop/2.txt
# 给2.txt增加可执行的权限
[hadoopdeploy@hadoop01 ~]$ hadoop fs -chmod +x /bigdata/hadoop/2.txt
[hadoopdeploy@hadoop01 ~]$ hadoop fs -ls /bigdata/hadoop/2.txt
-rwxrwxr-x   2 hadoopdeploy hadoopdeploy          8 2023-02-28 12:55 /bigdata/hadoop/2.txt
[hadoopdeploy@hadoop01 ~]$

5.16 rm delete file or directory

Syntax: Usage: hadoop fs -rm [-f] [-r |-R] [-skipTrash] [-safely] URI [URI ...]
If enabled 回收站, the filesystem will move deleted files to the trash directory.
Currently, the Trash feature is disabled by default. The user can recycle bin by the value set for the parameter fs. trash.interval( core-site.xmlin) .大于零启用

-fIf the file does not exist, no diagnostic message will be displayed or the exit status modified to reflect the error.
-Roption recursively deletes the directory and anything under it.
-rThe option is equivalent to -R.
-skipTrashoption will bypass the Recycle Bin, if enabled, and delete the specified files immediately. This is useful when files need to be removed from large directories. A security confirmation is required before
-safelydeleting hadoop.shell.delete.limited.num.filesfiles whose total number of files is greater than (in , the default is 100)core-site.xml

# 删除2.txt,因为我本地启动了回收站,所以文件删除的文件进入了回收站
[hadoopdeploy@hadoop01 ~]$ hadoop fs -rm /bigdata/hadoop/2.txt
2023-02-28 22:04:51,302 INFO fs.TrashPolicyDefault: Moved: 'hdfs://hadoop01:8020/bigdata/hadoop/2.txt' to trash at: hdfs://hadoop01:8020/user/hadoopdeploy/.Trash/Current/bigdata/hadoop/2.txt
[hadoopdeploy@hadoop01 ~]$

6. Interface operation

Some people may say, how to remember so many commands, if we can operate the hdfs interface, we can operate on the interface.

interface operation

7. Reference link

1、https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#appendToFile

Guess you like

Origin blog.csdn.net/fu_huo_1993/article/details/129250176