HDFS basic shell operations
- 1.1 Create a directory
- 1.2 Upload command
- 1.3 Create an empty file
- 1.4 Appending content to files in the distributed file system
- 1.5 View commands
- 1.6 Download command
- 1.7 Combined downloads
- 1.8 Move files in hdfs
- 1.9 Copy files in hdfs to another directory of hdfs
- 1.10 Delete command
- 1.11 View disk utilization and file size
- 1.12 Modify permissions
- 1.13 Modify the number of copies of a file
- 1.14 Check the status of the file
- 1.15 Testing
1.1 Create a directory
Call format:
hdfs dfs -mkdir [-p] /directory
For example:
hdfs dfs -mkdir /data
hdfs dfs -mkdir -p /data/a/b/c
1.2 Upload command
Call format:
hdfs dfs -put /local-file /hdfs-path
Note: A path that starts with / leaves out the filesystem prefix hdfs://ip:port, so it is resolved against the default filesystem.
For example:
hdfs dfs -put /root/a.txt /data/ # upload a.txt from /root to /data
hdfs dfs -put /root/logs/* /data/ # upload every file under logs to /data
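The directory creation from 1.1 and the upload above are often run together. The following is a minimal sketch of that combination, not a definitive script: the function name hdfs_upload is hypothetical, and it assumes the hdfs client is on PATH and already configured for the cluster.

```shell
# Hypothetical helper: make sure the HDFS target directory exists, then upload.
# Usage: hdfs_upload <hdfs_dir> <local_file>...
hdfs_upload() {
    if [ "$#" -lt 2 ]; then
        echo "usage: hdfs_upload <hdfs_dir> <local_file>..." >&2
        return 2
    fi
    dest=$1
    shift
    # -p creates missing parent directories and succeeds if dest already exists.
    hdfs dfs -mkdir -p "$dest" || return 1
    # The remaining arguments are local files; -put copies them into dest.
    hdfs dfs -put "$@" "$dest"
}
```

For instance, hdfs_upload /data /root/a.txt would create /data if needed and then upload a.txt into it.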
1.3 Create an empty file
Call format:
hdfs dfs -touchz /hdfs-path/filename
For example:
hdfs dfs -touchz /hadooptest.txt
1.4 Appending content to files in the distributed file system
Call format:
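hdfs dfs -appendToFile local-file hdfs-file
Note:
1) Inserting, deleting, or modifying content in the middle of a file is not supported.
2) Appending to an empty file is the same as writing the file directly, so the append succeeds.
For example, append the content of the local file hello1.txt to the end of the HDFS file hello.txt:
Original hello.txt: hello world
Original hello1.txt: hello
After the append, hello.txt contains the original line followed by the appended one:
hello world
hello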
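The append described above can be checked by printing the HDFS file right after the append. This is a minimal sketch under stated assumptions: hdfs_append_and_show is a hypothetical name, the hdfs client is assumed to be on PATH, and appends are assumed to be enabled on the cluster.

```shell
# Hypothetical helper: append a local file to an HDFS file, then print the result.
# Usage: hdfs_append_and_show <local_file> <hdfs_file>
hdfs_append_and_show() {
    local_file=$1
    remote_file=$2
    # New content always goes to the end of the HDFS file.
    hdfs dfs -appendToFile "$local_file" "$remote_file" || return 1
    # Print the whole file so the appended lines can be verified.
    hdfs dfs -cat "$remote_file"
}
```

Calling hdfs_append_and_show hello1.txt /hello.txt would run the example from this section.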
1.5 View commands
List the contents of a directory in the distributed file system
Call format: hdfs dfs -ls /
View the full content of a file in the distributed file system
Call format: hdfs dfs -cat /xxx.txt
View the end of a file in the distributed file system
Call format: hdfs dfs -tail /xxx.txt
Note: -tail prints the last 1 KB of the file, not a fixed number of lines.
1.6 Download command
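hdfs dfs -copyToLocal hdfs-file local-path
Note: the local destination does not have to exist beforehand.
hdfs dfs -moveToLocal hdfs-file local-path
Note: intended to cut data from an HDFS path to the local filesystem, but the command is not implemented and should not be used.
hdfs dfs -get hdfs-file local-path
Call format: same as copyToLocal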
1.7 Combined downloads
Call format: hdfs dfs -getmerge hdfs-path local-path
For example: hdfs dfs -getmerge /data/*.txt /root/c.txt (merges a.txt and b.txt on HDFS into c.txt, saved in the local /root directory)
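When a wildcard is used as in the example above, the local shell may expand it against local files before hdfs ever sees it. The sketch below quotes the pattern so that the HDFS client does the expansion. The function name hdfs_merge_txt is hypothetical, and the hdfs client is assumed to be on PATH.

```shell
# Hypothetical helper: merge every .txt file under an HDFS directory
# into one local file.
# Usage: hdfs_merge_txt <hdfs_dir> <local_file>
hdfs_merge_txt() {
    hdfs_dir=$1
    local_out=$2
    # The pattern is quoted so the local shell passes it through unchanged
    # and the HDFS client expands it against the cluster.
    # ${hdfs_dir%/} drops one trailing slash to avoid a double slash.
    hdfs dfs -getmerge "${hdfs_dir%/}/*.txt" "$local_out"
}
```

With this helper, hdfs_merge_txt /data /root/c.txt corresponds to the example in this section.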
1.8 Move files in hdfs
Call format: hdfs dfs -mv /hdfs-path-1 /hdfs-path-2
For example: hdfs dfs -mv /aaa /bbb moves aaa as a whole into bbb.
A second example: hdfs dfs -mv /hello1 /data/ moves hello1 from the root directory into /data/.
1.9 Copy files in hdfs to another directory of hdfs
Call format: hdfs dfs -cp source-path destination-path
1.10 Delete command
hdfs dfs -rm [-f] [-r|-R] [-skipTrash] <src> ...
Note: deleting a directory requires -r.
hdfs dfs -rmdir [--ignore-fail-on-non-empty] <dir> ...
Note: the directory must be empty; a non-empty directory has to be deleted with rm -r.
1.11 View disk utilization and file size
hdfs dfs -df [-h] [<path> ...] # show the disk usage of the distributed file system
hdfs dfs -du [-s] [-h] <path> ... # show the size of the files under the given path; -h prints sizes in human-readable form
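In scripts it is often handy to get the total size of a path as a plain number. The sketch below takes the first column of the -du -s summary line, which holds the size in bytes; the other columns differ between Hadoop versions, so they are ignored. The function name hdfs_dir_bytes is hypothetical, and the hdfs client is assumed to be on PATH.

```shell
# Hypothetical helper: print the total size in bytes of an HDFS path.
# Usage: hdfs_dir_bytes <hdfs_path>
hdfs_dir_bytes() {
    # -s prints one summary line for the path; without -h the size is in bytes.
    # Only the first column (the size) is kept.
    hdfs dfs -du -s "$1" | awk 'NR == 1 { print $1 }'
}
```

For example, size=$(hdfs_dir_bytes /data) stores the byte count of /data in a shell variable.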
1.12 Modify permissions
These commands behave like their local counterparts; -R applies the change recursively to subdirectories and files.
hdfs dfs -chgrp [-R] GROUP PATH...
hdfs dfs -chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...
hdfs dfs -chown [-R] [OWNER][:[GROUP]] PATH...
1.13 Modify the number of copies of a file
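Call format: hadoop fs -setrep 3 / (sets the content of the HDFS root directory and its subdirectories to 3 replicas)
Note: when the requested number of replicas differs from the default number set at initialization, the cluster reacts; if more replicas are requested than before, the additional copies are made automatically.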
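Because the cluster adjusts replicas in the background, a script may want to wait for the change and then confirm it. The following is a minimal sketch, assuming the hdfs client is on PATH; the function name hdfs_set_replication is hypothetical. It uses the -w option of -setrep, which waits for replication to complete and can take a long time on large files.

```shell
# Hypothetical helper: change the replication factor of a file and
# print the value HDFS reports afterwards.
# Usage: hdfs_set_replication <replicas> <hdfs_file>
hdfs_set_replication() {
    replicas=$1
    target=$2
    # -w waits until the requested replication has been reached;
    # the progress messages of -setrep are discarded.
    hdfs dfs -setrep -w "$replicas" "$target" >/dev/null || return 1
    # %r prints the replication factor recorded for the file.
    hdfs dfs -stat '%r' "$target"
}
```

For example, hdfs_set_replication 2 /data/a.txt would lower a.txt to two replicas and print the resulting factor.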
1.14 Check the status of the file
What the command is for: when a file is written to HDFS, its block size can be set through the dfs.blocksize configuration item, so different files on HDFS can have different block sizes. Knowing the block size of a file lets you estimate in advance how many compute tasks a job over it will need. stat shows this and other attributes of a file.
Call format: hdfs dfs -stat [format] file-path
Format specifiers:
%b: file size in bytes (0 for a directory)
%n: file name
%o: block size
%r: number of replicas
%y: modification time in UTC, as yyyy-MM-dd HH:mm:ss
%Y: modification time in milliseconds since January 1, 1970 UTC
%F: prints "directory" for a directory and "regular file" for a file
Note:
# When -stat is used without a format, only the modification time is printed, which is equivalent to %y
# When -stat is given a directory, %r, %o and so on print 0; only files have replicas and a size
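The task estimate mentioned in this section comes down to dividing the file size by the block size and rounding up. The sketch below does that with %b and %o. The function name hdfs_block_count is hypothetical, the hdfs client is assumed to be on PATH, and the result is only an estimate of the block count, not a guarantee of how many tasks a particular framework will start.

```shell
# Hypothetical helper: estimate how many blocks an HDFS file occupies.
# Usage: hdfs_block_count <hdfs_file>
hdfs_block_count() {
    # %b = file size in bytes, %o = block size in bytes.
    stat_out=$(hdfs dfs -stat '%b %o' "$1") || return 1
    # Split the two numbers into positional parameters.
    set -- $stat_out
    size=$1
    block=$2
    [ -n "$block" ] || return 1
    if [ "$block" -eq 0 ]; then
        # Directories report a block size of 0, so there is nothing to count.
        echo 0
        return 0
    fi
    # Ceiling division: a final partial block still counts as one block.
    echo $(( ($size + $block - 1) / $block ))
}
```

A 300000000-byte file with a 134217728-byte (128 MB) block size gives 3, because the third block is only partly filled.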
1.15 Testing
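Parameters:
-e: does the path exist? Returns 0 if it exists
-z: is the file empty (zero length)? Returns 0 if it is
-d: is the path a directory? Returns 0 if it is
Call format: hdfs dfs -test -d path
For example: hdfs dfs -test -d /data/hello.txt && echo "OK" || echo "no"
Explanation: tests whether the given path is a directory; prints OK if it is and no if it is not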
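The exit codes of -test can be combined to tell directories, files, and missing paths apart. This is a minimal sketch using only the -d and -e options listed above. The function name hdfs_path_kind is hypothetical, and the hdfs client is assumed to be on PATH.

```shell
# Hypothetical helper: classify an HDFS path as directory, file, or missing.
# Usage: hdfs_path_kind <hdfs_path>
hdfs_path_kind() {
    if hdfs dfs -test -d "$1"; then
        echo "directory"
    elif hdfs dfs -test -e "$1"; then
        # The path exists but is not a directory.
        echo "file"
    else
        echo "missing"
    fi
}
```

A script can then branch on the result, for example [ "$(hdfs_path_kind /data)" = "directory" ] before uploading into /data.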