1. Introduction to HDFS and its basic concepts
HDFS (Hadoop Distributed File System) is the storage component of the Hadoop ecosystem and its most basic layer: MapReduce and the other computation models all depend on data stored in HDFS. HDFS is a distributed file system that stores very large files with a streaming data access pattern, splitting the data into blocks that are stored on different machines in a cluster of commodity hardware.
2. Experimental environment
Operating system: Ubuntu, 64-bit
Hadoop version: Hadoop 2.7.1
JDK version: jdk-8u241-linux-x64
3. Practice content
Note: HDFS shell commands can start in three ways:
- hadoop fs # works with any supported file system, such as the local file system and HDFS
- hadoop dfs # works only with HDFS; deprecated in Hadoop 2.x in favor of hdfs dfs
- hdfs dfs # same function as hadoop dfs; works only with HDFS
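The practical difference shows up in the path argument: hadoop fs picks the file system from the URI scheme of each path. The sketch below wraps that in a hypothetical helper, list_path. The NameNode address hdfs://localhost:9000 mentioned in the comment is an assumption (a common pseudo-distributed setting), and when hadoop is not on PATH the function only prints the command it would have run.

```shell
# list_path is a hypothetical helper, not part of Hadoop.
# $1: a path or URI, for example file:///tmp (local file system)
#     or hdfs://localhost:9000/ (assumed NameNode address).
list_path() {
    if command -v hadoop >/dev/null 2>&1; then
        # hadoop fs selects the file system from the URI scheme
        hadoop fs -ls "$1"
    else
        echo "hadoop not found on PATH; would run: hadoop fs -ls $1"
    fi
}
```

Calling list_path file:///tmp lists a local directory through the same entry point that is otherwise used for HDFS, which hadoop dfs and hdfs dfs do not support.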
1. Before using the HDFS command line, start Hadoop
command:
cd /usr/local/hadoop
./sbin/start-dfs.sh # start Hadoop
practice:
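After start-dfs.sh returns, one way to confirm that HDFS is up is to look for the NameNode and DataNode processes in the output of jps, the JDK's Java process lister. A minimal sketch, assuming a pseudo-distributed setup where both daemons run on this machine; hdfs_daemons_up is a hypothetical helper name:

```shell
# Reads jps output ("<pid> <ClassName>" per line) on stdin.
# Succeeds only if both a NameNode and a DataNode are listed.
# The exact match on field 2 means SecondaryNameNode does not count as NameNode.
hdfs_daemons_up() {
    awk '$2 == "NameNode" { nn = 1 }
         $2 == "DataNode" { dn = 1 }
         END { exit !(nn && dn) }'
}

# Usage on a machine with the JDK installed:
#   jps | hdfs_daemons_up && echo "HDFS is running"
```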
2. Hadoop supports many shell commands, of which fs is the one most commonly used with HDFS. With fs you can view the directory structure of HDFS, upload and download data, and create files.
command:
hadoop fs # list all commands that fs supports
practice:
3. View the help for a specific command
command:
hadoop fs -help put # show what the put command does
practice:
4. Use Shell commands to interact with HDFS
(1) Directory operation
① When using HDFS for the first time, you need to create a user directory in HDFS first.
command:
cd /usr/local/hadoop
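# Create the directory /user/hadoop in HDFS. -mkdir creates a directory; -p creates any missing parent directories as well. /user/hadoop is a multi-level path, so -p is needed; without it the command fails when the parent /user does not exist yet.
hdfs dfs -mkdir -p /user/hadoop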
practice:
② List the contents of the HDFS user directory of the current user, hadoop
command:
hdfs dfs -ls . # -ls lists the contents of an HDFS directory; "." is the current user's directory in HDFS, that is, /user/hadoop
practice:
Equivalent command:
hdfs dfs -ls /user/hadoop
practice:
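③ List all directories on HDFS
command:
hdfs dfs -ls -R / # -R lists recursively from the root; with no path, -ls shows only the current user's directory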
practice:
④ Create an input directory
command:
hdfs dfs -mkdir input # input is a relative path, so the directory's full path in HDFS is /user/hadoop/input
practice:
hdfs dfs -mkdir /input # create a directory named input under the HDFS root
practice:
⑤ Use the rm command to delete a directory
command:
# Delete the /input directory just created in HDFS (not /user/hadoop/input). -r removes /input and everything beneath it; deleting a directory without -r fails.
hdfs dfs -rm -r /input
practice:
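The relative paths above follow one rule: a path that does not start with / is resolved against the current user's HDFS home directory. The sketch below mimics that rule in plain shell, assuming the default home prefix /user. hdfs_abs_path is a hypothetical helper, not part of Hadoop, and it does not normalize components such as ./ or ../.

```shell
# $1: path as it would be passed to hdfs dfs
# $2: user name (defaults to the current local user)
hdfs_abs_path() {
    target=$1
    user=${2:-$(id -un)}
    case "$target" in
        /*)    printf '%s\n' "$target" ;;                   # absolute: used as-is
        .|'')  printf '/user/%s\n' "$user" ;;               # "." or empty: the home directory
        *)     printf '/user/%s/%s\n' "$user" "$target" ;;  # relative: under the home directory
    esac
}

hdfs_abs_path input hadoop    # prints /user/hadoop/input
hdfs_abs_path /input hadoop   # prints /input
```

This is why hdfs dfs -mkdir input and hdfs dfs -mkdir /input create two different directories.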
(2) File operation
① Create a file
command:
cd /usr/local/hadoop
touch myLocalFile.txt
practice:
② Edit the file
command:
vim myLocalFile.txt
practice:
③ Upload a file from the local file system to the current user's directory in HDFS
command:
# Upload /usr/local/hadoop/myLocalFile.txt from the local file system to the input directory under the current user's directory in HDFS, that is, to /user/hadoop/input/.
hdfs dfs -put /usr/local/hadoop/myLocalFile.txt input
practice:
④ View the contents of a file in HDFS
command:
hdfs dfs -cat input/myLocalFile.txt
practice:
⑤ Download a file from HDFS to the local file system
command:
hdfs dfs -get input/myLocalFile.txt /home/hadoop/下载 # download myLocalFile.txt from HDFS into the local directory /home/hadoop/下载/ (the Downloads folder)
practice:
View the downloaded file myLocalFile.txt in the local file system
command:
cd ~
cd 下载
ls
cat myLocalFile.txt
practice:
⑥ Copy files from one directory in HDFS to another directory in HDFS
command:
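# Copy /user/hadoop/input/myLocalFile.txt in HDFS to another HDFS directory, /input (note: this input directory sits under the HDFS root). If /input was removed in the earlier rm step, recreate it first with hdfs dfs -mkdir /input; otherwise the copy is written as a file named /input.
hdfs dfs -cp input/myLocalFile.txt /input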
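To check that an upload arrived intact, the local file can be compared with what -cat returns from HDFS. A minimal sketch using the POSIX cksum utility; verify_upload is a hypothetical helper, and HDFS_CMD is a hypothetical override (defaulting to hdfs) that exists only so the function can be tried with a stub when no cluster is available.

```shell
# $1: local file path
# $2: HDFS path of the uploaded copy
# Succeeds when both have the same CRC and byte count.
verify_upload() {
    local_file=$1
    hdfs_path=$2
    # Reading from stdin keeps the file name out of the cksum output
    local_sum=$(cksum < "$local_file")
    remote_sum=$(${HDFS_CMD:-hdfs} dfs -cat "$hdfs_path" | cksum)
    [ "$local_sum" = "$remote_sum" ]
}

# Usage with a running cluster:
#   verify_upload /usr/local/hadoop/myLocalFile.txt input/myLocalFile.txt && echo "upload OK"
```

Streaming the whole file back is only reasonable for small files such as the one in this exercise.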
(3) Other commonly used shell commands
① Print the help for a command
command:
hdfs dfs -help <command name>
Examples:
hdfs dfs -help cat
② Upload a file
command:
hdfs dfs -put file /hdfsfile # file: local file path; /hdfsfile: HDFS file path
Examples:
hdfs dfs -put /usr/local/hadoop/myLocalFile.txt input
③ Move (cut) a local file into HDFS
command:
hdfs dfs -moveFromLocal file /hdfsfile # file: local file path; /hdfsfile: HDFS file path
Examples:
hdfs dfs -moveFromLocal /usr/local/hadoop/myLocalFile.txt input
④ Download a file to the local file system
command:
hdfs dfs -get /hdfsfile file # /hdfsfile: HDFS file path; file: local path
Examples:
hdfs dfs -get input/myLocalFile.txt /usr/local/hadoop
⑤ Merge and download
command:
hdfs dfs -getmerge /hdfsdir file
Examples:
hdfs dfs -getmerge input /usr/local/hadoop/merged.txt # merged.txt is an example name for the merged local file
⑥ Create a folder
command:
hdfs dfs -mkdir /dirname
Command to create a multi-level folder, including any missing parents:
hdfs dfs -mkdir -p /dirname/dirname
⑦ Move or rename a file or folder
command:
hdfs dfs -mv /hdfsfile /hdfsdir/hdfsfile
# -mv a.txt b.txt    rename
# -mv a.txt /        move
⑧ Copy
command:
hdfs dfs -cp /hdfsfile /hdfsdir
⑨ Delete
command:
hdfs dfs -rm /hdfsfile
Delete folder command:
hdfs dfs -rm -r /hdfsdir
⑩ View
command:
hdfs dfs -cat /hdfsfile
hdfs dfs -tail -f /hdfsfile
Count the files in a folder
command:
hdfs dfs -count /hdfsdir
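The -count output is one line per path with four columns: directory count, file count, content size in bytes, and path name. A small sketch that labels them; describe_count is a hypothetical helper, the sample line is made up for illustration, and paths containing spaces are not handled.

```shell
# Reads hdfs dfs -count output on stdin and labels each column.
describe_count() {
    awk '{ printf "dirs=%s files=%s bytes=%s path=%s\n", $1, $2, $3, $4 }'
}

# Made-up sample line in the -count layout.
# With a running cluster, use: hdfs dfs -count /hdfsdir | describe_count
printf '%12d %12d %18d %s\n' 2 1 12 /user/hadoop | describe_count
# prints: dirs=2 files=1 bytes=12 path=/user/hadoop
```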
View the total space of HDFS
command:
hdfs dfs -df /
hdfs dfs -df -h / # -h prints sizes in a more readable form