Hadoop Self-Study Diary 3: HDFS Operations
Build environment
I use a Windows 7 laptop with VirtualBox to create CentOS virtual machines for the Hadoop installation:
VirtualBox:6.0.8 r130520 (Qt5.6.2)
CentOS:CentOS Linux release 7.6.1810 (Core)
jdk:1.8.0_202
hadoop:2.6.5
Cluster environment
- One master node, master, acting as the NameNode in HDFS and the ResourceManager in YARN.
- Three data nodes, data1, data2, and data3, acting as DataNodes in HDFS and NodeManagers in YARN.
name | ip | hdfs | yarn |
---|---|---|---|
master | 192.168.37.200 | NameNode | ResourceManager |
data1 | 192.168.37.201 | DataNode | NodeManager |
data2 | 192.168.37.202 | DataNode | NodeManager |
data3 | 192.168.37.203 | DataNode | NodeManager |
Basic commands
HDFS can be operated with two command syntaxes. One is `hadoop fs`, which works not only on HDFS but on other file systems as well; the other is `hdfs dfs`, which is specific to the HDFS distributed file system. Here I use the more broadly applicable `hadoop fs` in the examples.
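The two front ends accept the same sub-commands and flags, so either line below would list the HDFS root. This is a dry-run sketch that only prints the command lines (actually running them needs the cluster from the table above):

```shell
# Dry-run sketch: the two front ends take the same sub-commands.
# We only print the command lines here; running them needs a cluster.
cmd1="hadoop fs -ls /"
cmd2="hdfs dfs -ls /"
echo "$cmd1"
echo "$cmd2"
```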
1. View a directory
Use the `-ls` command to view:
[root@master ~]# hadoop fs -ls /
2. Create a directory
Use the `-mkdir` command to create a directory:
[root@master ~]# hadoop fs -mkdir /rawdata
View the files in the directory:
[root@master ~]# hadoop fs -ls /
Found 1 items
drwxr-xr-x - root supergroup 0 2019-07-25 10:43 /rawdata
Create a multi-level directory:
[root@master ~]# hadoop fs -mkdir -p /a/b/c
View all subdirectories:
[root@master ~]# hadoop fs -ls -R /
drwxr-xr-x - root supergroup 0 2019-07-25 11:02 /a
drwxr-xr-x - root supergroup 0 2019-07-25 11:02 /a/b
drwxr-xr-x - root supergroup 0 2019-07-25 11:02 /a/b/c
drwxr-xr-x - root supergroup 0 2019-07-25 10:43 /rawdata
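Multi-level directories like the ones above are easy to script. A hedged sketch that builds a date-partitioned path (the date value is just an example taken from the listings; it is a dry-run that prints the `mkdir` command instead of executing it):

```shell
# Sketch: build a date-partitioned directory path (e.g. /rawdata/2019/07/25)
# and print the matching mkdir command. Dry-run only - executing it needs a
# running cluster, so we echo the command instead of calling hadoop.
day="2019-07-25"                       # example date from the listings above
path="/rawdata/$(echo "$day" | tr '-' '/')"
cmd="hadoop fs -mkdir -p $path"
echo "$cmd"
```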
3. Upload local files
There are two syntaxes: `-put` and `-copyFromLocal`. In Hadoop 2.x they are nearly identical: `-copyFromLocal` is the same as `-put` except that its source must be a local file. By default, neither overwrites an existing destination file unless the `-f` flag is given, and `-put` can also read from standard input (pass `-` as the source). Take `-put` as an example:
- Upload local files to the specified directory of hdfs:
[root@master ~]# hadoop fs -put /software/hadoop-2.6.5.tar.gz /a/b/c/
[root@master ~]# hadoop fs -ls /a/b/c/
Found 1 items
-rw-r--r-- 3 root supergroup 199635269 2019-07-25 11:35 /a/b/c/hadoop-2.6.5.tar.gz
- Upload multiple files at once:
[root@master ~]# hadoop fs -put /home/a /home/b /home/c /home/d /a/b/c
[root@master ~]# hadoop fs -ls /a/b/c
Found 5 items
-rw-r--r-- 3 root supergroup 0 2019-07-25 11:38 /a/b/c/a
-rw-r--r-- 3 root supergroup 0 2019-07-25 11:38 /a/b/c/b
-rw-r--r-- 3 root supergroup 0 2019-07-25 11:38 /a/b/c/c
-rw-r--r-- 3 root supergroup 0 2019-07-25 11:38 /a/b/c/d
-rw-r--r-- 3 root supergroup 199635269 2019-07-25 11:35 /a/b/c/hadoop-2.6.5.tar.gz
- Standard input:
The output of an `ls` command can be piped directly into an HDFS file:
[root@master ~]# ls /software/hadoop-2.6.5/ |hadoop fs -put - /a/b/c/h.txt
Use `-cat` to view the file content:
[root@master ~]# hadoop fs -cat /a/b/c/h.txt
bin
etc
hadoop_data
hdfs
include
lib
libexec
LICENSE.txt
logs
NOTICE.txt
README.txt
sbin
share
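When the destination file already exists, the `-f` flag (available in Hadoop 2.x) forces an upload to replace it. A dry-run sketch, reusing the paths from the examples above (the commands are printed, not executed, since they need a live cluster):

```shell
# Dry-run sketch of forced overwrite: -f replaces an existing destination.
src="/software/hadoop-2.6.5.tar.gz"
dst="/a/b/c/"
put_cmd="hadoop fs -put -f $src $dst"
cfl_cmd="hadoop fs -copyFromLocal -f $src $dst"
echo "$put_cmd"     # overwrites /a/b/c/hadoop-2.6.5.tar.gz if present
echo "$cfl_cmd"     # same effect; the source must be a local path
```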
4. Download HDFS files
There are two syntaxes: `-get` and `-copyToLocal`. Take `-get` as an example.
Download a file from HDFS to the local file system:
[root@master ~]# hadoop fs -get /a/b/c/a ./
[root@master ~]# ll
total 4
-rw-r--r--. 1 root root 0 Jul 25 14:39 a
-rw-------. 1 root root 1204 Jul 17 14:49 anaconda-ks.cfg
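Downloads are easy to batch with a shell loop. A dry-run sketch that prints one `-get` command per small file from the earlier upload, instead of executing them (running them needs the cluster):

```shell
# Dry-run sketch: print a -get command for each of the small files
# uploaded earlier (a live cluster is needed to actually run them).
got=""
for f in a b c d; do
  cmd="hadoop fs -get /a/b/c/$f ./"
  echo "$cmd"
  got="$got $f"
done
```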
5. Copy and delete HDFS files
Use the `-cp` command to copy HDFS files and the `-rm` command to delete them.
The `-cp` command can copy the contents of an entire folder:
[root@master ~]# hadoop fs -cp /a/b/c /rawdata
[root@master ~]# hadoop fs -ls -R /rawdata
drwxr-xr-x - root supergroup 0 2019-07-25 14:45 /rawdata/c
-rw-r--r-- 3 root supergroup 0 2019-07-25 14:44 /rawdata/c/a
-rw-r--r-- 3 root supergroup 0 2019-07-25 14:44 /rawdata/c/b
-rw-r--r-- 3 root supergroup 0 2019-07-25 14:44 /rawdata/c/c
-rw-r--r-- 3 root supergroup 0 2019-07-25 14:44 /rawdata/c/d
-rw-r--r-- 3 root supergroup 95 2019-07-25 14:44 /rawdata/c/h.txt
-rw-r--r-- 3 root supergroup 199635269 2019-07-25 14:45 /rawdata/c/hadoop-2.6.5.tar.gz
The `-rm` command deletes HDFS files. To delete a folder, you must add `-R`:
[root@master ~]# hadoop fs -rm -R /rawdata/*
19/07/25 14:49:08 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /rawdata/c
[root@master ~]# hadoop fs -ls -R /rawdata
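The INFO line in the output above shows a deletion interval of 0 minutes, meaning the HDFS trash is disabled and deletes take effect immediately. On clusters where `fs.trash.interval` is non-zero, deleted files first move to a trash directory; the `-skipTrash` flag bypasses it. A dry-run sketch (printed, not executed):

```shell
# Dry-run: print a recursive delete that bypasses the HDFS trash.
target="/rawdata/c"
rm_cmd="hadoop fs -rm -R -skipTrash $target"
echo "$rm_cmd"
```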
Use a graphical interface to operate HDFS
Open the HDFS management URL in a web browser:
http://10.11.91.122:50070/
After entering the homepage, select "Browse the file system":
Enter a directory path in the browse box to see the files under that path:
Click a file name link to see the detailed status of the file; a download link is also provided:
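The same browsing can be scripted against the WebHDFS REST API, which is served on the NameNode's HTTP port (50070 in Hadoop 2.x) and is typically enabled by default. The sketch below only builds a listing URL, using the master address from the cluster table above as an assumed NameNode host; fetching it (for example with `curl`) requires the cluster to be reachable:

```shell
# Build a WebHDFS listing URL (op=LISTSTATUS = directory listing).
# 192.168.37.200:50070 is the master/NameNode address from the table above.
namenode="192.168.37.200:50070"
dir="/a/b/c"
url="http://$namenode/webhdfs/v1$dir?op=LISTSTATUS"
echo "$url"
# To actually query it:  curl "$url"
```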