[Hadoop] 02-HDFS basic operations

HDFS is a file system: it stores files and locates them through a unified namespace, the directory tree. It is also distributed: many servers cooperate to provide its functionality, and each server in the cluster plays its own role. The
important features are as follows:
(1) Files in HDFS are physically stored in blocks. The block size can be specified by the configuration parameter dfs.blocksize; the default is 128 MB in Hadoop 2.x and was 64 MB in older versions.
(2) HDFS presents a unified abstract directory tree to the client, and the client accesses files by path, for example: hdfs://namenode:port/dir-a/dir-b/dir-c/file.data
(3) The directory structure and file block information (the metadata) are managed by the namenode. The namenode is the master node of the HDFS cluster: it maintains the directory tree of the entire HDFS file system, as well as the block information for each path (file) — the block ids and the datanode servers on which each block resides.
(4) The storage of each file block is handled by the datanodes. A datanode is a slave node of the HDFS cluster, and each block can be stored as multiple replicas on different datanodes (the number of replicas is set by the parameter dfs.replication).
(5) HDFS is designed for write-once, read-many workloads and does not support in-place modification of files.

(Note: it is suited to data analysis, not to network-disk applications, because modification is inconvenient, latency is high, the network overhead is large, and the cost would be too high.)
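The block-splitting rule in feature (1) can be sketched in a few lines of Python. This is an illustration only; the 300 MB file size is invented for the example:

```python
# Illustrative sketch of how HDFS divides a file into blocks (feature (1)).
# 128 MB is the dfs.blocksize default in Hadoop 2.x; the file size is invented.

BLOCK_SIZE = 128 * 1024 * 1024

def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes occupies."""
    full, last = divmod(file_size, block_size)
    blocks = [block_size] * full
    if last:
        blocks.append(last)  # the final block holds only the remaining bytes
    return blocks

blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
print(len(blocks))                  # 3
print(blocks[-1] // (1024 * 1024))  # 44 (the last block is not padded to 128 MB)
```

Note that a block smaller than dfs.blocksize consumes only its actual size on disk; the block size is an upper bound per block, not an allocation unit.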

HDFS provides a shell command-line client; its usage is as follows (see the official documentation):

Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] [-d] [-t <thread count>] <localsrc> ... <dst>]
[-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] [-e] <path> ...]
[-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] [-v] [-x] <path> ...]
[-expunge]
[-find <path> ... <expression> ...]
[-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
[-head <file>]
[-help [cmd ...]]
[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [-e] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-truncate [-w] <length> <path> ...]
[-usage [cmd ...]]
Generic options supported are:
-conf <configuration file>        specify an application configuration file
-D <property=value>               define a value for a given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port>  specify a ResourceManager
-files <file1,...>                specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...>               specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...>          specify a comma-separated list of archives to be unarchived on the compute machines
The general command line syntax is:

command [genericOptions] [commandOptions]

Introduction to common command parameters

-help            

Function: output this command parameter manual

-ls                 

Function: Display directory information

Example: hadoop fs -ls hdfs://hadoop-server01:9000/

Note: in all of these commands, the full hdfs:// path can be abbreviated; the scheme and authority then default to the fs.defaultFS setting:

--> hadoop fs -ls /   is equivalent to the previous command

-mkdir             

Function: create a directory on hdfs

Example: hadoop fs  -mkdir -p /aaa/bbb/cc/dd

-moveFromLocal           

Function: cut and paste from local to hdfs

Example: hadoop fs -moveFromLocal /home/hadoop/a.txt /aaa/bbb/cc/dd

-moveToLocal             

Function: cut and paste from hdfs to local

Example: hadoop fs -moveToLocal /aaa/bbb/cc/dd /home/hadoop/a.txt

-appendToFile

Function: Append a file to the end of an existing file

Example: hadoop fs -appendToFile ./hello.txt hdfs://hadoop-server01:9000/hello.txt

Can be abbreviated as:

hadoop fs -appendToFile ./hello.txt /hello.txt

 

-cat 

Function: Display file content 

Example: hadoop fs -cat  /hello.txt

 

-tail                

Function: Display the end of a file

Example: hadoop fs -tail /weblog/access_log.1

-text                 

Function: print the content of a file in character form

Example: hadoop fs -text /weblog/access_log.1

-chgrp

-chmod

-chown

Function: same usage as on a Linux file system; modify a file's permissions and ownership

Example:

hadoop fs -chmod 666 /hello.txt

hadoop fs -chown someuser:somegrp /hello.txt

-copyFromLocal   

Function: copy a file from the local file system to an hdfs path

Example: hadoop fs -copyFromLocal ./jdk.tar.gz /aaa/

-copyToLocal     

Function: copy from hdfs to local

Example: hadoop fs -copyToLocal /aaa/jdk.tar.gz

-cp             

Function: copy from one hdfs path to another hdfs path

Example: hadoop fs -cp /aaa/jdk.tar.gz /bbb/jdk.tar.gz.2

 

-mv                    

Function: move files within hdfs

Example: hadoop fs -mv /aaa/jdk.tar.gz /

-get             

Function: equivalent to copyToLocal, i.e. download a file from hdfs to local

Example: hadoop fs -get /aaa/jdk.tar.gz

-getmerge            

Function: merge and download multiple files

Example: suppose the hdfs directory /aaa/ contains several files: log.1, log.2, log.3, ...

hadoop fs -getmerge /aaa/log.* ./log.sum
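As a rough local-filesystem illustration of what -getmerge does, the sketch below concatenates several files into one, in sorted name order (mirroring how the shell would expand /aaa/log.*). The file names and contents are invented:

```python
# Local-filesystem analogy of `hadoop fs -getmerge`: concatenate source files
# into a single destination file. The -nl flag (add a newline after each file)
# is mimicked by the add_newline argument. All paths here are invented.
from pathlib import Path
import tempfile

def getmerge(sources, dest, add_newline=False):
    """Concatenate the source files into dest, mimicking -getmerge [-nl]."""
    with open(dest, "wb") as out:
        for src in sources:
            out.write(Path(src).read_bytes())
            if add_newline:
                out.write(b"\n")

tmp = Path(tempfile.mkdtemp())
for i, text in enumerate(["one", "two", "three"], start=1):
    (tmp / f"log.{i}").write_text(text)

sources = sorted(tmp.glob("log.*"))   # sorted, as the shell glob would be
getmerge(sources, tmp / "log.sum")
print((tmp / "log.sum").read_text())  # onetwothree
```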

-put               

Function: equivalent to copyFromLocal

Example: hadoop fs -put /aaa/jdk.tar.gz /bbb/jdk.tar.gz.2

 

-rm               

Function: delete a file or directory

Example: hadoop fs -rm -r /aaa/bbb/

 

-rmdir                

Function: delete an empty directory

Example: hadoop fs -rmdir /aaa/bbb/ccc

-df              

Function: report the file system's available space

Example: hadoop fs -df -h /

 

-du

Function: report the size of directories

Example:

hadoop fs -du -s -h /aaa/*
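The -h flag used with -du and -df renders byte counts in human-readable binary units (1 K = 1024 bytes). A small sketch of that kind of formatting, noting that the exact output format of hadoop fs may differ slightly:

```python
# Sketch of the human-readable formatting behind the -h flag of -du / -df.
# Uses binary (1024-based) units; treat the exact format as an approximation
# of what hadoop fs prints, not a faithful reproduction.

def human_readable(n: int) -> str:
    size = float(n)
    for unit in ("B", "K", "M", "G", "T", "P"):
        if size < 1024 or unit == "P":
            return f"{size:.1f} {unit}"
        size /= 1024

print(human_readable(134217728))  # 128.0 M
print(human_readable(1536))       # 1.5 K
```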

 

-count        

Function: count the number of file nodes under a specified directory

Example: hadoop fs -count /aaa/
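-count prints, among other columns, DIR_COUNT, FILE_COUNT, and CONTENT_SIZE for the target path. The local-filesystem sketch below tallies the same three numbers over an invented temporary tree:

```python
# Local analogy of `hadoop fs -count`: walk a directory tree and tally
# directory count, file count, and total content size in bytes (the
# DIR_COUNT, FILE_COUNT, CONTENT_SIZE columns). The tree below is invented.
import os
import tempfile
from pathlib import Path

def count(path):
    dirs = files = size = 0
    for root, _, fnames in os.walk(path):
        dirs += 1  # the target directory itself is included, as in -count
        files += len(fnames)
        size += sum(os.path.getsize(os.path.join(root, f)) for f in fnames)
    return dirs, files, size

tree = Path(tempfile.mkdtemp())
(tree / "sub").mkdir()
(tree / "a.txt").write_text("hello")
(tree / "sub" / "b.txt").write_text("hi")
print(count(tree))  # (2, 2, 7)
```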

 

-setrep               

Function: set the number of replicas of a file in hdfs

Example: hadoop fs -setrep 3 /aaa/jdk.tar.gz
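Changing the replication factor directly changes the raw (physical) storage a file consumes across the cluster: raw bytes = logical size × replication. A quick illustration, using an invented 180 MB file size:

```python
# Raw storage cost of a replication factor: each of a file's blocks is stored
# `replication` times, so physical usage is logical size times replication.
# The 180 MB size below is invented for illustration.

def raw_storage(logical_bytes: int, replication: int) -> int:
    """Physical bytes the cluster spends storing one file."""
    return logical_bytes * replication

jdk = 180 * 1024 * 1024
print(raw_storage(jdk, 3) // (1024 * 1024))  # 540 -- with dfs.replication = 3
print(raw_storage(jdk, 2) // (1024 * 1024))  # 360 -- after lowering it to 2
```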

 
