[Hadoop] 02-HDFS basic operation

HDFS is, first of all, a file system: it stores files and locates them through a unified namespace, the directory tree. Secondly, it is distributed: many servers work together to provide its functions, and each server in the cluster has its own role. The important features are as follows:
(1) Files in HDFS are physically stored in blocks. The block size can be set with the configuration parameter dfs.blocksize. The default is 128 MB in Hadoop 2.x and was 64 MB in older versions.
(2) HDFS presents a unified, abstract directory tree to clients, and a client accesses a file by its path, for example: hdfs://namenode:port/dir-a/dir-b/dir-c/file.data
(3) The directory structure and the file block information (metadata) are managed by the NameNode. The NameNode is the master node of the HDFS cluster. It maintains the directory tree of the entire HDFS file system and, for each path (file), the corresponding block information (the block IDs and the DataNode servers where the blocks are stored).
(4) Storage of the individual file blocks is handled by the DataNodes. A DataNode is a slave node of the HDFS cluster. Each block can be stored as multiple replicas on multiple DataNodes (the number of replicas can be set with the parameter dfs.replication).
(5) HDFS is designed for write-once, read-many scenarios and does not support modifying files in place (appending to the end of an existing file is supported)

(Note: HDFS is well suited to data analysis but not to network-disk applications, because files are inconvenient to modify, latency is high, network overhead is large, and the cost is too high.)
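
To make points (1) and (4) concrete, here is a minimal Python sketch of the arithmetic only. It assumes the Hadoop 2.x block size of 128 MB and a replication factor of 3 (an assumed value for this sketch, not one stated above). The function names and the 300 MB file are hypothetical, and nothing here talks to a real cluster.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # dfs.blocksize: 128 MB, the Hadoop 2.x default
REPLICATION = 3                 # dfs.replication: assumed value for this sketch
MB = 1024 * 1024


def block_count(file_size, block_size=BLOCK_SIZE):
    """Number of blocks a file of file_size bytes is split into."""
    return (file_size + block_size - 1) // block_size  # integer ceiling


def last_block_size(file_size, block_size=BLOCK_SIZE):
    """Size in bytes of the final block, which may be smaller than block_size."""
    if file_size == 0:
        return 0
    remainder = file_size % block_size
    return remainder if remainder else block_size


def raw_storage(file_size, replication=REPLICATION):
    """Total bytes stored across all DataNodes, counting every replica."""
    return file_size * replication


size = 300 * MB  # a hypothetical 300 MB file
print(block_count(size))            # 3 (128 MB + 128 MB + 44 MB)
print(last_block_size(size) // MB)  # 44
print(raw_storage(size) // MB)      # 900
```

The last block only occupies as much space as the data it actually holds, which is why the raw storage is 900 MB rather than 3 x 3 x 128 MB.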

HDFS provides a shell command-line client. Its usage is as follows (see the official documentation):

Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] [-d] [-t <thread count>] <localsrc> ... <dst>]
[-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] [-e] <path> ...]
[-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] [-v] [-x] <path> ...]
[-expunge]
[-find <path> ... <expression> ...]
[-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
[-head <file>]
[-help [cmd ...]]
[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [-e] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-truncate [-w] <length> <path> ...]
[-usage [cmd ...]]
Generic options supported are:
-conf <configuration file>    specify an application configuration file
-D <property=value>    define a value for a given property
-fs <file:///|hdfs://namenode:port>    specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <file1,...>    specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...>    specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...>    specify a comma-separated list of archives to be unarchived on the compute machines
The general command line syntax is:

command [genericOptions] [commandOptions]
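
The syntax line above can be read as: generic options come first, then the command and its own options. As an illustration only, the Python sketch below assembles such an argument list without running it. `build_fs_command` is a hypothetical helper, and the property value and file paths are made-up examples.

```python
def build_fs_command(command, command_args, properties=None):
    """Assemble 'hadoop fs [genericOptions] command [commandOptions]' as an argv list."""
    argv = ["hadoop", "fs"]
    # Generic options such as -D property=value go before the command itself.
    for key, value in (properties or {}).items():
        argv += ["-D", f"{key}={value}"]
    argv.append(command)
    argv += list(command_args)
    return argv


argv = build_fs_command(
    "-put",
    ["./hello.txt", "/hello.txt"],
    properties={"dfs.replication": 2},  # hypothetical per-command override
)
print(" ".join(argv))
# hadoop fs -D dfs.replication=2 -put ./hello.txt /hello.txt
```

On a machine where the hadoop client is installed, a list built this way could be passed to `subprocess.run(argv, check=True)`.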

Introduction to common command parameters

-help

Function: Print the usage manual for the commands and their parameters

-ls

Function: Display directory information

Example: hadoop fs -ls hdfs://hadoop-server01:9000/

Note: In all of these commands, HDFS paths can be abbreviated by leaving out the hdfs://host:port prefix

--> hadoop fs -ls / has the same effect as the previous command
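
The abbreviation works because a path without a scheme is resolved against the default file system (the 'fs.defaultFS' property mentioned under the generic options). The Python sketch below mimics that resolution for absolute paths only. `qualify` is a hypothetical helper, not part of any Hadoop API, and the default file system value is borrowed from the example above.

```python
from urllib.parse import urlparse

DEFAULT_FS = "hdfs://hadoop-server01:9000"  # assumed value of fs.defaultFS


def qualify(path, default_fs=DEFAULT_FS):
    """Return the fully qualified form of an absolute HDFS path."""
    if urlparse(path).scheme:
        return path  # already has a scheme such as hdfs:// or file://
    if not path.startswith("/"):
        raise ValueError("this sketch only handles absolute paths")
    return default_fs.rstrip("/") + path


print(qualify("/"))           # hdfs://hadoop-server01:9000/
print(qualify("/hello.txt"))  # hdfs://hadoop-server01:9000/hello.txt
print(qualify("hdfs://hadoop-server01:9000/hello.txt"))  # returned unchanged
```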

-mkdir

Function: Create a directory on HDFS (-p also creates any missing parent directories)

Example: hadoop fs -mkdir -p /aaa/bbb/cc/dd

-moveFromLocal

Function: Move (cut and paste) a file from the local file system to HDFS

Example: hadoop fs -moveFromLocal /home/hadoop/a.txt /aaa/bbb/cc/dd

-moveToLocal

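Function: Move (cut and paste) a file from HDFS to the local file system (note: in many Hadoop releases this command only prints a "not implemented yet" message)

Example: hadoop fs -moveToLocal /aaa/bbb/cc/dd /home/hadoop/a.txt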

-appendToFile

Function: Append a local file to the end of an existing file on HDFS

Example: hadoop fs -appendToFile ./hello.txt hdfs://hadoop-server01:9000/hello.txt

Can be abbreviated as:

hadoop fs -appendToFile ./hello.txt /hello.txt

-cat

Function: Display file content

Example: hadoop fs -cat /hello.txt

-tail

Function: Display the end of a file (the last kilobyte)

Example: hadoop fs -tail /weblog/access_log.1

-text

Function: Print the content of a file in text form

Example: hadoop fs -text /weblog/access_log.1

-chgrp

-chmod

-chown

Function: Used the same way as in the Linux file system, to change the group, permissions, and owner of a file

Example:

hadoop fs -chmod 666 /hello.txt

hadoop fs -chown someuser:somegrp /hello.txt
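
Since the modes follow the Linux conventions, an octal mode such as 666 can be decoded digit by digit into owner, group, and other permissions. The short Python sketch below shows that decoding. `describe_mode` is a hypothetical helper for illustration and ignores special bits such as the sticky bit.

```python
def describe_mode(octal_mode):
    """Turn a three-digit octal mode string like '666' into 'rw-rw-rw-'."""
    if len(octal_mode) != 3 or any(d not in "01234567" for d in octal_mode):
        raise ValueError("expected exactly three octal digits")
    result = ""
    for digit in octal_mode:  # order: owner, group, other
        bits = int(digit)
        result += "r" if bits & 4 else "-"
        result += "w" if bits & 2 else "-"
        result += "x" if bits & 1 else "-"
    return result


print(describe_mode("666"))  # rw-rw-rw-
print(describe_mode("755"))  # rwxr-xr-x
print(describe_mode("640"))  # rw-r-----
```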

-copyFromLocal

Function: Copy a file from the local file system to an HDFS path

Example: hadoop fs -copyFromLocal ./jdk.tar.gz /aaa/

-copyToLocal

Function: Copy from HDFS to the local file system

Example: hadoop fs -copyToLocal /aaa/jdk.tar.gz

-cp

Function: Copy from one HDFS path to another HDFS path

Example: hadoop fs -cp /aaa/jdk.tar.gz /bbb/jdk.tar.gz.2

-mv

Function: Move a file between HDFS directories

Example: hadoop fs -mv /aaa/jdk.tar.gz /

-get

Function: Equivalent to copyToLocal, that is, download a file from HDFS to the local file system

Example: hadoop fs -get /aaa/jdk.tar.gz

-getmerge

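Function: Merge multiple files and download them as a single local file

Example: Suppose the HDFS directory /aaa/ contains several files: log.1, log.2, log.3, ...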

hadoop fs -getmerge /aaa/log.* ./log.sum
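
To show what "merge and download" means, the Python sketch below imitates the effect of the command on ordinary local files: every file matching a pattern is concatenated into one destination file. It is only an illustration and does not call HDFS. `merge_files` is a hypothetical helper, the temporary directories and file contents are made up, and the sorted-by-name order is an assumption of this sketch.

```python
import glob
import os
import tempfile


def merge_files(pattern, dest, add_newline=False):
    """Concatenate all files matching pattern into dest, in sorted name order."""
    with open(dest, "wb") as out:
        for name in sorted(glob.glob(pattern)):
            with open(name, "rb") as src:
                out.write(src.read())
            if add_newline:
                out.write(b"\n")  # same idea as the -nl option of -getmerge


# Create three small stand-in files named log.1, log.2, log.3.
src_dir = tempfile.mkdtemp()
for i in (1, 2, 3):
    with open(os.path.join(src_dir, f"log.{i}"), "w") as f:
        f.write(f"line from log.{i}\n")

# Write the merged result to a separate directory so it cannot match the pattern.
merged = os.path.join(tempfile.mkdtemp(), "log.sum")
merge_files(os.path.join(src_dir, "log.*"), merged)

with open(merged) as f:
    print(f.read(), end="")
```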

-put

Function: Equivalent to copyFromLocal

Example: hadoop fs -put /aaa/jdk.tar.gz /bbb/jdk.tar.gz.2

-rm

Function: Delete files or directories

Example: hadoop fs -rm -r /aaa/bbb/

-rmdir

Function: Delete an empty directory

Example: hadoop fs -rmdir /aaa/bbb/ccc

-df

Function: Report the available space of the file system

Example: hadoop fs -df -h /

-du

Function: Report the size of directories

Example:

hadoop fs -du -s -h /aaa/*

-count

Function: Count the number of file nodes (directories and files) under a specified directory

Example: hadoop fs -count /aaa/

-setrep

Function: Set the number of replicas of a file in HDFS

Example: hadoop fs -setrep 3 /aaa/jdk.tar.gz
