Hadoop (11): HDFS Introduction and Common Commands

HDFS background

As data volumes grow, a single operating system can no longer store all of the data, so the data must be spread across disks managed by many operating systems. That is inconvenient to maintain and manage, so a system is needed to manage files across multiple machines: this is the distributed file system.

HDFS concept

HDFS (Hadoop Distributed File System) is a file system for storing files; files are located through a directory tree. It is distributed: many servers work together to provide its functionality, and each server in the cluster plays its own role.
HDFS is designed for write-once, read-many scenarios and does not support modifying files, which makes it well suited to data analysis.

HDFS advantages and disadvantages

Advantages

1) High fault tolerance
(1) Multiple copies of the data are saved automatically; fault tolerance is improved by adding replicas (see the sketch after this list);
(2) if a replica is lost, it can be recovered automatically.
2) Suitable for big data processing
(1) Data scale: can handle data at the GB, TB, or even PB level;
(2) file scale: can handle more than a million files, a very large number.
3) Streaming data access, which helps guarantee data consistency.
4) Can be built on inexpensive machines; reliability is improved through the multi-replica mechanism.
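
Replication is set per file and can be changed after the fact. A minimal sketch of inspecting and adjusting it, assuming an existing HDFS file (the path used here is hypothetical and is only created later in this walkthrough):

hadoop fs -ls /shaozhiqi/temp/test.txt           # for a file, the second column of the output is its replication factor
hadoop fs -setrep -w 2 /shaozhiqi/temp/test.txt  # change the factor; -w waits until the new replication is reached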

Disadvantages

1) Not suitable for low-latency data access; millisecond-level reads, for example, are impossible.
2) Cannot store large numbers of small files efficiently.
(1) Storing many small files consumes a great deal of NameNode memory for file, directory, and block metadata. This is undesirable because NameNode memory is always limited;
(2) the seek time for small files exceeds the read time, which violates HDFS's design goals.
3) Does not support concurrent writes or random file modification.
(1) A file can have only one writer at a time; multiple threads may not write to it concurrently;
(2) only appending data is supported; random modification of files is not.

Architecture

[Figure: HDFS architecture]

The architecture consists of three parts: the NameNode, the DataNodes, and the Secondary NameNode.

NameNode:

Equivalent to the Master; it is the supervisor and manager.
(1) Manages the HDFS namespace;
(2) manages block mapping information;
(3) configures the replica policy;
(4) handles client read and write requests.

DataNode:

The Slave: the NameNode issues commands, and the DataNodes perform the actual operations.
(1) Stores the actual data blocks (the sketch below shows how to inspect their locations);
(2) performs read/write operations on those blocks.
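
To see which DataNodes hold the blocks of a given file, hdfs fsck can report each block and the nodes storing its replicas. A sketch, assuming a running cluster (the path is hypothetical):

hdfs fsck /shaozhiqi/temp/test.txt -files -blocks -locations   # prints every block of the file and the DataNode addresses holding each replica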

Secondary NameNode:

Not a hot standby for the NameNode. When the NameNode goes down, the Secondary NameNode cannot immediately take its place and serve requests.
(1) Assists the NameNode and shares its workload;
(2) periodically merges the Fsimage and Edits files and pushes the result to the NameNode (see the sketch below);
(3) in an emergency, can help recover the NameNode.
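
The merge schedule is driven by standard HDFS settings; a sketch for inspecting them on a running cluster (by default the period is 3600 seconds and the transaction trigger is 1000000):

hdfs getconf -confKey dfs.namenode.checkpoint.period   # seconds between checkpoints
hdfs getconf -confKey dfs.namenode.checkpoint.txns     # uncheckpointed transactions that force a checkpoint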

HDFS file block size

HDFS stores files physically in blocks; the block size is set by the configuration parameter dfs.blocksize and defaults to 128 MB in Hadoop 2.x (64 MB in older versions).
Question: why can the block size be set neither too small nor too large?
HDFS blocks are larger than disk blocks in order to minimize seek overhead. If the block is large enough, the time to transfer the data from disk is significantly greater than the time needed to seek to the start of the block, so the time to transfer a file made up of multiple blocks is governed by the disk transfer rate. If the seek time is about 10 ms and the transfer rate is 100 MB/s, then for the seek time to be only 1% of the transfer time, the block size should be about 100 MB: 10 ms / 1% = 1 s of transfer, and 1 s × 100 MB/s = 100 MB. The default block size is in fact 128 MB.
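
On a running cluster, both the configured default and the block size a particular file actually uses can be checked from the shell. A small sketch, assuming the cluster from this walkthrough (the file path is hypothetical until the upload steps below):

hdfs getconf -confKey dfs.blocksize              # cluster default in bytes (134217728 = 128 MB)
hadoop fs -stat %o /shaozhiqi/temp/test.txt      # block size used by one file (%o in the -stat format string)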

Hadoop command operations

[shaozhiqi@hadoop102 ~]$ cd /opt/module/
[shaozhiqi@hadoop102 module]$ cd hadoop-3.1.2/
[shaozhiqi@hadoop102 hadoop-3.1.2]$ ls
bin  include  lib      LICENSE.txt  output      sbin   wcinput
etc  input    libexec  NOTICE.txt   README.txt  share  wcoutput

View help

[shaozhiqi@hadoop102 hadoop-3.1.2]$ hadoop --help
 Client Commands:
checknative   check native Hadoop and compression libraries availability
classpath     prints the class path needed to get the Hadoop jar and the required libraries
conftest      validate configuration XML files
credential    interact with credential providers
dtutil        operations related to delegation tokens
envvars       display computed Hadoop environment variables
fs            run a generic filesystem user client
jar <jar>     run a jar file. NOTE: please use "yarn jar" to launch YARN applications, not this command.
jnipath       prints the java.library.path
kdiag         Diagnose Kerberos Problems
kerbname      show auth_to_local principal conversion
key           manage keys via the KeyProvider
trace         view and modify Hadoop tracing settings
version       print the version

See which commands fs supports

[shaozhiqi@hadoop102 hadoop-3.1.2]$ hadoop fs
Usage: hadoop fs [generic options]
 [-appendToFile <localsrc> ... <dst>]
 [-cat [-ignoreCrc] <src> ...]
 [-checksum <src> ...]
 [-chgrp [-R] GROUP PATH...]
 [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
 [-chown [-R] [OWNER][:[GROUP]] PATH...]
 [-copyFromLocal [-f] [-p] [-l] [-d] [-t <thread count>] <localsrc> ... <dst>]
 [-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
 [-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] [-e] <path> ...]
 [-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
 [-createSnapshot <snapshotDir> [<snapshotName>]]
 [-deleteSnapshot <snapshotDir> <snapshotName>]
 [-df [-h] [<path> ...]]
 [-du [-s] [-h] [-v] [-x] <path> ...]
 [-expunge]
 [-find <path> ... <expression> ...]
 [-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
 [-getfacl [-R] <path>]
 [-getfattr [-R] {-n name | -d} [-e en] <path>]
 [-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
 [-head <file>]
 [-help [cmd ...]]
 [-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [-e] [<path> ...]]
 [-mkdir [-p] <path> ...]
 [-moveFromLocal <localsrc> ... <dst>]
 [-moveToLocal <src> <localdst>]
 [-mv <src> ... <dst>]
 [-put [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
 [-renameSnapshot <snapshotDir> <oldName> <newName>]
 [-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
 [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
 [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
 [-setfattr {-n name [-v value] | -x name} <path>]
 [-setrep [-R] [-w] <rep> <path> ...]
 [-stat [format] <path> ...]
 [-tail [-f] <file>]
 [-test -[defsz] <path>]
 [-text [-ignoreCrc] <src> ...]
 [-touch [-a] [-m] [-t TIMESTAMP ] [-c] <path> ...]
 [-touchz <path> ...]
 [-truncate [-w] <length> <path> ...]
 [-usage [cmd ...]]

Generic options supported are:
-conf <configuration file>          specify an application configuration file
-D <property=value>                 define a value for a given property
-fs <file:///|hdfs://namenode:port> specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <file1,...>                  specify a comma-separated list of files to be copied to the map reduce cluster
-libjars <jar1,...>                 specify a comma-separated list of jar files to be included in the classpath
-archives <archive1,...>            specify a comma-separated list of archives to be unarchived on the compute machines

The general command line syntax is:
command [genericOptions] [commandOptions]

Check our HDFS directory listing with -ls; it fails because the Hadoop cluster has not been started

[shaozhiqi@hadoop102 hadoop-3.1.2]$ hadoop fs -ls
ls: Call From hadoop102/192.168.1.102 to hadoop102:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
[shaozhiqi@hadoop102 hadoop-3.1.2]$

View the directory with -ls again

The cluster has just been started; no files have been uploaded and no directories created yet:

[shaozhiqi@hadoop102 hadoop-3.1.2]$ hadoop fs -ls
ls: `.': No such file or directory
[shaozhiqi@hadoop102 hadoop-3.1.2]$ hadoop fs -ls /
[shaozhiqi@hadoop102 hadoop-3.1.2]$
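
With no path argument, hadoop fs -ls lists the current user's HDFS home directory, /user/<username>. On a fresh cluster that directory does not exist, which is why the bare -ls fails with `.': No such file or directory while -ls / succeeds. A sketch of creating it, using the username from this walkthrough:

hadoop fs -mkdir -p /user/shaozhiqi   # create the home directory
hadoop fs -ls                         # now resolves to /user/shaozhiqi and returns an empty listing instead of an error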

Create a directory -mkdir

-p means create recursively, so multi-level paths can be created in one go

[shaozhiqi@hadoop102 hadoop-3.1.2]$ hadoop fs -mkdir -p /shaozhiqi/temp
[shaozhiqi@hadoop102 hadoop-3.1.2]$ hadoop fs -ls /
Found 1 items
drwxr-xr-x   - shaozhiqi supergroup          0 2019-06-29 19:40 /shaozhiqi
[shaozhiqi@hadoop102 hadoop-3.1.2]$
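
Note that -ls / shows only the top level; to confirm the whole tree that -p created, a recursive listing helps (a sketch):

hadoop fs -ls -R /shaozhiqi    # lists /shaozhiqi and /shaozhiqi/temp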

Cut and paste a local file into HDFS -moveFromLocal

[shaozhiqi@hadoop102 hadoop-3.1.2]$ vim test.txt
[shaozhiqi@hadoop102 hadoop-3.1.2]$
[shaozhiqi@hadoop102 hadoop-3.1.2]$ hadoop fs -moveFromLocal test.txt /shaozhiqi/temp
[shaozhiqi@hadoop102 hadoop-3.1.2]$

Check whether the upload succeeded

[shaozhiqi@hadoop102 hadoop-3.1.2]$ hadoop fs -ls -r /shaozhiqi/temp
Found 1 items
-rw-r--r--   3 shaozhiqi supergroup         18 2019-06-29 19:48 /shaozhiqi/temp/test.txt
[shaozhiqi@hadoop102 hadoop-3.1.2]$
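
Unlike -put or -copyFromLocal, -moveFromLocal deletes the local source once the copy completes, which is what makes it a cut-and-paste. A quick local check (sketch):

ls test.txt    # expected: ls: cannot access 'test.txt': No such file or directory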

Append content to the end of test.txt -appendToFile

[shaozhiqi@hadoop102 hadoop-3.1.2]$ hadoop fs -appendToFile test2.txt /shaozhiqi/temp/test.txt
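
The post never shows test2.txt being created; judging from the -cat output below, a minimal sketch of the assumed setup would be:

echo "ddd" > test2.txt                                      # hypothetical content matching the last line of the -cat output
hadoop fs -appendToFile test2.txt /shaozhiqi/temp/test.txt  # append the local file to the HDFS file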

View the file contents -cat

[shaozhiqi@hadoop102 hadoop-3.1.2]$ hadoop fs -cat /shaozhiqi/temp/test.txt
tetete
sdfd

ddd
[shaozhiqi@hadoop102 hadoop-3.1.2]$

Other commonly used commands (more will be added later); a few of them are demonstrated in the sketch after this list:

-tail: display the end of a file
-chgrp, -chmod, -chown: same usage as in Linux; change a file's group, permissions, and owner
-copyFromLocal: copy a file from the local filesystem to HDFS
-copyToLocal: copy from HDFS to the local filesystem
-cp: copy from one HDFS path to another HDFS path
-mv: move files within HDFS
-get: equivalent to -copyToLocal
-getmerge: merge multiple files and download them as one
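
A sketch exercising a few of these against the /shaozhiqi/temp directory created above (file names other than test.txt are hypothetical):

hadoop fs -cp /shaozhiqi/temp/test.txt /shaozhiqi/test_copy.txt   # HDFS-to-HDFS copy
hadoop fs -mv /shaozhiqi/test_copy.txt /shaozhiqi/test_moved.txt  # move within HDFS
hadoop fs -get /shaozhiqi/test_moved.txt ./test_local.txt         # download; equivalent to -copyToLocal
hadoop fs -getmerge /shaozhiqi/temp ./merged.txt                  # concatenate the directory's files into one local file
hadoop fs -tail /shaozhiqi/temp/test.txt                          # print the last kilobyte of the file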

Origin: www.cnblogs.com/luotuoke/p/11534875.html