Hadoop HDFS learning (1)

One, introduction to HDFS and its advantages and disadvantages

  HDFS (Hadoop Distributed File System) is an important part of the Hadoop ecosystem: it is Hadoop's storage component. Its position in Hadoop is foundational, because it is responsible for data storage, and the MapReduce computation model depends on data stored in HDFS. HDFS is a distributed file system designed to store very large files with a streaming data access pattern, splitting data into blocks and storing them across different commodity machines in a cluster.

  Several concepts here deserve attention:

(1) Very large files. A Hadoop cluster today can store hundreds of terabytes or even petabytes of data.

(2) Streaming data access. HDFS follows a write-once, read-many pattern and is optimized for the total time needed to read an entire dataset rather than the latency of any single read.

(3) Commodity hardware. An HDFS cluster does not require expensive or specialized equipment; ordinary, everyday machines are enough. Precisely because of this, the chance of a node failing is fairly high, so there must be mechanisms to handle such failures and keep the data reliable.

(4) Not suited to low-latency data access. HDFS is designed for high data throughput, not for applications that need low-latency access to data.

(5) Single writer, no arbitrary modification. HDFS data is read-mostly: only a single writer is supported, writes are always appended at the end of a file, and modification at arbitrary positions is not supported (see the sketch after this list).
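The following is a minimal sketch (not part of the original post) of what this access pattern looks like through the Hadoop FileSystem Java API, which the client-operations section below sets up. The NameNode address hdfs://hadoop102:9000 and the user "root" are taken from that section, while the path /demo/notes.txt is made up for illustration.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsAccessPatternDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Connect to the NameNode (address and user are assumptions for this sketch)
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), conf, "root");

        Path file = new Path("/demo/notes.txt");   // hypothetical path

        // Write once: create the file and stream data into it
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeBytes("first line\n");
        }

        // Later writes can only append to the end; there is no API for
        // modifying an existing file at an arbitrary offset
        try (FSDataOutputStream out = fs.append(file)) {
            out.writeBytes("appended line\n");
        }

        // Read many times, streaming from beginning to end
        try (FSDataInputStream in = fs.open(file)) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }

        fs.close();
    }
}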


 Two, HDFS operations in Shell

  The basic syntax is bin/hadoop fs <specific command> or bin/hdfs dfs <specific command>.

  The following are a few commonly used commands

  

(0) Start the Hadoop cluster (to make the subsequent tests easier)

  

$ sbin/start-dfs.sh
$ sbin/start-yarn.sh

 

(1) -help: print usage information for a command

 

$ hadoop fs -help rm


(2) -ls: display directory information

 

$ hadoop fs -ls /


(3) -mkdir: create a directory on HDFS

 

$ hadoop fs -mkdir -p /sanguo/shuguo


(4) -moveFromLocal: cut and paste from the local file system to HDFS

$ touch kongming.txt
$ hadoop fs -moveFromLocal ./kongming.txt /sanguo/shuguo

(5) -appendToFile: append a file to the end of an existing file

$ touch liubei.txt
$ vi liubei.txt

Enter:

san gu mao lu

Then run:

$ hadoop fs -appendToFile liubei.txt /sanguo/shuguo/kongming.txt

 

(6) -cat: display file contents

$ hadoop fs -cat /sanguo/shuguo/kongming.txt

(7) -chgrp, -chmod, -chown: same usage as in the Linux file system; modify the group, permissions, or owner of a file

$ hadoop fs  -chmod  666  /sanguo/shuguo/kongming.txt

$ hadoop fs  -chown  atguigu:atguigu   /sanguo/shuguo/kongming.txt

(8) -copyFromLocal: copy a file from the local file system to an HDFS path

$ hadoop fs -copyFromLocal README.txt /

(9) -copyToLocal: copy from HDFS to the local file system

 

$ hadoop fs -copyToLocal /sanguo/shuguo/kongming.txt ./

 

(10) -cp: copy from one HDFS path to another HDFS path

 

$ hadoop fs -cp /sanguo/shuguo/kongming.txt /zhuge.txt

 

(11) -mv: move files within HDFS

$ hadoop fs -mv /zhuge.txt /sanguo/shuguo/

(12) -get: equivalent to copyToLocal, i.e. download a file from HDFS to the local file system

 

$ hadoop fs -get /sanguo/shuguo/kongming.txt ./

 

(13) -getmerge: merge and download multiple files; for example, the HDFS directory /user/atguigu/test contains multiple files: log.1, log.2, log.3, ...

 

$ hadoop fs -getmerge /user/atguigu/test/* ./zaiyiqi.txt


(14) -put: equivalent to copyFromLocal

$ hadoop fs -put ./zaiyiqi.txt /user/atguigu/test/

 

(15) -tail: display the end of a file

$ hadoop fs -tail /sanguo/shuguo/kongming.txt

 

(16) -rm: delete a file or folder

$ hadoop fs -rm /user/atguigu/test/jinlian2.txt

 

(17) -rmdir: delete an empty directory

$ hadoop fs -mkdir /test

$ hadoop fs -rmdir /test

 

(18) -du: display size statistics for a folder or file

$ hadoop fs -du -s -h /user/atguigu/test

2.7 K  /user/atguigu/test

$ hadoop fs -du  -h /user/atguigu/test

1.3 K  /user/atguigu/test/README.txt

15     /user/atguigu/test/jinlian.txt

1.4 K  /user/atguigu/test/zaiyiqi.txt

(19) -setrep: set the replication factor of a file in HDFS

$ hadoop fs -setrep 10 /sanguo/shuguo/kongming.txt


The replication factor set here is only recorded in the NameNode's metadata; whether that many replicas actually exist depends on the number of DataNodes. Since there are currently only 3 machines, there can be at most 3 replicas; only when the number of nodes grows to 10 will the replica count actually reach 10.
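As a rough illustration (not in the original post), the same thing can also be done through the Java client API introduced in the next section: setReplication only updates the NameNode metadata, while the block locations show which DataNodes actually hold replicas. The file path, NameNode address, and user are the ones used elsewhere in this post.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SetReplicationDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), conf, "root");

        Path file = new Path("/sanguo/shuguo/kongming.txt");

        // Ask the NameNode to record a replication factor of 10 (metadata only)
        fs.setReplication(file, (short) 10);

        // The requested factor comes straight from the NameNode metadata ...
        FileStatus status = fs.getFileStatus(file);
        System.out.println("requested replication: " + status.getReplication());

        // ... while the hosts that actually hold each block depend on how many
        // DataNodes exist (at most 3 replicas on a 3-node cluster)
        for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("block hosts: " + String.join(", ", loc.getHosts()));
        }

        fs.close();
    }
}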

Three, HDFS client operations

Before doing this, you first need to set up the Hadoop environment variables on your local machine.


Use IDEA to create an empty Maven project and add the following dependencies (coordinates):

<dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>RELEASE</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>2.8.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.2</version>
        </dependency>
        
    </dependencies>

Importing these dependencies takes a while; be patient.

In the project's src/main/resources directory, create a new file named "log4j.properties" and fill it with the following:

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n

Create a directory ↓

// Imports needed by the test class:
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

@Test
    public void testMkdirs() throws IOException, InterruptedException, URISyntaxException {

        // 1 Get the file system
        Configuration configuration = new Configuration();
        // To run against the cluster via configuration instead:
        // configuration.set("fs.defaultFS", "hdfs://hadoop102:9000");
        // FileSystem fs = FileSystem.get(configuration);

        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "root");

        // 2 Create the directory
        fs.mkdirs(new Path("/1108/daxian/banzhang"));

        // 3 Close the resource
        fs.close();
    }

Upload a file from the local machine ↓

@Test
    public void testCopyFromLocalFile() throws IOException, InterruptedException, URISyntaxException {

        // 1 Get the file system
        Configuration configuration = new Configuration();
        configuration.set("dfs.replication", "2");
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "root");

        // 2 Upload the file
        fs.copyFromLocalFile(new Path("D:/banzhang.txt"), new Path("/banzhang.txt"));

        // 3 Close the resource
        fs.close();

        System.out.println("over");
    }
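
Not part of the original post, but following the same pattern, downloading a file (the client-side counterpart of the -get / -copyToLocal shell commands above) would look roughly like this; the local target path D:/banzhang2.txt is hypothetical.

@Test
    public void testCopyToLocalFile() throws IOException, InterruptedException, URISyntaxException {

        // 1 Get the file system
        Configuration configuration = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "root");

        // 2 Download the file: the first flag says not to delete the source; the last flag
        // uses the raw local file system so no .crc checksum file is written locally
        fs.copyToLocalFile(false, new Path("/banzhang.txt"), new Path("D:/banzhang2.txt"), true);

        // 3 Close the resource
        fs.close();
    }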

 

Origin www.cnblogs.com/g-cl/p/12348681.html