Big Data Technology: Hadoop Learning (3)

Table of contents

Java API operation of HDFS

1. Introduction

2. Case - using Java API to operate HDFS

(1) Build the project environment

(2) Initialize the client object and upload files

(3) Download files from HDFS to local

(4) Directory operation

(5) Check the file information in the directory


Java API operation of HDFS

1. Introduction

        Hadoop is written in Java, so the Hadoop file system can be operated through its Java API; the HDFS shell itself is essentially an application built on that API. To operate HDFS programmatically, the core task is to use the Java API provided by HDFS to construct a client object for accessing the file system, and then to manipulate files on HDFS through that client object.
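        As a minimal sketch of this pattern (it reuses the NameNode address hdfs://hadoop01.bgd01:9000 that appears in the examples later in this article; substitute your own address, and note that the class name HdfsClientSketch is only for illustration), the client object is obtained like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientSketch {
    public static void main(String[] args) throws Exception {
        // Build the configuration object and point it at the HDFS NameNode
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://hadoop01.bgd01:9000");
        // Access HDFS as the root user
        System.setProperty("HADOOP_USER_NAME", "root");
        // Obtain the file system client object
        FileSystem fs = FileSystem.get(conf);
        // All HDFS operations go through this client, e.g. checking that the root directory exists
        System.out.println(fs.exists(new Path("/")));
        // Release resources
        fs.close();
    }
}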

        Hadoop integrates many file systems, and HDFS is just one of them. The official Hadoop API documentation is linked below for readers who want to study it further.

https://hadoop.apache.org/docs/stable/api/index.html

2. Case - using Java API to operate HDFS

        This case mainly demonstrates how to operate the HDFS file system, including uploading files, downloading files, etc.

(1) Build the project environment

        Open IDEA and create a simple Maven project.

After the Maven project is created, its directory structure contains a pom.xml configuration file, which is the core file for project management. Configure it and add the related dependencies; the code is as follows.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>cn.itcast</groupId>
    <artifactId>HadoopDemo</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.10.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.10.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.10.1</version>
        </dependency>

        <dependency>
            <groupId>org.apache.zookeeper</groupId>
            <artifactId>zookeeper</artifactId>
            <version>3.7.1</version>
        </dependency>

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.13.2</version>
        </dependency>

    </dependencies>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

</project>

Note that the pom.xml above uses Hadoop 2.10.1 and ZooKeeper 3.7.1; change these version numbers to match your own installation. If the dependency lines are marked in red after you paste them in, wait a moment and IDEA will download the dependencies automatically.

You can also adjust the project's Maven settings in IDEA so that newly added dependencies are downloaded automatically.

(2) Initialize the client object and upload files

        Create the cn.itcast.hdfsdemo package under the project's src/test directory, and then create an HDFS_uploading Java class in this package. For convenience, the author also creates a textHadoop directory to hold the input and output files. The relevant code is shown below.

package cn.itcast.hdfsdemo;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;

public class HDFS_uploading {
    FileSystem fs = null;
    @Before
    public void init() throws Exception {
        // Build the configuration object: Configuration
        Configuration conf = new Configuration();
        // Set the parameter that specifies which file system to access: HDFS
        conf.set("fs.defaultFS","hdfs://hadoop01.bgd01:9000");
        // Set the client identity: access HDFS as the root user
        System.setProperty("HADOOP_USER_NAME","root");
        // Obtain the file system client object via the static method of FileSystem
        fs = FileSystem.get(conf);
    }
    // Upload a local file to HDFS
    @Test
    public void testAddFileToHdfs() throws IOException {
        // Local path of the file to upload
        Path src = new Path("/home/huanganchi/Hadoop/实训项目/HadoopDemo/textHadoop/HdfsDemo/input/text");
        // Target path on HDFS
        Path dst = new Path("/");
        // Upload the file
        fs.copyFromLocalFile(src,dst);
        // Release resources
        fs.close();
    }
}
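To confirm that the upload succeeded, a small check like the following could be added to the HDFS_uploading class above (a sketch only; the HDFS path /text is assumed from the local source path, since copyFromLocalFile keeps the source file name):

    // Hypothetical follow-up check: the uploaded file is expected at /text on HDFS
    @Test
    public void testUploadedFileExists() throws IOException {
        System.out.println(fs.exists(new Path("/text")));   // expected: true
        fs.close();
    }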

 

 

 (3) Download files from HDFS to local

        Under the cn.itcast.hdfsdemo package, create the HDFS_download Java class; the code is shown below.

package cn.itcast.hdfsdemo;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;

public class HDFS_download {
    FileSystem fs = null;
    @Before
    public void init() throws Exception {
        // Build the configuration object: Configuration
        Configuration conf = new Configuration();
        // Set the parameter that specifies which file system to access: HDFS
        conf.set("fs.defaultFS","hdfs://hadoop01.bgd01:9000");
        // Set the client identity: access HDFS as the root user
        System.setProperty("HADOOP_USER_NAME","root");
        // Obtain the file system client object via the static method of FileSystem
        fs = FileSystem.get(conf);
    }
    // Download a file from HDFS to the local file system
    @Test
    public void testDownLoadFileToLocal() throws IOException {
        // Download the file
        fs.copyToLocalFile(new Path("/helloword.txt"), new Path("/home/huanganchi/Hadoop/实训项目/HadoopDemo/textHadoop/HdfsDemo/output"));
        // Release resources
        fs.close();
    }
}

        Note that the author is working on a Linux system, so the local paths used for uploading and downloading differ from those on Windows, where a path takes the form drive:\folder\file (for example D:\HadoopDemo\text).
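For reference, a hypothetical Windows version of the upload paths might look like the following (the drive letter and folder names are placeholders; forward slashes are commonly used in Hadoop Path objects even on Windows):

    // Hypothetical Windows-style local paths (drive letter and folders are placeholders)
    Path src = new Path("D:/HadoopDemo/textHadoop/HdfsDemo/input/text");
    Path dst = new Path("D:/HadoopDemo/textHadoop/HdfsDemo/output");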

 

(4) Directory operation

        Under the cn.itcast.hdfsdemo package, create the HDFS_operate Java class; the code is shown below.

package cn.itcast.hdfsdemo;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;

public class HDFS_operate {
    FileSystem fs = null;
    @Before
    public void init() throws Exception {
        // Build the configuration object: Configuration
        Configuration conf = new Configuration();
        // Set the parameter that specifies which file system to access: HDFS
        conf.set("fs.defaultFS","hdfs://hadoop01.bgd01:9000");
        // Set the client identity: access HDFS as the root user
        System.setProperty("HADOOP_USER_NAME","root");
        // Obtain the file system client object via the static method of FileSystem
        fs = FileSystem.get(conf);
    }
    // Create, delete, and rename files and directories on HDFS
    @Test
    public void testMkdirAndDeleteAndRename() throws IOException {
        // Create directories
        fs.mkdirs(new Path("/a/b/c"));
        fs.mkdirs(new Path("/a2/b2/c2"));
        // Rename a file or directory
        fs.rename(new Path("/a"), new Path("/a3"));
        // Delete a directory; if it is non-empty, the second parameter must be true
        fs.delete(new Path("/a2"), true);
        // Release resources
        fs.close();
    }
}
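To confirm the effect of these operations programmatically, a small check like the following could be added to the HDFS_operate class above (a sketch only; it assumes the same paths as the test):

    // Hypothetical follow-up check: /a3/b/c should exist after the rename, /a2 should be gone after the delete
    @Test
    public void testCheckResult() throws IOException {
        System.out.println(fs.exists(new Path("/a3/b/c")));   // expected: true
        System.out.println(fs.exists(new Path("/a2")));       // expected: false
        // Release resources
        fs.close();
    }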

[Result screenshots: create directory, rename, delete]

(5) Check the file information in the directory

        Under the cn.itcast.hdfsdemo package, create the HDFS_check Java class; the code is shown below.

package cn.itcast.hdfsdemo;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;

public class HDFS_check {
    FileSystem fs = null;
    @Before
    public void init() throws Exception {
        // Build the configuration object: Configuration
        Configuration conf = new Configuration();
        // Set the parameter that specifies which file system to access: HDFS
        conf.set("fs.defaultFS","hdfs://hadoop01.bgd01:9000");
        // Set the client identity: access HDFS as the root user
        System.setProperty("HADOOP_USER_NAME","root");
        // Obtain the file system client object via the static method of FileSystem
        fs = FileSystem.get(conf);
    }
    // View directory information; only files are listed
    @Test
    public void testListFiles() throws IOException {
        // Get an iterator over the files
        //RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/"), true);
        RemoteIterator<LocatedFileStatus> liFiles = fs.listFiles(new Path("/helloword.txt"), true);

        // Traverse the iterator
        while (liFiles.hasNext()) {
            LocatedFileStatus fileStatus = liFiles.next();

            // Print the name of the current file
            System.out.println(fileStatus.getPath().getName());
            // Print the block size of the current file
            System.out.println(fileStatus.getBlockSize());
            // Print the permissions of the current file
            System.out.println(fileStatus.getPermission());
            // Print the length of the current file's content
            System.out.println(fileStatus.getLen());
            // Get the block information (block length, block offset, DataNode hosts)
            BlockLocation[] blockLocations = fileStatus.getBlockLocations();
            for (BlockLocation bl : blockLocations) {
                System.out.println("block-length:" + bl.getLength() + "--" + "block-offset:" + bl.getOffset());
                String[] hosts = bl.getHosts();
                for (String host : hosts) {
                    System.out.println(host);
                }
            }
            System.out.println("-------------separator--------------");
        }
        // Release resources
        fs.close();
    }
}
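Note that listFiles() only returns files. If you also want to see directories, FileSystem.listStatus() can be used instead; below is a sketch of a test method that could be added to the HDFS_check class above:

    // List both files and directories directly under the HDFS root directory
    @Test
    public void testListStatus() throws IOException {
        FileStatus[] statuses = fs.listStatus(new Path("/"));
        for (FileStatus status : statuses) {
            // Mark each entry as a directory (d--) or a file (f--)
            String prefix = status.isDirectory() ? "d--" : "f--";
            System.out.println(prefix + status.getPath().getName());
        }
        // Release resources
        fs.close();
    }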

 


Reference books:

"Hadoop Big Data Technology Principles and Applications" P62-P63

 

 

 


Reprinted from: blog.csdn.net/weixin_63507910/article/details/128524812