Java API operations on HDFS in Hadoop

1. Preparation

1.1 Unzip

Unzip the Hadoop installation package to a path that contains no Chinese characters (for example: D:\users\hadoop-2.6.0-cdh5.14.2).

1.2 Environment variables

Configure the HADOOP_HOME environment variable on Windows (the procedure is the same as configuring the JDK environment variable on Windows).
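As a sketch, assuming the unzip location given in step 1.1, the two entries to add in the Windows system environment-variable dialog would be:

HADOOP_HOME = D:\users\hadoop-2.6.0-cdh5.14.2
Path        = %Path%;%HADOOP_HOME%\bin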

1.3 Create a project

Use a development tool (IDE) to create a Maven project; a command-line sketch is shown below as an alternative.
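As a hedged alternative to the IDE wizard, the same project skeleton can be generated from the command line with Maven; the groupId and artifactId below simply reuse the package name and project directory name that appear later in this article:

mvn archetype:generate -DgroupId=cn.big.data -DartifactId=HdfsClientDemo -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false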

1.4 Dependencies

Import the required dependencies into the pom.xml, as follows:

<dependencies>
	<dependency>
		<groupId>junit</groupId>
		<artifactId>junit</artifactId>
		<version>RELEASE</version>
	</dependency>
	<dependency>
		<groupId>org.apache.logging.log4j</groupId>
		<artifactId>log4j-core</artifactId>
		<version>2.8.2</version>
	</dependency>
	<dependency>
		<groupId>org.apache.hadoop</groupId>
		<artifactId>hadoop-common</artifactId>
		<version>2.6.0-cdh5.14.2</version>
	</dependency>
	<dependency>
		<groupId>org.apache.hadoop</groupId>
		<artifactId>hadoop-client</artifactId>
		<version>2.6.0-cdh5.14.2</version>
	</dependency>
	<dependency>
		<groupId>org.apache.hadoop</groupId>
		<artifactId>hadoop-hdfs</artifactId>
		<version>2.6.0-cdh5.14.2</version>
	</dependency>
</dependencies>

Note: the central Maven repository does not host the CDH artifacts. Cloudera maintains its own repository, which needs to be added to the pom separately:

<repositories>
	 <repository>
		 <id>cloudera</id>
		 <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
	 </repository>
</repositories>

1.5 Test

Create a package cn.big.data and an HdfsClient class in it, then use a JUnit test method to create a directory:

package cn.big.data;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

public class HdfsClient {

    @Test
    public void testMkdirs() throws IOException, InterruptedException, URISyntaxException {
        // 1 Get the file system
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI("hdfs://192.168.247.130:9000"), conf, "root");

        // 2 Create the directory
        fs.mkdirs(new Path("/myApi"));

        // 3 Release resources
        fs.close();
    }
}

1.6 Notes

If IDEA cannot print the log, only the following information is displayed on the console:

log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

You need to create a new file named "log4j.properties" in the project's src/main/resources directory, with the following contents:

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n

2. How to use

2.1 HDFS file upload

@Test
public void upLoad() throws URISyntaxException, IOException, InterruptedException {
    Configuration configuration = new Configuration();
    // Set the number of replicas to 1 (the default is 3)
    configuration.set("dfs.replication", "1");
    FileSystem fs = FileSystem.get(new URI("hdfs://192.168.247.130:9000"), configuration, "root");
    // Upload the file
    fs.copyFromLocalFile(new Path("D:\\study\\codes\\hadoop\\HdfsClientDemo\\data\\hdfsDemo\\test.txt"), new Path("/myApi/"));
    // Release resources
    fs.close();

    System.out.println("ok");
}
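The dfs.replication value set in code above applies only to files written by this client and has the highest priority. As a hedged alternative sketch, assuming the client picks up an hdfs-site.xml placed on the classpath (src/main/resources), the same setting can be put there instead; a value set in code still overrides the classpath file, which in turn overrides the defaults bundled in the Hadoop jars:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>
</configuration>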

2.2 HDFS file download

@Test
public void downLoad() throws URISyntaxException, IOException, InterruptedException {
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://192.168.247.130:9000"), configuration, "root");

    // Download the file
    // boolean delSrc: whether to delete the source file
    // Path src: the HDFS path of the file to download
    // Path dst: the local path to download the file to
    // boolean useRawLocalFileSystem: whether to use the raw local file system (true skips writing the local .crc checksum file)
    fs.copyToLocalFile(false, new Path("/myApi/test.txt"), new Path("D:\\study\\codes\\hadoop\\HdfsClientDemo\\HdfsTest"), true);
    fs.close();
}

2.3 HDFS folder deletion

@Test
public void dRemove() throws URISyntaxException, IOException, InterruptedException {
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://192.168.247.130:9000"), configuration, "root");

    // Delete the folder; the second argument enables recursive deletion
    fs.delete(new Path("/myApi/remove"), true);
    fs.close();
}

2.4 HDFS file rename

@Test
public void fRename() throws URISyntaxException, IOException, InterruptedException {
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://192.168.247.130:9000"), configuration, "root");

    // Rename the file
    fs.rename(new Path("/myApi/test.txt"), new Path("/myApi/testRename.txt"));
    fs.close();
}

2.5 Viewing HDFS file details

@Test
public void testListFiles() throws IOException, URISyntaxException, InterruptedException {
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://192.168.247.130:9000"), configuration, "root");

    // Get file details recursively, starting from the root directory
    RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/"), true);
    while (listFiles.hasNext()) {
        LocatedFileStatus status = listFiles.next();
        // Print the details
        // File name
        System.out.println(status.getPath().getName());
        // Length
        System.out.println(status.getLen());
        // Permissions
        System.out.println(status.getPermission());
        // Group
        System.out.println(status.getGroup());
        // Get the block location information
        BlockLocation[] blockLocations = status.getBlockLocations();
        for (BlockLocation blockLocation : blockLocations) {
            // Get the host nodes that store each block
            String[] hosts = blockLocation.getHosts();
            for (String host : hosts) {
                System.out.println(host);
            }
        }
        System.out.println("-------------------------------");
    }
    fs.close();
}

2.6 Distinguishing files and folders in HDFS

@Test
public void testListStatus() throws URISyntaxException, IOException, InterruptedException {
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://192.168.247.130:9000"), configuration, "root");

    // Determine whether each entry is a file or a folder
    FileStatus[] listStatus = fs.listStatus(new Path("/"));
    for (FileStatus fileStatus : listStatus) {
        if (fileStatus.isFile()) {
            System.out.println("f:" + fileStatus.getPath().getName());
        } else {
            System.out.println("d:" + fileStatus.getPath().getName());
        }
    }
    fs.close();
}

2.7 HDFS I/O stream operations

2.7.1 File upload

@Test
public void putFileToHDFS() throws URISyntaxException, IOException, InterruptedException {
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://192.168.247.130:9000"), configuration, "root");

    // Create the input stream for the local file
    FileInputStream fis = new FileInputStream(new File("D:\\study\\codes\\hadoop\\HdfsClientDemo\\HdfsTest\\test.txt"));
    // Get the output stream to the HDFS file
    FSDataOutputStream fos = fs.create(new Path("/myApi/testIO.txt"));
    // Copy the stream
    IOUtils.copyBytes(fis, fos, configuration);
    // Release resources
    IOUtils.closeStream(fis);
    IOUtils.closeStream(fos);
    fs.close();
}

2.7.2 File download

@Test
public void getFileFromHDFS() throws URISyntaxException, IOException, InterruptedException {
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://192.168.247.130:9000"), configuration, "root");

    // Get the input stream from the HDFS file
    FSDataInputStream fis = fs.open(new Path("/myApi/testIO.txt"));
    // Create the output stream for the local file
    FileOutputStream fos = new FileOutputStream(new File("D:\\study\\codes\\hadoop\\HdfsClientDemo\\HdfsTest\\IODownload.txt"));
    // Copy the stream
    IOUtils.copyBytes(fis, fos, configuration);
    // Release resources
    IOUtils.closeStream(fis);
    IOUtils.closeStream(fos);
    fs.close();
}

2.8 Reading a file from a specified position

It is worth emphasizing that you can start reading an HDFS file from any position, which helps in understanding MapReduce input splits (InputSplit) and Spark partitions.
First upload the Hadoop installation package to the HDFS file system; a command-line sketch of the upload follows.
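A hedged sketch of that upload using the standard HDFS shell (the /myApi directory is the one created in section 1.5, and the archive name matches the installation package used throughout this article):

hdfs dfs -put hadoop-2.6.0-cdh5.14.2.tar.gz /myApi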
Download the first piece

@Test
public void readFileSeek1() throws URISyntaxException, IOException, InterruptedException {
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://192.168.247.130:9000"), configuration, "root");

    FSDataInputStream fis = fs.open(new Path("/myApi/hadoop-2.6.0-cdh5.14.2.tar.gz"));
    FileOutputStream fos = new FileOutputStream(new File("C:\\Users\\Dongue\\Desktop\\seek\\hadoop-2.6.0-cdh5.14.2.tar.gz.part1"));
    // Copy only the first 128 MB (the size of the first block)
    byte[] buf = new byte[1024];
    long remaining = 1024L * 1024 * 128;
    while (remaining > 0) {
        int len = fis.read(buf, 0, (int) Math.min(buf.length, remaining));
        if (len == -1) {
            break;
        }
        fos.write(buf, 0, len);
        remaining -= len;
    }
    IOUtils.closeStream(fis);
    IOUtils.closeStream(fos);
    fs.close();
}

The download succeeds.
Download the second piece

@Test
public void readFileSeek2() throws URISyntaxException, IOException, InterruptedException {
    Configuration configuration = new Configuration();
    FileSystem fs = FileSystem.get(new URI("hdfs://192.168.247.130:9000"), configuration, "root");

    FSDataInputStream fis = fs.open(new Path("/myApi/hadoop-2.6.0-cdh5.14.2.tar.gz"));
    // Seek to the start of the second block (128 MB)
    fis.seek(1024 * 1024 * 128);
    FileOutputStream fos = new FileOutputStream(new File("C:\\Users\\Dongue\\Desktop\\seek\\hadoop-2.6.0-cdh5.14.2.tar.gz.part2"));
    // Copy the rest of the stream
    IOUtils.copyBytes(fis, fos, configuration);

    IOUtils.closeStream(fis);
    IOUtils.closeStream(fos);
    fs.close();
}

Merge the files
Run the following in a Windows command window:

type hadoop-2.6.0-cdh5.14.2.tar.gz.part2 >> hadoop-2.6.0-cdh5.14.2.tar.gz.part1

After the merge, part1 contains the complete Hadoop installation package.
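As an optional, hedged check (certutil ships with Windows; the file names are the ones used above), the hash of the merged file can be compared with that of the original installation package:

certutil -hashfile hadoop-2.6.0-cdh5.14.2.tar.gz.part1 MD5
certutil -hashfile hadoop-2.6.0-cdh5.14.2.tar.gz MD5

If the two digests match, the split and merge lost no data.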

Origin: blog.csdn.net/weixin_48482704/article/details/111089319