Big Data (3) --- HDFS Client Commands and Java Connection

One, parameter settings

  As mentioned before, both the HDFS replication factor and the block size (how files are cut into blocks) are configurable. The defaults are 3 replicas and 128 MB blocks.

  The block size and replication factor used when a file is stored are determined by the client!

  "Determined by the client" means the client machine sets them through the configuration parameters below.

  The HDFS client reads the following two parameters to decide the block size and the number of replicas:

  Block size parameter: dfs.blocksize

  Replication parameter: dfs.replication

  For more parameters, see: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

  So we only need to configure hdfs-site.xml on the client machine:

<property>
    <name>dfs.blocksize</name>
    <value>64m</value>
</property>

<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>

 

  We upload the same file from two clients, modifying the above configuration on one of them, and then view the uploaded files' information.

  You can see that one file has 3 replicas and 128 MB blocks, while the other has 2 replicas and 64 MB blocks.
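  You can also check this from the command line with hdfs fsck (a standard HDFS tool; the path below is a placeholder for whichever file you uploaded). It reports the length, block count, and replication of a file:

hdfs fsck <hdfs file path> -files -blocks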


Two, client command-line operations

 

 

1. Upload a file to HDFS

hadoop fs -put <local file> /aaa

 

2. Download a file from HDFS to the client's local disk

hadoop fs -get <hdfs path> <local directory>

 

3. Create a directory in HDFS

hadoop fs -mkdir  -p /aaa/xxx

 

4. Move an HDFS file (rename)

hadoop fs -mv <hdfs path 1> <hdfs path 2>

 

Copy an HDFS file to another HDFS directory:

hadoop fs -cp <hdfs path 1> <hdfs path 2>

 

5. Delete an HDFS file or directory

hadoop fs -rm -r /aaa

 

6. View the contents of an HDFS text file

hadoop fs -cat /demo.txt

hadoop fs -tail -f /demo.txt

 

More commands: https://www.cnblogs.com/houkai/p/3848089.html

 

Three, Java connection

 

1. First, set up the local development environment. When the application runs, Hadoop calls native C functions to operate on the local file system, so the local Hadoop environment information needs to be configured.

Unzip the Hadoop package; only the directories containing the scripts need to be kept, and the rest can be deleted.

 

 Configure the Hadoop environment variables, and replace the files in the bin directory with the Windows script files.
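
 For example, a minimal sketch of the environment variables on Windows (the install path is an assumption; point it at wherever you unzipped Hadoop):

HADOOP_HOME=D:\hadoop-2.8.5      (assumed unzip location)
PATH=%PATH%;%HADOOP_HOME%\bin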


Where to get the Windows script files: you can compile them yourself, or find versions others have already compiled:

https://github.com/steveloughran/winutils

 

 

 These are pre-compiled Windows scripts; use them to replace your own bin directory.

After configuring, verify that the hadoop command is recognized:
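
For example, open a new command window and run:

hadoop version

If the environment variables are set correctly, this prints the Hadoop version information.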


2. Once that is ready, add the dependencies and write the code.

Dependencies: ideally match the version of Hadoop you have installed.

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
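
The ${hadoop.version} placeholder must be defined in the pom's <properties> section. A minimal sketch, assuming Hadoop 2.8.5 (replace it with whatever version you installed):

        <properties>
            <!-- assumption: keep this in sync with the installed Hadoop version -->
            <hadoop.version>2.8.5</hadoop.version>
        </properties>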

 

Now the code:

package com.nijunyang.hadoop.hdfs;


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.junit.Before;
import org.junit.Test;

import java.net.URI;
import java.util.Arrays;

/**
 * Description:
 * Created by nijunyang on 2019/12/25 20:26
 */
public class HDFSDemo {

    FileSystem fs;
    
    @Before
    public void init() throws Exception{

        URI uri = new URI("hdfs://nijunyang68:9000/");
        /**
         * The Configuration constructor loads core-default.xml, hdfs-default.xml,
         * core-site.xml, hdfs-site.xml, etc. from the classpath.
         * Values can also be set directly via the set() method:
         * https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
         */
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "2");
        // Block size for newly written files: 32 MB
        conf.set("dfs.blocksize", "32m");
        fs = FileSystem.get(uri, conf, "root");
    }

    @Test
    public void test1() throws Exception {
        // Upload a local file to HDFS
        fs.copyFromLocalFile(new Path("E:/安装包/linux/jdk-8u191-linux-x64.tar.gz"), new Path("/soft/"));
        // Download an HDFS file to the local disk
        fs.copyToLocalFile(new Path("/soft/jdk-8u191-linux-x64.tar.gz"), new Path("f:/"));
        // Move/rename a file within HDFS
        fs.rename(new Path("/redis-5.0.5.tar.gz"), new Path("/redis5.0.5.tar.gz"));
        // Create a directory in HDFS
        fs.mkdirs(new Path("/xx/yy/zz"));
        // Delete a file or directory in HDFS (true = recursive)
        fs.delete(new Path("/xx/yy/zz"), true);
        // List the files under an HDFS directory (true = recursive)
        RemoteIterator<LocatedFileStatus> iter = fs.listFiles(new Path("/"), true);
        while (iter.hasNext()) {
            LocatedFileStatus status = iter.next();
            System.out.println("Full path: " + status.getPath());
            System.out.println("Block size: " + status.getBlockSize());
            System.out.println("File length: " + status.getLen());
            System.out.println("Replication: " + status.getReplication());
            System.out.println("Block locations: " + Arrays.toString(status.getBlockLocations()));
            System.out.println("--------------------------------");
        }
        // List the files and directories under an HDFS directory (non-recursive)
        FileStatus[] listStatus = fs.listStatus(new Path("/"));
        for (FileStatus status : listStatus) {
            System.out.println("Full path: " + status.getPath());
            System.out.println(status.isDirectory() ? "This is a directory" : "This is a file");
            System.out.println("Block size: " + status.getBlockSize());
            System.out.println("File length: " + status.getLen());
            System.out.println("Replication: " + status.getReplication());
            System.out.println("--------------------------------");
        }
        fs.close();
    }
}

Simply put, the Java code is just another client, so all the configuration information can be put into the Configuration object.
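
Since it is just a client, you can also read an HDFS file directly as a stream. A minimal sketch of an extra test method for the HDFSDemo class above (it reuses the fs from init() and the /demo.txt file from the cat example):

    // Requires: import org.apache.hadoop.fs.FSDataInputStream;
    //           import org.apache.hadoop.io.IOUtils;
    @Test
    public void testReadStream() throws Exception {
        // Open the HDFS file as an input stream
        FSDataInputStream in = fs.open(new Path("/demo.txt"));
        // Copy to stdout with a 4 KB buffer; 'false' leaves System.out open
        IOUtils.copyBytes(in, System.out, 4096, false);
        in.close();
        fs.close();
    }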
