Introduction to Cloud Computing and Big Data, Lab 2: Getting Familiar with Common HDFS (Hadoop) Operations

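Lab Objectives

Understand the role HDFS plays in the Hadoop architecture

Become proficient with the Shell commands commonly used to operate HDFS

Become familiar with the Java API commonly used to operate HDFS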

Lab Platform

Operating system: Linux (Ubuntu 16.04 recommended)

Hadoop version: 2.10.2

JDK version: 1.7 or later

Java IDE: IntelliJ IDEA

Lab Steps

Write programs that implement the following functions, and accomplish the same tasks with the Shell commands provided by Hadoop:

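Upload any text file to HDFS; if the target file already exists in HDFS, let the user choose whether to append to the end of the existing file or overwrite it
Download a specified file from HDFS; if a local file with the same name already exists, automatically rename the downloaded file
Print the contents of a specified HDFS file to the terminal
Display the read/write permissions, size, creation time, path, and other information of a specified HDFS file
Given a directory in HDFS, output the read/write permissions, size, creation time, path, and other information of every file under it; if an entry is itself a directory, recursively output the information of all files under it
Given the path of a file in HDFS, create and delete that file; if the directory containing the file does not exist, create it automatically
Given the path of a directory in HDFS, create and delete that directory; when creating it, automatically create any missing parent directories; when deleting it, let the user decide whether to delete it even if it is not empty
Append content to a specified HDFS file, with the user choosing whether the content goes at the beginning or at the end of the existing file
Delete a specified file in HDFS
Move a file in HDFS from a source path to a destination path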
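For the download task, the step that is easy to get wrong is choosing a local name that does not clash with an existing file. The sketch below is a minimal, JDK-only helper under stated assumptions: the class `LocalNameResolver`, the method `resolve`, and the `_0`, `_1`, ... suffix scheme are hypothetical choices, not Hadoop conventions. In a full solution, the returned file would become the destination passed to `FileSystem.copyToLocalFile(Path src, Path dst)`.

```java
import java.io.File;

public class LocalNameResolver {
    /**
     * Returns a file in dir that does not exist yet.
     * If dir/fileName is free it is returned unchanged; otherwise
     * base_0.ext, base_1.ext, ... are tried in order.
     * The "_N" suffix scheme is a hypothetical choice for this sketch.
     */
    public static File resolve(File dir, String fileName) {
        File candidate = new File(dir, fileName);
        if (!candidate.exists()) {
            return candidate; // no conflict: keep the original name
        }
        // Split "name.ext" into base and extension; names without a dot
        // (or hidden files such as ".bashrc") keep an empty extension
        int dot = fileName.lastIndexOf('.');
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        String ext = dot > 0 ? fileName.substring(dot) : "";
        // Try numbered variants until one is free
        for (int i = 0; ; i++) {
            candidate = new File(dir, base + "_" + i + ext);
            if (!candidate.exists()) {
                return candidate;
            }
        }
    }
}
```

For example, if `~/hdfs/text.txt` already exists, `resolve(new File(System.getProperty("user.home"), "hdfs"), "text.txt")` yields `text_0.txt`, or `text_1.txt` if that name is taken as well. Checking and then writing is not atomic, which is acceptable for a single-user lab.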

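Write a class "MyFSDataInputStream" that extends "org.apache.hadoop.fs.FSDataInputStream" and implements a method "readLine()" that reads a specified HDFS file line by line: it returns null when the end of the file is reached, and otherwise returns one line of text
Using the Java documentation or other references, write a program that uses "java.net.URL" and "org.apache.hadoop.fs.FsURLStreamHandlerFactory" to print the text of a specified HDFS file to the terminal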
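The line-splitting logic behind `readLine()` does not depend on HDFS. One caveat is worth knowing: `FSDataInputStream` extends `java.io.DataInputStream`, whose deprecated `readLine()` is declared `final`, so a subclass cannot define its own no-argument `readLine()`; the method needs a different name or signature. The sketch below is a minimal version under that assumption. It puts the logic in a static helper that takes any `InputStream`, for example the stream returned by `fs.open(path)`; `LineReading` is a hypothetical name. It collects raw bytes up to `'\n'` and only then decodes them as UTF-8, so multi-byte characters such as Chinese text are never split.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class LineReading {
    /**
     * Reads one line from the stream.
     * Returns null at end of file; otherwise returns the line without its
     * trailing "\n" (a "\r" before it is also dropped).
     */
    public static String readLine(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null; // nothing left: end of file
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        // Collect bytes until a newline or the end of the stream
        while (b != -1 && b != '\n') {
            buf.write(b);
            b = in.read();
        }
        byte[] bytes = buf.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') {
            len--; // tolerate Windows-style line endings
        }
        // Decode the whole line at once so multi-byte characters stay intact
        return new String(bytes, 0, len, StandardCharsets.UTF_8);
    }
}
```

Calling it in a loop, `while ((line = LineReading.readLine(in)) != null)`, prints a file line by line, and the `null` at end of file matches the requirement above. Since the helper reads one byte at a time, wrapping the input in a `java.io.BufferedInputStream` is a cheap safeguard for large files.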

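Lab Content

Write programs that implement the following functions, and accomplish the same tasks with the Shell commands provided by Hadoop:

Upload any text file to HDFS; if the target file already exists in HDFS, let the user choose whether to append to the end of the existing file or overwrite it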

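# Check whether the file exists (a relative path resolves to /user/<current user>/text.txt)
hadoop fs -test -e text.txt
# The command above prints nothing; check its exit status instead: 0 means the file exists, non-zero means it does not
echo $?
# Find where the file is by searching a recursive listing for its name
# (-F matches the name literally; the original "grep [text.txt]" was a character class matching almost every line)
hdfs dfs -ls -R / | grep -F text.txt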

If the file does not exist:

# Upload the file directly
hadoop fs -put ~/hdfs/text.txt /user/hadoop/text.txt
# Check that the file is now in the directory
hadoop fs -ls /user/hadoop
# Print the file contents
hadoop fs -cat /user/hadoop/text.txt

If the file already exists:

# Method 1: run the commands directly
# Append the local file to the end of the existing file
hadoop fs -appendToFile ~/hdfs/text.txt /user/hadoop/text.txt
hadoop fs -cat /user/hadoop/text.txt
# Overwrite the existing file
hadoop fs -copyFromLocal -f ~/hdfs/text.txt /user/hadoop/text.txt
hadoop fs -cat /user/hadoop/text.txt
# Method 2: a shell script (append if the file exists, otherwise upload it)
# "if" tests the exit status of hadoop fs -test directly; no $(...) is needed,
# and running -cat directly (instead of echo $(...)) keeps the file's line breaks
if hadoop fs -test -e /user/hadoop/text.txt; then
    hadoop fs -appendToFile ~/hdfs/text.txt /user/hadoop/text.txt
else
    hadoop fs -copyFromLocal -f ~/hdfs/text.txt /user/hadoop/text.txt
fi
hadoop fs -cat /user/hadoop/text.txt

The complete Java code is as follows:

Hadoop.java

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

import java.io.*;

public class Hadoop {
    /**
     * Check whether a path exists in HDFS
     */
    public static boolean test(Configuration conf, String path) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        return fs.exists(new Path(path));
    }

    /**
     * Copy a local file to the given HDFS path,
     * overwriting the target if it already exists
     */
    public static void copyFromLocalFile(Configuration conf, String localFilePath,
            String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path localPath = new Path(localFilePath);
        Path remotePath = new Path(remoteFilePath);
        /* First argument: whether to delete the source file; second argument: whether to overwrite */
        fs.copyFromLocalFile(false, true, localPath, remotePath);
        fs.close();
    }

    /**
     * Append the contents of a local file to the end of an HDFS file
     */
    public static void appendToFile(Configuration conf, String localFilePath,
            String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path remotePath = new Path(remoteFilePath);
        /* try-with-resources (Java 7+) closes both streams even if the copy fails */
        try (FileInputStream in = new FileInputStream(localFilePath);   // read the local file
             FSDataOutputStream out = fs.append(remotePath)) {          // writes go to the end of the HDFS file
            /* Copy the content in 1 KB chunks */
            byte[] data = new byte[1024];
            int read;
            while ((read = in.read(data)) > 0) {
                out.write(data, 0, read);
            }
        } finally {
            fs.close();
        }
    }
}

Main.java

import org.apache.hadoop.conf.Configuration;

public class Main {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Must match fs.defaultFS in core-site.xml; 9000 is a common pseudo-distributed setting.
        // (8088 is the default YARN ResourceManager web UI port, not the NameNode RPC port.)
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        String localFilePath = "/home/ppqppl/hdfs/test.txt"; // local path
        String remoteFilePath = "/user/hadoop/test.txt"; // HDFS path
        String choice = "append"; // if the file exists, append to its end
        // String choice = "overwrite"; // if the file exists, overwrite it
        try {
            /* Check whether the file exists */
            boolean fileExists = Hadoop.test(conf, remoteFilePath);
            if (fileExists) {
                System.out.println(remoteFilePath + " already exists.");
            } else {
                System.out.println(remoteFilePath + " does not exist.");
            }
            /* Handle each case */
            if (!fileExists) { // the file does not exist: upload it
                Hadoop.copyFromLocalFile(conf, localFilePath, remoteFilePath);
                System.out.println(localFilePath + " uploaded to " + remoteFilePath);
            } else if (choice.equals("overwrite")) { // the user chose to overwrite
                Hadoop.copyFromLocalFile(conf, localFilePath, remoteFilePath);
                System.out.println(localFilePath + " overwrote " + remoteFilePath);
            } else if (choice.equals("append")) { // the user chose to append
                Hadoop.appendToFile(conf, localFilePath, remoteFilePath);
                System.out.println(localFilePath + " appended to " + remoteFilePath);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
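`Main` hard-codes `choice`, while the task asks the user to decide, and an unrecognized value currently makes the program do nothing without any message. One way to close that gap is to pull the decision into a pure function. The sketch below is minimal and assumes the choice arrives as a string such as `args[0]`; `UploadPolicy`, `Action`, and `decide` are hypothetical names that mirror the three branches in `Main`.

```java
public class UploadPolicy {
    /** The three outcomes handled by Main's if/else chain (hypothetical names). */
    public enum Action { UPLOAD, OVERWRITE, APPEND }

    /**
     * Maps "does the HDFS file exist?" plus the user's choice to an action.
     * The choice only matters when the file exists; an unknown choice is
     * rejected with an exception instead of being silently ignored.
     */
    public static Action decide(boolean fileExists, String choice) {
        if (!fileExists) {
            return Action.UPLOAD; // nothing to overwrite or append to
        }
        if ("overwrite".equals(choice)) {
            return Action.OVERWRITE;
        }
        if ("append".equals(choice)) {
            return Action.APPEND;
        }
        throw new IllegalArgumentException(
                "choice must be \"append\" or \"overwrite\", got: " + choice);
    }
}
```

In `Main`, `choice` could then be read from `args[0]` and the existing branches turned into a `switch` on `UploadPolicy.decide(fileExists, choice)`. Keeping the decision separate from the HDFS calls also lets it be checked without a running cluster.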

Program output:
