Cloud Computing and Big Data Introductory Experiment 2 - Getting Familiar with Common HDFS (Hadoop) Operations

Purpose

Understand the role of HDFS in the Hadoop architecture

Become familiar with the Shell commands commonly used for HDFS operations

Become familiar with the Java APIs commonly used for HDFS operations

Experiment platform

Operating system: Linux (Ubuntu 16.04 recommended)

Hadoop version: 2.10.2

JDK version: 1.7 or above

Java IDE: IntelliJ IDEA

Experimental procedure

  1. Write programs to implement the following functions, and use the Shell commands provided by Hadoop to complete the same tasks:
  • Upload any text file to HDFS. If the specified file already exists in HDFS, the user specifies whether to append to the end of the original file or to overwrite it.

  • Download a specified file from HDFS; if a local file with the same name already exists, automatically rename the downloaded file

  • Output the content of the specified file in HDFS to the terminal

  • Display information such as read and write permissions, size, creation time, and path of the specified file in HDFS

  • Given a directory in HDFS, output information such as the read and write permissions, size, creation time, and path of all files in that directory. If an entry is itself a directory, recursively output the information for all files under it

  • Given the path of a file in HDFS, create and delete that file. When creating it, if the directory containing the file does not exist, create the directory automatically

  • Given the path of an HDFS directory, create and delete that directory. When creating it, if its parent directory does not exist, create the corresponding directories automatically; when deleting it, let the user specify whether the directory should still be deleted if it is not empty

  • Add content to a specified file in HDFS, appending the user-specified content to either the beginning or the end of the original file

  • Delete the specified file in HDFS

  • Move a file in HDFS from a source path to a destination path

  2. Write a program to implement a class "MyFSDataInputStream" that inherits from "org.apache.hadoop.fs.FSDataInputStream", with the following requirement: implement the method "readLine()" to read a specified file in HDFS line by line; if the end of the file has been reached, return null, otherwise return the text of one line of the file (a sketch of such a class follows this list)

  3. Consult the Java reference documentation or other materials, and use "java.net.URL" and "org.apache.hadoop.fs.FsUrlStreamHandlerFactory" to write a program that outputs the text of a specified HDFS file to the terminal (a second sketch follows this list)
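
For task 2, the following is a minimal sketch, assuming a pseudo-distributed NameNode at hdfs://localhost:9000 (adjust to your cluster) and the example file /user/hadoop/text.txt used elsewhere in this post; the byte-by-byte readLine() favours simplicity over performance and assumes single-byte-encoded text:

MyFSDataInputStream.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;
import java.io.InputStream;

public class MyFSDataInputStream extends FSDataInputStream {
    public MyFSDataInputStream(InputStream in) {
        super(in);
    }

    /**
     * Read one line of text; return null once the end of the file is reached.
     */
    public String readLine() throws IOException {
        StringBuilder line = new StringBuilder();
        int c;
        while ((c = read()) != -1) {
            if (c == '\n') {
                return line.toString();
            }
            line.append((char) c);
        }
        // End of file: return the last unterminated line if there is one, else null
        return line.length() == 0 ? null : line.toString();
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000"); // assumed NameNode address
        FileSystem fs = FileSystem.get(conf);
        // fs.open returns an FSDataInputStream, which satisfies the superclass constructor
        MyFSDataInputStream in = new MyFSDataInputStream(fs.open(new Path("/user/hadoop/text.txt")));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line);
        }
        in.close();
        fs.close();
    }
}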
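
For task 3, a sketch along the same lines (HdfsUrlCat is a name introduced here; note that URL.setURLStreamHandlerFactory may be called at most once per JVM, which is why it sits in a static initializer):

HdfsUrlCat.java
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;

public class HdfsUrlCat {
    static {
        // Register the hdfs:// protocol handler with java.net.URL (allowed once per JVM)
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws IOException {
        // The address and path below match the examples used elsewhere in this post
        try (InputStream in = new URL("hdfs://localhost:9000/user/hadoop/text.txt").openStream();
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}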

Experiment content

Implement the following functions in code, and complete the same tasks with the Shell commands provided by Hadoop.

Upload any text file to HDFS; if the specified file already exists in HDFS, the user specifies whether to append to the end of the original file or to overwrite it.

# Check whether the file exists
hadoop fs -test -e text.txt
# The command above prints nothing; check its exit status with the following command (0 means the file exists)
echo $?
# Locate the file by matching its name in a recursive listing
hdfs dfs -ls -R / | grep text.txt

If the file does not exist:

# Upload the file directly
hadoop fs -put ~/hdfs/text.txt /user/hadoop/text.txt
# Check that the file is now there
hadoop fs -ls /user/hadoop
# View the file content
hadoop fs -cat /user/hadoop/text.txt

If the file already exists:

# Method 1: run the commands directly
# Append the local file's content to the end of the existing file
hadoop fs -appendToFile ~/hdfs/text.txt /user/hadoop/text.txt
hadoop fs -cat /user/hadoop/text.txt
# Overwrite the existing file
hadoop fs -copyFromLocal -f ~/hdfs/text.txt /user/hadoop/text.txt
hadoop fs -cat /user/hadoop/text.txt
# Method 2: decide with a shell conditional (plain commands, not $(...) substitutions)
if hadoop fs -test -e /user/hadoop/text.txt;
then hadoop fs -appendToFile ~/hdfs/text.txt /user/hadoop/text.txt;
hadoop fs -cat /user/hadoop/text.txt;
else hadoop fs -copyFromLocal -f ~/hdfs/text.txt /user/hadoop/text.txt;
hadoop fs -cat /user/hadoop/text.txt;
fi

The complete code is as follows:

Hadoop.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

import java.io.*;

public class Hadoop {
    /**
     * Check whether a path exists
     */
    public static boolean test(Configuration conf, String path) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        return fs.exists(new Path(path));
    }

    /**
     * Copy a local file to the specified HDFS path,
     * overwriting the destination if it already exists
     */
    public static void copyFromLocalFile(Configuration conf, String localFilePath,
            String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path localPath = new Path(localFilePath);
        Path remotePath = new Path(remoteFilePath);
        /* In fs.copyFromLocalFile, the first parameter indicates whether to delete
           the source file, the second whether to overwrite the destination */
        fs.copyFromLocalFile(false, true, localPath, remotePath);
        fs.close();
    }

    /**
     * Append content to a file
     */
    public static void appendToFile(Configuration conf, String localFilePath,
            String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        Path remotePath = new Path(remoteFilePath);
        /* Create an input stream for the local file */
        FileInputStream in = new FileInputStream(localFilePath);
        /* Create an output stream whose writes are appended to the end of the HDFS file */
        FSDataOutputStream out = fs.append(remotePath);
        /* Copy the content across */
        byte[] data = new byte[1024];
        int read = -1;
        while ((read = in.read(data)) > 0) {
            out.write(data, 0, read);
        }
        out.close();
        in.close();
        fs.close();
    }
}
Main.java
import org.apache.hadoop.conf.Configuration;

public class Main {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // fs.default.name is deprecated; use fs.defaultFS and point it at your NameNode.
        // (The original post used port 8088, which is normally the YARN web UI; the
        // NameNode RPC port in a typical pseudo-distributed setup is 9000.)
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        String localFilePath = "/home/ppqppl/hdfs/test.txt"; // local path
        String remoteFilePath = "/user/hadoop/test.txt"; // HDFS path
        String choice = "append"; // if the file exists, append to the end
        // String choice = "overwrite"; // if the file exists, overwrite it
        try {
            /* Check whether the file exists */
            boolean fileExists = false;
            if (Hadoop.test(conf, remoteFilePath)) {
                fileExists = true;
                System.out.println(remoteFilePath + " already exists.");
            } else {
                System.out.println(remoteFilePath + " does not exist.");
            }
            /* Handle the file accordingly */
            if (!fileExists) { // the file does not exist, so upload it
                Hadoop.copyFromLocalFile(conf, localFilePath, remoteFilePath);
                System.out.println(localFilePath + " uploaded to " + remoteFilePath);
            } else if (choice.equals("overwrite")) { // overwrite was chosen
                Hadoop.copyFromLocalFile(conf, localFilePath, remoteFilePath);
                System.out.println(localFilePath + " overwrote " + remoteFilePath);
            } else if (choice.equals("append")) { // append was chosen
                Hadoop.appendToFile(conf, localFilePath, remoteFilePath);
                System.out.println(localFilePath + " appended to " + remoteFilePath);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Result of running the code: (screenshot not reproduced here)
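
The walkthrough above covers only the first task (upload with append or overwrite). As a hedged sketch of how several of the remaining tasks map onto the same org.apache.hadoop.fs.FileSystem API (HadoopExtras is a name introduced here, not part of the original code), printing, deleting, and moving a file could be written as:

HadoopExtras.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class HadoopExtras {
    /** Print the content of an HDFS file to the terminal. */
    public static void cat(Configuration conf, String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        try (FSDataInputStream in = fs.open(new Path(remoteFilePath));
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }

    /** Delete an HDFS file (the second argument disables recursive deletion). */
    public static boolean rm(Configuration conf, String remoteFilePath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        return fs.delete(new Path(remoteFilePath), false);
    }

    /** Move a file from a source HDFS path to a destination HDFS path. */
    public static boolean mv(Configuration conf, String srcPath, String dstPath) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        return fs.rename(new Path(srcPath), new Path(dstPath));
    }
}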
