Record the whole process of reading files on HDFS with Java


foreword

        Following Baige's Java series, today I will share how to read files from HDFS with Java.


Tip: the following is a bit of personal insight

1. General process of the project

        If we want to read files from HDFS, we must first know what HDFS is:

        Essence: HDFS stands for Hadoop Distributed File System. In essence it is still a program: it mainly manages files in a tree-shaped directory structure (similar to Linux, where / denotes the root path) and can run on multiple nodes (that is, it is distributed).

        The problem it solves: storing massive offline data (TB, PB, even ZB scale) while keeping the data highly available and supporting highly concurrent access. Note: HDFS is not suitable for storing large numbers of small files. The main reason is that the HDFS NameNode process keeps file metadata in memory, so the more files there are, the more memory it consumes; a huge number of small files can exhaust the NameNode's memory while the total amount of data actually stored remains small, so the advantage of HDFS for massive data is never realized.
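To make the small-files problem concrete, a commonly cited rule of thumb is that each namespace object (file, directory, or block) costs roughly 150 bytes of NameNode heap. The figure below is an approximation, not an exact HDFS constant, so treat this as a rough sketch:

```java
public class NameNodeMemoryEstimate {
    // Rough, commonly cited estimate: ~150 bytes of NameNode heap
    // per namespace object (file, directory, or block).
    static final long BYTES_PER_OBJECT = 150;

    // Heap used by `files` files, each occupying `blocksPerFile` blocks
    // (one object for the file itself plus one per block).
    static long heapBytes(long files, long blocksPerFile) {
        return files * (1 + blocksPerFile) * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        // 1 TB stored as 10 million 100 KB files (1 block each)
        System.out.println(heapBytes(10_000_000L, 1)); // ~3 GB of heap
        // Roughly 1 TB stored as 8,192 files of 128 MB (1 block each)
        System.out.println(heapBytes(8_192L, 1));      // ~2.4 MB of heap
    }
}
```

The same amount of data costs over a thousand times more NameNode memory when split into small files, which is why HDFS favors a small number of large files.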

        Architecture: The HDFS architecture is shown below; the detailed deployment on the Linux side will not be repeated here.

 Our project consists of the following steps:

2. Detailed steps

1. Create an empty project in IDEA (even a complete beginner can follow this)

Diagram (example):

 

 Note: IDEA generally ships with a JDK of version 20 or above. If that version does not suit you, you can also click Download and choose an appropriate JDK.


 OK, so now we have an empty project.

2. Import the required jar package

        After creating the new project, import the HDFS-related jar packages so the project can call the classes and methods provided by HDFS. Add the required jar dependencies, then import the packages in code:
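If the project uses Maven, the HDFS client classes can be pulled in with a single dependency instead of importing jars by hand. The version below is only an example; pick the one matching your cluster:

```xml
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>3.3.6</version>
</dependency>
```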

3. Enter the code below and the feature is complete

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("Failed: please pass a path argument specifying the file to read");
            return;
        }
        String filePath = args[0];
        System.out.println("The path passed in is: " + filePath);

        // 1. Create a Configuration object and point it at the HDFS NameNode
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://20210322045-master:9000");

        // 2. Get a FileSystem handle and open the file as an input stream
        FileSystem fs = FileSystem.get(conf);
        FSDataInputStream fsDataInputStream = fs.open(new Path(filePath));
        BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(fsDataInputStream));

        // Read the first line; readLine() returns null at the end of the file
        String nextLine = bufferedReader.readLine();

        // Keep reading until the end of the file is reached
        while (null != nextLine) {
            // Print each line that was read to the console
            System.out.println(nextLine);
            nextLine = bufferedReader.readLine();
        }
        bufferedReader.close();
        fs.close();
    }
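The read loop itself is ordinary Java stream handling: readLine() returns null at end of stream, and the same pattern works over any Reader. A minimal standard-library-only sketch of that loop (the class and method names here are illustrative, not part of the HDFS API):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReadLoopDemo {
    // Count lines using the same readLine() loop as the HDFS example
    static int countLines(BufferedReader reader) throws IOException {
        int count = 0;
        String nextLine = reader.readLine();
        while (null != nextLine) { // null signals end of stream
            count++;
            nextLine = reader.readLine();
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new StringReader("a\nb\nc"));
        System.out.println(countLines(reader)); // prints 3
    }
}
```

Swapping the StringReader for the InputStreamReader wrapping fs.open(...) gives exactly the HDFS version above.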

Diagram (example):



Summarize

Tip: Here is a summary of the article:
The above is what I covered today. This article only briefly introduced how to read a file from HDFS with Java; HDFS provides many more classes and methods that let us work with distributed data quickly and conveniently.

Origin blog.csdn.net/qq_51294997/article/details/131038660