[Big Data Development Technology] Experiment 05: Creating, Deleting, and Querying HDFS Directories and Files

Create, delete and query HDFS directories and files

1. Experimental goals

  1. Become proficient with Hadoop commands and the HDFS command-line interface
  2. Master how to create HDFS directories and files and how to write data to HDFS files
  3. Master how to delete HDFS directories and files
  4. Learn how to query file status information and the metadata of all files in a directory

2. Experimental requirements

  1. Provide screenshots of the successful results of the main experimental steps.
  2. Test both locally and on a cluster, and provide screenshots of the test results.
  3. Provide a comprehensive summary of this experimental work.
  4. After completing the experiment, add your student number and name to the experiment report file name.

3. Experimental content

  1. Create a directory and write a local file to the directory. For the expected effect, refer to the figure below:
    (Figure 1: expected result screenshot)

  2. Delete files and directories. For the expected effect, refer to the figure below:
    (Figure 2: expected result screenshot)

  3. Query the file status information and metadata of all files in a directory. For the expected effect, refer to the figure below:
    (Figure 3: expected result screenshot)

4. Experimental steps

  1. Create a directory and write a local file to the directory

Program code

package com.wjw.cslg;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WJW01 {

    public static void main(String[] args) {
        // Hadoop configuration object
        Configuration conf = new Configuration();
        FileSystem fs = null;
        // HDFS paths of the directories to create
        args = new String[2];
        args[0] = "hdfs://master:9000/wjw02.txt";
        args[1] = "hdfs://master:9000/wjw02";
        try {
            for (int i = 0; i < args.length; i++) {
                // Obtain a FileSystem handle for each HDFS URI and create the directory
                fs = FileSystem.get(URI.create(args[i]), conf);
                fs.mkdirs(new Path(args[i]));
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

}

Program analysis

This program is a Java program that uses Hadoop's API. Its main function is to create directories at the specified paths on HDFS.

First, the program uses the Configuration class to create a configuration object conf, which holds the Hadoop configuration information, and then obtains a FileSystem object fs through which it interacts with HDFS. Although the args array would normally hold command-line parameters, the program overwrites it with two hard-coded HDFS paths, hdfs://master:9000/wjw02.txt and hdfs://master:9000/wjw02, both of which are created as directories.

Next, the program enters the for loop statement and traverses all paths in the args array. In the loop body, the program calls the get() method of FileSystem to obtain a file system object. The parameters of this method are a URI object and a configuration object conf. URI objects represent paths on HDFS and can be created through the URI.create() method. After creating the file system object, the program calls the mkdirs() method to create the specified directory.

Finally, the program catches possible IOException exceptions and prints out error information.

Overall, this program is relatively simple. It mainly requires familiarity with the Hadoop API and an understanding of the basic principles of creating HDFS directories.
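
The experiment step also asks for a local file to be written into the newly created directory, which the mkdirs() calls above do not cover. Below is a minimal sketch of that upload half, assuming a hypothetical local file /home/hadoop/wjw01.txt and the hdfs://master:9000/wjw02 directory created above; copyFromLocalFile() is the standard FileSystem call for uploading a local file, and the class name here is only illustrative.

package com.wjw.cslg;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative example class, not part of the original experiment code
public class WJW01Upload {

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Hypothetical local file and the HDFS directory created above
        Path localFile = new Path("/home/hadoop/wjw01.txt");
        Path hdfsDir = new Path("hdfs://master:9000/wjw02");
        try {
            FileSystem fs = FileSystem.get(URI.create(hdfsDir.toString()), conf);
            // Upload the local file into the HDFS directory (keeps the local copy)
            fs.copyFromLocalFile(localFile, hdfsDir);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

The two-argument copyFromLocalFile() keeps the local copy; overloads with extra boolean flags control deleting the source and overwriting an existing target.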

Running result

(Figure 4: screenshot of the running result)

  2. Delete files and directories

Program code

package com.wjw.cslg;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WJW02 {

    public static void main(String[] args) {
        // Hadoop configuration object
        Configuration conf = new Configuration();
        FileSystem fs = null;
        // HDFS paths of the file and directory to delete
        args = new String[2];
        args[0] = "hdfs://master:9000/wjw02.txt";
        args[1] = "hdfs://master:9000/wjw02";
        try {
            for (int i = 0; i < args.length; i++) {
                // Obtain a FileSystem handle for each HDFS URI and delete the path;
                // the second argument enables recursive deletion for directories
                fs = FileSystem.get(URI.create(args[i]), conf);
                fs.delete(new Path(args[i]), true);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

}

Program analysis

This program is a Java program that uses Hadoop's API. Its main function is to delete files or directories with specified paths on HDFS.

First, the program uses the Configuration class to create a configuration object conf, which holds the Hadoop configuration information, and then obtains a FileSystem object fs through which it interacts with HDFS. Although the args array would normally hold command-line parameters, the program overwrites it with two hard-coded HDFS paths: args[0] is the file hdfs://master:9000/wjw02.txt and args[1] is the directory hdfs://master:9000/wjw02, and both are deleted.

Next, the program enters a for loop and traverses all paths in the args array. In the loop body, it calls FileSystem.get() to obtain a file system object; the parameters of this method are a URI object, created from the HDFS path with URI.create(), and the configuration object conf. The program then calls the delete() method with the recursive flag set to true, so a directory is removed together with its contents.

Finally, the program catches possible IOException exceptions and prints out error information.

Generally speaking, this program is relatively simple. It mainly requires familiarity with the Hadoop API and an understanding of the basic principles of deleting HDFS files or directories. Note that when deleting a file or directory, you need to ensure that the target exists and is not locked by other programs or users, otherwise the deletion will fail.
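
As a small illustration of that caution, one way to guard the deletion is to test the path with exists() first and to check the boolean that delete() returns. Below is a minimal sketch, assuming the same hdfs://master:9000/wjw02 directory; the class name is only illustrative.

package com.wjw.cslg;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative example class, not part of the original experiment code
public class WJW02SafeDelete {

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Hypothetical target path: the directory created in the previous step
        Path target = new Path("hdfs://master:9000/wjw02");
        try {
            FileSystem fs = FileSystem.get(URI.create(target.toString()), conf);
            if (fs.exists(target)) {
                // delete() returns true only if the path was actually removed
                boolean removed = fs.delete(target, true);
                System.out.println(target + " removed: " + removed);
            } else {
                System.out.println(target + " does not exist");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}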

Running result

(Figure 5: screenshot of the running result)

  1. Query file status information and metadata information of all files in the directory

Program code

package com.wjw.cslg;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WJW03 {

    public static void main(String[] args) {
        // Hadoop configuration object; set the default file system of the cluster
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://master:9000/");
        // HDFS path whose status information is to be queried
        args = new String[1];
        args[0] = "hdfs://master:9000/wjw01.txt";
        FileSystem fs = null;
        try {
            fs = FileSystem.get(URI.create(args[0]), conf);
            // listStatus() returns the status of the path itself (for a file)
            // or of its direct children (for a directory)
            FileStatus[] filestatus = fs.listStatus(new Path(args[0]));
            for (int i = 0; i < filestatus.length; i++) {
                System.out.println(filestatus[i]);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

}

Program analysis

This program is a Java program that uses Hadoop's API. Its main function is to query the status information of the files or directories at a specified path on HDFS.

First, the program uses the Configuration class to create a configuration object conf, which holds the Hadoop configuration information. It then creates a URI object with URI.create() and passes it, together with conf, to FileSystem.get(), which returns a FileSystem object for interacting with HDFS. Although the args array would normally hold command-line parameters, the program overwrites it with the hard-coded path hdfs://master:9000/wjw01.txt, whose status is queried.

Next, the program calls the listStatus() method of FileSystem to obtain information about all files or directories under the specified path, and stores the results in a FileStatus array. Finally, the program iterates through the array and outputs information about each file or directory to the console.

It should be noted that when the program creates the configuration object conf, it uses the set() method to set the fs.defaultFS property, which specifies the default file system address of the Hadoop cluster. The property name must be spelled exactly "fs.defaultFS"; a misspelled key (such as "fs.DefailtFS") is simply stored as an unrelated property and has no effect.
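
As a small illustration of what fs.defaultFS provides: once it points at the cluster, FileSystem.get(conf) can be called without an explicit URI and plain paths resolve against HDFS. Below is a minimal sketch, assuming the same hdfs://master:9000 address and the wjw01.txt file used above; the class name is only illustrative.

package com.wjw.cslg;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative example class, not part of the original experiment code
public class WJW03Default {

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://master:9000/");
        // With the default file system set, no explicit URI is needed
        FileSystem fs = FileSystem.get(conf);
        // A plain path now resolves against hdfs://master:9000
        FileStatus status = fs.getFileStatus(new Path("/wjw01.txt"));
        System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
    }
}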

Generally speaking, this program is relatively simple and is mainly used to familiarize yourself with the use of Hadoop API and understand the basic principles of obtaining file or directory information under the HDFS path. It should be noted that the listStatus() method only returns information about direct subfiles or directories under the specified path, and does not recursively return information about all subfiles or directories. If you want to obtain information about all sub-files or directories, you need to use a recursive algorithm.
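
Below is a minimal sketch of such a recursive listing, assuming a hypothetical starting point at the HDFS root and using only standard FileStatus accessors (getPath(), getLen(), isDirectory()); the class name is only illustrative.

package com.wjw.cslg;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative example class, not part of the original experiment code
public class WJW03Recursive {

    // Print the status of every file and directory below the given path
    static void listRecursively(FileSystem fs, Path path) throws IOException {
        for (FileStatus status : fs.listStatus(path)) {
            System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
            if (status.isDirectory()) {
                // Descend into subdirectories
                listRecursively(fs, status.getPath());
            }
        }
    }

    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://master:9000/");
        // Hypothetical starting directory: the HDFS root
        Path root = new Path("hdfs://master:9000/");
        try {
            FileSystem fs = FileSystem.get(URI.create(root.toString()), conf);
            listRecursively(fs, root);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}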

Running result

(Figure 6: screenshot of the running result)

Attachment: series of articles

Experiment 01 Hadoop installation and deployment: https://want595.blog.csdn.net/article/details/132767284
Experiment 02 HDFS common shell commands: https://want595.blog.csdn.net/article/details/132863345
Experiment 03 Hadoop reads files: https://want595.blog.csdn.net/article/details/132912077
Experiment 04 HDFS file creation and writing: https://want595.blog.csdn.net/article/details/133168180
Experiment 05 Create, delete and query HDFS directories and files: https://want595.blog.csdn.net/article/details/133168734
