Using the Hadoop FileSystem API

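Preface:

When working with Hadoop's HDFS file system, we usually run $HADOOP_HOME/bin/hdfs dfs [command], where command is the file operation to perform. That is the shell approach.

Hadoop also provides a Java API for operating on HDFS files.

In this post we take a brief look at how to operate on HDFS from Java.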

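Preparation:

* Create a Maven project named hadoop

* Add the Hadoop client dependency, hadoop-client (the available versions are listed at https://mvnrepository.com/ ; the author uses hadoop-2.7.0, so that version is declared here)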

  <properties>
    <hadoop.version>2.7.0</hadoop.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>${hadoop.version}</version>
    </dependency>
  </dependencies>

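1. Obtaining the FileSystem

FileSystem is an abstract class that defines Hadoop's file system interface. It contains essentially all of the APIs for operating on HDFS files.

The following code obtains a FileSystem: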

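import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HDFSDemo {

	FileSystem fileSystem;
	String hdfsUri = "hdfs://hadoop:9000";// matches fs.defaultFS in core-site.xml
	
	/**
	 * Obtains the FileSystem.
	 * FileSystem is an abstraction over HDFS and is used to operate on it;
	 * its methods correspond to the $HADOOP_HOME/bin/hdfs dfs commands.
	 * @throws URISyntaxException 
	 * @throws IOException 
	 * @throws InterruptedException 
	 */
	public void getFileSystem() throws IOException, URISyntaxException, InterruptedException{
		Configuration configuration = new Configuration();
		// The URI can be given explicitly, as here, or picked up from a core-site.xml on the classpath
		// hxw is the HDFS superuser
		fileSystem = FileSystem.get(new URI(hdfsUri),configuration,"hxw");
	}
}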

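Note: the author's Hadoop superuser is hxw, so the FileSystem is obtained as that user in order to have full permissions. Otherwise, some of the operations below may fail with a permission error.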
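The URI passed to FileSystem.get carries two pieces of information: the scheme (hdfs), which Hadoop uses to choose the FileSystem implementation, and the NameNode host and port. The following sketch uses only the JDK to show how such a URI breaks down. The class name HdfsUriDemo and the path /user/hadoop are assumptions made for illustration and are not part of Hadoop.

```java
import java.net.URI;

// Hypothetical helper class for illustration; it is not part of Hadoop.
public class HdfsUriDemo {

    // Returns "scheme|host|port|path" for the given URI string.
    static String describe(String uri) {
        URI parsed = URI.create(uri);
        return parsed.getScheme() + "|" + parsed.getHost() + "|"
                + parsed.getPort() + "|" + parsed.getPath();
    }

    public static void main(String[] args) {
        // Same address as hdfsUri in HDFSDemo, plus an assumed example path.
        System.out.println(describe("hdfs://hadoop:9000/user/hadoop"));
        // prints: hdfs|hadoop|9000|/user/hadoop
    }
}
```

If the host or port here does not match the NameNode address configured in core-site.xml, the client will most likely fail to connect, so this is a cheap thing to check first.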

2. Common FileSystem operations

	/**
	 * Creates a directory (including any missing parent directories)
	 * @throws IOException 
	 */
	public void createDir() throws IOException{
		Path dir = new Path("/user/hadoop/mapreduce/input");
		fileSystem.mkdirs(dir);
	}
	
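	/**
	 * Creates a file
	 * @throws IOException 
	 */
	public void createFile() throws IOException{
		Path path = new Path("/user/hadoop/mapreduce/input/wordcount.txt");
		String data = "I believe, for every drop of rain that falls, A flower grows";
		
		// try-with-resources closes the stream so the data is flushed to HDFS
		try (FSDataOutputStream out = fileSystem.create(path)) {
			// write UTF-8 bytes; writeChars would store two bytes per char (UTF-16)
			out.write(data.getBytes("UTF-8"));
		}
	}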
	
	/**
	 * Deletes a directory or file
	 * @throws IOException 
	 * @throws IllegalArgumentException 
	 */
	public void deleteFile() throws IllegalArgumentException, IOException{
		Path path = new Path("/user/");
		
		if(fileSystem.exists(path)){
			fileSystem.delete(path, true);// true = delete the directory recursively
		}else{
			System.out.println(path.getName() + " does not exist");
		}
	}
	
	/**
	 * Reads a file and prints its contents
	 * @throws IOException
	 */
	public void readFile() throws IOException{
		Path path = new Path("/user/hadoop/mapreduce/input/wordcount.txt");
		
		if(fileSystem.isFile(path)){
			byte[] buf = new byte[1024];
			// try-with-resources closes the input stream when done
			try (FSDataInputStream file = fileSystem.open(path)) {
				int read = 0;
				while((read = file.read(buf)) != -1){
					// print only the bytes actually read, not the whole buffer
					System.out.write(buf, 0, read);
				}
			}
			System.out.flush();
		}
	}
	
	/**
	 * Lists files and directories
	 * @throws FileNotFoundException
	 * @throws IOException
	 */
	public void listFiles() throws FileNotFoundException, IOException{
		Path path = new Path("/user");
		// get the direct children (directories and files) of the path
		FileStatus[] listStatus = fileSystem.listStatus(path);
		for (FileStatus fileStatus : listStatus) {
			System.out.println(fileStatus);
		}
		
		// list all files under the path recursively (files only)
		RemoteIterator<LocatedFileStatus> listFiles = fileSystem.listFiles(path, true);
		LocatedFileStatus next = null;
		while(listFiles.hasNext()){
			next = listFiles.next();
			System.out.println(next);
		}
	}
	
	/**
	 * Queries file attributes: block locations, checksum and DataNodes
	 * @throws IOException
	 */
	public void queryPosition() throws IOException{
		Path path = new Path("/user/hadoop/mapreduce/input/wordcount.txt");
		FileStatus fileStatus = fileSystem.getFileStatus(path);
		
		// get the locations of the file's blocks within the cluster
		BlockLocation[] fileBlockLocations = fileSystem.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());
		for (BlockLocation blockLocation : fileBlockLocations) {
			// prints offset,length,hosts, e.g. 0,120,hadoop for a 120-byte file
			System.out.println(blockLocation);
		}
		
		// get the checksum
		FileChecksum fileChecksum = fileSystem.getFileChecksum(path);
		// sample output: MD5-of-0MD5-of-512CRC32C:cb95b700877b44dab0fcfeb617d7f95d
		System.out.println(fileChecksum);
		
		// get information on all DataNodes in the cluster
		// (the cast only works when fileSystem is backed by HDFS)
		DistributedFileSystem dfs = (DistributedFileSystem)fileSystem;
		DatanodeInfo[] dataNodeStats = dfs.getDataNodeStats();
		for (DatanodeInfo datanodeInfo : dataNodeStats) {
			System.out.println(datanodeInfo);// sample output: 192.168.241.129:50010
		}
	}
	
	
	@Test
	public void readHDFSFile(){
		HDFSDemo d = new HDFSDemo();
		try {
			d.getFileSystem();// obtain the FileSystem
//			d.deleteFile();// delete
//			d.createDir();// create the directory
//			d.createFile();
//			d.listFiles();
//			d.readFile();
			d.queryPosition();
		} catch (IllegalArgumentException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		} catch (URISyntaxException e) {
			e.printStackTrace();
		} catch (InterruptedException e) {
			e.printStackTrace();
		}
	}
}
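FSDataOutputStream extends java.io.DataOutputStream, so the choice between writeChars and write(byte[]) decides which bytes end up in the file. The sketch below uses only JDK classes, with an in-memory ByteArrayOutputStream standing in for HDFS, to compare the two. The class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; an in-memory sink stands in for an HDFS file.
public class WriteCharsDemo {

    // Encodes the text with DataOutputStream.writeChars (two bytes per char).
    static byte[] viaWriteChars(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeChars(text);
        }
        return sink.toByteArray();
    }

    // Encodes the text as UTF-8 and writes the raw bytes.
    static byte[] viaUtf8(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String data = "I believe, for every drop of rain that falls, A flower grows";
        System.out.println("writeChars: " + viaWriteChars(data).length + " bytes"); // 120
        System.out.println("UTF-8:      " + viaUtf8(data).length + " bytes");       // 60
    }
}
```

The sentence has 60 characters, so writeChars produces 120 bytes, which is consistent with the block length of 120 shown in the sample block-location output. A text file meant to be read line by line as UTF-8 is usually better written as UTF-8 bytes.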

Note: the fileSystem used in the methods above is the FileSystem obtained in section 1.
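One detail worth care when reading a file is that each read call may fill only part of the buffer, so only the bytes it reports should be used. The following JDK-only sketch shows this read loop, with a ByteArrayInputStream standing in for the stream returned by fileSystem.open. The names ReadLoopDemo and readAll are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; any InputStream, including one opened on HDFS, fits readAll.
public class ReadLoopDemo {

    // Drains the stream, keeping only the bytes each read() call actually returned.
    static String readAll(InputStream in, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        int read;
        while ((read = in.read(buffer)) != -1) {
            collected.write(buffer, 0, read);
        }
        // Decode once at the end so multi-byte characters are never split.
        return new String(collected.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] content = "A flower grows".getBytes(StandardCharsets.UTF_8);
        // A deliberately small buffer forces several partial reads.
        try (InputStream in = new ByteArrayInputStream(content)) {
            System.out.println(readAll(in, 4)); // prints: A flower grows
        }
    }
}
```

Collecting the whole file in memory is only reasonable for small files. For large files it is better to process each chunk as it arrives.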

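Summary: these are all routine, simple operations, so the author does not go into further detail. Interested readers can experiment with the other methods of FileSystem.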
