Accessing the block storage locations of an HDFS file from a Maven project

1. Create a new Maven project

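package FileCheck.FileCheck;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Class and package names follow the mainClass entry in the pom.xml of step 2
public class FindFileonHDFS {

	public static void main(String[] args) throws Exception {
		getFileLocal(args);
	}

	// Look up where the blocks of a file are stored; args[0] is the HDFS path
	public static void getFileLocal(String[] args) throws Exception {

		Configuration conf = new Configuration();
		FileSystem hdfs = FileSystem.get(conf);
		Path fpath = new Path(args[0]);

		FileStatus fileStatus = hdfs.getFileStatus(fpath);
		// Request every block that overlaps the byte range [0, file length)
		BlockLocation[] blkLocations = hdfs.getFileBlockLocations(fileStatus, 0, fileStatus.getLen());

		int blockLen = blkLocations.length;

		for (int i = 0; i < blockLen; ++i) {
			// One host per replica of block i
			String[] hosts = blkLocations[i].getHosts();
			for (String host : hosts) {
				System.out.println("block_" + i + "_location:" + host);
			}
		}
	}
}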
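
The call to getFileBlockLocations above asks for every block that overlaps the byte range from 0 to the file length, so the number of array entries depends on how the file length divides into blocks. The following is only a rough sketch of that arithmetic in plain Java, without any Hadoop API. The class BlockRangeSketch and its methods are hypothetical names, and the 128 MB block size is an assumption (it is the Hadoop 2.x default for dfs.blocksize, but a cluster or an individual file may use another value).

```java
// Hypothetical helper, not part of Hadoop: maps a file length onto fixed-size blocks.
public class BlockRangeSketch {

    // Assumed block size of 128 MB; a real file may have been written with a different one.
    static final long ASSUMED_BLOCK_SIZE = 128L * 1024 * 1024;

    // Number of blocks needed to hold fileLen bytes (ceiling division).
    static long blockCount(long fileLen, long blockSize) {
        if (fileLen <= 0) {
            return 0;
        }
        return (fileLen + blockSize - 1) / blockSize;
    }

    // Length in bytes of the block at the given index; only the last block can be shorter.
    static long blockLength(long fileLen, long blockSize, long index) {
        long start = index * blockSize;
        return Math.min(blockSize, fileLen - start);
    }

    public static void main(String[] args) {
        long fileLen = 300L * 1024 * 1024; // example input: a 300 MB file
        long count = blockCount(fileLen, ASSUMED_BLOCK_SIZE);
        for (long i = 0; i < count; i++) {
            System.out.println("block_" + i
                    + " offset=" + (i * ASSUMED_BLOCK_SIZE)
                    + " length=" + blockLength(fileLen, ASSUMED_BLOCK_SIZE, i));
        }
    }
}
```

Under these assumptions a 300 MB file maps to three blocks of 128 MB, 128 MB and 44 MB, so the loop in getFileLocal would run three times for such a file.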

2. pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>FileCheck</groupId>
  <artifactId>FileCheck</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>FileCheck</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>2.3.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>2.3.0</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.3.0</version>
    </dependency>
    <dependency>
      <groupId>jdk.tools</groupId>
      <artifactId>jdk.tools</artifactId>
      <version>1.6</version>
      <scope>system</scope>
      <systemPath>C:/Program Files/Java/jdk1.8.0_73/lib/tools.jar</systemPath>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
          <archive>
            <manifest>
              <mainClass>FileCheck.FileCheck.FindFileonHDFS</mainClass>
            </manifest>
          </archive>
        </configuration>
        <executions>
          <execution>
            <id>make-assembly</id>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

3. Run mvn clean, then mvn package

The build produces two jar files in the target directory.

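4. Run the jars

FileCheck-0.0.1-SNAPSHOT.jar contains no third-party dependencies and has no Main-Class entry in its manifest, so the main class must be given on the command line:

yarn jar FileCheck-0.0.1-SNAPSHOT.jar FileCheck.FileCheck.FindFileonHDFS /tmp/ooa_0

FileCheck-0.0.1-SNAPSHOT-jar-with-dependencies.jar bundles the third-party dependencies and declares the Main-Class, so only the file path is needed:

yarn jar FileCheck-0.0.1-SNAPSHOT-jar-with-dependencies.jar /tmp/00a_0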
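
Either command prints one "block_<i>_location:<host>" line per block replica, which gets hard to scan for large files. As a possible follow-up, the sketch below counts how many replicas each host holds from such lines. It is plain Java with no Hadoop dependency; the class HostBlockCount and the host names in the sample are made up for illustration and are not part of the original project.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: summarizes the "block_<i>_location:<host>" lines printed by the program.
public class HostBlockCount {

    static final String MARKER = "_location:";

    // Counts block replicas per host; lines that do not contain the marker are skipped.
    static Map<String, Integer> countByHost(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by host name
        for (String line : lines) {
            int pos = line.indexOf(MARKER);
            if (pos < 0) {
                continue;
            }
            String host = line.substring(pos + MARKER.length()).trim();
            counts.merge(host, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample output; the host names are placeholders.
        List<String> sample = Arrays.asList(
                "block_0_location:datanode1",
                "block_0_location:datanode2",
                "block_1_location:datanode1");
        System.out.println(countByHost(sample)); // prints {datanode1=2, datanode2=1}
    }
}
```

In practice the lines would come from the captured output of the yarn jar command rather than from a hard-coded list.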

Reposted from my.oschina.net/u/2489618/blog/733551