Foreword
This article uses a small HDFS example (checking whether a file exists in a Hadoop 2.7.7 distributed cluster) to show how to compile, package, and run HDFS programs from the command line in Hadoop 2.x.
Add Hadoop classpath information to the CLASSPATH variable
In Hadoop 2.x, the jars are no longer bundled into a single hadoop-core-*.jar but are split across multiple jars. For example, running a WordCount instance on Hadoop 2.7.7 requires at least the following three jars:
- $HADOOP_HOME/share/hadoop/common/hadoop-common-2.7.7.jar
- $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.7.jar
- $HADOOP_HOME/share/hadoop/common/lib/commons-cli-1.2.jar
In fact, the command hadoop classpath prints all the classpath entries needed to run Hadoop programs.
We add this Hadoop classpath information to the CLASSPATH variable by appending the following lines to ~/.bashrc:
export HADOOP_HOME=/usr/local/hadoop
export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH
Do not forget to execute source ~/.bashrc for the variables to take effect.
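Once sourced, you can inspect what went into CLASSPATH. The sample value below is hypothetical (the real output of hadoop classpath depends on your installation); tr simply splits the colon-separated list so you can read it one entry per line:

```shell
# Hypothetical classpath value; on a real node, obtain it with:
#   CP=$($HADOOP_HOME/bin/hadoop classpath)
CP="/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*"
# Split the colon-separated list, one entry per line, for inspection
echo "$CP" | tr ':' '\n'
```

Entries ending in /* are wildcard entries: the JVM expands them to every jar in that directory, which is how the many per-module jars of Hadoop 2.x end up on the classpath.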
Compile, package and execute HDFS programs
Write the HDFS program. Here is a small example that checks whether a specified file exists:
vi FileExist.java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FileExist {
    public static void main(String[] args) {
        try {
            String fileName = "test";
            Configuration conf = new Configuration();
            // Point at the NameNode of the cluster
            conf.set("fs.defaultFS", "hdfs://Master:9000");
            conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
            FileSystem fs = FileSystem.get(conf);
            if (fs.exists(new Path(fileName))) {
                System.out.println("File exists");
            } else {
                System.out.println("File does not exist");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
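For comparison, the same check against the local filesystem needs no Hadoop jars at all; only the API differs (java.nio.file instead of org.apache.hadoop.fs). The class below is a hypothetical sketch, not part of the original program:

```java
import java.nio.file.Files;
import java.nio.file.Paths;

public class LocalFileExist {
    // Same idea as fs.exists(new Path(fileName)), but against the local filesystem
    static boolean exists(String path) {
        return Files.exists(Paths.get(path));
    }

    public static void main(String[] args) {
        // Take the path from the command line, defaulting to "test" as in the HDFS example
        String fileName = args.length > 0 ? args[0] : "test";
        System.out.println(exists(fileName) ? "File exists" : "File does not exist");
    }
}
```

The HDFS version is structured the same way; what the Configuration object adds is the address of the NameNode and the choice of FileSystem implementation.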
Use the javac command to compile FileExist.java:
javac FileExist.java
After compilation, you can see that a FileExist.class file has been generated.
Then package the .class file into a jar so it can be run on Hadoop:
jar -cvf FileExist.jar ./FileExist*.class
After packaging, you can see that FileExist.jar has been generated.
Next, run the jar:
hadoop jar FileExist.jar FileExist
Here FileExist.jar is the jar package to run, and FileExist is the class inside it that contains the main method.
Run result: the program prints whether the file test exists on HDFS.
This concludes our small example of compiling, packaging, and running an HDFS program from the command line. Of course, you can also compile, package, and run MapReduce programs from the command line; the procedure is similar to the HDFS case. For details, see "Run your own MapReduce program using command-line compilation and packaging".