Hadoop can perform the most basic task of sorting numbers, and the sort spans multiple input files.
Environment:
System: Ubuntu 16.04
Java: 1.8.0_191
Hadoop: 1.2.1
Prerequisite: the Hadoop environment variables are configured and Hadoop has been started.
Enter in the terminal:
jps
If the Hadoop daemon processes (e.g. NameNode, DataNode, JobTracker, TaskTracker) are listed, Hadoop started successfully.
First, the MapReduce execution process
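In outline: the map phase emits each input number as a key, the shuffle phase sorts the keys globally, and the reduce phase assigns each number its rank. This flow can be sketched in plain Java (the input numbers below are arbitrary examples, not from the actual job):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortFlowSketch {
    public static void main(String[] args) {
        // "Map" phase: each input split contributes its numbers as keys.
        int[][] splits = { {2, 32, 654}, {5956, 22, 650}, {92, 26, 54} }; // example data
        List<Integer> keys = new ArrayList<>();
        for (int[] split : splits)
            for (int n : split)
                keys.add(n); // the mapper emits (n, 1); the value is irrelevant here

        // "Shuffle" phase: the framework sorts the keys before they reach reduce.
        Collections.sort(keys);

        // "Reduce" phase: assign each number its rank, as the reducer below does.
        int rank = 1;
        for (int key : keys)
            System.out.println(rank++ + "\t" + key);
    }
}
```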
Second, the sorting algorithm explained
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Sort {

    // The mapper parses one number per input line and emits it as the key.
    // The MapReduce shuffle then sorts the keys for us.
    public static class Map extends Mapper<Object, Text, IntWritable, IntWritable> {
        private static IntWritable data = new IntWritable();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            data.set(Integer.parseInt(line));
            context.write(data, new IntWritable(1));
        }
    }

    // The reducer receives the keys in sorted order and writes (rank, number).
    // Duplicate numbers each get their own rank because of the inner loop.
    // Note: each reducer starts its rank at 1, so with several reducers the
    // ranks restart in every output file.
    public static class Reduce extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
        private static IntWritable linenum = new IntWritable(1);

        public void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            for (IntWritable val : values) {
                context.write(linenum, key);
                linenum = new IntWritable(linenum.get() + 1);
            }
        }
    }

    // The partitioner splits the key range [0, MaxNumber] into equal bands so
    // that partition i holds smaller numbers than partition i + 1, keeping the
    // output globally ordered across reducers.
    public static class Partition extends Partitioner<IntWritable, IntWritable> {
        @Override
        public int getPartition(IntWritable key, IntWritable value, int numPartitions) {
            int MaxNumber = 65223;
            int bound = MaxNumber / numPartitions + 1;
            int keynumber = key.get();
            for (int i = 0; i < numPartitions; i++) {
                if (keynumber >= bound * i && keynumber < bound * (i + 1))
                    return i;
            }
            return numPartitions - 1; // keys above MaxNumber go to the last partition
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: Sort <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "Sort");
        job.setJarByClass(Sort.class);
        job.setMapperClass(Map.class);
        job.setPartitionerClass(Partition.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
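To see how the partitioner keeps the output globally ordered, the range split can be checked in isolation. This is a simplified standalone sketch of the same idea, assuming keys in [0, 65223] as in the MaxNumber constant above:

```java
public class PartitionCheck {
    // Same range-partition idea as Sort.Partition: keys 0..65223 are split
    // into numPartitions equal bands of width bound.
    static int getPartition(int key, int numPartitions) {
        int maxNumber = 65223;
        int bound = maxNumber / numPartitions + 1;
        for (int i = 0; i < numPartitions; i++)
            if (key >= bound * i && key < bound * (i + 1))
                return i;
        return numPartitions - 1; // keys above maxNumber fall into the last band
    }

    public static void main(String[] args) {
        // With 3 partitions, bound = 21742: band 0 is [0, 21742),
        // band 1 is [21742, 43484), band 2 holds the rest.
        System.out.println(getPartition(0, 3));      // smallest key -> partition 0
        System.out.println(getPartition(30000, 3));  // middle band  -> partition 1
        System.out.println(getPartition(65223, 3));  // largest key  -> partition 2
    }
}
```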
Third, the specific steps
1. Create a project directory
cd /usr/projects/hadoopExamples
mkdir numbersort
2. Write the Sort.java file and compile it (javac)
Copy the Java code above into a new file and save it as Sort.java:
vi Sort.java
Create a directory, number_sort_class, to store the compiled .class files:
mkdir number_sort_class
Compile Sort.java and put the generated .class files in the number_sort_class folder (/opt/hadoop-1.2.1 is my Hadoop installation directory):
javac -classpath /opt/hadoop-1.2.1/hadoop-core-1.2.1.jar:/opt/hadoop-1.2.1/lib/commons-cli-1.2.jar -d number_sort_class/ Sort.java
3. Package the compiled .class files
The * stands for all the .class files; the resulting jar is named numberSort.jar:
cd number_sort_class
jar -cvf numberSort.jar *.class
4. Create the input files
Go back to the parent directory, create an input directory, and create three files there, each containing several numbers (one per line). For example:
cd ..
mkdir input
cd input
vi number1
vi number2
vi number3
cd ..
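Instead of typing the numbers into vi, the three input files can also be created non-interactively. The numbers here are arbitrary examples, one per line as the mapper expects:

```shell
mkdir -p input
printf '2\n32\n654\n'   > input/number1
printf '5956\n22\n650\n' > input/number2
printf '92\n26\n54\n'   > input/number3
# Spot-check one of the files
cat input/number1
```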
5. Upload the input files to HDFS
First create a destination folder on HDFS, then upload all the files under input into input_numbersort:
hadoop fs -mkdir input_numbersort
hadoop fs -put input/* input_numbersort/
View the uploaded files:
hadoop fs -ls
List the contents of the directory:
hadoop fs -ls input_numbersort
These are the three number files we just uploaded!
View the contents of one file:
hadoop fs -cat input_numbersort/number2
6. Run the jar package and produce the output
The path of the jar package is number_sort_class/numberSort.jar,
Sort is the main class name,
input_numbersort is the input folder,
and out_numbersort (created automatically) is the output folder:
hadoop jar number_sort_class/numberSort.jar Sort input_numbersort out_numbersort
Result: you can see that the map phase first reaches 100%, and then the reduce phase begins.
View the HDFS file listing:
hadoop fs -ls
In the listing there is now an extra directory, out_numbersort, which is the output we just produced.
List the out_numbersort directory:
hadoop fs -ls out_numbersort
The sorted result is in part-r-00000.
7. View the results
View the sorted result:
hadoop fs -cat out_numbersort/part-r-00000
At this point, sorting with Hadoop MapReduce is complete; this exercise should help when learning more complex Hadoop applications later.
The code has been uploaded to GitHub: https://github.com/NH4L/hadoopSort/tree/master