I. Overview
- At the end of a MapReduce job, the OutputFormat class determines how the Reducer's output is written
- Hadoop itself provides a number of built-in OutputFormat implementations
- If none is specified explicitly, the default is TextOutputFormat
II. Common subclasses
- TextOutputFormat - writes one record per line as plain text, with the key and value separated by a tab
- SequenceFileOutputFormat - writes key-value pairs as a binary SequenceFile, which can be compressed
- SequenceFileAsBinaryOutputFormat - writes keys and values to a SequenceFile in raw binary form
- MapFileOutputFormat - writes MapFiles, in which a portion of the keys serves as an index
- MultipleOutputFormat - abstract class that chooses the output file based on the key and value being written
- MultipleTextOutputFormat - writes multiple files in the standard line-oriented, tab-delimited text format
- MultipleSequenceFileOutputFormat - writes multiple files in the compressed SequenceFile format
- DBOutputFormat - writes data to a specified database table
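To make the tab-delimited layout of TextOutputFormat concrete, here is a minimal plain-Java sketch of how one record becomes one line. It is not Hadoop's code: the class TextLineSketch and the method formatRecord are hypothetical names, and the handling of a missing key or value is an assumption modelled on the behaviour of Hadoop's line writer as I understand it.

```java
/** Hypothetical illustration only; this is not a Hadoop class. */
final class TextLineSketch {
    /** TextOutputFormat separates key and value with a tab by default. */
    static final String DEFAULT_SEPARATOR = "\t";

    /**
     * Formats one record as a single output line.
     * Assumption: a missing key or value is skipped together with the
     * separator, and a record with neither produces no output at all.
     */
    static String formatRecord(Object key, Object value, String separator) {
        boolean hasKey = key != null;
        boolean hasValue = value != null;
        if (!hasKey && !hasValue) {
            return "";
        }
        StringBuilder line = new StringBuilder();
        if (hasKey) {
            line.append(key);
        }
        if (hasKey && hasValue) {
            line.append(separator);
        }
        if (hasValue) {
            line.append(value);
        }
        return line.append('\n').toString();
    }

    public static void main(String[] args) {
        // Prints the key, a tab, the value, and a newline.
        System.out.print(formatRecord("alice", 90, DEFAULT_SEPARATOR));
    }
}
```

In a real job the separator is taken from the job configuration rather than passed as an argument; the parameter here only keeps the sketch self-contained.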
III. Custom output formats
Overview
- Every output format inherits, directly or indirectly, from the abstract class OutputFormat
- The OutputFormat abstract class defines the following abstract methods: getRecordWriter(TaskAttemptContext context), checkOutputSpecs(JobContext context) and getOutputCommitter(TaskAttemptContext context)
- If the output destination is a file, you can extend FileOutputFormat; this class implements checkOutputSpecs and getOutputCommitter and leaves getRecordWriter() abstract for subclasses to implement
- If you need more sophisticated logic, you can also override getOutputCommitter and checkOutputSpecs yourself
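The division of labour described above (a base class that supplies the shared checks, and a subclass that supplies only the record writer) can be sketched in plain Java without Hadoop on the classpath. Everything below is a hypothetical stand-in: SketchRecordWriter, SketchFileOutputFormat and KeyEqualsValueFormat are invented names that only mirror the shape of the Hadoop classes, and the "key=value" line format is an arbitrary choice for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for Hadoop's RecordWriter (not the real class). */
interface SketchRecordWriter<K, V> {
    void write(K key, V value);
    void close();
}

/** Hypothetical stand-in for the role FileOutputFormat plays. */
abstract class SketchFileOutputFormat<K, V> {
    /** Shared check: refuse to run if the output directory already exists. */
    void checkOutputSpecs(Path outputDir) {
        if (Files.exists(outputDir)) {
            throw new IllegalStateException("Output directory " + outputDir + " already exists");
        }
    }

    /** The one method each concrete format has to supply. */
    abstract SketchRecordWriter<K, V> getRecordWriter(Path outputDir, String fileName);
}

/** Example subclass: writes one "key=value" line per record. */
final class KeyEqualsValueFormat extends SketchFileOutputFormat<String, String> {
    @Override
    SketchRecordWriter<String, String> getRecordWriter(Path outputDir, String fileName) {
        final BufferedWriter out;
        try {
            Files.createDirectories(outputDir);
            out = Files.newBufferedWriter(outputDir.resolve(fileName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new SketchRecordWriter<String, String>() {
            @Override
            public void write(String key, String value) {
                try {
                    out.write(key + "=" + value + "\n");
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }

            @Override
            public void close() {
                try {
                    out.close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        };
    }
}
```

A caller would run checkOutputSpecs once before the job starts, then obtain a writer per task and close it when the task finishes. A real custom format would instead extend FileOutputFormat and return a subclass of Hadoop's RecordWriter from getRecordWriter(TaskAttemptContext).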
IV. Multiple outputs
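// MoutMapper.java: emits the first two space-separated fields of each line as key and value.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MoutMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] arr = value.toString().split(" ");
        if (arr.length < 2 || arr[0].isEmpty()) {
            return; // skip malformed lines instead of failing the task
        }
        context.write(new Text(arr[0]), new Text(arr[1]));
    }
}

// MoutReducer.java: routes each key to one of two named outputs.
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class MoutReducer extends Reducer<Text, Text, Text, Text> {
    private MultipleOutputs<Text, Text> mo;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        mo = new MultipleOutputs<>(context);
    }

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        // Only the first value of each key is written.
        Text value = values.iterator().next();
        // Keys whose first character is at most 'I' go to "a2i"; all others go to "j2z".
        String target = key.toString().charAt(0) <= 'I' ? "a2i" : "j2z";
        mo.write(target, key, value);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mo.close(); // flush and close every named output
    }
}

// MoutDriver.java: configures the job and registers the two named outputs.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MoutDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "JobName");
        job.setJarByClass(MoutDriver.class);
        job.setMapperClass(MoutMapper.class);
        job.setReducerClass(MoutReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job, new Path("hdfs://192.168.32.147:9000/txt/score2.txt"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.32.147:9000/result4"));
        // Each name registered here must match a name passed to mo.write() in the reducer.
        MultipleOutputs.addNamedOutput(job, "a2i", TextOutputFormat.class, Text.class, Text.class);
        MultipleOutputs.addNamedOutput(job, "j2z", TextOutputFormat.class, Text.class, Text.class);
        // Exit with a non-zero status if the job fails.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}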
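The routing rule used by the reducer can be tried out without a cluster. The following plain-Java sketch applies the same first-character comparison to a few sample names and groups them under the file names I would expect MultipleOutputs to produce for a single reducer. The class NamedOutputRoutingSketch, its methods, and the sample names are hypothetical, and the name-r-nnnnn file naming is stated as an assumption about Hadoop's convention for reduce-side named outputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; this is not a Hadoop class. */
final class NamedOutputRoutingSketch {
    /** Same comparison as the reducer: first character at most 'I' goes to "a2i". */
    static String namedOutputFor(String name) {
        return name.charAt(0) <= 'I' ? "a2i" : "j2z";
    }

    /** Assumed naming of a reduce-side named output file: name-r-nnnnn. */
    static String reduceFileName(String namedOutput, int partition) {
        return String.format(Locale.ROOT, "%s-r-%05d", namedOutput, partition);
    }

    /** Groups names by the file they would land in, assuming one reducer (partition 0). */
    static Map<String, List<String>> route(List<String> names) {
        Map<String, List<String>> files = new TreeMap<>();
        for (String name : names) {
            String file = reduceFileName(namedOutputFor(name), 0);
            files.computeIfAbsent(file, f -> new ArrayList<>()).add(name);
        }
        return files;
    }

    public static void main(String[] args) {
        // Prints {a2i-r-00000=[Alice, Ivan], j2z-r-00000=[John, Zoe]}
        System.out.println(route(Arrays.asList("Alice", "Ivan", "John", "Zoe")));
    }
}
```

Note that the comparison is case-sensitive: a lowercase name such as "alice" sorts after 'I' and would be routed to "j2z". If the input may contain lowercase names, normalising the first character before comparing is one option.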