Hadoop's OutputFormat

I. Overview

  1. At the end of the MapReduce pipeline, the OutputFormat class determines how the Reducer's output is written.
  2. Hadoop itself provides a number of built-in OutputFormat implementations.
  3. If you do not explicitly specify one, the default is TextOutputFormat.

 

II. Common Subclasses

  1. TextOutputFormat - writes plain text files, one key-value pair per line, with key and value separated by a tab
  2. SequenceFileOutputFormat - writes key-value pairs in Hadoop's compressible binary SequenceFile format
    1. SequenceFileAsBinaryOutputFormat - writes raw binary keys and values into a SequenceFile
  3. MapFileOutputFormat - writes MapFiles, i.e. sorted output with a partial index over the keys
  4. MultipleOutputFormat - abstract class that lets the output file be chosen per key-value pair
    1. MultipleTextOutputFormat - writes multiple standard line-oriented, tab-delimited text files
    2. MultipleSequenceFileOutputFormat - writes multiple compressed SequenceFiles
  5. DBOutputFormat - writes the output to a specified database table
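If the default TextOutputFormat is not what you want, one of these subclasses can be selected explicitly in the driver. The fragment below is a minimal sketch (the class name and paths are hypothetical, not from the original post):

```java
// Sketch: a driver that writes SequenceFile output instead of the default text files.
// The driver class name and the input/output paths are hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class SeqOutDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "seqfile-demo");
        job.setJarByClass(SeqOutDriver.class);
        // Replace the default TextOutputFormat with the binary SequenceFile format
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        // Optionally enable block-level compression of the output
        SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);
        FileInputFormat.setInputPaths(job, new Path("/in"));
        FileOutputFormat.setOutputPath(job, new Path("/out"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The mapper and reducer output types must match what the SequenceFile stores, exactly as with any other OutputFormat.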

 

III. Custom Output Formats

Outline

  1. All OutputFormat implementations inherit, directly or indirectly, from the abstract class OutputFormat.
  2. The OutputFormat abstract class defines the following abstract methods: getRecordWriter(TaskAttemptContext context), checkOutputSpecs(JobContext context) and getOutputCommitter(TaskAttemptContext context).
  3. If the output destination is a file, you can inherit from FileOutputFormat. This class already implements checkOutputSpecs and getOutputCommitter, and requires subclasses to implement only the abstract getRecordWriter method.
  4. If you need more sophisticated logic, you can also override checkOutputSpecs and getOutputCommitter yourself.
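Point 3 above can be sketched as follows. This is a minimal illustration, not code from the original post: the format class, the field separator, and all names are hypothetical. A FileOutputFormat subclass only has to supply getRecordWriter:

```java
// Sketch of a custom FileOutputFormat: only getRecordWriter is implemented;
// checkOutputSpecs and getOutputCommitter are inherited from FileOutputFormat.
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PipeOutputFormat extends FileOutputFormat<Text, Text> {

    @Override
    public RecordWriter<Text, Text> getRecordWriter(TaskAttemptContext context)
            throws IOException, InterruptedException {
        // getDefaultWorkFile yields this task's output file under the job's output directory
        Path file = getDefaultWorkFile(context, "");
        FileSystem fs = file.getFileSystem(context.getConfiguration());
        final FSDataOutputStream out = fs.create(file, false);

        return new RecordWriter<Text, Text>() {
            @Override
            public void write(Text key, Text value) throws IOException {
                // Write "key|value" records instead of the default tab separator
                out.write((key.toString() + "|" + value.toString() + "\n").getBytes("UTF-8"));
            }

            @Override
            public void close(TaskAttemptContext context) throws IOException {
                out.close();
            }
        };
    }
}
```

The driver would select it with job.setOutputFormatClass(PipeOutputFormat.class).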

 

IV. Multiple Outputs

The MultipleOutputs class lets a Reducer write to several named output files within one job. The example below splits score records by the first letter of the name:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MoutMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Each input line is "name score"; emit the name as key and the score as value
        String[] arr = value.toString().split(" ");
        context.write(new Text(arr[0]), new Text(arr[1]));
    }
}
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class MoutReducer extends Reducer<Text, Text, Text, Text> {

    private MultipleOutputs<Text, Text> mo;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        mo = new MultipleOutputs<>(context);
    }

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        String name = key.toString();
        Text value = values.iterator().next();
        // Names starting with A-I go to the "a2i" output, the rest to "j2z"
        if (name.charAt(0) <= 'I')
            mo.write("a2i", key, value);
        else
            mo.write("j2z", key, value);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // MultipleOutputs must be closed, or buffered output may be lost
        mo.close();
    }
}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MoutDriver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "JobName");
        job.setJarByClass(cn.tedu.multiout.MoutDriver.class);
        job.setMapperClass(MoutMapper.class);
        job.setReducerClass(MoutReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(job, new Path("hdfs://192.168.32.147:9000/txt/score2.txt"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.32.147:9000/result4"));

        // Register the two named outputs referenced by MoutReducer
        MultipleOutputs.addNamedOutput(job, "a2i", TextOutputFormat.class, Text.class, Text.class);
        MultipleOutputs.addNamedOutput(job, "j2z", TextOutputFormat.class, Text.class, Text.class);

        if (!job.waitForCompletion(true))
            return;
    }
}
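One subtlety in the reducer above: the test `name.charAt(0) <= 'I'` compares character codes, so only keys beginning with an uppercase letter up to 'I' reach the "a2i" output; every lowercase name sorts after 'I' in ASCII and lands in "j2z". The routing predicate can be checked on its own (the class name here is hypothetical):

```java
// Standalone check of the routing predicate used in MoutReducer.
public class RoutingDemo {
    public static String route(String name) {
        // Same comparison as the reducer: first char <= 'I' goes to "a2i"
        return name.charAt(0) <= 'I' ? "a2i" : "j2z";
    }

    public static void main(String[] args) {
        System.out.println(route("Alice")); // a2i  ('A' = 65 <= 'I' = 73)
        System.out.println(route("John"));  // j2z  ('J' = 74 > 73)
        System.out.println(route("bob"));   // j2z  ('b' = 98 > 73)
    }
}
```

If case-insensitive routing is intended, the key should be upper-cased before the comparison.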

 


Origin blog.csdn.net/yang134679/article/details/93781347