Differences between the old and new Hadoop APIs

Starting with version 0.20.0, Hadoop provides both the old and the new MapReduce APIs. Some early 0.20 releases deprecated the old API, but it remained usable in later versions, so the 1.x and 2.x release lines support both APIs side by side.

There are several notable differences between the old and new APIs:

  • The new API tends to use abstract classes rather than interfaces, because abstract classes are easier to evolve: a method with a default implementation can be added to an abstract class without breaking existing subclasses.
  • The new API lives in the org.apache.hadoop.mapreduce package and its subpackages, while the old API remains under org.apache.hadoop.mapred.
  • The new API makes extensive use of context objects to let user code communicate with the MapReduce system. For example, the new Context essentially unifies the roles of JobConf, OutputCollector, and Reporter from the old API.
  • Both APIs push key/value pair records to the mappers and reducers, but in addition the new API allows mappers and reducers to control the flow of execution by overriding the run() method (see the sketch at the end of this section).
  • Job control in the new API is performed through the Job class; the old API's JobClient class has no counterpart in the new API.
  • The new API unifies configuration. The old API configures a job through a special JobConf object, an extension of Hadoop's Configuration object, whereas in the new API a job is configured through the Configuration class itself (see the job-submission sketch after this list).
  • The output files are named slightly differently. In the old API, both map and reduce outputs are named part-nnnnn, but in the new API a map output file is named part-m-nnnnn and a reduce output file is named part-r-nnnnn (where nnnnn is a zero-based integer designating the part number).
  • In the new API, the user-overridable methods are declared to throw java.lang.InterruptedException. This means user code can respond to interruption, allowing the framework to gracefully cancel long-running operations when necessary.
  • In the new API, the values passed to reduce() are of type java.lang.Iterable rather than java.util.Iterator, which makes it easy to iterate over them with Java's for-each loop, as the two reducer examples below show.
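
To make the job-control and configuration points concrete, here is a minimal driver sketch in the new-API style (assuming Hadoop 2.x, where Job.getInstance is available). WordCountDriver, WordCountMapper, and WordCountReducer are hypothetical names used only for illustration; the old-API equivalent is outlined in the closing comments.

---------------- Job submission: new API vs old API (sketch) ----------------
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();          // plain Configuration, no JobConf
        Job job = Job.getInstance(conf, "word count");     // Job replaces JobClient for job control
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);         // hypothetical mapper class
        job.setReducerClass(WordCountReducer.class);       // hypothetical reducer class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);  // submit and block until done
    }
}

// The old-API equivalent configures everything through JobConf and submits
// through JobClient, both in org.apache.hadoop.mapred:
//     JobConf conf = new JobConf(WordCountDriver.class);
//     conf.setJobName("word count");
//     ...
//     JobClient.runJob(conf);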
---------------- Old API: Iterator iteration ----------------------
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public static class Reduce extends MapReduceBase implements Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    public void reduce(Text key, Iterator<LongWritable> values,
            OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException {
        long count = 0;
        // the old API hands reduce() an Iterator, so it must be walked explicitly
        while (values.hasNext()) {
            count += values.next().get();
        }
        // results go out through the OutputCollector, progress through the Reporter
        output.collect(key, new LongWritable(count));
    }
}
-------------- New API: Iterable for-each iteration ---------------
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public static class FriendListReducer extends Reducer<Text, Text, Text, Text> {
    private Text friendsNames = new Text();

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        StringBuilder buffer = new StringBuilder();
        // the new API hands reduce() an Iterable, so a for-each loop works
        for (Text name : values) {
            if (buffer.length() > 0) {
                buffer.append(",");
            }
            buffer.append(name);
        }
        friendsNames.set(buffer.toString());
        // the Context object replaces both OutputCollector and Reporter
        context.write(key, friendsNames);
    }
}
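
Finally, to illustrate the run() override mentioned in the list above, here is a sketch of a new-API mapper that takes control of the record-processing loop. SamplingMapper and the limit of 1,000 records are hypothetical and serve only to illustrate the idea; note that run() is declared to throw InterruptedException, the interrupt-response mechanism described earlier.

---------------- Overriding run() in the new API (sketch) ----------------
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SamplingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    private static final LongWritable ONE = new LongWritable(1);
    private final Text line = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // trivial map body; the point of this sketch is run() below
        line.set(value.toString());
        context.write(line, ONE);
    }

    @Override
    public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        int processed = 0;
        // the default run() loops over every input record; this override stops
        // early, a flow-control choice the old API's push model cannot express
        while (processed < 1000 && context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
            processed++;
        }
        cleanup(context);
    }
}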
