Hadoop Tool and ToolRunner

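Copyright notice: this is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Article link: https://blog.csdn.net/answer100answer/article/details/98864269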

https://blog.csdn.net/jediael_lu/article/details/76902739

1. Tool / Configurable / Configured

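The Tool interface supports the handling of generic command-line options.
Tool represents any Map-Reduce tool or application. A tool should delegate the handling of standard command-line options to ToolRunner.run(Tool, String[]) and handle only its own custom arguments.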

The Tool interface extends Configurable and declares just one method.

public interface Tool extends Configurable {
  /**
   * Execute the command with the given arguments.
   * 
   * @param args command specific arguments.
   * @return exit code.
   * @throws Exception
   */
  int run(String [] args) throws Exception;
}

The source of Configurable is as follows:

public interface Configurable {
  void setConf(Configuration conf);
  Configuration getConf();
}

A Tool implementation can therefore be used to print every configuration property, as in the following custom program, ConfigurationPrinter:

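import java.util.Map.Entry;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ConfigurationPrinter extends Configured implements Tool {

  // Load the HDFS and MapReduce resources in addition to the core defaults
  static {
    Configuration.addDefaultResource("hdfs-default.xml");
    Configuration.addDefaultResource("hdfs-site.xml");
    Configuration.addDefaultResource("mapred-default.xml");
    Configuration.addDefaultResource("mapred-site.xml");
  }

  @Override
  public int run(String[] args) throws Exception {
    // The Configuration was set by ToolRunner before run() was called
    Configuration conf = getConf();
    for (Entry<String, String> entry : conf) {
      System.out.printf("%s=%s\n", entry.getKey(), entry.getValue());
    }
    return 0;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new ConfigurationPrinter(), args);
    System.exit(exitCode);
  }
}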

Next, look at Configured:

public class Configured implements Configurable {

  private Configuration conf;

  public Configured() { this(null); }
  public Configured(Configuration conf) {
    setConf(conf);
  }
  @Override
  public void setConf(Configuration conf) {
    this.conf = conf;
  }
  @Override
  public Configuration getConf() {
    return conf;
  }
}

Configured.java simply stores the Configuration and exposes it through setConf() and getConf().
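One consequence of the listing above is worth spelling out: an object created through the no-argument constructor starts with a null configuration, and getConf() keeps returning null until something calls setConf(...). The sketch below reproduces that behaviour so it can be run without Hadoop on the classpath. MiniConfigured and ConfiguredSketch are hypothetical stand-ins written for this illustration, with a Map playing the role of Configuration; they are not Hadoop classes.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfiguredSketch {

  // Hypothetical stand-in that mirrors the shape of Configured.
  // A Map<String, String> is used in place of Hadoop's Configuration.
  public static class MiniConfigured {
    private Map<String, String> conf;

    public MiniConfigured() { this(null); }

    public MiniConfigured(Map<String, String> conf) { setConf(conf); }

    public void setConf(Map<String, String> conf) { this.conf = conf; }

    public Map<String, String> getConf() { return conf; }
  }

  public static void main(String[] args) {
    MiniConfigured tool = new MiniConfigured();
    // Nothing has supplied a configuration yet
    System.out.println(tool.getConf() == null);       // prints "true"

    Map<String, String> conf = new HashMap<>();
    conf.put("color", "yellow");
    tool.setConf(conf);
    // After setConf(), the same object is handed back by getConf()
    System.out.println(tool.getConf().get("color"));  // prints "yellow"
  }
}
```

This is why a Tool normally reads its configuration inside run() via getConf() rather than in its constructor: at construction time there may be nothing to read yet.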

A typical Tool implementation therefore extends Configured and implements Tool, so that only the run method has to be written.

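import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyApp extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // Configuration processed by ToolRunner
    Configuration conf = getConf();

    // Create a JobConf using the processed conf
    JobConf job = new JobConf(conf, MyApp.class);

    // Process custom command-line options. ToolRunner has already removed
    // the generic options, so the remaining arguments start at index 0.
    Path in = new Path(args[0]);
    Path out = new Path(args[1]);

    // Specify various job-specific parameters.
    // MyMapper and MyReducer are the application's own classes (not shown).
    job.setJobName("my-app");
    FileInputFormat.setInputPaths(job, in);
    FileOutputFormat.setOutputPath(job, out);
    job.setMapperClass(MyMapper.class);
    job.setReducerClass(MyReducer.class);

    // Submit the job, then poll for progress until the job is complete
    RunningJob runningJob = JobClient.runJob(job);
    return runningJob.isSuccessful() ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    // Let ToolRunner handle generic command-line options
    int res = ToolRunner.run(new Configuration(), new MyApp(), args);
    System.exit(res);
  }
}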

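As the code above shows, the typical way to use ToolRunner is:

  1. Define a class that extends Configured and implements the Tool interface. Configured supplies the getConf() and setConf() methods, and Tool declares the run() method.
  2. In main(), call ToolRunner.run(...), which in turn invokes that class's run(String[]) method.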

2. ToolRunner

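  1. ToolRunner has no inheritance or implementation relationship with the classes and interfaces above; it extends only Object and implements no interface.
  2. ToolRunner makes it easy to run any class that implements the Tool interface: it calls the class's run(String[]) method and uses GenericOptionsParser to handle Hadoop's generic command-line options.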

An outline of the ToolRunner class:

public class ToolRunner {

  public static int run(Configuration conf, Tool tool, String[] args)
      throws Exception {
    ...
  }

  public static int run(Tool tool, String[] args) throws Exception {
    return run(tool.getConf(), tool, args);
  }
  
  public static void printGenericCommandUsage(PrintStream out) { ... }
  
  public static boolean confirmPrompt(String prompt)  { ...  }

}

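ToolRunner does two things:

(1) It provides the Tool with a Configuration object, creating a new one when the conf passed in is null.
(2) It parses the generic command-line options into that Configuration, so the program can read them conveniently through getConf().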

The run method is as follows:

  public static int run(Configuration conf, Tool tool, String[] args) 
    throws Exception{
    if(conf == null) {
      conf = new Configuration();
    }
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    //set the configuration back, so that Tool can configure itself
    tool.setConf(conf);
    
    //get the args w/o generic hadoop args
    String[] toolArgs = parser.getRemainingArgs();
    return tool.run(toolArgs);
  }
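To make the flow concrete, here is a minimal self-contained sketch that mimics the steps of run(...) without depending on Hadoop: create a configuration if none was given, move the generic options into it, hand it to the tool, then call the tool with whatever arguments remain. MiniToolRunner, MiniTool and EchoTool are hypothetical names invented for this illustration, a Map stands in for Configuration, and only the `-D key=value` form is recognized. The real GenericOptionsParser supports more options and has its own parsing rules, so treat this as a rough model rather than the actual implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT Hadoop classes.
public class MiniToolRunner {

  /** Stand-in for Tool + Configurable. The real Tool.run declares "throws Exception". */
  public interface MiniTool {
    void setConf(Map<String, String> conf);
    Map<String, String> getConf();
    int run(String[] args);
  }

  /** Mirrors the shape of ToolRunner.run(conf, tool, args). */
  public static int run(Map<String, String> conf, MiniTool tool, String[] args) {
    if (conf == null) {
      conf = new HashMap<>();
    }
    // Simplified option parsing: only "-D key=value" pairs are handled
    List<String> remaining = new ArrayList<>();
    for (int i = 0; i < args.length; i++) {
      if ("-D".equals(args[i]) && i + 1 < args.length) {
        String[] kv = args[++i].split("=", 2);
        conf.put(kv[0], kv.length > 1 ? kv[1] : "");
      } else {
        remaining.add(args[i]);
      }
    }
    // Set the configuration back, so that the tool can configure itself
    tool.setConf(conf);
    // Call the tool with the arguments that were not consumed as options
    return tool.run(remaining.toArray(new String[0]));
  }

  /** Example tool that records and prints what it receives. */
  public static class EchoTool implements MiniTool {
    private Map<String, String> conf;
    public String[] seenArgs;

    @Override public void setConf(Map<String, String> conf) { this.conf = conf; }
    @Override public Map<String, String> getConf() { return conf; }

    @Override
    public int run(String[] args) {
      seenArgs = args;
      System.out.println("color=" + conf.get("color") + " args=" + String.join(",", args));
      return 0;
    }
  }

  public static void main(String[] args) {
    int exit = run(null, new EchoTool(), new String[] {"-D", "color=yellow", "in", "out"});
    System.out.println("exit=" + exit);
  }
}
```

Running main prints `color=yellow args=in,out` followed by `exit=0`: the tool never sees the `-D` pair in its argument array, yet the value is available through its configuration.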
