07_Flink Accumulators

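1. A counter is the simplest kind of accumulator. Flink ships with several built-in ones, such as IntCounter, LongCounter, and DoubleCounter.
2. How to use a counter:
Step 1: Create the accumulator object inside your user-defined transformation function: private IntCounter numLines = new IntCounter();
Step 2: Register the accumulator object, usually in the open() method of a rich function. This is also where you give the accumulator its name: getRuntimeContext().addAccumulator("num-lines", this.numLines);
Step 3: Use the accumulator anywhere in the operator function, including in open() and close(): this.numLines.add(1);
Step 4: The merged result is stored in the JobExecutionResult returned by execute(): JobExecutionResult jobExecutionResult = env.execute("Accumulator"); jobExecutionResult.getAccumulatorResult("num-lines");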
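Behind step 4, each parallel subtask holds its own instance of a registered accumulator, and Flink combines those partial values into one job-wide result, roughly by calling merge() on them. The sketch below lets you try this locally without running a job, assuming flink-core is on the classpath; the class name BuiltInCountersDemo is only for illustration. Two IntCounter instances stand in for the same "num-lines" counter in two subtasks.

```java
import org.apache.flink.api.common.accumulators.DoubleCounter;
import org.apache.flink.api.common.accumulators.IntCounter;
import org.apache.flink.api.common.accumulators.LongCounter;

// Illustrative only: exercises built-in counters locally, outside of any Flink job.
public class BuiltInCountersDemo {

    public static void main(String[] args) {
        // Two instances standing in for the "num-lines" counter of two parallel subtasks.
        IntCounter subtask0 = new IntCounter();
        IntCounter subtask1 = new IntCounter();
        subtask0.add(3);
        subtask1.add(4);

        // merge() adds the other instance's local value into this one.
        subtask0.merge(subtask1);
        System.out.println("IntCounter: " + subtask0.getLocalValue()); // IntCounter: 7

        // LongCounter and DoubleCounter follow the same add/merge pattern.
        LongCounter bytes = new LongCounter();
        bytes.add(3_000_000_000L); // larger than Integer.MAX_VALUE
        System.out.println("LongCounter: " + bytes.getLocalValue()); // LongCounter: 3000000000

        DoubleCounter total = new DoubleCounter();
        total.add(0.5);
        total.add(1.25);
        System.out.println("DoubleCounter: " + total.getLocalValue()); // DoubleCounter: 1.75

        // resetLocal() clears only this instance's local value.
        subtask0.resetLocal();
        System.out.println("after resetLocal: " + subtask0.getLocalValue()); // after resetLocal: 0
    }
}
```

Pick the counter type by the expected range of the total: LongCounter once an int could overflow, DoubleCounter for fractional sums.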

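import org.apache.flink.api.common.JobExecutionResult;
import org.apache.flink.api.common.accumulators.IntCounter;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.examples.java.wordcount.util.WordCountData;
import org.apache.flink.util.Collector;

public class WordCount {

   public static void main(String[] args) throws Exception {

      // 1. Parse command-line arguments
      final ParameterTool params = ParameterTool.fromArgs(args);

      // 2. Set up the execution environment
      final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

      // make parameters available in the web interface
      env.getConfig().setGlobalJobParameters(params);

      // get input data
      DataSet<String> text;
      if (params.has("input")) {
         // read the text file from the given input path
         text = env.readTextFile(params.get("input"));
      } else {
         // fall back to the sample text bundled with the Flink examples
         System.out.println("Executing WordCount example with default input data set.");
         System.out.println("Use --input to specify file input.");
         text = WordCountData.getDefaultTextLineDataSet(env);
      }

      DataSet<Tuple2<String, Integer>> counts =
            // split up the lines in pairs (2-tuples) containing: (word,1)
            text.flatMap(new Tokenizer())
            // group by the tuple field "0" and sum up tuple field "1"
            .groupBy(0)
            .sum(1);

      // emit result
      JobExecutionResult jobExecutionResult;
      if (params.has("output")) {
         counts.writeAsCsv(params.get("output"), "\n", " ");
         // execute program; execute() returns the job result, including accumulators
         jobExecutionResult = env.execute("WordCount Example");
      } else {
         System.out.println("Printing result to stdout. Use --output to specify output path.");
         // print() triggers execution itself, so fetch the result of that run afterwards
         counts.print();
         jobExecutionResult = env.getLastJobExecutionResult();
      }

      // an IntCounter accumulator yields an Integer once the job has finished
      Integer numLines = jobExecutionResult.getAccumulatorResult("num-lines");
      System.out.println("num-lines: " + numLines);
   }

   public static final class Tokenizer extends RichFlatMapFunction<String, Tuple2<String, Integer>> {
      // one instance per parallel subtask; Flink merges them into the job-wide result
      private final IntCounter numLines = new IntCounter();

      @Override
      public void open(Configuration parameters) throws Exception {
         super.open(parameters);
         // register the counter under the name used later to read the result
         getRuntimeContext().addAccumulator("num-lines", this.numLines);
      }

      @Override
      public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
         // count every input line
         this.numLines.add(1);
         // normalize and split the line, e.g.
         // "To be, or not to be,--that is the question:--"
         String[] tokens = value.toLowerCase().split("\\W+");

         // emit the pairs
         for (String token : tokens) {
            if (token.length() > 0) {
               out.collect(new Tuple2<>(token, 1));
            }
         }
      }
   }
}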

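When a more complex scenario calls for it, you can write a custom accumulator by implementing either the Accumulator interface or the SimpleAccumulator interface.
Accumulator<V,R> is the most flexible: it defines a type V for the values being added and a type R for the final result. For a histogram, for example, V is a number and R is a histogram. With SimpleAccumulator both types are the same, as in a counter.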
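For the simpler case, where the values added and the final result share one type, implementing SimpleAccumulator<T> is enough. The sketch below is a hypothetical LongestWordAccumulator (not part of Flink) that keeps the longest word seen, assuming flink-core is on the classpath. Because Flink may merge subtask results in any order, ties are broken lexicographically so that the merged result stays deterministic.

```java
import org.apache.flink.api.common.accumulators.Accumulator;
import org.apache.flink.api.common.accumulators.SimpleAccumulator;

// Hypothetical custom accumulator: keeps the longest word seen so far.
public class LongestWordAccumulator implements SimpleAccumulator<String> {
    private static final long serialVersionUID = 1L;

    // the empty string means "nothing seen yet"
    private String longest = "";

    @Override
    public void add(String word) {
        if (isBetter(word, longest)) {
            longest = word;
        }
    }

    @Override
    public String getLocalValue() {
        return longest;
    }

    @Override
    public void resetLocal() {
        longest = "";
    }

    @Override
    public void merge(Accumulator<String, String> other) {
        // merging is just "offer the other instance's best word"
        add(other.getLocalValue());
    }

    @Override
    public LongestWordAccumulator clone() {
        LongestWordAccumulator copy = new LongestWordAccumulator();
        copy.longest = longest;
        return copy;
    }

    // Longer wins; on equal length the lexicographically smaller word wins,
    // so the result does not depend on the order in which subtasks are merged.
    private static boolean isBetter(String candidate, String current) {
        if (candidate.length() != current.length()) {
            return candidate.length() > current.length();
        }
        return candidate.compareTo(current) < 0;
    }

    public static void main(String[] args) {
        LongestWordAccumulator a = new LongestWordAccumulator();
        LongestWordAccumulator b = new LongestWordAccumulator();
        a.add("to");
        a.add("question");
        b.add("whether");
        b.add("slings");
        a.merge(b);
        System.out.println(a.getLocalValue()); // question
    }
}
```

In a job, it would be registered and used like the IntCounter above: getRuntimeContext().addAccumulator("longest-word", longestWord) in open() (the name "longest-word" is hypothetical), longestWord.add(token) in flatMap(), and jobExecutionResult.getAccumulatorResult("longest-word") returns a String after execution.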
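The histogram case maps onto Accumulator<V,R> with V = Integer and R = TreeMap<Integer, Integer> (value to count). Flink already ships org.apache.flink.api.common.accumulators.Histogram with exactly this shape, so real jobs would normally use that; the hypothetical IntHistogramAccumulator below is only a sketch of what implementing the interface involves, assuming flink-core is on the classpath.

```java
import org.apache.flink.api.common.accumulators.Accumulator;

import java.util.Map;
import java.util.TreeMap;

// Hypothetical custom accumulator: V = Integer (one observed value),
// R = TreeMap<Integer, Integer> (value -> number of times it was seen).
public class IntHistogramAccumulator implements Accumulator<Integer, TreeMap<Integer, Integer>> {
    private static final long serialVersionUID = 1L;

    private TreeMap<Integer, Integer> counts = new TreeMap<>();

    @Override
    public void add(Integer value) {
        // bump the count for this value, starting at 1 if it is new
        counts.merge(value, 1, Integer::sum);
    }

    @Override
    public TreeMap<Integer, Integer> getLocalValue() {
        return counts;
    }

    @Override
    public void resetLocal() {
        counts.clear();
    }

    @Override
    public void merge(Accumulator<Integer, TreeMap<Integer, Integer>> other) {
        // add the other instance's per-value counts into ours
        for (Map.Entry<Integer, Integer> e : other.getLocalValue().entrySet()) {
            counts.merge(e.getKey(), e.getValue(), Integer::sum);
        }
    }

    @Override
    public IntHistogramAccumulator clone() {
        IntHistogramAccumulator copy = new IntHistogramAccumulator();
        copy.counts = new TreeMap<>(counts);
        return copy;
    }

    public static void main(String[] args) {
        // word lengths observed by two parallel subtasks
        IntHistogramAccumulator subtask0 = new IntHistogramAccumulator();
        IntHistogramAccumulator subtask1 = new IntHistogramAccumulator();
        subtask0.add(3);
        subtask0.add(5);
        subtask0.add(3);
        subtask1.add(5);
        subtask1.add(2);
        subtask0.merge(subtask1);
        System.out.println(subtask0.getLocalValue()); // {2=1, 3=2, 5=2}
    }
}
```

The value type V (Integer) and result type R (TreeMap) differ here, which is exactly what SimpleAccumulator cannot express. R must be Serializable because the merged result is shipped back to the client; TreeMap satisfies that.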

Reposted from blog.csdn.net/qq285016127/article/details/105135908