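1. A counter is the simplest kind of accumulator. Flink ships with several built-in accumulators, such as IntCounter, LongCounter, and DoubleCounter.
2. How to use a counter:
Step 1: Create the accumulator object in your user-defined transformation function: private IntCounter numLines = new IntCounter();
Step 2: Register the accumulator object, typically in the open() method of a rich function. This is also where you give the accumulator its name: getRuntimeContext().addAccumulator("num-lines", this.numLines);
Step 3: Use the accumulator anywhere in the operator function, including the open() and close() methods: this.numLines.add(1);
Step 4: Once the job has finished, read the result from the JobExecutionResult returned by execute(): JobExecutionResult jobExecutionResult = env.execute("Accumulator"); jobExecutionResult.getAccumulatorResult("num-lines");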
public static void main(String[] args) throws Exception {
    // 1. parse the command-line arguments
    final ParameterTool params = ParameterTool.fromArgs(args);
    // 2. set up the execution environment
    final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    // make parameters available in the web interface
    env.getConfig().setGlobalJobParameters(params);
    // get input data
    DataSet<String> text;
    if (params.has("input")) {
        // read the text file from given input path
        text = env.readTextFile(params.get("input"));
    } else {
        // get default test text data
        System.out.println("Executing WordCount example with default input data set.");
        System.out.println("Use --input to specify file input.");
        text = WordCountData.getDefaultTextLineDataSet(env);
    }
    DataSet<Tuple2<String, Integer>> counts =
        // split up the lines in pairs (2-tuples) containing: (word,1)
        text.flatMap(new Tokenizer())
            // group by the tuple field "0" and sum up tuple field "1"
            .groupBy(0)
            .sum(1);
    // emit result
    JobExecutionResult jobExecutionResult;
    if (params.has("output")) {
        counts.writeAsCsv(params.get("output"), "\n", " ");
        // execute program
        jobExecutionResult = env.execute("WordCount Example");
    } else {
        System.out.println("Printing result to stdout. Use --output to specify output path.");
        // print() triggers execution itself, so fetch the result of that run afterwards
        counts.print();
        jobExecutionResult = env.getLastJobExecutionResult();
    }
    // read the accumulator registered in Tokenizer and print it
    Integer numLines = jobExecutionResult.getAccumulatorResult("num-lines");
    System.out.println("num-lines: " + numLines);
}
public static final class Tokenizer extends RichFlatMapFunction<String, Tuple2<String, Integer>> {
    // counts the input lines seen by this function
    private IntCounter numLines = new IntCounter();

    @Override
    public void open(Configuration parameters) throws Exception {
        // register the accumulator under the name used to look it up after the job
        getRuntimeContext().addAccumulator("num-lines", this.numLines);
        super.open(parameters);
    }

    @Override
    public void close() throws Exception {
        super.close();
    }

    @Override
    public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
        // one call per input line, so this counts lines
        this.numLines.add(1);
        // normalize and split the line,
        // e.g. "To be, or not to be,--that is the question:--"
        String[] tokens = value.toLowerCase().split("\\W+");
        // emit the pairs
        for (String token : tokens) {
            if (token.length() > 0) {
                out.collect(new Tuple2<>(token, 1));
            }
        }
    }
}
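For more complex scenarios you can write a custom accumulator by implementing either the Accumulator interface or the SimpleAccumulator interface.
Accumulator<V,R> is the most flexible. It defines a type V for the values being added and a type R for the final result. For a histogram, for example, V is a number and R is a histogram. With SimpleAccumulator both types are the same, as is the case for counters.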
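To make the difference between V and R concrete, here is a minimal sketch of a histogram-style accumulator. So that it runs without Flink on the classpath, it declares a local stand-in interface, AccumulatorSketch, that mirrors the core methods of Flink's Accumulator<V,R>. Both AccumulatorSketch and HistogramSketch are hypothetical names made up for this illustration, not Flink classes.

```java
import java.io.Serializable;
import java.util.Map;
import java.util.TreeMap;

// Local stand-in (an assumption of this sketch) mirroring the core methods of
// org.apache.flink.api.common.accumulators.Accumulator<V, R>.
interface AccumulatorSketch<V, R extends Serializable> extends Serializable {
    void add(V value);                          // fold one value into the local state
    R getLocalValue();                          // current local (partial) result
    void resetLocal();                          // clear the local state
    void merge(AccumulatorSketch<V, R> other);  // combine with another partial result
}

// V = Integer (each observed value), R = TreeMap<Integer, Integer> (value -> count).
class HistogramSketch implements AccumulatorSketch<Integer, TreeMap<Integer, Integer>> {
    private static final long serialVersionUID = 1L;

    private final TreeMap<Integer, Integer> counts = new TreeMap<>();

    @Override
    public void add(Integer value) {
        // increment the count for this value, starting from 1 if unseen
        counts.merge(value, 1, Integer::sum);
    }

    @Override
    public TreeMap<Integer, Integer> getLocalValue() {
        return counts;
    }

    @Override
    public void resetLocal() {
        counts.clear();
    }

    @Override
    public void merge(AccumulatorSketch<Integer, TreeMap<Integer, Integer>> other) {
        // add the other partial histogram's counts into this one
        for (Map.Entry<Integer, Integer> e : other.getLocalValue().entrySet()) {
            counts.merge(e.getKey(), e.getValue(), Integer::sum);
        }
    }
}
```

In an actual job you would implement Flink's own Accumulator interface instead of the stand-in (the real interface also expects a clone() implementation) and register the object with getRuntimeContext().addAccumulator(...) just like the IntCounter above. The merge step matters because each parallel instance keeps its own partial result, and the partial results are combined into the final one.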