Accumulator (Part 2)

Accumulator

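Use case: define a shared variable on the Driver and accumulate data into it. Updating an ordinary variable inside an iterating operator such as foreach or map does not return the accumulated result to the Driver, because the accumulation itself runs on the Executors. Accumulators are typically used for counting, and the variable is declared on the Driver.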
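The difference between an ordinary variable and an accumulator can be sketched as follows. This is a minimal illustration, assuming a local[2] master; the object name PlainCounterPitfall, the method name run, and the accumulator name "accSum" are hypothetical and chosen only for this example. It uses sc.longAccumulator, which creates and registers a LongAccumulator in one call.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PlainCounterPitfall {

  // Returns (plain variable after the job, accumulator value after the job).
  def run(sc: SparkContext): (Long, Long) = {
    val nums = sc.parallelize(1 to 9, 2)

    // Ordinary Driver-side variable: the closure is serialized and shipped
    // to the tasks, so each task mutates its own copy, not this one.
    var plainSum = 0L
    nums.foreach(x => plainSum += x)

    // Accumulator: updates made inside tasks are merged back on the Driver.
    val accSum = sc.longAccumulator("accSum")
    nums.foreach(x => accSum.add(x))

    val total: Long = accSum.value
    (plainSum, total)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PlainCounterPitfall").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (plain, acc) = run(sc)
      println(s"plain variable: $plain") // not reliable; do not depend on this value
      println(s"accumulator: $acc")      // 45
    } finally {
      sc.stop()
    }
  }
}
```

The value of the plain variable is deliberately left without an expected output: it depends on where the closure runs, so it should not be relied on.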

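Characteristics: the variable lives on the Driver, while the accumulation happens on the Executors. During accumulation, Executor-side code cannot read the accumulated value, since each task only updates its own local copy; the merged result can only be read on the Driver.
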
Usage:
1. Create an instance of an Accumulator
2. Register the accumulator with sc.register()
3. Add data with <instance name>.add
4. Read the accumulated value with <instance name>.value
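The four steps above can be sketched in a small counting example. This is only an illustration under stated assumptions: the object name AccumulatorSteps, the method countEvens, and the accumulator name "evenCount" are hypothetical, and the caller is assumed to supply an already created SparkContext.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.util.LongAccumulator

object AccumulatorSteps {

  // Counts the even numbers in `data` using a LongAccumulator.
  def countEvens(sc: SparkContext, data: Seq[Int]): Long = {
    val acc = new LongAccumulator          // step 1: create the instance
    sc.register(acc, "evenCount")          // step 2: register it with the SparkContext
    sc.parallelize(data, 2).foreach { x =>
      if (x % 2 == 0) acc.add(1L)          // step 3: add inside the tasks
    }
    val total: Long = acc.value            // step 4: read the value on the Driver
    total
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("AccumulatorSteps").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(countEvens(sc, 1 to 9)) // 4
    } finally {
      sc.stop()
    }
  }
}
```

Steps 1 and 2 can also be combined with sc.longAccumulator(name), which creates and registers the accumulator in one call.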

import org.apache.spark.util.{DoubleAccumulator, LongAccumulator}
import org.apache.spark.{SparkConf, SparkContext}

object AccumlatorV2Demo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName(this.getClass.getName).setMaster("local[2]")
    val sc = new SparkContext(conf)
    val nums1 = sc.parallelize(List(1, 2, 3, 4, 5, 6, 7, 8, 9), 2)
    val nums2 = sc.parallelize(List(1.2, 2.4, 3.4, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0), 2)

    // Create a long accumulator and register it under the given name
    def longAcc(name: String): LongAccumulator = {
      val acc = new LongAccumulator
      sc.register(acc, name)
      acc
    }

    // Create a double accumulator and register it under the given name
    def doubleAcc(name: String): DoubleAccumulator = {
      val acc = new DoubleAccumulator
      sc.register(acc, name)
      acc
    }

    // The additions run inside the tasks on the Executors
    val acc1: LongAccumulator = longAcc("LongAccumulator")
    nums1.foreach(x => acc1.add(x))
    val acc2: DoubleAccumulator = doubleAcc("DoubleAccumulator")
    nums2.foreach(x => acc2.add(x))

    // The merged values are read on the Driver
    println(acc1.value)
    println(acc2.value)

    sc.stop()
  }
}
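For counting scenarios, the built-in LongAccumulator also exposes count, sum, and avg in addition to value. The sketch below is an illustration only; the object name AccumulatorStats, the method stats, and the accumulator name "stats" are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AccumulatorStats {

  // Returns (number of elements added, their sum, their average).
  def stats(sc: SparkContext, data: Seq[Int]): (Long, Long, Double) = {
    val acc = sc.longAccumulator("stats")
    sc.parallelize(data, 2).foreach(x => acc.add(x))
    // All three are read on the Driver after the action has finished.
    (acc.count, acc.sum, acc.avg)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("AccumulatorStats").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(stats(sc, 1 to 9)) // (9,45,5.0)
    } finally {
      sc.stop()
    }
  }
}
```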
