Accumulator (1)


Application scenario: the Driver side defines a shared variable and tasks accumulate data into it. If you simply update a local variable inside iterative operators such as foreach or map, the accumulated result never reaches the Driver, because the accumulation happens on the Executor side, where each task works on its own copy of the variable. Accumulators are typically used for counting, and the variable is declared on the Driver side.

Features: the variable lives on the Driver side, while the accumulation happens on the Executor side. During accumulation, the Executor side cannot read the accumulator's value; it can only be read on the Driver side.
Custom accumulator usage (the code example further below uses a built-in, non-custom accumulator):
1. Create an accumulator instance
2. Register the accumulator via sc.register()
3. Add data via the accumulator instance's add method
4. Read the accumulated result on the Driver via the instance's .value
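The four steps above can be sketched with a custom accumulator built on Spark's AccumulatorV2 API (Spark 2.x+). The class name IntSumAccumulator and the registered name "intSum" are illustrative, not from the original post:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.util.AccumulatorV2

// Illustrative custom accumulator that sums Ints
class IntSumAccumulator extends AccumulatorV2[Int, Int] {
  private var _sum = 0
  override def isZero: Boolean = _sum == 0
  override def copy(): IntSumAccumulator = {
    val acc = new IntSumAccumulator
    acc._sum = _sum
    acc
  }
  override def reset(): Unit = _sum = 0
  override def add(v: Int): Unit = _sum += v
  override def merge(other: AccumulatorV2[Int, Int]): Unit = _sum += other.value
  override def value: Int = _sum
}

object CustomAccumulatorDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("CustomAccumulatorDemo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val acc = new IntSumAccumulator          // 1. create the instance on the Driver
    sc.register(acc, "intSum")               // 2. register it with the SparkContext
    sc.parallelize(List(1, 2, 3, 4, 5, 6))
      .foreach(x => acc.add(x))              // 3. accumulate on the Executor side
    println(acc.value)                       // 4. read the value on the Driver side
    sc.stop()
  }
}
```

Spark calls copy(), reset(), and merge() internally: each task gets a fresh copy, and the per-task results are merged back into the Driver-side instance.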

import org.apache.spark.{SparkConf, SparkContext}

object AccumulatorDemo {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf()
    conf.setAppName(this.getClass.getName).setMaster("local[2]")
    val sc = new SparkContext(conf)
    val numsRdd = sc.parallelize(List(1, 2, 3, 4, 5, 6))
    //var sum: Int = 0
    //numsRdd.map(x => sum += x)     // sum is still 0 on the Driver
    //numsRdd.foreach(x => sum += x) // sum is still 0 on the Driver
    //sum = numsRdd.reduce(_ + _)    // 21
    // use an accumulator to aggregate values into the shared variable
    val sum = sc.accumulator(0)
    numsRdd.foreach(x => sum += x)
    println(sum) // 21
    sc.stop()
  }
}
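Note that sc.accumulator(0) is deprecated since Spark 2.0. A minimal sketch of the same program using the built-in sc.longAccumulator instead (the accumulator name "sum" is illustrative); it is registered automatically, so no sc.register() call is needed:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LongAccumulatorDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("LongAccumulatorDemo").setMaster("local[2]")
    val sc = new SparkContext(conf)
    // built-in Long accumulator, created and registered on the Driver
    val sum = sc.longAccumulator("sum")
    sc.parallelize(List(1, 2, 3, 4, 5, 6))
      .foreach(x => sum.add(x))   // accumulation runs on the Executor side
    println(sum.value)            // read on the Driver side
    sc.stop()
  }
}
```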


Origin blog.csdn.net/qq_42706464/article/details/108440398