Spark custom accumulator (1)

Custom Accumulator (summing type)

Application scenario: A shared variable is defined on the driver, and data is accumulated into it. If you update an ordinary variable directly inside operators such as foreach or map, the accumulated result is not returned to the driver, because the accumulation takes place on the executors. Accumulators are generally used for counting, with the variable declared on the driver.

Features: The variable lives on the driver, and the accumulation happens on the executors. During accumulation, the executors cannot read its value; the value can only be read on the driver.
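
To make the difference concrete, here is a minimal sketch that counts the same elements twice: once with an ordinary driver-side `var`, and once with Spark's built-in `LongAccumulator` created through `sc.longAccumulator`. The object name `PlainVarVsAccumulator`, the helper `countBoth`, and the accumulator name "counter" are illustrative assumptions and are not part of the original post.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PlainVarVsAccumulator {

  // Hypothetical helper: returns (plain var, accumulator) as observed on the driver.
  def countBoth(sc: SparkContext): (Int, Long) = {
    val nums = sc.parallelize(1 to 9, 2)

    // An ordinary variable is captured by the closure and shipped to the tasks.
    // Each task increments its own copy, so the driver's copy is usually
    // left unchanged (Spark's documentation notes local mode may differ).
    var plainCounter = 0
    nums.foreach(_ => plainCounter += 1)

    // An accumulator is updated on the executors and merged back on the driver.
    val acc = sc.longAccumulator("counter")
    nums.foreach(_ => acc.add(1L))

    (plainCounter, acc.sum)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PlainVarVsAccumulator").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val (plain, accumulated) = countBoth(sc)
    println(s"plain var = $plain, accumulator = $accumulated")
    sc.stop()
  }
}
```

The accumulator reports 9 on the driver because every one of the nine elements triggers exactly one `add` inside an action. The plain variable should not be relied on at all.
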
Use:
1. Create an instance of the accumulator.
2. Register the accumulator through sc.register().
3. Add data by calling add on the accumulator instance (instanceName.add).
4. Read the accumulated result through the instance's value (instanceName.value).

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.util.{AccumulatorV2, DoubleAccumulator, LongAccumulator}

object AccumulatorV2Demo_2 {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName(this.getClass.getName).setMaster("local[2]")
    val sc = new SparkContext(conf)
    val nums1 = sc.parallelize(List(1,2,3,4,5,6,7,8,9),2)
    val nums2 = sc.parallelize(List(1.2,2.4,3.4,4.0,5.0,6.0,7.0,8.0,9.0),2)
    // Create an instance of the custom accumulator
    val accumulator = new MyAccumulator()
    // Register it with the SparkContext
    sc.register(accumulator,"acc")
    // Add each element on the executors (foreach is an action, so it runs immediately)
    nums1.foreach(x=> accumulator.add(x))
    // Read the merged result on the driver: 1 + 2 + ... + 9 = 45
    println(accumulator.value)
    sc.stop()
  }
}

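/**
 * AccumulatorV2[IN, OUT]: the input type and output type must be specified.
 */
class MyAccumulator extends AccumulatorV2[Int, Int] {
  // The running sum, which is also the output value (starts at 0)
  private var sum: Int = 0

  /**
   * Checks whether the accumulator is in its zero (initial) state.
   * @return true if the sum is 0
   */
  override def isZero: Boolean = sum == 0

  /**
   * Creates a new accumulator that is a copy of this one.
   * @return the copy, carrying the current sum
   */
  override def copy(): AccumulatorV2[Int, Int] = {
    val acc = new MyAccumulator
    acc.sum = this.sum
    acc
  }

  /**
   * Resets the accumulator, i.e. clears its data back to zero.
   */
  override def reset(): Unit = sum = 0

  /**
   * Local aggregation: accumulates values within a single partition.
   * @param v the value to add
   */
  override def add(v: Int): Unit = {
    sum += v
  }

  /**
   * Global aggregation: merges the results of the individual partitions.
   * @param other the accumulator holding another partition's result
   */
  override def merge(other: AccumulatorV2[Int, Int]): Unit = {
    sum += other.value
  }

  /**
   * The final result; the data can be post-processed here before it is returned.
   * @return the accumulated sum
   */
  override def value: Int = sum
}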
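
To see how the overridden methods fit together, the sketch below hand-simulates, on the driver and without a SparkContext, roughly what happens to an accumulator during a job: each task starts from a zeroed copy, calls `add` for the elements of its partition, and the driver then calls `merge` with each task's copy. This is a simplified illustration, not Spark's actual scheduling code. The names `SumAccumulator`, `AccumulatorLifecycleSketch`, and `simulate` are hypothetical; `SumAccumulator` is only a compact restatement of a summing accumulator so that the block compiles on its own (it assumes Spark is on the classpath).

```scala
import org.apache.spark.util.AccumulatorV2

// Compact summing accumulator (hypothetical name), restated so the sketch is self-contained.
class SumAccumulator extends AccumulatorV2[Int, Int] {
  private var sum: Int = 0
  override def isZero: Boolean = sum == 0
  override def copy(): AccumulatorV2[Int, Int] = {
    val acc = new SumAccumulator
    acc.sum = this.sum
    acc
  }
  override def reset(): Unit = sum = 0
  override def add(v: Int): Unit = sum += v
  override def merge(other: AccumulatorV2[Int, Int]): Unit = sum += other.value
  override def value: Int = sum
}

object AccumulatorLifecycleSketch {

  // Hypothetical helper: each inner Seq stands in for one partition.
  def simulate(partitions: Seq[Seq[Int]]): Int = {
    val driverAcc = new SumAccumulator

    // "Executor side": every task works on its own zeroed copy (copy + reset).
    val taskCopies = partitions.map { part =>
      val taskAcc = driverAcc.copyAndReset()
      part.foreach(taskAcc.add) // local aggregation within the partition
      taskAcc
    }

    // "Driver side": fold each task's partial result into the original instance.
    taskCopies.foreach(driverAcc.merge) // global aggregation
    driverAcc.value
  }

  def main(args: Array[String]): Unit = {
    // Partition sums are 10 and 35, so this prints 45.
    println(simulate(Seq(Seq(1, 2, 3, 4), Seq(5, 6, 7, 8, 9))))
  }
}
```

Because the task-side copies are reset before use, `reset` and `isZero` must agree on what the zero state is, and `merge` must combine partial results in a way that does not depend on the order in which tasks finish. Addition satisfies both requirements.
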

Origin blog.csdn.net/qq_42706464/article/details/108440525