Spark 2.3 RDD: map source code analysis

Spark's RDD.map source:

/**
 * Return a new RDD by applying a function to all elements of this RDD.
 */
def map[U: ClassTag](f: T => U): RDD[U] = withScope {
  val cleanF = sc.clean(f)
  new MapPartitionsRDD[U, T](this, (context, pid, iter) => iter.map(cleanF))
}
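
Two details worth noting here: sc.clean(f) runs Spark's closure cleaner so that f can be serialized and shipped to executors, and the MapPartitionsRDD is only constructed, not evaluated, so nothing runs until an action is invoked. A minimal usage sketch (assuming a spark-shell session, where sc is the provided SparkContext):

// "sc" is the SparkContext provided by spark-shell (an assumption of this sketch)
val rdd = sc.parallelize(1 to 10, 4)       // an RDD with 4 partitions
val doubled = rdd.map(_ * 2)               // lazy: only wraps rdd in a MapPartitionsRDD
println(doubled.collect().mkString(", "))  // the action triggers iter.map on each partition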

Scala's Iterator.map source:

/** Creates a new iterator that maps all produced values of this iterator
 *  to new values using a transformation function.
 *
 *  @param f  the transformation function
 *  @return a new iterator which transforms every value produced by this
 *          iterator by applying the function `f` to it.
 *  @note   Reuse: $consumesAndProducesIterator
 */
def map[B](f: A => B): Iterator[B] = new AbstractIterator[B] {
  def hasNext = self.hasNext
  def next() = f(self.next())
}
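
Because f is applied inside next(), the mapping is lazy: no element is transformed until it is pulled from the iterator. A small plain-Scala sketch (no Spark needed) makes this visible:

val it = Iterator(1, 2, 3).map { x =>
  println(s"applying f to $x")  // side effect to observe when f actually runs
  x * 10
}
// Nothing has printed yet: the mapped iterator has done no work.
println(it.next())  // prints "applying f to 1", then 10
println(it.next())  // prints "applying f to 2", then 20
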
In summary: map applies the supplied function f to every element of each original partition's iterator. Under the hood it delegates to Scala's Iterator.map, so f is invoked inside next() on one element at a time. The result is a new MapPartitionsRDD, and the new RDD keeps the same number of partitions as the original.
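
As a quick sanity check of the partition-count claim, one can compare getNumPartitions before and after a map (again a spark-shell sketch with sc assumed):

val rdd = sc.parallelize(1 to 100, 8)  // 8 partitions
val mapped = rdd.map(_ + 1)
assert(mapped.getNumPartitions == rdd.getNumPartitions)  // still 8: map never repartitions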


Reproduced from blog.csdn.net/dpnice/article/details/80092247