Spark 2.3 RDD reduce source code analysis

- reduce source code
```scala
/**
 * Reduces the elements of this RDD using the specified commutative and
 * associative binary operator.
 */
def reduce(f: (T, T) => T): T = withScope {
  val cleanF = sc.clean(f)
  val reducePartition: Iterator[T] => Option[T] = iter => {
    if (iter.hasNext) {
      Some(iter.reduceLeft(cleanF))
    } else {
      None
    }
  }
  var jobResult: Option[T] = None
  val mergeResult = (index: Int, taskResult: Option[T]) => {
    if (taskResult.isDefined) {
      jobResult = jobResult match {
        case Some(value) => Some(f(value, taskResult.get))
        case None => taskResult
      }
    }
  }
  sc.runJob(this, reducePartition, mergeResult)
  // Get the final result out of our Option, or throw an exception if the RDD was empty
  jobResult.getOrElse(throw new UnsupportedOperationException("empty collection"))
}
```
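A minimal usage sketch of how this plays out (assuming a running SparkContext named `sc`; the partition count of 4 is arbitrary):

```scala
// Hypothetical example: 10 integers spread over 4 partitions.
val rdd = sc.parallelize(1 to 10, numSlices = 4)

// Each partition is reduced locally by reducePartition (via reduceLeft),
// then the four partial sums are merged on the driver by mergeResult.
val sum = rdd.reduce(_ + _)  // 55

// An empty RDD produces no partial results, so jobResult stays None and
// reduce throws UnsupportedOperationException("empty collection"):
// sc.parallelize(Seq.empty[Int]).reduce(_ + _)
```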
- Scala reduceLeft source code
```scala
/** Applies a binary operator to all elements of this $coll,
 *  going left to right.
 *  $willNotTerminateInf
 *  $orderDependentFold
 *
 *  @param op the binary operator.
 *  @tparam B the result type of the binary operator.
 *  @return the result of inserting `op` between consecutive elements of this $coll,
 *          going left to right:
 *          {{{
 *            op( op( ... op(x_1, x_2) ..., x_{n-1}), x_n)
 *          }}}
 *          where `x,,1,,, ..., x,,n,,` are the elements of this $coll.
 *  @throws UnsupportedOperationException if this $coll is empty.
 */
def reduceLeft[B >: A](op: (B, A) => B): B = {
  if (isEmpty)
    throw new UnsupportedOperationException("empty.reduceLeft")

  var first = true
  var acc: B = 0.asInstanceOf[B]

  for (x <- self) {
    if (first) {
      acc = x
      first = false
    }
    else acc = op(acc, x)
  }
  acc
}
```
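For comparison, a small sketch of reduceLeft on an ordinary Scala collection (the values are made up for illustration):

```scala
// Left fold over the elements: op(op(op(1, 2), 3), 4).
val total = List(1, 2, 3, 4).reduceLeft(_ + _)      // 10

// The fold direction is observable with a non-commutative op:
val concat = List("a", "b", "c").reduceLeft(_ + _)  // "abc"

// An empty collection throws, exactly as the source above shows:
// List.empty[Int].reduceLeft(_ + _)  // UnsupportedOperationException
```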
reduceLeft: declares an accumulator `acc` of type B (the `0.asInstanceOf[B]` is only a placeholder; the first element is assigned to `acc` directly), then for each remaining element applies `op` to the previous result and the current element, writing the result back into `acc`.
reduce: first runs the user-defined aggregate function over every partition of the RDD (`reducePartition` applies reduceLeft within a partition, returning None for an empty one), then merges the per-partition results on the driver through `mergeResult` as each task completes; if all partitions are empty, `jobResult` stays None and an UnsupportedOperationException is thrown. The sketch below walks through these two phases.
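To make the two phases concrete, here is a driver-local sketch of the same logic; the partition layout is invented, and `reduceOption` stands in for `reducePartition`:

```scala
// Phase 1: reduce each "partition" locally (None for an empty one),
// mirroring reducePartition above.
val partitions = Seq(Seq(1, 2, 3), Seq(4, 5), Seq.empty[Int], Seq(6, 7, 8, 9, 10))
val partials   = partitions.flatMap(_.reduceOption(_ + _))

// Phase 2: merge the partial results, mirroring mergeResult. Tasks can
// finish in any order, which is why f must be commutative and associative.
val result = partials.reduceOption(_ + _)
  .getOrElse(throw new UnsupportedOperationException("empty collection"))
// result == 55, the same as sc.parallelize(1 to 10).reduce(_ + _)
```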
Reposted from blog.csdn.net/dpnice/article/details/80054614