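How to tell whether an operator causes a shuffle

Call toDebugString on the resulting RDD and inspect the lineage. If a ShuffledRDD appears, the operator triggers a shuffle; the indented "+-" branch marks the stage boundary that the shuffle introduces. In the transcript below, distinct produces a ShuffledRDD, while a bare parallelize does not.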

scala> val a  = sc.parallelize(Array(1,2,3)).distinct
a: org.apache.spark.rdd.RDD[Int] = MapPartitionsRDD[3] at distinct at <console>:24

scala> a.toDebugString
res0: String =
(16) MapPartitionsRDD[3] at distinct at <console>:24 []
 |   ShuffledRDD[2] at distinct at <console>:24 []
 +-(16) MapPartitionsRDD[1] at distinct at <console>:24 []
    |   ParallelCollectionRDD[0] at parallelize at <console>:24 []

scala> val b  = sc.parallelize(Array(1,2,3))
b: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[4] at parallelize at <console>:24

scala> b.toDebugString
res1: String = (16) ParallelCollectionRDD[4] at parallelize at <console>:24 []
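
Reading the debug string by eye works in the shell, but the same check can also be done in code by walking the RDD's dependencies and looking for a ShuffleDependency. The following is only a minimal sketch: the object name ShuffleCheck and the helper hasShuffle are hypothetical names introduced here for illustration, and the local[*] master is an assumption made so the example can run outside spark-shell.

```scala
import org.apache.spark.{ShuffleDependency, SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ShuffleCheck {
  // Hypothetical helper (not part of Spark): returns true if any
  // dependency in the RDD's lineage is a ShuffleDependency.
  def hasShuffle(rdd: RDD[_]): Boolean =
    rdd.dependencies.exists {
      case _: ShuffleDependency[_, _, _] => true                 // shuffle boundary found
      case dep                           => hasShuffle(dep.rdd)  // keep walking up the lineage
    }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("ShuffleCheck").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val a = sc.parallelize(Array(1, 2, 3)).distinct() // lineage contains a ShuffledRDD
      val b = sc.parallelize(Array(1, 2, 3))            // no shuffle in the lineage
      println(hasShuffle(a)) // prints true
      println(hasShuffle(b)) // prints false
    } finally {
      sc.stop()
    }
  }
}
```

Inside spark-shell, where sc already exists, it is enough to paste the hasShuffle definition (with its imports) and call hasShuffle(a) and hasShuffle(b) on the RDDs from the transcript above.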


 

Reposted from blog.csdn.net/appleyuchi/article/details/107734879