1. Start Spark (local[2] means local mode with 2 worker threads)
spark-shell --master local[2]
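The shell starts with a ready-made SparkContext bound to sc (and, on Spark 2.x and later, a SparkSession bound to spark). As a quick sanity check:

sc.version   // prints the Spark version string of your install
sc.master    // returns local[2] when started as above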
2. Create a minimal RDD
val rdd = sc.makeRDD(List(1,2,3,4,5));
3. Inspect the RDD contents
rdd.collect()
which returns:
res0: Array[Int] = Array(1, 2, 3, 4, 5)
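Note that makeRDD is a thin wrapper around parallelize, so the same RDD can be created with:

val rdd = sc.parallelize(List(1, 2, 3, 4, 5))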
4. Create an RDD with an explicit partition count (the 9 elements below are spread across 3 partitions)
val rdd = sc.makeRDD(List(1, 2, 3, 4, 5, 6, 7, 8, 9), 3)
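You can confirm the partition count directly; getNumPartitions is available since Spark 1.6 (rdd.partitions.length works on older versions):

rdd.getNumPartitions   // res: Int = 3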
5. A helper for inspecting which elements landed in which partition
Run the following code to define rddUtil:
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

object rddUtil {
  // Print each partition index together with the elements it holds.
  def lookPartition[T: ClassTag](rdd: RDD[T]): Unit = {
    rdd.mapPartitionsWithIndex((i: Int, it: Iterator[T]) => {
      // Drain this partition's iterator into a list keyed by the partition index.
      val partitionMap = scala.collection.mutable.Map[Int, List[T]]()
      var valueList = List[T]()
      while (it.hasNext) {
        valueList = valueList :+ it.next()
      }
      partitionMap(i) = valueList
      partitionMap.iterator
    }).collect().foreach { case (partition, values) =>
      println("partition:[" + partition + "]")
      values.foreach(println)
    }
  }
}
Run it on the 9-element RDD to see the layout:
rddUtil.lookPartition(rdd)
partition:[0]
1
2
3
partition:[1]
4
5
6
partition:[2]
7
8
9
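For a quick look without the helper, the standard glom() method collapses each partition into an array, so collect() returns one array per partition:

rdd.glom().collect()
// res: Array[Array[Int]] = Array(Array(1, 2, 3), Array(4, 5, 6), Array(7, 8, 9))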