I. Collection and scalar actions
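- first: returns the first element of the RDD; no sorting is applied
- count: returns the number of elements in the RDD
- reduce: aggregates the elements of the RDD with a binary function, which should be commutative and associative so that the result does not depend on partitioning
- collect: returns all elements of the RDD to the driver as an array
- take(num): returns the first num elements of the RDD (indices 0 to num-1); no sorting is applied
- top(num): returns the num largest elements in descending order, using the default ordering or a supplied Ordering
- takeOrdered(num): like top, but returns the num smallest elements in ascending order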
val rdd = sc.makeRDD(List(("E",5),("B",2),("A",1),("D",4),("C",3),("H",7)), 2)
scala> rdd.first
res0: (String, Int) = (E,5)
scala> rdd.take(3)
res1: Array[(String, Int)] = Array((E,5), (B,2), (A,1))
scala> rdd.top(3)
res2: Array[(String, Int)] = Array((H,7), (E,5), (D,4))
scala> rdd.takeOrdered(3)
res3: Array[(String, Int)] = Array((A,1), (B,2), (C,3))
scala> rdd.count
res4: Long = 6
scala> rdd.collect
res5: Array[(String, Int)] = Array((E,5), (B,2), (A,1), (D,4), (C,3), (H,7))
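scala> rdd.reduce((x,y) => (x._1 + y._1, x._2 + y._2))
res6: (String, Int) = (DCHEBA,22)
// String concatenation is not commutative, so the letter order can vary between runs (for example EBADCH); the sum 22 is stable.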
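The descriptions of top and takeOrdered mention a custom ordering, which the transcript above does not exercise. A minimal sketch follows, assuming Spark is on the classpath and a local master; the object name `OrderingActionsSketch`, the helper `byValue`, and the `run` method are hypothetical names introduced only for this illustration. It also reduces with integer addition, which is commutative and associative, so that result does not depend on partition order.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object OrderingActionsSketch {
  // Hypothetical helper: compare pairs by their Int field only.
  val byValue: Ordering[(String, Int)] = Ordering.by[(String, Int), Int](_._2)

  // Returns (two largest by value, two smallest by value, sum of all values).
  def run(sc: SparkContext): (Array[(String, Int)], Array[(String, Int)], Int) = {
    val rdd = sc.makeRDD(List(("E", 5), ("B", 2), ("A", 1), ("D", 4), ("C", 3), ("H", 7)), 2)

    // top and takeOrdered take the Ordering as a second (implicit) parameter list,
    // so it can also be passed explicitly.
    val largest = rdd.top(2)(byValue)
    val smallest = rdd.takeOrdered(2)(byValue)

    // Integer addition is commutative and associative, so this reduce is deterministic.
    val total = rdd.map(_._2).reduce(_ + _)

    (largest, smallest, total)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local two-thread master is enough for this illustration.
    val conf = new SparkConf().setAppName("ordering-actions-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (largest, smallest, total) = run(sc)
      println(largest.mkString(", "))
      println(smallest.mkString(", "))
      println(total)
    } finally sc.stop()
  }
}
```

In spark-shell, where `sc` already exists, calling `OrderingActionsSketch.run(sc)` after pasting the object should be enough.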
- lookup(key: K): Seq[V] returns all values V in the RDD that are associated with the given key K
val rdd = sc.makeRDD(Array(("A",0),("A",2),("B",3)))
rdd.lookup("A")
res0: Seq[Int] = WrappedArray(0, 2)
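- countByKey(): Map[K, Long] counts the number of elements for each key K in an RDD[(K, V)]
- countByValue()(implicit ord: Ordering[T] = null): Map[T, Long] counts how many times each distinct element occurs in an RDD[T]; for an RDD[(K, V)] the element is the whole (K, V) pair, not just V
- foreach(f: T => Unit): Unit applies f to every element
- foreachPartition(f: Iterator[T] => Unit): Unit applies f once per partition, passing an iterator over that partition's elements
- sortBy[K](f: (T) => K, ascending: Boolean = true, numPartitions: Int = this.partitions.length)(implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T] sorts the elements of the RDD by the key that f computes for each element. It returns a new RDD, so strictly speaking it is a transformation rather than an action
- sortByKey(ascending: Boolean = true, numPartitions: Int = self.partitions.length): RDD[(K, V)] sorts an RDD[(K, V)] by key, ascending by default. Like sortBy, it returns a new RDD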
val rdd1 = sc.makeRDD(List(5,1,6,9,2))
val rdd2 = sc.makeRDD(List("hadoop","spark","hive","endeca","storm"))
val rdd3 = rdd1.zip(rdd2)
rdd3.sortByKey().collect
Array((1,spark), (2,storm), (5,hadoop), (6,hive), (9,endeca))
rdd3.sortByKey(false).collect
Array((9,endeca), (6,hive), (5,hadoop), (2,storm), (1,spark))
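countByKey, countByValue, foreach, foreachPartition and sortBy have no example above, so here is a minimal sketch of them, assuming Spark 2.0 or later on the classpath and a local master. The names `CountAndSortSketch`, `Result` and `run` are hypothetical. Because foreach and foreachPartition run on executors and return Unit, the sketch uses `sc.longAccumulator` to bring a result back to the driver; that is one possible choice, not the only one.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CountAndSortSketch {
  // Hypothetical container for the values computed below.
  case class Result(
      perKey: Map[String, Long],
      perElement: Map[(String, Int), Long],
      valueSum: Long,
      nonEmptyPartitions: Long,
      sortedDesc: List[(String, Int)])

  def run(sc: SparkContext): Result = {
    val pairs = sc.makeRDD(Array(("A", 0), ("A", 2), ("B", 3), ("A", 0)), 2)

    // countByKey counts elements per key; countByValue counts whole elements (here, pairs).
    val perKey = pairs.countByKey().toMap
    val perElement = pairs.countByValue().toMap

    // foreach visits every element on the executors; the accumulator collects the sum.
    val sum = sc.longAccumulator("sum of values")
    pairs.foreach { case (_, v) => sum.add(v.toLong) }

    // foreachPartition calls the function once per partition with an iterator.
    val partitionsSeen = sc.longAccumulator("non-empty partitions")
    pairs.foreachPartition(it => if (it.nonEmpty) partitionsSeen.add(1L))

    // sortBy returns a new RDD; collect is the action that materializes it.
    val sortedDesc = pairs.sortBy(_._2, ascending = false).collect().toList

    Result(perKey, perElement, sum.value, partitionsSeen.value, sortedDesc)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local two-thread master is enough for this illustration.
    val conf = new SparkConf().setAppName("count-and-sort-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(run(sc))
    finally sc.stop()
  }
}
```

foreachPartition is typically preferred over foreach when each call needs an expensive resource, such as a database connection, that can be shared by all elements of a partition.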
II. Storage actions
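- saveAsTextFile(path: String): Unit saves the RDD as text files, writing the string representation of each element on its own line
- saveAsTextFile(path: String, codec: Class[_ <: CompressionCodec]): Unit saves the RDD as text files compressed with the given codec
- saveAsObjectFile(path: String): Unit serializes the elements of the RDD and saves them as a file of serialized objects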
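The three save operations can be sketched as a round trip: write the RDD, then read it back. This is only an illustration, assuming Spark and the Hadoop client libraries are on the classpath and that the local filesystem is writable. The object name `SaveActionsSketch`, the `run` method, and the sub-directory names `text`, `text-gz` and `objects` are hypothetical, and `GzipCodec` is just one codec chosen for the example.

```scala
import java.nio.file.Files
import org.apache.hadoop.io.compress.GzipCodec
import org.apache.spark.{SparkConf, SparkContext}

object SaveActionsSketch {
  // Writes a small RDD under `base` in three formats and reads two of them back.
  // Returns (lines read from the gzip text output, pairs read from the object file).
  def run(sc: SparkContext, base: String): (List[String], List[(String, Int)]) = {
    val rdd = sc.makeRDD(List(("A", 1), ("B", 2), ("C", 3)), 2)

    // Each path names an output directory that must not exist yet.
    rdd.saveAsTextFile(s"$base/text")
    rdd.saveAsTextFile(s"$base/text-gz", classOf[GzipCodec])
    rdd.saveAsObjectFile(s"$base/objects")

    // textFile decompresses gzip part files transparently.
    val lines = sc.textFile(s"$base/text-gz").collect().sorted.toList
    // objectFile is the reader that matches saveAsObjectFile.
    val restored = sc.objectFile[(String, Int)](s"$base/objects").collect().sorted.toList

    (lines, restored)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local two-thread master is enough for this illustration.
    val conf = new SparkConf().setAppName("save-actions-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical output location: a fresh temporary directory.
      val base = Files.createTempDirectory("rdd-save-sketch").toString
      val (lines, restored) = run(sc, base)
      println(lines.mkString(", "))
      println(restored.mkString(", "))
    } finally sc.stop()
  }
}
```

Each output path ends up as a directory holding one part file per partition, which is why the sketch reads the data back through Spark rather than opening a single file.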