1.keys
Features:
Returns an RDD containing only the keys of all key-value pairs.
Examples
val list = List("hadoop","spark","hive","spark")
val rdd = sc.parallelize(list)
val pairRdd = rdd.map(x => (x,1))
pairRdd.keys.collect.foreach(println)
result
hadoop
spark
hive
spark
list: List[String] = List(hadoop, spark, hive, spark)
rdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[142] at parallelize at command-3434610298353610:2
pairRdd: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[143] at map at command-3434610298353610:3
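As the output shows, keys keeps duplicates: "spark" appears twice because it occurs in two pairs. The following is a minimal sketch, assuming Spark is on the classpath and run in local mode; the object name KeysSketch and its run method are hypothetical helpers, not part of Spark. It checks that keys gives the same result as map(_._1) and uses distinct to obtain each key only once.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper object; assumes a local Spark installation on the classpath.
object KeysSketch {
  // Returns (keys via .keys, keys via map(_._1), de-duplicated keys).
  def run(): (List[String], List[String], Set[String]) = {
    val conf = new SparkConf().setMaster("local[1]").setAppName("keys-sketch")
    val sc = new SparkContext(conf)
    try {
      val pairRdd = sc.parallelize(List("hadoop", "spark", "hive", "spark")).map(x => (x, 1))
      val viaKeys = pairRdd.keys.collect().toList       // duplicates are kept
      val viaMap = pairRdd.map(_._1).collect().toList   // same projection written by hand
      val unique = pairRdd.keys.distinct().collect().toSet
      (viaKeys, viaMap, unique)
    } finally {
      sc.stop() // always release the local context
    }
  }

  def main(args: Array[String]): Unit = {
    val (viaKeys, viaMap, unique) = run()
    println(viaKeys == viaMap)
    println(unique.size)
  }
}
```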
2.values
Features:
Returns an RDD containing only the values of all key-value pairs.
Examples
val list = List("hadoop","spark","hive","spark")
val rdd = sc.parallelize(list)
val pairRdd = rdd.map(x => (x,1))
pairRdd.values.collect.foreach(println)
result
1
1
1
1
list: List[String] = List(hadoop, spark, hive, spark)
rdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[145] at parallelize at command-3434610298353610:2
pairRdd: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[146] at map at command-3434610298353610:3
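Because every pair here carries the value 1, the values can be aggregated directly. The following is a minimal sketch under the same assumptions (Spark on the classpath, local mode); ValuesSketch and run are hypothetical names. It checks that values matches map(_._2) and sums the values with reduce, which for this data equals the number of pairs.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper object; assumes a local Spark installation on the classpath.
object ValuesSketch {
  // Returns (values via .values, values via map(_._2), sum of all values).
  def run(): (List[Int], List[Int], Int) = {
    val conf = new SparkConf().setMaster("local[1]").setAppName("values-sketch")
    val sc = new SparkContext(conf)
    try {
      val pairRdd = sc.parallelize(List("hadoop", "spark", "hive", "spark")).map(x => (x, 1))
      val viaValues = pairRdd.values.collect().toList
      val viaMap = pairRdd.map(_._2).collect().toList // same projection written by hand
      val total = pairRdd.values.reduce(_ + _)        // every value is 1, so this counts the pairs
      (viaValues, viaMap, total)
    } finally {
      sc.stop() // always release the local context
    }
  }

  def main(args: Array[String]): Unit = {
    val (viaValues, viaMap, total) = run()
    println(viaValues == viaMap)
    println(total)
  }
}
```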
3.mapValues(func)
Features:
Applies the function func to the value of every key-value pair; the keys are left unchanged.
Examples
val list = List("hadoop","spark","hive","spark")
val rdd = sc.parallelize(list)
val pairRdd = rdd.map(x => (x,1))
pairRdd.mapValues(_+1).collect.foreach(println) // add 1 to each value
result
(hadoop,2)
(spark,2)
(hive,2)
(spark,2)
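Since mapValues cannot change keys, Spark can keep an existing partitioner on the result, whereas a general map that rebuilds the pair drops it. The following is a minimal sketch, assuming Spark on the classpath in local mode; MapValuesSketch and run are hypothetical names, and the HashPartitioner with 2 partitions is chosen only for illustration. It compares the two approaches on the same data.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Hypothetical helper object; assumes a local Spark installation on the classpath.
object MapValuesSketch {
  // Returns (same records?, mapValues keeps partitioner?, map keeps partitioner?).
  def run(): (Boolean, Boolean, Boolean) = {
    val conf = new SparkConf().setMaster("local[1]").setAppName("mapvalues-sketch")
    val sc = new SparkContext(conf)
    try {
      val pairRdd = sc.parallelize(List("hadoop", "spark", "hive", "spark"))
        .map(x => (x, 1))
        .partitionBy(new HashPartitioner(2)) // illustrative partitioner

      val viaMapValues = pairRdd.mapValues(_ + 1)
      val viaMap = pairRdd.map { case (k, v) => (k, v + 1) }

      // Sort before comparing because partitionBy changes the record order.
      val sameRecords =
        viaMapValues.collect().sorted.toList == viaMap.collect().sorted.toList

      (sameRecords, viaMapValues.partitioner.isDefined, viaMap.partitioner.isDefined)
    } finally {
      sc.stop() // always release the local context
    }
  }

  def main(args: Array[String]): Unit = {
    val (sameRecords, mapValuesKeeps, mapKeeps) = run()
    println(s"same records: $sameRecords")
    println(s"mapValues keeps partitioner: $mapValuesKeeps")
    println(s"map keeps partitioner: $mapKeeps")
  }
}
```

When only the values need to change, preferring mapValues over map can therefore avoid an extra shuffle in a later key-based operation.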
Original link: http://www.mamicode.com/info-detail-2285651.html