Spark: aggregating the values of the same key into one

A word count example:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("GroupAndReduce").setMaster("local")
val sc = new SparkContext(conf)
val words = Array("one", "two", "two", "three", "three", "three")
val wordsRDD = sc.parallelize(words).map(row => (row, 1))
val wordsCountWithGroup = wordsRDD.
  groupByKey().  // after groupByKey, pair._2 below is an Iterable holding every value for that key
  map(pair => (pair._1, pair._2.sum)). // pair._1 is the word, pair._2 is the collection of 1s
  collect()
wordsCountWithGroup.foreach(println)
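Note that for a plain sum, groupByKey first ships every individual 1 across the network and only then adds them up. As a sketch (not from the original post; the variable name is mine for illustration), the same count can be done with Spark's reduceByKey, which merges values on the map side before the shuffle:

// Word count with reduceByKey: the 1s for each key are summed locally
// before the shuffle, so much less data moves across the network.
val wordsCountWithReduce = wordsRDD.
  reduceByKey(_ + _).
  collect()
wordsCountWithReduce.foreach(println)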

If instead you want to aggregate strings (e.g. "abc") into one string per key:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("GroupAndReduce").setMaster("local")
val sc = new SparkContext(conf)
val words = Array("one", "two", "two", "three", "three", "three")
val wordsRDD = sc.parallelize(words).map(row => (row, "abc"))
val wordsCountWithGroup = wordsRDD.
  groupByKey().
  map(pair => {
    // pair._2 is the Iterable of "abc" strings collected for this key;
    // sort them and join them into a single string separated by "@@@"
    val onestr = pair._2.toArray.sorted.mkString("@@@")
    (pair._1, onestr)
  }).
  collect()
wordsCountWithGroup.foreach(println)
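If the sorted order of the joined pieces does not matter, the concatenation can also be sketched with reduceByKey (again an alternative not in the original post, with a variable name of my own); the values are combined in shuffle order, so the result is not guaranteed to be sorted like the groupByKey version above:

// String concatenation with reduceByKey; pieces are merged in shuffle
// order, avoiding materializing the full value list for each key.
val wordsJoinedWithReduce = wordsRDD.
  reduceByKey(_ + "@@@" + _).
  collect()
wordsJoinedWithReduce.foreach(println)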
Reposted from blog.csdn.net/guotong1988/article/details/104010337