Using aggregateByKey

Category: Other | Published: 2021-03-28 22:32:31

Description

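aggregateByKey aggregates the values of each key and lets you use different logic for the two stages of the aggregation: one function combines values within a partition, and another merges the results across partitions.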

Function signature

    def aggregateByKey[U: ClassTag](zeroValue: U)(seqOp: (U, V) => U,
        combOp: (U, U) => U): RDD[(K, U)]

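zeroValue: the initial value. Within each partition it is the starting accumulator for each key, which is then combined with that key's values one by one.
seqOp: (U, V) => U: the within-partition rule. Starting from the initial value, it is applied in turn to each value of a key inside the partition.
combOp: (U, U) => U: the across-partition rule. It merges the per-partition results that belong to the same key.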
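To make the roles of the three parameters concrete, here is a minimal plain-Scala sketch of the two stages. It is a model of the semantics only, not Spark's implementation: the helper aggregateByKeyModel, the object name, and the sample data are all hypothetical, and each inner Seq merely stands in for one partition. The accumulator type U = (sum, count) is deliberately different from the value type V = Int.

```scala
// Plain-Scala model of aggregateByKey semantics; NOT Spark's actual implementation.
object AggregateByKeyModel {
  // Hypothetical helper: each inner Seq stands in for one RDD partition.
  def aggregateByKeyModel[K, V, U](partitions: Seq[Seq[(K, V)]], zeroValue: U)(
      seqOp: (U, V) => U,
      combOp: (U, U) => U): Map[K, U] = {
    // Stage 1 (within each partition): fold every key's values into an
    // accumulator that starts from zeroValue.
    val perPartition: Seq[Map[K, U]] = partitions.map { part =>
      part.foldLeft(Map.empty[K, U]) { case (acc, (k, v)) =>
        acc.updated(k, seqOp(acc.getOrElse(k, zeroValue), v))
      }
    }
    // Stage 2 (across partitions): merge the per-partition accumulators of each key.
    perPartition.flatten.groupBy(_._1).map { case (k, kvs) =>
      k -> kvs.map(_._2).reduce(combOp)
    }
  }

  // Made-up sample data: two "partitions" of (key, value) pairs.
  val sample: Seq[Seq[(String, Int)]] = Seq(
    Seq(("a", 1), ("a", 2), ("b", 5)),
    Seq(("a", 4), ("b", 7))
  )

  // Accumulator is (sum, count), so U = (Int, Int) while V = Int.
  val sumAndCount: Map[String, (Int, Int)] =
    aggregateByKeyModel(sample, (0, 0))(
      (acc, v) => (acc._1 + v, acc._2 + 1),      // seqOp: add one value to the accumulator
      (x, y) => (x._1 + y._1, x._2 + y._2)       // combOp: merge two accumulators
    )

  def main(args: Array[String]): Unit =
    sumAndCount.toSeq.sortBy(_._1).foreach(println) // prints (a,(7,3)) then (b,(12,2))
}
```

Note that zeroValue enters once per key per partition (stage 1) and never in stage 2, which is why it should be a neutral element for seqOp.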

Code example

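    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD

    val conf: SparkConf = new SparkConf().setAppName(this.getClass.getName).setMaster("local[*]")
    val sc = new SparkContext(conf)
    // Six key-value pairs split across 2 partitions
    val rdd: RDD[(String, Int)] = sc.makeRDD(List(("a", 3), ("c", 6), ("c", 4),
      ("b", 3), ("a", 2), ("c", 8)), 2)
    println("------------------ Data in each partition ------------------")
    rdd.mapPartitionsWithIndex {
      case (index, datas) =>
        // Materialize first: calling mkString on the iterator would exhaust it
        val items = datas.toList
        println(s"$index--->${items.mkString(",")}")
        items.iterator
    }.collect()

    println("------------------ Separator ------------------")
    // Take the maximum per key within each partition, then sum those maxima across partitions
    val resRDD: RDD[(String, Int)] = rdd.aggregateByKey(0)(Math.max(_, _), _ + _)
    // With partitions [(a,3),(c,6),(c,4)] and [(b,3),(a,2),(c,8)] the result is
    // (a,5), (b,3), (c,14); the order of the printed lines may vary
    resRDD.collect().foreach(println)
    sc.stop()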
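One caveat about the example: a zero value of 0 is only neutral for max when every value is non-negative. The plain-Scala sketch below (no Spark needed; the object name and the values are made up for illustration) folds the negative values of a single key within one partition from two different zero values, which is what seqOp does inside a partition.

```scala
object ZeroValueCheck {
  // Made-up values of a single key inside one partition, all negative.
  val values: List[Int] = List(-3, -5)

  // Same seqOp as the example (max), started from two different zero values.
  val withZero: Int = values.foldLeft(0)(math.max(_, _))                // 0: the zero value wins
  val withMinValue: Int = values.foldLeft(Int.MinValue)(math.max(_, _)) // -3: the true maximum

  def main(args: Array[String]): Unit = {
    println(s"zeroValue = 0            -> $withZero")
    println(s"zeroValue = Int.MinValue -> $withMinValue")
  }
}
```

If the data may contain negative numbers, starting from Int.MinValue rather than 0 keeps the per-partition maximum correct.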

Reposted from blog.csdn.net/FlatTiger/article/details/115053368
