1. Detailed introduction to the aggregate operator in Spark
In the previous article, we explained how to use aggregateByKey to aggregate the values of the same key within and across partitions. This article covers another operator, aggregate, which is also used to aggregate data.
- aggregate is for whole-RDD aggregation regardless of key: it aggregates over every element of the RDD and returns a single result.
- aggregateByKey is for aggregating a key-value RDD by key: it first aggregates the values of each key locally within each partition, then merges the partial results for the same key across partitions.
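To make the contrast concrete, here is a minimal sketch in plain Python that models the two-phase behaviour of both operators. It is not Spark code: the helper functions `aggregate` and `aggregate_by_key` and the sample `partitions` list are hypothetical stand-ins, assuming an RDD can be pictured as a list of partitions.

```python
from functools import reduce

def aggregate(partitions, zero_value, seq_op, comb_op):
    """Model of aggregate: fold each partition, then merge the partial results."""
    # Phase 1: fold every partition independently, starting from zero_value.
    partials = [reduce(seq_op, part, zero_value) for part in partitions]
    # Phase 2: merge the per-partition results, again starting from zero_value.
    return reduce(comb_op, partials, zero_value)

def aggregate_by_key(partitions, zero_value, seq_op, comb_op):
    """Model of aggregateByKey: the same two phases, tracked separately per key."""
    merged = {}
    for part in partitions:
        # Phase 1: local aggregation per key inside one partition.
        local = {}
        for key, value in part:
            local[key] = seq_op(local.get(key, zero_value), value)
        # Phase 2: merge this partition's result for each key into the global one.
        for key, acc in local.items():
            merged[key] = comb_op(merged[key], acc) if key in merged else acc
    return merged

# Two hypothetical partitions of (key, value) pairs.
partitions = [[("a", 1), ("b", 2)], [("a", 3), ("b", 4), ("a", 5)]]

# aggregate: one (sum, count) pair over all values, keys ignored.
total = aggregate(
    [[v for _, v in part] for part in partitions],
    (0, 0),
    lambda acc, v: (acc[0] + v, acc[1] + 1),
    lambda x, y: (x[0] + y[0], x[1] + y[1]),
)

# aggregateByKey: one sum per key.
per_key = aggregate_by_key(partitions, 0, lambda acc, v: acc + v, lambda x, y: x + y)

print(total)    # (15, 5)
print(per_key)  # {'a': 9, 'b': 6}
```

The first call collapses the whole dataset into one value, while the second keeps one result per key, which is the distinction the two bullets above describe.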
1. Function introduction
In Spark, aggregate is an action operator (Action) for aggregation operations, not a transformation: calling it triggers a job and returns the result to the driver. It can aggregate the elements of an RDD while