Detailed introduction to the aggregate operator in Spark

1. Detailed introduction to the aggregate operator in Spark

In the previous article, we explained how to use aggregateByKey to aggregate values with the same key within a partition and across partitions. This article uses another operator, aggregate, to aggregate data.

  • aggregate performs an overall aggregation regardless of keys; it aggregates across the entire RDD and returns a single result (a comparison sketch of both operators follows this list).
  • aggregateByKey is suited to aggregating an RDD by key: it groups by key, aggregates locally within each partition, and then merges the results for the same key across partitions.
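To make the contrast concrete, here is a minimal, self-contained sketch (not code from the original article) run against a local SparkContext; the object name and sample data are illustrative assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AggregateVsAggregateByKey {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("aggregate-demo").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // aggregate: key-agnostic, folds the whole RDD down to a single value on the driver
    val nums = sc.parallelize(Seq(1, 2, 3, 4, 5), 2)
    val total = nums.aggregate(0)(_ + _, _ + _) // 15

    // aggregateByKey: groups by key, aggregates within each partition, then across partitions
    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)), 2)
    val sumByKey = pairs.aggregateByKey(0)(_ + _, _ + _).collect() // Array((a,4), (b,2)), order may vary

    println(total)
    println(sumByKey.mkString(", "))
    sc.stop()
  }
}
```

Note that aggregate is key-agnostic and returns a plain value to the driver, while aggregateByKey returns a new RDD of (key, aggregated value) pairs.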

1. Function introduction

In Spark, aggregate is an action operator used for aggregation: it aggregates the elements of an RDD while applying separate functions for aggregation within a partition and aggregation across partitions, starting from a given initial (zero) value, and returns the final result to the driver.
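As a minimal sketch of the signature aggregate(zeroValue)(seqOp, combOp), the example below (assumed, not from the original article) computes a sum and a count in one pass and derives the average; it also shows that the accumulator type can differ from the RDD's element type.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AggregateAverage {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("aggregate-average").setMaster("local[2]"))

    val data = sc.parallelize(Seq(3, 5, 7, 9), 2)

    // zero value: (runningSum, runningCount), a type different from the RDD's element type
    // seqOp : folds one element into the partition-local accumulator   (within a partition)
    // combOp: merges the accumulators produced by different partitions (across partitions)
    val (sum, count) = data.aggregate((0, 0))(
      (acc, x) => (acc._1 + x, acc._2 + 1),
      (a, b) => (a._1 + b._1, a._2 + b._2)
    )

    println(s"sum=$sum, count=$count, avg=${sum.toDouble / count}") // sum=24, count=4, avg=6.0
    sc.stop()
  }
}
```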

Origin blog.csdn.net/m0_47256162/article/details/132348135