How Spark distinguishes wide and narrow dependencies

Narrow dependency

  • Each partition of the parent RDD is consumed by at most one partition of the child RDD, so no data has to move between partitions. map() and filter() are typical examples.

Wide dependency

  • The data in one partition of the parent RDD is split across multiple partitions of the child RDD, so records have to be redistributed (shuffled) between partitions.
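
The two definitions above can be checked with a small pure-Python model. This is only a sketch: it does not use Spark, partitions are plain lists, and the names fan_out, classify and the two routing rules are made up for illustration. It records which child partitions each parent partition sends records to, then applies the rule "at most one child per parent partition means narrow".

```python
from typing import Callable, Dict, List, Set

Partition = List[int]


def fan_out(parent: List[Partition],
            route: Callable[[int, int], int]) -> Dict[int, Set[int]]:
    """Map each parent partition index to the set of child partition
    indices that receive at least one of its records.

    route(parent_index, record) returns the child partition index
    for a single record (a hypothetical stand-in for a partitioner).
    """
    targets: Dict[int, Set[int]] = {}
    for p_idx, records in enumerate(parent):
        targets[p_idx] = {route(p_idx, record) for record in records}
    return targets


def classify(targets: Dict[int, Set[int]]) -> str:
    """Narrow if every parent partition feeds at most one child partition."""
    if all(len(children) <= 1 for children in targets.values()):
        return "narrow"
    return "wide"


parent = [[1, 2, 3], [4, 5, 6]]
NUM_CHILD = 2

# map-like routing: every record stays in the partition it came from.
map_targets = fan_out(parent, lambda p_idx, record: p_idx)

# shuffle-like routing: the record's value decides the child partition.
shuffle_targets = fan_out(parent, lambda p_idx, record: record % NUM_CHILD)

print(classify(map_targets))      # narrow
print(classify(shuffle_targets))  # wide
```

In the shuffle-like case both parent partitions send records to both child partitions, which is exactly the one-to-many pattern that defines a wide dependency.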

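How to tell a wide dependency from a narrow dependency?

  • Generally speaking, operations that trigger a shuffle create wide dependencies. For example: sortBy(), reduceByKey(), groupByKey(), join(), and any operation that calls repartition(). This is a rule of thumb rather than a strict rule: a join() of two RDDs that are already partitioned the same way, for instance, does not need to shuffle.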
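
To see why an operation like reduceByKey() needs a shuffle, here is a minimal pure-Python sketch of the idea. It is not Spark's implementation: toy_partition and toy_reduce_by_key are hypothetical helpers, and the partitioner is a deliberately simple deterministic function. Records with the same key start out in different parent partitions and must all be routed to the same child partition before they can be combined.

```python
from typing import Callable, Dict, List, Set, Tuple

Record = Tuple[str, int]


def toy_partition(key: str, num_partitions: int) -> int:
    """Deterministic toy partitioner (an assumption, not Spark's hash)."""
    return sum(key.encode("utf-8")) % num_partitions


def toy_reduce_by_key(parent: List[List[Record]],
                      num_child: int,
                      combine: Callable[[int, int], int]):
    """Combine values per key and record which parent partitions
    contributed to each child partition."""
    child: List[Dict[str, int]] = [dict() for _ in range(num_child)]
    sources: List[Set[int]] = [set() for _ in range(num_child)]
    for p_idx, records in enumerate(parent):
        for key, value in records:
            c_idx = toy_partition(key, num_child)  # the "shuffle" step
            sources[c_idx].add(p_idx)
            if key in child[c_idx]:
                child[c_idx][key] = combine(child[c_idx][key], value)
            else:
                child[c_idx][key] = value
    return child, sources


parent = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5)],
]

child, sources = toy_reduce_by_key(parent, 2, lambda x, y: x + y)

print(child)    # [{'b': 7}, {'a': 4, 'c': 4}]
print(sources)  # [{0, 2}, {0, 1}]
```

Parent partition 0 appears in the sources of both child partitions, so its data was split across several children: a wide dependency under the definition above.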

Origin blog.csdn.net/FlatTiger/article/details/115079759