Each partition of the parent RDD is consumed by at most one partition of the child RDD.
Wide dependence
The data in one partition of the parent RDD may be consumed by multiple partitions of the child RDD.
How do you distinguish a wide dependence from a narrow dependence?
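Generally speaking, an operation that triggers a shuffle creates a wide dependence. Examples include sortBy(), reduceByKey(), groupByKey(), join(), and any operation that calls repartition(). This is a rule of thumb rather than a strict rule: join(), for instance, needs no shuffle when both parent RDDs are already partitioned with the same partitioner, and in that case the dependence is narrow.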
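The definitions and the rule of thumb above can be illustrated with a small plain-Python simulation. This is only a sketch and does not use the Spark API: partitions are modeled as lists, and the names narrow_map, wide_reduce_by_key, classify and partition_of are hypothetical helpers introduced for illustration. Spark itself decides this from the type of dependency an operation declares, not by inspecting the data as done here.

```python
from collections import defaultdict

# Parent "RDD": a list of partitions, each a list of (key, value) records.
parent = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
]

def narrow_map(partitions, fn):
    """Map-like step: child partition i is built only from parent partition i."""
    children = [[fn(rec) for rec in part] for part in partitions]
    # lineage[i] = set of child partitions that parent partition i feeds
    lineage = {i: {i} for i in range(len(partitions))}
    return children, lineage

def wide_reduce_by_key(partitions, num_child, partition_of):
    """reduceByKey-like step: records are routed to child partitions by key (a shuffle)."""
    buckets = [defaultdict(int) for _ in range(num_child)]
    lineage = defaultdict(set)
    for i, part in enumerate(partitions):
        for key, value in part:
            target = partition_of(key) % num_child
            buckets[target][key] += value  # sum the values per key
            lineage[i].add(target)
    children = [sorted(bucket.items()) for bucket in buckets]
    return children, dict(lineage)

def classify(lineage):
    """Wide if any parent partition feeds more than one child partition."""
    return "wide" if any(len(targets) > 1 for targets in lineage.values()) else "narrow"

def partition_of(key):
    # Deterministic stand-in for a hash partitioner (assumption for this sketch;
    # Python's built-in str hash varies between processes).
    return ord(key[0])

mapped, map_lineage = narrow_map(parent, lambda kv: (kv[0], kv[1] * 10))
reduced, reduce_lineage = wide_reduce_by_key(parent, 2, partition_of)

print(classify(map_lineage))     # narrow
print(classify(reduce_lineage))  # wide
```

In the shuffle step, parent partition 0 sends "a" to one child partition and "b" to another, which is exactly what makes the dependence wide.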