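mapPartitions processes data one partition at a time: the function receives an iterator over an entire partition rather than a single element. If the function buffers that partition (for example, by converting the iterator into a list), the partition's data cannot be released until the whole partition has been processed. A large partition may then cause an out-of-memory (OOM) error.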
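The following is a minimal plain-Python model of these semantics, not the Spark API itself. The helper `map_partitions` is hypothetical: it represents an RDD as a list of partitions and calls the user function once per partition with an iterator, which is how the function passed to PySpark's `rdd.mapPartitions(f)` is invoked. The two user functions compute the same result, but only `buffered_double` holds the whole partition in memory at once.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def map_partitions(partitions: List[List[T]],
                   f: Callable[[Iterator[T]], Iterable[U]]) -> List[List[U]]:
    """Hypothetical model of mapPartitions: f is called once per partition
    with an iterator over that partition's elements. The model collects each
    partition's output into a list only so the result can be inspected."""
    return [list(f(iter(part))) for part in partitions]


def lazy_double(it: Iterator[int]) -> Iterator[int]:
    # Streams elements one at a time; the partition is never buffered.
    for x in it:
        yield x * 2


def buffered_double(it: Iterator[int]) -> List[int]:
    # Materializes the whole partition first. With a very large partition,
    # this is the pattern that can exhaust memory.
    everything = list(it)
    return [x * 2 for x in everything]


if __name__ == "__main__":
    data = [[1, 2], [3, 4, 5]]  # two partitions
    print(map_partitions(data, lazy_double))      # [[2, 4], [6, 8, 10]]
    print(map_partitions(data, buffered_double))  # [[2, 4], [6, 8, 10]]
```

Writing the function as a generator, as in `lazy_double`, keeps the memory profile close to that of map while still calling the function only once per partition.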
mapPartitionsWithIndex also processes data one partition at a time, like mapPartitions. The difference is that it additionally passes the index of the original RDD partition to the function. This makes it useful when we want to process the data of only a specific partition.
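A plain-Python sketch of the same idea, again modeling the semantics rather than calling Spark: the hypothetical helper `map_partitions_with_index` passes `(index, iterator)` to the function, matching the argument order of the function given to PySpark's `rdd.mapPartitionsWithIndex(f)`. `only_partition` and `tag_with_index` are illustrative names, not library functions.

```python
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def map_partitions_with_index(
        partitions: List[List[T]],
        f: Callable[[int, Iterator[T]], Iterable[U]]) -> List[List[U]]:
    """Hypothetical model of mapPartitionsWithIndex: f receives the
    partition index and an iterator over that partition's elements."""
    return [list(f(i, iter(part))) for i, part in enumerate(partitions)]


def only_partition(target: int) -> Callable[[int, Iterator[T]], Iterator[T]]:
    """Build a function that keeps the elements of one partition only."""
    def f(index: int, it: Iterator[T]) -> Iterator[T]:
        # Pass the target partition through untouched; emit nothing elsewhere.
        return it if index == target else iter(())
    return f


def tag_with_index(index: int, it: Iterator[T]) -> Iterator[Tuple[int, T]]:
    # Pair every element with the index of the partition it came from.
    return ((index, x) for x in it)


if __name__ == "__main__":
    data = [[1, 2], [3, 4], [5]]
    print(map_partitions_with_index(data, only_partition(1)))  # [[], [3, 4], []]
    print(map_partitions_with_index([[1], [2, 3]], tag_with_index))
```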
Usage scenarios
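mapPartitions is suitable when memory is plentiful, or when each record would otherwise require expensive setup such as opening a database connection. The setup can then be done once per partition instead of once per element, which improves processing efficiency.
map is suitable when memory is limited, because it processes one element at a time.
mapPartitionsWithIndex behaves like mapPartitions, but because it also receives the partition index, it is more convenient when you need to operate on the data of specific partitions.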
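To illustrate the database-connection case, the sketch below counts how many connections each strategy opens. `FakeConnection` is a hypothetical stand-in for a real database driver, and the partitions are plain Python lists. In Spark, the per-partition version corresponds to the body of the function passed to mapPartitions.

```python
from typing import List


class FakeConnection:
    """Hypothetical stand-in for a database connection.
    The class-level counter records how many connections were opened."""
    opened = 0

    def __init__(self) -> None:
        FakeConnection.opened += 1
        self.rows: List[int] = []

    def insert(self, row: int) -> None:
        self.rows.append(row)

    def close(self) -> None:
        pass  # a real driver would release the connection here


def save_per_element(partitions: List[List[int]]) -> None:
    # map-style: a connection is opened and closed for every element.
    for part in partitions:
        for x in part:
            conn = FakeConnection()
            conn.insert(x)
            conn.close()


def save_per_partition(partitions: List[List[int]]) -> None:
    # mapPartitions-style: one connection serves the whole partition.
    for part in partitions:
        conn = FakeConnection()
        try:
            for x in part:
                conn.insert(x)
        finally:
            conn.close()


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5]]  # 5 elements in 2 partitions
    FakeConnection.opened = 0
    save_per_element(data)
    print(FakeConnection.opened)  # 5
    FakeConnection.opened = 0
    save_per_partition(data)
    print(FakeConnection.opened)  # 2
```

The number of connections drops from one per element to one per partition; the trade-off is that the partition is now handled inside a single function call.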