To implement an RDD yourself, which functions or components do you need to implement?

  An RDD consists of the following main parts:

  partitions --- the collection of partitions, which determines how many data slices the RDD has.

  dependencies --- the list of other (parent) RDDs that this RDD depends on.

  compute(partition) --- the calculation that produces the data of a given partition; the same compute function is applied to every partition (slice) of the same RDD.

  preferredLocations --- location preferences for each data partition, used to schedule the computation close to the data.
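
  The four parts above can be sketched in plain Python. This is only a toy model and not Spark's API: the names ToyRDD, RangeRDD, MappedRDD and collect are made up for illustration. In Spark itself, a custom RDD subclass overrides `getPartitions`, `compute(split, context)` and, optionally, `getDependencies` and `getPreferredLocations(split)`.

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterator, List, Sequence


class ToyRDD(ABC):
    """Toy stand-in for an RDD (hypothetical, not Spark's class)."""

    @abstractmethod
    def partitions(self) -> Sequence[int]:
        """Partition ids that make up this RDD."""

    def dependencies(self) -> List["ToyRDD"]:
        """Parent RDDs; a source RDD has none."""
        return []

    @abstractmethod
    def compute(self, partition: int) -> Iterator:
        """Produce the elements of one partition."""

    def preferred_locations(self, partition: int) -> List[str]:
        """Hosts where this partition would ideally be computed."""
        return []

    def collect(self) -> list:
        # Run the same compute function over every partition, in order.
        out = []
        for p in self.partitions():
            out.extend(self.compute(p))
        return out


class RangeRDD(ToyRDD):
    """Source RDD: the integers [start, end) split into num_partitions slices."""

    def __init__(self, start: int, end: int, num_partitions: int):
        self.start, self.end, self.num_partitions = start, end, num_partitions

    def partitions(self) -> Sequence[int]:
        return range(self.num_partitions)

    def compute(self, partition: int) -> Iterator[int]:
        span = self.end - self.start
        lo = self.start + partition * span // self.num_partitions
        hi = self.start + (partition + 1) * span // self.num_partitions
        return iter(range(lo, hi))


class MappedRDD(ToyRDD):
    """Derived RDD: applies fn to each element of the parent's partition."""

    def __init__(self, parent: ToyRDD, fn: Callable):
        self.parent, self.fn = parent, fn

    def partitions(self) -> Sequence[int]:
        return self.parent.partitions()  # same slicing as the parent

    def dependencies(self) -> List[ToyRDD]:
        return [self.parent]

    def compute(self, partition: int) -> Iterator:
        return (self.fn(x) for x in self.parent.compute(partition))

    def preferred_locations(self, partition: int) -> List[str]:
        return self.parent.preferred_locations(partition)


base = RangeRDD(0, 10, 3)
doubled = MappedRDD(base, lambda x: x * 2)
result = doubled.collect()
```

  In this sketch the derived RDD stores no data of its own: it only records its parent and a function, and each partition is computed on demand from the matching parent partition.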

