Five attributes of Spark RDD

1. The five attributes of an RDD

  • A list of partitions
  • A function for computing each split
  • A list of dependencies on other RDDs
  • Optionally, a Partitioner for key-value RDDs (e.g. to say that the RDD is hash-partitioned). In other words, a key-value pair RDD may carry a partitioner function that decides which partition each key belongs to.
  • Optionally, a list of preferred locations to compute each split on (e.g. block locations for an HDFS file). Moving computation is cheaper than moving data: the task is launched on the server that holds the file's data, so the calculation runs where the data already lives and data copying is avoided as far as possible.
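
To make the five attributes concrete, here is a minimal, self-contained Python model. It is a sketch for illustration, not Spark's API: `ToyRDD`, `Partition`, `HashPartitioner`, `parallelize`, `partition_by` and the host names `node-a`/`node-b`/`node-c` are all hypothetical. In Spark itself the corresponding hooks on the Scala `RDD` class are `getPartitions`, `compute`, `getDependencies`, `partitioner` and `getPreferredLocations`; real Spark also adds task contexts, lineage-based recovery and a scheduler that this model omits.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, List, Optional


@dataclass(frozen=True)
class Partition:
    index: int  # position of this split within its RDD


class HashPartitioner:
    """Attribute 4: decides which partition a key belongs to by hashing it."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions

    def get_partition(self, key: Any) -> int:
        # Python's % with a positive divisor never returns a negative index
        return hash(key) % self.num_partitions


class ToyRDD:
    def __init__(
        self,
        partitions: List[Partition],                       # 1. list of partitions
        compute: Callable[[Partition], Iterator[Any]],     # 2. computes one split
        dependencies: Optional[List["ToyRDD"]] = None,     # 3. parent RDDs
        partitioner: Optional[HashPartitioner] = None,     # 4. key-value RDDs only
        locations: Optional[Dict[int, List[str]]] = None,  # 5. split index -> hosts
    ) -> None:
        self.partitions = partitions
        self._compute = compute
        self.dependencies = dependencies or []
        self.partitioner = partitioner
        self._locations = locations or {}

    def compute(self, split: Partition) -> Iterator[Any]:
        return self._compute(split)

    def preferred_locations(self, split: Partition) -> List[str]:
        return self._locations.get(split.index, [])

    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        # Narrow dependency: child split i is computed only from parent split i,
        # so it can run on the same hosts as the parent split (no data movement).
        # Like Spark's map, it drops the partitioner because keys may change.
        parent = self
        return ToyRDD(
            partitions=list(parent.partitions),
            compute=lambda split: (f(x) for x in parent.compute(split)),
            dependencies=[parent],
            locations=parent._locations,
        )

    def partition_by(self, partitioner: HashPartitioner) -> "ToyRDD":
        # Wide (shuffle-like) dependency: each child split scans every parent split
        # and keeps only the (key, value) pairs whose key hashes to it.
        parent = self

        def compute(split: Partition) -> Iterator[Any]:
            for p in parent.partitions:
                for key, value in parent.compute(p):
                    if partitioner.get_partition(key) == split.index:
                        yield key, value

        return ToyRDD(
            partitions=[Partition(i) for i in range(partitioner.num_partitions)],
            compute=compute,
            dependencies=[parent],
            partitioner=partitioner,
        )

    def collect(self) -> List[Any]:
        # Evaluate splits in order; in Spark each split would be computed by one task.
        return [x for split in self.partitions for x in self.compute(split)]


def parallelize(data: List[Any], num_slices: int, hosts: List[str]) -> ToyRDD:
    # Hypothetical source RDD: contiguous slices, slice i "stored" on a host.
    n = len(data)
    chunks = [data[i * n // num_slices:(i + 1) * n // num_slices] for i in range(num_slices)]
    return ToyRDD(
        partitions=[Partition(i) for i in range(num_slices)],
        compute=lambda split: iter(chunks[split.index]),
        locations={i: [hosts[i % len(hosts)]] for i in range(num_slices)},
    )


nums = parallelize([1, 2, 3, 4, 5, 6], num_slices=3, hosts=["node-a", "node-b", "node-c"])
pairs = nums.map(lambda x: (x % 2, x))          # key = parity of x
by_key = pairs.partition_by(HashPartitioner(2))

print(pairs.preferred_locations(Partition(1)))  # ['node-b']
print(by_key.collect())  # [(0, 2), (0, 4), (0, 6), (1, 1), (1, 3), (1, 5)]
```

The design choice mirrors the list above only loosely: `map` keeps a one-to-one (narrow) dependency and reuses its parent's locations, so the computation can stay next to the data, while `partition_by` reads every parent split, which is why it sets a partitioner but reports no preferred location of its own.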


Origin blog.csdn.net/weixin_44429965/article/details/107356541