Big Data Learning, Day 19: Spark 02

1. Using RDDs

1.1 What is RDD

  An RDD (Resilient Distributed Dataset) is an abstract dataset. It does not hold the data to be computed; it holds metadata, that is, the computation logic and descriptive information such as where to read the data from and which operations to apply. An RDD can be thought of as a proxy: when you operate on an RDD, the description of that computation is first recorded on the Driver. Tasks are then generated and scheduled to the Executors, and only there is the actual computation logic executed.
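  To make the "RDD as a proxy" idea concrete, here is a minimal sketch in plain Python. It is not Spark code: `ToyRDD`, `from_source`, and `read_numbers` are hypothetical names invented for illustration, and everything runs in a single process with no Driver or Executors. It only mimics the behaviour described above: `map` and `filter` record what should be done, and the data source is not touched until an action (`collect`) is called. In real Spark, transformations such as `map` and `filter` are lazy in the same way, and actions such as `collect` trigger the job.

```python
from typing import Callable, Iterable, Iterator, List


class ToyRDD:
    """Hypothetical stand-in for an RDD: it stores a description of the
    computation (how to produce the elements), not the elements themselves."""

    def __init__(self, compute: Callable[[], Iterator], lineage: List[str]):
        self._compute = compute   # deferred recipe; nothing runs yet
        self.lineage = lineage    # readable record of the recorded steps

    @staticmethod
    def from_source(read: Callable[[], Iterable], name: str) -> "ToyRDD":
        # Only remember *how* to read; do not call read() here.
        return ToyRDD(lambda: iter(read()), [f"read({name})"])

    def map(self, f: Callable) -> "ToyRDD":
        parent = self
        # Transformation: returns a new description that wraps the parent's.
        return ToyRDD(lambda: (f(x) for x in parent._compute()),
                      self.lineage + ["map"])

    def filter(self, pred: Callable) -> "ToyRDD":
        parent = self
        return ToyRDD(lambda: (x for x in parent._compute() if pred(x)),
                      self.lineage + ["filter"])

    def collect(self) -> list:
        # Action: only now is the recorded chain actually executed.
        return list(self._compute())


reads = []  # records every time the data source is really read


def read_numbers():
    reads.append("read")
    return [1, 2, 3, 4, 5]


rdd = ToyRDD.from_source(read_numbers, "numbers")
result = rdd.map(lambda x: x * 10).filter(lambda x: x > 20)

reads_before_action = len(reads)  # still 0: only metadata has been recorded
output = result.collect()         # the action runs the whole chain
reads_after_action = len(reads)   # the source has now been read once

print(result.lineage)
print(reads_before_action, output, reads_after_action)
```

  The `lineage` list plays the role of the descriptive information the RDD carries: it can be inspected before any data has been read, which is the point of the proxy analogy.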

1.2 RDD features

Origin www.cnblogs.com/jj1106/p/11965439.html