1. Formally, an RDD is a read-only, partitioned collection of records. RDDs can only be created through deterministic operations on either (1) a dataset in stable storage or (2) other existing RDDs.
2. RDDs are lazily evaluated: nothing is actually computed until an action is triggered.
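3. Dependencies between RDDs are either narrow or wide. In a narrow dependency each partition of the parent RDD is used by at most one partition of the child; in a wide dependency multiple child partitions may depend on it. This is easiest to understand from a diagram.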
4. Spark's scheduler turns the program logic and the RDDs into a DAG and executes it in stages.
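
To make points 2 to 4 concrete, here is a toy sketch in plain Python. It is not Spark's API: `ToyRDD`, `reduce_by_key`, `num_stages`, and `to_pair` are hypothetical names made up for these notes, and partitions are modelled as ordinary lists. It assumes that stages are split at wide (shuffle) dependencies, which is how Spark's DAG scheduler draws stage boundaries, and it only handles a linear chain of RDDs rather than a full DAG.

```python
class ToyRDD:
    """Toy stand-in for an RDD (not Spark's API): it only records lineage
    until an action such as collect() is called."""

    def __init__(self, partitions=None, parent=None, fn=None, wide=False):
        self.partitions = partitions  # set only for RDDs built from source data
        self.parent = parent          # the RDD this one was derived from
        self.fn = fn                  # maps the parent's partitions to ours
        self.wide = wide              # True if computing us needs a shuffle

    # Transformations: build a new ToyRDD, compute nothing.
    def map(self, f):
        # Narrow: each output partition depends on exactly one parent partition.
        return ToyRDD(parent=self,
                      fn=lambda parts: [[f(x) for x in p] for p in parts])

    def reduce_by_key(self, f):
        # Wide: an output partition may need records from every parent partition.
        def shuffle(parts):
            buckets = [{} for _ in parts]
            for part in parts:
                for key, value in part:
                    bucket = buckets[hash(key) % len(buckets)]
                    bucket[key] = f(bucket[key], value) if key in bucket else value
            return [list(b.items()) for b in buckets]
        return ToyRDD(parent=self, fn=shuffle, wide=True)

    # Action: walks the lineage and actually computes.
    def collect(self):
        return [x for part in self._compute() for x in part]

    def _compute(self):
        if self.parent is None:
            return self.partitions
        return self.fn(self.parent._compute())

    def num_stages(self):
        # One stage, plus one more for each wide dependency in the lineage.
        node, stages = self, 1
        while node is not None:
            stages += 1 if node.wide else 0
            node = node.parent
        return stages


seen = []  # records every element the map function actually touches


def to_pair(word):
    seen.append(word)
    return (word, 1)


words = ToyRDD(partitions=[["a", "b"], ["a", "c"]])
pairs = words.map(to_pair)                        # narrow dependency
counts = pairs.reduce_by_key(lambda a, b: a + b)  # wide dependency

ran_before_action = list(seen)     # still empty: nothing has run yet
result = sorted(counts.collect())  # the action triggers the whole chain
```

Building `pairs` and `counts` only records lineage; `to_pair` runs for the first time inside `collect()`. In this toy model `pairs.num_stages()` is 1 and `counts.num_stages()` is 2, because the single wide dependency introduces one stage boundary.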