Spark persistence: cache, persist, and checkpoint
2020-04-08 23:27:01
cache
- cache() = persist() = persist(StorageLevel.MEMORY_ONLY)
persist() lets you specify the storage level explicitly, for example:
- persist(StorageLevel.MEMORY_ONLY)
- MEMORY_ONLY_SER
- MEMORY_AND_DISK
- MEMORY_AND_DISK_SER
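- Notes:
- Try to avoid the DISK_ONLY level, since every read then goes to disk
- Try to avoid the "_2" levels (for example MEMORY_ONLY_2), which keep a second replica of every partition on another node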
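To make the level names above easier to compare, here is a minimal plain-Python sketch of what each one stands for. It does not import Spark. The `StorageLevel` tuple, the `LEVELS` table, and the `describe` helper are hypothetical names used only for illustration. The flag values follow the Scala/Java `StorageLevel` constants, with the off-heap flag left out for brevity.

```python
from collections import namedtuple

# Plain-Python stand-in for the flags behind each storage level (not Spark's class).
StorageLevel = namedtuple(
    "StorageLevel", ["use_disk", "use_memory", "deserialized", "replication"]
)

LEVELS = {
    "MEMORY_ONLY":         StorageLevel(False, True, True, 1),
    "MEMORY_ONLY_SER":     StorageLevel(False, True, False, 1),
    "MEMORY_AND_DISK":     StorageLevel(True, True, True, 1),
    "MEMORY_AND_DISK_SER": StorageLevel(True, True, False, 1),
    "DISK_ONLY":           StorageLevel(True, False, False, 1),
    "MEMORY_ONLY_2":       StorageLevel(False, True, True, 2),
}


def describe(name):
    """Return a one-line, human-readable summary of a storage level."""
    level = LEVELS[name]
    places = [
        label
        for label, enabled in (("memory", level.use_memory), ("disk", level.use_disk))
        if enabled
    ]
    form = "deserialized objects" if level.deserialized else "serialized bytes"
    return f"{name}: {' + '.join(places)}, {form}, {level.replication} replica(s)"


for level_name in LEVELS:
    print(describe(level_name))
```

The "_SER" variants trade CPU time for a smaller memory footprint, and the "_2" variant is the only one in this table with two replicas, which is why the notes above advise against it by default.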
Points to note when using cache and persist:
- Both cache and persist work at partition granularity. They are lazy: an action operator is required to trigger execution
- After calling cache or persist on an RDD, you can assign the result to a variable; the next use of that variable reads the persisted data directly
- Do not chain an action operator directly after cache or persist: the expression then returns the action's result, not the persisted RDD
- Persisted data is cleared once the application finishes
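The points above can be sketched with a small toy model. This is not Spark's API: `ToyRDD` and all of its members are hypothetical and only imitate the behaviour described here, namely that cache is lazy, works per partition, and takes effect on the first action.

```python
class ToyRDD:
    """Toy stand-in for an RDD: a list of partitions plus a lazy transformation."""

    def __init__(self, partitions=None, fn=None, parent=None):
        self.partitions = partitions    # only set on the source dataset
        self.fn = fn
        self.parent = parent
        self.persist_requested = False
        self.cached = None              # filled on the first action
        self.computations = 0           # how many partitions this dataset computed

    def map(self, fn):
        # Transformation: builds a new lazy dataset, computes nothing.
        return ToyRDD(fn=fn, parent=self)

    def cache(self):
        # Lazy: only records the request and returns the dataset itself.
        self.persist_requested = True
        return self

    def _compute(self):
        if self.cached is not None:
            return self.cached
        if self.parent is None:
            result = [list(part) for part in self.partitions]
        else:
            result = []
            for part in self.parent._compute():
                self.computations += 1
                result.append([self.fn(x) for x in part])
        if self.persist_requested:
            self.cached = result        # stored partition by partition
        return result

    def collect(self):
        # Action: triggers the computation.
        return [x for part in self._compute() for x in part]

    def count(self):
        # Action: returns a number, not a dataset.
        return sum(len(part) for part in self._compute())


source = ToyRDD(partitions=[[1, 2], [3, 4]])
squares = source.map(lambda x: x * x).cache()       # nothing runs yet
computed_before_action = squares.computations       # still 0
first = squares.collect()                           # computes and stores 2 partitions
second = squares.collect()                          # served from the stored partitions
computed_after_two_actions = squares.computations   # still 2

# Chaining an action hands back the action's result, so the cached dataset is lost.
not_a_dataset = source.map(lambda x: x + 1).cache().count()
```

Here `squares` can be reused because the variable holds the dataset returned by `cache()`, while `not_a_dataset` is just the integer returned by `count()`.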
checkpoint
- checkpoint persists data to disk and can also cut the dependencies (lineage) between RDDs
- When the lineage is very long and the computation is complex, you can use checkpoint to persist the RDD
- Checkpoint data is not cleared when the application finishes
- How checkpoint executes:
- When an action triggers a job, Spark waits for the job to finish and then traces the lineage backward from the final RDD
- During this backward pass it marks every RDD on which checkpoint was called
- After the backward pass, the data of each marked RDD is recomputed and the result is written to the specified checkpoint directory
- The dependencies of the checkpointed RDD are then cut
- Optimization: before calling checkpoint on an RDD, it is a good idea to cache it first, so that the recomputation reads the cached data instead of running the whole lineage again
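The execution steps above, and the reason for caching before checkpointing, can be illustrated with a toy model. This is a sketch under stated assumptions, not Spark's implementation: `ToyRDD`, its members, and the `run` helper are hypothetical, the checkpoint is a JSON file in a temporary directory, and the directory is passed to `collect` for simplicity (in Spark it is set with `SparkContext.setCheckpointDir`).

```python
import json
import os
import tempfile


class ToyRDD:
    """Toy stand-in for an RDD, used only to illustrate checkpointing."""

    def __init__(self, data=None, fn=None, parent=None):
        self.data, self.fn, self.parent = data, fn, parent
        self.use_cache = False
        self.cached = None
        self.marked = False             # set by checkpoint(), acted on after a job
        self.checkpoint_file = None
        self.computations = 0           # how often this dataset was computed

    def map(self, fn):
        return ToyRDD(fn=fn, parent=self)

    def cache(self):
        self.use_cache = True
        return self

    def checkpoint(self):
        self.marked = True              # lazy: nothing is written yet

    def _compute(self):
        if self.cached is not None:
            return self.cached
        if self.checkpoint_file is not None:
            with open(self.checkpoint_file) as f:
                return json.load(f)
        if self.parent is None:
            return list(self.data)
        self.computations += 1
        result = [self.fn(x) for x in self.parent._compute()]
        if self.use_cache:
            self.cached = result
        return result

    def collect(self, checkpoint_dir):
        result = self._compute()                    # the job itself
        node = self
        while node is not None:                     # walk the lineage backward
            parent = node.parent
            if node.marked and node.checkpoint_file is None:
                path = os.path.join(checkpoint_dir, f"rdd-{id(node)}.json")
                with open(path, "w") as f:          # recompute and write
                    json.dump(node._compute(), f)
                node.checkpoint_file = path
                node.parent = None                  # cut the dependency
            node = parent
        return result


def run(with_cache):
    """Checkpoint one mapped dataset and report what happened."""
    with tempfile.TemporaryDirectory() as checkpoint_dir:
        rdd = ToyRDD(data=[1, 2, 3]).map(lambda x: x * 10)
        if with_cache:
            rdd.cache()
        rdd.checkpoint()
        first = rdd.collect(checkpoint_dir)
        lineage_cut = rdd.parent is None
        again = rdd.collect(checkpoint_dir)
        return first, again, lineage_cut, rdd.computations


without_cache = run(with_cache=False)
with_cache = run(with_cache=True)
```

In this model the uncached run computes the dataset twice (once for the job, once for the checkpoint write), while the cached run computes it once. That is the saving the optimization note refers to.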
Origin blog.csdn.net/qq_43205282/article/details/103987005