Spark: Cache and Checkpoint

1. The cache operation

Without caching, every action recomputes an RDD from its source: the two count calls on rdd1 below each re-read the log files from HDFS. Calling cache on rdd2 makes the first action materialize the partitions in memory, so the later counts reuse them:

scala> val rdd1 = sc.textFile("hdfs://192.168.146.111:9000/logs")
rdd1: org.apache.spark.rdd.RDD[String] = hdfs://192.168.146.111:9000/logs MapPartitionsRDD[38] at textFile at <console>:24

scala> rdd1.count
res13: Long = 40155                                                             

scala> rdd1.count
res14: Long = 40155
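Since rdd1 is not cached, each count above re-scans the HDFS input. A quick way to observe this is to time the actions; timed below is an illustrative helper written for this post, not part of the Spark API:

def timed[T](body: => T): T = {
  val start = System.nanoTime()
  val result = body
  println(f"took ${(System.nanoTime() - start) / 1e9}%.2f s")
  result
}

timed(rdd1.count)  // reads from HDFS
timed(rdd1.count)  // reads from HDFS again, since rdd1 is not cached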

scala> val rdd2 = sc.textFile("hdfs://192.168.146.111:9000/logs")
rdd2: org.apache.spark.rdd.RDD[String] = hdfs://192.168.146.111:9000/logs MapPartitionsRDD[40] at textFile at <console>:24

scala> val rdd2Cache = rdd2.cache
rdd2Cache: rdd2.type = hdfs://192.168.146.111:9000/logs MapPartitionsRDD[40] at textFile at <console>:24

scala> rdd2Cache.count
res15: Long = 40155

scala> rdd2Cache.count
res16: Long = 40155

scala> rdd2Cache.count
res17: Long = 40155
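Note that cache is shorthand for persist(StorageLevel.MEMORY_ONLY). When the cached data may not fit in memory, persist accepts an explicit storage level, and unpersist releases the cached blocks once they are no longer needed. A minimal sketch run in the same spark-shell session (rdd3 is an illustrative name; the path is the same log directory used above):

import org.apache.spark.storage.StorageLevel

// Spill partitions to disk if they do not fit in memory.
val rdd3 = sc.textFile("hdfs://192.168.146.111:9000/logs")
  .persist(StorageLevel.MEMORY_AND_DISK)

rdd3.count        // first action computes the RDD and fills the cache
rdd3.count        // later actions read the cached partitions

rdd3.unpersist()  // free the cached blocks when done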

2. The checkpoint mechanism

checkpoint writes an RDD to reliable storage and truncates its lineage, so a lost partition is restored from the checkpoint file instead of being recomputed from the original source. The target directory is set with sc.setCheckpointDir; on a cluster it should be a fault-tolerant filesystem such as HDFS:

scala> sc.setCheckpointDir("hdfs://192.168.146.111:9000/chechdir")

scala> val rddc = rdd1.filter(_.contains("bigdata"))
rddc: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[41] at filter at <console>:26

scala> rddc.checkpoint

scala> rddc.count
res21: Long = 7155 
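checkpoint only marks the RDD; the data is written when the first action runs, and Spark launches a separate job that recomputes the RDD to produce the checkpoint file. The API docs therefore recommend persisting the RDD first so it is not computed twice. A sketch of that pattern in the same session (rddc2 is an illustrative name):

val rddc2 = rdd1.filter(_.contains("bigdata"))

// Cache first so the checkpoint job reuses the computed partitions
// instead of re-running the filter over the HDFS input.
rddc2.cache
rddc2.checkpoint

rddc2.count              // runs the action and triggers the checkpoint write

rddc2.isCheckpointed     // Boolean: true once the write has finished
rddc2.getCheckpointFile  // Option[String]: file under the checkpoint directory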

Reprinted from www.cnblogs.com/areyouready/p/10293756.html