Spark Caching

Spark also supports pulling datasets into a cluster-wide in-memory cache. This is very useful when data is accessed repeatedly, for example when querying a small "hot" dataset or when running an iterative algorithm like PageRank. As a simple example, let's mark our linesWithSpark dataset to be cached:

scala> val textFile = spark.read.textFile("README.md")

scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))

scala> linesWithSpark.cache()
res7: linesWithSpark.type = [value: string]

scala> linesWithSpark.count()
res8: Long = 15

scala> linesWithSpark.count()
res9: Long = 15
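The first count() materializes the result and populates the in-memory cache; the second is served from memory rather than re-reading the file. The same session can be written as a standalone application. This is a minimal sketch, assuming Spark is on the classpath and a README.md exists in the working directory; it needs a Spark environment to run:

```scala
import org.apache.spark.sql.SparkSession

object CacheExample {
  def main(args: Array[String]): Unit = {
    // In spark-shell the `spark` session already exists; a standalone
    // app builds its own. `local[*]` runs on all local cores.
    val spark = SparkSession.builder()
      .appName("CacheExample")
      .master("local[*]")
      .getOrCreate()

    val textFile = spark.read.textFile("README.md")
    val linesWithSpark = textFile.filter(line => line.contains("Spark"))

    // Mark the dataset for caching; nothing is materialized yet.
    linesWithSpark.cache()

    // First action computes the result and fills the cache.
    println(linesWithSpark.count())

    // Subsequent actions read from the cache.
    println(linesWithSpark.count())

    // Release the cached blocks once the dataset is no longer "hot".
    linesWithSpark.unpersist()
    spark.stop()
  }
}
```

cache() is shorthand for persist() at the default storage level (MEMORY_AND_DISK for Datasets); unpersist() frees the cached blocks when they are no longer needed.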


Reprinted from blog.csdn.net/weixin_42201566/article/details/85699740