1. foreach
import scala.collection.mutable.ArrayBuffer
val list = new ArrayBuffer[String]() // element type assumed
myRdd.foreach(record => list += record)
2. foreachPartition
import scala.collection.mutable.ArrayBuffer
val list = new ArrayBuffer[String]() // element type assumed
rdd.foreachPartition(it => it.foreach(r => list += r))
Description:
foreachPartition is an RDD operator, and using it can improve efficiency. For example, when foreach is used to write all the data in an RDD to Mongo, the data is written one record at a time. Each function call may create a database connection, so connections are created and destroyed frequently and performance is very poor. With foreachPartition, the function processes a whole partition at once, so each partition only needs to create one database connection and can then perform a bulk insert, which gives considerably better performance.
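The difference can be sketched in plain Scala without a cluster. In the sketch below, `FakeConnection`, `ConnectionStats`, and `PartitionWriter` are hypothetical names introduced only for illustration (they are not part of Spark or of any Mongo client), and each partition is modeled as an `Iterator[String]`, which is the shape of the argument Spark passes to the function given to foreachPartition.

```scala
// Counters so the effect is visible; a real job would not need these.
object ConnectionStats {
  var opened = 0
  var insertCalls = 0
  def reset(): Unit = { opened = 0; insertCalls = 0 }
}

// Hypothetical stand-in for a database connection (not a real Mongo client).
class FakeConnection {
  ConnectionStats.opened += 1
  def insertMany(batch: Seq[String]): Unit = ConnectionStats.insertCalls += 1
  def close(): Unit = ()
}

object PartitionWriter {
  // foreach style: a new connection is opened and closed for every record.
  def writePerRecord(records: Iterator[String]): Unit =
    records.foreach { record =>
      val conn = new FakeConnection
      try conn.insertMany(Seq(record)) finally conn.close()
    }

  // foreachPartition style: one connection per partition, records sent in batches.
  def writePerPartition(records: Iterator[String], batchSize: Int): Unit =
    if (records.hasNext) { // skip empty partitions entirely
      val conn = new FakeConnection
      try records.grouped(batchSize).foreach(batch => conn.insertMany(batch))
      finally conn.close()
    }
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Two "partitions" holding five records in total (fresh iterators on each call).
    def partitions = Seq(Iterator("a", "b", "c"), Iterator("d", "e"))

    partitions.foreach(PartitionWriter.writePerRecord)
    println(s"per record: ${ConnectionStats.opened} connections")

    ConnectionStats.reset()
    partitions.foreach(p => PartitionWriter.writePerPartition(p, batchSize = 2))
    println(s"per partition: ${ConnectionStats.opened} connections, " +
      s"${ConnectionStats.insertCalls} insert calls")
  }
}
```

In a Spark job, assuming an RDD[String], the same function could be passed as rdd.foreachPartition(it => PartitionWriter.writePerPartition(it, 500)). The connection is created inside the function so that it is opened on the executor that processes the partition rather than being serialized from the driver.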
For details, see the explanation in the official documentation:
https://spark.apache.org/docs/latest/streaming-programming-guide.html