Spark foreach vs. foreachPartition

1.foreach

    import scala.collection.mutable.ArrayBuffer

    // Collect records into a local buffer (myRdd is assumed to be an RDD[String])
    val list = new ArrayBuffer[String]()
    myRdd.foreach(record => {
      list += record
    })

2.foreachPartition

    import scala.collection.mutable.ArrayBuffer

    // Iterate over each partition's iterator (rdd is assumed to be an RDD[String])
    val list = new ArrayBuffer[String]()
    rdd.foreachPartition(it => {
      it.foreach(r => {
        list += r
      })
    })

Description:

foreachPartition is the partition-level counterpart of foreach, and using it can noticeably improve efficiency. For example, when writing an RDD to MongoDB with foreach, the records are written one at a time and each function call may open its own database connection, so connections are created and destroyed constantly and performance is very poor. With foreachPartition, a whole partition is handled in one call: only one connection needs to be created per partition, and the partition's records can be written with a bulk insert, which performs much better.
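To make this concrete, below is a minimal sketch of the foreachPartition pattern described above, using the MongoDB Java driver. The connection URI, the database and collection names ("test", "records"), and the saveToMongo helper are illustrative assumptions, not part of the original example.

    import org.apache.spark.rdd.RDD
    import org.bson.Document
    import com.mongodb.client.MongoClients

    // Sketch only: the URI, database and collection names are placeholders
    def saveToMongo(rdd: RDD[String], uri: String): Unit = {
      rdd.foreachPartition { partition =>
        // One client per partition, created on the executor that holds the data
        val client = MongoClients.create(uri)
        val collection = client.getDatabase("test").getCollection("records")

        // Buffer the partition's records and write them with a single bulk insert
        val docs = new java.util.ArrayList[Document]()
        partition.foreach(r => docs.add(new Document("value", r)))
        if (!docs.isEmpty) collection.insertMany(docs)

        client.close()
      }
    }

Doing the same work inside foreach would instead create and close one client per record, which is exactly the overhead the partition-level approach avoids.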

See the official documentation for more detail:

https://spark.apache.org/docs/latest/streaming-programming-guide.html

 
