Spark failed to read from MongoDB, reporting executor timeout and GC overhead limit exceeded exceptions

Code:

import com.mongodb.spark.config.ReadConfig
import com.mongodb.spark.sql._
// Build the MongoDB read configuration on top of the existing Spark configuration
val config = sqlContext.sparkContext.getConf
  .set("spark.mongodb.keep_alive_ms", "15000")
  .set("spark.mongodb.input.uri", "mongodb://10.100.12.14:27017")
  .set("spark.mongodb.input.database", "bi")
  .set("spark.mongodb.input.collection", "userGroupMapping")
val readConfig = ReadConfig(config)
// Load the userGroupMapping collection as a DataFrame
val objUserGroupMapping = sqlContext.read
  .format("com.mongodb.spark.sql")
  .mongo(readConfig)
objUserGroupMapping.printSchema()
// Register a temporary table so the collection can be queried with Spark SQL
val tbUserGroupMapping = objUserGroupMapping.toDF()
tbUserGroupMapping.registerTempTable("userGroupMapping")

select _id,c,g,n,rn,t,ut from userGroupMapping where ut>'2018-05-02' limit 100
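The note does not show how this statement was submitted; a minimal sketch, assuming it was run through sqlContext.sql against the temp table registered above (the result value name is my own):

// Run the query against the registered temp table; only 100 rows are returned
val result = sqlContext.sql(
  "select _id, c, g, n, rn, t, ut from userGroupMapping where ut > '2018-05-02' limit 100")
result.show(100, false)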

Using the code above to fetch 100 records from the userGroupMapping collection, executor timeout and GC overhead limit exceeded exceptions occurred. At first I assumed the tasks were pulling too much data from MongoDB and exhausting executor memory. I then learned that the Spark MongoDB connector pushes query conditions down, that is, it filters on the MongoDB side first and only then loads the results into Spark memory, so memory should not have run out that way. Further searching turned up the explanation that too many concurrent tasks can end up competing for GC time and memory during garbage collection (this point is still not entirely clear to me). Following that advice, I reduced the job's core count from 16 to 6, and after that the program ran without errors; a sketch of that configuration change follows. The exact root cause is still unclear, so I am recording it here for now.
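It is not stated exactly how the core count was changed; a minimal sketch, assuming it refers to spark.executor.cores and is set through SparkConf before the context is created (the application name is a placeholder; the MongoDB settings are carried over from the code above):

import org.apache.spark.{SparkConf, SparkContext}

// Executor cores must be fixed before the SparkContext starts
// (equivalently, pass --executor-cores 6 to spark-submit)
val conf = new SparkConf()
  .setAppName("userGroupMapping-read")   // placeholder name
  .set("spark.executor.cores", "6")      // reduced from 16 so fewer tasks share each executor's heap
  .set("spark.mongodb.keep_alive_ms", "15000")
  .set("spark.mongodb.input.uri", "mongodb://10.100.12.14:27017")
  .set("spark.mongodb.input.database", "bi")
  .set("spark.mongodb.input.collection", "userGroupMapping")
val sc = new SparkContext(conf)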
