Spark parameter tuning

Refer to https://www.zybuluo.com/xiaop1987/note/102894

In a Spark application, each executor has a fixed number of cores and a fixed heap size. The number of cores is specified with the --executor-cores flag when invoking spark-submit, pyspark, or spark-shell, or with the spark.executor.cores property in the spark-defaults.conf configuration file or in a SparkConf object. Likewise, the heap size is configured with the --executor-memory flag or the spark.executor.memory property. The cores setting controls how many tasks an executor can run concurrently: --executor-cores 5 means each executor can run at most 5 tasks at the same time. The memory setting determines how much data Spark can cache, as well as the maximum size of the data structures built for shuffles during group-by, aggregation, and join operations.
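As a minimal sketch, the same sizing can also be expressed programmatically in a SparkConf rather than on the spark-submit command line; the values 5 and 4g below are illustrative assumptions, not recommendations:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Same effect as: spark-submit --executor-cores 5 --executor-memory 4g ...
val conf = new SparkConf()
  .setAppName("ExecutorSizingExample")
  .set("spark.executor.cores", "5")    // up to 5 tasks run concurrently in each executor
  .set("spark.executor.memory", "4g")  // heap size of each executor JVM

val sc = new SparkContext(conf)
```

Properties set directly on a SparkConf take the highest precedence, followed by flags passed to spark-submit, and then values from spark-defaults.conf.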
The --num-executors command line flag or the spark.executor.instances property controls the number of executors requested. Starting with CDH 5.4/Spark 1.3, you can avoid setting this parameter altogether by turning on dynamic allocation with the spark.dynamicAllocation.enabled property. Dynamic allocation lets a Spark application request additional executors when it has a backlog of pending tasks and release them when they sit idle.
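A sketch of what enabling dynamic allocation in a SparkConf might look like follows; the executor bounds and idle timeout are illustrative assumptions, and on YARN dynamic allocation also requires the external shuffle service to be enabled:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Instead of a fixed --num-executors, let Spark grow and shrink the executor pool.
val conf = new SparkConf()
  .setAppName("DynamicAllocationExample")
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")              // external shuffle service, required by dynamic allocation
  .set("spark.dynamicAllocation.minExecutors", "2")          // never shrink below 2 executors
  .set("spark.dynamicAllocation.maxExecutors", "20")         // never grow beyond 20 executors
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s") // release executors idle for 60 seconds

val sc = new SparkContext(conf)
```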
