Spark Delay Scheduling and Dynamic Resource Management

Delay Scheduling in Spark

Spark's task scheduler distinguishes five data-locality levels, from best to worst: PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY. Ideally every task would be scheduled at the PROCESS_LOCAL level. That means each task runs in the same executor process that already holds its data, which gives the best computational efficiency. In practice this is often impossible, because Spark runs into two situations when scheduling tasks:
1. The executor on the machine where the data resides has no idle CPU cores.
2. The machines whose executors do have idle CPU cores do not hold the data.
When either situation occurs, Spark may have to give up the best data locality, and deciding when to do so is the job of delay scheduling. In the first case, delay scheduling makes the task wait for a while (configured by spark.locality.wait, default 3s) in the hope that a CPU core frees up on the executor where the data resides, so that the task can be dispatched there. If no core is released within that time, Spark dispatches the task to an executor that has an idle core but not the data. The task then runs at a lower locality level, but that can still be more efficient than waiting even longer for the data-holding executor to free a core. Spark relaxes locality one level at a time (for example NODE_LOCAL, then RACK_LOCAL, then ANY), waiting up to spark.locality.wait at each level.
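The wait-then-fall-back behavior can be illustrated with a small, self-contained Python model. Everything here is a hypothetical simplification: Executor, Task, locality_of, valid_levels and try_schedule are illustrative names, not Spark APIs. The model tracks the wait per task, whereas Spark's TaskSetManager tracks it per task set. Like Spark, it only steps through locality levels that the task could actually reach on some live executor, and it drops one level each time the wait (3s by default, mirroring spark.locality.wait) elapses.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional, Tuple


class Locality(IntEnum):
    """Spark's five locality levels; a lower value means better locality."""
    PROCESS_LOCAL = 0
    NODE_LOCAL = 1
    NO_PREF = 2
    RACK_LOCAL = 3
    ANY = 4


@dataclass
class Executor:
    executor_id: str
    host: str
    rack: str
    free_cores: int
    cached_blocks: frozenset = frozenset()  # blocks held in this executor's memory


@dataclass
class Task:
    block: str      # data block the task reads
    data_host: str  # host that stores the block
    data_rack: str  # rack containing that host


def locality_of(task: Task, ex: Executor) -> Locality:
    """Best level the task could get on this executor (this sketch never yields NO_PREF)."""
    if task.block in ex.cached_blocks:
        return Locality.PROCESS_LOCAL
    if ex.host == task.data_host:
        return Locality.NODE_LOCAL
    if ex.rack == task.data_rack:
        return Locality.RACK_LOCAL
    return Locality.ANY


def valid_levels(task: Task, executors: List[Executor]) -> List[Locality]:
    """Levels reachable on some live executor (busy or idle), best first; ANY is always valid."""
    return sorted({locality_of(task, ex) for ex in executors} | {Locality.ANY})


def try_schedule(
    task: Task, executors: List[Executor], waited_s: float, wait_s: float = 3.0
) -> Optional[Tuple[str, Locality]]:
    """Launch the task if an idle core exists at or above the currently allowed level.

    The allowed level drops one rung of the ladder for every `wait_s` seconds
    the task has waited (wait_s plays the role of spark.locality.wait).
    """
    ladder = valid_levels(task, executors)
    allowed = ladder[min(int(waited_s // wait_s), len(ladder) - 1)]
    candidates = [
        (locality_of(task, ex), ex)
        for ex in executors
        if ex.free_cores > 0 and locality_of(task, ex) <= allowed
    ]
    if not candidates:
        return None  # keep waiting for a better-placed core to free up
    level, ex = min(candidates, key=lambda c: c[0])
    ex.free_cores -= 1  # the task now occupies one core on that executor
    return ex.executor_id, level


# Case 1 from the text: the data lives on host h1, but e1 (on h1) has no idle core.
task = Task(block="b1", data_host="h1", data_rack="r1")
executors = [
    Executor("e1", host="h1", rack="r1", free_cores=0),
    Executor("e2", host="h2", rack="r1", free_cores=2),
    Executor("e3", host="h3", rack="r2", free_cores=1),
]
print(try_schedule(task, executors, waited_s=1.0))  # None: still waiting for e1
print(try_schedule(task, executors, waited_s=3.0))  # falls back to e2 at RACK_LOCAL
```

Here the ladder for the task is NODE_LOCAL, RACK_LOCAL, ANY. During the first 3s only the busy executor e1 qualifies, so the task waits. Once the wait expires, the task gives up node locality and runs on the rack-local executor e2 rather than the off-rack e3.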

 

Spark Dynamic Resource Management

With dynamic resource allocation, Spark decides the number of executors dynamically according to the computing resources the application actually needs. While a Spark application is running, if an executor has no tasks to execute and stays idle for a certain time (spark.dynamicAllocation.executorIdleTimeout, default 60s), that executor is removed from the cluster. Conversely, when the driver has pending tasks that have waited a certain time (spark.dynamicAllocation.schedulerBacklogTimeout, default 1s) without being scheduled, Spark requests additional executors. Sizing the executor pool dynamically in this way uses the cluster's resources more efficiently. It is used mostly with spark-shell and with the Spark SQL Thrift Server.
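The two timeouts can be modeled with a small Python sketch. AllocationPolicy, decide and tasks_per_executor are hypothetical names, not Spark APIs. The defaults mirror spark.dynamicAllocation.executorIdleTimeout (60s) and spark.dynamicAllocation.schedulerBacklogTimeout (1s), and min_executors/max_executors play the role of spark.dynamicAllocation.minExecutors/maxExecutors. The max of 20 and the 4 tasks per executor are arbitrary assumptions. Spark's own ExecutorAllocationManager is more involved; for instance, it grows its executor requests exponentially round by round rather than asking for the whole backlog at once.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class AllocationPolicy:
    """Toy model of the idle and backlog timeouts (not Spark's real implementation)."""
    idle_timeout_s: float = 60.0    # cf. spark.dynamicAllocation.executorIdleTimeout
    backlog_timeout_s: float = 1.0  # cf. spark.dynamicAllocation.schedulerBacklogTimeout
    tasks_per_executor: int = 4     # hypothetical: e.g. 4 cores per executor, 1 core per task
    min_executors: int = 0
    max_executors: int = 20         # hypothetical cap

    def decide(
        self,
        now: float,
        idle_since: Dict[str, Optional[float]],  # executor id -> time it became idle, None if busy
        backlog_since: Optional[float],          # when tasks first sat unscheduled, None if no backlog
        pending_tasks: int,
    ) -> Tuple[List[str], int]:
        """Return (executor ids to remove, number of executors to request)."""
        # 1. Remove executors idle for at least idle_timeout_s, never going below min_executors.
        expired = sorted(
            ex for ex, since in idle_since.items()
            if since is not None and now - since >= self.idle_timeout_s
        )
        removable = max(0, len(idle_since) - self.min_executors)
        to_remove = expired[:removable]

        # 2. If tasks have been pending for at least backlog_timeout_s, request enough
        #    executors to cover the backlog, without exceeding max_executors in total.
        to_add = 0
        if backlog_since is not None and now - backlog_since >= self.backlog_timeout_s:
            needed = math.ceil(pending_tasks / self.tasks_per_executor)
            remaining = len(idle_since) - len(to_remove)
            to_add = max(0, min(needed, self.max_executors - remaining))
        return to_remove, to_add


policy = AllocationPolicy()
# e1 has been idle since t=0s, e2 is busy, e3 has been idle since t=30s.
print(policy.decide(now=60.0, idle_since={"e1": 0.0, "e2": None, "e3": 30.0},
                    backlog_since=None, pending_tasks=0))  # (['e1'], 0)
```

At t=60s only e1 has been idle for the full 60s, so it is the only executor released. If instead 10 tasks had been pending for at least 1s, the policy would request ceil(10 / 4) = 3 more executors.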

Origin www.cnblogs.com/tesla-turing/p/11959032.html