Spark overall architecture

[Figure: Spark architecture]

1. Execution process
spark-submit submits the application. The driver first constructs a SparkConf and then a SparkContext.
While it is being initialized, the SparkContext creates the DAGScheduler and the TaskScheduler.
The TaskScheduler connects to the master through its background process (ClientActor) and registers the application with the master.
After the master receives the application's registration request, it uses its own resource scheduling algorithm to launch executors for the application on the workers of the Spark cluster.
Once started, the executors register back (reverse-register) with the TaskScheduler (DriverActor).
After all executors have registered back with the driver, SparkContext initialization is complete and the driver goes on to execute the user code.
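The registration handshake above can be modeled in a few lines of plain Python. This is a toy sketch, not Spark code: `Worker`, `Driver` and `schedule_executors` are hypothetical names, and the round-robin placement is just one simple policy chosen for illustration, not the master's actual scheduling algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    free_cores: int


@dataclass
class Driver:
    expected_executors: int
    registered: list = field(default_factory=list)

    def register_executor(self, executor_id: str) -> None:
        # An executor "reverse-registers" with the driver once it has started.
        self.registered.append(executor_id)

    def initialized(self) -> bool:
        # Toy rule: initialization is done once every launched executor is back.
        return len(self.registered) == self.expected_executors


def schedule_executors(workers, num_executors, cores_per_executor):
    """Place executors round-robin on workers that still have enough free cores.

    Hypothetical stand-in for the master's scheduling; stops early when no
    worker can fit another executor.
    """
    placements = []
    i = 0
    misses = 0  # consecutive workers that could not fit an executor
    while len(placements) < num_executors and misses < len(workers):
        worker = workers[i % len(workers)]
        if worker.free_cores >= cores_per_executor:
            worker.free_cores -= cores_per_executor
            placements.append(worker.name)
            misses = 0
        else:
            misses += 1
        i += 1
    return placements


workers = [Worker("worker-1", 4), Worker("worker-2", 4)]
placements = schedule_executors(workers, num_executors=3, cores_per_executor=2)
driver = Driver(expected_executors=len(placements))
for n, host in enumerate(placements):
    driver.register_executor(f"exec-{n}@{host}")
print(placements)            # ['worker-1', 'worker-2', 'worker-1']
print(driver.initialized())  # True
```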
Every time an action is executed, a job is created and submitted to the DAGScheduler.
The DAGScheduler divides the job into stages, cutting at shuffle (wide) dependencies, and then creates a TaskSet for each stage, with one task per partition.
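The stage split can be sketched as a walk backwards from the final RDD: narrow dependencies stay in the current stage, while each shuffle dependency starts a separate parent stage that must run first. This is a minimal model under stated assumptions: the lineage dictionary, the RDD names and `build_stages` are hypothetical, and Spark's DAGScheduler works on its own dependency objects rather than strings.

```python
# Hypothetical lineage: rdd name -> list of (parent rdd, dependency type).
# "narrow" = map/filter-style; "shuffle" = wide dependency (e.g. reduceByKey).
LINEAGE = {
    "lines": [],
    "words": [("lines", "narrow")],
    "pairs": [("words", "narrow")],
    "counts": [("pairs", "shuffle")],
    "sorted": [("counts", "shuffle")],
}


def build_stages(final_rdd, lineage):
    """Split the lineage ending at final_rdd into stages at shuffle boundaries.

    Returns stages in execution order (parent stages before child stages),
    each stage being a sorted list of RDD names.
    """
    stages = []   # filled in post-order, so parents are appended first
    built = set()  # stage roots that have already become stages

    def build(root):
        if root in built:
            return
        built.add(root)
        members, parent_roots, stack = set(), [], [root]
        while stack:
            rdd = stack.pop()
            if rdd in members:
                continue
            members.add(rdd)
            for parent, dep in lineage[rdd]:
                if dep == "shuffle":
                    parent_roots.append(parent)  # wide: parent ends an earlier stage
                else:
                    stack.append(parent)  # narrow: pipelined into this stage
        for parent in parent_roots:
            build(parent)  # parent stages must be scheduled first
        stages.append(sorted(members))

    build(final_rdd)
    return stages


for i, stage in enumerate(build_stages("sorted", LINEAGE)):
    print(i, stage)
# 0 ['lines', 'pairs', 'words']
# 1 ['counts']
# 2 ['sorted']
```

Each stage in the result would then become one TaskSet.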
The TaskScheduler submits each task in the TaskSet to an executor.
Every time an executor receives a task, it wraps the task in a TaskRunner and takes a thread from its thread pool to run it. The TaskRunner deserializes the task, including the operators and functions we wrote, and then executes it.
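A minimal sketch of the executor side, assuming Python's `pickle` as the serializer and a `ThreadPoolExecutor` as the thread pool; `Executor`, `TaskRunner` and `launch_task` are hypothetical names loosely modeled on the description above, not Spark's classes. The built-in `sorted` stands in for user code because `pickle` ships functions by reference (module plus name), unlike Spark, which serializes closures.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor


class TaskRunner:
    """Hypothetical wrapper around one serialized task."""

    def __init__(self, task_id, serialized_task):
        self.task_id = task_id
        self.serialized_task = serialized_task

    def run(self):
        # Deserialize the function and its partition, then execute it.
        func, partition = pickle.loads(self.serialized_task)
        return self.task_id, func(partition)


class Executor:
    def __init__(self, num_threads):
        self.pool = ThreadPoolExecutor(max_workers=num_threads)

    def launch_task(self, task_id, serialized_task):
        # Each incoming task is wrapped in a TaskRunner and run on a pool thread.
        return self.pool.submit(TaskRunner(task_id, serialized_task).run)

    def shutdown(self):
        self.pool.shutdown(wait=True)


# "Driver" side: serialize one task per partition and send it to the executor.
partitions = [["yarn", "spark"], ["task", "dag", "stage"]]
executor = Executor(num_threads=2)
futures = [
    executor.launch_task(i, pickle.dumps((sorted, part)))
    for i, part in enumerate(partitions)
]
results = dict(f.result() for f in futures)
executor.shutdown()
print(results)  # {0: ['spark', 'yarn'], 1: ['dag', 'stage', 'task']}
```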
There are two kinds of tasks, ShuffleMapTask and ResultTask: the tasks of the last stage are ResultTasks, and the tasks of every earlier stage are ShuffleMapTasks.
The whole Spark application thus executes stage by stage: each stage is submitted to the executors as a TaskSet, and each task runs the operators and functions we defined on one partition of the RDD. This repeats until all operations have been executed.
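The difference between the two task kinds can be illustrated with a toy two-stage word count. In this sketch `shuffle_map_task` and `result_task` are hypothetical functions, the map outputs are kept in memory (Spark writes them as shuffle files), and `zlib.crc32` is used as a stable partitioning hash only so the example is deterministic.

```python
import zlib
from collections import defaultdict

NUM_REDUCE_PARTITIONS = 2


def shuffle_map_task(partition, num_buckets):
    """Earlier-stage task: emit (word, 1) pairs bucketed by key.

    The buckets play the role of the shuffle output read by the next stage.
    """
    buckets = [[] for _ in range(num_buckets)]
    for word in partition:
        buckets[zlib.crc32(word.encode()) % num_buckets].append((word, 1))
    return buckets


def result_task(bucket_index, map_outputs):
    """Last-stage task: fetch one bucket from every map output and reduce by key."""
    counts = defaultdict(int)
    for buckets in map_outputs:
        for word, one in buckets[bucket_index]:
            counts[word] += one
    return dict(counts)


input_partitions = [["a", "b", "a"], ["b", "c"]]
# Stage 0 (ShuffleMapTasks): one task per input partition.
map_outputs = [shuffle_map_task(p, NUM_REDUCE_PARTITIONS) for p in input_partitions]
# Final stage (ResultTasks): one task per reduce partition; results go to the driver.
final = {}
for i in range(NUM_REDUCE_PARTITIONS):
    final.update(result_task(i, map_outputs))
print(sorted(final.items()))  # [('a', 2), ('b', 2), ('c', 1)]
```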

2. Submitting Spark applications to YARN
yarn-cluster:
spark-submit sends a request to the ResourceManager to start an ApplicationMaster.
The ResourceManager allocates a container on a NodeManager and starts the ApplicationMaster in it.
The ApplicationMaster plays the role of the driver (the driver runs inside it).
The AM asks the RM for containers in which to start executors.
The AM contacts the other NodeManagers to launch executors in those containers (a NodeManager plays the role of a worker).
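After the executors start, they register back (reverse-register) with the AM.

yarn-client:
The driver is started locally, in the client process that ran spark-submit. An ApplicationMaster is still launched on YARN, but it only requests executors, which then register back with the local driver.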
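The order of events in the two modes can be summarized as a toy protocol trace in plain Python. This is a sketch, not a Spark or YARN API: `submit_to_yarn`, the actor names and the step wording are hypothetical. On current Spark versions the modes are selected with `spark-submit --master yarn --deploy-mode cluster` or `--deploy-mode client`; the `yarn-cluster` and `yarn-client` master values are the older spellings, deprecated since Spark 2.0.

```python
def submit_to_yarn(deploy_mode):
    """Return (where the driver runs, ordered (actor, action) steps).

    deploy_mode is "cluster" or "client". Actors and wording are illustrative
    only; they are not log messages or APIs from Spark or YARN.
    """
    if deploy_mode not in ("cluster", "client"):
        raise ValueError(f"unknown deploy mode: {deploy_mode!r}")
    steps = []
    if deploy_mode == "client":
        # yarn-client: the driver lives in the process that ran spark-submit.
        steps.append(("client", "start the driver locally"))
    steps += [
        ("client", "ask the ResourceManager to start an ApplicationMaster"),
        ("ResourceManager", "allocate a container on a NodeManager and start the AM"),
    ]
    if deploy_mode == "cluster":
        # yarn-cluster: the driver runs inside the ApplicationMaster.
        steps.append(("ApplicationMaster", "run the driver"))
    steps += [
        ("ApplicationMaster", "request executor containers from the ResourceManager"),
        ("ApplicationMaster", "ask other NodeManagers to launch executors"),
    ]
    driver_host = "ApplicationMaster" if deploy_mode == "cluster" else "client"
    steps.append(("executors", f"register back with the driver (running in {driver_host})"))
    return driver_host, steps


for mode in ("cluster", "client"):
    driver_host, steps = submit_to_yarn(mode)
    print(mode, "-> driver runs in:", driver_host)
    for actor, action in steps:
        print(f"  {actor}: {action}")
```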


Origin: blog.csdn.net/m0_46449152/article/details/114356977