Deep Analysis of Spark Kernel Architecture

The Spark kernel consists mainly of the following components:
1. Application
2. spark-submit
3. Driver
4. SparkContext
5. Master
6. Worker
7. Executor
8. Job
9. DAGScheduler
10. TaskScheduler
11. ShuffleMapTask and ResultTask
The execution process of a Spark application is shown in the following figure:
[Figure: Spark application execution process]
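As a rough textual companion to the figure, the submission flow can be traced in a toy, single-process Python model. This is a sketch under stated assumptions, not Spark code: Master, Worker, Driver and submit are hypothetical stand-ins, the Master's allocation plan is simply a fixed number of executors per worker, and the messages that real Spark sends between separate processes become ordinary method calls. The driver decides that all executors have arrived by comparing the registrations it receives with the Master's plan, following the explanation given later in this article.

```python
from dataclasses import dataclass, field

# Hypothetical toy model: names mirror the roles in the article, not Spark's API.


@dataclass
class Driver:
    app: str
    expected: dict[str, int] = field(default_factory=dict)  # allocation plan from the Master
    registered: list[str] = field(default_factory=list)     # IDs of executors that reported back
    log: list[str] = field(default_factory=list)

    def receive_plan(self, plan: dict[str, int]) -> None:
        self.expected = dict(plan)

    def register_executor(self, executor_id: str) -> None:
        # "Reverse registration": an executor reports back to the driver.
        self.registered.append(executor_id)
        self.log.append(f"{executor_id} registered with driver")

    def all_executors_arrived(self) -> bool:
        # Compare what has registered against what the plan promised.
        return len(self.registered) == sum(self.expected.values())


@dataclass
class Worker:
    name: str

    def launch_executors(self, n: int, driver: Driver) -> None:
        for i in range(n):
            driver.register_executor(f"{self.name}-exec-{i}")


@dataclass
class Master:
    workers: list[Worker]

    def register_application(self, driver: Driver, per_worker: int) -> None:
        # Build the plan, give it to the driver, then tell each worker to launch executors.
        plan = {w.name: per_worker for w in self.workers}
        driver.receive_plan(plan)
        for w in self.workers:
            w.launch_executors(plan[w.name], driver)


def submit(app: str, master: Master, per_worker: int = 1) -> Driver:
    """spark-submit analogue: start a driver and register the application."""
    driver = Driver(app)  # in real Spark, the SparkContext would be created here
    master.register_application(driver, per_worker)
    if driver.all_executors_arrived():
        driver.log.append("all executors registered; start building RDDs")
    return driver


if __name__ == "__main__":
    master = Master([Worker("worker-1"), Worker("worker-2")])
    d = submit("WordCount", master, per_worker=2)
    print("\n".join(d.log))
```

Running it prints four executor-registration lines followed by the line saying the driver can start building RDDs.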
Notes on the concepts above:

  • Application: the application itself; put simply, the Spark program you wrote.
  • spark-submit: the program used to submit applications to the Spark cluster. Under the hood, the client it uses is an actor that extends Akka's Actor; without that, it could not communicate with the Master and so could not register our application. (This describes Spark 1.x; Spark 2.0 and later replaced Akka with Spark's own RPC layer.)
  • Driver: the process that runs the Spark program we wrote; when we submit in client mode, it runs on the machine we submit from. The most important thing the Driver does is create the SparkContext.
  • SparkContext: creating a SparkContext does three important things: it creates the DAGScheduler (directed acyclic graph scheduler), it creates the TaskScheduler (task scheduler), and, together with the TaskScheduler, it creates the SchedulerBackend (the backend the TaskScheduler uses to talk to the cluster).
  • Master: mainly responsible for monitoring the cluster and allocating resources to running applications. It has two allocation strategies, selected by the spark.deploy.spreadOut setting (the spreadOutApps flag in the Master's source): spread an application's executors across as many workers as possible, or consolidate them onto as few workers as possible. The Master is also an Akka actor: it receives the registration message sent by the driver, works out what resources the application needs, and hands the work over to the Workers. Put bluntly, it tells the Workers to start Executor processes.
  • Worker: a process. Every node other than the Master node runs a Worker process; the Master node itself can be set up without one. Workers are where Spark's work is actually carried out.
  • Executor: a process launched on a worker node; it allocates resources to each task it is given and executes that task.
  • Job: each action operation (for example count or collect) produces one Job.
  • DAGScheduler: DAG stands for directed acyclic graph. Once the program has built its RDDs, the operators are handed over to the DAGScheduler for overall scheduling. When an application runs, the DAGScheduler divides each job into several stages using its stage-division algorithm, which cuts the graph at shuffle (wide) dependencies, and then creates a TaskSet for each stage. It passes each TaskSet to the TaskScheduler, which schedules the batch of tasks in it for execution.
  • TaskScheduler: receives TaskSets from the DAGScheduler and organizes and schedules their tasks for execution on the executors.
  • Once the executors on the workers have started, they register back with the driver ("reverse registration"). When the driver has received the registrations of all executors in the group, it starts loading data to create RDDs and hands the various operators over to the DAGScheduler. [So how does the driver know it has received every executor in the group? Recall that after the Master receives the driver's registration request, it assigns the work and notifies each worker. Each worker responds to the Master once it has accepted its assignment; the Master then tells the driver which workers have accepted and, at that moment, also sends the driver the allocation plan. From this plan the driver can tell whether all executors in the group have arrived.]
  • TaskRunner: when a task is assigned to an executor, the executor wraps it in a TaskRunner and runs it on a thread taken from its thread pool; this is where concrete operations such as flatMap, map, and reduceByKey are executed.
  • There are in fact only two types of task: ShuffleMapTask and ResultTask. In short, a ResultTask is a task in the final stage, which executes the action; all other tasks are ShuffleMapTasks.
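The Master's two allocation strategies can be illustrated with a simplified core-allocation function. This is a hypothetical sketch, not the Master's actual scheduling code: allocate_cores hands out one core at a time, round-robin across workers when spread_out is true, and fills each worker in turn when it is false. Real Spark also accounts for memory, cores per executor, and worker ordering, all of which are left out here.

```python
def allocate_cores(free_cores: dict[str, int], cores_needed: int, spread_out: bool) -> dict[str, int]:
    """Simplified sketch of spread-out vs. consolidated core allocation.

    free_cores maps a (hypothetical) worker name to its free cores.
    Returns the cores granted on each worker that received any.
    """
    assigned = {worker: 0 for worker in free_cores}
    # Never try to hand out more cores than the cluster has free.
    remaining = min(cores_needed, sum(free_cores.values()))

    if spread_out:
        # Round-robin: one core per worker per pass, while that worker has room.
        while remaining > 0:
            for worker, free in free_cores.items():
                if remaining > 0 and assigned[worker] < free:
                    assigned[worker] += 1
                    remaining -= 1
    else:
        # Consolidate: fill each worker completely before moving to the next.
        for worker, free in free_cores.items():
            take = min(free, remaining)
            assigned[worker] = take
            remaining -= take

    return {worker: cores for worker, cores in assigned.items() if cores > 0}


if __name__ == "__main__":
    cluster = {"worker-1": 4, "worker-2": 4, "worker-3": 4}
    print(allocate_cores(cluster, 6, spread_out=True))
    print(allocate_cores(cluster, 6, spread_out=False))
```

For three workers with 4 free cores each and an application asking for 6 cores, spreading out grants 2 cores on every worker, while consolidating grants 4 cores on the first worker and 2 on the second.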
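How the DAGScheduler turns one action into one job, cuts it into stages at shuffle boundaries, and chooses the task type for each stage can be sketched with a toy model. ToyDAGScheduler, Stage and SHUFFLE_OPS are hypothetical names; the lineage is simplified to a linear list of operator names, only the operators in SHUFFLE_OPS are treated as wide dependencies, and every stage is assumed to have the same number of partitions (one task per partition).

```python
from dataclasses import dataclass

# Hypothetical set of operators treated as wide (shuffle) dependencies.
SHUFFLE_OPS = {"reduceByKey", "groupByKey", "join", "sortByKey", "repartition"}


@dataclass
class Stage:
    stage_id: int
    ops: list[str]
    task_type: str  # "ShuffleMapTask" or "ResultTask"
    num_tasks: int  # one task per partition


class ToyDAGScheduler:
    def __init__(self) -> None:
        self.next_job_id = 0
        self.next_stage_id = 0

    def run_job(self, lineage: list[str], action: str, num_partitions: int) -> tuple[int, list[Stage]]:
        """Each call models one action, so each call creates exactly one job."""
        job_id = self.next_job_id
        self.next_job_id += 1

        # Cut the chain wherever a shuffle operator appears.
        groups: list[list[str]] = [[]]
        for op in lineage:
            if op in SHUFFLE_OPS and groups[-1]:
                groups.append([])
            groups[-1].append(op)
        groups[-1].append(action)  # the action runs in the final stage

        stages = []
        last = len(groups) - 1
        for i, ops in enumerate(groups):
            # Only the final stage computes the action's result.
            task_type = "ResultTask" if i == last else "ShuffleMapTask"
            stages.append(Stage(self.next_stage_id, ops, task_type, num_partitions))
            self.next_stage_id += 1
        return job_id, stages


if __name__ == "__main__":
    dag = ToyDAGScheduler()
    job_id, stages = dag.run_job(["textFile", "flatMap", "map", "reduceByKey"], "collect", num_partitions=4)
    print(f"job {job_id}")
    for s in stages:
        print(s)
```

For a word-count style chain ending in collect, the first stage (textFile, flatMap, map) runs ShuffleMapTasks and the second stage (reduceByKey plus the action) runs ResultTasks; a second action on the same scheduler gets a new job ID.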
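The executor side can be sketched with Python's concurrent.futures.ThreadPoolExecutor standing in for an executor's thread pool. ToyExecutor, TaskRunner and flat_map_then_map are hypothetical names chosen for illustration: each launched task is wrapped in a TaskRunner object whose run method is submitted to the pool. This only approximates the behavior described above; it is not Spark's implementation.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


class TaskRunner:
    """Hypothetical wrapper holding one task: its ID, its function, its partition."""

    def __init__(self, task_id: int, func: Callable[[list], list], partition: list) -> None:
        self.task_id = task_id
        self.func = func
        self.partition = partition

    def run(self) -> tuple[int, list]:
        # Apply the task's operators to its partition on a pool thread.
        return self.task_id, self.func(self.partition)


class ToyExecutor:
    """Hypothetical executor: a fixed-size thread pool that runs TaskRunners."""

    def __init__(self, num_threads: int) -> None:
        self.pool = ThreadPoolExecutor(max_workers=num_threads)

    def launch_task(self, task_id: int, func: Callable[[list], list], partition: list) -> Future:
        runner = TaskRunner(task_id, func, partition)
        return self.pool.submit(runner.run)

    def shutdown(self) -> None:
        self.pool.shutdown(wait=True)


def flat_map_then_map(lines: list[str]) -> list[tuple[str, int]]:
    # flatMap(line -> words) followed by map(word -> (word, 1)).
    return [(word, 1) for line in lines for word in line.split()]


if __name__ == "__main__":
    executor = ToyExecutor(num_threads=2)
    partitions = [["a b", "b"], ["c a"]]
    futures = [executor.launch_task(i, flat_map_then_map, p) for i, p in enumerate(partitions)]
    for f in futures:
        print(f.result())
    executor.shutdown()
```

With the two sample partitions, task 0 returns [('a', 1), ('b', 1), ('b', 1)] and task 1 returns [('c', 1), ('a', 1)]; summing counts per word would be the job of a later reduceByKey stage after the shuffle.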

Origin http://43.154.161.224:23101/article/api/json?id=324817491&siteId=291194637