Spark Kernel Analysis (4): SparkContext Principle Analysis

In Spark, SparkContext is responsible for communicating with the cluster, requesting resources, and allocating and monitoring tasks. After the Executors on the Worker nodes finish running their Tasks, the Driver is also responsible for shutting down the SparkContext. The term SparkContext is often used to refer to the driver program (Driver) itself.

The following figure shows the flow chart of the interaction between SparkContext and the cluster:
[Figure: flow of interaction between SparkContext and the cluster]
SparkContext is the sole entry point through which a user reaches a Spark cluster; it can be used to create RDDs, accumulators, and broadcast variables on the cluster. SparkContext is also a vital object in the entire Spark application, and it can be regarded as the core of the Application's operation scheduling (resource scheduling excluded).
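These three creation methods are part of the standard SparkContext API. A minimal sketch, assuming `sc` is an already-created SparkContext (see the construction example further below):

```scala
// Create an RDD from a local collection
val rdd = sc.parallelize(Seq(1, 2, 3, 4))

// Create a read-only broadcast variable shared by all Executors
val factor = sc.broadcast(10)
val scaled = rdd.map(_ * factor.value).collect() // Array(10, 20, 30, 40)

// Create a named accumulator (visible in the web UI) and update it in an action
val evenCount = sc.longAccumulator("evenCount")
rdd.foreach(x => if (x % 2 == 0) evenCount.add(1))
println(evenCount.value) // 2 -- accumulator value is read back on the Driver
```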

The core role of SparkContext is to initialize the core components required for a Spark application to run, including the high-level scheduler (DAGScheduler), the low-level scheduler (TaskScheduler), and the scheduler's communication endpoint (SchedulerBackend). It is also responsible for registering the Spark program with the Cluster Manager.
[Figure: core components initialized by SparkContext]
In actual coding, we first create a SparkConf instance and customize its properties, then pass the SparkConf in as the sole constructor parameter of the SparkContext class to complete the creation of the SparkContext instance.
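A minimal, self-contained sketch of this pattern using the standard Spark API (the application name and master URL are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkContextExample {
  def main(args: Array[String]): Unit = {
    // Customize application properties on SparkConf first
    val conf = new SparkConf()
      .setAppName("SparkContextExample")      // placeholder app name
      .setMaster("local[*]")                  // local test; use the cluster master URL in production

    // SparkConf is the sole constructor parameter of SparkContext
    val sc = new SparkContext(conf)

    // Minimal sanity check: an RDD action triggers a job
    val sum = sc.parallelize(1 to 100).reduce(_ + _)
    println(s"sum = $sum")                     // 5050

    sc.stop()                                  // shut down the SparkContext when done
  }
}
```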

SparkContext initializes DAGScheduler, TaskScheduler, and SchedulerBackend during instantiation. When an RDD action operator triggers a job, SparkContext calls DAGScheduler to divide the job into several stages according to wide and narrow dependencies; TaskScheduler then schedules the Tasks of each Stage. In addition, SchedulerBackend is responsible for requesting and managing the computing resources (i.e., Executors) that the cluster allocates to the current Application.
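As a sketch of the stage-splitting rule: a shuffle operator such as `reduceByKey` introduces a wide dependency, so DAGScheduler cuts the job into two stages at that boundary (the input path below is a placeholder):

```scala
val counts = sc.textFile("input.txt")  // placeholder path
  .flatMap(_.split(" "))               // narrow dependency: stays in the same stage
  .map((_, 1))                         // narrow dependency: stays in the same stage
  .reduceByKey(_ + _)                  // wide dependency: shuffle -> stage boundary

counts.collect()  // action: triggers the job; DAGScheduler submits the stages in order
```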

If we compare a Spark Application to a car, then SparkContext is the car's engine, and SparkConf is the set of configuration parameters for that engine.

The following figure describes how the internal modules of the ApplicationMaster, Driver, and Executor interact during task scheduling in Spark-on-YARN mode:
[Figure: ApplicationMaster, Driver, and Executor interaction during task scheduling in Spark-on-YARN mode]
While the Driver initializes SparkContext, it initializes DAGScheduler, TaskScheduler, SchedulerBackend, and HeartbeatReceiver, and starts SchedulerBackend and HeartbeatReceiver. SchedulerBackend requests resources through the ApplicationMaster and continuously fetches suitable Tasks from TaskScheduler to dispatch to Executors for execution. HeartbeatReceiver is responsible for receiving Executors' heartbeat messages, monitoring whether Executors are alive, and notifying TaskScheduler accordingly.
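The real HeartbeatReceiver is an internal Spark RPC endpoint, but the idea can be sketched in a few lines. The classes and names below are hypothetical stand-ins, purely illustrative, not Spark source code:

```scala
import scala.collection.concurrent.TrieMap

// Illustrative stand-in for TaskScheduler; not Spark's real class.
class TaskSchedulerStub {
  def executorLost(executorId: String): Unit =
    println(s"TaskScheduler notified: executor $executorId lost, re-scheduling its tasks")
}

// Hypothetical heartbeat monitor mirroring HeartbeatReceiver's role on the Driver.
class HeartbeatMonitor(scheduler: TaskSchedulerStub, timeoutMs: Long) {
  private val lastSeen = TrieMap.empty[String, Long]

  // Called whenever an Executor's periodic heartbeat arrives.
  def receiveHeartbeat(executorId: String): Unit =
    lastSeen.put(executorId, System.currentTimeMillis())

  // Called periodically on the Driver to expire Executors that stopped reporting.
  def checkTimeouts(): Unit = {
    val now = System.currentTimeMillis()
    for ((id, ts) <- lastSeen if now - ts > timeoutMs) {
      lastSeen.remove(id)
      scheduler.executorLost(id) // notify TaskScheduler, as described above
    }
  }
}
```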

Origin: blog.csdn.net/weixin_43520450/article/details/108607599