Talking Spark (6): A Source-Code Analysis of How SparkContext Works

SparkContext is a Spark program's only channel to the cluster. It is where the program starts and where it ends.
Every Spark program first creates a SparkContext and then calls its methods, for example sc.textFile(filepath). At the end, the program calls sc.stop() to exit.
Let's look at how SparkContext is actually implemented.

1 The three core objects inside SparkContext: DAGScheduler, TaskScheduler, SchedulerBackend

DAGScheduler: the high-level scheduler, which implements stage-oriented scheduling. For each job it computes a DAG (directed acyclic graph) of stages, tracks which RDDs and stage outputs are materialized (written to disk or memory), and finds a minimal schedule to run the job. It then submits the stages as TaskSets to the underlying TaskScheduler, which runs them on the cluster. DAGScheduler also monitors jobs while they run; if a stage fails, it resubmits that stage.
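To make the "DAG of stages" idea concrete, here is a toy sketch, not Spark's code. It assumes a simple linear lineage in which each transformation is tagged as a narrow or wide (shuffle) dependency, and it starts a new stage at every wide dependency while pipelining narrow ones into the current stage, which is the basic rule DAGScheduler applies. The names Dep, Narrow, Wide, Op and splitStages are hypothetical; the real DAGScheduler walks an arbitrary RDD dependency graph recursively.

```scala
// Toy model of stage splitting for a *linear* lineage only (hypothetical types).
object StageSplitter {
  sealed trait Dep
  case object Narrow extends Dep // e.g. map, filter: each partition depends on one parent partition
  case object Wide extends Dep   // e.g. reduceByKey: data must be shuffled across partitions

  final case class Op(name: String, dep: Dep)

  /** Pipelines narrow ops into the current stage and opens a new stage at each wide dependency. */
  def splitStages(ops: List[Op]): List[List[String]] =
    ops.foldLeft(List.empty[List[String]]) { (stages, op) =>
      // `stages` is kept newest-first while folding
      (stages, op.dep) match {
        case (Nil, _)                  => List(List(op.name))          // first op opens stage 0
        case (current :: done, Narrow) => (current :+ op.name) :: done // same stage, no shuffle
        case (_, Wide)                 => List(op.name) :: stages      // shuffle boundary: new stage
      }
    }.reverse

  def main(args: Array[String]): Unit = {
    val job = List(
      Op("textFile", Narrow), Op("flatMap", Narrow), Op("map", Narrow),
      Op("reduceByKey", Wide), Op("mapValues", Narrow))
    splitStages(job).zipWithIndex.foreach { case (ops, i) =>
      println(s"Stage $i: ${ops.mkString(" -> ")}")
    }
  }
}
```

For the word-count-like job in main, textFile, flatMap and map end up in one stage and reduceByKey starts a second one, because only the shuffle forces a stage boundary.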

TaskScheduler: an interface for the low-level scheduler. Its implementation depends on the cluster manager; in Standalone mode it is TaskSchedulerImpl. It receives the task sets sent by DAGScheduler, distributes the individual tasks to Executors on the worker nodes of the cluster, and retries tasks that fail. If TaskScheduler finds that a task is lagging behind, it may launch another copy of the same task and use the result of whichever copy finishes first (speculative execution).
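Speculative execution can be sketched as follows. This is a simplified model, not TaskSchedulerImpl's code: it assumes that once a quantile of the tasks in a set have finished, any still-running task whose runtime exceeds a multiplier times the median finished runtime becomes a candidate for a backup copy. The quantile/multiplier idea mirrors Spark's spark.speculation.quantile and spark.speculation.multiplier settings (0.75 and 1.5 by default, to my knowledge), but TaskInfo, median and speculatable are hypothetical names.

```scala
// Simplified model of choosing tasks for speculative execution (hypothetical names).
object Speculation {
  final case class TaskInfo(id: Int, finished: Boolean, runtimeMs: Long)

  def median(xs: Seq[Long]): Double = {
    val s = xs.sorted
    val n = s.length
    if (n % 2 == 1) s(n / 2).toDouble else (s(n / 2 - 1) + s(n / 2)) / 2.0
  }

  /** Ids of running tasks slow enough to deserve a speculative copy. */
  def speculatable(tasks: Seq[TaskInfo], quantile: Double = 0.75, multiplier: Double = 1.5): Seq[Int] = {
    val done = tasks.filter(_.finished)
    val minFinished = math.max(1, math.floor(quantile * tasks.size).toInt)
    if (done.size < minFinished) Seq.empty // too early: not enough finished tasks to judge
    else {
      val threshold = multiplier * median(done.map(_.runtimeMs))
      tasks.filter(t => !t.finished && t.runtimeMs > threshold).map(_.id)
    }
  }

  def main(args: Array[String]): Unit = {
    val taskSet = Seq(TaskInfo(1, true, 100), TaskInfo(2, true, 120), TaskInfo(3, true, 110), TaskInfo(4, false, 400))
    println(s"speculate: ${speculatable(taskSet)}")
  }
}
```

In main, three of four tasks have finished with a median of 110 ms, so the threshold is 165 ms and the straggler (task 4, 400 ms) is selected.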

SchedulerBackend: also an interface whose implementation depends on the cluster manager. In Standalone mode it is StandaloneSchedulerBackend (as of Spark 2.3; in 1.x it was called SparkDeploySchedulerBackend). It sits underneath TaskSchedulerImpl and is controlled by it, and it does the actual work, such as registering with the Master and sending Tasks to the Executors.
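The division of labor between the two interfaces can be illustrated with a toy model: the backend knows which executors have free cores and turns them into "offers", while the scheduler decides which pending task goes to which offer. This is a sketch under stated assumptions, loosely modeled on Spark's WorkerOffer/resourceOffers idea; Offer and assign are hypothetical, the round-robin placement is a simplification, and cpusPerTask plays the role of spark.task.cpus.

```scala
// Toy model of the backend/scheduler split (hypothetical names, no real Spark API).
object ResourceOffers {
  // What a backend could report for one executor: its id and currently free cores.
  final case class Offer(executorId: String, freeCores: Int)

  /** Hands pending task ids to executors round-robin while they still have cpusPerTask free cores. */
  def assign(offers: Seq[Offer], pending: Seq[Int], cpusPerTask: Int = 1): Map[String, Vector[Int]] = {
    val free = scala.collection.mutable.Map[String, Int]()
    val launched = scala.collection.mutable.Map[String, Vector[Int]]()
    offers.foreach { o => free(o.executorId) = o.freeCores; launched(o.executorId) = Vector.empty }
    var queue = pending.toList
    var progress = true
    // Keep cycling over the offers until tasks run out or no executor can take another task.
    while (queue.nonEmpty && progress) {
      progress = false
      offers.foreach { o =>
        if (queue.nonEmpty && free(o.executorId) >= cpusPerTask) {
          launched(o.executorId) = launched(o.executorId) :+ queue.head
          free(o.executorId) -= cpusPerTask
          queue = queue.tail
          progress = true
        }
      }
    }
    launched.toMap
  }

  def main(args: Array[String]): Unit =
    println(assign(Seq(Offer("exec-A", 2), Offer("exec-B", 1)), pending = 1 to 4))
}
```

Tasks that do not fit (task 4 in main) simply stay pending until the backend makes new offers, for example when a running task finishes and frees a core.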

2.1 The SparkContext instantiation process, illustrated

As the figure below shows, instantiating SparkContext creates several core instances, which together complete the registration of the whole application.

2.2 Sequence diagram

3 Main steps

  • createTaskScheduler
  • createSchedulerBackend
  • initialize: configure the SchedulerBackend and create the default FIFO scheduling pool
  • new DAGScheduler
  • create StandaloneAppClient to communicate with the Spark cluster
  • create AppClient and ClientEndpoint (registers with the Master)
  • send the RegisterApplication message
  • ClientEndpoint.receive() handles the Master's replies
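The "default FIFO scheduling pool" created in the initialize step can be pictured with a short sketch. It assumes the ordering rule of Spark's FIFO comparator as I understand it: task sets from the earlier job (lower job id, used as the priority) go first, and ties are broken by the lower stage id. TaskSetEntry and fifoOrder are hypothetical names; the real pool holds TaskSetManager objects.

```scala
// Sketch of FIFO ordering inside a scheduling pool (hypothetical names).
object FifoPool {
  final case class TaskSetEntry(name: String, jobId: Int, stageId: Int)

  /** FIFO: the earliest job's task sets run first; within a job, the lower stage id wins. */
  def fifoOrder(pool: Seq[TaskSetEntry]): Seq[TaskSetEntry] =
    pool.sortBy(ts => (ts.jobId, ts.stageId))

  def main(args: Array[String]): Unit = {
    val pool = Seq(
      TaskSetEntry("job1-stage3", jobId = 1, stageId = 3),
      TaskSetEntry("job0-stage1", jobId = 0, stageId = 1),
      TaskSetEntry("job1-stage2", jobId = 1, stageId = 2))
    fifoOrder(pool).foreach(ts => println(ts.name))
  }
}
```

Because the sort key is a (jobId, stageId) tuple, Scala's built-in tuple Ordering gives exactly "job first, then stage".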

4 Walking through the SparkContext source (Standalone mode)

In Scala, class members that are not inside a method are executed when the class is instantiated. The key step here is createTaskScheduler: it is called in SparkContext's constructor body, so it runs as soon as SparkContext is instantiated.
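The Scala rule above can be checked with a minimal example. MiniContext and createScheduler below are hypothetical stand-ins for SparkContext and createTaskScheduler; the point is only that statements and field initializers in a class body run, in order, as part of the primary constructor.

```scala
import scala.collection.mutable.ListBuffer

object ConstructorDemo {
  // Hypothetical stand-in for SparkContext: its class body makes up the primary constructor.
  class MiniContext(master: String, log: ListBuffer[String]) {
    log += s"constructor starts (master=$master)"
    // Field initializer that calls a method: evaluated during `new MiniContext(...)`.
    private val (backend, scheduler) = createScheduler(master)
    log += s"created $scheduler with $backend"

    private def createScheduler(m: String): (String, String) = {
      log += "createScheduler called"
      ("LocalSchedulerBackend", "TaskSchedulerImpl")
    }
  }

  /** Instantiates MiniContext and returns the order in which its body ran. */
  def trace(master: String): List[String] = {
    val log = ListBuffer.empty[String]
    new MiniContext(master, log)
    log.toList
  }

  def main(args: Array[String]): Unit = trace("local").foreach(println)
}
```

Running main prints the three log lines in source order, showing that createScheduler runs inside `new`, with no explicit init call.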

createTaskScheduler creates a TaskSchedulerImpl and a StandaloneSchedulerBackend and initializes them.

createTaskScheduler returns the SchedulerBackend and the TaskScheduler; the DAGScheduler is then constructed on top of the TaskScheduler.
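The dispatch on the master URL inside createTaskScheduler can be sketched as a pattern match over regular expressions. This is a simplified model: the regexes are written in the spirit of the ones in Spark's SparkContext source (for local[N] and spark://...), but MasterDispatch, BackendChoice and choose are hypothetical, and the function returns class names rather than real scheduler instances. Other masters (yarn, k8s://..., mesos://...) are, as far as I know, delegated to a pluggable ExternalClusterManager.

```scala
// Simplified model of the master-URL dispatch in createTaskScheduler (hypothetical names).
object MasterDispatch {
  final case class BackendChoice(scheduler: String, backend: String, detail: String)

  private val LocalN = """local\[([0-9]+|\*)\]""".r // local[4] or local[*]
  private val SparkUrl = """spark://(.*)""".r        // Standalone master(s)

  def choose(master: String, cores: Int = Runtime.getRuntime.availableProcessors()): BackendChoice =
    master match {
      case "local" =>
        BackendChoice("TaskSchedulerImpl", "LocalSchedulerBackend", "threads=1")
      case LocalN(n) =>
        val threads = if (n == "*") cores else n.toInt // local[*] uses every available core
        BackendChoice("TaskSchedulerImpl", "LocalSchedulerBackend", s"threads=$threads")
      case SparkUrl(hosts) =>
        // Standalone mode: several comma-separated masters may be listed for high availability
        val masters = hosts.split(",").map("spark://" + _).mkString(" ")
        BackendChoice("TaskSchedulerImpl", "StandaloneSchedulerBackend", s"masters=$masters")
      case other =>
        BackendChoice("(from ExternalClusterManager)", "(from ExternalClusterManager)", s"master=$other")
    }

  def main(args: Array[String]): Unit =
    Seq("local", "local[4]", "spark://m1:7077,m2:7077", "yarn").foreach(m => println(choose(m)))
}
```

Scala's regex extractors only match the whole string, so "local[4]" never falls into the plain "local" case, and the two returned names form the (backend, scheduler) pair that SparkContext then builds the DAGScheduler on.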

  • SparkContext calls the createTaskScheduler method, which returns a SchedulerBackend and a TaskScheduler.

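  • Inside createTaskScheduler: depending on the master URL, a different TaskScheduler implementation and a different SchedulerBackend implementation are created. The master URL is the one passed in when the SparkContext is created, for example local below: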

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("TestApp").setMaster("local")
    val sc = new SparkContext(conf)

  • TaskSchedulerImpl's initialize method creates a scheduling pool, FIFO by default.

  • Right after TaskSchedulerImpl is initialized, its DAGScheduler is set, and then its start() method is called.

  • TaskSchedulerImpl.start() in turn calls start() on the backend (StandaloneSchedulerBackend); the most important work there is creating the ApplicationDescription and the AppClient.

  • Create the ApplicationDescription and the AppClient.

  • ApplicationDescription holds information about the current application: name, cores, memory, and so on.

  • AppClient is the component through which the Application communicates with Spark. When appClient.start() is called, it creates an instance of the inner class ClientEndpoint.

  • ClientEndpoint registers with the Master.

  • Registration takes a thread from a thread pool and carries the application information from the ApplicationDescription.

  • ClientEndpoint.receive handles the messages the Master sends back and takes a different action for each kind of message.

  • SparkContext creates the DAGScheduler.

  • Create the SparkUI.
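The registration handshake described in the steps above can be sketched as an in-process model. It assumes a simplified protocol: registration runs on a pool thread and sends a RegisterApplication message carrying the ApplicationDescription, and a receive method pattern-matches on the replies. The message names mirror classes in Spark's standalone deploy messages, but their fields are trimmed here; fakeMaster, ClientEndpointModel and the app-id format are hypothetical, and no real RPC is involved.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.collection.mutable.ListBuffer

// In-process model of the AppClient/ClientEndpoint handshake (hypothetical, simplified).
object AppRegistration {
  final case class ApplicationDescription(name: String, maxCores: Int, memoryPerExecutorMB: Int)

  sealed trait Message
  final case class RegisterApplication(desc: ApplicationDescription) extends Message
  final case class RegisteredApplication(appId: String) extends Message
  final case class ExecutorAdded(execId: Int, workerId: String, cores: Int) extends Message
  final case class ApplicationRemoved(reason: String) extends Message

  // Hypothetical stand-in for the Master: accept the app and grant one executor.
  def fakeMaster(msg: RegisterApplication): Seq[Message] =
    Seq(RegisteredApplication(s"app-0001-${msg.desc.name}"),
        ExecutorAdded(0, "worker-1", math.min(msg.desc.maxCores, 2)))

  class ClientEndpointModel(desc: ApplicationDescription) {
    @volatile var appId: Option[String] = None
    val events = ListBuffer.empty[String]
    private val registerPool = Executors.newSingleThreadExecutor()

    /** Registration runs on a pool thread and carries the ApplicationDescription. */
    def register(): Unit = {
      val reply = registerPool.submit(new Runnable {
        def run(): Unit = fakeMaster(RegisterApplication(desc)).foreach(receive)
      })
      reply.get(5, TimeUnit.SECONDS) // wait for the handshake; get() also publishes the thread's writes
      registerPool.shutdown()
    }

    /** Like ClientEndpoint.receive: react differently to each kind of reply. */
    def receive(msg: Message): Unit = msg match {
      case RegisteredApplication(id) =>
        appId = Some(id)
        events += s"registered as $id"
      case ExecutorAdded(execId, workerId, cores) =>
        events += s"executor $execId on $workerId with $cores cores"
      case ApplicationRemoved(reason) =>
        events += s"removed: $reason"
      case other =>
        events += s"unexpected: $other"
    }
  }

  def main(args: Array[String]): Unit = {
    val client = new ClientEndpointModel(ApplicationDescription("TestApp", maxCores = 4, memoryPerExecutorMB = 1024))
    client.register()
    client.events.foreach(println)
  }
}
```

Using a sealed trait for the messages lets the compiler check that receive's match covers every reply type, which is the same reason Spark models its deploy messages as case classes.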

That is how SparkContext is constructed in the source code. Thanks for reading.
End.


Source: www.cnblogs.com/wangtcc/p/da-huaSpark-6yuan-ma-zhiSparkContext-yuan-li-pou-x.html