Understanding the Spark driver and executor

1. I have read a lot online, but most of it is diagrams of the relationship between the driver and the executors that do not involve the physical machines, like the figure below, and I have always found these somewhat abstract.
[Figure: a typical driver/executor diagram]
Looking at a chart like this, I wanted to know where the driver program actually runs. Who knows? So I did some research; what follows is mostly my own understanding, so if you have a different view, please leave a comment.
2. I now have three machines:

192.168.10.82 –>bigdata01.hzjs.co 
192.168.10.83 –>bigdata02.hzjs.co 
192.168.10.84 –>bigdata03.hzjs.co 

The slaves configuration file of the cluster is as follows:

bigdata01.hzjs.co
bigdata02.hzjs.co
bigdata03.hzjs.co

So these three machines are all worker nodes, and the cluster is a fully distributed cluster. I start it with # ./start-all.sh; whichever machine I run this on becomes the master node, that is, the machine where the port-7077 Master process is located. Suppose I now run ./start-all.sh on 192.168.10.84.

The result looks like this:
[Figure: Master process on 192.168.10.84; Worker processes on all three machines]
3. Then let's take a look at local mode.
[Figure: local mode]
Now suppose I run bin]# spark-shell on 192.168.10.84. A SparkContext is created on 192.168.10.84, so 84 is the driver, and the worker nodes (all three machines) are the machines on which the executors are launched. As shown below:
[Figure: driver on 192.168.10.84, executors on the three worker nodes]
Now suppose I run bin]# spark-shell on 192.168.10.83. A SparkContext is created on 192.168.10.83, so 83 is the driver, and the worker nodes (all three machines) are the machines on which the executors are launched. As shown below:
[Figure: driver on 192.168.10.83, executors on the three worker nodes]
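If you want to confirm this from inside spark-shell, a quick check is to ask the SparkContext itself. This is a minimal sketch, assuming a running spark-shell session where sc is the shell's pre-created SparkContext; the example output values are illustrative only.

    // spark.driver.host reports which machine the driver is running on;
    // sc.master reports which master URL the shell connected to.
    println(sc.getConf.get("spark.driver.host"))   // e.g. 192.168.10.83
    println(sc.master)                             // e.g. spark://bigdata03.hzjs.co:7077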
Summary: In local mode, the Driver is what executes the main function of a Spark Application and creates the SparkContext, and of course it contains all of the code of that application. (The machine that runs all of the application code and creates the SparkContext is the driver; in other words, it is the machine on which you submit the code to run.)
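To make this concrete, here is a minimal sketch of a Spark application; the object name, app name, and data are hypothetical and not from the original post. The machine on which this main method runs is, by definition, the driver.

    import org.apache.spark.{SparkConf, SparkContext}

    object SimpleApp {
      def main(args: Array[String]): Unit = {
        // Everything in main() runs on the driver machine:
        // the SparkConf and the SparkContext are created here.
        val conf = new SparkConf().setAppName("SimpleApp")
        val sc = new SparkContext(conf)

        val data = sc.parallelize(1 to 100)   // the RDD lineage is defined on the driver
        val total = data.sum()                // action: executors compute, the result comes back to the driver
        println(s"total = $total")            // printed on the driver

        sc.stop()                             // the driver is responsible for stopping the SparkContext
      }
    }

In the setups described above, submitting this code from 192.168.10.84 would make 84 the driver, and submitting it from 192.168.10.83 would make 83 the driver.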

4. Now let's take a look at cluster mode.
[Figure: cluster mode]
Now suppose I run bin]# spark-shell 192.168.10.84:7077 on 192.168.10.83. A SparkContext is created on 192.168.10.84, so 84 is the driver, and the worker nodes (all three machines) are the machines on which the executors are launched. Here it is specified directly which machine the master node / driver is. As shown below:
[Figure: driver on 192.168.10.84, the machine specified explicitly as the master]
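The master can also be specified explicitly in code instead of on the command line. This is a minimal sketch, assuming the standalone master from the example above; the application name is hypothetical.

    import org.apache.spark.{SparkConf, SparkContext}

    // Point the application at a specific standalone master explicitly.
    val conf = new SparkConf()
      .setAppName("ExplicitMasterExample")        // hypothetical application name
      .setMaster("spark://192.168.10.84:7077")    // standalone master URL from the example above
    val sc = new SparkContext(conf)

On the command line the equivalent is usually written as spark-shell --master spark://192.168.10.84:7077.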
5. If there are multiple drivers, use the rules above to determine exactly where each one runs.
[Figure: multiple drivers]
Driver: Many distributed frameworks use the concept of a Driver, Hive for example. In Spark, the Driver runs the Application's main() function and creates the SparkContext; the purpose of creating the SparkContext is to prepare the runtime environment of the Spark application. In Spark, the SparkContext is responsible for communicating with the ClusterManager, requesting resources, and assigning and monitoring tasks. When the Executor side has finished running, the Driver is also responsible for shutting down the SparkContext. Usually, the SparkContext represents the Driver.
[Figure: example code with the driver-side parts marked in red boxes]
In the figure above, everything inside the red boxes belongs to the Driver and runs on the Driver side; the unboxed part in the middle belongs to the Executors and runs inside each ExecutorBackend process.
println(pcase.count()): the count() method is an Action operation in Spark and is responsible for triggering the job, because it contains a call to sc.runJob():

 def count(): Long = sc.runJob(this, Utils.getIteratorSize _).sum

hbaseRDD.map() is a Transformation operation.

Summary: The main method of a Spark Application (the SparkContext-related code) runs on the Driver. When an Action is triggered on an RDD used in the computation, a Job is submitted; the RDD then traces back through each Transformation it depends on, all the way to the initial RDD, and the code in between runs on the Executors.
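To make the driver/executor split concrete, here is a small sketch in the same spirit as the figure above, assuming sc is the SparkContext (for example inside spark-shell); a plain text RDD stands in for hbaseRDD, and the input path is hypothetical.

    // Runs on the driver: this only defines the lineage, nothing is computed yet.
    val lines = sc.textFile("hdfs:///tmp/input.txt")   // hypothetical input path
    val pcase = lines.map(_.toUpperCase)               // Transformation: the closure will run on the executors

    // count() is an Action: it calls sc.runJob(), the job is split into tasks,
    // the tasks run inside the ExecutorBackend processes, and the per-partition
    // counts are summed back on the driver.
    println(pcase.count())                             // the result is printed on the driver

So the map closure executes on the executors, while the RDD definitions and the println execute on the driver.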

Reprinted from: Spark learning-42: Understanding the Spark driver and executor


Origin: blog.csdn.net/liweihope/article/details/91349902