Spark Source Code Analysis 1: Deployment and Overall Architecture

Spark official site: http://spark.apache.org/docs/latest/

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Scala, Java, and Python that make parallel jobs easy to write, and an optimized engine that supports general computation graphs. It also supports a rich set of higher-level tools including Shark (Hive on Spark), MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

 

Spark Deployment

1. Download: http://spark.apache.org/downloads.html

2. Build: sbt/sbt assembly

3. Start the master: ./sbin/start-master.sh

Start a worker, pointing it at the master's spark:// URL: ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://IP:PORT

4. Spark deployment modes

1) Local mode: set the master to "local" and Spark runs in a single local JVM; there is no need to start a master or workers, which makes this mode convenient for debugging.
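For example, a minimal local-mode application might look like the following sketch (the object name LocalExample is illustrative, not from the original post):

import org.apache.spark.{SparkConf, SparkContext}

object LocalExample {
  def main(args: Array[String]) {
    // "local" runs the driver and tasks in one JVM; "local[4]" would use 4 threads.
    val conf = new SparkConf().setAppName("LocalExample").setMaster("local")
    val sc = new SparkContext(conf)
    // A trivial job: count the even numbers in 1..1000.
    println(sc.parallelize(1 to 1000).filter(_ % 2 == 0).count())
    sc.stop()
  }
}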

2) Driver mode: submit the driver itself to the cluster through org.apache.spark.deploy.Client:

./bin/spark-class org.apache.spark.deploy.Client --supervise --verbose launch spark://hzs-sparc01:7077 file:///home/share/lib/OperationIntelligence-0.0.1-SNAPSHOT.jar com.seven.oi.spark.RemoteSparkApplication

The master then launches a driver on one of the workers, and that driver manages the running executors. Note that the jar URL (file:///... here) must point to a location visible from the workers.

The advantage of driver mode is that the master manages the driver: if the driver dies, the master can restart it (that is what the --supervise flag above requests).

3) App mode: package the application into a jar and run that jar directly with the java command:

java -cp OperationIntelligence-0.0.1-SNAPSHOT.jar com.seven.oi.spark.RemoteSparkApplication

This is similar to driver mode, except that the master does not manage the driver; the process started by the java command itself acts as the driver.
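The original post does not show the body of com.seven.oi.spark.RemoteSparkApplication; a plausible skeleton, reusing the master host and jar path from the commands above, might look like this:

package com.seven.oi.spark

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical skeleton; the real application code is not shown in the post.
object RemoteSparkApplication {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("RemoteSparkApplication")
      .setMaster("spark://hzs-sparc01:7077") // standalone master from the commands above
      // Ship the application jar so executors can load its classes.
      .setJars(Seq("file:///home/share/lib/OperationIntelligence-0.0.1-SNAPSHOT.jar"))
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100).reduce(_ + _))
    sc.stop()
  }
}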

4) Mesos mode: not covered yet

5) YARN mode: not covered yet

 

Spark Components

Spark consists of the driver, workers, executors, and the master. The driver registers the application, schedules jobs, and collects block manager information; workers start executors (and drivers, in driver mode); executors run tasks; the master tracks the state of applications and workers, and restarts drivers and applications when they fail.
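A small annotated example (the object name and master URL are reused from the earlier sketches, not from the original post) showing which component does what:

import org.apache.spark.{SparkConf, SparkContext}

object ComponentsDemo {
  def main(args: Array[String]) {
    // Driver: creating the SparkContext registers this application with the master.
    val sc = new SparkContext(new SparkConf()
      .setAppName("ComponentsDemo")
      .setMaster("spark://hzs-sparc01:7077"))
    // Master: asks workers to start executors for the registered application.
    // Driver: reduce() is an action, so it triggers runJob and task scheduling.
    // Executors: run the tasks and send the results back to the driver.
    val total = sc.parallelize(1 to 10000).map(_ * 2).reduce(_ + _)
    println(total)
    sc.stop()
  }
}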

 

Below is the process from driver startup to runJob.
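A rough trace of that path, based on the standalone scheduler of the Spark 0.9 era (class names may differ slightly in other versions):

1. new SparkContext(conf) creates the DAGScheduler and a TaskSchedulerImpl paired with a SparkDeploySchedulerBackend.
2. The backend registers the application with the master via an AppClient; the master directs workers to launch executors, which connect back to the backend.
3. An action such as count() calls SparkContext.runJob, which forwards to DAGScheduler.runJob and posts a JobSubmitted event.
4. The DAGScheduler splits the RDD lineage into stages at shuffle boundaries and submits each stage's tasks as a TaskSet via TaskScheduler.submitTasks.
5. The backend offers executor slots, the TaskSetManager assigns tasks to them, and each executor runs its task in a TaskRunner.
6. Status updates and results flow back through the backend to the schedulers; when the final stage completes, runJob returns to the user code.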

 

 


Reposted from frankfan915.iteye.com/blog/2061961