spark-submit :
1. spark-submit is mainly used to submit a compiled and packaged jar to a cluster for execution. It is similar to the hadoop jar command in Hadoop: hadoop jar submits a MapReduce job, while spark-submit submits a Spark application. The script can set the Spark classpath and the application's dependency packages, and it supports the different cluster managers and deploy modes that Spark offers. Unlike spark-shell, it has no REPL (interactive programming environment); before running, you need to specify the application's main class, the jar path, the arguments, and so on.
2. Basic syntax
bin/spark-submit \
--class <main-class> \
--master <master-url> \
--deploy-mode <deploy-mode> \
--conf <key>=<value> \
... # other options
<application-jar> \
[application-arguments]
Explanation of the corresponding parameters:
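--master: the address of the master (the master URL of the cluster);
--class: the entry point (main class) of your application (e.g. org.apache.spark.examples.SparkPi);
--deploy-mode: whether to deploy your driver on a worker node (cluster) or locally as an external client (client) (default: client);
--conf: an arbitrary Spark configuration property in key=value format. If the value contains spaces, wrap it in quotes: "key=value";
application-jar: the packaged application jar, including its dependencies. The URL must be globally visible inside the cluster, for example an hdfs:// path on a shared storage system; if it is a file:// path, the same jar must exist at that path on every node;
application-arguments: the arguments passed to the main() method;
--executor-memory 1G: gives each executor 1 GB of memory;
--total-executor-cores 2: limits the total number of CPU cores used across all executors to 2 (a total, not a per-executor count).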
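To make the template and the parameter list concrete, here is a minimal sketch that assembles a spark-submit call for the SparkPi example class. The install directory, master URL, and jar path are hypothetical placeholders, and the function name submit_spark_pi is made up for illustration. By default it only prints the command (DRY_RUN=1), so it can be tried without a cluster.

```shell
# Hypothetical defaults; override them through the environment.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"                  # assumed install directory
MASTER_URL="${MASTER_URL:-spark://master-host:7077}"    # hypothetical standalone master
APP_JAR="${APP_JAR:-/path/to/spark-examples.jar}"       # hypothetical jar location

# Build the spark-submit command for SparkPi.
# With DRY_RUN=1 (the default) the command is printed instead of executed.
submit_spark_pi() {
  # The trailing 100 is an application argument handed to main().
  set -- "${SPARK_HOME}/bin/spark-submit" \
    --class org.apache.spark.examples.SparkPi \
    --master "${MASTER_URL}" \
    --deploy-mode client \
    --conf spark.eventLog.enabled=false \
    --executor-memory 1G \
    --total-executor-cores 2 \
    "${APP_JAR}" \
    100
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

submit_spark_pi
```

Setting DRY_RUN=0 would actually launch the job, provided the placeholders point at a real Spark installation, a reachable master, and an existing jar.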
spark-shell :
1. spark-shell runs Spark programs from the command line. It starts a shell in which the user can type Spark commands, i.e. a REPL environment. If no master is specified, a SparkSubmit process is started locally to simulate a Spark runtime environment.
When a master is specified, tasks are sent to that Spark cluster (provided a Spark Standalone cluster is available).
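The difference between starting spark-shell with and without a master can be sketched as follows. The helper spark_shell_cmd, the install directory, and the host name master-host are assumptions for illustration; the helper only prints the command line it would use rather than starting a shell.

```shell
# Assumed install directory; override SPARK_HOME in the environment.
SPARK_HOME="${SPARK_HOME:-/opt/spark}"

# Print the spark-shell command for an optional master URL.
# With no argument, no --master flag is passed and the shell runs locally.
spark_shell_cmd() {
  if [ -n "${1:-}" ]; then
    echo "${SPARK_HOME}/bin/spark-shell --master $1"
  else
    echo "${SPARK_HOME}/bin/spark-shell"
  fi
}

spark_shell_cmd                               # local run, no cluster required
spark_shell_cmd "local[2]"                    # local run with two worker threads
spark_shell_cmd "spark://master-host:7077"    # hypothetical standalone master
```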
2. Looking at the spark-shell source code, you can see that its main function calls spark-submit with org.apache.spark.repl.Main as the class. In short, a task started from spark-shell is ultimately submitted and run by spark-submit.
function main() {
  if $cygwin; then
    # Workaround for issue involving JLine and Cygwin
    # (see http://sourceforge.net/p/jline/bugs/40/).
    # If you're using the Mintty terminal emulator in Cygwin, may need to set the
    # "Backspace sends ^H" setting in "Keys" section of the Mintty options
    # (see https://github.com/sbt/sbt/issues/562).
    stty -icanon min 1 -echo > /dev/null 2>&1
    export SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Djline.terminal=unix"
    "${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "$@"
    stty icanon echo > /dev/null 2>&1
  else
    export SPARK_SUBMIT_OPTS
    "${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "$@"
  fi
}
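The delegation in the main function above can be imitated with two small stand-in functions. This is only a sketch: fake_spark_submit and fake_spark_shell are hypothetical names, and the first one merely prints its arguments instead of launching Spark. It shows that the fixed options (--class, --name) come first and that "$@" forwards the user's options unchanged, including values that contain spaces.

```shell
# Stand-in for bin/spark-submit (hypothetical): prints each argument on its own line.
fake_spark_submit() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Stand-in for the spark-shell wrapper: fixed options first, then the user's arguments.
fake_spark_shell() {
  fake_spark_submit --class org.apache.spark.repl.Main --name "Spark shell" "$@"
}

# "$@" keeps the quoted --conf value together as a single argument.
fake_spark_shell --master "local[2]" \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
```

Because the user's options are appended after the fixed ones, anything accepted by spark-submit (such as --master or --conf) can also be passed to spark-shell.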