Introduction to spark-submit and spark-shell in Spark

spark-submit:
1. It is mainly used to submit a compiled and packaged jar to the cluster environment to run. It is similar to the hadoop jar command in Hadoop: hadoop jar submits a MapReduce job, while spark-submit submits a Spark job. The script can set the Spark classpath and application dependency packages, and can select among the different cluster managers and deploy modes that Spark supports. Compared with spark-shell, it has no REPL (interactive programming environment); before running, you need to specify the application's main class, the jar path, arguments, and so on.

2. Basic syntax

bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

Explanation of the corresponding parameters:

--master: the address (URL) of the master;
--class: the main class of your application (e.g. org.apache.spark.examples.SparkPi);
--deploy-mode: whether to deploy your driver on a worker node (cluster) or run it locally as a client (client) (default: client);
--conf: arbitrary Spark configuration properties in key=value format; if the value contains spaces, quote it as "key=value";
application-jar: the packaged application jar, including dependencies. This URL must be globally visible within the cluster, e.g. an hdfs:// path on a shared storage system; if it is a file:// path, then every node must have the same jar at that path;
application-arguments: arguments passed to the main() method;
--executor-memory 1G: sets the available memory of each executor to 1G;
--total-executor-cores 2: sets the total number of CPU cores used across all executors to 2.
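Putting these options together, a typical submission of the SparkPi example that ships with Spark to a Standalone cluster might look as follows. This is a sketch: the master URL, the deploy mode, and the exact jar filename (which depends on your Spark/Scala version) are placeholders you must adapt to your own installation.

```shell
# Submit the bundled SparkPi example to a Standalone cluster in client
# deploy mode; SparkPi takes one argument (the number of partitions).
# NOTE: master URL and jar path below are placeholders for your setup.
bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://master-host:7077 \
  --deploy-mode client \
  --executor-memory 1G \
  --total-executor-cores 2 \
  ./examples/jars/spark-examples.jar \
  100
```

Running without --master (or with --master local[*]) executes the same job on the local machine, which is convenient for testing before submitting to the cluster.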

spark-shell:
1. This is the command-line way to run Spark programs. The command starts a shell into which the user can type Spark commands, i.e. a REPL environment. If no master is specified, a SparkSubmit process is started locally to simulate a Spark runtime environment.

When a master is specified, tasks are sent to the Spark cluster (provided a Spark Standalone cluster has been set up).
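For reference, the two launch modes described above can be sketched as the following invocations (the master URL is a placeholder for your own cluster):

```shell
# Local REPL: no master specified, Spark simulates the runtime environment
# in a single local SparkSubmit process.
bin/spark-shell

# REPL attached to a Standalone cluster (placeholder URL); jobs typed into
# the shell are executed on the cluster, within the resources requested here.
bin/spark-shell \
  --master spark://master-host:7077 \
  --total-executor-cores 2 \
  --executor-memory 1G
```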

2. By reading the spark-shell source code, you can see that its main function calls spark-submit with the REPL's entry class. In short, when you run spark-shell to submit work, it is ultimately spark-submit that carries it out.

function main() {
  if $cygwin; then
    # Workaround for issue involving JLine and Cygwin
    # (see http://sourceforge.net/p/jline/bugs/40/).
    # If you're using the Mintty terminal emulator in Cygwin, may need to set the
    # "Backspace sends ^H" setting in "Keys" section of the Mintty options
    # (see https://github.com/sbt/sbt/issues/562).
    stty -icanon min 1 -echo > /dev/null 2>&1
    export SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Djline.terminal=unix"
    "${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "$@"
    stty icanon echo > /dev/null 2>&1
  else
    export SPARK_SUBMIT_OPTS
    "${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "$@"
  fi
}

Origin blog.csdn.net/weixin_44080445/article/details/109674537