Spark Source Code Analysis: Spark Shell (Part 1)

https://www.cnblogs.com/xing901022/p/6412619.html

The Spark version analyzed in this article is Apache's spark-2.1.0-bin-hadoop2.7.

Layout of the bin directory:

-rwxr-xr-x. 1 bigdata bigdata 1089 Dec 15  2016 beeline
-rw-r--r--. 1 bigdata bigdata  899 Dec 15  2016 beeline.cmd
-rw-rw-r--. 1 bigdata bigdata  776 Sep 18 06:27 derby.log
-rwxr-xr-x. 1 bigdata bigdata 1933 Dec 15  2016 find-spark-home
-rw-r--r--. 1 bigdata bigdata 1909 Dec 15  2016 load-spark-env.cmd
-rw-r--r--. 1 bigdata bigdata 2133 Dec 15  2016 load-spark-env.sh
drwxrwxr-x. 5 bigdata bigdata 4096 Sep 18 06:27 metastore_db
-rwxr-xr-x. 1 bigdata bigdata 2989 Dec 15  2016 pyspark
-rw-r--r--. 1 bigdata bigdata 1493 Dec 15  2016 pyspark2.cmd
-rw-r--r--. 1 bigdata bigdata 1002 Dec 15  2016 pyspark.cmd
-rwxr-xr-x. 1 bigdata bigdata 1030 Dec 15  2016 run-example
-rw-r--r--. 1 bigdata bigdata  988 Dec 15  2016 run-example.cmd
-rwxr-xr-x. 1 bigdata bigdata 3116 Dec 15  2016 spark-class
-rw-r--r--. 1 bigdata bigdata 2236 Dec 15  2016 spark-class2.cmd
-rw-r--r--. 1 bigdata bigdata 1012 Dec 15  2016 spark-class.cmd
-rwxr-xr-x. 1 bigdata bigdata 1039 Dec 15  2016 sparkR
-rw-r--r--. 1 bigdata bigdata 1014 Dec 15  2016 sparkR2.cmd
-rw-r--r--. 1 bigdata bigdata 1000 Dec 15  2016 sparkR.cmd
-rwxr-xr-x. 1 bigdata bigdata 3017 Dec 15  2016 spark-shell
-rw-r--r--. 1 bigdata bigdata 1530 Dec 15  2016 spark-shell2.cmd
-rw-r--r--. 1 bigdata bigdata 1010 Dec 15  2016 spark-shell.cmd
-rwxr-xr-x. 1 bigdata bigdata 1065 Dec 15  2016 spark-sql
-rwxr-xr-x. 1 bigdata bigdata 1040 Dec 15  2016 spark-submit
-rw-r--r--. 1 bigdata bigdata 1128 Dec 15  2016 spark-submit2.cmd
-rw-r--r--. 1 bigdata bigdata 1012 Dec 15  2016 spark-submit.cmd

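First, what is spark-shell?

spark-shell is an interactive command window: you type Spark code into it, and each command is evaluated immediately. This kind of interactive environment is also known as a REPL (Read-Eval-Print Loop).

Let's start with a quick look at the script. There is not much code: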

#!/usr/bin/env bash

# Shell script for starting the Spark Shell REPL

cygwin=false
case "$(uname)" in
  CYGWIN*) cygwin=true;;
esac

# Enter posix mode for bash
set -o posix

if [ -z "${SPARK_HOME}" ]; then
  source "$(dirname "$0")"/find-spark-home
fi

export _SPARK_CMD_USAGE="Usage: ./bin/spark-shell [options]"

# SPARK-4161: scala does not assume use of the java classpath,
# so we need to add the "-Dscala.usejavacp=true" flag manually. We
# do this specifically for the Spark shell because the scala REPL
# has its own class loader, and any additional classpath specified
# through spark.driver.extraClassPath is not automatically propagated.
SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Dscala.usejavacp=true"

function main() {
  if $cygwin; then
    # Workaround for issue involving JLine and Cygwin
    # (see http://sourceforge.net/p/jline/bugs/40/).
    # If you're using the Mintty terminal emulator in Cygwin, may need to set the
    # "Backspace sends ^H" setting in "Keys" section of the Mintty options
    # (see https://github.com/sbt/sbt/issues/562).
    stty -icanon min 1 -echo > /dev/null 2>&1
    export SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Djline.terminal=unix"
    "${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "$@"
    stty icanon echo > /dev/null 2>&1
  else
    export SPARK_SUBMIT_OPTS
    "${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "$@"
  fi
}

# Copy restore-TTY-on-exit functions from Scala script so spark-shell exits properly even in
# binary distribution of Spark where Scala is not installed
exit_status=127
saved_stty=""

# restore stty settings (echo in particular)
function restoreSttySettings() {
  stty $saved_stty
  saved_stty=""
}

function onExit() {
  if [[ "$saved_stty" != "" ]]; then
    restoreSttySettings
  fi
  exit $exit_status
}

# to reenable echo if we are interrupted before completing.
trap onExit INT

# save terminal settings
saved_stty=$(stty -g 2>/dev/null)
# clear on error so we don't later try to restore them
if [[ ! $? ]]; then
  saved_stty=""
fi

main "$@"

# record the exit status lest it be overwritten:
# then reenable echo and propagate the code.
exit_status=$?
onExit

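On its own, this script only shows that spark-shell calls spark-submit. A later article will analyze what spark-submit does (it in turn calls spark-class, which is what ultimately launches the program; everything before that just passes arguments along).

At the very top: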

cygwin=false
case "$(uname)" in
  CYGWIN*) cygwin=true;;
esac

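This block appears in many startup scripts. It checks whether the system is Cygwin, using the uname command, which is typically used to query the system name or the kernel version.

uname prints the operating system name; see man uname for details. Running uname with no arguments usually prints Linux, uname -r prints the kernel release, and uname -a prints all available information.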
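
The same case pattern can be extended to recognize other platforms. The following is only a sketch: detect_platform is a hypothetical helper that is not part of spark-shell, and the labels it prints are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of spark-shell): classify a uname string.
detect_platform() {
  case "$1" in
    CYGWIN*) echo "cygwin" ;;   # same glob pattern as in spark-shell
    Linux)   echo "linux" ;;
    Darwin)  echo "macos" ;;
    *)       echo "other" ;;
  esac
}

# Classify the machine this script is running on.
detect_platform "$(uname)"
```

Passing the uname output in as an argument, rather than calling uname inside the function, makes the matching logic easy to try with arbitrary strings.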

set -o posix

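This puts the shell into POSIX mode; some commands and operations behave differently from one mode to another. POSIX stands for Portable Operating System Interface (the X reflects its Unix heritage), a family of standards that defines a common operating system interface.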
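
To see the option being switched on and off, a small bash-only sketch like the one below can help. posix_state is a hypothetical helper; it relies on the bash builtin shopt with -o (restrict to set -o options) and -q (report through the exit status only).

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether bash's posix option is currently enabled.
posix_state() {
  if shopt -qo posix; then
    echo "on"
  else
    echo "off"
  fi
}

set +o posix                       # make sure we start outside POSIX mode
echo "before: $(posix_state)"
set -o posix                       # the same command spark-shell runs
echo "after:  $(posix_state)"
set +o posix                       # restore the default for the rest of the script
```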

if [ -z "${SPARK_HOME}" ]; then
  source "$(dirname "$0")"/find-spark-home
fi

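The first if statement, if [ -z "${SPARK_HOME}" ]; then, checks whether the SPARK_HOME environment variable has already been set to a non-empty value.

Shell conditional expressions come in many forms, for example: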

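# File expressions
if [ -f file ]    true if file exists and is a regular file
if [ -d ...  ]    true if the directory exists
if [ -s file ]    true if file exists and is not empty
if [ -r file ]    true if file exists and is readable
if [ -w file ]    true if file exists and is writable
if [ -x file ]    true if file exists and is executable

# Integer expressions
if [ int1 -eq int2 ]    true if int1 equals int2
if [ int1 -ne int2 ]    true if int1 is not equal to int2
if [ int1 -ge int2 ]    true if int1 >= int2
if [ int1 -gt int2 ]    true if int1 > int2
if [ int1 -le int2 ]    true if int1 <= int2
if [ int1 -lt int2 ]    true if int1 < int2

# String expressions (quote the variables so that empty values are handled correctly)
if [ "$string1" = "$string2" ]    true if string1 equals string2; a single = acts as the equality operator here
if [ "$string1" != "$string2" ]   true if string1 is not equal to string2
if [ -n "$string" ]               true (exit status 0) if string is not empty
if [ -z "$string" ]               true if string is empty
if [ "$string" ]                  true if string is not empty (same as -n)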

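So the test above simply checks whether ${SPARK_HOME} is empty.

The source command runs another script in the current shell, so any variables that script sets remain visible to the caller.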
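
A short sketch of the -z test in action, and of why the variable is quoted. DEMO_HOME, the /opt/demo fallback, and value_or_default are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# DEMO_HOME and the fallback path are hypothetical, for illustration only.

# Print the first argument, or the second one when the first is empty.
value_or_default() {
  if [ -z "$1" ]; then
    echo "$2"
  else
    echo "$1"
  fi
}

unset DEMO_HOME
value_or_default "${DEMO_HOME}" "/opt/demo"   # DEMO_HOME is unset, so the fallback is used

DEMO_HOME="/usr/local/demo"
value_or_default "${DEMO_HOME}" "/opt/demo"   # DEMO_HOME is set, so its value is used

# Quoting matters: with an empty variable the unquoted form collapses to [ -n ],
# which tests the literal string "-n" and therefore succeeds.
empty=""
if [ -n $empty ]; then unquoted="non-empty"; else unquoted="empty"; fi
if [ -n "$empty" ]; then quoted="non-empty"; else quoted="empty"; fi
echo "unquoted test says: $unquoted"
echo "quoted test says:   $quoted"
```

This is why spark-shell writes "${SPARK_HOME}" in double quotes inside the brackets.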

source "$(dirname "$0")"/find-spark-home

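Taken as a whole, this line runs the find-spark-home script that sits in the same directory as the current script.

Let's break it down:

First, $0 is one of the shell's special parameters. There are many others like it: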
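
The reason spark-shell uses source here, rather than running find-spark-home as a separate command, is presumably that a sourced script can set variables such as SPARK_HOME in the caller's shell. The sketch below illustrates that difference with a throwaway script; set-var.sh and DEMO_VAR are hypothetical names.

```shell
#!/usr/bin/env bash
# Create a throwaway script that only assigns a variable.
tmpdir="$(mktemp -d)"
cat > "$tmpdir/set-var.sh" <<'EOF'
DEMO_VAR="set by set-var.sh"
EOF

unset DEMO_VAR

# Run it as a child process: the assignment happens in the child and is lost.
bash "$tmpdir/set-var.sh"
after_exec="${DEMO_VAR:-<unset>}"

# Source it: the assignment happens in the current shell and persists.
source "$tmpdir/set-var.sh"
after_source="${DEMO_VAR:-<unset>}"

echo "after running as a child process: $after_exec"
echo "after source:                     $after_source"

rm -rf "$tmpdir"
```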

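$# is the number of arguments passed to the script
$0 is the name of the script itself
$1 is the first argument passed to the script
$2 is the second argument passed to the script
$@ is the list of all arguments passed to the script
$* is all arguments passed to the script shown as one single string; unlike the positional parameters $1 to $9, it is not limited to nine arguments
$$ is the process ID of the shell running the script
$? is the exit status of the last command; 0 means no error, anything else means an error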

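The most commonly used are probably $0 and $@.

Next, the dirname command: it prints the directory part of a file path. For example, given the file /home/xinghl/test/test1, running dirname test1 inside the test directory returns: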
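
spark-shell forwards its arguments with "$@", as in main "$@". A small sketch of why that form is preferred over "$*" when forwarding; count_args and the two forward_ functions are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, for illustration only.

# Print how many arguments were received.
count_args() {
  echo "$#"
}

# "$@" expands to one word per original argument, so spaces inside arguments survive.
forward_with_at() {
  count_args "$@"
}

# "$*" joins all arguments into a single word.
forward_with_star() {
  count_args "$*"
}

forward_with_at   --name "Spark shell"   # prints 2
forward_with_star --name "Spark shell"   # prints 1
```

With "$@", an argument such as "Spark shell" reaches spark-submit as one argument, exactly as the user typed it.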

[root@localnode3 test]# pwd
/home/xinghl/test
[root@localnode3 test]# ll
total 4
-rw-r--r-- 1 root root 27 Feb 17 10:48 test1
[root@localnode3 test]# dirname test1
.

What we are after is that dot. On Linux, . means the current directory and .. means the parent directory.
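
dirname operates on the path string alone, so its behavior is easy to try out. In the sketch below, /opt/spark is a hypothetical install location and sibling_of is a hypothetical helper that mirrors the "$(dirname "$0")"/find-spark-home pattern.

```shell
#!/usr/bin/env bash
# dirname works on the path string; none of these files need to exist.
# /opt/spark is a hypothetical install location.
dirname test1                         # bare file name: prints .
dirname ./bin/spark-shell             # relative path: prints ./bin
dirname /opt/spark/bin/spark-shell    # absolute path: prints /opt/spark/bin

# Hypothetical helper: build the path of a file that sits next to a given script,
# the same way spark-shell locates find-spark-home next to itself.
sibling_of() {
  echo "$(dirname "$1")/$2"
}

sibling_of /opt/spark/bin/spark-shell find-spark-home
```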

SPARK_SUBMIT_OPTS="$SPARK_SUBMIT_OPTS -Dscala.usejavacp=true"

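Scala does not use the Java classpath by default, so it has to be enabled by hand here: the -Dscala.usejavacp=true flag makes Scala use the Java classpath.

That's all for now. A later article will cover how the spark-shell window itself works.

Author: xingoo

Source: http://www.cnblogs.com/xing901022

Copyright of this article is shared by the author and cnblogs (博客园). Reposting is welcome, provided this notice is retained and a clearly visible link to the original is given on the article page.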
