Spark on YARN: Running Spark Programs

1. Official documentation

http://spark.apache.org/docs/latest/running-on-yarn.html

2. Configuration and installation

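1. Install Hadoop: both the HDFS and YARN modules are needed. HDFS is required because Spark stores its jar files on HDFS at run time.
2. Install Spark: unpack the Spark distribution on one server and edit the spark-env.sh configuration file as follows. This Spark installation acts as a YARN client and is used to submit jobs.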
export JAVA_HOME=/usr/local/jdk1.8.0_131
export HADOOP_CONF_DIR=/usr/local/hadoop-2.7.3/etc/hadoop
3. Start HDFS and YARN
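
Before submitting anything, it can help to confirm that the client machine actually sees the Hadoop configuration. The following is a minimal sketch, not part of Spark or Hadoop: the function name check_yarn_client_env is hypothetical, and the choice of core-site.xml and yarn-site.xml as the files to look for is an assumption about a typical Hadoop configuration directory.

```shell
# Hypothetical helper: check that this machine is set up as a YARN client.
# Returns 0 when HADOOP_CONF_DIR is set and holds the expected files, 1 otherwise.
check_yarn_client_env() {
    if [ -z "${HADOOP_CONF_DIR:-}" ]; then
        echo "HADOOP_CONF_DIR is not set" >&2
        return 1
    fi
    # Assumption: these two files are what the client needs to find HDFS and YARN.
    for f in core-site.xml yarn-site.xml; do
        if [ ! -f "$HADOOP_CONF_DIR/$f" ]; then
            echo "missing $HADOOP_CONF_DIR/$f" >&2
            return 1
        fi
    done
    return 0
}
```

After sourcing spark-env.sh in the current shell, calling check_yarn_client_env and inspecting its exit status gives a quick answer before the first spark-submit.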

3. Run modes (cluster mode and client mode)

1. cluster mode

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 2 \
--queue default \
lib/spark-examples*.jar \
10

./bin/spark-submit --class cn.edu360.spark.day1.WordCount \
--master yarn \
--deploy-mode cluster \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 2 \
--queue default \
/home/bigdata/hello-spark-1.0.jar \
hdfs://node-1.edu360.cn:9000/wc hdfs://node-1.edu360.cn:9000/out-yarn-1

2. client mode

./bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode client \
--driver-memory 1g \
--executor-memory 1g \
--executor-cores 2 \
--queue default \
lib/spark-examples*.jar \
10
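
The cluster and client invocations above differ only in the value passed to --deploy-mode. As a rough illustration, the sketch below prints (but does not run) a spark-submit command for a given mode. The function name build_submit_cmd is hypothetical, and the memory, core, and queue options are left out for brevity.

```shell
# Hypothetical helper: print a spark-submit command for the given deploy mode.
# Usage: build_submit_cmd <cluster|client> <main class> <jar> [app args...]
build_submit_cmd() {
    mode=$1; class=$2; jar=$3
    shift 3
    # Reject anything other than the two YARN deploy modes.
    case "$mode" in
        cluster|client) ;;
        *) echo "deploy mode must be cluster or client" >&2; return 1 ;;
    esac
    echo "./bin/spark-submit --class $class --master yarn --deploy-mode $mode $jar $*"
}

# Dry run: show the client-mode SparkPi command without executing it.
build_submit_cmd client org.apache.spark.examples.SparkPi 'lib/spark-examples*.jar' 10
```

Printing the command first makes it easy to review what will be submitted before running it for real.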

spark-shell must be run in client mode

./bin/spark-shell --master yarn --deploy-mode client

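3. Differences between the two modes

cluster mode: the Driver runs inside YARN, so the application's results cannot be displayed on the client. This mode is best suited to applications that write their final results to external storage (such as HDFS, Redis, or MySQL) rather than to stdout. The client terminal shows only the basic status of the application as a YARN job.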
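
Because the Driver's stdout stays in the YARN container logs in cluster mode, one way to read it afterwards is the yarn logs command, assuming log aggregation is enabled on the cluster. The sketch below only extracts an application ID from text and prints the command to run. The function name extract_app_id and the sample log line are made up for illustration, and real spark-submit output may be worded differently.

```shell
# Hypothetical helper: pull the first YARN application ID out of text on stdin.
extract_app_id() {
    grep -oE 'application_[0-9]+_[0-9]+' | head -n 1
}

# Made-up log line standing in for spark-submit output.
app_id=$(echo "INFO Client: Submitted application application_1500000000000_0001" | extract_app_id)

# Print the command to run on a machine with the YARN client installed.
echo "yarn logs -applicationId $app_id"
```

The application ID can also be read from the ResourceManager web UI if the submit output is no longer available.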

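client mode: the Driver runs on the client, so the application's results are displayed on the client. This mode therefore suits applications that print their results to the terminal (such as spark-shell).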

Author: 上等猪头肉
