Setting up a single-node pseudo-distributed Spark-on-YARN cluster

This article is strongly recommended; it walks through the setup in detail:
https://blog.csdn.net/chengyuqiang/article/details/77864246


Launch command

spark-shell --master yarn --deploy-mode client
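
Once the shell is up, one quick way to confirm it is really running on YARN (assuming the yarn CLI is on your PATH) is to list the running YARN applications from a second terminal; the Spark shell should appear there as an application named "Spark shell":

# in client mode the Spark shell registers itself as a YARN application
yarn application -list -appStates RUNNING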


The article mentions an error that may appear after launching: ERROR spark.SparkContext: Error initializing SparkContext.

The cause is that the container exceeds the memory YARN allows it (in particular the virtual-memory check), so YARN kills the container. The fix is to adjust YARN's configuration file yarn-site.xml:

<property>
	<name>yarn.nodemanager.vmem-check-enabled</name>
	<value>false</value>
	<description>Whether virtual memory limits will be enforced for containers</description>
</property>
<property>
	<name>yarn.nodemanager.vmem-pmem-ratio</name>
	<value>4</value>
	<description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
</property>
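
yarn-site.xml is only read when the YARN daemons start, so the change takes effect after a restart. A minimal sketch, assuming the standard Hadoop sbin scripts and that HADOOP_HOME points at your installation (e.g. hadoop-2.8.0):

# restart the ResourceManager and NodeManagers so the new yarn-site.xml is picked up
$HADOOP_HOME/sbin/stop-yarn.sh
$HADOOP_HOME/sbin/start-yarn.sh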

Reposter's notes:

1. After starting Hadoop, wait a few minutes before starting Spark-on-YARN; otherwise the NameNode may still be in safe mode, which can also cause errors (a way to check this is sketched after these notes).

2. On a single-node cluster, spark-env.sh needs one extra setting on top of HADOOP_CONF_DIR: bind SPARK_LOCAL_IP to the loopback address, as in the second line below.

# point Spark at the Hadoop/YARN configuration directory (adjust the path to your installation)
HADOOP_CONF_DIR=/home/rav009/hadoop-2.8.0/etc/hadoop/
# bind Spark to the loopback address so hostname resolution does not break on a single node
SPARK_LOCAL_IP="127.0.0.1"
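
For note 1, rather than just waiting you can check whether the NameNode has left safe mode, and force it out if necessary. A small sketch, assuming the hdfs command is on your PATH:

# check safe mode status; "Safe mode is OFF" means HDFS is ready
hdfs dfsadmin -safemode get
# if it is still ON and the cluster is otherwise healthy, leave safe mode manually
hdfs dfsadmin -safemode leave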


Reposted from blog.csdn.net/rav009/article/details/80842351