Integrating Spark with HDFS: WordCount Example

Original article. Please credit the source when reposting: http://agilestyle.iteye.com/blog/2294233

Prerequisites

Hadoop HA is already set up

Spark HA is already set up

Integration steps

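cd into Spark's conf directory and edit spark-env.sh.

Add the following line, which tells Spark where to find the Hadoop configuration files (core-site.xml, hdfs-site.xml):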

export HADOOP_CONF_DIR=/home/hadoop/app/hadoop-2.6.4/etc/hadoop

Save and exit, then copy spark-env.sh to the other two nodes:

scp spark-env.sh hadoop-0000:/home/hadoop/app/spark-1.6.1-bin-hadoop2.6/conf
scp spark-env.sh hadoop-0001:/home/hadoop/app/spark-1.6.1-bin-hadoop2.6/conf

Startup

Start Hadoop HA first. The NameNode web UIs should show one active and one standby NameNode:

http://hadoop-0000:50070 (active)

http://hadoop-0001:50070 (standby)

Then start Spark HA (here hadoop-0002 was chosen as the master). The master web UIs should show:

http://hadoop-0002:8080 (ALIVE)

http://hadoop-0001:8080 (STANDBY)

Run spark-shell against the master:

spark-shell --master spark://hadoop-0002:7077

WordCount

To run WordCount, first upload a file to HDFS:

hadoop fs -mkdir -p /spark/wordcount
hadoop fs -put wordcount.txt /spark/wordcount

Switch back to spark-shell and run:

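val rdd = sc.textFile("hdfs://hadoop-0000:9000/spark/wordcount/wordcount.txt")

This URI names the NameNode on hadoop-0000 directly, so it works only while that NameNode is active. Because HADOOP_CONF_DIR is set, a path without a scheme, such as /spark/wordcount/wordcount.txt, is instead resolved against fs.defaultFS in core-site.xml, which in an HA setup usually points at the nameservice rather than at a single NameNode.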

Then run:

rdd.flatMap(_.split(" ")).map((_,1)).reduceByKey(_ + _).collect

This statement is equivalent to:

rdd.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b).collect
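
To see what each step of the statement does without a cluster, the same transformations can be mimicked with ordinary Scala collections. This is only a sketch: groupBy plus a per-key sum stands in for reduceByKey, the result is a Map rather than the Array[(String, Int)] that collect returns, and the object and method names are made up for illustration. It also exposes a detail of split(" "): two consecutive spaces yield an empty string, which is then counted as a word. The second method, which splits on the regex \\s+ and filters out empty tokens, is one possible workaround that could be applied to the spark-shell version as well.

```scala
// Spark-free sketch of the WordCount pipeline, using plain Scala collections.
// LocalWordCount, sumByKey and wordCountWhitespace are hypothetical names for
// illustration; they are not part of Spark's API.
object LocalWordCount {

  // Stand-in for reduceByKey(_ + _): group the (word, 1) pairs by word and
  // sum the counts. Spark also combines values inside each partition before
  // shuffling; here everything happens in one local collection.
  private def sumByKey(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.groupBy(_._1).map { case (word, ps) => word -> ps.map(_._2).sum }

  // Same steps as rdd.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
  def wordCount(lines: Seq[String]): Map[String, Int] =
    sumByKey(lines.flatMap(_.split(" ")).map((_, 1)))

  // Variant: split on runs of whitespace and drop empty tokens, so double
  // or leading spaces do not produce an "" entry
  def wordCountWhitespace(lines: Seq[String]): Map[String, Int] =
    sumByKey(lines.flatMap(_.split("\\s+")).filter(_.nonEmpty).map((_, 1)))

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input; the second line contains two spaces
    val lines = Seq("hello spark", "hello  hadoop")
    println(wordCount(lines))           // includes an entry for ""
    println(wordCountWhitespace(lines)) // no entry for ""
  }
}
```

Running main prints both maps; with the sample input only the first one contains an entry for the empty string. The order in which entries are printed is not guaranteed.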
