Spark Environment Setup

Setting up the Hadoop cluster

hadoop2.7.3 + spark1.6.1 + scala2.11.8 + jdk1.8.0_101

Download Hadoop 2.7.3 and edit the hadoop-env.sh file under $HADOOP_HOME/etc/hadoop:

 

export JAVA_HOME=/soft/jdk1.8.0_101
 

 

Modify core-site.xml (here the data directory is simply placed under $HADOOP_HOME):

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.186.128:9000</value>
    </property>

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/root/spark/hadoop-2.7.3/data</value>
    </property>
</configuration>
 

 

Modify hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>

 

 

Format the NameNode first:

$HADOOP_HOME/bin/hdfs namenode -format
 

 

Start the NameNode and DataNode (hadoop-daemon.sh lives in $HADOOP_HOME/sbin):

./hadoop-daemon.sh start namenode
./hadoop-daemon.sh start datanode 
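
To check that both daemons actually came up (a quick sanity check; the report assumes HDFS is reachable at the fs.defaultFS address configured above):

jps                                      # should list NameNode and DataNode
$HADOOP_HOME/bin/hdfs dfsadmin -report   # the DataNode should report its capacity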
 

 

Disable iptables:

service iptables stop 
chkconfig --level 35 iptables off
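
The commands above assume a CentOS 6 style init system; on CentOS 7 or later the firewall is managed by firewalld instead (an assumption about your distribution), so the equivalent would be:

systemctl stop firewalld
systemctl disable firewalld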

 

Set the hostname:

# several ways to change it
hostname <new-hostname>                  # takes effect immediately, lost on reboot
vim /etc/sysconfig/network               # set HOSTNAME=<new-hostname> to persist across reboots
sysctl kernel.hostname=<new-hostname>    # same effect as the hostname command
vim /etc/hosts                           # map the new hostname to this machine's IP
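
Because yarn-site.xml below addresses the ResourceManager by the hostname vm128, each node's /etc/hosts should resolve that name to the master's IP. An example entry using the values that appear elsewhere in this post:

# /etc/hosts
192.168.186.128   vm128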

Setting up Hadoop YARN

Modify yarn-env.sh:

JAVA=/soft/jdk1.8.0_101/bin/java
  

 

Modify yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname</name>
         <value>vm128</value>
    </property>
</configuration>
 

 

Start the ResourceManager and NodeManager (yarn-daemon.sh also lives in $HADOOP_HOME/sbin):

./yarn-daemon.sh start resourcemanager
./yarn-daemon.sh start nodemanager
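
To confirm the NodeManager registered with the ResourceManager (assuming the hostname configured above resolves correctly):

$HADOOP_HOME/bin/yarn node -list    # should list one node in RUNNING state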

Setting up Spark

Download Scala (any recent version will do; 2.11.8 is used here) and configure SCALA_HOME, e.g. in /etc/profile:

JAVA_HOME=/soft/jdk1.8.0_101
SCALA_HOME=/root/spark/scala-2.11.8
PATH=$PATH:$JAVA_HOME/bin:$SCALA_HOME/bin:

export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL JAVA_HOME SCALA_HOME
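
Assuming the snippet above was added to /etc/profile, reload it and sanity-check the toolchain:

source /etc/profile
java -version     # should report 1.8.0_101
scala -version    # should report 2.11.8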
  

 

Modify spark-env.sh under $SPARK_HOME/conf:

export SCALA_HOME=/root/spark/scala-2.11.8
export JAVA_HOME=/soft/jdk1.8.0_101
export SPARK_MASTER_IP=192.168.186.128
export SPARK_WORKER_MEMORY=512M
export HADOOP_CONF_DIR=/root/spark/hadoop-2.7.3/etc/hadoop
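
For a multi-node standalone cluster, the worker hostnames can also be listed in $SPARK_HOME/conf/slaves so that sbin/start-all.sh starts every worker over SSH; a minimal sketch (not strictly needed for this single-machine setup):

cp $SPARK_HOME/conf/slaves.template $SPARK_HOME/conf/slaves
# edit the file so it contains one worker hostname per line, e.g. vm128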
 

 

Start the master and the worker (note that these scripts are in $SPARK_HOME/sbin, not bin):

$SPARK_HOME/sbin/start-master.sh
$SPARK_HOME/sbin/start-slave.sh spark://192.168.186.128:7077
 

 

jps output

NameNode and DataNode are the HDFS processes.

ResourceManager and NodeManager are the YARN processes.

Master and Worker are the Spark processes.

6368 Master
7666 Jps
6756 Worker
4343 DataNode
5052 NodeManager
4446 NameNode
4798 ResourceManager

Running a simple example

$SPARK_HOME/bin/spark-shell
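
Started as above (with no master set in spark-defaults.conf), spark-shell runs with a local master; to attach it to the standalone cluster instead, pass the master URL:

$SPARK_HOME/bin/spark-shell --master spark://192.168.186.128:7077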

First upload a file to HDFS:

$HADOOP_HOME/bin/hdfs dfs -mkdir /test
$HADOOP_HOME/bin/hdfs dfs -put /root/spark/spark-2.0.0-bin-hadoop2.7/conf/spark-defaults.conf.template /test/xx
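
Optionally confirm the upload before touching it from Spark:

$HADOOP_HOME/bin/hdfs dfs -ls /test    # should list /test/xx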

val textFile = sc.textFile("hdfs://192.168.186.128:9000/test/xx")
val lines = textFile.filter(line => line.contains("spark"))

// count() is an action, so only here does the computation actually run
lines.count()

// map, filter and collect
sc.parallelize(1 to 100).map(_*2).filter(_>50).filter(_<180).collect
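
Beyond the shell, a packaged job can be submitted to the standalone master with spark-submit. A sketch using the SparkPi example that ships with Spark (the examples jar sits under lib/ in Spark 1.6.x and under examples/jars/ in 2.x, so adjust the path to your distribution):

$SPARK_HOME/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://192.168.186.128:7077 \
  $SPARK_HOME/lib/spark-examples-*.jar 100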

 

Web UI ports

# Hadoop (HDFS) web UI
http://192.168.186.128:50070/dfshealth.html#tab-datanode

# YARN web UI
http://192.168.186.128:8088/cluster/apps/RUNNING

# Spark master web UI
http://192.168.186.128:8080/

# job monitoring UI, available while spark-shell is running
http://192.168.186.134:4040/
