Detailed Steps for Setting Up a Spark Cluster on Linux Virtual Machines

I. Passwordless SSH Between All Nodes (Fully Distributed)

1. First, configure passwordless SSH to the local machine on all three machines

Generate this machine's key pair (just press Enter at every prompt); by default ssh-keygen writes the keys to /root/.ssh

# cd /root/.ssh

# sudo rm -rf ./*

# ssh-keygen -t rsa

# ls

id_rsa  id_rsa.pub

Copy the public key into an authorized_keys file; after that, connecting to the local machine over ssh no longer asks for a password

# cd /root/.ssh

# cp id_rsa.pub authorized_keys

2. Next, configure passwordless SSH between the three machines

Use ssh-copy-id -i <hostname> to append this machine's public key to the target machine's authorized_keys file (quick and convenient)

# ssh-copy-id -i spark1

# ssh-copy-id -i spark2

# ssh-copy-id -i spark3

Repeat the steps above on each node
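The three ssh-copy-id calls above can be wrapped in a small loop so the same script works on every node. A minimal sketch; the dry-run echo is an assumption for safety, so drop it to actually copy the keys:

```shell
#!/bin/sh
# Push this machine's public key to every cluster node.
# Dry run: each command is printed instead of executed.
push_key() {
    for node in "$@"; do
        echo ssh-copy-id -i "$node"   # remove 'echo' to really copy the key
    done
}

push_key spark1 spark2 spark3
```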

 

3. Verify by logging in between the nodes. On spark1:

# ssh spark2

Last login: Fri Mar 16 16:55:26 2018 from 192.168.202.1

# exit        # log out
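Instead of logging in to each node by hand, ssh's BatchMode option makes a quick automated check possible: BatchMode disables the interactive password prompt, so any node that still requires a password fails immediately rather than hanging. A sketch, assuming the hostnames spark1–spark3 resolve:

```shell
#!/bin/sh
# Report which nodes accept passwordless login.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_login() {
    for node in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true 2>/dev/null; then
            echo "$node: ok"
        else
            echo "$node: passwordless login FAILED"
        fi
    done
}

check_login spark1 spark2 spark3
```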


II. Installing and Configuring Scala

1. Extract the archive

# tar -zxvf scala-2.11.4.tgz -C /opt/modules

2. Configure environment variables

# vi /etc/profile

# SCALA_HOME

export SCALA_HOME=/opt/modules/scala-2.11.4

export PATH=$PATH:$SCALA_HOME/bin

# source /etc/profile

3. Verify the installation

# scala -version

scala code runner version 2.11.4 -- Copyright 2002-2013, LAMP/EPFL

4. Scala must be configured on every machine; you can simply distribute the directory to the other nodes with scp

# scp -r /opt/modules/scala-2.11.4/ 192.168.202.152:/opt/modules
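Rather than running scp once per target, a loop copies the directory to every remaining node in one go. A dry-run sketch using the hostnames from this guide; drop the echo to actually copy:

```shell
#!/bin/sh
# Copy a directory to the same path on every listed node.
# Dry run: each command is printed instead of executed.
distribute() {
    src=$1; shift
    for node in "$@"; do
        echo scp -r "$src" "$node:/opt/modules/"   # remove 'echo' to copy
    done
}

distribute /opt/modules/scala-2.11.4 spark2 spark3
```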


III. Spark Configuration

Install Spark

# tar -zxvf spark-1.5.1-bin-hadoop2.4.tgz -C /opt/modules/

# mv /opt/modules/spark-1.5.1-bin-hadoop2.4 /opt/modules/spark

 

Configure environment variables

# vi /etc/profile

# SPARK_HOME

export SPARK_HOME=/opt/modules/spark

export PATH=$PATH:$SPARK_HOME/bin

export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib

# source /etc/profile

 

Edit the spark-env.sh file

# cd /opt/modules/spark/conf

# cp spark-env.sh.template spark-env.sh

# vi spark-env.sh

export JAVA_HOME=/opt/modules/jdk1.8.0_151

export SCALA_HOME=/opt/modules/scala-2.11.4

export SPARK_MASTER_IP=192.168.202.151

export SPARK_WORKER_MEMORY=1g

export HADOOP_CONF_DIR=/opt/modules/hadoop-2.5.0/etc/hadoop
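Hand-editing spark-env.sh on each node is error-prone; the same settings can be written with a heredoc so every node gets an identical file. A sketch; the function name and the conf-directory argument are assumptions:

```shell
#!/bin/sh
# Write the spark-env.sh used in this guide into a given conf directory,
# e.g.: write_spark_env /opt/modules/spark/conf
write_spark_env() {
    cat > "$1/spark-env.sh" <<'EOF'
export JAVA_HOME=/opt/modules/jdk1.8.0_151
export SCALA_HOME=/opt/modules/scala-2.11.4
export SPARK_MASTER_IP=192.168.202.151
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/opt/modules/hadoop-2.5.0/etc/hadoop
EOF
}
```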

 

Edit the slaves file

# mv slaves.template slaves

# vi slaves

192.168.202.151

192.168.202.152

192.168.202.153
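The slaves file can likewise be generated from the node list instead of typed by hand. A sketch in the same style as above (the function name is an assumption):

```shell
#!/bin/sh
# Write the worker list used in this guide into a given conf directory,
# e.g.: write_slaves /opt/modules/spark/conf
write_slaves() {
    cat > "$1/slaves" <<'EOF'
192.168.202.151
192.168.202.152
192.168.202.153
EOF
}
```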

 

Distribute to the other nodes (repeat the scp for each worker)

# scp -r /opt/modules/spark/ 192.168.202.152:/opt/modules/

 

Start the Spark cluster

# cd /opt/modules/spark/sbin/

# ./start-all.sh

starting org.apache.spark.deploy.master.Master, logging to /opt/modules/spark-1.5.1-bin-hadoop2.4/sbin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-spark1.out

192.168.202.153: starting org.apache.spark.deploy.worker.Worker, logging to /opt/modules/spark-1.5.1-bin-hadoop2.4/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-spark3.out

192.168.202.152: starting org.apache.spark.deploy.worker.Worker, logging to /opt/modules/spark-1.5.1-bin-hadoop2.4/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-spark2.out

[root@spark1 sbin]# jps

6048 Jps

2592 NodeManager

5921 Master

2482 NameNode

2677 JobHistoryServer

2538 DataNode

2735 QuorumPeerMain

[root@spark2 conf]# jps

4225 Jps

2629 QuorumPeerMain

4153 Worker

2474 NodeManager

2427 DataNode

2558 ResourceManager

 

Use jps and the web UI on port 8080 to check whether the cluster started successfully.

In a browser, open: http://spark1:8080

Launch spark-shell to confirm everything is working

# spark-shell
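Beyond opening the interactive REPL, a quick way to confirm the cluster actually executes jobs is to pipe a one-line job into spark-shell. A sketch, assuming spark-shell is on PATH and the master runs at the address configured above; the job just sums 1..100 on the cluster:

```shell
#!/bin/sh
# One-shot smoke test: run a tiny job through spark-shell and exit.
JOB='println("sum = " + sc.parallelize(1 to 100).reduce(_ + _))'
if command -v spark-shell >/dev/null 2>&1; then
    echo "$JOB" | spark-shell --master spark://192.168.202.151:7077
else
    echo "spark-shell not on PATH; run 'source /etc/profile' first"
fi
```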



Reposted from blog.csdn.net/MusicEnchanter/article/details/80595740