Installing spark-2.2.0 on a cdh5.7.0 Pseudo-Distributed Cluster

Environment and software:

Software        Package
centos-6.4
JDK-1.8         jdk-8u191-linux-x64.tar.gz
hadoop-2.6.0    hadoop-2.6.0-cdh5.7.0.tar.gz
scala-2.11.8    scala-2.11.8.tgz
spark-2.2.0     spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz

Official download page for the packages: http://archive-primary.cloudera.com/cdh5/cdh/5/

Installing Scala

1. Upload the Scala package scala-2.11.8.tgz to the /usr/local/app directory on the VM

2. Extract scala-2.11.8.tgz:

# tar -zxvf scala-2.11.8.tgz

3. Configure the Scala environment variables

# vim ~/.bashrc
# set scala environment
export SCALA_HOME=/usr/local/app/scala-2.11.8
export PATH=$PATH:$SCALA_HOME/bin
# source ~/.bashrc

4. Verify that Scala is installed:

# scala -version

If the command prints the version string, for example "Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL", the installation succeeded.

Installing Spark

Note: Hadoop must be installed before Spark; see this companion article: https://blog.csdn.net/weixin_39689084/article/details/84548507

There are two ways to get Spark:

  • Download the prebuilt binary tarball and simply extract it
  • Download the source package, compile it, and extract the result

This guide uses the second approach. The compilation itself is not covered in detail here; look it up if you need it. The installation steps follow:

Local mode setup

1. Upload the compiled Spark package spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz to the /usr/local/app directory on the VM
2. Extract the Spark package:

# tar -zxvf spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz

3. Rename the Spark directory:

# mv spark-2.2.0-bin-2.6.0-cdh5.7.0/ spark-2.2.0

4. Configure the Spark environment variables

# vim ~/.bashrc
# set spark environment
export SPARK_HOME=/usr/local/app/spark-2.2.0
export PATH=$PATH:$SPARK_HOME/bin
# source ~/.bashrc

5. Test:

# spark-shell --master local[2]
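Beyond launching the shell, you can confirm the local install by submitting the bundled SparkPi example. (The jar path below assumes the standard layout of a Spark 2.2.0 build; under a custom CDH compile the examples jar may be named differently.)

```shell
# Run the SparkPi example locally on 2 cores; "10" is the number of partitions
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[2] \
  $SPARK_HOME/examples/jars/spark-examples_2.11-2.2.0.jar 10
```

If the install is healthy, the job output should contain a line like "Pi is roughly 3.14...".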

 

Standalone mode setup

1. Go to the /usr/local/app/spark-2.2.0/conf directory and create spark-env.sh from the template:

# cd /usr/local/app/spark-2.2.0/conf

# cp spark-env.sh.template spark-env.sh

# vi spark-env.sh

Append the following configuration at the end of spark-env.sh:

SPARK_MASTER_HOST=hadoop000
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=2g
SPARK_WORKER_INSTANCES=1

What these settings mean (from the spark-env.sh.template comments):

SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
SPARK_WORKER_CORES, to set the number of cores to use on this machine
SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
SPARK_WORKER_INSTANCES, to set the number of worker processes per node
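To make the knobs concrete, here is an annotated variant of the fragment (the values are illustrative, not from this guide): with two worker instances of two cores each, the node would offer 4 cores and 4g in total to executors.

```shell
# spark-env.sh -- illustrative values, adjust to your machine
SPARK_MASTER_HOST=hadoop000   # bind the master to this hostname
SPARK_WORKER_CORES=2          # cores per worker process
SPARK_WORKER_MEMORY=2g        # memory per worker process
SPARK_WORKER_INSTANCES=2      # two workers => 2x2 = 4 cores, 2x2g = 4g total
```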

Start the master and worker:

# cd /usr/local/app/spark-2.2.0/sbin

# ./start-all.sh 
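A quick way to check that start-all.sh worked is to list the running JVM processes with jps (the PIDs below are placeholders; the Hadoop daemons of the pseudo-distributed cluster may also appear in the list):

```shell
# Master and Worker should both be running after start-all.sh
jps
#   12345 Master    <- the Spark standalone master
#   12346 Worker    <- one worker, per SPARK_WORKER_INSTANCES=1
```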

 

Test (the host in the master URL must match the SPARK_MASTER_HOST configured above):

# spark-shell --master spark://hadoop000:7077

Web UI: open http://hadoop000:8080 in a browser to check the master status and the registered workers.

 


Reproduced from blog.csdn.net/weixin_39689084/article/details/84637678