Copyright notice: this is the author's original article; do not reproduce without permission. https://blog.csdn.net/lihaogn/article/details/82110344
1 Environment Setup
1) Download and extract the package
- Option 1: download the prebuilt binary tarball and extract it directly
- Option 2: download the source package, build your own distribution (see section 2), then extract the resulting tarball
2) Configure environment variables
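The environment-variable step usually amounts to appending something like the following to `~/.bashrc` (the `/opt/...` path here is an assumption; use the directory you actually extracted the tarball into):

```shell
# Assumed install location -- adjust SPARK_HOME to your actual extraction directory.
export SPARK_HOME=/opt/spark-2.1.0-bin-2.6.0-cdh5.7.0
# bin/ holds spark-shell and spark-submit; sbin/ holds start-all.sh etc.
export PATH="$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH"
```

Run `source ~/.bashrc` afterwards so the current shell picks up the change.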
1.1 Local mode
1) Start spark-shell; local[2] runs Spark in a single JVM with two worker threads:
spark-shell --master local[2]
1.2 Standalone mode
The Spark Standalone architecture is similar to Hadoop HDFS/YARN: 1 master + n workers.
1) Edit spark-2.1.0-bin-2.6.0-cdh5.7.0/conf/spark-env.sh
# append at the end of the file
SPARK_MASTER_HOST=hadoop000
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=2g
SPARK_WORKER_INSTANCES=1
# What the settings mean (from the Spark standalone documentation):
SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
SPARK_WORKER_CORES, to set the number of cores to use on this machine
SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
SPARK_WORKER_INSTANCES, to set the number of worker processes per node
2) Start the cluster, then connect a spark-shell to the master
spark-2.1.0-bin-2.6.0-cdh5.7.0/sbin/start-all.sh
spark-shell --master spark://hadoop000:7077
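After start-all.sh, a quick sanity check is `jps`: the master node should show a Master JVM and each worker node a Worker JVM (the master's web UI also defaults to port 8080). The helper below (`check_spark_procs` is a made-up name, for illustration only) reads `jps` output and reports whether both daemons are present:

```shell
# check_spark_procs: read `jps` output on stdin and check that the
# standalone Master and Worker JVMs are both listed.
check_spark_procs() {
  local out
  out=$(cat)
  if echo "$out" | grep -qw Master && echo "$out" | grep -qw Worker; then
    echo "standalone daemons running"
  else
    echo "missing daemons"
  fi
}

# On a live single-node setup:  jps | check_spark_procs
```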
2 Compiling Spark from Source
1) Prerequisites
Software requirements (from the Spark build documentation):
The Maven-based build is the build of reference for Apache Spark. Building Spark using Maven requires Maven 3.3.9 or newer and Java 8+. Note that support for Java 7 was removed as of Spark 2.2.0.
Settings:
You’ll need to configure Maven to use more memory than usual by setting MAVEN_OPTS:
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
Install the following and configure their environment variables:
- JDK 8+
- Maven 3.3.9+
- hadoop-2.6.0-cdh5.7.0.tar.gz
- Scala 2.11.8
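Before building, it is worth checking that the installed tools meet these minimums. A small sketch (`version_ge` is a made-up helper) that compares dotted version numbers with GNU `sort -V`:

```shell
# version_ge A B: succeed (exit 0) if version A >= version B,
# comparing them as dotted numeric versions via sort -V.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example: extract the local Maven version and check it against the 3.3.9 minimum.
# mvn_ver=$(mvn -version | awk 'NR==1 {print $3}')
# version_ge "$mvn_ver" 3.3.9 || echo "Maven too old for the Spark build"
```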
2) Build
Edit spark-2.2.0/dev/make-distribution.sh: comment out the version-detection block and hard-code your versions below it (this also skips several slow `mvn help:evaluate` invocations).
# VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null | grep -v "INFO" | tail -n 1)
# SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
# | grep -v "INFO"\
# | tail -n 1)
# SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
# | grep -v "INFO"\
# | tail -n 1)
# SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
# | grep -v "INFO"\
# | fgrep --count "<id>hive</id>";\
# # Reset exit status to 0, otherwise the script stops here if the last grep finds nothing\
# # because we use "set -o pipefail"
# echo -n)
VERSION=2.2.0
SCALA_VERSION=2.11
SPARK_HADOOP_VERSION=2.6.0-cdh5.7.0
SPARK_HIVE=1
Edit spark-2.2.0/pom.xml, adding the following inside the <repositories> section. The Cloudera repository is what lets Maven resolve the CDH Hadoop artifacts; the Aliyun mirror speeds up dependency downloads from within China:
<repository>
  <id>cloudera</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
<repository>
  <id>alimaven</id>
  <name>aliyun maven</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
</repository>
Build command. --name sets the suffix of the output tarball's name, -Dhadoop.version pins the exact CDH Hadoop version, and the -P flags enable the Hadoop 2.6, Hive, Hive Thrift server, and YARN profiles:
./dev/make-distribution.sh --name 2.6.0-cdh5.7.0 --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.7.0 -Phive -Phive-thriftserver -Pyarn
Wait for the build to finish; the first run downloads many dependencies and can take quite a while.
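On success, make-distribution.sh leaves a tarball named spark-$VERSION-bin-$NAME.tgz in the source root, so with the values above that should be spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz. A tiny sketch of the naming rule (`dist_tarball` is a made-up helper, not part of the build scripts):

```shell
# dist_tarball VERSION NAME: the tarball name make-distribution.sh --tgz
# produces from the hard-coded VERSION and the --name argument.
dist_tarball() {
  echo "spark-$1-bin-$2.tgz"
}

dist_tarball 2.2.0 2.6.0-cdh5.7.0
```

Extract that tarball and point SPARK_HOME at it to use your custom build.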