Copyright notice: this is the author's original article; do not reproduce without permission. https://blog.csdn.net/lihaogn/article/details/82110344
1 Environment Setup
1) Download and extract the package
- Option 1: download the prebuilt binary tarball and extract it directly
- Option 2: download the source package, build your own distribution (see section 2), then extract the resulting tarball
2) Configure environment variables
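The environment-variable step usually amounts to appending something like the following to `~/.bashrc` (the `/opt/...` path here is an assumption; use the directory you actually extracted the tarball into):

```shell
# Assumed install location -- adjust SPARK_HOME to your actual extraction directory.
export SPARK_HOME=/opt/spark-2.1.0-bin-2.6.0-cdh5.7.0
# bin/ holds spark-shell and spark-submit; sbin/ holds start-all.sh etc.
export PATH="$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH"
```

Run `source ~/.bashrc` afterwards so the current shell picks up the change.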
1.1 Local mode
1) Start spark-shell; local[2] runs Spark in a single JVM with two worker threads:
spark-shell --master local[2]
1.2 Standalone mode
The Spark Standalone architecture is similar to Hadoop HDFS/YARN: 1 master + n workers.
1) Edit spark-2.1.0-bin-2.6.0-cdh5.7.0/conf/spark-env.sh
# append at the end of the file
SPARK_MASTER_HOST=hadoop000
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=2g
SPARK_WORKER_INSTANCES=1
# What the settings mean (from the Spark standalone documentation):
SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
SPARK_WORKER_CORES, to set the number of cores to use on this machine
SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
SPARK_WORKER_INSTANCES, to set the number of worker processes per node
2) Start the cluster, then connect a spark-shell to the master
spark-2.1.0-bin-2.6.0-cdh5.7.0/sbin/start-all.sh
spark-shell --master spark://hadoop000:7077
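After start-all.sh, a quick sanity check is `jps`: the master node should show a Master JVM and each worker node a Worker JVM (the master's web UI also defaults to port 8080). The helper below (`check_spark_procs` is a made-up name, for illustration only) reads `jps` output and reports whether both daemons are present:

```shell
# check_spark_procs: read `jps` output on stdin and check that the
# standalone Master and Worker JVMs are both listed.
check_spark_procs() {
  local out
  out=$(cat)
  if echo "$out" | grep -qw Master && echo "$out" | grep -qw Worker; then
    echo "standalone daemons running"
  else
    echo "missing daemons"
  fi
}

# On a live single-node setup:  jps | check_spark_procs
```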
2 Compiling Spark from Source
1) Prerequisites
Software requirements (from the Spark build documentation):
The Maven-based build is the build of reference for Apache Spark. Building Spark using Maven requires Maven 3.3.9 or newer and Java 8+. Note that support for Java 7 was removed as of Spark 2.2.0.
Settings:
You’ll need to configure Maven to use more memory than usual by setting MAVEN_OPTS:
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
Install the following and configure their environment variables:
- JDK 8+
- Maven 3.3.9+
- hadoop-2.6.0-cdh5.7.0.tar.gz
- Scala 2.11.8
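Before building, it is worth checking that the installed tools meet these minimums. A small sketch (`version_ge` is a made-up helper) that compares dotted version numbers with GNU `sort -V`:

```shell
# version_ge A B: succeed (exit 0) if version A >= version B,
# comparing them as dotted numeric versions via sort -V.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example: extract the local Maven version and check it against the 3.3.9 minimum.
# mvn_ver=$(mvn -version | awk 'NR==1 {print $3}')
# version_ge "$mvn_ver" 3.3.9 || echo "Maven too old for the Spark build"
```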
2) Build
Edit spark-2.2.0/dev/make-distribution.sh: comment out the version-detection block and hard-code your versions below it (this also skips several slow `mvn help:evaluate` invocations).
# VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null | grep -v "INFO" | tail -n 1)
# SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
# | grep -v "INFO"\
# | tail -n 1)
# SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
# | grep -v "INFO"\
# | tail -n 1)
# SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
# | grep -v "INFO"\
# | fgrep --count "<id>hive</id>";\
# # Reset exit status to 0, otherwise the script stops here if the last grep finds nothing\
# # because we use "set -o pipefail"
# echo -n)
VERSION=2.2.0
SCALA_VERSION=2.11
SPARK_HADOOP_VERSION=2.6.0-cdh5.7.0
SPARK_HIVE=1
Edit spark-2.2.0/pom.xml, adding the following inside the <repositories> section. The Cloudera repository is what lets Maven resolve the CDH Hadoop artifacts; the Aliyun mirror speeds up dependency downloads from within China:
<repository>
  <id>cloudera</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
<repository>
  <id>alimaven</id>
  <name>aliyun maven</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
</repository>
Build command. --name sets the suffix of the output tarball's name, -Dhadoop.version pins the exact CDH Hadoop version, and the -P flags enable the Hadoop 2.6, Hive, Hive Thrift server, and YARN profiles:
./dev/make-distribution.sh --name 2.6.0-cdh5.7.0 --tgz -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.7.0 -Phive -Phive-thriftserver -Pyarn
Wait for the build to finish; the first run downloads many dependencies and can take quite a while.
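On success, make-distribution.sh leaves a tarball named spark-$VERSION-bin-$NAME.tgz in the source root, so with the values above that should be spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz. A tiny sketch of the naming rule (`dist_tarball` is a made-up helper, not part of the build scripts):

```shell
# dist_tarball VERSION NAME: the tarball name make-distribution.sh --tgz
# produces from the hard-coded VERSION and the --name argument.
dist_tarball() {
  echo "spark-$1-bin-$2.tgz"
}

dist_tarball 2.2.0 2.6.0-cdh5.7.0
```

Extract that tarball and point SPARK_HOME at it to use your custom build.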