Issues Encountered While Building Spark from Source

Copyright notice: this is an original post by the author; please credit the source when reposting: https://blog.csdn.net/u010647035/article/details/87865756

1. Build Environment

Windows 7
JDK 1.8
Scala-2.12.4
Maven-3.6.0
Spark-2.3.0

Spark source download: https://github.com/apache/spark

2. Problems Encountered

2.1. Building directly in the source root fails with the following error

mvn -DskipTests clean package
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.8:run (default) on project spark-core_2.11: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program "bash" (in directory "D:\workspace\IDEA2017\gitlab\spark-2.3.0\core"): CreateProcess error=2, The system cannot find the file specified.
[ERROR] around Ant part ...<exec executable="bash">... @ 4:27 in D:\workspace\IDEA2017\gitlab\spark-2.3.0\core\target\antrun\build-main.xml
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :spark-core_2.11

Solution

Build with Git Bash instead. Open Git Bash and change into the Spark source directory:

root@Elag MINGW64 /d/workspace/IDEA2017/gitlab/spark-2.3.0
$ cd d:

root@Elag MINGW64 /d
$ cd workspace/IDEA2017/gitlab/spark-2.3.0/

root@Elag MINGW64 /d/workspace/IDEA2017/gitlab/spark-2.3.0
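The build failed because the antrun step runs `<exec executable="bash">`, and a plain Windows shell cannot resolve `bash`. A small sanity-check sketch (the echoed command is just illustrative) confirms `bash` is on the PATH before re-running the build:

```shell
# The antrun step failed because Windows' cmd could not locate "bash";
# inside Git Bash it is on the PATH.
if command -v bash >/dev/null 2>&1; then
  bash_ok=1
  echo "bash found: re-run 'mvn -DskipTests clean package' from here"
else
  bash_ok=0
  echo "bash not found: open a Git Bash shell first"
fi
```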

2.2. Maven cannot be found

./build/mvn: line 143: /d/workspace/IDEA2017/gitlab/spark-2.3.0/build/zinc-0.3.15/bin/zinc: No such file or directory
./build/mvn: line 145: /d/workspace/IDEA2017/gitlab/spark-2.3.0/build/zinc-0.3.15/bin/zinc: No such file or directory
Using `mvn` from path: /d/Program Files/Java/apache-maven-3.6.0/bin/mvn
./build/mvn: line 157: /d/Program: No such file or directory

Solution

The error shows that the Maven path contains a space (`/d/Program Files/...`), which the `build/mvn` script mishandles. Move the Maven installation to a path without spaces and update the environment variables accordingly.
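A minimal sketch of the failure mode (the path below is the problematic one from the log): an unquoted path gets word-split at the space, which is why `build/mvn` complains about `/d/Program`.

```shell
# A space in the Maven install path (e.g. "Program Files") breaks the
# unquoted path expansion inside build/mvn.
MVN_PATH="/d/Program Files/Java/apache-maven-3.6.0/bin/mvn"
case "$MVN_PATH" in
  *" "*) echo "WARN: Maven path contains a space; move Maven to a space-free path" ;;
  *)     echo "OK: Maven path has no spaces" ;;
esac
```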

2.3. Maven runs out of heap memory

[ERROR] Java heap space -> [Help 1]

Solution

Increase the heap available to Maven:

export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
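Before rebuilding, it is worth confirming the option actually reached the environment; a small sketch:

```shell
# Maven's launcher passes MAVEN_OPTS straight to the JVM, so a missing
# -Xmx means the build runs with the default (small) heap.
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
case "$MAVEN_OPTS" in
  *-Xmx*) echo "heap size set: $MAVEN_OPTS" ;;
  *)      echo "heap size missing from MAVEN_OPTS" ;;
esac
```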

3. Build Methods

3.1. Using the mvn wrapper under build/

./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.0 -DskipTests clean package 


3.2. Using make-distribution.sh under dev/

This builds a runnable distribution that can be deployed on a cluster; the final package is named spark-2.3.0-bin-my_spark_2.3.0.tgz.

./dev/make-distribution.sh --name my_spark_2.3.0 --tgz  -Pyarn -Phive -Phive-thriftserver -Phadoop-2.7 -Dhadoop.version=2.7.0 -DskipTests  -X

--name: custom name for the distribution

-Pyarn: enable YARN support

-Phive -Phive-thriftserver: enable Hive and JDBC (Thrift server) support

-Phadoop-2.7 -Dhadoop.version=2.7.0: build against Hadoop 2.7.0
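make-distribution.sh derives the package name as spark-&lt;version&gt;-bin-&lt;name&gt;.tgz, which is how --name my_spark_2.3.0 yields the file mentioned above; a quick sketch of the naming rule:

```shell
# Naming rule used by make-distribution.sh: spark-<version>-bin-<name>.tgz
VERSION="2.3.0"
NAME="my_spark_2.3.0"
PKG="spark-${VERSION}-bin-${NAME}.tgz"
echo "$PKG"
```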

3.3. Building individual submodules

Individual Spark submodules can be built with Maven's -pl option.

For example, to build the Spark Streaming module:

./build/mvn -pl :spark-streaming_2.11 clean install
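If the module's local dependencies have not yet been installed into the local repository, Maven's -am ("also make") flag builds them too; a sketch composing such a command (shown here, not executed):

```shell
# Compose the submodule build; -am additionally builds the local
# modules that spark-streaming depends on (e.g. spark-core).
MODULE=":spark-streaming_2.11"
CMD="./build/mvn -pl $MODULE -am -DskipTests clean install"
echo "$CMD"
```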

Notes:

Use Git Bash to compile or package the Spark source on Windows.

Make sure the Maven path in your environment variables contains no spaces.

References:
http://spark.apache.org/docs/latest/building-spark.html#change-scala-version
