Spark 2.1 source code compilation

This article walks through compiling Spark 2.1.0 from source.

1. Compilation environment:

JDK 1.8 or above

Hadoop 2.7.3

Scala 2.10.4

Requirements:

Maven 3.3.9 or above (important)

Download it here:

http://mirror.bit.edu.cn/apache/maven/maven-3/3.5.2/binaries/apache-maven-3.5.2-bin.tar.gz
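
A minimal install sketch, assuming you unpack Maven under /usr/local (the path is just an example, adjust as you like):

wget http://mirror.bit.edu.cn/apache/maven/maven-3/3.5.2/binaries/apache-maven-3.5.2-bin.tar.gz
tar -zxvf apache-maven-3.5.2-bin.tar.gz -C /usr/local
export MAVEN_HOME=/usr/local/apache-maven-3.5.2
export PATH=$MAVEN_HOME/bin:$PATH
mvn -v   # should report Apache Maven 3.5.2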

 

Modify conf/settings.xml under the Maven installation directory and add the following mirror (the Aliyun mirror speeds up dependency downloads):

<mirror>

        <id>alimaven</id>

        <name>aliyun maven</name>

        <url>http://maven.aliyun.com/nexus/content/groups/public/</url>

        <mirrorOf>central</mirrorOf>

</mirror>
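
To check that the mirror is actually picked up, you can optionally print Maven's effective settings (help:effective-settings is a standard maven-help-plugin goal):

mvn help:effective-settings | grep -A 3 alimaven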



2. Download Spark from http://spark.apache.org

2.1 Download the spark-2.1.0.tgz source package from the downloads page

 

2.2 Unzip

tar -zxvf spark-2.1.0.tgz

3. Enter the main directory, modify the build script, and compile

Modify make-distribution.sh in the spark-2.1.0/dev directory: comment out the lines that detect the versions through Maven and hard-code them instead, which saves a lot of time (see the sketch after the tips below).

vi make-distribution.sh


Tips:

In this file (the original post showed it in a screenshot), the tar invocation near the end uses czf without the leading "-"; change it to -czf yourself.
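
A minimal sketch of the edit, assuming the stock Spark 2.1.0 make-distribution.sh (the exact position of these lines may differ slightly):

# comment out the "mvn help:evaluate" lines that compute these values,
# then hard-code them for this build:
VERSION=2.1.0                # Spark version being built
SCALA_VERSION=2.11           # default Scala binary version for Spark 2.1
SPARK_HADOOP_VERSION=2.7.3   # must match -Dhadoop.version used below
SPARK_HIVE=1                 # 1 = build with Hive support

# near the end of the script, make the packaging line read:
tar -czf "spark-$VERSION-bin-$NAME.tgz" -C "$SPARK_HOME" "$TARDIR_NAME"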


Note:

If the Hadoop version you use is a CDH release, you also need to modify the pom.xml file in the Spark root directory and add the Cloudera repository:

<repository>
        <id>cloudera</id>
        <name>cloudera Repository</name>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
Add it inside the <repositories></repositories> element.
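
With that repository in place, point the build at your CDH Hadoop version; the version string below (2.6.0-cdh5.7.0) is only an example, replace it with the version your cluster actually runs:

./dev/make-distribution.sh \
--name 2.6.0-cdh5.7.0 \
--tgz \
-Pyarn \
-Phadoop-2.6 \
-Dhadoop.version=2.6.0-cdh5.7.0 \
-Phive -Phive-thriftserver \
-DskipTests clean package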


3.1 Set the Maven memory options

export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"

3.2 Compile

./dev/make-distribution.sh \
--name 2.7.3 \
--tgz \
-Pyarn \
-Phadoop-2.7 \
-Dhadoop.version=2.7.3 \
-Phive -Phive-thriftserver \
-DskipTests clean package



Then just wait. The first build can take a very long time, from a few hours up to ten or more depending on your network speed, because a lot of dependencies have to be downloaded.

Command explanation:

--name 2.7.3 *** name suffix of the built distribution (the result is spark-2.1.0-bin-2.7.3)

--tgz *** package the result as a .tgz archive

-Pyarn *** support the YARN platform

-Phadoop-2.7 -Dhadoop.version=2.7.3 *** build against Hadoop 2.7.3

-Phive -Phive-thriftserver *** support Hive and the Hive Thrift server

-DskipTests clean package *** clean, package, and skip the tests
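
When the build succeeds, the distribution tarball ends up in the Spark source root; the name below follows from --name 2.7.3 and will differ if you change that flag:

ls spark-2.1.0-bin-2.7.3.tgz
tar -zxvf spark-2.1.0-bin-2.7.3.tgz
cd spark-2.1.0-bin-2.7.3
./bin/spark-submit --version   # quick sanity check of the freshly built Spark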






Well, that's the end of the Spark compilation.


Here are some of the problems I ran into while compiling.

Error 1:

Failed to execute goal on project spark-launcher_2.11:

Could not resolve dependencies for project org.apache.spark:spark-launcher_2.11:jar:2.1.0:

Failure to find org.apache.hadoop:hadoop-client:jar:hadoop2.7.3 in https://repo1.maven.org/maven2 was cached in the local repository,

resolution will not be reattempted until the update interval of central has elapsed or updates are forced -> [Help 1]

Solution: this error is usually caused by a mistyped parameter in the build command (here the hadoop.version value ended up as "hadoop2.7.3"). Hopefully you never hit it.
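
If you already ran the build with the wrong parameter, Maven will have cached the failed lookup (that is what the "was cached in the local repository" part means). One way to recover, after fixing the flag, is to clear the cached artifact and re-run the build command from section 3.2:

rm -rf ~/.m2/repository/org/apache/hadoop/hadoop-client
# then re-run ./dev/make-distribution.sh with the corrected -Dhadoop.version=2.7.3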

Error 2:

+ tar czf 'spark-[info] Compile success at Nov 28, 2017 11:27:10 AM [20.248s]-bin-2.7.3.tgz' -C /zhenglh/new-spark-build/spark-2.1.0 'spark-[info] Compile success at Nov 28, 2017 11:27:10 AM [20.248s]-bin-2.7.3'

tar (child): Cannot connect to spark-[info] Compile success at Nov 28, 2017 11: resolve failed

The build result was not packaged:

spark-[info] Compile success at Nov 28, 2017 11:27:10 AM [20.248s]-bin-2.7.3

This is an error that probably everyone hits on their first compile.

Solution: see the tip in section 3 above.
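
If the compile itself succeeded and only the packaging step broke, there is no need to rebuild. A sketch of packaging by hand, assuming the default make-distribution.sh layout where the finished files sit in dist/ under the Spark source root:

mv dist spark-2.1.0-bin-2.7.3
tar -czf spark-2.1.0-bin-2.7.3.tgz spark-2.1.0-bin-2.7.3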





