I. Compilation

Taking Spark 2.4 with Hadoop 2.8.4 as the example.
1. Modify the pom.xml in the Spark project root directory by adding a new profile:
<profile>
  <id>hadoop-2.8</id>
  <properties>
    <hadoop.version>2.8.4</hadoop.version>
  </properties>
</profile>
2. Run the following in the Spark home directory:
mvn -T 4 -Pyarn -Phadoop-2.8 -Dhadoop.version=2.8.4 -DskipTests clean package
To speed up the build, modify dev/make-distribution.sh: comment out the slow "mvn help:evaluate" version-detection blocks and hardcode the versions instead:

#VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | grep -v "WARNING"\
#    | tail -n 1)
#SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | grep -v "WARNING"\
#    | tail -n 1)
#SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
#    | grep -v "INFO"\
#    | grep -v "WARNING"\
#    | tail -n 1)
SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
    | grep -v "INFO"\
    | grep -v "WARNING"\
    | fgrep --count "<id>hive</id>";\
    # Reset exit status to 0, otherwise the script stops here if the last grep finds nothing
    # because we use "set -o pipefail"
    echo -n)
VERSION=2.4.0
SCALA_VERSION=2.11.8
SPARK_HADOOP_VERSION=2.8.4
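The hardcoded variables feed directly into the name of the tarball that make-distribution.sh produces. A minimal sketch of that naming logic (mirroring the stock script's spark-$VERSION-bin-$NAME pattern):

```shell
# Sketch of the distribution name derived from the hardcoded version
# and the --name flag; mirrors the naming in the stock make-distribution.sh
VERSION=2.4.0      # hardcoded above
NAME=hadoop2.8     # from --name hadoop2.8 in step 3
TARDIR_NAME="spark-$VERSION-bin-$NAME"
echo "$TARDIR_NAME.tgz"   # spark-2.4.0-bin-hadoop2.8.tgz
```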
3. Run the full Maven build and packaging from the Spark root directory (note that the profile id must match the hadoop-2.8 profile defined in step 1):

./dev/make-distribution.sh --name hadoop2.8 --tgz -PR -Phadoop-2.8 -Phive -Phive-thriftserver -Pyarn
With the make-distribution.sh changes above, this step runs much faster. It produces the finished distribution of jars as a tarball (spark-2.4.0-bin-hadoop2.8.tgz) in the Spark root directory (SPARK_HOME).
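As a sanity check on the finished package, you can list the jars inside the tarball and confirm the Hadoop version. The runnable sketch below builds a miniature tarball with an assumed layout (a jars/ directory holding a hadoop-client jar) just to demonstrate the check; against the real build you would point tar at spark-2.4.0-bin-hadoop2.8.tgz:

```shell
# Demonstrates checking a distribution tarball for Hadoop 2.8.4 jars.
# The demo/ layout is an assumption standing in for the real distribution.
mkdir -p demo/jars
touch demo/jars/hadoop-client-2.8.4.jar
tar -czf demo.tgz demo
tar -tzf demo.tgz | grep 'hadoop-.*2\.8\.4'
```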
II. Remote debugging
1. On the remote machine, edit the deployed Spark's configuration file:

spark-2.4.0-bin-hadoop2.8/conf/spark-defaults.conf

Add the following line to debug driver-side code:

spark.driver.extraJavaOptions -agentlib:jdwp=transport=dt_socket,server=n,address=<your-local-machine-ip>:5007,suspend=y
Executors can be debugged the same way; just add the same agent options to spark.executor.extraJavaOptions.
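To keep the driver and executor settings in sync, the agent string can be held in a single variable. A minimal sketch (the host 192.168.1.100 is a hypothetical local-machine IP; substitute your own):

```shell
# Sketch: one JDWP agent string reused for both driver and executor options.
# server=n makes the JVM connect out to a listening debugger (IDEA listen mode);
# the address host 192.168.1.100 is a hypothetical local-machine IP.
JDWP_OPTS="-agentlib:jdwp=transport=dt_socket,server=n,address=192.168.1.100:5007,suspend=y"
echo "spark.driver.extraJavaOptions   $JDWP_OPTS"
echo "spark.executor.extraJavaOptions $JDWP_OPTS"
```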
2. Import the Spark source into IDEA and configure a remote debug run configuration.

Listen mode is used here because of a network barrier between the local machine and the remote cluster: the IDE cannot attach directly to the remote JVM, so instead the remote JVM (server=n) connects back to the listening IDE.

Start the remote debug configuration in the local IDEA project first, then start the Spark job on the remote side.
(See figure.)
From here on, enjoy!