1. Dependency conflicts when a Spark application is packaged as a JAR and submitted to Spark on YARN
Solution: in a Maven project, add the following tag to the Spark, Scala, and Hadoop related dependencies:
<scope>provided</scope>
For example:
<dependencies>
    <!-- scala-library -->
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
        <scope>provided</scope>
    </dependency>
    <!-- spark-core -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
    <!-- spark-streaming -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
    <!-- hadoop-client -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
        <scope>provided</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/redis.clients/jedis -->
    <dependency>
        <groupId>redis.clients</groupId>
        <artifactId>jedis</artifactId>
        <version>2.9.0</version>
    </dependency>
</dependencies>
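With provided scope these dependencies are compiled against but left out of the assembled JAR; when the JAR is submitted with spark-submit --master yarn they are supplied by the cluster's own Spark and Hadoop installation, so they cannot conflict with the cluster's versions. Third-party libraries such as Jedis keep the default compile scope so they are still bundled.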
2. Running a Spark program may fail with Caused by: java.lang.ClassNotFoundException: jxl.read.biff.BiffException
Solution: add the jxl dependency
<!-- https://mvnrepository.com/artifact/jexcelapi/jxl -->
<dependency>
    <groupId>jexcelapi</groupId>
    <artifactId>jxl</artifactId>
    <version>2.4.2</version>
</dependency>
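For context, BiffException comes from the Java Excel API (jxl) and is thrown when a file is not a valid BIFF/.xls workbook. A minimal read sketch, assuming a placeholder file name data.xls:

import java.io.File;
import java.io.IOException;

import jxl.Sheet;
import jxl.Workbook;
import jxl.read.biff.BiffException;

public class JxlReadExample {
    public static void main(String[] args) throws IOException, BiffException {
        // data.xls is a placeholder path; jxl only reads the legacy .xls (BIFF) format
        Workbook workbook = Workbook.getWorkbook(new File("data.xls"));
        Sheet sheet = workbook.getSheet(0);                     // first sheet
        System.out.println(sheet.getCell(0, 0).getContents()); // cell A1 (column 0, row 0)
        workbook.close();
    }
}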
3. Other dependencies of a Spark + Maven project are not packaged into the JAR
Solution: add the maven-assembly plugin
<build>
    <finalName>${project.artifactId}</finalName>
    <plugins>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <appendAssemblyId>false</appendAssemblyId>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
                <archive>
                    <manifest>
                        <mainClass>fully.qualified.MainClass</mainClass>
                    </manifest>
                </archive>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <!-- the single goal builds the assembly during the package phase -->
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>1.7</source>
                <target>1.7</target>
            </configuration>
        </plugin>
    </plugins>
</build>
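Running mvn clean package then produces the runnable fat JAR; because <finalName> is ${project.artifactId} and appendAssemblyId is false, it is written to target/<artifactId>.jar, which is the file handed to spark-submit.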
4. Debugging Spark locally in Eclipse may fail with java.lang.OutOfMemoryError: GC overhead limit exceeded or java.lang.OutOfMemoryError: Java heap space
Solution: set the SparkConf parameter spark.executor.memoryOverhead and the JVM parameter -Xmx2048m
import org.apache.spark.SparkConf;

SparkConf sparkConf = new SparkConf()
        .setAppName("jobName")                          // job name
        .setMaster("local[1]")                          // run locally with a single thread
        .set("spark.executor.memoryOverhead", "2048");  // extra off-heap memory per executor, in MB