Problems Encountered When Developing Spark Programs in Java Local Mode

Copyright notice: personal post, https://blog.csdn.net/csdnmrliu/article/details/82464999

1. Dependency conflicts when a Spark application is packaged as a JAR and submitted to Spark on YARN

Solution: in a Maven project, mark the Spark, Scala, and Hadoop dependencies with the following tag so that they are not bundled into the application JAR (the cluster supplies them at runtime):

<scope>provided</scope>

For example:

<dependencies>
    <!-- scala-library -->
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
        <scope>provided</scope>
    </dependency>

    <!-- spark-core -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>

    <!-- spark-streaming -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>

    <!-- hadoop-client -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
        <scope>provided</scope>
    </dependency>

    <!-- https://mvnrepository.com/artifact/redis.clients/jedis -->
    <dependency>
        <groupId>redis.clients</groupId>
        <artifactId>jedis</artifactId>
        <version>2.9.0</version>
    </dependency>

</dependencies>
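
For context, here is a minimal sketch of the kind of job this layout targets (the Redis host, port, and key names are placeholder assumptions): the Spark, Scala, and Hadoop classes are provided by the cluster, while jedis is used directly by the job code, so it stays at the default compile scope and has to end up in the assembled JAR (see item 3 below).

import java.util.Arrays;
import java.util.Iterator;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;

import redis.clients.jedis.Jedis;

public class RedisWriteJob {
    public static void main(String[] args) {
        // The master is supplied by spark-submit (e.g. --master yarn), so it is not set here.
        SparkConf conf = new SparkConf().setAppName("RedisWriteJob");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.parallelize(Arrays.asList("a", "b", "c"));

        // Open one Jedis connection per partition rather than one per record.
        lines.foreachPartition(new VoidFunction<Iterator<String>>() {
            @Override
            public void call(Iterator<String> it) {
                Jedis jedis = new Jedis("redis-host", 6379); // placeholder address
                try {
                    while (it.hasNext()) {
                        String value = it.next();
                        jedis.set("demo:" + value, value);   // placeholder key scheme
                    }
                } finally {
                    jedis.close();
                }
            }
        });

        sc.stop();
    }
}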

2. Running a Spark program may fail with Caused by: java.lang.ClassNotFoundException: jxl.read.biff.BiffException

Solution: add the jxl dependency:

<!-- https://mvnrepository.com/artifact/jexcelapi/jxl -->
<dependency>
    <groupId>jexcelapi</groupId>
    <artifactId>jxl</artifactId>
    <version>2.4.2</version>
</dependency>
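
For reference, a minimal sketch of reading an .xls file with jxl (the file path is a placeholder): Workbook.getWorkbook() declares jxl.read.biff.BiffException, which is why the jxl classes must be on the runtime classpath whenever the job touches Excel files.

import java.io.File;
import java.io.IOException;

import jxl.Sheet;
import jxl.Workbook;
import jxl.read.biff.BiffException;

public class JxlReadExample {
    public static void main(String[] args) throws IOException, BiffException {
        // Placeholder path; jxl reads the older .xls (BIFF) format.
        Workbook workbook = Workbook.getWorkbook(new File("/path/to/input.xls"));
        try {
            Sheet sheet = workbook.getSheet(0);
            for (int row = 0; row < sheet.getRows(); row++) {
                for (int col = 0; col < sheet.getColumns(); col++) {
                    System.out.print(sheet.getCell(col, row).getContents() + "\t");
                }
                System.out.println();
            }
        } finally {
            workbook.close();
        }
    }
}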

3. Other dependencies in a Spark + Maven project are not packaged into the JAR

Solution: add the maven-assembly-plugin with the jar-with-dependencies descriptor:

<build>
    <finalName>${project.artifactId}</finalName>
    <plugins>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <appendAssemblyId>false</appendAssemblyId>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
                <archive>
                    <manifest>
                        <mainClass>your.fully.qualified.MainClass</mainClass>
                    </manifest>
                </archive>          
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <plugin>  
            <groupId>org.apache.maven.plugins</groupId>  
            <artifactId>maven-compiler-plugin</artifactId>  
            <configuration>  
                <source>1.7</source>  
                <target>1.7</target>
            </configuration>  
        </plugin>
    </plugins>
</build>
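
With this configuration, mvn package should produce target/<artifactId>.jar (finalName is ${project.artifactId} and appendAssemblyId is false), containing the non-provided dependencies such as jedis; that is the JAR to hand to spark-submit.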

4. java.lang.OutOfMemoryError: GC overhead limit exceeded and java.lang.OutOfMemoryError: Java heap space when debugging Spark locally in Eclipse

Solution: set the SparkConf parameter spark.executor.memoryOverhead and the JVM option -Xmx2048m.

SparkConf sparkConf = new SparkConf()
        .setAppName("jobName")                            // job name
        .setMaster("local[1]")                            // local mode with a single worker thread
        .set("spark.executor.memoryOverhead", "2048");    // extra off-heap memory, in MB

Then set the JVM option in Eclipse under Run Configurations -> Arguments -> VM arguments, for example -Xmx2048m.

5. Task not serializable when running a Spark program

Solution: have the class named in the exception implement the java.io.Serializable interface; if it cannot be modified directly, extend it with a subclass that implements Serializable and use that subclass instead.
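
A minimal sketch with a hypothetical helper class: any object referenced inside an RDD operation is serialized and shipped to the executors, so it must implement Serializable.

import java.io.Serializable;
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class SerializableExample {

    // Without "implements Serializable" this helper would trigger
    // org.apache.spark.SparkException: Task not serializable.
    public static class Formatter implements Serializable {
        private static final long serialVersionUID = 1L;

        public String format(String s) {
            return "[" + s + "]";
        }
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SerializableExample").setMaster("local[1]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        final Formatter formatter = new Formatter(); // captured by the closure below

        JavaRDD<String> formatted = sc.parallelize(Arrays.asList("a", "b", "c"))
                .map(new Function<String, String>() {
                    @Override
                    public String call(String value) {
                        return formatter.format(value);
                    }
                });

        System.out.println(formatted.collect());
        sc.stop();
    }
}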
