Solving jar dependency problems when submitting Spark jobs

http://blog.csdn.net/wzq294328238/article/details/48054525
 

Usually we package a Spark job as a jar and submit it with spark-submit. Because Spark runs the job across multiple machines, a ClassNotFound error (ClassNotFoundException) is reported if a jar the job depends on is missing on the machine where the code runs.
There are three ways to solve this:

Method 1: spark-submit --jars

According to the official Spark documentation, you can pass --jars when submitting a job, with the jar paths separated by commas. The disadvantage is that the jars must be listed on every submission. With only a few jars this is fine, but with many jars it becomes very tedious.

# replace ***.jar,***.jar with your jar packages, separated by commas
spark-submit --master yarn-client --jars ***.jar,***.jar mysparksubmit.jar

If you use sbt and have already declared the dependencies in build.sbt and downloaded them, you can copy the jars your job needs directly from ~/.ivy2/cache/ in your home directory.
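When many jars are involved, the comma-separated list for --jars can be generated instead of typed by hand. The following is a minimal sketch, assuming the dependency jars have been copied into a single directory. The function name join_jars and the ./lib path are hypothetical and not part of Spark.

```shell
#!/bin/sh
# join_jars DIR: print every .jar file in DIR as one comma-separated line.
# (Hypothetical helper, not a Spark command.)
join_jars() {
    dir="$1"
    list=""
    for jar in "$dir"/*.jar; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$jar" ] || continue
        if [ -z "$list" ]; then
            list="$jar"
        else
            list="$list,$jar"
        fi
    done
    printf '%s\n' "$list"
}

# Example use (assumes the jars were copied to ./lib):
#   spark-submit --master yarn-client --jars "$(join_jars ./lib)" mysparksubmit.jar
```

This keeps the submit command short, but the jars are still shipped on every submission, so the underlying drawback of this method remains.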

Method 2: extraClassPath

Set the parameters in spark-defaults.conf: put all the required jars into one directory, then point the parameters at that directory. This is much more convenient than the previous method:

spark.executor.extraClassPath=/home/hadoop/wzq_workspace/lib/*
spark.driver.extraClassPath=/home/hadoop/wzq_workspace/lib/*

Note that this directory must exist on every machine that may run Spark tasks, and the jars must be copied to all of those machines. The advantage is that you no longer need to write a long list of jars when submitting; the disadvantage is that you have to copy all the jars to every machine.
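The copy step can be scripted. The following is a rough sketch, assuming passwordless ssh to every node, rsync installed on both ends, and a text file with one hostname per line. The function name sync_lib, the hosts file and the DRY_RUN switch are all hypothetical.

```shell
#!/bin/sh
# sync_lib HOSTS_FILE LIB_DIR: copy LIB_DIR to the same path on every host
# listed in HOSTS_FILE (one hostname per line, '#' starts a comment).
# Set DRY_RUN=1 to print the rsync commands instead of running them.
sync_lib() {
    hosts_file="$1"
    lib_dir="$2"
    # Read the host list on fd 3 so ssh/rsync cannot swallow it via stdin.
    while IFS= read -r host <&3; do
        # Skip blank lines and comments.
        case "$host" in ''|'#'*) continue ;; esac
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "rsync -a $lib_dir/ $host:$lib_dir/"
        else
            ssh "$host" "mkdir -p '$lib_dir'" &&
                rsync -a "$lib_dir/" "$host:$lib_dir/"
        fi
    done 3< "$hosts_file"
}

# Example use (hosts.txt is an assumed file name):
#   sync_lib hosts.txt /home/hadoop/wzq_workspace/lib
```

Using the same absolute path on every node is what lets a single extraClassPath value work for both the driver and the executors.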

Method 3: sbt-assembly

If the second method still feels like too much trouble, this method packages your own code together with all of its dependencies into a single jar (a fat jar). Start sbt in the project root directory and run plugins; you will see that assembly is not installed by default, so the sbt-assembly plugin has to be added.
Add the following to project/plugins.sbt in your project directory:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")

resolvers += Resolver.url("bintray-sbt-plugins", url("http://dl.bintray.com/sbt/sbt-plugin-releases"))(Resolver.ivyStylePatterns)

Then start sbt in the project root directory again and run plugins to list the installed plugins. If you see sbtassembly.AssemblyPlugin, the plugin was installed successfully.
You also need to set up conflict resolution (the assemblyMergeStrategy setting in build.sbt, which decides what to do with files that appear in more than one dependency), and then run assembly in the sbt interactive shell. The drawback of this method is that the resulting jar is very large.
