Can someone explain the differences between --packages and --jars in a spark-submit script?
nohup ./bin/spark-submit --jars ./xxx/extrajars/stanford-corenlp-3.8.0.jar,./xxx/extrajars/stanford-parser-3.8.0.jar \
--packages datastax:spark-cassandra-connector_2.11:2.0.7 \
--class xxx.mlserver.Application \
--conf spark.cassandra.connection.host=192.168.0.33 \
--conf spark.cores.max=4 \
--master spark://192.168.0.141:7077 ./xxx/xxxanalysis-mlserver-0.1.0.jar 1000 > ./logs/nohup.out &
Also, do I need the --packages option if the dependency is already in my application's pom.xml? (I ask because I just blew up my application by changing the version in --packages while forgetting to change it in the pom.xml.)
I am currently using --jars because the jars are massive (over 100GB) and thus slow down the shaded jar compilation. I admit I am not sure why I am using --packages, other than because I am following the DataStax documentation.
If you run spark-submit --help, it will show:
--jars JARS Comma-separated list of jars to include on the driver
and executor classpaths.
--packages Comma-separated list of maven coordinates of jars to include
on the driver and executor classpaths. Will search the local
maven repo, then maven central and any additional remote
repositories given by --repositories. The format for the
coordinates should be groupId:artifactId:version.
With --jars, Spark does not contact Maven. It looks for the specified jars on the local file system, and it also supports the hdfs, http, https, and ftp URL schemes.
With --packages, Spark searches for the given package in the local Maven repo, then in Maven Central and any repositories provided by --repositories, and then downloads it.
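To make the contrast concrete, here is a minimal dry-run sketch that reuses the values from the question. join_by_comma is a hypothetical helper (not part of Spark), and the command is only printed rather than executed, so it runs without a Spark installation.

```shell
#!/bin/sh
# Hypothetical helper: join arguments with commas, the separator
# that both --jars and --packages expect.
join_by_comma() {
  ( IFS=,; printf '%s\n' "$*" )
}

# --jars: files that already exist (local paths, or hdfs/http/https/ftp URLs).
JARS=$(join_by_comma \
  ./xxx/extrajars/stanford-corenlp-3.8.0.jar \
  ./xxx/extrajars/stanford-parser-3.8.0.jar)

# --packages: Maven coordinates (groupId:artifactId:version) that Spark
# resolves and downloads at submit time.
PACKAGES=$(join_by_comma datastax:spark-cassandra-connector_2.11:2.0.7)

# Print the command instead of running it.
echo ./bin/spark-submit --jars "$JARS" --packages "$PACKAGES" \
  --class xxx.mlserver.Application ./xxx/xxxanalysis-mlserver-0.1.0.jar 1000
```

Dropping the echo would run the real submit. The point is only that --jars takes things you already have, while --packages takes names of things Spark should fetch.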
Now, coming back to your question:
Also, do I need the --packages option if the dependency is already in my application's pom.xml?
Ans: No, if you are not importing or using the jar's classes directly but only need them loaded by some class loader or service loader (e.g. JDBC drivers). Yes otherwise.
By the way, if you are using a specific version of a specific jar in your pom.xml, why not build an uber/fat jar of your application, or pass the dependency jar in the --jars argument, instead of using --packages?
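If you do keep --packages, one way to guard against the version mismatch described in the question is to compare the two versions before submitting. This is only a sketch: coordinate_version, versions_match, and POM_CONNECTOR_VERSION are hypothetical names, and how the version gets read out of pom.xml is deliberately left open.

```shell
#!/bin/sh
# Print the version field of a groupId:artifactId:version coordinate
# (everything after the last colon).
coordinate_version() {
  printf '%s\n' "${1##*:}"
}

# Succeed only when the coordinate's version equals the expected one.
versions_match() {
  [ "$(coordinate_version "$1")" = "$2" ]
}

# Hypothetical variable: set it to whatever version pom.xml declares.
POM_CONNECTOR_VERSION="2.0.7"
PACKAGE="datastax:spark-cassandra-connector_2.11:2.0.7"

if versions_match "$PACKAGE" "$POM_CONNECTOR_VERSION"; then
  echo "versions agree: $POM_CONNECTOR_VERSION"
else
  echo "version mismatch: pom.xml has $POM_CONNECTOR_VERSION," \
    "--packages has $(coordinate_version "$PACKAGE")" >&2
  exit 1
fi
```

Running a check like this at the top of the submit script turns a runtime blow-up into an early, readable failure.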