Four ways to use third-party jar packages with Spark


The first way

Operation: package the third-party jar files into the application jar file that is finally submitted to Spark (a "fat" jar).

Scenario: the third-party jar files are relatively small and the application has few dependencies.
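For example, a minimal sketch assuming an sbt project with the sbt-assembly plugin already configured (the jar name and main class below are placeholders):

## build one "fat" jar that bundles the third-party dependencies, then submit it
$ sbt assembly
$ bin/spark-submit --class com.example.MyApp target/scala-2.11/my-app-assembly-1.0.jar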

The second way

Operation: use the --jars parameter of the spark-submit command.

Requirements:

1. The corresponding jar file must exist on the machine where the spark-submit command is run.

2. When other machines in the cluster need the jar file, they obtain it through an HTTP interface provided by the driver (for example: http://192.168.187.146:50206/jars/mysql-connector-java-5.1.27-bin.jar Added By User).

 


## Configuration parameter: --jars JARS

Example:

$ bin/spark-shell --jars /opt/cdh-5.3.6/hive/lib/mysql-connector-java-5.1.27-bin.jar

Scenario: the corresponding jar file must exist locally.
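A spark-submit sketch passing more than one jar (my-app.jar, the main class, and the second jar are placeholders; multiple jars are separated by commas):

## the listed jars are shipped from the local machine to the driver and executors
$ bin/spark-submit \
    --class com.example.MyApp \
    --jars /opt/cdh-5.3.6/hive/lib/mysql-connector-java-5.1.27-bin.jar,/opt/libs/other-dep.jar \
    my-app.jar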

The third way

Operation: use the --packages parameter of the spark-submit command.

 

## Configuration parameter: --packages (the Maven coordinates of the jar package)
## Example:
$ bin/spark-shell --packages mysql:mysql-connector-java:5.1.27 --repositories http://maven.aliyun.com/nexus/content/groups/public/

## --repositories gives the Maven repository address for the mysql-connector-java package; if it is not given, the default Maven source configured on the machine is used for the download
## if the application depends on multiple packages, list their coordinates as above, separated by commas
## by default the packages are downloaded to the .ivy2/jars folder under the current user's home directory

 

Scenario: the package is not available locally; when the cluster needs it, it is downloaded directly from the given Maven address.
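A sketch with more than one Maven coordinate (the second coordinate is an arbitrary example):

## coordinates use the groupId:artifactId:version form, separated by commas
$ bin/spark-shell \
    --packages mysql:mysql-connector-java:5.1.27,org.apache.commons:commons-lang3:3.9 \
    --repositories http://maven.aliyun.com/nexus/content/groups/public/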

The fourth way

Operation: modify the Spark configuration by adding the third-party jar files to the SPARK_CLASSPATH environment variable.

Note: the third-party jar files must be added on all machines where the Spark application runs.

 

A. Create a folder to hold the third-party jar files: $ mkdir external_jars
B. Modify the Spark configuration: $ vim conf/spark-env.sh and add: SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/cdh-5.3.6/spark/external_jars/*
C. Copy the dependent jar files into the new folder: $ cp /opt/cdh-5.3.6/hive/lib/mysql-connector-java-5.1.27-bin.jar ./external_jars/

 

Scenario: the application depends on many jar packages, so that writing them all on the command line would be too cumbersome.

Alternatively, configure the following in spark-defaults.conf:

spark.executor.extraClassPath=/data/*
spark.driver.extraClassPath=/data/*
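The same two properties can also be set per job with --conf on spark-submit instead of editing spark-defaults.conf (a sketch; my-app.jar is a placeholder for the application jar):

$ bin/spark-submit \
    --conf spark.driver.extraClassPath=/data/* \
    --conf spark.executor.extraClassPath=/data/* \
    my-app.jar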

Note (only for spark on yarn (cluster) mode):

When running spark on yarn in cluster mode and the application depends on third-party jar files:

Final solution: copy the third-party jar files into the ${HADOOP_HOME}/share/hadoop/common/lib folder (the copy is required on all machines in the Hadoop cluster).
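A minimal sketch of that copy step (the jar path is the example from above; repeat on every machine in the Hadoop cluster):

## place the jar on Hadoop's common classpath so all YARN containers can see it
$ cp /opt/cdh-5.3.6/hive/lib/mysql-connector-java-5.1.27-bin.jar ${HADOOP_HOME}/share/hadoop/common/lib/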


Source: https://blog.csdn.net/mn_kw/article/details/89381943