Notes on submitting a Python script with spark-submit

        I recently started learning Spark. The first time I submitted a Python script with the spark-submit command, I kept hitting errors, so I decided to write up the process of submitting a Python script with spark-submit. Let's first look at spark-submit's optional parameters.

1. spark-submit parameters

--master MASTER_URL: sets the master URL of the cluster, which determines where the submitted job runs. Common options are listed below (example values follow the list):

             local: run on the local machine with a single thread

             local[K]: run on the local machine with K threads

             spark://HOST:PORT: submit to a Spark cluster deployed in standalone mode, specifying the master node's host and port

             mesos://HOST:PORT: submit to a cluster deployed on Mesos, specifying the master node's host and port

             yarn: submit to a cluster deployed on YARN
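For illustration, a few possible --master values look like this (the host names and ports are placeholders, not real cluster addresses):

--master local                        # single local thread
--master local[4]                     # 4 local threads
--master spark://192.168.1.10:7077    # standalone cluster master
--master mesos://192.168.1.10:5050    # Mesos cluster master
--master yarn                         # YARN; the cluster location is read from the Hadoop configuration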

--deploy-mode DEPLOY_MODE: sets where the driver process is started; the options are below, and the default is client

             client: the driver starts on the client machine, so the driver logic runs on the client while the tasks execute on the cluster

             cluster: both the driver logic and the tasks run on the cluster; cluster mode is not supported for Mesos clusters or for Python applications
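As a sketch (the master address, class name, and jar path are placeholders), a cluster-mode submission of a Java/Scala application to a standalone cluster might look like:

$ spark-submit \
  --master spark://192.168.1.10:7077 \
  --deploy-mode cluster \
  --class com.example.MyApp \
  /home/hadoop/jobs/my-app.jar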

--class CLASS_NAME: the entry point of the application, i.e. the main class; this only applies to Java and Scala programs, not to Python programs

--name NAME: the name of the application

--jars JARS: comma-separated list of local jars to include on the driver and executor classpaths, i.e. the jars into which the program code and its resources are packaged

--packages: comma-separated Maven coordinates of jars to include on the driver and executor classpaths

--exclude-packages: packages to exclude when resolving the dependencies given with --packages, to avoid library conflicts

--repositories: additional remote repositories (containing jar packages) to search when resolving the Maven coordinates given with --packages

--py-files PY_FILES: comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH; this parameter only applies to Python applications (see the example right below)
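For example (the deps.zip file name is hypothetical), dependency files go in --py-files while the main script is passed as the last argument, as section 2 below also shows:

$ spark-submit \
  --master local[2] \
  --py-files /home/hadoop/Download/test/deps.zip \
  /home/hadoop/Download/test/firstApp.py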

--files FILES: comma-separated list of files to be placed in the working directory of each executor

--conf PROP=VALUE: sets a Spark configuration property in PROP=VALUE format, e.g. --conf spark.executor.extraJavaOptions="-XX:MaxPermSize=256m"

--properties-file FILE: specifies an additional properties file to load; if not specified, the default is conf/spark-defaults.conf

--driver-memory MEM: memory for the driver, 1G by default

--driver-java-options: extra Java options to pass to the driver

--driver-library-path: extra library path entries to pass to the driver

--driver-class-path: extra classpath entries to pass to the driver; jars added with --jars are automatically included in the classpath

--executor-memory MEM: memory for each executor, 1G by default
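A sketch of how the memory options combine in one command (the master address and memory sizes are placeholder values):

$ spark-submit \
  --master spark://192.168.1.10:7077 \
  --driver-memory 2G \
  --executor-memory 4G \
  /home/hadoop/Download/test/firstApp.py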

When the '--master' parameter is set to Standalone and the '--deploy-mode' parameter is set to cluster, the following option can also be set:

  --driver-cores NUM: number of cores used by the driver, 1 by default

When the '--master' parameter is set to Standalone or Mesos and the '--deploy-mode' parameter is set to cluster, the following options can also be set:

  --supervise: if this parameter is set, the driver is automatically restarted when it fails

  --kill SUBMISSION_ID: if this parameter is set, the driver process with the given SUBMISSION_ID is killed

  --status SUBMISSION_ID: if this parameter is set, the status of the driver with the given SUBMISSION_ID is requested
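For illustration (the master address and submission ID below are placeholders), checking on and then killing a driver that was submitted in standalone cluster mode might look like:

$ spark-submit --master spark://192.168.1.10:6066 --status driver-20190815111704-0001
$ spark-submit --master spark://192.168.1.10:6066 --kill driver-20190815111704-0001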

When the '--master' parameter is set to Standalone or Mesos, the following option can be set:

   --total-executor-cores NUM: total number of cores used by all executors across the cluster

When the '--master' parameter is set to Standalone or YARN, the following option can be set:

  --executor-cores NUM: number of cores used by each executor

When the '--master' parameter is set to YARN, the following options can be set:

   --driver-cores NUM: number of cores used by the driver when --deploy-mode is cluster, 1 by default

   --queue QUEUE_NAME: the YARN queue the job is submitted to; the default is YARN's default queue

  --num-executors NUM: number of executors to start, 2 by default

  --archives ARCHIVES: comma-separated list of archives to be extracted into the working directory of each executor
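Putting the YARN options together, a submission might look like the following sketch (the queue name and resource sizes are placeholders, and client mode is used here since the application is a Python script):

$ spark-submit \
  --master yarn \
  --deploy-mode client \
  --queue default \
  --num-executors 4 \
  --executor-cores 2 \
  --executor-memory 2G \
  /home/hadoop/Download/test/firstApp.py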

2. Submitting a Python script

When I first submitted the firstApp.py script, I used the following command:

$ spark-submit \
--master local[2] \
--num-executors 2 \
--executor-memory 1G \
--py-files /home/hadoop/Download/test/firstApp.py

This reported the error "Error: Cannot load main class from JAR file: /home/hadoop/Download/spark-2.1.1-bin-hadoop2.7/bin/master", where /home/hadoop/Download/spark-2.1.1-bin-hadoop2.7/bin/ is part of my Spark installation path. Although a Python script has no main class as such, the message suggests that spark-submit could not find the application's entry point. After searching for answers, I learned that the --py-files parameter is only for adding Python files that the application depends on; the application itself should be passed directly as the last argument, as in the following command, which runs without errors:

$ spark-submit \
--master local[2] \
--num-executors 2 \
--executor-memory 1G \
 /home/hadoop/Download/test/firstApp.py
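The original post does not show the contents of firstApp.py; the following is only a minimal sketch of what such a script might contain (the word-count logic is my own assumption), so the commands above have something concrete to run:

# firstApp.py -- minimal PySpark script (hypothetical sketch)
from pyspark import SparkConf, SparkContext

if __name__ == "__main__":
    conf = SparkConf().setAppName("firstApp")
    sc = SparkContext(conf=conf)

    # Count words in a small in-memory dataset so the script
    # runs without needing any external input files.
    lines = sc.parallelize(["hello spark", "hello python", "spark submit"])
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))

    for word, count in counts.collect():
        print(word, count)

    sc.stop()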

 
