Submitting a PySpark application on YARN with a specified Python environment

1. Background

This article answers the following questions:
  1. What is Spark on YARN?
  2. How does a PySpark application work?
  3. How does a PySpark application run on YARN?

2. Hands-on walkthrough

  • Package the Python environment
cd path_to_python
* Note: change into the Python directory before zipping. Otherwise the full path of the directory containing Python is packed into the archive, a long leading path appears in front of python after unpacking, and the interpreter path cannot be resolved correctly.
zip -r path_to_pythonzip/python_user.zip ./*
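The layout pitfall above can be illustrated with Python's standard zipfile module. This is a sketch with a tiny fake environment tree; zip_entry_names is a hypothetical helper, not part of the original article:

```python
# Illustration of the zip-layout pitfall using the standard zipfile module.
import os
import tempfile
import zipfile

def zip_entry_names(zip_from_inside):
    """Build a tiny fake Python tree, zip it, and return the archive's entry names."""
    with tempfile.TemporaryDirectory() as tmp:
        bin_dir = os.path.join(tmp, "path_to_python", "bin")
        os.makedirs(bin_dir)
        open(os.path.join(bin_dir, "python"), "w").close()

        base = os.path.join(tmp, "path_to_python")
        archive = os.path.join(tmp, "python_user.zip")
        with zipfile.ZipFile(archive, "w") as zf:
            for dirpath, _, filenames in os.walk(base):
                for name in filenames:
                    full = os.path.join(dirpath, name)
                    # zip_from_inside=True mimics `cd path_to_python && zip -r ... ./*`;
                    # False mimics zipping from the parent directory, which bakes
                    # the directory name into every entry.
                    start = base if zip_from_inside else tmp
                    zf.write(full, os.path.relpath(full, start))
        with zipfile.ZipFile(archive) as zf:
            return zf.namelist()

print(zip_entry_names(True))   # ['bin/python'] -- interpreter sits at the archive root
print(zip_entry_names(False))  # ['path_to_python/bin/python'] -- unwanted extra prefix
```

With the relative layout, the interpreter lands directly under the unpacked directory; with the extra prefix, every lookup has one directory level too many.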
  • Upload the Python environment to HDFS
hadoop fs -put python_user.zip 
  • Modify the Spark configuration
In the spark-defaults.conf configuration file, configure the Python archive so that, during spark-submit, the bundled Python environment is automatically shipped to every worker node.
cp spark-defaults.conf  spark-user.conf

# modify the relevant configuration
spark.yarn.dist.archives path_to_hdfs/python_user.zip#python

* Note: the trailing #python must not be removed. It makes YARN unpack the zip on each node and alias the extracted directory as python, so the interpreter can be found under that path. This is what lets PySpark locate the right Python through the configuration.
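Depending on the Spark version, you may also need to tell PySpark explicitly which interpreter to use inside the unpacked archive. A sketch of the extra lines for spark-user.conf, assuming the zip has bin/python at its top level (so it unpacks to ./python/bin/python under the #python alias):

```
spark.yarn.appMasterEnv.PYSPARK_PYTHON  ./python/bin/python
spark.executorEnv.PYSPARK_PYTHON        ./python/bin/python
```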


  • Modify the submission script
#!/bin/bash
spark-submit --master yarn \
--driver-memory 4G --executor-memory 12G \
--properties-file conf/spark-user.conf \
--py-files other_dependence.py main.py
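As an alternative to editing the properties file, the archive can be passed directly on the command line. A sketch, assuming the same HDFS path as in the configuration step (--archives is the command-line counterpart of spark.yarn.dist.archives):

```shell
#!/bin/bash
spark-submit --master yarn \
--driver-memory 4G --executor-memory 12G \
--archives path_to_hdfs/python_user.zip#python \
--py-files other_dependence.py main.py
```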

3. Results

Run it: the job prints the gensim version from the shipped Python environment, confirming that the bundled interpreter is the one being used.
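A minimal main.py for this check might look like the following sketch (env_report is a hypothetical helper, not from the original article; it needs no SparkContext, so the gensim version it reports is simply that of whichever interpreter YARN hands the job):

```python
# main.py -- minimal sanity check of the shipped Python environment
import sys

def env_report():
    """Return the interpreter path and, if importable, the gensim version."""
    try:
        import gensim  # only present if it was bundled into python_user.zip
        gensim_version = gensim.__version__
    except ImportError:
        gensim_version = None
    return sys.executable, gensim_version

if __name__ == "__main__":
    executable, version = env_report()
    print("python interpreter:", executable)
    print("gensim version:", version)
```

If the configuration took effect, the interpreter path printed on the executors points into the unpacked ./python directory rather than the system Python.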


Origin blog.csdn.net/u012328476/article/details/78894669