1. Compress the project files
sudo zip -r project.zip ./*
2. Configure PYTHONPATH to point to that directory
3. Create a conf.py configuration file in the project
AI_PLATFORM_SOURCE = r'/usr/project.zip'
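For reference, a minimal conf.py consistent with the snippets below could look like this; AI_PLATFORM_SOURCE is the only name the article actually defines, while base_dir and data_dir appear later without values, so the ones here are illustrative placeholders:

# conf.py -- paths the driver needs before any zipped module is imported
AI_PLATFORM_SOURCE = r'/usr/project.zip'  # location of the zipped project
base_dir = 'project'     # placeholder: import base used by importlib below
data_dir = '/usr/data/'  # placeholder: root directory of the data files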
Reference the external module in the code:
import sys
# read the project path from the conf module and put it on sys.path
from conf import AI_PLATFORM_SOURCE
sys.path.append(AI_PLATFORM_SOURCE)
# packages at the top level of project.zip are now importable
from project import settings  # 'project' stands in for the zipped package name
Reference a class inside the compressed package:
import importlib
# dotted path of the module inside the zip; 'project.handlers' stands in for
# the package path that is garbled in the original
import_module = "project.handlers.{0}".format(class_name)
module1 = importlib.import_module(import_module, base_dir)
HandlerClass = getattr(module1, class_name)
# handler = HandlerClass(json.dumps(params))
filename = data_dir + 'feature_filter/' + 'feature_filter.json'
handler = HandlerClass(filename)
res = handler.execute(gai_ss.ss.sparkContext, gai_ss.ss)
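The snippet above assumes each dynamically loaded class exposes an execute(sparkContext, sparkSession) method. A minimal sketch of such a handler, where the class name FeatureFilter and the JSON-config constructor are assumptions rather than details from the article:

import json

class FeatureFilter(object):
    # assumed shape: the constructor receives a path to a JSON config file
    def __init__(self, filename):
        with open(filename) as f:
            self.config = json.load(f)

    # assumed shape: execute() receives the SparkContext and SparkSession
    def execute(self, sc, ss):
        # 'input_path' is an illustrative config key
        df = ss.read.json(self.config['input_path'])
        return df.count()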
4. Execute the program
Package the project subdirectory with zip -r gai_platform.zip *
Submit it to the cluster to run:
bin/spark-submit --py-files project.zip --master yarn --deploy-mode cluster <project path>/demo.py
bin/spark-submit --py-files project.zip --master yarn --deploy-mode client <project path>/demo.py
spark-submit --py-files hdfs://localhost:8020/user/dp/data/project.zip --master local <program path>/demo.py
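As a programmatic alternative to the --py-files flag, the same zip can be attached at runtime with PySpark's SparkContext.addPyFile; a minimal sketch reusing the /usr/project.zip path from conf.py:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('demo').getOrCreate()
# distribute the zipped project to all executors (same effect as --py-files)
spark.sparkContext.addPyFile('/usr/project.zip')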
bin/spark-submit \
  --py-files <packages>.zip \
  main.py
--py-files: the module packages main.py needs, with the .py files zipped together (*.zip or *.egg local files, covering third-party modules such as numpy and pandas)
Run the command from the location where the script is to be executed.
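To round out the example, a minimal main.py that imports a module shipped inside the --py-files zip; the module name mylib and its transform function are assumptions for illustration:

from pyspark.sql import SparkSession
import mylib  # importable because the --py-files zip is on the executors' path

spark = SparkSession.builder.appName('main').getOrCreate()
rdd = spark.sparkContext.parallelize([1, 2, 3])
# run a function from the zipped module on the executors
print(rdd.map(mylib.transform).collect())
spark.stop()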
Reprinted from: https://blog.csdn.net/dymkkj/article/details/86006088