Running PySpark code on Linux: two approaches found to work
1. Install PyCharm or Spyder, write the code there, and execute it from the IDE.
2. Submit the code as a job, i.e. use spark-submit. The main points of this approach are as follows.
First, assume the *.py file to be submitted imports these packages:

import os
import time
import ast
import jieba
from collections import Counter
from operator import itemgetter
from pyspark import SparkContext
from pyspark.sql.session import SparkSession
from pyspark.sql import HiveContext, SQLContext
from pyspark.sql.types import StructField, StructType, StringType
3. When submitting with spark-submit, all the required packages must therefore be bundled into one zip file. Note that the packages must first be placed in the same directory, and then that directory's contents zipped together. For example, when several packages are needed:
First: create a folder to hold all the packages:
mkdir lib_words
Second: copy the required packages into this folder. They generally live under the Python installation's lib directory, with third-party libraries in site-packages. Since picking them out one by one is tedious, we copy everything and pack it together, but do not copy the pyspark package itself:
cp -r /usr/local/python3.7/lib/python3.7/* /home/lib_words
cp -r /usr/local/python3.7/lib/python3.7/site-packages/* /home/lib_words
Third: zip the packages (run from inside the lib_words folder):
zip -r /home/lib_words.zip ./*
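The three packaging steps above can also be scripted instead of typed by hand. Below is a minimal sketch using only the Python standard library (shutil and zipfile); the function name build_dependency_zip is made up for this illustration, and the important detail it encodes is that packages must sit at the top level of the archive:

```python
import os
import shutil
import zipfile

def build_dependency_zip(src_dirs, staging_dir, zip_path):
    """Copy every package from src_dirs into staging_dir, then zip the
    staged contents so each package sits at the root of the archive."""
    os.makedirs(staging_dir, exist_ok=True)
    for src in src_dirs:
        for entry in os.listdir(src):
            if entry == "pyspark":  # skip the pyspark package itself, as above
                continue
            full = os.path.join(src, entry)
            dst = os.path.join(staging_dir, entry)
            if os.path.isdir(full):
                if not os.path.exists(dst):
                    shutil.copytree(full, dst)
            else:
                shutil.copy2(full, dst)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(staging_dir):
            for name in files:
                path = os.path.join(root, name)
                # arcname relative to staging_dir puts packages at the zip root
                zf.write(path, os.path.relpath(path, staging_dir))
    return zip_path
```

Called with the paths from the example, this reproduces the mkdir/cp/zip sequence: build_dependency_zip(["/usr/local/python3.7/lib/python3.7/site-packages"], "/home/lib_words", "/home/lib_words.zip").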
4. On the command line, submit the main *.py file with spark-submit, passing the zip file via the '--py-files' parameter, then execute:
spark-submit /home/pycharm_projects/cut_words/cut_words_fre.py --py-files='/home/lib_words.zip'
Alternative: set the pyFiles parameter of SparkContext directly in the program, then submit with spark-submit /home/pycharm_projects/cut_words/cut_words_fre.py. This also works:

pyFiles = ["/home/lib_words.zip"]  # path of the zip archive; verified to work
# pyFiles = ["/home/test1.py", "/home/test2.py"]  # reportedly also works, but untested here because there were too many files
sc = SparkContext('local', 'test', pyFiles=pyFiles)
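What --py-files (or the pyFiles parameter) ultimately does is put lib_words.zip on the Python import path of each executor. Plain CPython can import from a zip the same way, which makes it easy to verify locally that the archive is laid out correctly before submitting. A minimal stdlib sketch, where the module name mypkg and its contents are made up for the demonstration:

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny zip containing one importable package at its root,
# mimicking the layout lib_words.zip needs.
tmp = tempfile.mkdtemp()
zip_path = os.path.join(tmp, "lib_words_demo.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("mypkg/__init__.py", "ANSWER = 42\n")

# Spark adds --py-files archives to sys.path on each executor;
# doing the same locally checks that the imports resolve.
sys.path.insert(0, zip_path)
import mypkg
print(mypkg.ANSWER)  # -> 42
```

If this import fails locally, the packages are probably nested one directory too deep inside the zip (e.g. lib_words/mypkg/... instead of mypkg/...).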
Finally, a line like the following will appear in the log:
19:55:06 INFO spark.SparkContext: Successfully stopped SparkContext
Note: if only the pyspark package is used, the *.zip file may not be needed (untested).
Reference:
https://blog.csdn.net/lmb09122508/article/details/84586947