First, make sure that Python and Spark are installed successfully.
Attached link: Installation and configuration of pyspark
To use pyspark, start Hadoop first:
start-dfs.sh
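Once start-dfs.sh returns, it can help to confirm that the NameNode actually came up before continuing. A minimal sketch (the host and port are assumptions — Hadoop 2.x serves the NameNode web UI on 50070 by default, Hadoop 3.x on 9870; adjust both to your cluster):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if something is listening on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# 50070 is the Hadoop 2.x NameNode web UI default (9870 on 3.x);
# change host/port to match your own setup.
if port_open('localhost', 50070):
    print('HDFS NameNode web UI is reachable')
```

If the check fails, look at the NameNode logs under $HADOOP_HOME/logs before going any further.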
Then enter at the command line:
jupyter-notebook --ip 192.168.50.88
Jupyter Notebook (previously known as IPython Notebook) is an interactive notebook environment that supports more than 40 programming languages and makes data analysis easier.
Copy the URL that appears and open it in your browser:
First create a new Python notebook and enter the following in a cell:
import os
import sys

spark_home = os.environ.get('SPARK_HOME', None)
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')
sys.path.insert(0, os.path.join(spark_home, 'python'))
# py4j is stored in spark/python/lib, so use the py4j version shipped with your Spark here
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.10.4-src.zip'))
exec(open(os.path.join(spark_home, 'python/pyspark/shell.py')).read())
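The py4j version hard-coded above (0.10.4) changes between Spark releases, so the path breaks when you upgrade Spark. A small helper can locate whatever zip your installation actually ships instead; this is a sketch of my own (find_py4j is a hypothetical name, not part of pyspark):

```python
import glob
import os

def find_py4j(spark_home):
    """Return the path of the py4j source zip bundled under
    spark/python/lib, whatever its version number is."""
    pattern = os.path.join(spark_home, 'python', 'lib', 'py4j-*-src.zip')
    matches = glob.glob(pattern)
    if not matches:
        raise ValueError('no py4j zip found under %s' % spark_home)
    return matches[0]  # typically only one is bundled
```

You would then pass find_py4j(spark_home) to sys.path.insert in place of the hard-coded zip path.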
If the pyspark shell banner appears when you run the cell, pyspark is ready to use.