The use of pyspark

First of all, make sure that you have successfully installed Python and Spark.

Related links:

             Installation and configuration of pyspark

             hadoop installation

To use pyspark, you need to start Hadoop first:

start-dfs.sh
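
To confirm that HDFS actually started (an optional check, not part of the original steps), the JDK's jps tool lists the running Java daemons; NameNode, DataNode, and SecondaryNameNode should appear among them:

jps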

Then enter the following in the terminal:

jupyter-notebook --ip 192.168.50.88
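
If you are working on a headless server (an assumption about your setup), Jupyter's standard --no-browser flag stops it from trying to open a local browser, so only the access URL is printed:

jupyter-notebook --ip 192.168.50.88 --no-browser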

Jupyter Notebook (previously known as IPython Notebook) is an interactive notebook environment that supports more than 40 programming languages and makes data analysis much easier.


Copy the URL that appears in the terminal and open it in your browser:
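
The startup output normally contains a line of roughly this form (the port and token depend on your configuration; the token here is a placeholder, not a real value):

http://192.168.50.88:8888/?token=<your-token>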


First create a new notebook using the Python kernel and enter the following in a cell:

import os
import sys
spark_home = os.environ.get('SPARK_HOME', None)
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')
sys.path.insert(0, os.path.join(spark_home, 'python'))
# py4j is stored in spark/python/lib; match the filename to the py4j version that ships with your Spark
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.10.4-src.zip'))
# shell.py performs the same setup as the interactive pyspark shell, creating sc and spark for us
exec(open(os.path.join(spark_home, 'python/pyspark/shell.py')).read())

When the cell runs, the Spark welcome banner and version information appear in the output, which means that our pyspark is ready to use.

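As a quick sanity check (a minimal sketch assuming Spark 2.x, where shell.py exposes a SparkContext as sc and a SparkSession as spark), you can run a small job in a new cell:

# sum the integers 0..99 with an RDD job; should print 4950
print(sc.parallelize(range(100)).sum())

# build a ten-row DataFrame and count it; should print 10
print(spark.range(10).count())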