Running Python Spark in IPython Notebook

Installing Anaconda

Download Anaconda2-2.5.0 for Linux from Continuum.

In a terminal:

wget https://repo.continuum.io/archive/Anaconda2-2.5.0-Linux-x86_64.sh
bash Anaconda2-2.5.0-Linux-x86_64.sh -b

The -b flag means batch mode: it skips the interactive license prompt and installs automatically to anaconda2 under your home directory (/home/<username>/anaconda2).

Edit ~/.bashrc to add the required paths

gedit ~/.bashrc

Add Anaconda to the PATH by adding the following lines, substituting your own user name in the paths:

export PATH=/home/<username>/anaconda2/bin:$PATH
export ANACONDA_PATH=/home/<username>/anaconda2

Configure pyspark to use Anaconda's interpreters:

export PYSPARK_DRIVER_PYTHON=$ANACONDA_PATH/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_PATH/bin/python
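As a sketch, the four export lines can first be staged in a scratch file for review before being appended to ~/.bashrc; this variant uses $HOME instead of a hard-coded /home/<username> path (the scratch-file step is an assumption of convenience, not part of the original procedure):

```shell
# Stage the exports in a temporary file so they can be reviewed before
# being appended to ~/.bashrc. The quoted 'EOF' keeps $HOME, $PATH and
# $ANACONDA_PATH unexpanded until ~/.bashrc is actually sourced.
rcfile=$(mktemp)
cat > "$rcfile" <<'EOF'
export PATH=$HOME/anaconda2/bin:$PATH
export ANACONDA_PATH=$HOME/anaconda2
export PYSPARK_DRIVER_PYTHON=$ANACONDA_PATH/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_PATH/bin/python
EOF
cat "$rcfile"   # review, then append with: cat "$rcfile" >> ~/.bashrc
```

After appending, the `source ~/.bashrc` step below is still required for the current shell to pick up the changes.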

Save and close the file, then reload ~/.bashrc so the changes take effect:

source ~/.bashrc

Check the Python version:

python --version

Install Anaconda on the slave nodes (hadoop1, hadoop2, hadoop3) following the same steps as above.
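Repeating the install by hand on every worker is error-prone, so it can be scripted over SSH. A dry-run sketch, assuming passwordless SSH to the hostnames used in this guide; `install_anaconda` is a hypothetical helper that only prints the commands, so nothing runs until you pipe its output to `sh`:

```shell
# Hypothetical helper: print the copy-and-install command pair for each
# worker node instead of running it (a dry run for review).
install_anaconda() {
  for host in "$@"; do
    echo "scp Anaconda2-2.5.0-Linux-x86_64.sh ${host}:~/"
    echo "ssh ${host} 'bash Anaconda2-2.5.0-Linux-x86_64.sh -b'"
  done
}
install_anaconda hadoop1 hadoop2 hadoop3
```

Once the printed commands look right: `install_anaconda hadoop1 hadoop2 hadoop3 | sh`.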


Using Spark in IPython Notebook

Create the IPython Notebook working directory:

mkdir -p ~/pythonwork/ipynotebook

Change into the directory:

cd ~/pythonwork/ipynotebook

Running pyspark from the IPython Notebook interface

Enter the following command in the terminal; pyspark then runs inside the IPython Notebook interface (by default, pyspark runs on the local machine):

PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark

To shut down IPython Notebook, press Ctrl+C in the terminal.


Using IPython Notebook in Hadoop YARN-client mode

Enter the commands:

start-all.sh
cd ~/pythonwork/ipynotebook
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client pyspark


Using IPython Notebook in Spark Standalone mode

Start the Spark Standalone cluster:

/usr/local/spark/sbin/start-all.sh

Change to the ipynotebook working directory and run pyspark:

cd ~/pythonwork/ipynotebook
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" MASTER=spark://master:7077 pyspark --num-executors 1 --total-executor-cores 3 --executor-memory 512m
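The three launch commands in this article differ only in the environment variables and flags placed in front of pyspark. The sketch below collects them in one hypothetical helper (`launch_pyspark` is not part of Spark) that just prints each full command line for comparison. Note that `--num-executors` is a YARN option; in Standalone mode, sizing is governed by `--total-executor-cores` and `--executor-memory`:

```shell
# Hypothetical helper: print the pyspark launch line for each of the
# three modes covered above (local, YARN-client, Standalone), using the
# paths and hostnames from this guide. It only echoes; it launches nothing.
launch_pyspark() {
  local mode="$1"
  local prefix='PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook"'
  case "$mode" in
    local)
      echo "$prefix pyspark" ;;
    yarn)
      echo "$prefix HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client pyspark" ;;
    standalone)
      echo "$prefix MASTER=spark://master:7077 pyspark --total-executor-cores 3 --executor-memory 512m" ;;
  esac
}
launch_pyspark yarn
```

Run the printed line directly, or wrap `launch_pyspark <mode>` output in `eval` once it matches what you expect.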


Reposted from blog.csdn.net/weixin_40170902/article/details/82503530