Installing Anaconda
Download Anaconda2-2.5.0 for Linux from Continuum.
In a terminal:
wget https://repo.continuum.io/archive/Anaconda2-2.5.0-Linux-x86_64.sh
bash Anaconda2-2.5.0-Linux-x86_64.sh -b
-b means batch mode: the installer skips the license prompt and installs to /home/<username>/anaconda2 automatically
Edit ~/.bashrc to add the module paths (sudo is not needed to edit your own ~/.bashrc):
gedit ~/.bashrc
Add Anaconda to the PATH by appending the following to ~/.bashrc:
export PATH=/home/<username>/anaconda2/bin:$PATH
export ANACONDA_PATH=/home/<username>/anaconda2
Configure pyspark:
export PYSPARK_DRIVER_PYTHON=$ANACONDA_PATH/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_PATH/bin/python
Save and close, then apply the ~/.bashrc changes:
source ~/.bashrc
Check the Python version:
python --version
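Beyond the version string, it is worth confirming that `python` now resolves to the Anaconda install rather than the system interpreter. A minimal sketch; the `is_anaconda_python` helper is hypothetical, written just for this check:

```shell
# Report whether the active python comes from the Anaconda prefix.
# is_anaconda_python is a hypothetical helper for this check.
is_anaconda_python() {
  case "$1" in
    */anaconda2/*) return 0 ;;  # path lives inside an anaconda2 install
    *)             return 1 ;;
  esac
}

py_path=$(command -v python || true)
if is_anaconda_python "$py_path"; then
  echo "Anaconda python active: $py_path"
else
  echo "Not using Anaconda python: $py_path"
fi
```

If the second message appears, re-check the PATH line in ~/.bashrc and run `source ~/.bashrc` again.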
Repeat the steps above to install Anaconda on each of the slaves (hadoop1, hadoop2, hadoop3).
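The per-slave installs can be scripted from the master over ssh. A sketch under the assumption that passwordless ssh to hadoop1..hadoop3 is already set up (standard for a Hadoop cluster); `install_on_host` is a hypothetical helper, and `DRY_RUN=echo` prints each command instead of running it so the loop can be reviewed first:

```shell
# Sketch: repeat the master's Anaconda install on each slave over ssh.
# install_on_host is a hypothetical helper; set DRY_RUN="" to actually run.
DRY_RUN=echo
install_on_host() {
  $DRY_RUN ssh "$1" \
    "wget https://repo.continuum.io/archive/Anaconda2-2.5.0-Linux-x86_64.sh && bash Anaconda2-2.5.0-Linux-x86_64.sh -b"
}

for host in hadoop1 hadoop2 hadoop3; do
  install_on_host "$host"
done
```

Remember that each slave also needs the same ~/.bashrc additions (PATH, ANACONDA_PATH, PYSPARK_PYTHON).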
Using Spark from IPython Notebook
Create the IPython Notebook working directory:
mkdir -p ~/pythonwork/ipynotebook
Change into the directory:
cd ~/pythonwork/ipynotebook
Running pyspark from the IPython Notebook interface
Enter the following command in a terminal, then work with pyspark in the IPython Notebook interface that opens (by default pyspark runs on the local machine):
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark
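Once the notebook opens, a quick first-cell sanity check confirms the SparkContext works. A sketch assuming the `sc` variable that the pyspark driver injects into the session; the guard lets the snippet also run outside pyspark, and `count_range` is a hypothetical helper:

```python
# First-cell sanity check for a pyspark notebook session.
# `sc` is the SparkContext that the pyspark driver creates automatically;
# the guard below lets this sketch also run outside a pyspark session.
try:
    sc  # noqa: F821 - injected by pyspark
except NameError:
    sc = None

def count_range(context, n=100):
    """Count 1..n through Spark when a context exists, else in plain Python."""
    data = list(range(1, n + 1))
    if context is not None:
        return context.parallelize(data).count()
    return len(data)

print(count_range(sc))  # 100 either way
```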
To shut down IPython Notebook, press Ctrl+C in the terminal.
Running IPython Notebook in Hadoop YARN-client mode
Enter the following commands:
start-all.sh
cd ~/pythonwork/ipynotebook
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop MASTER=yarn-client pyspark
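To confirm the notebook session is actually attached to YARN rather than running locally, the first cell can inspect `sc.master`. A minimal sketch; `describe_master` is a hypothetical helper, and the same guard as before keeps it safe outside pyspark:

```python
# Report which cluster manager the current pyspark session is attached to.
# `sc` is injected by the pyspark driver; the guard makes this sketch safe
# to run outside a pyspark session as well.
try:
    sc  # noqa: F821 - injected by pyspark
except NameError:
    sc = None

def describe_master(context):
    """Return the master URL: 'yarn-client' here, 'local[*]' for a local run."""
    if context is None:
        return "no SparkContext (not inside a pyspark session)"
    return context.master

print(describe_master(sc))
```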
Running IPython Notebook in Spark Standalone mode
Start the Spark Standalone cluster:
/usr/local/spark/sbin/start-all.sh
Change to the ipynotebook working directory and run pyspark:
cd ~/pythonwork/ipynotebook
PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" MASTER=spark://master:7077 pyspark --num-executors 1 --total-executor-cores 3 --executor-memory 512m