Environment:
Hadoop 2.6.4
JDK 1.8
CentOS 4.8
The steps are as follows:
1. Download the Spark 2.3.1 binary package (spark-2.3.1-bin-hadoop2.6.tgz) from the Apache website.
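If you prefer to download directly on the server, a wget along these lines should work (the URL assumes Apache's standard archive layout for old releases):
wget https://archive.apache.org/dist/spark/spark-2.3.1/spark-2.3.1-bin-hadoop2.6.tgz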
2. Upload the archive to the Linux server and extract it:
tar -zxvf spark-2.3.1-bin-hadoop2.6.tgz -C /usr/local/
3. Enter spark-2.3.1-bin-hadoop2.6/conf:
cd /usr/local/spark-2.3.1-bin-hadoop2.6/conf
3.1 Configure spark-env.sh: copy it from spark-env.sh.template, then append the settings below.
cp spark-env.sh.template spark-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_151
export HADOOP_CONF_DIR=/usr/local/hadoop-2.6.4/etc/hadoop/
export SPARK_MASTER_HOST=master  # SPARK_MASTER_IP is deprecated as of Spark 2.0
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_MEMORY=512m
export SPARK_WORKER_CORES=1
export SPARK_EXECUTOR_MEMORY=512m
export SPARK_EXECUTOR_CORES=1
export SPARK_WORKER_INSTANCES=1
3.2 Configure slaves (copy it from slaves.template):
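As with spark-env.sh, make the working copy first:
cp slaves.template slaves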
Remove the default contents (the localhost entry) and add:
slave1
slave2
slave3
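start-all.sh and the spark.master URL refer to these machines by hostname, so master and slave1-3 must resolve on every node. A minimal /etc/hosts sketch (only 192.168.128.130, the master, appears in this guide; the slave addresses are placeholders to adjust):
192.168.128.130 master
192.168.128.131 slave1
192.168.128.132 slave2
192.168.128.133 slave3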
3.3 Configure spark-defaults.conf (copy it from spark-defaults.conf.template):
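cp spark-defaults.conf.template spark-defaults.conf
Then append: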
spark.master spark://master:7077
spark.eventLog.enabled true
spark.eventLog.dir hdfs://master:8020/spark-logs
spark.history.fs.logDirectory hdfs://master:8020/spark-logs
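The hdfs://master:8020 URIs must match the NameNode address in fs.defaultFS (8020 is one common default; 9000 is another). A quick way to check, assuming the Hadoop path from spark-env.sh above:
grep -A1 'fs.defaultFS' /usr/local/hadoop-2.6.4/etc/hadoop/core-site.xml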
4. Create the event log directory in HDFS (HDFS must already be running):
hdfs dfs -mkdir /spark-logs
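To confirm the directory exists:
hdfs dfs -ls / | grep spark-logs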
5. Distribute the Spark installation to each slave:
scp -r /usr/local/spark-2.3.1-bin-hadoop2.6/ slave1:/usr/local
scp -r /usr/local/spark-2.3.1-bin-hadoop2.6/ slave2:/usr/local
scp -r /usr/local/spark-2.3.1-bin-hadoop2.6/ slave3:/usr/local
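Both the scp above and start-all.sh in the next step assume passwordless SSH from master to every slave. If that is not already in place, a typical setup (run on master):
ssh-keygen -t rsa
ssh-copy-id slave1
ssh-copy-id slave2
ssh-copy-id slave3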
6. Start Spark
Enter the sbin directory:
cd /usr/local/spark-2.3.1-bin-hadoop2.6/sbin
Start the master and workers (note this is Spark's start-all.sh, not Hadoop's, hence the explicit ./):
./start-all.sh
Then start the history server. The log directory is already set via spark.history.fs.logDirectory in spark-defaults.conf, and passing it on the command line is deprecated in Spark 2.x, so no argument is needed:
./start-history-server.sh
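To verify the daemons are up, run jps on each node:
jps
# on master, expect (PIDs vary): Master, HistoryServer
# on each slave: Worker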
Check the web UIs in a browser (192.168.128.130 is the master's IP here):
http://192.168.128.130:8080 (Master UI)
http://192.168.128.130:18080 (history server)
7. Run an example job from the command line
Still from sbin (spark-submit lives in bin, and the Spark 2.3.1 examples jar is under examples/jars, not lib as in Spark 1.x):
../bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://master:7077 --executor-memory 512m --total-executor-cores 2 ../examples/jars/spark-examples_2.11-2.3.1.jar 1000
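On success, the driver output should include a line like "Pi is roughly 3.14..." from the SparkPi example, and the completed application should appear in the history server UI since event logging is enabled.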