1. Introduction to Spark's deployment modes
Spark supports several deployment modes:
1) Local
2) Standalone
3) YARN
4) Mesos
Download and install IDEA; free installation guides are easy to find online.
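Local mode needs none of the cluster configuration below and is the quickest way to try Spark. A minimal sketch, assuming Spark is already unpacked and you are in its home directory:

```shell
# Start an interactive Spark shell in local mode with 2 worker threads
bin/spark-shell --master local[2]
```
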
2. Spark Standalone mode configuration and test
1) JDK 1.8 already installed
2) Scala 2.11.8 already installed
3) Hadoop 2.5.0 already installed
4) Install Spark in Standalone mode
a) Configure the slaves file
vi slaves
bigdata-pro01.kfk.com
bigdata-pro02.kfk.com
bigdata-pro03.kfk.com
b) Configure spark-env.sh
vi spark-env.sh
export JAVA_HOME=/opt/modules/jdk1.8.0_11
export SCALA_HOME=/opt/modules/scala-2.11.8
SPARK_CONF_DIR=/opt/modules/spark-2.2.0-bin/conf
SPARK_MASTER_HOST=bigdata-pro02.kfk.com
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8080
SPARK_WORKER_CORES=1
SPARK_WORKER_MEMORY=1g
SPARK_WORKER_PORT=7078
SPARK_WORKER_WEBUI_PORT=8081
c) Distribute the configured Spark directory to the other nodes, then adjust any node-specific settings on each of them
scp -r spark-2.2.0-bin bigdata-pro01.kfk.com:/opt/modules/
scp -r spark-2.2.0-bin bigdata-pro03.kfk.com:/opt/modules/
d) Start Spark (on the master node)
sbin/start-all.sh
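After start-all.sh finishes, the daemons can be verified with jps (a standard JDK tool); this sanity check is an addition, not part of the original steps:

```shell
# The master node (bigdata-pro02.kfk.com) should list a Master process,
# and each worker node a Worker process
jps
# The master web UI is then reachable at http://bigdata-pro02.kfk.com:8080
```
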
e) Client test
bin/spark-shell --master spark://bigdata-pro02.kfk.com:7077
f) Run on the cluster
bin/spark-submit --master spark://bigdata-pro02.kfk.com:7077 --deploy-mode cluster /opt/jars/sparkStu.jar hdfs://bigdata-pro01.kfk.com:9000/user/data/stu.txt hdfs://bigdata-pro01.kfk.com:9000/user/data/output
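For comparison, the same job can be submitted with --deploy-mode client, which keeps the driver on the submitting machine so its output is printed locally. A sketch reusing the jar and HDFS paths from the example above:

```shell
bin/spark-submit \
  --master spark://bigdata-pro02.kfk.com:7077 \
  --deploy-mode client \
  /opt/jars/sparkStu.jar \
  hdfs://bigdata-pro01.kfk.com:9000/user/data/stu.txt \
  hdfs://bigdata-pro01.kfk.com:9000/user/data/output
```
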
3. Spark on YARN mode configuration and test
1) Make sure the JDK version set in the Hadoop configuration files matches the JDK version currently in use
2) Submit a job in Spark on YARN mode
bin/spark-submit --class com.spark.test.Test --master yarn --deploy-mode cluster /opt/jars/sparkStu.jar hdfs://bigdata
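With --deploy-mode cluster on YARN, the driver runs inside the cluster and its output is not printed on the submitting machine. Assuming log aggregation is enabled in the Hadoop configuration, the job's logs can be pulled afterwards; the application id below is a placeholder:

```shell
# <application_id> is the id printed by spark-submit,
# e.g. of the form application_1234567890123_0001
yarn logs -applicationId <application_id>
```
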