Table of Contents
1. What is Spark?
2. Spark principles
3. Spark distributed deployment
3.1 Machine information
3.2 Preparation before installation
3.3 Spark installation
3.3.1 Obtain the installation package
3.3.2 Unpack and configure the environment variables
3.3.3 Modify the configuration files and distribute to the other machines
3.3.4 Start the services
3.3.5 Verification
1. What is Spark?
First, let's look at the Hadoop ecosystem:
hdfs + zookeeper + mapreduce/hive + hbase + storm + mahout + other tools
If Hive was created to spare us from writing complex MapReduce programs, then Spark was created to solve the slowness of MapReduce computation:
hdfs + zookeeper + spark + hbase + storm + mahout + other tools
In short: Spark is a fast, general-purpose, scalable distributed computing engine.
2. Spark principles
3. Spark distributed deployment
This section focuses on the key points of deploying Spark in distributed (standalone) mode.
3.1 Machine information

| Machine | Role |
| --- | --- |
| wyl01 | master, worker |
| wyl02 | worker |
| wyl03 | worker |
3.2 Preparation before installation
The environment requires the following:
- JDK
- Python
- Scala
3.2.1 JDK installation
(omitted)
3.2.2 Python installation
(omitted)
3.2.3 Scala installation
Download the installation package from the Scala official website.
Unpack it and configure the environment variables. If entering interactive mode brings up the scala> prompt, the installation succeeded:
[root@wyl01 opt]# scala
Welcome to Scala 2.12.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131).
Type in expressions for evaluation. Or try :help.
scala>
3.3 Spark installation
3.3.1 Obtain the installation package
Download it from the Spark official website.
3.3.2 Unpack and configure the environment variables
tar -xf spark-2.3.3-bin-hadoop2.7.tgz -C /opt
# link the versioned directory to /opt/spark, matching SPARK_HOME below (and the rsync step later)
ln -s /opt/spark-2.3.3-bin-hadoop2.7 /opt/spark
vim /etc/profile
# add the following
export SPARK_HOME=/opt/spark
export PATH=${SPARK_HOME}/bin:$PATH
source /etc/profile
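Once the environment variables are configured and the profile re-sourced, a quick sanity check helps before moving on. The sketch below is plain Python; the variable list mirrors the homes this article ends up exporting (JDK, Scala, Spark), so adjust it to your own layout:

```python
import os

# Variables this deployment expects to be set; adjust to your layout.
REQUIRED = ["JAVA_HOME", "SCALA_HOME", "SPARK_HOME"]

def missing_vars(env, required=REQUIRED):
    """Return the required variable names that are absent (or empty) in env."""
    return [name for name in required if not env.get(name)]

if __name__ == "__main__":
    missing = missing_vars(os.environ)
    print("missing:", ", ".join(missing) if missing else "none")
```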
3.3.3 Modify the configuration files and distribute to the other machines
Before modifying the configuration files, let's see how Spark performs in local mode, following the example from the official website:
[root@wyl01 spark]# ./bin/spark-shell
2019-05-24 11:10:29 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://wyl01:4040
Spark context available as 'sc' (master = local[*], app id = local-1558667447611).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.3.3
/_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val textFile = spark.read.textFile("README.md")   // the README.md shipped in the Spark directory
scala> textFile.count()
res0: Long = 103   // the file has 103 lines
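The count above is simply "number of lines in README.md". For comparison, the same result can be reproduced outside Spark; this plain-Python sketch writes its own small sample file (the file name is made up) and counts lines the way `textFile.count()` does:

```python
# Plain-Python equivalent of spark.read.textFile(path).count()
from pathlib import Path

def count_lines(path):
    """Count the lines of a text file, as Spark's textFile().count() would."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)

if __name__ == "__main__":
    sample = Path("sample.txt")  # hypothetical sample file
    sample.write_text("line one\nline two\nline three\n", encoding="utf-8")
    print(count_lines(sample))  # prints 3
```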
Modifying the configuration files is also covered on the official website; only two files need to change.
Modify the spark-env.sh configuration file:
cd /opt/spark/conf
cp spark-env.sh.template spark-env.sh   # the shipped template
vim spark-env.sh
# add the following
export JAVA_HOME=/opt/java
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
export SCALA_HOME=/opt/scala
export SPARK_MASTER_HOST=wyl01
export SPARK_MASTER_PORT=7077
Modify the slaves configuration file:
vim slaves   # add the worker nodes
wyl01
wyl02
wyl03
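Both files follow directly from one host list, so on a larger cluster it can be less error-prone to generate them than to edit them by hand. A minimal sketch (the hostnames are the ones from this article's machine table; the function names are made up):

```python
# Generate the spark-env.sh additions and the slaves file from one host list.
MASTER = "wyl01"
WORKERS = ["wyl01", "wyl02", "wyl03"]

def spark_env_lines(master, master_port=7077):
    """Master-related lines to append to conf/spark-env.sh."""
    return [
        f"export SPARK_MASTER_HOST={master}",
        f"export SPARK_MASTER_PORT={master_port}",
    ]

def slaves_content(workers):
    """Contents of conf/slaves: one worker hostname per line."""
    return "\n".join(workers) + "\n"

if __name__ == "__main__":
    print("\n".join(spark_env_lines(MASTER)))
    print(slaves_content(WORKERS), end="")
```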
From /opt on the master node, distribute the Spark installation to the other machines:
rsync -av spark spark-2.3.3-bin-hadoop2.7 wyl02:/opt/
rsync -av spark spark-2.3.3-bin-hadoop2.7 wyl03:/opt/
3.3.4 Start the services
Start the services on the master node. Note that the Hadoop cluster also ships a start-all.sh; if both are on the PATH the names collide, so it is safest to run the script directly from Spark's sbin directory:
[hadoop@wyl01 sbin]# ./start-all.sh
3.3.5 Verification
Check that the processes started: jps on each node should show a Worker process, plus a Master process on wyl01.
Check that the web UI is reachable: the standalone master UI listens on http://wyl01:8080 by default.
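When no browser is at hand, a TCP check is enough to confirm the master's ports are listening. This Python sketch assumes the standalone defaults used in this article (8080 for the master web UI, 7077 for the master RPC port):

```python
# Check whether a host:port is accepting TCP connections.
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for port in (8080, 7077):  # master web UI, master RPC port
        print(port, "open" if port_open("wyl01", port) else "closed")
```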
Run a job to verify the cluster:
[hadoop@wyl01 sbin]# spark-submit --class org.apache.spark.examples.SparkPi --master spark://wyl01:7077 --deploy-mode cluster --supervise --executor-memory 512m --total-executor-cores 2 /opt/spark/examples/jars/spark-examples_2.11-2.3.3.jar 100
Running Spark using the REST application submission protocol.
2019-05-24 17:09:40 INFO RestSubmissionClient:54 - Submitting a request to launch an application in spark://wyl01:7077.
2019-05-24 17:09:51 WARN RestSubmissionClient:66 - Unable to connect to server spark://wyl01:7077.
Warning: Master endpoint spark://wyl01:7077 was not a REST server. Falling back to legacy submission gateway instead.
2019-05-24 17:09:51 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
The "not a REST server" warning just means that 7077 is the legacy RPC port rather than the REST endpoint (the REST submission server defaults to port 6066), so spark-submit falls back to the legacy gateway and the job still runs. At this point, our Spark cluster is deployed.