Spark distributed deployment


Table of Contents

1. What is Spark?
2. Spark principles
3. Spark distributed deployment
3.1 Machine information
3.2 Pre-installation preparation
3.2.1 JDK installation
3.2.2 Python installation
3.2.3 Scala installation
3.3 Spark installation
3.3.1 Obtain the installation package
3.3.2 Unpack and configure environment variables
3.3.3 Modify the configuration files and distribute them to other machines
3.3.4 Start the service
3.3.5 Verification


1. What is Spark?

First, let's look at the Hadoop ecosystem:

hdfs + zookeeper + mapreduce/hive + hbase + storm + mahout + other tools

If Hive was created to solve the complexity of MapReduce programming, then Spark was created to solve the slowness of MapReduce computation:

hdfs + zookeeper + spark + hbase + storm + mahout + other tools

In short: Spark is a fast, general-purpose, scalable, distributed computing engine.

2. Spark principles

For how Spark works internally, refer to other people's blogs.

3. Spark distributed deployment

This article focuses mainly on how to deploy Spark.

3.1 Machine information

Machine    Role
wyl01      master, worker
wyl02      worker
wyl03      worker
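
For the nodes to reach one another by hostname, wyl01, wyl02 and wyl03 must resolve on every machine. A minimal /etc/hosts sketch, with placeholder IP addresses (the real addresses are not given in this article; replace them with your own):

# /etc/hosts on every node (placeholder IPs, not the actual addresses)
192.168.1.101  wyl01
192.168.1.102  wyl02
192.168.1.103  wyl03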

3.2 Pre-installation preparation

The prerequisites for our environment are:

  • JDK
  • Python
  • Scala

3.2.1 JDK installation

Omitted in the original; a minimal sketch follows below.
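
A minimal sketch of what this step usually involves, assuming the JDK ends up under /opt/java (the same path spark-env.sh points JAVA_HOME to later):

# Assumes the JDK is unpacked/installed under /opt/java
vim /etc/profile
# Add the following
export JAVA_HOME=/opt/java
export PATH=${JAVA_HOME}/bin:$PATH

source /etc/profile
java -version    # the machines in this article run Java 1.8.0_131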

3.2.2 Python installation

Omitted.
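
As a side note (not a step from the original): a system Python on the PATH is enough for this deployment, and it is only exercised if you use PySpark, so a version check suffices:

python --version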

3.2.3 Scala installation

Download the installation package from the Scala official website.

Unpack it and configure the environment variables (sketched below). If the scala> prompt appears when you enter interactive mode, the installation was successful.
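
Only a sketch of the environment variables, assuming Scala 2.12.8 is unpacked to /opt/scala (the same path spark-env.sh references for SCALA_HOME later):

tar -xf scala-2.12.8.tgz -C /opt
mv /opt/scala-2.12.8 /opt/scala    # assumed layout; adjust to your unpacked directory name

vim /etc/profile
# Add the following
export SCALA_HOME=/opt/scala
export PATH=${SCALA_HOME}/bin:$PATH

source /etc/profile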

[root@wyl01 opt]# scala
Welcome to Scala 2.12.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131).
Type in expressions for evaluation. Or try :help.

scala> 

3.3 Spark installation

3.3.1 Obtain the installation package

Download it from the Spark official website.
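
For example (the exact URL is an assumption; any Apache mirror that carries the 2.3.3 release works):

# Spark 2.3.3 pre-built for Hadoop 2.7, matching the tarball unpacked below
wget https://archive.apache.org/dist/spark/spark-2.3.3/spark-2.3.3-bin-hadoop2.7.tgz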

3.3.2 Unpack and configure environment variables

tar -xf spark-2.3.3-bin-hadoop2.7.tgz   -C /opt

vim /etc/profile

# Add the following
export SPARK_HOME=/opt/spark
export PATH=${SPARK_HOME}/bin:$PATH
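
The tarball unpacks to /opt/spark-2.3.3-bin-hadoop2.7, while SPARK_HOME points to /opt/spark. The sketch below assumes /opt/spark is a symlink to the versioned directory (consistent with both names being rsync'ed to the workers later); then reload the profile:

ln -s /opt/spark-2.3.3-bin-hadoop2.7 /opt/spark   # assumed symlink so that SPARK_HOME=/opt/spark is valid
source /etc/profile                               # make the new variables take effect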

3.3.3 Modify the configuration files and distribute them to other machines

Before modifying the configuration files, let's first see how Spark performs in local mode, following the example from the official website:

[root@wyl01 spark]# ./bin/spark-shell
2019-05-24 11:10:29 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://wyl01:4040
Spark context available as 'sc' (master = local[*], app id = local-1558667447611).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.3
      /_/
         
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.

scala>  val textFile = spark.read.textFile("README.md")   # README.md is a file in the Spark directory
scala> textFile.count()
res0: Long = 103                    # the result is 103 lines

Modifying the configuration files is also covered on the official site; only two files need to be changed.

Modify the spark-env.sh configuration file:

vim spark-env.sh
# Add the following
export JAVA_HOME=/opt/java
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
export SCALA_HOME=/opt/scala
export SPARK_MASTER_HOST=wyl01
export SPARK_MASTER_PORT=7077

Modify the slaves configuration file:

vim slaves   # add the worker nodes
wyl01
wyl02
wyl03

On the master node, distribute the Spark installation package to the other machines:

rsync -av spark spark-2.3.3-bin-hadoop2.7 wyl02:/opt/
rsync -av spark spark-2.3.3-bin-hadoop2.7 wyl03:/opt/
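
The environment variables in /etc/profile also need to exist on the worker nodes. One way to do that (an assumption, not a step from the original) is to push the same profile over; it takes effect on the next login shell:

# Assumed step: copy the JAVA_HOME/SCALA_HOME/SPARK_HOME settings to the workers
rsync -av /etc/profile wyl02:/etc/profile
rsync -av /etc/profile wyl03:/etc/profile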

3.3.4 Start the service

Start the services on the master node. Note that the Hadoop cluster also has a start-all.sh script, so if environment variables for both are configured it is safest to run the script from Spark's sbin directory:

[hadoop@wyl01 sbin]# ./start-all.sh

 

3.3.5 Verification

Check whether the processes have started.

Check whether the web UI is accessible.
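
A sketch of these two checks, assuming jps is available and the master web UI is on Spark's default port 8080:

# wyl01 should show both a Master and a Worker process; wyl02/wyl03 only a Worker
jps
ssh wyl02 jps

# The standalone master web UI listens on port 8080 by default
curl -I http://wyl01:8080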

Run a verification job:

[hadoop@wyl01 sbin]# spark-submit   --class org.apache.spark.examples.SparkPi   --master spark://wyl01:7077   --deploy-mode cluster   --supervise   --executor-memory 512m   --total-executor-cores 2   /opt/spark/examples/jars/spark-examples_2.11-2.3.3.jar 100
Running Spark using the REST application submission protocol.
2019-05-24 17:09:40 INFO  RestSubmissionClient:54 - Submitting a request to launch an application in spark://wyl01:7077.
2019-05-24 17:09:51 WARN  RestSubmissionClient:66 - Unable to connect to server spark://wyl01:7077.
Warning: Master endpoint spark://wyl01:7077 was not a REST server. Falling back to legacy submission gateway instead.
2019-05-24 17:09:51 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

The warning that the master endpoint was not a REST server can be ignored: spark-submit simply falls back to the legacy submission gateway and the job still runs. At this point, our Spark cluster is deployed.
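
As a quick usage example (a sketch, not part of the original article), an interactive spark-shell can also be attached to the cluster instead of running in local mode:

# Connect the shell to the standalone master rather than local[*]
spark-shell --master spark://wyl01:7077

Inside the shell, sc.master should report spark://wyl01:7077, and the application appears under Running Applications in the master web UI.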
