Cluster Series (3): Building a Complete Spark Cluster

    The ZooKeeper and Hadoop clusters have already been built in the previous parts. Let's go a step further and set up the Spark cluster. By comparison, building a Spark cluster is much simpler; the key prerequisite is the Hadoop cluster that is already in place, because Spark relies on the distributed file system (HDFS) that Hadoop provides, so this part builds directly on the previous Hadoop setup. Well, let's set sail!

    1. Environment: CentOS 7 virtual machines with the complete environment from the previous parts. Please confirm that the JDK, Hadoop, and Spark installation packages are in place. The nodes are still the same machines as before (including the two cloned last time); set up the environment on one of them first.

    2. Spark configuration (extracting the installation package is not covered here)

    Before configuring, let's clarify two key terms: Master and Worker. In Spark's standalone mode, the Master is the process that manages the cluster's resources and schedules applications, while a Worker runs on each slave node and launches the executors that do the actual work.

    (1) Configure environment variables

vim /etc/profile

    Modify as follows:

JAVA_HOME=/usr/java/jdk1.8.0_161
JRE_HOME=/usr/java/jdk1.8.0_161/jre
SCALA_HOME=/usr/local/scala
HADOOP_HOME=/usr/local/hadoop
SPARK_HOME=/usr/local/spark
ZOOKEEPER_HOME=/usr/local/zookeeper
KAFKA_HOME=/usr/local/kafka
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$SCALA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$ZOOKEEPER_HOME/bin:$KAFKA_HOME/bin
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export JAVA_HOME JRE_HOME SCALA_HOME HADOOP_HOME SPARK_HOME ZOOKEEPER_HOME KAFKA_HOME PATH CLASSPATH

    After the modification is complete, remember to run source /etc/profile to make it take effect, then copy the file to the other two servers and do the same there.
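
    A minimal sketch of those two steps, assuming the other nodes are reachable by the hostnames slave02 and slave03 used later in this article:

# Apply the new variables in the current shell
source /etc/profile
# Copy the profile to the other two nodes (hostnames assumed), then source it on each of them
scp /etc/profile root@slave02:/etc/profile
scp /etc/profile root@slave03:/etc/profile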

    (2) Configure the files in the conf directory

    First configure the spark-env.sh file. In Spark's conf directory, make a copy of the template and rename it:

cp spark-env.sh.template spark-env.sh

    Edit the file and add the following configuration (adjust the paths to your own environment):

#!/usr/bin/env bash
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
export JAVA_HOME=/usr/java/jdk1.8.0_161
export SCALA_HOME=/usr/local/scala
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export SPARK_HOME=/usr/local/spark
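
    These are the settings used here; depending on your version and needs, you may also want to pin the Master host and the Worker resources explicitly. A hedged sketch that is not part of the original configuration, assuming slave01 is the Master:

# Assumption: slave01 is the Master node; Spark 2.x reads SPARK_MASTER_HOST
# (older 1.x releases use SPARK_MASTER_IP instead)
export SPARK_MASTER_HOST=slave01
# Optional: cap each Worker's resources instead of using the defaults
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=1g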

    Then configure the slaves file; again, make a copy of the template and rename it:

cp slaves.template slaves

    Edit the file and add only the hostnames of the DataNode (worker) nodes:

slave02
slave03
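
    Since the environment is set up on one node first, the configured Spark directory also needs to reach the other two machines. A minimal sketch, assuming the paths and hostnames used above:

# Distribute the configured Spark installation to the other nodes
# (assumes the hadoop user can write to /usr/local; otherwise copy as root)
scp -r /usr/local/spark hadoop@slave02:/usr/local/
scp -r /usr/local/spark hadoop@slave03:/usr/local/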

    (3) Start and test the Spark cluster

    Because Spark relies on the distributed file system provided by Hadoop, make sure Hadoop is running properly before starting Spark. The Hadoop cluster has been successfully built before, so you can start it directly here:

# in Hadoop's sbin directory
./start-all.sh
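
    On newer Hadoop releases start-all.sh is deprecated; if yours prints a deprecation notice, HDFS and YARN can be started separately from the same sbin directory:

# Equivalent to start-all.sh on Hadoop 2.x
./start-dfs.sh
./start-yarn.sh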

    After startup, run jps to check that everything started normally (for reference, see https://my.oschina.net/u/3747963/blog/1636026).

    Next start Spark:

# in Spark's sbin directory
./start-all.sh
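
    Since both Hadoop and Spark ship a start-all.sh, Spark's sbin directory also provides scripts to bring up the Master and Workers separately; a hedged sketch (newer Spark releases rename start-slaves.sh to start-workers.sh):

# Start the Master on this node, then the Workers listed in conf/slaves
./start-master.sh
./start-slaves.sh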

    After startup, run jps to check whether everything started normally; on slave01 (the Master) it looks like this:

[hadoop@slave01 sbin]$ jps
42657 Master
42004 SecondaryNameNode
42741 Jps
42182 ResourceManager
41768 NameNode

    Running jps on slave02 and slave03 shows the following:

[hadoop@slave02 conf]$ jps
15685 Worker
15238 DataNode
15756 Jps
15388 NodeManager

    As you can see from the above, Spark has started successfully. In a browser, visit the Master machine, slave01, at http://slave01:8080 to view the cluster's web UI.
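
    As a quick end-to-end check that is not in the original article, you can submit the bundled SparkPi example to the standalone Master; this assumes the default Master port 7077 and the Spark 2.x examples jar location:

# Submit the built-in SparkPi example to the standalone cluster
# (the jar path varies by Spark version; adjust as needed)
cd /usr/local/spark
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://slave01:7077 \
  examples/jars/spark-examples_*.jar 100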

    Well, with that, all three big data clusters have been built. If you have any questions, feel free to discuss them.
