50. Spark component deployment (MINI version)

Previous article:

Hadoop HA deployment (MINI version): https://blog.csdn.net/m0_54925305/article/details/121566611?spm=1001.2014.3001.5501

Environment preparation:

No.    Hostname     Role           User    Password
1      master1-1    master node    root    passwd
2      slave1-1     slave node     root    passwd
3      slave1-2     slave node     root    passwd

scala-2.11.8.tgz

spark-2.0.0-bin-hadoop2.7.tgz

        Note: The extraction code is: 0000

Environment deployment:

1. Hadoop must be installed in advance; check that the Hadoop environment is available, take a screenshot and save the result

        1. Use the jps command to view the cluster status
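
        Run jps on each of the three machines; the process names below are only typical for a Hadoop HA layout and the exact list depends on how the HA roles were assigned during the earlier Hadoop deployment:

jps

        Note: master1-1 usually shows processes such as NameNode, DFSZKFailoverController, ResourceManager, JournalNode and QuorumPeerMain, while slave1-1 and slave1-2 usually show DataNode, NodeManager and JournalNode.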

2. Unzip the scala installation package to the "/usr/local/src" path, rename it to scala, take a screenshot and save the result

        1. Enter the /h3cu/ directory to find the compressed package

cd /h3cu/

        2. Unzip scala 

tar -zxvf scala-2.11.8.tgz -C /usr/local/src

        3. Rename scala

cd /usr/local/src
mv scala-2.11.8 scala

3. Set the scala environment variable and make the environment variable take effect only for the current user, take a screenshot and save the result

        1. Add scala environment variable

vi /root/.bashrc
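
        A minimal sketch of the lines to append, assuming scala was extracted to /usr/local/src/scala as above:

# Scala environment variables, written to /root/.bashrc so they take effect only for the current (root) user
export SCALA_HOME=/usr/local/src/scala
export PATH=$PATH:$SCALA_HOME/bin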

         2. Make the environment variable take effect immediately

source /root/.bashrc

4. Enter scala, take a screenshot and save the result

        1. Enter the command scala to enter the scala interface
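
scala

        Note: the scala> prompt confirms that the scala interface has started.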

5. Unzip the Spark installation package to the "/usr/local/src" path, rename it to spark, take a screenshot and save the result

        1. Exit the scala interface

Press Ctrl + C to exit the scala interface (typing :quit also works)

        2. Enter the /h3cu/ directory to find Spark

cd /h3cu/

        3. Unzip Spark 

tar -zxvf spark-2.0.0-bin-hadoop2.7.tgz -C /usr/local/src/

        4. Rename Spark

cd /usr/local/src
mv spark-2.0.0-bin-hadoop2.7 spark

6. Set the Spark environment variable and make the environment variable take effect only for the current user, take a screenshot and save the result

        1. Add Spark environment variables

vi /root/.bashrc
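
        A minimal sketch of the lines to append, assuming spark was renamed to /usr/local/src/spark as above:

# Spark environment variables, written to /root/.bashrc so they take effect only for the current (root) user
export SPARK_HOME=/usr/local/src/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin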

        2. Make the environment variable take effect immediately 

source /root/.bashrc

7. Modify the Spark configuration to specify the Spark slave nodes, take a screenshot and save the result

        1. Enter the /usr/local/src/spark/conf directory

cd /usr/local/src/spark/conf

        2. Create a new slaves file and write its contents

vi slaves

        Note: This file lists one worker hostname per line and must not contain stray spaces or other extraneous characters; follow the format strictly (see the example below)
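
        A minimal example, assuming the workers run only on the two slave nodes (add master1-1 as well if it should also run a worker):

slave1-1
slave1-2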

        3. Create a new spark-env.sh file and write its contents

vi spark-env.sh
export JAVA_HOME=/usr/local/src/jdk1.8.0_221
export HADOOP_HOME=/usr/local/hadoop
export SCALA_HOME=/usr/local/src/scala
export SPARK_MASTER_IP=master1-1
export SPARK_MASTER_PORT=7077
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SPARK_YARN_USER_ENV="CLASSPATH=/usr/local/hadoop/etc/hadoop"
export YARN_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master1-1:2181,slave1-1:2181,slave1-2:2181 -Dspark.deploy.zookeeper.dir=/spark"

        Note: the three key parameters mean the following: SPARK_DIST_CLASSPATH links Spark to the Hadoop installation, HADOOP_CONF_DIR points to the directory holding Hadoop's configuration files, and SPARK_MASTER_IP is the IP address or hostname of the master node in the cluster
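
        To see what SPARK_DIST_CLASSPATH will contain, the command inside the $( ) substitution can be run on its own:

/usr/local/hadoop/bin/hadoop classpath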

        4. Cluster distribution

scp -r /usr/local/src/spark slave1-1:/usr/local/src/
scp -r /usr/local/src/spark slave1-2:/usr/local/src/
scp -r /root/.bashrc slave1-1:/root/.bashrc
scp -r /root/.bashrc slave1-2:/root/.bashrc

        5. Make sure the environment variables have taken effect on all machines

source /root/.bashrc

        Note: This must be executed on all three machines
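
        A quick way to verify on any of the nodes (the expected versions follow from the packages installed above):

scala -version
spark-submit --version

        Note: the first should report Scala 2.11.8 and the second Spark 2.0.0.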

8. Start Spark and view the web UI, take a screenshot and save the result

        1. Enter the spark installation directory to start spark

cd /usr/local/src/spark
sbin/start-all.sh

         Note: Make sure zookeeper has been started normally
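
        If unsure, the ZooKeeper status can be checked on each of the three machines (a sketch that assumes the zkServer.sh script from the ZooKeeper installation is on the PATH):

zkServer.sh status

        Note: one node should report Mode: leader and the other two Mode: follower.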

        2. Enter master1-1:8080 in the browser to view the web UI

        3. Start the master on the slave side 

cd /usr/local/src/spark
sbin/start-master.sh

        Note: Observation shows that the master on the master node is in the active state while the master started on the slave node is in the standby state, which means the HA cluster is running successfully
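
        The same check can be made from the command line (a sketch that assumes curl is available, the standby master was started on slave1-1, and both masters use the default web UI port 8080):

curl -s http://master1-1:8080 | grep -i status
curl -s http://slave1-1:8080 | grep -i status

        Note: the first should show ALIVE and the second STANDBY.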

Spark component deployment (MINI version) completed


What can't beat you will make you stronger!
