Building a Hadoop and Spark standalone cluster environment on Ubuntu 18.04

Spark's official website explains that Spark depends on Hadoop and a Java JDK (the site also offers a Hadoop-free build). This article starts with the Java JDK installation and works step by step through a complete standalone Spark installation.


1. Install Java JDK 8

https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html


After downloading, put the package into a directory; here we use /opt/java.

Extract it with the command: tar -zxvf jdk-8u231-linux-x64.tar.gz


Edit the configuration file /etc/profile with the command: sudo nano /etc/profile

Add the following to the end of the file (adjust the paths to your own environment):

export JAVA_HOME=/opt/java/jdk1.8.0_231
export JRE_HOME=/opt/java/jdk1.8.0_231/jre
export PATH=${JAVA_HOME}/bin:$PATH


Save and exit, then run source /etc/profile in a terminal so the configuration takes effect.

Verify the installation with java -version; output like the following indicates success.

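The exact version and build strings depend on the JDK you installed; for 8u231 the output looks roughly like this:

java version "1.8.0_231"
Java(TM) SE Runtime Environment (build 1.8.0_231-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.231-b11, mixed mode)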

2. Install Hadoop

Go to the official website https://hadoop.apache.org/releases.html and download Hadoop; here we choose version 2.7.7:

http://www.apache.org/dist/hadoop/core/hadoop-2.7.7/hadoop-2.7.7.tar.gz

Hadoop relies on SSH for passwordless login and related functions, so install SSH first.

Use the command:
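On Ubuntu 18.04 the SSH server typically comes from the openssh-server package:

sudo apt-get install openssh-server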

Put the downloaded package into a directory; here we use /opt/hadoop.


Extract it with the command: tar -zxvf hadoop-2.7.7.tar.gz

Here we choose the pseudo-distributed installation (Pseudo-Distributed).

In the unpacked directory, edit etc/hadoop/hadoop-env.sh and change JAVA_HOME to this machine's JDK path, as shown below:

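With the JDK path used earlier in this article, the line becomes:

export JAVA_HOME=/opt/java/jdk1.8.0_231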

Configure the Hadoop environment variables.

Use the command: sudo nano /etc/profile

Add the following:

export HADOOP_HOME=/opt/hadoop/hadoop-2.7.7

Modify the PATH variable to include Hadoop's bin directory:

export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH


In the unpacked directory, edit etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>


In the unpacked directory, edit etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>


Set up passwordless SSH login.

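The usual key-generation commands (as given in the Hadoop single-node setup guide) are:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys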

Use the command ssh localhost to verify; if you can log in without entering a password, the setup succeeded. A successful login looks like this:

 * Documentation:  https://help.ubuntu.com
 * Management:    https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage


 * Canonical Livepatch is available for installation.
  - Reduce system reboots and improve kernel security. Activate at:
    https://ubuntu.com/livepatch

188 packages can be updated.
0 updates are security updates.

Your Hardware Enablement Stack (HWE) is supported until April 2023.
Last login: Sat Nov 30 23:25:35 2019 from 127.0.0.1


Next, verify the Hadoop installation.

a. Format the file system

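Run the following from the Hadoop installation directory (this is the command from the Hadoop pseudo-distributed guide):

./bin/hdfs namenode -format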

b. Start the NameNode and DataNode

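Again from the Hadoop installation directory:

./sbin/start-dfs.sh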

c. Open http://localhost:50070 in a browser; the NameNode web UI should appear.


3. Install Scala

Download: https://www.scala-lang.org/download/2.11.8.html


After downloading, extract it to /opt/scala.

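Assuming the standard scala-2.11.8.tgz archive name, extraction follows the same pattern as before:

tar -zxvf scala-2.11.8.tgz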

Configure the environment variable by adding the following to /etc/profile:

export SCALA_HOME=/opt/scala/scala-2.11.8


source /etc/profile

4. Install Spark

Go to the official Spark website to download Spark:

https://spark.apache.org/downloads.html

Here we select the following version:

spark-2.4.4-bin-hadoop2.7

Put the Spark package into a directory; here we use /opt/spark.

Extract it with the command: tar -zxvf spark-2.4.4-bin-hadoop2.7.tgz


Test the Spark installation with the command: ./bin/run-example SparkPi 10
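If everything is in place, the example's output includes a line like the following (the exact digits vary from run to run):

Pi is roughly 3.140575140575141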

Configure the SPARK_HOME environment variable:

export SPARK_HOME=/opt/spark/spark-2.4.4-bin-hadoop2.7
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${SPARK_HOME}/bin:$PATH


source /etc/profile

Configure spark-env.sh.

Go into the Spark conf/ directory and copy the template file:

sudo cp /opt/spark/spark-2.4.4-bin-hadoop2.7/conf/spark-env.sh.template /opt/spark/spark-2.4.4-bin-hadoop2.7/conf/spark-env.sh

Then add the following to spark-env.sh:

export JAVA_HOME=/opt/java/jdk1.8.0_231
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.7
export HADOOP_CONF_DIR=/opt/hadoop/hadoop-2.7.7/etc/hadoop
export SPARK_HOME=/opt/spark/spark-2.4.4-bin-hadoop2.7
export SCALA_HOME=/opt/scala/scala-2.11.8
export SPARK_MASTER_IP=127.0.0.1
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=8099
export SPARK_WORKER_CORES=3
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=5G
export SPARK_WORKER_WEBUI_PORT=8081
export SPARK_EXECUTOR_CORES=1
export SPARK_EXECUTOR_MEMORY=1G
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$HADOOP_HOME/lib/native


Set the specific paths for Java, Hadoop, and the rest according to your own environment.

Start spark-shell from the bin directory.

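From the Spark directory unpacked above, this is simply:

./bin/spark-shell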

You can see that we are now in the Scala environment and can start writing code.
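As a quick sanity check, a minimal computation at the scala> prompt (the res numbering will match your own session) looks like:

scala> sc.parallelize(1 to 100).sum()
res0: Double = 5050.0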

The spark-shell web interface is at http://127.0.0.1:4040.


That's all for now; if you have any questions, please ask in the comments section below.


Source: https://www.linuxidc.com/Linux/2019-12/161628.htm