Visit Spark's official website and read the installation instructions: Spark requires Hadoop and a Java JDK, and the site also offers a "Hadoop-free" build. This article starts with the Java JDK installation and works step by step through a standalone installation of Spark.
1. Java JDK 8 installation
https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
After downloading, place the package in a directory; /opt/java is used here.
Extract it with the command: tar -zxvf jdk-8u231-linux-x64.tar.gz
Edit the configuration file /etc/profile with the command: sudo nano /etc/profile
Add the following at the end of the file (adjust the paths to your environment):
export JAVA_HOME=/opt/java/jdk1.8.0_231
export JRE_HOME=/opt/java/jdk1.8.0_231/jre
export PATH=${JAVA_HOME}/bin:$PATH
Save and exit, then run source /etc/profile in a terminal so the configuration takes effect.
Verify the installation with java -version; output like the following means it succeeded.
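For the JDK version used here, the output should look roughly like this (the exact build string varies by release; a sketch of the expected form):
java version "1.8.0_231"
Java(TM) SE Runtime Environment (build 1.8.0_231-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.231-b11, mixed mode)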
2. Hadoop installation
Go to the official website https://hadoop.apache.org/releases.html to download Hadoop; version 2.7.7 is selected here:
http://www.apache.org/dist/hadoop/core/hadoop-2.7.7/hadoop-2.7.7.tar.gz
Hadoop requires SSH (for passwordless login, among other things), so install SSH first.
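On Ubuntu (the environment used in this article), the OpenSSH server is typically installed with:
sudo apt-get install openssh-server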
Place the downloaded package in a directory; /opt/hadoop is used here.
Extract it with the command: tar -zxvf hadoop-2.7.7.tar.gz
A pseudo-distributed installation is chosen here.
In the unpacked directory, edit etc/hadoop/hadoop-env.sh and change JAVA_HOME to this machine's JDK path, as shown below.
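With the JDK path used earlier in this article, the line in hadoop-env.sh becomes:
export JAVA_HOME=/opt/java/jdk1.8.0_231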
Configure Hadoop's environment variables.
Edit /etc/profile again, with the command: sudo nano /etc/profile
Add the following:
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.7
Modify the PATH variable to add Hadoop's bin directory (run source /etc/profile again afterwards so it takes effect):
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH
In the unpacked directory, edit etc/hadoop/core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
In the unpacked directory, edit etc/hadoop/hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
Set up passwordless SSH login.
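The key setup typically follows the Hadoop single-node guide (skip key generation if ~/.ssh/id_rsa already exists):
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys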
Verify with the command ssh localhost; if you can log in without being asked for a password, the setup succeeded.
A successful passwordless login prints the usual Ubuntu welcome banner, ending with a line such as:
Last login: Sat Nov 30 23:25:35 2019 from 127.0.0.1
Next, verify the Hadoop installation:
a. Format the file system
b. Start the NameNode and DataNode daemons
c. Visit http://localhost:50070 in a browser
The corresponding commands are sketched below.
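A minimal sketch of steps a and b, run from the Hadoop installation directory (commands as in the Hadoop 2.x single-node guide):
bin/hdfs namenode -format   # a. format HDFS
sbin/start-dfs.sh           # b. start the NameNode and DataNode daemons
With both daemons running, the NameNode web UI at http://localhost:50070 should load (step c).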
3. Scala installation
Download from: https://www.scala-lang.org/download/2.11.8.html
Once downloaded, extract it to /opt/scala.
Configure the environment variable: edit /etc/profile and add:
export SCALA_HOME=/opt/scala/scala-2.11.8
source /etc/profile
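As a quick check (a suggestion beyond the original steps; the full path is used since Scala's bin directory was not added to PATH above):
/opt/scala/scala-2.11.8/bin/scala -version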
4. Spark installation
Go to the official website to download Spark:
https://spark.apache.org/downloads.html
The following version is selected here:
spark-2.4.4-bin-hadoop2.7
Place the Spark package in a directory; /opt/spark is used here.
Extract it with the command: tar -zxvf spark-2.4.4-bin-hadoop2.7.tgz
Test the Spark installation with the command: ./bin/run-example SparkPi 10
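A successful run prints, among the log output, a line of this form (the trailing digits are elided here and vary from run to run):
Pi is roughly 3.14...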
Configure the SPARK_HOME environment variable (again in /etc/profile):
export SPARK_HOME=/opt/spark/spark-2.4.4-bin-hadoop2.7
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${SPARK_HOME}/bin:$PATH
source /etc/profile
Configure spark-env.sh.
Go into the Spark conf/ directory and copy the template:
sudo cp /opt/spark/spark-2.4.4-bin-hadoop2.7/conf/spark-env.sh.template /opt/spark/spark-2.4.4-bin-hadoop2.7/conf/spark-env.sh
Then add the following to spark-env.sh:
export JAVA_HOME=/opt/java/jdk1.8.0_231
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.7
export HADOOP_CONF_DIR=/opt/hadoop/hadoop-2.7.7/etc/hadoop
export SPARK_HOME=/opt/spark/spark-2.4.4-bin-hadoop2.7
export SCALA_HOME=/opt/scala/scala-2.11.8
export SPARK_MASTER_IP=127.0.0.1
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=8099
export SPARK_WORKER_CORES=3
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=5G
export SPARK_WORKER_WEBUI_PORT=8081
export SPARK_EXECUTOR_CORES=1
export SPARK_EXECUTOR_MEMORY=1G
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:$HADOOP_HOME/lib/native
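The SPARK_MASTER_* and SPARK_WORKER_* settings above apply when the standalone cluster scripts are used; they are not required for the local spark-shell test below. For example, from the Spark directory:
sbin/start-all.sh   # starts the master and one worker on this machine; the master web UI is then on port 8099, as set above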
Set the Java, Hadoop, and other paths according to your actual environment.
Start spark-shell from the bin directory: ./bin/spark-shell
You can see that it drops into the Scala environment, and you can start writing code.
The spark-shell web UI is available at http://127.0.0.1:4040.
That's all for this part. If you have any questions, please ask in the Linux Commune comments section below.