Spark Learning Road (5): Spark pseudo-distributed installation

1. Installation of JDK

Install the JDK as the root user.

1.1 Upload the installation package and extract it

[root@hadoop1 soft]# tar -zxvf jdk-8u73-linux-x64.tar.gz -C /usr/local/

1.2 Configure environment variables

[root@hadoop1 soft]# vi /etc/profile
#JAVA
export JAVA_HOME=/usr/local/jdk1.8.0_73
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib 
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin 

1.3 Verify the Java version

[root@hadoop1 soft]# source /etc/profile
[root@hadoop1 soft]# java -version
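
If `java -version` fails, the usual cause is a JAVA_HOME that does not point at the unpacked JDK. A minimal sketch of a sanity check follows; `check_jdk_home` is a hypothetical helper name introduced here for illustration, not something shipped with the JDK.

```shell
# Sketch: confirm that a directory looks like a usable JDK home.
# check_jdk_home is a hypothetical helper, not part of the JDK.
check_jdk_home() {
    jdk_dir="$1"
    if [ -z "$jdk_dir" ]; then
        echo "JAVA_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$jdk_dir/bin/java" ]; then
        echo "no executable found at $jdk_dir/bin/java" >&2
        return 1
    fi
    echo "JDK found at $jdk_dir"
}

# Typical use after reloading /etc/profile:
#   check_jdk_home "$JAVA_HOME" && java -version
```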

2. Configure passwordless SSH to localhost

Perform these steps as the hadoop user.

2.1 Detection

Under normal circumstances, a machine must enter a password even when it connects to itself over ssh.

2.2 Generate private key and public key pair

[hadoop@hadoop1 ~]$ ssh-keygen -t rsa

2.3 Add the public key to authorized_keys

[hadoop@hadoop1 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

2.4 Give the authorized_keys file 600 permissions

[hadoop@hadoop1 ~]$ chmod 600 ~/.ssh/authorized_keys 

2.5 Modify the Linux hosts mapping file (as the root user)

Add a line that maps the machine's IP address to the hostname hadoop1.

[root@hadoop1 ~]# vi /etc/hosts
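
The original does not show the hosts entry itself. A mapping line has the form "address, then hostname". The sketch below checks whether a hosts file already maps a given name; `has_host_mapping` is a hypothetical helper, and the address 192.168.1.100 in the comments is only a placeholder.

```shell
# Sketch: check whether a hosts file maps a hostname.
# has_host_mapping is a hypothetical helper name.
has_host_mapping() {
    hosts_file="$1"
    host_name="$2"
    awk -v h="$host_name" '
        /^[[:space:]]*#/ { next }             # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break           # stop at a trailing comment
                if ($i == h) found = 1         # name appears after the address
            }
        }
        END { exit found ? 0 : 1 }
    ' "$hosts_file"
}

# Example /etc/hosts line (placeholder address; use the machine's real IP):
#   192.168.1.100   hadoop1
# Check it with:
#   has_host_mapping /etc/hosts hadoop1 && echo "hadoop1 is mapped"
```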

2.6 Verification

[hadoop@hadoop1 ~]$ ssh hadoop1

If no password prompt appears, passwordless login has been set up successfully.
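
If ssh still asks for a password, file permissions are a common culprit: with its default StrictModes setting, sshd ignores an authorized_keys file whose directory or file is writable by other users. The sketch below checks for the conventional 700/600 modes. `check_ssh_perms` is a hypothetical helper, and `stat -c` assumes GNU coreutils, as found on CentOS.

```shell
# Sketch: verify the conventional strict modes on an .ssh directory.
# check_ssh_perms is a hypothetical helper; stat -c is GNU-specific.
check_ssh_perms() {
    ssh_dir="$1"
    dir_mode=$(stat -c '%a' "$ssh_dir") || return 1
    key_mode=$(stat -c '%a' "$ssh_dir/authorized_keys") || return 1
    if [ "$dir_mode" != "700" ]; then
        echo "$ssh_dir has mode $dir_mode, expected 700" >&2
        return 1
    fi
    if [ "$key_mode" != "600" ]; then
        echo "$ssh_dir/authorized_keys has mode $key_mode, expected 600" >&2
        return 1
    fi
    echo "permissions look fine"
}

# Typical use:
#   check_ssh_perms ~/.ssh
```

For a non-interactive check of the login itself, `ssh -o BatchMode=yes hadoop1 true` fails instead of prompting when key authentication does not work.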

3. Install Hadoop-2.7.5

Perform these steps as the hadoop user.

3.1 Upload and decompress

[hadoop@hadoop1 ~]$ tar -zxvf hadoop-2.7.5-centos-6.7.tar.gz -C apps/

3.2 Create a soft link to the unpacked directory

Create a soft link named hadoop that points to the unpacked hadoop-2.7.5 directory

[hadoop@hadoop1 ~]$ cd apps/
[hadoop@hadoop1 apps]$ ll
total 4
drwxr-xr-x. 9 hadoop hadoop 4096 Dec 24 13:43 hadoop-2.7.5
[hadoop@hadoop1 apps]$ ln -s hadoop-2.7.5/ hadoop

3.3 Modify the configuration file

Change to the /home/hadoop/apps/hadoop/etc/hadoop/ directory and edit the following configuration files

(1) Modify hadoop-env.sh

[hadoop@hadoop1 hadoop]$ vi hadoop-env.sh 
export JAVA_HOME=/usr/local/jdk1.8.0_73 

(2) Modify core-site.xml

[hadoop@hadoop1 hadoop]$ vi core-site.xml 
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://hadoop1:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/hadoop/data/hadoopdata</value>
        </property>
</configuration>

(3) Modify hdfs-site.xml

[hadoop@hadoop1 hadoop]$ vi hdfs-site.xml 

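dfs.replication sets the number of copies HDFS keeps of each block; a single machine can hold only 1 copy. Add the following properties inside the <configuration> element.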

        <property>
                <name>dfs.namenode.name.dir</name>
                <value>/home/hadoop/data/hadoopdata/name</value>
                <description>In order to ensure the security of metadata, multiple different directories are generally configured</description>
        </property>

        <property>
                <name>dfs.datanode.data.dir</name>
                <value>/home/hadoop/data/hadoopdata/data</value>
                <description>Datanode's data storage directory</description>
        </property>

        <property>
                <name>dfs.replication</name>
                <value>1</value>
                <description>Number of replica storage of data blocks in HDFS, the default is 3</description>
        </property>    

(4) Modify mapred-site.xml

[hadoop@hadoop1 hadoop]$ cp mapred-site.xml.template mapred-site.xml
[hadoop@hadoop1 hadoop]$ vi mapred-site.xml

mapreduce.framework.name: sets the MapReduce framework to yarn mode. Second-generation Hadoop MapReduce also runs on top of the resource management system YARN.

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>

(5) Modify yarn-site.xml

[hadoop@hadoop1 hadoop]$ vi yarn-site.xml
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
                <description>Shuffle service provided by YARN cluster for MapReduce program</description>
        </property>
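
After editing several XML files by hand, it can help to read a value back before starting anything. The following is a minimal sketch that pulls one property out of a Hadoop-style XML file with awk. `get_hadoop_prop` is a hypothetical helper, and it assumes the layout used above: one tag per line, with <name> before <value>. Once Hadoop is on the PATH, `hdfs getconf -confKey fs.defaultFS` reports the effective value as well.

```shell
# Sketch: read one property value from a Hadoop *-site.xml file.
# get_hadoop_prop is a hypothetical helper. It assumes one tag per line
# and <name> preceding <value>, as in the files shown in this article.
get_hadoop_prop() {
    xml_file="$1"
    prop_name="$2"
    awk -v want="$prop_name" '
        /<name>/ {
            n = $0
            sub(/.*<name>/, "", n)
            sub(/<\/name>.*/, "", n)
        }
        /<value>/ {
            v = $0
            sub(/.*<value>/, "", v)
            sub(/<\/value>.*/, "", v)
            if (n == want) { print v; exit }
        }
    ' "$xml_file"
}

# Typical use from the etc/hadoop directory:
#   get_hadoop_prop core-site.xml fs.defaultFS
#   get_hadoop_prop hdfs-site.xml dfs.replication
```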

3.4 Configure environment variables

Note:

1. If you install as the root user, set the variables system-wide in /etc/profile.

2. If you install as a normal user, set the variables per user in ~/.bashrc.

[hadoop@hadoop1 ~]$ vi .bashrc
#HADOOP_HOME
export HADOOP_HOME=/home/hadoop/apps/hadoop-2.7.5
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Make environment variables take effect

[hadoop@hadoop1 bin]$ source ~/.bashrc 
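
To confirm that the new directories really ended up on the PATH, a small check can be used. This is a sketch only; `path_contains` is a hypothetical helper name.

```shell
# Sketch: test whether a directory is one of the PATH entries.
# path_contains is a hypothetical helper.
path_contains() {
    # Wrap both sides in colons so only whole entries match.
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Typical use after sourcing ~/.bashrc:
#   path_contains "$HADOOP_HOME/bin"  || echo "hadoop bin directory is missing from PATH"
#   path_contains "$HADOOP_HOME/sbin" || echo "hadoop sbin directory is missing from PATH"
```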

3.5 View hadoop version

[hadoop@hadoop1 ~]$ hadoop version

3.6 Create the data directories

The directory paths must match the paths configured in hdfs-site.xml

[hadoop@hadoop1 ~]$ mkdir -p /home/hadoop/data/hadoopdata/name
[hadoop@hadoop1 ~]$ mkdir -p /home/hadoop/data/hadoopdata/data

3.7 Hadoop initialization

[hadoop@hadoop1 ~]$ hdfs namenode -format
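
Formatting should be done only once. Formatting again generates a new cluster ID, and a DataNode that still holds data from the earlier format will then refuse to start. A formatted NameNode directory contains a current/VERSION file, so a guard can be sketched as follows; `namenode_is_formatted` is a hypothetical helper name.

```shell
# Sketch: detect whether a NameNode directory has already been formatted.
# namenode_is_formatted is a hypothetical helper.
namenode_is_formatted() {
    name_dir="$1"
    # Formatting writes current/VERSION under dfs.namenode.name.dir.
    [ -f "$name_dir/current/VERSION" ]
}

# Typical use (hadoop namenode -format is the older spelling of the command):
#   if namenode_is_formatted /home/hadoop/data/hadoopdata/name; then
#       echo "NameNode already formatted, skipping"
#   else
#       hdfs namenode -format
#   fi
```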

3.8 Start HDFS and YARN

[hadoop@hadoop1 ~]$ start-dfs.sh
[hadoop@hadoop1 ~]$ start-yarn.sh
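
After both scripts finish, `jps` should list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. The sketch below reads `jps` output on standard input and prints whichever expected daemons are absent; `missing_daemons` is a hypothetical helper name.

```shell
# Sketch: report expected daemons that do not appear in jps output.
# missing_daemons is a hypothetical helper.
# stdin: jps output ("<pid> <ClassName>" per line); args: expected names.
missing_daemons() {
    jps_output=$(cat)
    missing=""
    for name in "$@"; do
        # Compare the whole second column, so "NameNode" does not
        # accidentally match "SecondaryNameNode".
        if ! printf '%s\n' "$jps_output" |
            awk -v n="$name" '$2 == n { f = 1 } END { exit f ? 0 : 1 }'; then
            missing="$missing $name"
        fi
    done
    echo "${missing# }"
}

# Typical use:
#   jps | missing_daemons NameNode DataNode SecondaryNameNode ResourceManager NodeManager
```

An empty line of output means every listed daemon is running. If one is missing, its log file under the Hadoop logs directory is the first place to look.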

3.9 Checking the WebUI

Open port 50070 in a browser: http://hadoop1:50070

Other ports:
port 8088: YARN ResourceManager (cluster and all applications)
port 50070: Hadoop NameNode
port 50090: Secondary NameNode
port 50075: DataNode

4. Installation of Scala (optional)

Install as the root user.

4.1 Download

Scala download address: http://www.scala-lang.org/download/all.html

Select the version that matches your platform. This guide installs on Linux and uses scala-2.11.8.tgz

4.2 Upload and decompress

[root@hadoop1 hadoop]# tar -zxvf scala-2.11.8.tgz -C /usr/local/

4.3 Configure environment variables

[root@hadoop1 hadoop]# vi /etc/profile
#Scala
export SCALA_HOME=/usr/local/scala-2.11.8
export PATH=$SCALA_HOME/bin:$PATH

Save the file and make the changes take effect immediately

[root@hadoop1 scala-2.11.8]# source /etc/profile

4.4 Verify that the installation was successful

[root@hadoop1 ~]# scala -version

5. Installation of Spark

5.1 Download the installation package

Download links:

http://spark.apache.org/downloads.html

http://mirrors.hust.edu.cn/apache/

https://mirrors.tuna.tsinghua.edu.cn/apache/

5.2 Upload and decompress

[hadoop@hadoop1 ~]$ tar -zxvf spark-2.3.0-bin-hadoop2.7.tgz -C apps/

5.3 Create a soft link for the decompressed package

[hadoop@hadoop1 ~]$ cd apps/
[hadoop@hadoop1 apps]$ ls
hadoop  hadoop-2.7.5  spark-2.3.0-bin-hadoop2.7
[hadoop@hadoop1 apps]$ ln -s spark-2.3.0-bin-hadoop2.7/ spark

5.4 Enter spark/conf to modify the configuration file

[hadoop@hadoop1 apps]$ cd spark/conf/

Copy spark-env.sh.template to spark-env.sh, then add the following configuration at the end of the file

[hadoop@hadoop1 conf]$ cp spark-env.sh.template spark-env.sh
[hadoop@hadoop1 conf]$ vi spark-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_73
export SCALA_HOME=/usr/local/scala-2.11.8
export HADOOP_HOME=/home/hadoop/apps/hadoop-2.7.5
export HADOOP_CONF_DIR=/home/hadoop/apps/hadoop-2.7.5/etc/hadoop
export SPARK_MASTER_HOST=hadoop1
export SPARK_MASTER_PORT=7077

5.5 Configuring environment variables

[hadoop@hadoop1 conf]$ vi ~/.bashrc 
#SPARK_HOME
export SPARK_HOME=/home/hadoop/apps/spark
export PATH=$PATH:$SPARK_HOME/bin

Save the file and make the changes take effect immediately

[hadoop@hadoop1 conf]$ source ~/.bashrc

5.6 Start Spark

[hadoop@hadoop1 ~]$  ~/apps/spark/sbin/start-all.sh 

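5.7 View the processes

Run jps to list the running Java processes. In addition to the Hadoop daemons, a Master process and a Worker process should now appear.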

5.8 Viewing the web interface

http://hadoop1:8080/
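
Beyond opening the web page, submitting the bundled SparkPi example is a quick end-to-end check. The sketch below assumes the master URL spark://hadoop1:7077 that follows from the spark-env.sh settings above, and the examples jar at its usual location in the spark-2.3.0-bin-hadoop2.7 package. `run_sparkpi` is a hypothetical wrapper name.

```shell
# Sketch: submit the bundled SparkPi example to the standalone master.
# run_sparkpi is a hypothetical wrapper. The jar path assumes the standard
# layout of the spark-2.3.0-bin-hadoop2.7 package under $SPARK_HOME.
run_sparkpi() {
    if ! command -v spark-submit >/dev/null 2>&1; then
        echo "spark-submit not found on PATH; check SPARK_HOME/bin" >&2
        return 127
    fi
    spark-submit \
        --class org.apache.spark.examples.SparkPi \
        --master spark://hadoop1:7077 \
        "$SPARK_HOME/examples/jars/spark-examples_2.11-2.3.0.jar" 10
}

# Typical use:
#   run_sparkpi
```

On success the driver output contains a line beginning with "Pi is roughly", and the finished application appears under Completed Applications in the web interface.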

 
