Spark learning road: installing pseudo-distributed Spark [repost]

JDK installation

Install the JDK as the root user

Upload and extract the package

[root@hadoop1 soft]# tar -zxvf jdk-8u73-linux-x64.tar.gz -C /usr/local/

Configure environment variables

[root@hadoop1 soft]# vi /etc/profile
#JAVA
export JAVA_HOME=/usr/local/jdk1.8.0_73
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib 
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin 

Verify Java version

[root@hadoop1 soft]# java -version
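
If the installation succeeded, the output should report the installed version, for example (the full output also lists the runtime and VM builds):

java version "1.8.0_73"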

Configure passwordless ssh to localhost

Check the current behavior

By default, connecting to this machine over ssh still requires entering a password.
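
For example, before any keys are configured, an ssh attempt to the local machine prompts for a password (output illustrative):

[hadoop@hadoop1 ~]$ ssh localhost
hadoop@localhost's password: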

Generate a public/private key pair

[hadoop@hadoop1 ~]$ ssh-keygen -t rsa

Append the public key to authorized_keys

[hadoop@hadoop1 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Set the permissions of authorized_keys to 600

[hadoop@hadoop1 ~]$ chmod 600 ~/.ssh/authorized_keys 

Modify the Linux hosts mapping file (as the root user)

[root@hadoop1 ~]# vi /etc/hosts
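
Add a line mapping the hostname to this machine's IP address; the address below is only a placeholder and should be replaced with the real one:

192.168.1.101   hadoop1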

Verify

[hadoop@hadoop1 ~]$ ssh hadoop1

This time no password should be required, which means passwordless login is configured successfully.

Hadoop installation

Install as the hadoop user

Upload and extract the package

[hadoop@hadoop1 ~]$ tar -zxvf hadoop-2.7.5-centos-6.7.tar.gz -C apps/

Create a soft link for the installation package

Create a soft link pointing to the extracted hadoop directory

[hadoop@hadoop1 ~]$ cd apps/
[hadoop@hadoop1 apps]$ ll
total 4
drwxr-xr-x. 9 hadoop hadoop 4096 Dec 24 13:43 hadoop-2.7.5
[hadoop@hadoop1 apps]$ ln -s hadoop-2.7.5/ hadoop
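
Listing the directory again should show the soft link next to the extracted package (output illustrative):

[hadoop@hadoop1 apps]$ ll
total 4
lrwxrwxrwx. 1 hadoop hadoop   13 Dec 24 13:45 hadoop -> hadoop-2.7.5/
drwxr-xr-x. 9 hadoop hadoop 4096 Dec 24 13:43 hadoop-2.7.5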

Modify the configuration files

Go into the /home/hadoop/apps/hadoop/etc/hadoop/ directory and modify the configuration files

(1) Modify hadoop-env.sh

[hadoop@hadoop1 hadoop]$ vi hadoop-env.sh 
export JAVA_HOME=/usr/local/jdk1.8.0_73 

(2) Modify core-site.xml

[hadoop@hadoop1 hadoop]$ vi core-site.xml 
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://hadoop1:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/hadoop/data/hadoopdata</value>
        </property>
</configuration>

(3) Modify hdfs-site.xml

[hadoop@hadoop1 hadoop]$ vi hdfs-site.xml 

dfs.replication sets the number of HDFS block replicas; for a single machine, one copy is enough.

        <property>
                <name>dfs.namenode.name.dir</name>
                <value>/home/hadoop/data/hadoopdata/name</value>
                <description>To keep the metadata safe, several different directories are usually configured</description>
        </property>

        <property>
                <name>dfs.datanode.data.dir</name>
                <value>/home/hadoop/data/hadoopdata/data</value>
                <description>Data storage directory of the datanode</description>
        </property>

        <property>
                <name>dfs.replication</name>
                <value>2</value>
                <description>Number of replicas of each HDFS data block; the default is 3</description>
        </property>

(4) Modify mapred-site.xml

[hadoop@hadoop1 hadoop]$ cp mapred-site.xml.template mapred-site.xml
[hadoop@hadoop1 hadoop]$ vi mapred-site.xml

mapreduce.framework.name specifies that the MapReduce framework runs on YARN; second-generation Hadoop MapReduce also runs on top of the YARN resource management system.

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>

(5) Modify yarn-site.xml

[hadoop@hadoop1 hadoop]$ vi yarn-site.xml 
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
                <description>The shuffle service that the YARN cluster provides for MapReduce programs</description>
        </property>

Configure environment variables

Note:

1. If you installed as the root user, edit /etc/profile (system-wide variables).

2. If you installed as an ordinary user, edit ~/.bashrc (user variables).

#HADOOP_HOME
export HADOOP_HOME=/home/hadoop/apps/hadoop-2.7.5
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Make the environment variables take effect

[hadoop@hadoop1 bin]$ source ~/.bashrc 

View hadoop version

[hadoop@hadoop1 ~]$ hadoop version
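
The first line of the output should report the release, for example:

Hadoop 2.7.5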

Create the data directories

The paths are the directory paths configured in hdfs-site.xml

[hadoop@hadoop1 ~]$ mkdir -p /home/hadoop/data/hadoopdata/name
[hadoop@hadoop1 ~]$ mkdir -p /home/hadoop/data/hadoopdata/data

Hadoop initialization

[hadoop@hadoop1 ~]$ hadoop namenode -format
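
If formatting succeeds, the log output should contain a line similar to the following (the path is the name directory configured in hdfs-site.xml):

... INFO common.Storage: Storage directory /home/hadoop/data/hadoopdata/name has been successfully formatted.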

Start HDFS and YARN

[hadoop@hadoop1 ~]$ start-dfs.sh
[hadoop@hadoop1 ~]$ start-yarn.sh
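
After both scripts finish, jps should list the Hadoop daemons, for example (process IDs will differ):

[hadoop@hadoop1 ~]$ jps
2481 NameNode
2602 DataNode
2785 SecondaryNameNode
2937 ResourceManager
3044 NodeManager
3321 Jps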

Check the WebUI

Open port 50070 in a browser: http://hadoop1:50070

Other port descriptions:
port 8088: YARN ResourceManager (cluster and all applications)
port 50070: Hadoop NameNode
port 50090: Secondary NameNode
port 50075: DataNode

Scala installation (optional)

Install as the root user

Download

Scala download page: http://www.scala-lang.org/download/all.html

Choose the matching version; here the installation is on Linux, so the chosen version is scala-2.11.8.tgz

Upload and extract

[root@hadoop1 hadoop]# tar -zxvf scala-2.11.8.tgz -C /usr/local/

Configure environment variables

[root@hadoop1 hadoop]# vi /etc/profile
#Scala
export SCALA_HOME=/usr/local/scala-2.11.8
export PATH=$SCALA_HOME/bin:$PATH

Save and make it take effect immediately

[root@hadoop1 scala-2.11.8]# source /etc/profile

Verify the installation

[root@hadoop1 ~]# scala -version
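
The expected output is similar to:

Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL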

Spark installation

Download the installation package

Download addresses:

http://spark.apache.org/downloads.html

http://mirrors.hust.edu.cn/apache/

https://mirrors.tuna.tsinghua.edu.cn/apache/

Upload and extract

[hadoop@hadoop1 ~]$ tar -zxvf spark-2.3.0-bin-hadoop2.7.tgz -C apps/

Create a soft link for the extracted package

[hadoop@hadoop1 ~]$ cd apps/
[hadoop@hadoop1 apps]$ ls
hadoop  hadoop-2.7.5  spark-2.3.0-bin-hadoop2.7
[hadoop@hadoop1 apps]$ ln -s spark-2.3.0-bin-hadoop2.7/ spark

Go into spark/conf and modify the configuration files

[hadoop@hadoop1 apps]$ cd spark/conf/

Copy spark-env.sh.template to spark-env.sh, and add the following configuration at the end of the file

[hadoop@hadoop1 conf]$ cp spark-env.sh.template spark-env.sh
[hadoop@hadoop1 conf]$ vi spark-env.sh
export JAVA_HOME=/usr/local/jdk1.8.0_73
export SCALA_HOME=/usr/local/scala-2.11.8
export HADOOP_HOME=/home/hadoop/apps/hadoop-2.7.5
export HADOOP_CONF_DIR=/home/hadoop/apps/hadoop-2.7.5/etc/hadoop
export SPARK_MASTER_IP=hadoop1
export SPARK_MASTER_PORT=7077

Configure environment variables

[hadoop@hadoop1 conf]$ vi ~/.bashrc 
#SPARK_HOME
export SPARK_HOME=/home/hadoop/apps/spark
export PATH=$PATH:$SPARK_HOME/bin

Save and make it take effect immediately

[hadoop@hadoop1 conf]$ source ~/.bashrc

Start Spark

[hadoop@hadoop1 ~]$  ~/apps/spark/sbin/start-all.sh 

Check the processes
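
jps should now show a Master and a Worker process in addition to the Hadoop daemons started earlier, for example (process IDs will differ):

[hadoop@hadoop1 ~]$ jps
3522 Master
3629 Worker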

Check the web UI

http://hadoop1:8080/
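
As an optional sanity check, the bundled SparkPi example can be submitted to the standalone master; the examples jar name below assumes the spark-2.3.0-bin-hadoop2.7 layout:

[hadoop@hadoop1 ~]$ spark-submit --master spark://hadoop1:7077 \
  --class org.apache.spark.examples.SparkPi \
  ~/apps/spark/examples/jars/spark-examples_2.11-2.3.0.jar 10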


Origin www.cnblogs.com/cjunn/p/12232181.html