Apache Hadoop 3.2.2 and Spark 3.0.0 environment installation

Table of Contents

Basic environment description

JDK basic environment installation

Download and unzip JDK 8

Set environment variables

Update environment configuration

Hadoop environment installation

Download and unzip Hadoop 3.2.2

Set environment variables

Update environment configuration

Set Hadoop JAVA_HOME

Hadoop core configuration file settings

Hadoop HDFS core configuration: start-dfs.sh and stop-dfs.sh

Hadoop YARN core configuration: start-yarn.sh and stop-yarn.sh

SSH password-free login settings

Start Hadoop

jps process view

HDFS and cluster access

Shut down Hadoop

Spark environment installation

Scala installation

Spark installation

Spark sample program

Complete environment configuration


Basic environment description

System environment: CentOS 8

Host name: www.boonya.cn

vi /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 www.boonya.cn boonya.cn
::1   localhost localhost.localdomain localhost6 localhost6.localdomain6


 

JDK basic environment installation

Download and unzip JDK 8

cd /usr/local

wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u141-b15/336fa29ff2bb4ef291e347e091f7f4a7/jdk-8u141-linux-x64.tar.gz"

tar -zxvf jdk-8u141-linux-x64.tar.gz

mv jdk1.8.0_141 jdk

Set environment variables

Append the following to /etc/profile:

##Java home
export JAVA_HOME=/usr/local/jdk
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

Update environment configuration

source /etc/profile
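
A quick check that the JDK is on the PATH (the exact build string will vary):

java -version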

Hadoop environment installation

This article uses a pseudo-distributed Hadoop installation as an example.

Download and unzip Hadoop 3.2.2

cd /usr/local

wget https://mirrors.bfsu.edu.cn/apache/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz

tar -zxvf hadoop-3.2.2.tar.gz

Set environment variables

Append the following to /etc/profile:

#hadoop home
export HADOOP_HOME=/usr/local/hadoop-3.2.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Update environment configuration

source /etc/profile
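
Verify that Hadoop is on the PATH:

hadoop version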

Set Hadoop JAVA_HOME

#hadoop-env.sh (in $HADOOP_HOME/etc/hadoop)
vi /usr/local/hadoop-3.2.2/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/local/jdk

Hadoop core configuration file settings

The following files are in $HADOOP_HOME/etc/hadoop; each <property> block goes inside the file's <configuration> element.

#core-site.xml
        <!-- Address of the HDFS master (NameNode) -->
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://www.boonya.cn:9000</value>
        </property>
        <!-- Storage directory for files generated at Hadoop runtime -->
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/usr/local/hadoop-3.2.2/tmp</value>
        </property>

#hdfs-site.xml
        <!-- Number of HDFS replicas -->
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>

#mapred-site.xml
        <!-- Run MapReduce on YARN -->
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>

#yarn-site.xml
        <!-- Address of the YARN master (ResourceManager) -->
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>www.boonya.cn</value>
        </property>
        <!-- How reducers fetch data -->
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>

Hadoop HDFS core configuration: start-dfs.sh and stop-dfs.sh

Add the following at the top of both scripts (in $HADOOP_HOME/sbin), since the daemons are started as root here:

HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Hadoop YARN core configuration: start-yarn.sh and stop-yarn.sh

Likewise, add the following at the top of start-yarn.sh and stop-yarn.sh (the stray HDFS_DATANODE_SECURE_USER line sometimes copied here belongs to the dfs scripts and is not needed):

YARN_RESOURCEMANAGER_USER=root
YARN_NODEMANAGER_USER=root

SSH password-free login settings

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

chmod 0600 ~/.ssh/authorized_keys
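
After this, logging in to the local host over SSH should no longer prompt for a password:

ssh localhost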

Start Hadoop
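
A typical first-run sequence is to format the NameNode once and then start HDFS and YARN:

# first start only: format the NameNode
hdfs namenode -format

# start HDFS and YARN
start-dfs.sh
start-yarn.sh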

jps process view
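
With everything running, jps should list roughly the following daemons (process IDs will differ): NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, and Jps itself.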

HDFS and cluster access

In Hadoop 3.x the HDFS web UI listens on port 9870 (50070 in older versions):

Note: you need to open port 9870 in the firewall (or shut the firewall down), otherwise the HDFS web UI cannot be reached from outside the host; port 8088 is the YARN cluster management UI.
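
On CentOS 8 with firewalld, for example, the two UI ports can be opened like this (adjust to your own firewall setup):

firewall-cmd --permanent --add-port=9870/tcp
firewall-cmd --permanent --add-port=8088/tcp
firewall-cmd --reload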

Why port 8088 may not be reachable from outside:
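
One likely cause, hinted at by the Spark startup log below ("Your hostname, www.boonya.cn resolves to a loopback address: 127.0.0.1"), is that /etc/hosts maps www.boonya.cn to 127.0.0.1, so the ResourceManager binds its 8088 web UI to the loopback interface and it is only reachable locally. Mapping the hostname to the machine's LAN address instead, e.g.

192.168.0.120   www.boonya.cn boonya.cn

(an assumption based on the IP shown in the log), or setting yarn.resourcemanager.webapp.address to 0.0.0.0:8088 in yarn-site.xml, makes the port reachable from other hosts.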

Shut down Hadoop
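
The services are stopped in the reverse order:

stop-yarn.sh
stop-dfs.sh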

Spark environment installation

Spark depends on Scala, so you need to install Scala first.

Scala installation

Scala versions: https://www.scala-lang.org/download/all.html

Take 2.12.13 as an example:

cd /usr/local

wget https://downloads.lightbend.com/scala/2.12.13/scala-2.12.13.tgz

tar -zxvf scala-2.12.13.tgz

Set environment variables

Append the following to /etc/profile:

##scala home
export SCALA_HOME=/usr/local/scala-2.12.13
export PATH=.:$SCALA_HOME/bin:$PATH

Update environment configuration

source /etc/profile
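
Check the Scala installation:

scala -version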

Spark installation

All Spark releases: https://archive.apache.org/dist/spark/

Download and unzip Spark

cd /usr/local

wget https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz

tar -zxvf spark-3.0.0-bin-hadoop3.2.tgz

Set environment variables

Append the following to /etc/profile:

#spark home
export SPARK_HOME=/usr/local/spark-3.0.0-bin-hadoop3.2
export PATH=$PATH:$SPARK_HOME/bin

Set startup mode (the template files are in $SPARK_HOME/conf)

$ cd $SPARK_HOME/conf
$ mv spark-defaults.conf.template spark-defaults.conf
$ mv slaves.template slaves
$ mv spark-env.sh.template spark-env.sh
 
 
#modify spark-defaults.conf to enable YARN mode
spark.master     yarn
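
For YARN mode Spark also needs to locate the Hadoop configuration; assuming the paths used above, add the following to spark-env.sh:

export HADOOP_CONF_DIR=/usr/local/hadoop-3.2.2/etc/hadoop
export YARN_CONF_DIR=/usr/local/hadoop-3.2.2/etc/hadoop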

Set Spark's JAVA_HOME by modifying sbin/spark-config.sh:

export JAVA_HOME=/usr/local/jdk

Start Spark: sbin/start-all.sh

[root@www sbin]# start-all.sh 
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.master.Master-1-www.boonya.cn.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-www.boonya.cn.out
[root@www sbin]# tail -f -n 200 /usr/local/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-www.boonya.cn.out
Spark Command: /usr/local/jdk/bin/java -cp /usr/local/spark-3.0.0-bin-hadoop3.2/conf/:/usr/local/spark-3.0.0-bin-hadoop3.2/jars/* -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://localhost:7077
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/02/17 14:04:18 INFO Worker: Started daemon with process name: [email protected]
21/02/17 14:04:18 INFO SignalUtils: Registered signal handler for TERM
21/02/17 14:04:18 INFO SignalUtils: Registered signal handler for HUP
21/02/17 14:04:18 INFO SignalUtils: Registered signal handler for INT
21/02/17 14:04:18 WARN Utils: Your hostname, www.boonya.cn resolves to a loopback address: 127.0.0.1; using 192.168.0.120 instead (on interface enp0s3)
21/02/17 14:04:18 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
21/02/17 14:04:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/02/17 14:04:20 INFO SecurityManager: Changing view acls to: root
21/02/17 14:04:20 INFO SecurityManager: Changing modify acls to: root
21/02/17 14:04:20 INFO SecurityManager: Changing view acls groups to: 
21/02/17 14:04:20 INFO SecurityManager: Changing modify acls groups to: 
21/02/17 14:04:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
21/02/17 14:04:22 INFO Utils: Successfully started service 'sparkWorker' on port 45731.
21/02/17 14:04:23 INFO Worker: Starting Spark worker 192.168.0.120:45731 with 1 cores, 1024.0 MiB RAM
21/02/17 14:04:23 INFO Worker: Running Spark version 3.0.0
21/02/17 14:04:23 INFO Worker: Spark home: /usr/local/spark-3.0.0-bin-hadoop3.2
21/02/17 14:04:23 INFO ResourceUtils: ==============================================================
21/02/17 14:04:23 INFO ResourceUtils: Resources for spark.worker:

21/02/17 14:04:23 INFO ResourceUtils: ==============================================================
21/02/17 14:04:23 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
21/02/17 14:04:23 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://192.168.0.120:8081
21/02/17 14:04:23 INFO Worker: Connecting to master localhost:7077...
21/02/17 14:04:24 INFO TransportClientFactory: Successfully created connection to localhost/127.0.0.1:7077 after 194 ms (0 ms spent in bootstraps)
21/02/17 14:04:24 INFO Worker: Successfully registered with master spark://localhost:7077

Spark Worker UI address: http://192.168.0.120:8081 (shown in the WorkerWebUI line of the log above).

Spark sample program

Github spark project: https://github.com/open-micro-services/springcloud/tree/master/demo-projects/sb-spark

If the sample program has problems starting, adjust its parameter configuration.

A running Spark application also exposes its own UI on port 4040.
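
As a quick check that jobs can be submitted to YARN, the SparkPi example shipped with the distribution can be run (a sketch; the jar name matches the Spark 3.0.0 / Scala 2.12 build downloaded above):

spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode client \
  $SPARK_HOME/examples/jars/spark-examples_2.12-3.0.0.jar \
  10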

Complete environment configuration

##Java home
export JAVA_HOME=/usr/local/jdk
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

#hadoop home
export HADOOP_HOME=/usr/local/hadoop-3.2.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

##scala home
export SCALA_HOME=/usr/local/scala-2.12.13
export PATH=.:$SCALA_HOME/bin:$PATH

#spark home
export SPARK_HOME=/usr/local/spark-3.0.0-bin-hadoop3.2
export PATH=$PATH:$SPARK_HOME/bin

Reference articles:

CentOS 8 firewall settings

Installation of Spark operating environment

Hadoop 3.2.0 installation
