Table of Contents
JDK basic environment installation
Update environment configuration
Hadoop environment installation
Download and unzip Hadoop 3.2.2
Update environment configuration
Hadoop core configuration file settings
Hadoop HDFS core configuration: start-dfs.sh and stop-dfs.sh
Hadoop YARN core configuration: start-yarn.sh and stop-yarn.sh
SSH password-free login settings
Spark environment installation
Complete environment configuration
Basic environment description
System environment: CentOS 8
Hostname: www.boonya.cn
vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 www.boonya.cn boonya.cn
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
JDK basic environment installation
Download and unzip JDK 8
cd /usr/local
wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u141-b15/336fa29ff2bb4ef291e347e091f7f4a7/jdk-8u141-linux-x64.tar.gz"
tar -zxvf jdk-8u141-linux-x64.tar.gz
mv jdk1.8.0_141 jdk
Set environment variables
##Java home
export JAVA_HOME=/usr/local/jdk
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
Update environment configuration
source /etc/profile
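To confirm the JDK is picked up (paths as configured above):

```shell
# Verify the JDK installation configured in /etc/profile
java -version      # should report 1.8.0_141
echo $JAVA_HOME    # should print /usr/local/jdk
```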
Hadoop environment installation
This article uses a pseudo-distributed Hadoop installation as an example.
Download and unzip Hadoop 3.2.2
cd /usr/local
wget https://mirrors.bfsu.edu.cn/apache/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz
tar -zxvf hadoop-3.2.2.tar.gz
Set environment variables
#hadoop home
export HADOOP_HOME=/usr/local/hadoop-3.2.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Update environment configuration
source /etc/profile
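After sourcing the profile, the Hadoop CLI should be on the PATH:

```shell
# Verify the Hadoop installation
hadoop version    # should report Hadoop 3.2.2
```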
Set Hadoop JAVA_HOME
#hadoop-env.sh
vi /usr/local/hadoop-3.2.2/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/jdk
Hadoop core configuration file settings
#core-site.xml
<!-- Specify the address of the HDFS NameNode -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://www.boonya.cn:9000</value>
</property>
<!-- Specify the storage directory for files Hadoop generates at runtime -->
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop-3.2.2/tmp</value>
</property>
#hdfs-site.xml
<!-- Specify the number of HDFS replicas -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
# mapred-site.xml
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
#yarn-site.xml
<!-- Specify the address of the YARN ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>www.boonya.cn</value>
</property>
<!-- How reducers fetch data (shuffle service) -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
Hadoop HDFS core configuration: start-dfs.sh and stop-dfs.sh
Add the following at the top of both scripts (required when running the daemons as root):
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Hadoop YARN core configuration: start-yarn.sh and stop-yarn.sh
Add the following at the top of both scripts:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
SSH password-free login settings
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
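A quick check that key-based login works; it should not prompt for a password:

```shell
# Log in to localhost non-interactively, accepting the host key on first connect
ssh -o StrictHostKeyChecking=accept-new localhost exit && echo "password-free SSH OK"
```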
Start Hadoop
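A minimal sketch of the startup sequence, assuming the paths configured above (the NameNode must be formatted once before the first start):

```shell
# Format the NameNode -- first start only; this erases HDFS metadata
hdfs namenode -format

# Start HDFS (NameNode, DataNode, SecondaryNameNode), then YARN
start-dfs.sh
start-yarn.sh
```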
View the processes with jps
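With all daemons up, `jps` should list roughly the following processes (PIDs will differ):

```shell
jps
# Expected daemons in a pseudo-distributed setup:
#   NameNode, DataNode, SecondaryNameNode   <- HDFS (start-dfs.sh)
#   ResourceManager, NodeManager            <- YARN (start-yarn.sh)
```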
HDFS and cluster access
The Hadoop 3.x HDFS web UI is served on port 9870 (50070 in older versions):
Note: open port 9870 in the firewall, otherwise the HDFS web UI cannot be reached from outside (8088 is the YARN cluster management port and must be opened as well).
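On CentOS 8, rather than disabling the firewall entirely, the two ports can be opened with firewalld:

```shell
firewall-cmd --permanent --add-port=9870/tcp   # HDFS web UI
firewall-cmd --permanent --add-port=8088/tcp   # YARN ResourceManager UI
firewall-cmd --reload
```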
If port 8088 still cannot be reached from outside, the usual causes are the firewall and the hostname resolving to a loopback address in /etc/hosts.
Shut down Hadoop
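The shutdown mirrors the startup:

```shell
stop-yarn.sh
stop-dfs.sh
```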
Spark environment installation
Spark depends on Scala, so Scala must be installed first.
Scala installation
Scala versions: https://www.scala-lang.org/download/all.html
Take 2.12.13 as an example:
cd /usr/local
wget https://downloads.lightbend.com/scala/2.12.13/scala-2.12.13.tgz
tar -zxvf scala-2.12.13.tgz
Set environment variables
##scala home
export SCALA_HOME=/usr/local/scala-2.12.13
export PATH=.:$SCALA_HOME/bin:$PATH
Update environment configuration
source /etc/profile
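To confirm the Scala installation:

```shell
scala -version    # should report 2.12.13
```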
Spark installation
All Spark releases: https://archive.apache.org/dist/spark/
Download and unzip Spark
cd /usr/local
wget https://archive.apache.org/dist/spark/spark-3.0.0/spark-3.0.0-bin-hadoop3.2.tgz
tar -zxvf spark-3.0.0-bin-hadoop3.2.tgz
Set environment variables
#spark home
export SPARK_HOME=/usr/local/spark-3.0.0-bin-hadoop3.2
export PATH=$PATH:$SPARK_HOME/bin
Set startup mode
The configuration templates are in $SPARK_HOME/conf:
$ cd /usr/local/spark-3.0.0-bin-hadoop3.2/conf
$ mv spark-defaults.conf.template spark-defaults.conf
$ mv slaves.template slaves
$ mv spark-env.sh.template spark-env.sh
# Edit spark-defaults.conf to enable YARN mode
spark.master yarn
Set Spark JAVA_HOME by modifying sbin/spark-config.sh
export JAVA_HOME=/usr/local/jdk
Start execution: sbin/start-all.sh
[root@www sbin]# start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.master.Master-1-www.boonya.cn.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-www.boonya.cn.out
[root@www sbin]# tail -f -n 200 /usr/local/spark-3.0.0-bin-hadoop3.2/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-www.boonya.cn.out
Spark Command: /usr/local/jdk/bin/java -cp /usr/local/spark-3.0.0-bin-hadoop3.2/conf/:/usr/local/spark-3.0.0-bin-hadoop3.2/jars/* -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://localhost:7077
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/02/17 14:04:18 INFO Worker: Started daemon with process name: [email protected]
21/02/17 14:04:18 INFO SignalUtils: Registered signal handler for TERM
21/02/17 14:04:18 INFO SignalUtils: Registered signal handler for HUP
21/02/17 14:04:18 INFO SignalUtils: Registered signal handler for INT
21/02/17 14:04:18 WARN Utils: Your hostname, www.boonya.cn resolves to a loopback address: 127.0.0.1; using 192.168.0.120 instead (on interface enp0s3)
21/02/17 14:04:18 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
21/02/17 14:04:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/02/17 14:04:20 INFO SecurityManager: Changing view acls to: root
21/02/17 14:04:20 INFO SecurityManager: Changing modify acls to: root
21/02/17 14:04:20 INFO SecurityManager: Changing view acls groups to:
21/02/17 14:04:20 INFO SecurityManager: Changing modify acls groups to:
21/02/17 14:04:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
21/02/17 14:04:22 INFO Utils: Successfully started service 'sparkWorker' on port 45731.
21/02/17 14:04:23 INFO Worker: Starting Spark worker 192.168.0.120:45731 with 1 cores, 1024.0 MiB RAM
21/02/17 14:04:23 INFO Worker: Running Spark version 3.0.0
21/02/17 14:04:23 INFO Worker: Spark home: /usr/local/spark-3.0.0-bin-hadoop3.2
21/02/17 14:04:23 INFO ResourceUtils: ==============================================================
21/02/17 14:04:23 INFO ResourceUtils: Resources for spark.worker:
21/02/17 14:04:23 INFO ResourceUtils: ==============================================================
21/02/17 14:04:23 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
21/02/17 14:04:23 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://192.168.0.120:8081
21/02/17 14:04:23 INFO Worker: Connecting to master localhost:7077...
21/02/17 14:04:24 INFO TransportClientFactory: Successfully created connection to localhost/127.0.0.1:7077 after 194 ms (0 ms spent in bootstraps)
21/02/17 14:04:24 INFO Worker: Successfully registered with master spark://localhost:7077
Spark UI management address: http://192.168.0.120:8081 (shown in the log above)
Spark sample program
Github spark project: https://github.com/open-micro-services/springcloud/tree/master/demo-projects/sb-spark
A successful run produces a completion log; if startup fails, adjust the parameter configuration.
The Spark application UI is served on port 4040 while a job is running:
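Independent of the linked project, the SparkPi example bundled with the distribution can be used to smoke-test the YARN setup (the jar name below matches the Spark 3.0.0 build and may differ in other versions):

```shell
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  $SPARK_HOME/examples/jars/spark-examples_2.12-3.0.0.jar 100
# While the job runs, its application UI is available on port 4040
```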
Complete environment configuration
##Java home
export JAVA_HOME=/usr/local/jdk
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
#hadoop home
export HADOOP_HOME=/usr/local/hadoop-3.2.2
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
##scala home
export SCALA_HOME=/usr/local/scala-2.12.13
export PATH=.:$SCALA_HOME/bin:$PATH
#spark home
export SPARK_HOME=/usr/local/spark-3.0.0-bin-hadoop3.2
export PATH=$PATH:$SPARK_HOME/bin
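A quick sanity check of the complete configuration after sourcing the profile:

```shell
source /etc/profile
java -version           # JDK 8
hadoop version          # Hadoop 3.2.2
scala -version          # Scala 2.12.13
spark-submit --version  # Spark 3.0.0
```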