Note: The Hadoop cluster consists of three virtual machines; their host names are hadoop01, hadoop02, and hadoop03.
One, Hadoop cluster installation
1. Standardize the directory layout by creating the working folders:
mkdir -p /export/data/
mkdir -p /export/servers/
mkdir -p /export/software/
2. Download the JDK and Hadoop:
JDK: https://www.oracle.com/technetwork/java/javase/downloads/index.html
Hadoop: https://hadoop.apache.org/releases.html
Place the installation packages in the /export/software/ directory.
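For example, Hadoop can be fetched directly from the Apache archive (a sketch; version 2.7.4 is only an assumption, substitute the release you actually use; the JDK has to be downloaded from Oracle manually because of the license agreement):
cd /export/software/
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.4/hadoop-2.7.4.tar.gz   # assumed version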
3. Install the JDK and Hadoop:
cd /export/software/
tar -zxvf <jdk-package>.tar.gz -C /export/servers/
tar -zxvf <hadoop-package>.tar.gz -C /export/servers/
4. For easier operation, rename the extracted JDK and Hadoop directories:
cd /export/servers/
mv <jdk-directory>/ jdk
mv <hadoop-directory>/ hadoop
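A concrete walkthrough of steps 3 and 4, assuming the hypothetical package names jdk-8u161-linux-x64.tar.gz and hadoop-2.7.4.tar.gz (adjust to whatever you downloaded):
cd /export/software/
tar -zxvf jdk-8u161-linux-x64.tar.gz -C /export/servers/   # assumed JDK package
tar -zxvf hadoop-2.7.4.tar.gz -C /export/servers/          # assumed Hadoop package
cd /export/servers/
mv jdk1.8.0_161/ jdk
mv hadoop-2.7.4/ hadoop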
5. Configure the JDK and Hadoop environment variables:
vi /etc/profile
export JAVA_HOME=/export/servers/jdk
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/export/servers/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
6. Make the changes take effect, either with a restart or without one:
With restart: reboot
Without restart: source /etc/profile
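To confirm the environment variables are in effect, check the versions (both commands come from the packages installed above):
java -version
hadoop version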
Two, Hadoop cluster configuration
1. On the master node, enter the etc/hadoop/ directory of the extracted Hadoop package:
cd /export/servers/hadoop/etc/hadoop/
2. Modify the hadoop-env.sh file (set the JDK environment variable Hadoop needs, so the Hadoop daemons can find the JDK when they start):
vi hadoop-env.sh
export JAVA_HOME=/export/servers/jdk
3. Modify the core-site.xml configuration file (specify the host that runs the NameNode, the main HDFS master process of the Hadoop cluster, and the temporary directory used for data generated while Hadoop runs):
vi core-site.xml
<configuration>
<!-- Specify the URI of the file system used by Hadoop -->
<property>
<name>fs.defaultFS</name>
<!-- Specify that the NameNode runs on the hadoop01 machine -->
<value>hdfs://hadoop01:9000</value>
</property>
<!-- Hadoop temporary directory; the default is /tmp/hadoop-${user.name} -->
<property>
<name>hadoop.tmp.dir</name>
<value>/export/servers/hadoop/tmp</value>
</property>
</configuration>
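As a quick sanity check (a sketch; hdfs getconf only reads the local configuration, so it works before the cluster is started), the value can be read back with:
hdfs getconf -confKey fs.defaultFS
It should print hdfs://hadoop01:9000.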
4. Modify the hdfs-site.xml configuration file (set the number of replicas for HDFS data blocks, default 3, and the HTTP address where the Secondary NameNode serves its web interface):
vi hdfs-site.xml
<configuration>
<!-- Specify the number of HDFS replicas -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- IP and port of the host where the Secondary NameNode runs -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop02:50090</value>
</property>
</configuration>
5. Modify the mapred-site.xml file (specify YARN as the framework MapReduce runs on):
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
<configuration>
<!-- Specify the MapReduce runtime framework; set it to yarn here, the default is local -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
6. Modify the yarn-site.xml configuration file (specify hadoop01 as the host that runs the ResourceManager, the YARN master process, and configure the auxiliary service the NodeManager needs at run time; mapreduce_shuffle must be set for MapReduce to run normally):
vi yarn-site.xml
<configuration>
<!-- Specify the address of the YARN cluster manager (ResourceManager) -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop01</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
7. Modify the slaves file (it records the host names of all the slave nodes in the Hadoop cluster and is used by the one-click start scripts; delete the default content first):
vi slaves
hadoop01
hadoop02
hadoop03
8. Distribute the master node's configuration files to the other child nodes:
scp /etc/profile hadoop02:/etc/profile
scp /etc/profile hadoop03:/etc/profile
scp -r /export/ hadoop02:/
scp -r /export/ hadoop03:/
9. On each child node, run the refresh command:
source /etc/profile
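To confirm the distribution worked, the checks from section one can be repeated on hadoop02 and hadoop03, for example:
hadoop version
hdfs getconf -confKey dfs.replication
The second command should print 3, the replica count configured above.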
Three, Hadoop cluster test
1. Format the file system (run once, on the master node, before the first start):
hdfs namenode -format
OR
hadoop namenode -format
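If the format succeeded, the NameNode metadata directory derived from hadoop.tmp.dir should now exist (a sketch, assuming the default dfs.namenode.name.dir of ${hadoop.tmp.dir}/dfs/name):
ls /export/servers/hadoop/tmp/dfs/name/current/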
2. Start and shut down the Hadoop cluster: one process at a time, node by node
(1) On the master node, start and stop the HDFS NameNode process
hadoop-daemon.sh start namenode
hadoop-daemon.sh stop namenode
(2) On the slave nodes, start and stop the HDFS DataNode process
hadoop-daemon.sh start datanode
hadoop-daemon.sh stop datanode
(3) On the master node, start and stop the YARN ResourceManager process
yarn-daemon.sh start resourcemanager
yarn-daemon.sh stop resourcemanager
(4) On the slave nodes, start and stop the YARN NodeManager process
yarn-daemon.sh start nodemanager
yarn-daemon.sh stop nodemanager
(5) On the planned hadoop02 node, start and stop the SecondaryNameNode process
hadoop-daemon.sh start secondarynamenode
hadoop-daemon.sh stop secondarynamenode
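After starting the daemons, jps (shipped with the JDK) can be run on each node to confirm the expected processes are present, e.g. NameNode and ResourceManager on hadoop01, DataNode and NodeManager on every node, and SecondaryNameNode on hadoop02:
jps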
3. Start and shut down the Hadoop cluster: one-click scripts
(1) On the master node, start and stop all HDFS service processes
start-dfs.sh
stop-dfs.sh
(2) On the master node, start and stop all YARN service processes
start-yarn.sh
stop-yarn.sh
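Once everything is up, the cluster can be checked from a browser and with a sample job (a sketch; ports 50070 and 8088 are the Hadoop 2.x defaults, and the examples jar name depends on the Hadoop version installed under $HADOOP_HOME/share/hadoop/mapreduce/):
HDFS web UI: http://hadoop01:50070
YARN web UI: http://hadoop01:8088
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10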