Hadoop cluster (installation, configuration, test)

Note: The Hadoop cluster consists of three virtual machines, with host names hadoop01, hadoop02, and hadoop03.

One, Hadoop cluster installation

1. For consistent operation, create the working directories:

mkdir -p /export/data/

mkdir -p /export/servers/

mkdir -p /export/software/
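Equivalently, assuming a bash shell, all three directories can be created with a single command:

mkdir -p /export/{data,servers,software}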

 

2. Download the JDK and Hadoop:

JDK:https://www.oracle.com/technetwork/java/javase/downloads/index.html

Hadoop:https://hadoop.apache.org/releases.html

Put the downloaded installation packages into the /export/software/ directory.
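As a sketch, assuming Hadoop 2.7.4 is the chosen release (any other release from the downloads page works the same way), the Hadoop tarball can be fetched straight into that directory; the Oracle JDK normally has to be downloaded through a browser because the license must be accepted first:

wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.4/hadoop-2.7.4.tar.gz -P /export/software/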

 

3. Install the JDK and Hadoop:

cd /export/software/

tar -zxvf <JDK package> -C /export/servers/

tar -zxvf <Hadoop package> -C /export/servers/
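For example, assuming the downloaded archives are jdk-8u161-linux-x64.tar.gz and hadoop-2.7.4.tar.gz (substitute your actual file names):

tar -zxvf jdk-8u161-linux-x64.tar.gz -C /export/servers/

tar -zxvf hadoop-2.7.4.tar.gz -C /export/servers/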

 

4. For easier operation, rename the JDK and Hadoop directories:

cd /export/servers/

mv <JDK directory>/ jdk

mv <Hadoop directory>/ hadoop
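Continuing the example above, the extracted directories would be renamed like this (directory names depend on the versions you downloaded):

mv jdk1.8.0_161/ jdk

mv hadoop-2.7.4/ hadoop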

 

5. Configure the JDK and Hadoop environment variables:

vi /etc/profile

  export JAVA_HOME=/export/servers/jdk
  export PATH=$PATH:$JAVA_HOME/bin
  export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

  export HADOOP_HOME=/export/servers/hadoop
  export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

 

6. Make the variables take effect, with or without a restart:

Restart: reboot

Without restarting: source /etc/profile
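Both tools should now be on the PATH; the following commands should print their version information:

java -version

hadoop version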

 

 

Two, Hadoop cluster configuration

1. On the master node, enter the etc/hadoop/ directory of the extracted Hadoop package.
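With the installation path chosen above, that directory is:

cd /export/servers/hadoop/etc/hadoop/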

 

2. Modify the hadoop-env.sh file (set the JAVA_HOME environment variable that Hadoop needs at run time, so that the Hadoop daemons can find the JDK when they start):

vi hadoop-env.sh

  export JAVA_HOME=/export/servers/jdk
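To double-check the edit, this should print exactly the line set above (run from the same directory):

grep "^export JAVA_HOME" hadoop-env.sh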

 

3. Modify the core-site.xml configuration file (specify the host that runs the NameNode, the main HDFS process and master of the Hadoop cluster, and set the temporary directory that Hadoop uses for data generated at run time):

vi core-site.xml

  <configuration>

    <!-- URI of the default file system used by Hadoop -->
    <property>
      <name>fs.defaultFS</name>

      <!-- the NameNode runs on the hadoop01 machine -->
      <value>hdfs://hadoop01:9000</value>
    </property>

    <!-- Hadoop temporary directory, default /tmp/hadoop-${user.name} -->
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/export/servers/hadoop/tmp</value>
    </property>
  </configuration>
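After the environment variables are in place, the value that Hadoop actually picks up can be read back with the getconf tool:

hdfs getconf -confKey fs.defaultFS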

 

4. Modify the hdfs-site.xml configuration file (set the number of replicas kept for each HDFS data block, default 3, and the HTTP address of the host that serves the Secondary NameNode):

vi hdfs-site.xml

  <configuration>

    <!-- number of HDFS block replicas -->
    <property>
      <name>dfs.replication</name>
      <value>3</value>
    </property>

    <!-- host and port of the Secondary NameNode -->
    <property>
      <name>dfs.namenode.secondary.http-address</name>
      <value>hadoop02:50090</value>
    </property>
  </configuration>

 

5. Modify the mapred-site.xml file (specify YARN as the framework on which MapReduce runs):

cp mapred-site.xml.template mapred-site.xml

vi mapred-site.xml

  <configuration>

    <!-- run MapReduce on YARN; the default is local -->
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
  </configuration>

 

6. Modify the yarn-site.xml configuration file (specify hadoop01 as the host that runs the ResourceManager, the YARN master process; also configure the auxiliary service needed by the NodeManager at run time, mapreduce_shuffle, which must be set for MapReduce programs to run normally):

vi yarn-site.xml

  <configuration>

    <!-- host of the cluster resource manager (ResourceManager) -->

    <property>

      <name>yarn.resourcemanager.hostname</name>
      <value>hadoop01</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>

  </configuration>

 

7. Modify the slaves file (it records the host names of all slave nodes in the Hadoop cluster and is used by the one-click start scripts; delete its default content first):

vi slaves

  hadoop01
  hadoop02
  hadoop03

 

8. Distribute the configuration files from the master node to the other child nodes:

scp /etc/profile hadoop02:/etc/profile

scp /etc/profile hadoop03:/etc/profile

scp -r /export/ hadoop02:/

scp -r /export/ hadoop03:/
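Assuming passwordless SSH between the nodes has already been set up (it is also needed by the one-click start scripts later on), a quick check that the copy landed on a child node:

ssh hadoop02 "ls /export/servers"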

 

9. Run the refresh command on each child node:

source /etc/profile

 

 

Three, Hadoop cluster test

1. Format the file system (run this only once, on the master node hadoop01):

hdfs namenode -format

or

hadoop namenode -format

 

2. Start and shut down the Hadoop cluster: starting and stopping single nodes one by one

(1) On the master node, start and stop the HDFS NameNode process

hadoop-daemon.sh start namenode

hadoop-daemon.sh stop namenode

(2) On the slave nodes, start and stop the HDFS DataNode process

hadoop-daemon.sh start datanode

hadoop-daemon.sh stop datanode

(3) On the master node, start and stop the YARN ResourceManager process

yarn-daemon.sh start resourcemanager

yarn-daemon.sh stop resourcemanager

(4) On the slave nodes, start and stop the YARN NodeManager process

yarn-daemon.sh start nodemanager

yarn-daemon.sh stop nodemanager

(5) On the hadoop02 node (as planned above), start and stop the SecondaryNameNode process

hadoop-daemon.sh start secondarynamenode

hadoop-daemon.sh stop secondarynamenode
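Whichever way the daemons are started, the jps command that ships with the JDK lists the Java processes on a node and is a convenient way to check which daemons are actually running:

jps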

 

3. Start and shut down the Hadoop cluster: one-click start and stop

(1) On the master node, start and stop all HDFS service processes

start-dfs.sh

stop-dfs.sh

(2) On the master node, start and stop all YARN service processes

start-yarn.sh

stop-yarn.sh
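Once everything is up, a few sanity checks are possible (a sketch, assuming the default Hadoop 2.x ports and jar layout): the NameNode web UI should be reachable at http://hadoop01:50070, the ResourceManager web UI at http://hadoop01:8088, hdfs dfsadmin -report should list three live DataNodes, and a sample MapReduce job can be submitted:

hdfs dfsadmin -report

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10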
