Hadoop Installation Notes: Local Mode, Pseudo-Distributed Mode, and Distributed Cluster

Preparation

  1. Install a virtual machine and install Linux on it (omitted)
  2. Configure the network address (NAT) (omitted)

 [root@hadoopNode1 ~]# vi  /etc/sysconfig/network-scripts/ifcfg-ens33

NOTE: decide on the IP address and network segment first (a sample static-IP configuration is sketched below).
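A minimal static-IP sketch for ifcfg-ens33, assuming a NAT segment of 192.168.138.x as in the hosts example below; the gateway and DNS values are placeholders and must match your own virtual network:

```shell
# Hypothetical static-IP settings for /etc/sysconfig/network-scripts/ifcfg-ens33
# (all addresses are placeholders for your NAT segment)
TYPE=Ethernet
BOOTPROTO=static
NAME=ens33
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.138.100
PREFIX=24
# assumption: typical NAT gateway/DNS; adjust to your virtual network settings
GATEWAY=192.168.138.2
DNS1=192.168.138.2
```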

  3. Modify the hostname configuration (omitted)
    
 [root@hadoopNode1 ~]# vi     /etc/hostname

Note: the hostname should use only English letters and should not be changed once everything else has been configured.

  4. Modify the hosts mapping configuration (see the sketch below)
    

For example: 192.168.138.100 HadoopNode1
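A minimal sketch of the mapping step, assuming the example address above (run as root):

```shell
# Append the hostname-to-IP mapping to /etc/hosts
echo "192.168.138.100   hadoopNode1" >> /etc/hosts
```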

  5. Turn off the firewall (omitted)

    Shut it down:

    [root@hadoopNode1 ~]# systemctl stop firewalld

    Disable it at boot:

    [root@hadoopNode1 ~]# systemctl disable firewalld
    
    

    Restart the operating system for this to take effect.

  6. Create the user ambow and set its password to ambow (omitted)
    
 [root@hadoopNode1 ~]#     useradd ambow
 
 [root@hadoopNode1 ~]#    passwd  ambow   
 
  7. Give the ambow user sudo (root) privileges

As the root user, edit the /etc/sudoers file, find the root entry, and add a line for ambow below it, as follows:

[root@hadoopNode1 ~]# vi /etc/sudoers

## Allow root to run any commands anywhere 

root ALL=(ALL) ALL 

ambow ALL=(ALL) ALL 

After this change you can log in as ambow (or switch with su - ambow) and use sudo to run commands with root privileges.

  8. Install the JDK

    Unpack the tar package:
    [ambow@hadoopNode1 ~]$ pwd
    /home/ambow
    [ambow@hadoopNode1 ~]$ mkdir soft
    [ambow@hadoopNode1 ~]$ mkdir app
    [ambow@hadoopNode1 ~]$ ls
    app  soft
    [ambow@hadoopNode1 ~]$ tree .
    .
    ├── app
    └── soft
        ├── hadoop-2.7.3.tar.gz
        ├── jdk-8u121-linux-x64.tar.gz
        └── zookeeper-3.4.6.tar.gz
    
    2 directories, 3 files
    [ambow@hadoopNode1 ~]$ pwd
    /home/ambow
    [ambow@hadoopNode1 ~]$ tar -zxvf ./soft/jdk-8u121-linux-x64.tar.gz  -C  ./app/
    
    

Configure the JDK environment variables:

[ambow@hadoopNode1 jdk1.8.0_121]$ vi  ~/.bash_profile
[ambow@hadoopNode1 jdk1.8.0_121]$ cat ~/.bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

JAVA_HOME=/home/ambow/app/jdk1.8.0_121

PATH=$PATH:$HOME/.local/bin:$HOME/bin:$JAVA_HOME/bin

export PATH
export JAVA_HOME

[ambow@hadoopNode1 jdk1.8.0_121]$
[ambow@hadoopNode1 jdk1.8.0_121]$ source ~/.bash_profile


Run source ~/.bash_profile to make the configuration take effect.
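A quick sanity check after sourcing the profile, assuming the JAVA_HOME path configured above:

```shell
# Verify the JDK environment variables took effect
echo $JAVA_HOME     # expected: /home/ambow/app/jdk1.8.0_121
java -version       # expected: java version "1.8.0_121"
```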

  9. Restart the OS
    reboot

    Hadoop has three installation modes:

    1. Local (standalone) mode: used for development and debugging.

    2. Pseudo-distributed mode: a single host simulates a small multi-host cluster;
    the NameNode, DataNode, ResourceManager, and NodeManager all run on one machine.

    3. Cluster mode (production environment): multiple hosts act as the NameNode, DataNodes, and so on.

Hadoop Local (Standalone) Mode Installation

  1. Extract the Hadoop package

    [ambow@hadoopNode1 sbin]$ tar -zxvf   ~/soft/hadoop-2.7.3.tar.gz  -C  ~/app/
    
    

  2. Configure the Hadoop environment variables

  ```shell
  [ambow@hadoopNode1 hadoop-2.7.3]$ vi ~/.bash_profile
  [ambow@hadoopNode1 hadoop-2.7.3]$ cat  ~/.bash_profile
  # .bash_profile

  # Get the aliases and functions
  if [ -f ~/.bashrc ]; then
          . ~/.bashrc
  fi

  # User specific environment and startup programs

  JAVA_HOME=/home/ambow/app/jdk1.8.0_121

  HADOOP_HOME=/home/ambow/app/hadoop-2.7.3

  PATH=$PATH:$HOME/.local/bin:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

  export PATH
  export JAVA_HOME
  export HADOOP_HOME

  ```

  3. Make the environment variables take effect

    [ambow@hadoopNode1 hadoop-2.7.3]$ source ~/.bash_profile  
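
Similarly, a quick check that the Hadoop commands are now on the PATH (paths as configured above):

```shell
# Verify the Hadoop environment variables took effect
echo $HADOOP_HOME   # expected: /home/ambow/app/hadoop-2.7.3
hadoop version      # expected: Hadoop 2.7.3
```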
    

  4. Test

Create a test data file, ~/data/mydata.txt (a sketch follows):
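The test file itself is not shown in the original steps; a minimal sketch (the word content is arbitrary):

```shell
# Create some sample input for the wordcount test
mkdir -p ~/data
cat > ~/data/mydata.txt <<'EOF'
hello hadoop
hello world
hello ambow
EOF
```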

Test syntax:

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar <class name> <input path> <output path>

[ambow@hadoopNode1 mydata.out]$ hadoop jar ~/app/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount  ~/data/mydata.txt   ~/data/mydata.out2
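In local mode the result is written to the local filesystem; assuming the standard MapReduce output naming, the reducer output file is typically part-r-00000:

```shell
# Inspect the local-mode wordcount output
ls ~/data/mydata.out2
cat ~/data/mydata.out2/part-r-00000
```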

Pseudo-Distributed Mode Configuration:

  1. JDK installation

  2. Hadoop installation

  3. Configure $HADOOP_HOME/etc/hadoop/core-site.xml
    fs.defaultFS
    hadoop.tmp.dir

    [ambow@hadoopNode1 hadoop]$ vi   $HADOOP_HOME/etc/hadoop/core-site.xml
    

    <configuration>
        <!-- Configure the default FS. Default port: 9820 in Hadoop 3.x, 8020 in Hadoop 2.x, 9000 in Hadoop 1.x. Pseudo-distributed mode usually uses localhost:8020 -->
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:8020</value>
        </property>
    
      <!-- Directory where Hadoop stores its runtime files; it is created automatically, and keeping the default location is not recommended -->
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/home/ambow/hdfs/data</value>
        </property>
    
    </configuration>
    
    

  4. Configure hdfs-site.xml

    Set the number of block replicas with dfs.replication: the default is 3, but pseudo-distributed mode can only use 1.

    [ambow@hadoopNode1 hadoop]$ vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
    

    <configuration>
            
         <property>
          <!-- Number of replicas per block; the default is 3; set it to 1 on a single node. Configuring too many replicas actually hurts performance -->
             <name>dfs.replication</name>
          <!-- Pseudo-distributed mode can only keep 1 replica -->
             <value>1</value>
         </property>
    
    </configuration>
    
  5. Format the NameNode

[ambow@hadoopNode1 ~]$ hadoop namenode -format

In general, format only once. If you have to format again, first delete the data directory on every DataNode; otherwise the DataNode and NameNode cluster IDs become inconsistent and the cluster will not start.
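A minimal sketch of a clean re-format, assuming hadoop.tmp.dir is /home/ambow/hdfs/data as configured above (stop the daemons first, and repeat the cleanup on every DataNode):

```shell
# Remove the old HDFS data before re-formatting, to avoid mismatched cluster IDs
rm -rf /home/ambow/hdfs/data/*
hadoop namenode -format
```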


6. Start the daemons
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode

hadoop-daemon.sh stop namenode
hadoop-daemon.sh stop datanode

7. Check the running processes
jps

  8. Log files
    $HADOOP_HOME/logs (here ~/app/hadoop-2.7.3/logs)

9. View via the web UI
http://192.168.100.100:50070/

  10. Configure two files so that MapReduce runs on YARN

    Configure mapred-site.xml (if the file does not exist yet, see the note after the listing):

    [ambow@hadoopNode1 hadoop]$ vi $HADOOP_HOME/etc/hadoop/mapred-site.xml
    
    <configuration>
         <property>
         <!-- Specify that MapReduce uses the YARN resource-management framework -->
             <name>mapreduce.framework.name</name>
             <value>yarn</value>
         </property>
    </configuration>
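In a fresh Hadoop 2.7.3 install, etc/hadoop usually contains only mapred-site.xml.template, so the file generally has to be created first; a minimal sketch:

```shell
# Create mapred-site.xml from the bundled template before editing it
cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml
```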
    

    Configure yarn-site.xml

    yarn.resourcemanager.hostname
    yarn.nodemanager.aux-services

<configuration>
     <property>
         <!-- Specify the host that runs the YARN ResourceManager (hostname) -->
         <name>yarn.resourcemanager.hostname</name>
         <value>hadoopNode1</value>
     </property>
     <property>
          <!-- Use the mapreduce_shuffle service -->
         <name>yarn.nodemanager.aux-services</name>
         <value>mapreduce_shuffle</value>
     </property>
    
</configuration>
  11. Start YARN
[ambow@hadoopNode1 data]$ yarn-daemon.sh start resourcemanager

[ambow@hadoopNode1 data]$ yarn-daemon.sh start nodemanager
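A quick check that the YARN daemons started (jps lists the local Java processes):

```shell
# Expect ResourceManager and NodeManager in the list, alongside NameNode and DataNode
jps
```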

  12. Run a MapReduce test on YARN

Upload the local file ~/data/mydata.txt to the /user/ambow directory in HDFS:

[ambow@hadoopNode1 data]$ hadoop dfs -put ~/data/mydata.txt   /user/ambow

Run wordcount on the HDFS file via YARN:


 hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar  wordcount  /user/ambow/mydata.txt   /user/ambow/output/wc/
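To inspect the job result in HDFS; assuming the standard MapReduce output naming, the reducer output file is typically part-r-00000:

```shell
# List and view the wordcount output in HDFS
hadoop fs -ls /user/ambow/output/wc/
hadoop fs -cat /user/ambow/output/wc/part-r-00000
```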

Distributed Cluster Installation
1. Modify /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.100.200  hadoopNode1
192.168.100.201  hadoopNode2
192.168.100.202  hadoopNode3
192.168.100.203  hadoopNode4

2. Clone the pseudo-distributed VM into two more virtual machines


3. Configure each virtual machine node: IP address, hostname, and hosts mapping file

[root@hadoopNode2 ~]# vi /etc/hostname
[root@hadoopNode2 ~]# vi /etc/sysconfig/network-scripts/ifcfg-ens33
[root@hadoopNode2 ~]# vi /etc/hosts

4. Verify the configuration

[root@hadoopNode2 ~]# ping hadoopNode1
PING hadoopNode1 (192.168.100.200) 56(84) bytes of data.
64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=1 ttl=64 time=0.190 ms
64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=2 ttl=64 time=0.230 ms
64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=3 ttl=64 time=0.263 ms
64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=4 ttl=64 time=0.227 ms
^C64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=5 ttl=64 time=0.195 ms
64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=6 ttl=64 time=0.268 ms
^C
--- hadoopNode1 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5000ms
rtt min/avg/max/mdev = 0.190/0.228/0.268/0.035 ms
[root@hadoopNode2 ~]# ping hadoopNode2
PING hadoopNode2 (192.168.100.201) 56(84) bytes of data.
64 bytes from hadoopNode2 (192.168.100.201): icmp_seq=1 ttl=64 time=0.011 ms
64 bytes from hadoopNode2 (192.168.100.201): icmp_seq=2 ttl=64 time=0.022 ms
^C
--- hadoopNode2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.011/0.016/0.022/0.006 ms
[root@hadoopNode2 ~]# ping hadoopNode3
PING hadoopNode3 (192.168.100.202) 56(84) bytes of data.
64 bytes from hadoopNode3 (192.168.100.202): icmp_seq=1 ttl=64 time=0.246 ms
64 bytes from hadoopNode3 (192.168.100.202): icmp_seq=2 ttl=64 time=0.218 ms
64 bytes from hadoopNode3 (192.168.100.202): icmp_seq=3 ttl=64 time=0.218 ms
^C64 bytes from hadoopNode3 (192.168.100.202): icmp_seq=4 ttl=64 time=0.227 ms
^C
--- hadoopNode3 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3001ms
rtt min/avg/max/mdev = 0.218/0.227/0.246/0.015 ms

Master-slave architecture

  5. Set up passwordless SSH login from the master node

1) Generate a public/private key pair on the master node

ssh-keygen -t rsa

2) Distribute the public key to every node

  ssh-copy-id  localhost

  ssh-copy-id   hadoopNode1

  ssh-copy-id  hadoopNode2

  ssh-copy-id  hadoopNode3

3) From the master node, verify that you can log in to each node without a password

ssh  hadoopNode2
ssh   hadoopNode3

6. Configure core-site.xml

[ambow@hadoopNode1 hadoop]$ vi core-site.xml
<configuration>
    <!-- Configure the default FS. Default port: 9820 in Hadoop 3.x, 8020 in Hadoop 2.x, 9000 in Hadoop 1.x -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoopNode1:8020</value>
    </property>

  <!-- Directory where Hadoop stores its runtime files; it is created automatically, and keeping the default location is not recommended -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/ambow/hdfs/data</value>
    </property>

</configuration>

  7. Configure hdfs-site.xml

<configuration>

     <property>
          <!-- Number of replicas per block; the default is 3; set it to 1 on a single node. Configuring too many replicas actually hurts performance -->
         <name>dfs.replication</name>
          <!-- The cluster here keeps the default of 3 replicas -->
         <value>3</value>
     </property>


         <property>
                <!-- Configure the secondary NameNode (2NN) -->
                <name>dfs.namenode.secondary.http-address</name>
                <value>hadoopNode2:50090</value>
        </property>

        <property>
                <!-- Checkpoint directory path -->
                <name>dfs.namenode.checkpoint.dir</name>
                <value>/home/ambow/hdfs/namesecondary</value>
        </property>


</configuration>

8. Configure mapred-site.xml

<configuration>
     <property>
         <!-- Specify that MapReduce uses the YARN resource-management framework -->
         <name>mapreduce.framework.name</name>
         <value>yarn</value>
     </property>
</configuration>

  9. Configure yarn-site.xml
<configuration>

<!-- Site specific YARN configuration properties -->

     <property>
         <!-- Specify the host that runs the YARN ResourceManager -->
         <name>yarn.resourcemanager.hostname</name>
         <value>hadoopNode1</value>
     </property>

     <property>
          <!-- Use the mapreduce_shuffle service -->
         <name>yarn.nodemanager.aux-services</name>
         <value>mapreduce_shuffle</value>
     </property>

</configuration>

10. Modify the slaves file to specify which nodes in the cluster are DataNodes: add each such node's hostname to the slaves file.

[ambow@hadoopNode1 hadoop]$ vi $HADOOP_HOME/etc/hadoop/slaves


hadoopNode1
hadoopNode2
hadoopNode3

11. Distribute the configuration files to the other nodes

Note: stop all services before distributing.

 Network copy syntax: scp -r <source directory> <user>@<host>:<target path>
  -r copies recursively
[ambow@hadoopNode1 hadoop]$ scp  -r  $HADOOP_HOME/etc/hadoop   ambow@hadoopNode2:$HADOOP_HOME/etc/

[ambow@hadoopNode1 hadoop]$ scp  -r  $HADOOP_HOME/etc/hadoop   ambow@hadoopNode3:$HADOOP_HOME/etc/


Note: be sure to stop all services before distributing, and format the NameNode after distributing (a suggested order is sketched below).
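A sketch of the suggested order on the master node, using the commands already shown above:

```shell
# 1. Stop all services before distributing
stop-all.sh
# 2. Distribute the configuration with the scp commands shown above
# 3. Format the NameNode once before the first cluster start
hadoop namenode -format
```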

12. Test

start-all.sh (start)

stop-all.sh (stop)
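A minimal verification sketch after start-all.sh, assuming the passwordless SSH set up earlier and that jps resolves on the remote nodes; the expected daemons follow from the configuration above:

```shell
# On hadoopNode1: expect NameNode, DataNode, ResourceManager, NodeManager
jps
# On hadoopNode2: expect DataNode, NodeManager and SecondaryNameNode (per hdfs-site.xml)
ssh hadoopNode2 jps
# On hadoopNode3: expect DataNode, NodeManager
ssh hadoopNode3 jps
# The HDFS web UI should also respond at http://hadoopNode1:50070/
```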
