Hadoop Setup: Local Mode, Pseudo-Distributed Mode, and Distributed Cluster Installation (Detailed)

Part 1: Preparation

  1. Install the virtual machine and Linux (omitted)
  2. Configure the network address (NAT) (omitted)

 [root@hadoopNode1 ~]# vi  /etc/sysconfig/network-scripts/ifcfg-ens33

Note: confirm the IP address and network segment.

  3. Modify the hostname configuration (omitted)
    
 [root@hadoopNode1 ~]# vi     /etc/hostname

Note: the hostname should consist of English letters; once set, it should not be changed casually.

  4. Modify the hosts mapping configuration
    

For example: 192.168.138.100 hadoopNode1

  5. Disable the firewall (omitted)

    Stop it now, and disable it at boot:

    [root@hadoopNode1 ~]# systemctl stop firewalld

    [root@hadoopNode1 ~]# systemctl disable firewalld

    Reboot the operating system for the change to take effect.
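An optional quick check (standard systemd commands):

```shell
# Both should report that firewalld is off.
systemctl is-active firewalld     # expect: inactive
systemctl is-enabled firewalld    # expect: disabled
```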

  6. Create the user ambow with the password ambow (omitted)
    
 [root@hadoopNode1 ~]#     useradd ambow
 
 [root@hadoopNode1 ~]#    passwd  ambow   
 
  7. Give the ambow user root privileges via sudo

As the root user, edit the /etc/sudoers file, find the following line, and add a matching entry for ambow below the root line, as shown:

[root@hadoopNode1 ~]# vi /etc/sudoers

## Allow root to run any commands anywhere 

root ALL=(ALL) ALL 

ambow ALL=(ALL) ALL 

Once the edit is done, you can log in as the ambow account (or switch to it with su - ambow) and use sudo to run commands with root privileges.
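A quick way to confirm the grant (run as ambow):

```shell
# sudo prompts for ambow's password and should print "root".
sudo whoami
```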

  8. Install the JDK

    From the tar package:
    [ambow@hadoopNode1 ~]$ pwd
    /home/ambow
    [ambow@hadoopNode1 ~]$ mkdir soft
    [ambow@hadoopNode1 ~]$ mkdir app
    [ambow@hadoopNode1 ~]$ ls
    app  soft
    [ambow@hadoopNode1 ~]$ tree .
    .
    ├── app
    └── soft
        ├── hadoop-2.7.3.tar.gz
        ├── jdk-8u121-linux-x64.tar.gz
        └── zookeeper-3.4.6.tar.gz
    
    2 directories, 3 files
    [ambow@hadoopNode1 ~]$ pwd
    /home/ambow
    [ambow@hadoopNode1 ~]$ tar -zxvf ./soft/jdk-8u121-linux-x64.tar.gz  -C  ./app/
    
    

Configure the JDK environment variables:

[ambow@hadoopNode1 jdk1.8.0_121]$ vi  ~/.bash_profile
[ambow@hadoopNode1 jdk1.8.0_121]$ cat ~/.bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

JAVA_HOME=/home/ambow/app/jdk1.8.0_121

PATH=$PATH:$HOME/.local/bin:$HOME/bin:$JAVA_HOME/bin

export PATH
export JAVA_HOME

[ambow@hadoopNode1 jdk1.8.0_121]$
[ambow@hadoopNode1 jdk1.8.0_121]$ source ~/.bash_profile


Running source ~/.bash_profile makes the configuration take effect.
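A minimal check that the JDK is wired up:

```shell
# Both should point at the JDK unpacked under ~/app.
echo $JAVA_HOME
java -version
```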

  9. Reboot the operating system

    reboot

The Three Hadoop Installation Modes

1. Local mode: used for development and debugging.

2. Pseudo-distributed mode: simulates a small cluster, with one host standing in for many. It starts a NameNode, DataNode, ResourceManager, and NodeManager.

3. Cluster mode (production environment): multiple hosts, each acting as NameNode, DataNode, and so on.

Hadoop Local Mode Installation

  1. Unpack the Hadoop tarball

    [ambow@hadoopNode1 sbin]$ tar -zxvf   ~/soft/hadoop-2.7.3.tar.gz  -C  ~/app/
    
    

  2. Configure the Hadoop environment variables

  ```shell
  [ambow@hadoopNode1 hadoop-2.7.3]$ vi ~/.bash_profile
  [ambow@hadoopNode1 hadoop-2.7.3]$ cat  ~/.bash_profile
  # .bash_profile

  # Get the aliases and functions
  if [ -f ~/.bashrc ]; then
          . ~/.bashrc
  fi

  # User specific environment and startup programs

  JAVA_HOME=/home/ambow/app/jdk1.8.0_121

  HADOOP_HOME=/home/ambow/app/hadoop-2.7.3

  PATH=$PATH:$HOME/.local/bin:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

  export PATH
  export JAVA_HOME
  export HADOOP_HOME

  ```

  3. Apply the environment variables

    [ambow@hadoopNode1 hadoop-2.7.3]$ source ~/.bash_profile  
    

  4. Test

Create a test data file: ~/data/mydata.txt

Test command syntax:

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar <class-name> <input> <output-dir>

[ambow@hadoopNode1 mydata.out]$ hadoop jar ~/app/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount  ~/data/mydata.txt   ~/data/mydata.out2
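A minimal end-to-end sketch (paths as above; the output directory must not already exist):

```shell
# Create sample input, run the bundled wordcount example in local mode,
# then read the result files written to the local filesystem.
mkdir -p ~/data
echo "hello hadoop hello world" > ~/data/mydata.txt
hadoop jar ~/app/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar \
    wordcount ~/data/mydata.txt ~/data/mydata.out
cat ~/data/mydata.out/part-r-00000
```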

Pseudo-Distributed Mode Configuration

  1. Install the JDK

  2. Install Hadoop

  3. Configure Hadoop's $HADOOP_HOME/etc/hadoop/core-site.xml
    fs.defaultFS
    hadoop.tmp.dir

    [ambow@hadoopNode1 hadoop]$ vi   $HADOOP_HOME/etc/hadoop/core-site.xml
    

    <configuration>
        <!-- Default filesystem. Default ports: 9820 in Hadoop 3.x, 8020 in Hadoop 2.x, 9000 in Hadoop 1.x. Pseudo-distributed setups usually use localhost:8020. -->
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:8020</value>
        </property>
    
      <!-- Directory where Hadoop stores the files it generates at runtime. Created automatically; relying on the default location is not recommended. -->
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/home/ambow/hdfs/data</value>
        </property>
    
    </configuration>
    
    

  4. Configure hdfs-site.xml

    dfs.replication sets the number of block replicas; pseudo-distributed mode must use 1 (the default is 3).

    [ambow@hadoopNode1 hadoop]$ vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
    

    <configuration>
            
         <property>
              <!-- Number of replicas per block (default 3). On a single node set it to 1. Don't set it too high; excess replicas actually reduce performance. -->
             <name>dfs.replication</name>
              <!-- Pseudo-distributed mode can only keep 1 replica. -->
             <value>1</value>
         </property>
    
    </configuration>
    
  5. Format the NameNode

[ambow@hadoopNode1 ~]$ hadoop namenode -format

Format only once. If you must reformat, first delete the data directories on every DataNode; otherwise the DataNodes' cluster ID will no longer match the NameNode's and the daemons will fail to start.
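If a reformat is unavoidable, a sketch of a clean one (assumption: hadoop.tmp.dir is /home/ambow/hdfs/data, as configured above):

```shell
# Stop the daemons, clear the old data directory, then reformat.
hadoop-daemon.sh stop datanode
hadoop-daemon.sh stop namenode
rm -rf /home/ambow/hdfs/data
hadoop namenode -format
```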


6. Start the daemons (the matching stop commands are shown alongside):
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode

hadoop-daemon.sh stop namenode
hadoop-daemon.sh stop datanode

7. Check the running processes with jps; it should list NameNode and DataNode.

jps

  8. Log files
    $HADOOP_HOME/logs (here: ~/app/hadoop-2.7.3/logs)

9. View the NameNode web UI:
http://192.168.100.100:50070/

  10. To run MapReduce on YARN, two configuration files must be set up.

    Configure mapred-site.xml:
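    Note: Hadoop 2.7.3 ships only a template for this file; if mapred-site.xml does not exist yet, create it from the template first:

    ```shell
    cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml
    ```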

    [ambow@hadoopNode1 hadoop]$ vi $HADOOP_HOME/etc/hadoop/mapred-site.xml
    
    <configuration>
         <property>
         <!-- Tell MapReduce to use the YARN resource-management framework. -->
             <name>mapreduce.framework.name</name>
             <value>yarn</value>
         </property>
    </configuration>
    

  11. Configure yarn-site.xml

yarn.resourcemanager.hostname
yarn.nodemanager.aux-services

<configuration>
     <property>
         <!-- Hostname of the node that runs the ResourceManager. -->
         <name>yarn.resourcemanager.hostname</name>
         <value>hadoopNode1</value>
     </property>
     <property>
          <!-- Use the mapreduce_shuffle auxiliary service. -->
         <name>yarn.nodemanager.aux-services</name>
         <value>mapreduce_shuffle</value>
     </property>
    
</configuration>
  12. Start YARN
[ambow@hadoopNode1 data]$ yarn-daemon.sh start resourcemanager

[ambow@hadoopNode1 data]$ yarn-daemon.sh start nodemanager

  13. Test a MapReduce job

Upload the file ~/data/mydata.txt from the Linux filesystem to the /user/ambow directory in HDFS:

[ambow@hadoopNode1 data]$ hdfs dfs -put ~/data/mydata.txt   /user/ambow
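If the put fails because the target directory is missing, create it first (a small sketch; /user/ambow per the layout above):

```shell
hdfs dfs -mkdir -p /user/ambow
```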

Run a wordcount on the HDFS file through YARN:


 hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar  wordcount  /user/ambow/mydata.txt   /user/ambow/output/wc/
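To inspect the result (sketch; note the output directory must not exist before the job runs):

```shell
# List the job output and print the aggregated word counts.
hdfs dfs -ls /user/ambow/output/wc/
hdfs dfs -cat /user/ambow/output/wc/part-r-00000
```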

Distributed Cluster Installation
1. Edit /etc/hosts

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.100.200  hadoopNode1
192.168.100.201  hadoopNode2
192.168.100.202  hadoopNode3
192.168.100.203  hadoopNode4

2. Clone two more virtual machines from the pseudo-distributed node.


3. On each cloned virtual machine, configure the IP address, hostname, and hosts mapping:

[root@hadoopNode2 ~]# vi /etc/hostname
[root@hadoopNode2 ~]# vi /etc/sysconfig/network-scripts/ifcfg-ens33
[root@hadoopNode2 ~]# vi /etc/hosts

4. Verify the configuration

[root@hadoopNode2 ~]# ping hadoopNode1
PING hadoopNode1 (192.168.100.200) 56(84) bytes of data.
64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=1 ttl=64 time=0.190 ms
64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=2 ttl=64 time=0.230 ms
64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=3 ttl=64 time=0.263 ms
64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=4 ttl=64 time=0.227 ms
^C64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=5 ttl=64 time=0.195 ms
64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=6 ttl=64 time=0.268 ms
^C
--- hadoopNode1 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5000ms
rtt min/avg/max/mdev = 0.190/0.228/0.268/0.035 ms
[root@hadoopNode2 ~]# ping hadoopNode2
PING hadoopNode2 (192.168.100.201) 56(84) bytes of data.
64 bytes from hadoopNode2 (192.168.100.201): icmp_seq=1 ttl=64 time=0.011 ms
64 bytes from hadoopNode2 (192.168.100.201): icmp_seq=2 ttl=64 time=0.022 ms
^C
--- hadoopNode2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.011/0.016/0.022/0.006 ms
[root@hadoopNode2 ~]# ping hadoopNode3
PING hadoopNode3 (192.168.100.202) 56(84) bytes of data.
64 bytes from hadoopNode3 (192.168.100.202): icmp_seq=1 ttl=64 time=0.246 ms
64 bytes from hadoopNode3 (192.168.100.202): icmp_seq=2 ttl=64 time=0.218 ms
64 bytes from hadoopNode3 (192.168.100.202): icmp_seq=3 ttl=64 time=0.218 ms
^C64 bytes from hadoopNode3 (192.168.100.202): icmp_seq=4 ttl=64 time=0.227 ms
^C
--- hadoopNode3 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3001ms
rtt min/avg/max/mdev = 0.218/0.227/0.246/0.015 ms

Master/slave architecture

  5. Set up passwordless SSH login on the master node

1) Generate the master node's public and private keys:

ssh-keygen -t rsa

2) Distribute the public key:

  ssh-copy-id localhost

  ssh-copy-id hadoopNode1

  ssh-copy-id hadoopNode2

  ssh-copy-id hadoopNode3

3) Verify: from the master node, log in to each node and check whether a password is still required.

ssh hadoopNode2
ssh hadoopNode3

6. Configure the core file, core-site.xml

[ambow@hadoopNode1 hadoop]$ vi core-site.xml
<configuration>
    <!-- Default filesystem. Default ports: 9820 in Hadoop 3.x, 8020 in Hadoop 2.x, 9000 in Hadoop 1.x. In cluster mode, point it at the NameNode host. -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoopNode1:8020</value>
    </property>

  <!-- Directory where Hadoop stores the files it generates at runtime. Created automatically; relying on the default location is not recommended. -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/ambow/hdfs/data</value>
    </property>

</configuration>

  7. Configure hdfs-site.xml

<configuration>

     <property>
          <!-- Number of replicas per block (default 3). On a single node set it to 1; don't set it too high, excess replicas actually reduce performance. -->
         <name>dfs.replication</name>
          <!-- With three DataNodes, the default of 3 replicas is fine. -->
         <value>3</value>
     </property>


         <property>
                <!-- Secondary NameNode (2NN) HTTP address. -->
                <name>dfs.namenode.secondary.http-address</name>
                <value>hadoopNode2:50090</value>
        </property>

        <property>
                <!-- Checkpoint directory path. -->
                <name>dfs.namenode.checkpoint.dir</name>
                <value>/home/ambow/hdfs/namesecondary</value>
        </property>


</configuration>

8. Configure mapred-site.xml

<configuration>
     <property>
         <!-- Tell MapReduce to use the YARN resource-management framework. -->
         <name>mapreduce.framework.name</name>
         <value>yarn</value>
     </property>
</configuration>

  9. Configure yarn-site.xml
<configuration>

<!-- Site specific YARN configuration properties -->

     <property>
         <!-- Hostname of the node that runs the ResourceManager. -->
         <name>yarn.resourcemanager.hostname</name>
         <value>hadoopNode1</value>
     </property>

     <property>
          <!-- Use the mapreduce_shuffle auxiliary service. -->
         <name>yarn.nodemanager.aux-services</name>
         <value>mapreduce_shuffle</value>
     </property>

</configuration>

10. Edit the slaves file to specify which nodes in the current cluster are DataNodes; add each such node's hostname to the file.

[ambow@hadoopNode1 hadoop]$ vi $HADOOP_HOME/etc/hadoop/slaves


hadoopNode1
hadoopNode2
hadoopNode3

11. Distribute the files to the other nodes

Note: stop all services before distributing.

 Network copy syntax: scp -r <source-dir> <user>@<host>:<target-path>
  (-r copies recursively)
[ambow@hadoopNode1 hadoop]$ scp  -r  $HADOOP_HOME/etc/hadoop   ambow@hadoopNode2:$HADOOP_HOME/etc/

[ambow@hadoopNode1 hadoop]$ scp  -r  $HADOOP_HOME/etc/hadoop   ambow@hadoopNode3:$HADOOP_HOME/etc/


Note: all services must stay stopped until distribution is complete.
After distributing, reformat the NameNode; see the sketch below.
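A sketch of the full sequence (assumption: hadoop.tmp.dir is /home/ambow/hdfs/data on every node, as configured above):

```shell
# On hadoopNode1: stop everything, push the config to the other nodes,
# clear any old HDFS data so the cluster IDs cannot conflict, then reformat.
stop-all.sh
scp -r $HADOOP_HOME/etc/hadoop ambow@hadoopNode2:$HADOOP_HOME/etc/
scp -r $HADOOP_HOME/etc/hadoop ambow@hadoopNode3:$HADOOP_HOME/etc/
for h in hadoopNode1 hadoopNode2 hadoopNode3; do
    ssh $h rm -rf /home/ambow/hdfs/data
done
hadoop namenode -format    # run on hadoopNode1 only
```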

12. Test

start-all.sh (start all daemons)

stop-all.sh (stop all daemons)
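A quick cluster check (relies on the passwordless SSH set up earlier, and assumes jps is on the PATH of non-interactive shells):

```shell
# After start-all.sh, each node should report its expected daemons:
# the master shows NameNode/ResourceManager, every slave shows DataNode/NodeManager.
for h in hadoopNode1 hadoopNode2 hadoopNode3; do
    echo "== $h =="
    ssh $h jps
done
```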
