Setting up a hadoop-2.6.5 cluster

1. Change the hostname: vi /etc/sysconfig/network

NETWORKING=yes
HOSTNAME=node1

2. Edit the hostname mappings: vi /etc/hosts

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 //optional, can be kept or removed
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 //optional, can be kept or removed
192.168.10.11 node1
192.168.10.12 node2
192.168.10.13 node3
192.168.10.14 node4
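The same four mappings must be present on every node. A minimal sketch of writing them in one shot, using a temp file so the snippet runs anywhere (on a real node the target is /etc/hosts, appended with `>>`):

```shell
# Write the cluster host mappings from the table above.
TARGET=/tmp/hosts-demo        # on a real node: TARGET=/etc/hosts (append with >>)
cat > "$TARGET" <<'EOF'
192.168.10.11 node1
192.168.10.12 node2
192.168.10.13 node3
192.168.10.14 node4
EOF
grep -c '^192\.168\.10\.' "$TARGET"   # expect 4 entries
```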

3. Set up time synchronization:

1)yum install ntp //install it if the server does not already have it
  1.1)chkconfig ntpd on //start the service automatically at boot
2) ntpdate ntp.api.bz //sync once against a time server
3)service ntpd start/stop/restart/reload
4)Set up periodic synchronization: crontab -e
  */10 * * * * ntpdate time.nist.gov //sync every 10 minutes
  4.1)Check whether the cron service is enabled with: chkconfig --list | grep cron
    crond 0:off 1:off 2:on 3:on 4:on 5:on 6:off
    If the system runlevel is 2-5, the crond service starts automatically at boot
  4.2)Start crond automatically at boot: chkconfig crond on
  4.3) crontab options
    -e [UserName]: edit the schedule in a text editor; the default editor is vi
    -r [UserName]: remove the current schedule
    -l [UserName]: list the current schedule
    -v [UserName]: list the status of the user's cron jobs

4. Disable the firewall: chkconfig iptables off

5. Disable SELinux: vi /etc/selinux/config

SELINUX=disabled
SELINUXTYPE=targeted

6. Passwordless SSH login

1)yum list | grep ssh
2)yum install -y openssh-server openssh-clients
3)service sshd start
4)chkconfig sshd on
5)ssh-keygen // generate a key pair
6) ssh-copy-id node1 // the current server can now log in to node1 without a password
Set up passwordless login from the NameNode and ResourceManager servers to all servers (NameNode + DataNodes)
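The fan-out in the last step can be scripted. A dry-run sketch, assuming the node list above; the loop only records the commands, so delete the leading echo to actually push the key (you will be prompted for each node's password once):

```shell
# Push this host's public key to every node so the NameNode/ResourceManager
# hosts can ssh everywhere without a password. Dry run: commands are echoed
# into a plan file instead of being executed.
NODES="node1 node2 node3 node4"
for n in $NODES; do
  echo ssh-copy-id "$n"        # remove the leading echo to run for real
done > /tmp/ssh-copy-plan.txt
cat /tmp/ssh-copy-plan.txt
```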

7. Fully distributed Hadoop cluster setup:

1) Configuration files
  1.1 vi + /etc/profile
    #JAVA_HOME
    export JAVA_HOME=/opt/module/jdk1.8.0_171
    #HADOOP_HOME
    export HADOOP_HOME=/opt/module/hadoop-2.6.5
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
  1.2 hadoop-env.sh mapred-env.sh yarn-env.sh
    export JAVA_HOME=/opt/module/jdk1.8.0_171
  1.3 core-site.xml
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://node1:8020</value>
    </property>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/opt/data/hadoop</value>
    </property>
  1.4 hdfs-site.xml
    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>
    <property>
      <name>dfs.namenode.secondary.http-address</name>
      <value>node2:50090</value>
    </property>
  1.5 slaves
    node2
    node3
    node4
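The slaves file is just one hostname per line and tells start-dfs.sh where to launch DataNodes. A sketch written to a temp path (on a real node it lives at $HADOOP_HOME/etc/hadoop/slaves):

```shell
# Generate the slaves file listing the three DataNode hosts from above.
SLAVES_FILE=/tmp/demo-slaves      # real path: $HADOOP_HOME/etc/hadoop/slaves
printf '%s\n' node2 node3 node4 > "$SLAVES_FILE"
wc -l < "$SLAVES_FILE"            # 3 DataNodes
```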
  1.6 Format the file system: ./bin/hdfs namenode -format
    Show help: ./bin/hdfs namenode -h
  1.7 Start the cluster: ./sbin/start-dfs.sh
  1.8 View the web UI at IP:50070:
    node1:50070
  1.9 Help:
    hdfs
    hdfs dfs

    Create a directory: hdfs dfs -mkdir -p /user/root
    List a directory: hdfs dfs -ls /
    Upload a file: hdfs dfs -put hadoop-2.6.5.tar.gz /user/root
  1.10 Stop the cluster: ./sbin/stop-dfs.sh

8. Hadoop HA setup

  1) Configuration files
    1.1 vi + /etc/profile
      #JAVA_HOME
      export JAVA_HOME=/opt/module/jdk1.8.0_171
      #HADOOP_HOME
      export HADOOP_HOME=/opt/module/hadoop-2.6.5
      #ZOOKEEPER_HOME
      export ZOOKEEPER_HOME=/opt/module/zookeeper-3.4.6
      export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin
    1.2 hadoop-env.sh mapred-env.sh yarn-env.sh
      export JAVA_HOME=/opt/module/jdk1.8.0_171
    1.3 core-site.xml
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/data/hadoop</value>
      </property>
      <property>
        <name>ha.zookeeper.quorum</name>
        <value>node2:2181,node3:2181,node4:2181</value>
      </property>
    1.4 hdfs-site.xml
      <property>
        <name>dfs.replication</name>
        <value>2</value>
      </property>
      <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
      </property>
      <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>node1:8020</value>
      </property>
      <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>node2:8020</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>node1:50070</value>
      </property>
      <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>node2:50070</value>
      </property>
      <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://node1:8485;node2:8485;node3:8485/mycluster</value>
      </property>
      <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
      </property>
      <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
      </property>
      <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <!-- if your key file is id_dsa, change the value below to id_dsa accordingly -->
        <value>/root/.ssh/id_rsa</value>
      </property>
      <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/opt/data/hadoop/journal</value>
      </property>
      <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
      </property>
    1.5 slaves
      node2
      node3
      node4
    1.6 ZooKeeper cluster setup
      zoo.cfg
      tickTime=2000
      dataDir=/opt/data/zookeeper
      clientPort=2181
      initLimit=5
      syncLimit=2
      server.1=node2:2888:3888
      server.2=node3:2888:3888
      server.3=node4:2888:3888
      /opt/data/zookeeper/myid contains 1, 2 and 3 on node2, node3 and node4 respectively
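A sketch of writing the myid file; the id must match the N of that host's server.N line in zoo.cfg. A temp dir stands in for /opt/data/zookeeper here so the snippet runs anywhere:

```shell
# On each ZooKeeper node, myid must contain the N of its server.N line:
# node2 -> 1, node3 -> 2, node4 -> 3. Demo path only.
DATA_DIR=/tmp/zookeeper-demo      # on real nodes: /opt/data/zookeeper
mkdir -p "$DATA_DIR"
echo 1 > "$DATA_DIR/myid"         # on node3: echo 2; on node4: echo 3
cat "$DATA_DIR/myid"
```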
    1.7 On every ZooKeeper node run: zkServer.sh start
      Check whether it started: zkServer.sh status
    1.8 On every JournalNode node run: hadoop-daemon.sh start journalnode //the JournalNodes must be up before the Hadoop cluster is started
    1.9 Sync the edit log
      If you already have a cluster with a single NameNode
        hdfs namenode -initializeSharedEdits (run on the already formatted NameNode)
        hadoop-daemon.sh start namenode
        hdfs namenode -bootstrapStandby (run on the NameNode that has not been formatted)
      If this is a new cluster
        hdfs namenode -format
        hadoop-daemon.sh start namenode
        hdfs namenode -bootstrapStandby (run on the NameNode that has not been formatted)
    1.10 Format ZooKeeper and start
      hdfs zkfc -formatZK (formatting on one of the NameNode hosts is enough)
      hadoop-daemon.sh start zkfc (start it on both zkfc (i.e. NameNode) hosts), or simply start everything with start-dfs.sh
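Steps 1.7-1.10 are order-sensitive: ZooKeeper and the JournalNodes must be up before any NameNode is formatted or started. A dry-run sketch of the cold-start sequence for a new cluster (the run helper only records each command; remove it to execute them on the nodes named in the comments):

```shell
# Dry run of the HA cold-start order; commands are recorded, not executed.
LOG=/tmp/ha-start-plan.txt
: > "$LOG"
run() { echo "$*" >> "$LOG"; }            # dry-run recorder; delete for real use
run zkServer.sh start                     # on node2, node3, node4
run hadoop-daemon.sh start journalnode    # on node1, node2, node3
run hdfs namenode -format                 # on node1 only (new cluster)
run hadoop-daemon.sh start namenode       # on node1
run hdfs namenode -bootstrapStandby       # on node2
run hdfs zkfc -formatZK                   # on either NameNode, once
run start-dfs.sh                          # starts NN, DN, JN and zkfc
cat "$LOG"
```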

9. YARN setup

1) Configuration files
  mapred-site.xml
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
  yarn-site.xml
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <property>
      <name>yarn.resourcemanager.ha.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.resourcemanager.cluster-id</name>
      <value>cluster1</value>
    </property>
    <property>
      <name>yarn.resourcemanager.ha.rm-ids</name>
      <value>rm1,rm2</value>
    </property>
    <property>
      <name>yarn.resourcemanager.hostname.rm1</name>
      <value>node3</value>
    </property>
    <property>
      <name>yarn.resourcemanager.hostname.rm2</name>
      <value>node4</value>
    </property>
    <property>
      <name>yarn.resourcemanager.zk-address</name>
      <value>node2:2181,node3:2181,node4:2181</value>
    </property>
2) Start-up
  start-yarn.sh (this only starts the NodeManagers)
  yarn-daemon.sh start resourcemanager (run on both ResourceManager nodes)

3) Test with wordcount
  hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount /user/jqbai/test.txt /user/jqbai/wordcount

10. Setting up a Windows development environment

Add environment variables:
  1)HADOOP_USER_NAME=root
  2)HADOOP_HOME=D:\software\hadoop-2.6.5 (this one is Windows-specific)


Reposted from www.cnblogs.com/jqbai/p/10989925.html