linux-hadoop集群搭建

A、系统:

centos7.2

hadoop-2.6.0-cdh5.15.1

http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.15.1.tar.gz

B、角色分配(修改/etc/hostname,/etc/hosts):

192.168.2.199    bigdata0000.tfpay.com    bigdata000

192.168.2.201    bigdata01.tfpay.com    bigdata01

192.168.2.202    bigdata02.tfpay.com    bigdata02

bigdata000    NameNode   DataNode    ResourceManager    Master    

bigdata01    DataNode    NodeManageer

bigdata02    DataNode    NodeManageer

所需文件:

CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel

CDH-5.15.1-1.cdh5.15.1.p0.4-el7.parcel.sha1

cloudera-manager-centos7-cm5.15.1_x86_64.tar.gz

creat_sh.sh

hadoop-2.6.0-cdh5.15.1.tar.gz

hadoop-native-64-2.6.0.tar

jdk-8u191-linux-x64.rpm

manifest.json

MySQL-5.6.26-1.linux_glibc2.5.x86_64.rpm-bundle.tar

mysql-connector-java-5.1.47-bin.jar

mysql-connector-java-5.1.47.zip

mysql-connector-java-6.0.2.jar

setup.sh

C、环境搭建

一、ssh配置免密码

ssh-keygen -t rsa(所有节点)

ssh-copy-id -i ~/.ssh/id_rsa.pub bigdata01

ssh-copy-id -i ~/.ssh/id_rsa.pub bigdata02

ssh-copy-id -i ~/.ssh/id_rsa.pub bigdata000

验证:

ssh bigdata000

ssh bigdata01

ssh bigdata02

二、JDK安装:

rpm -ivh --prefix=/app/  ./jdk-8u191-linux-x64.rpm

配置环境变量:

在~/.bash_profile写入

export JAVA_HOME=/app/jdk1.8.0_191-amd64/

export PATH=$PATH:$JAVA_HOME/bin

生效环境变量:

source ~/.bash_profile

验证:

java -version

java version "1.8.0_191"

Java(TM) SE Runtime Environment (build 1.8.0_191-b12)

Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)

javac -version

javac 1.8.0_191

三、集群搭建

解压hadoop

mkdir /app

chmod 777 /app

tar -zxvf ./hadoop-2.6.0-cdh5.15.1.tar.gz -C /app/

tar -xvf /mnt/bi/hadoop-native-64-2.6.0.tar -C /app/hadoop-2.6.0-cdh5.15.1/lib/native/

配置环境变量:

在~/.bash_profile写入

export HADOOP_HOME=/app/hadoop-2.6.0-cdh5.15.1

export PATH=$PATH:$HADOOP_HOME/bin

生效环境变量:

source ~/.bash_profile

验证:

hadoop

Usage: hadoop [--config confdir] COMMAND

       where COMMAND is one of:

  fs                   run a generic filesystem user client

.....................

配置hadoop-env.sh和core-site.xml

etc/hadoop/hadoop-env.sh

写入

export JAVA_HOME=/app/jdk1.8.0_191-amd64/

etc/hadoop/core-site.xml:

<configuration>

        <property>

                <name>fs.defaultFS</name>

        <!--<name>fs.default.name</name>-->

                <!--不能加:hdfs://bigdata000:8020,不知道为何-->

                <value>hdfs://bigdata000</value>

        </property>

    

    <!--添加一个临时文件,重启时候不会删除-->

    <property>

        <name>hadoop.tmp.dir</name>

        <value>/app/hadoop-2.6.0-cdh5.15.1/tmp</value>

    </property>

</configuration>

etc/hadoop/hdfs-site.xml:

<configuration>

<!--

副本系数默认为三个,伪分布式环境下一般不用修改

-->

<!--

    <property>

        <name>dfs.replication</name>

        <value>1</value>

    </property>

-->

<!--

配置namenode路径

-->

    <property>

        <name>dfs.namenode.name.dir</name>

        <value>/app/hadoop-2.6.0-cdh5.15.1/tmp/dfs/name</value>

    </property>

<!--

配置datanode路径

-->

    <property>

        <name>dfs.datanode.data.dir</name>

        <value>/app/hadoop-2.6.0-cdh5.15.1/tmp/dfs/data</value>

    </property>

</configuration>

etc/hadoop/yarn-site.xml:

<!--

使用mapreduce

-->

<configuration>

<!-- Site specific YARN configuration properties -->

    <property>

        <name>yarn.nodemanager.aux-services</name>

        <value>mapreduce_shuffle</value>

    </property>

    <property>

        <name>yarn.resourcemanager.hostname</name>

        <value>bigdata000</value>

    </property>

</configuration>

etc/hadoop/mapred-site.xml(需要从etc/mapred-site.xml.template复制):

<configuration>

<!--

mapreduce使用的框架

-->

    <property>

        <name>mapreduce.framework.name</name>

        <value>yarn</value>

    </property>

</configuration>

etc/hadoop/slaves写入slave node的host

bigdata000

bigdata01

bigdata02

四、master分配到slaves

scp -r /app root@bigdata02:/

scp -r /root/.bash_profile root@bigdata02:/root

scp -r /app root@bigdata01:/

scp -r /root/.bash_profile root@bigdata01:/root

五、格式化NameNode

hdfs namenode -format

六、开启关闭

/app/hadoop-2.6.0-cdh5.15.1/sbin/stop-all.sh

/app/hadoop-2.6.0-cdh5.15.1/sbin/start-all.sh

使用jps验证:

bigdata000:

3620 ResourceManager

3717 NodeManager

5461 Jps

3450 SecondaryNameNode

3197 NameNode

3294 DataNode

bigdata01:

1923 NodeManager

1819 DataNode

2253 Jps

bigdata02:

1639 DataNode

2071 Jps

1743 NodeManager

使用Web Interfaces验证:

http://192.168.2.199:50070

http://192.168.2.199:8088

使用命令行验证:

[root@bigdata000 ~]# hadoop fs -put /tmp/yarn-root-nodemanager.pid

[root@bigdata000 ~]# hadoop fs -ls /

Found 1 items

-rw-r--r--   3 root supergroup          5 2018-11-17 01:56 /yarn-root-nodemanager.pid

七、使用hadoop集群

八、异常

多次执行格式化时会出错(hdfs namenode -format):

a、datanode无法启动成功

重新格式化后NameNode的clusterID变化,与DataNode中的不一致

修改方法:从/app/hadoop-2.6.0-cdh5.15.1/tmp/dfs/name/current/VERSION中获取clusterID=CID-c043fc46-adf6-4ad9-ab73-a66e75e32567,将其修改至每一个/app/hadoop-2.6.0-cdh5.15.1/tmp/dfs/data/current/VERSION中,重启集群

b、web ui中只显示一个datanode

slaves从master scp时,/app/hadoop-2.6.0-cdh5.15.1/tmp/dfs/data/current/VERSION也拷贝过去了,每个DataNode的storageID都一致,只能显示一个,需修改每个storageID,重启集群

猜你喜欢

转载自blog.csdn.net/u013077314/article/details/84983912