Big Data Tutorial Series: Hadoop Environment Setup -- Hadoop Cluster/Distributed Setup and Configuration

This article describes how to set up and configure a Hadoop cluster/distributed environment on Windows using VMware virtual machines.

Steps in brief:

① Clone one machine (you can clone Master directly)

② Set a static IP, the hostname, and the IP mapping

③ Configure passwordless SSH

④ Modify the configuration files

⑤ Start the cluster/distributed environment

Steps in detail:

1. Clone the machine

In VMware, choose VM - Manage - Clone.

2. Set a static IP, the hostname, and the IP mapping

If you don't know the name of the network interface configuration file, you can list it with the following command:
ll /etc/sysconfig/network-scripts/ | grep ifcfg-en
The IP address is set in the interface configuration file; open it with the vi editor:
vi /etc/sysconfig/network-scripts/ifcfg-ens33
After modifying the interface file, restart the network service with:
systemctl restart network
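
For reference, here is a minimal sketch of the relevant static-IP entries, the hostname setting, and the /etc/hosts mapping. The concrete addresses are examples only (192.168.234.136 matches the ResourceManager IP used later in yarn-site.xml; the Slave1 address and the gateway are assumptions for illustration), so substitute your own VMware network values:

# Key entries in /etc/sysconfig/network-scripts/ifcfg-ens33 on Master
BOOTPROTO=static           # static address instead of DHCP
ONBOOT=yes                 # bring the interface up at boot
IPADDR=192.168.234.136     # Master's IP; e.g. 192.168.234.137 on Slave1 (example values)
NETMASK=255.255.255.0
GATEWAY=192.168.234.2      # typical VMware NAT gateway (assumption)
DNS1=192.168.234.2

# Set the hostname on each machine
hostnamectl set-hostname Master    # run on Master
hostnamectl set-hostname Slave1    # run on Slave1

# IP mapping in /etc/hosts, identical on both machines
192.168.234.136 Master
192.168.234.137 Slave1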

3. Configure passwordless SSH login

Check whether SSH is installed

Both cluster and single-node modes rely on SSH login (similar to remote login: you log in to a Linux host and run commands on it). CentOS normally has the SSH client and SSH server installed by default. Log in as the hadoop user and verify with the following command:

rpm -qa | grep ssh

If the output shows that both the ssh client and ssh server are installed, nothing more is needed. If not, install them with the following commands; yum will detect and install the remaining dependencies by itself:

sudo yum install openssh-clients
sudo yum install openssh-server

Configure passwordless SSH login

Goal: from Master, ssh Slave1 should log straight in to the slave machine Slave1 without asking for a password.

Steps:

①. Generate a key pair on the Master host

cd ~/.ssh/  # go to the .ssh directory under the hadoop user's home directory; if it doesn't exist yet, run ssh localhost once first
ssh-keygen -t rsa    # just press Enter at every prompt

②. Add the public key generated on Master to the authorized list (authorized_keys)

cat id_rsa.pub                       # inspect the generated public key
cat id_rsa.pub >> authorized_keys    # add it to the authorized list
chmod 600 authorized_keys    # restrict the file permissions; sshd's strict mode rejects an authorized_keys file that other users could modify, and passwordless login would then fail

③. From Master, remotely copy the authorization file into the .ssh directory under the hadoop user's home directory on the slave Slave1 (/home/hadoop/.ssh/)

Log in to Slave1 as the hadoop user: before the remote copy, the file is not there yet.

Perform the remote copy on the Master host:

scp authorized_keys hadoop@Slave1:~/.ssh/

On Slave1, check whether the authorized_keys file now exists under /home/hadoop/.ssh/.

Test: from the Master host, log in to Slave1 with ssh Slave1

ssh Slave1

Before the configuration there is no authorized_keys file;

after the configuration the authorized_keys file is there.

As you can see, the login succeeds without entering a password.
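
Incidentally, steps ② and ③ can be combined into a single command with ssh-copy-id, which appends Master's public key to the remote authorized_keys and sets the permissions for you:

ssh-copy-id hadoop@Slave1    # asks for Slave1's password once; afterwards ssh Slave1 needs no password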

4. Modify the configuration files (core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, slaves)

①core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <!-- Base directory for Hadoop's temporary files (the NameNode and DataNode store their data under hadoop.tmp.dir by default) -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
    </property>
 
    <!-- URI of the NameNode (the default file system) -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master:9000</value>
    </property>


</configuration>

②hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<!-- SecondaryNameNode web address; Master can be replaced with an IP. 50090 is its default port (50070 belongs to the NameNode web UI and would conflict on the same host) -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Master:50090</value>
    </property>
    <!-- Number of replicas kept for each file block; the default is 3 -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- Local filesystem path where the NameNode stores the file system metadata -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <!-- Local filesystem path where the DataNode stores its data blocks (this value is used by the DataNodes, not the NameNode) -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>

</configuration>

③yarn-site.xml

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->
    <!-- Hostname or IP of the ResourceManager machine (Master's IP in this setup) -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>192.168.234.136</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

</configuration>

④mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
 
    <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
            <description>Set the execution framework to Hadoop YARN</description>
    </property>
    <property>
            <name>mapreduce.jobhistory.address</name>
            <value>Master:10020</value>
            <description>Master can be replaced with an IP</description>
    </property>
    <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>Master:19888</value>
            <description>Master can be replaced with an IP</description>
    </property>

</configuration>
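
A small pitfall: Hadoop 2.x ships only mapred-site.xml.template, so if mapred-site.xml does not exist yet, create it from the template first (assuming Hadoop is installed under /usr/local/hadoop, as in the paths above):

cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml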

⑤slaves

The slaves file lists one slave (DataNode) hostname per line; here it contains a single entry:

Slave1
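
The modified configuration files must be present on every node, not just on Master. A minimal sketch of syncing them, assuming the hadoop user owns /usr/local/hadoop on both machines, is to push them to Slave1 over the passwordless SSH configured earlier:

cd /usr/local/hadoop/etc/hadoop
scp core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml slaves hadoop@Slave1:/usr/local/hadoop/etc/hadoop/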

5. Start the cluster/distributed environment

start-all.sh    # start all daemons; a deprecated wrapper around start-dfs.sh and start-yarn.sh
stop-all.sh     # stop all daemons
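
If this is the first start with the new paths, format the NameNode once on Master before starting; note this wipes existing HDFS metadata, so only run it on a fresh cluster:

hdfs namenode -format

After start-all.sh finishes, jps shows which daemons each machine runs; with the configuration above you would expect roughly the following (Jps itself also appears in the list):

jps    # on Master: NameNode, SecondaryNameNode, ResourceManager
jps    # on Slave1: DataNode, NodeManager
hdfs dfsadmin -report    # should report Slave1 as a live DataNode
mr-jobhistory-daemon.sh start historyserver    # the JobHistory server from mapred-site.xml is not started by start-all.sh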
You can see that the Master machine is now acting as the NameNode,

and the Slave1 machine is now acting as the DataNode.

In the earlier single-node setup, Master acted as both master and slave;

now it is one master and one slave.

And that's it: the Hadoop cluster/distributed installation and configuration is done. If you hit problems while following along, or spot a mistake on my part, just leave a comment.

Reposted from blog.csdn.net/zjh_746140129/article/details/81384229