Installing a Hadoop 2.9 cluster addressed by IP

Install Hadoop

Install prerequisite libraries

$ sudo apt-get install ssh 
$ sudo apt-get install rsync
$ sudo apt-get install openjdk-8-jdk

Download and unpack Hadoop

$ wget http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.9.0/hadoop-2.9.0.tar.gz
$ tar -zxvf hadoop-2.9.0.tar.gz
$ sudo mv hadoop-2.9.0 /opt/hadoop
$ cd /opt/hadoop

Common configuration files

etc/hadoop/core-site.xml
etc/hadoop/hdfs-site.xml
etc/hadoop/yarn-site.xml
etc/hadoop/mapred-site.xml

Cluster settings

Suppose the master server's address is 192.168.71.156 and the slave server's address is 192.168.71.158.

# On the master server, add the following line to /etc/hosts (ubuntu is my hostname)
""
127.0.0.1  192.168.71.156 localhost ubuntu
""

# On each slave server, add the following line to /etc/hosts (ubuntu is my hostname)
""
127.0.0.1  192.168.71.158 localhost ubuntu
""

Set up SSH communication

# Generate an SSH key pair
> ssh-keygen -t rsa

# Copy the slave server's public key to the master server
> scp /home/hdgs/.ssh/id_rsa.pub hdgs@192.168.71.156:~/.ssh/id_rsa.pub.158

# Build authorized_keys on the master server
> cat ~/.ssh/id_rsa.pub* >> ~/.ssh/authorized_keys

# Send the master server's authorized_keys to the slave server
> scp /home/hdgs/.ssh/authorized_keys hdgs@192.168.71.158:~/.ssh/

# Test SSH (this should now log in without a password)
> ssh 192.168.71.158

Set JAVA_HOME

$ sudo vim etc/hadoop/hadoop-env.sh 
""
export JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk-amd64"
""

Configure core-site.xml

Modify the Hadoop core configuration file core-site.xml, which sets the address and port of the NameNode on the master node.

> vim /opt/hadoop/etc/hadoop/core-site.xml 

Add the following configuration:

<configuration>  
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.71.156:9000</value>
    </property>
</configuration>
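
The value can be read back without starting any daemon, which is a cheap way to confirm the file is being picked up from etc/hadoop:

$ bin/hdfs getconf -confKey fs.defaultFS
# expected output: hdfs://192.168.71.156:9000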

Configure hdfs-site.xml

# Create the namenode and datanode data directories
$ mkdir /opt/hadoop/namenode
$ mkdir /opt/hadoop/datanode

> vim /opt/hadoop/etc/hadoop/hdfs-site.xml

Add the following configuration:

<configuration>  
    <!-- When using IP addresses instead of hostnames, the DataNode hostname check must be disabled -->
    <property>
        <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
        <value>false</value>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/opt/hadoop/namenode</value>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/hadoop/datanode</value>
    </property>

    <!-- Set the HTTP address of the namenode -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>192.168.71.156:50070</value>
    </property>

    <!-- Set the HTTP address of the secondarynamenode -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>192.168.71.158:50090</value>
    </property>

    <!-- Set the datanode HTTP port used by webhdfs -->
    <property>
        <name>dfs.datanode.http.address</name>
        <value>0.0.0.0:50075</value>
    </property>

    <!-- Enable webhdfs -->
    <property> 
        <name>dfs.webhdfs.enabled</name> 
        <value>true</value> 
    </property> 
</configuration> 
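
Because dfs.webhdfs.enabled is true, HDFS can also be queried over plain HTTP once the daemons are running (see the start-up section below). A minimal example that lists the root directory through WebHDFS:

$ curl "http://192.168.71.156:50070/webhdfs/v1/?op=LISTSTATUS"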

Configure yarn-site.xml

$ sudo vim etc/hadoop/yarn-site.xml 
""
<configuration>
    <!-- Set which node runs the resourcemanager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>192.168.71.156</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
""

Configure mapred-site.xml

$ sudo cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
$ sudo vim etc/hadoop/mapred-site.xml
""
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
""

Configure slaves

$ vim etc/hadoop/slaves
""
192.168.71.158
""

Hadoop folder distribution

Distribute Hadoop

Distribute the Hadoop folder configured on the master server to all slave servers.

# Make sure the target directory already exists on the slave server
> sudo mkdir /opt/hadoop
> sudo chown hdgs:hdgs /opt/hadoop/

# ccnu_resource is the <cluster_name>; choose any name you like
$ rm -rf namenode/*
$ rm -rf datanode/*
$ bin/hdfs namenode -format ccnu_resource

> scp -r /opt/hadoop hdgs@192.168.71.158:/opt/
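
A quick spot check that the copy landed on the slave with the expected owner (paths match the commands above):

$ ssh hdgs@192.168.71.158 'ls -ld /opt/hadoop'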

Start up

> sbin/start-dfs.sh
> sbin/start-yarn.sh

$ jps
# master
""
21298 Jps
21027 ResourceManager
20724 NameNode
""

# slave
""
12656 NodeManager
12418 DataNode
12811 Jps
12557 SecondaryNameNode
""

Web UIs:
http://192.168.71.156:50070/
http://192.168.71.156:8088/cluster
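
As an end-to-end smoke test of HDFS, YARN and MapReduce together, the examples jar bundled with the 2.9.0 tarball can be run (the jar path assumes the stock binary distribution):

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar pi 2 10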

Shut down

> ./sbin/stop-dfs.sh
> ./sbin/stop-yarn.sh 

Troubleshooting

Incompatible clusterIDs in /opt/hadoop/datanode

This error means the clusterID stored by a DataNode no longer matches the NameNode's, which typically happens after the NameNode has been re-formatted. Clear the data directories and format again:

# Run on both the master and slave servers
$ rm -rf namenode/*
$ rm -rf datanode/*

# Run on the master server only
$ bin/hdfs namenode -format ccnu_resource
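
Then bring the daemons back up as in the start-up section:

$ sbin/start-dfs.sh
$ sbin/start-yarn.sh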
