Hadoop framework: building a distributed environment in cluster mode

Source code for this article: GitHub and GitEE (repository addresses are listed at the end of the post).

1. Basic environment configuration

1. Three servers

Prepare three CentOS 7 servers; the base environment is cloned from the pseudo-distributed setup.

192.168.37.133 hop01, 192.168.37.134 hop02, 192.168.37.136 hop03

2. Set the host name

## set the host name
hostnamectl set-hostname hop01
## reboot
reboot -f

3. Host name communication

vim /etc/hosts
# add the cluster nodes
192.168.37.133 hop01
192.168.37.134 hop02
192.168.37.136 hop03
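
After /etc/hosts has been updated on all three servers, a quick connectivity check confirms that the host names resolve (a minimal sketch using the names defined above):

# verify that each host name resolves and is reachable
ping -c 1 hop01
ping -c 1 hop02
ping -c 1 hop03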

4. SSH password-free login

Configure passwordless SSH login between the three servers.

[root@hop01 ~]# ssh-keygen -t rsa
... press Enter through every prompt
[root@hop01 ~]# cd .ssh
... distribute the key to the specified cluster servers
[root@hop01 .ssh]# ssh-copy-id hop01
[root@hop01 .ssh]# ssh-copy-id hop02
[root@hop01 .ssh]# ssh-copy-id hop03
... passwordless login from hop01 to hop02
[root@hop01 ~]# ssh hop02

This operation, performed here on hop01, must also be repeated on hop02 and hop03; on hop02, for example, it looks like the sketch below.
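
A sketch of the same sequence on hop02, assuming the same root user and host names:

[root@hop02 ~]# ssh-keygen -t rsa
[root@hop02 ~]# ssh-copy-id hop01
[root@hop02 ~]# ssh-copy-id hop02
[root@hop02 ~]# ssh-copy-id hop03
# verify that no password prompt appears
[root@hop02 ~]# ssh hop03 hostname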

5. Synchronize time

Install the NTP components

# install
yum install ntpdate ntp -y
# check the installed packages
rpm -qa|grep ntp

Basic management commands

# check the status
service ntpd status
# start the service
service ntpd start
# enable at boot
chkconfig ntpd on

Configure hop01 as the time server

# edit the ntp configuration
vim /etc/ntp.conf
# add the following
restrict 192.168.0.0 mask 255.255.255.0 nomodify notrap
server 127.0.0.1
fudge 127.0.0.1 stratum 10
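
After editing the configuration on hop01, ntpd needs to be restarted so the new settings take effect (a short sketch using the service commands shown earlier):

# restart ntpd on hop01 to load the new configuration
service ntpd restart
# list the time sources ntpd is currently using
ntpq -p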

On hop02 and hop03, change the time synchronization so that time is pulled from hop01, and comment out the public internet NTP servers.

server 192.168.37.133
# server 0.centos.pool.ntp.org iburst
# server 1.centos.pool.ntp.org iburst
# server 2.centos.pool.ntp.org iburst
# server 3.centos.pool.ntp.org iburst

Create a scheduled task (shown here on hop02; do the same on hop03)

[root@hop02 ~]# crontab -e
*/10 * * * * /usr/sbin/ntpdate hop01

Manually change the time on hop02 and hop03

# set a specific time
date -s "2018-05-20 13:14:55"
# check the time
date

In this way, the time on hop02 and hop03 is continuously corrected to stay synchronized with hop01.
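
Whether hop02 and hop03 are actually pulling time from hop01 can be spot-checked manually (a minimal sketch, assuming the ntp packages installed above):

# query hop01 without adjusting the local clock
[root@hop02 ~]# ntpdate -q hop01
# compare the local time after the next cron run
[root@hop02 ~]# date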

6. Environment cleanup

The three CentOS 7 servers were cloned from the pseudo-distributed virtual machine, so delete the data and logs directories left over from the original Hadoop configuration.

[root@hop02 hadoop2.7]# rm -rf data/ logs/
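
Because passwordless SSH is already in place, the cleanup can be run on all three servers from a single node (a sketch; the /opt/hadoop2.7 path follows the layout used later in this article):

# remove the leftover data and log directories on every node
for host in hop01 hop02 hop03; do
    ssh $host "rm -rf /opt/hadoop2.7/data /opt/hadoop2.7/logs"
done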

2. Cluster environment setup

1. Overview of cluster configuration

Server    HDFS storage    YARN scheduling    Dedicated service
hop01     DataNode        NodeManager        NameNode
hop02     DataNode        NodeManager        ResourceManager
hop03     DataNode        NodeManager        SecondaryNameNode

2. Modify the configuration

vim core-site.xml

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hop01:9000</value>
</property>

This file must be updated on all three servers, with fs.defaultFS pointing to the NameNode host (hop01) on every node.
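
Once the file has been distributed, the effective value can be checked on any node with the getconf tool (a minimal sketch):

# print the configured default file system; hdfs://hop01:9000 is expected
[root@hop01 hadoop2.7]# bin/hdfs getconf -confKey fs.defaultFS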

vim hdfs-site.xml

<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>

<property>
      <name>dfs.namenode.secondary.http-address</name>
      <value>hop03:50090</value>
</property>

Here the replication factor is set to 3 and the SecondaryNameNode is assigned to hop03; the same hdfs-site.xml is used on all three servers.

vim yarn-site.xml

<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hop02</value>
</property>

Specify the ResourceManager service on hop02.

vim mapred-site.xml

<!-- JobHistory server address -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hop01:10020</value>
</property>

<!-- JobHistory web UI address -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hop01:19888</value>
</property>

This places the JobHistory server and its web UI on hop01.
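
Note that the JobHistory server is not started by start-dfs.sh or start-yarn.sh; once the cluster is running (see the startup steps below), it can be started separately on hop01 (a sketch using the Hadoop 2.x sbin script):

# start the JobHistory server on hop01
[root@hop01 hadoop2.7]# sbin/mr-jobhistory-daemon.sh start historyserver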

3. Cluster service configuration

Path: /opt/hadoop2.7/etc/hadoop

File: vim slaves

hop01
hop02
hop03

This lists the three nodes of the cluster. Synchronize the same configuration to the other servers, for example via scp as sketched below.
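
With passwordless SSH in place, one way to synchronize the configuration directory to the other two servers is scp (a minimal sketch; paths follow the layout above):

# copy the configuration directory from hop01 to hop02 and hop03
scp -r /opt/hadoop2.7/etc/hadoop root@hop02:/opt/hadoop2.7/etc/
scp -r /opt/hadoop2.7/etc/hadoop root@hop03:/opt/hadoop2.7/etc/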

4. Format the NameNode

Note that the NameNode is configured on the hop01 service.

[root@hop01 hadoop2.7]# bin/hdfs namenode -format

5. Start HDFS

[root@hop01 hadoop2.7]# sbin/start-dfs.sh
Starting namenodes on [hop01]
hop01: starting namenode
hop03: starting datanode
hop02: starting datanode
hop01: starting datanode
Starting secondary namenodes [hop03]
hop03: starting secondarynamenode

Pay attention to the startup output: it matches the configuration exactly. The NameNode starts on hop01 and the SecondaryNameNode on hop03. Each service can then be verified with the jps command.
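
Besides jps, the DataNode registration can be confirmed from the NameNode (a minimal sketch):

# report the live DataNodes registered with the NameNode
[root@hop01 hadoop2.7]# bin/hdfs dfsadmin -report
# three live DataNodes (hop01, hop02, hop03) are expected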

6. Start YARN

Note that the ResourceManager is configured on hop02, so execute the startup command on the hop02 server.

[root@hop02 hadoop2.7]# sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager
hop03: starting nodemanager
hop01: starting nodemanager
hop02: starting nodemanager

Pay attention to the startup log; at this point, all of the services planned for the cluster have been started.

[root@hop01 hadoop2.7]# jps
4306 NodeManager
4043 DataNode
3949 NameNode
[root@hop02 hadoop2.7]# jps
3733 ResourceManager
3829 NodeManager
3613 DataNode
[root@hop03 hadoop2.7]# jps
3748 DataNode
3928 NodeManager
3803 SecondaryNameNode

Checking the processes on each server shows that they match the planned configuration.
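
Similarly, the NodeManager registration with the ResourceManager can be checked from hop02 (a minimal sketch):

# list the NodeManagers registered with the ResourceManager
[root@hop02 hadoop2.7]# bin/yarn node -list
# three nodes in RUNNING state are expected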

7. Web interface

NameNode: http://hop01:50070
SecondaryNameNode: http://hop03:50090

3. Source code address

GitHub address:
https://github.com/cicadasmile/big-data-parent
GitEE address:
https://gitee.com/cicadasmile/big-data-parent

