1. Basic environment configuration
1. Three servers
Prepare three CentOS 7 servers; the base environment is cloned from the pseudo-distributed setup.
192.168.37.133 hop01, 192.168.37.134 hop02, 192.168.37.136 hop03
2. Set the hostname
## Set the hostname
hostnamectl set-hostname hop01
## Reboot
reboot -f
3. Hostname resolution
vim /etc/hosts
# Add the cluster nodes
192.168.37.133 hop01
192.168.37.134 hop02
192.168.37.136 hop03
4. SSH passwordless login
Configure passwordless SSH login between the three servers.
[root@hop01 ~]# ssh-keygen -t rsa
...press Enter through all prompts
[root@hop01 ~]# cd .ssh
...distribute the public key to the cluster nodes
[root@hop01 .ssh]# ssh-copy-id hop01
[root@hop01 .ssh]# ssh-copy-id hop02
[root@hop01 .ssh]# ssh-copy-id hop03
...passwordless login from hop01 to hop02
[root@hop01 ~]# ssh hop02
The same steps performed here on hop01 must also be repeated on hop02 and hop03.
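The per-host ssh-copy-id calls above can be wrapped in a small loop. This is a sketch that assumes the hostnames from /etc/hosts; it only prints the commands as a dry run, so drop the echo to execute them:

```shell
#!/bin/sh
# Distribute this node's public key to every cluster node.
# Hostnames come from the /etc/hosts entries added earlier.
NODES="hop01 hop02 hop03"
for host in $NODES; do
  # Printed as a dry run; remove the echo to copy the key for real.
  echo ssh-copy-id "$host"
done
```

Run the same loop on each of the three servers so every node trusts every other node.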
5. Synchronize time
Install the NTP components
# Install
yum install ntpdate ntp -y
# Verify the installation
rpm -qa|grep ntp
Basic management commands
# Check status
service ntpd status
# Start the service
service ntpd start
# Enable at boot
chkconfig ntpd on
Configure hop01 as the time server
# Edit the NTP configuration
vim /etc/ntp.conf
# Add the following (the subnet matches the 192.168.37.x node addresses)
restrict 192.168.37.0 mask 255.255.255.0 nomodify notrap
server 127.0.0.1
fudge 127.0.0.1 stratum 10
Configure hop02 and hop03 to synchronize time from hop01, and comment out the default internet NTP servers.
server 192.168.37.133
# server 0.centos.pool.ntp.org iburst
# server 1.centos.pool.ntp.org iburst
# server 2.centos.pool.ntp.org iburst
# server 3.centos.pool.ntp.org iburst
Add a scheduled task
[root@hop02 ~]# crontab -e
*/10 * * * * /usr/sbin/ntpdate hop01
Change the time on hop02 and hop03 to test synchronization
# Set a specific time
date -s "2018-05-20 13:14:55"
# Check the time
date
In this way, the time on hop02 and hop03 is continuously corrected and kept in sync with the hop01 server.
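Once the cron entry has had time to fire, synchronization can be spot-checked from hop02. This sketch only prints the typical verification commands (both come from the ntp packages installed above):

```shell
#!/bin/sh
# Commands to spot-check time sync on hop02 (printed as a dry run).
QUERY="ntpdate -q hop01"   # query hop01's clock without setting the local one
PEERS="ntpq -p"            # list the local ntpd's peers and offsets
echo "$QUERY"
echo "$PEERS"
```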
6. Environment cleanup
Since the three CentOS 7 servers were cloned from the pseudo-distributed virtual machine, delete the data and log folders left over from the original Hadoop environment.
[root@hop02 hadoop2.7]# rm -rf data/ logs/
2. Cluster environment setup
1. Overview of cluster configuration
Service | HDFS | YARN | Dedicated service |
---|---|---|---|
hop01 | DataNode | NodeManager | NameNode |
hop02 | DataNode | NodeManager | ResourceManager |
hop03 | DataNode | NodeManager | SecondaryNameNode |
2. Modify the configuration
vim core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://hop01:9000</value>
</property>
All three servers use the same value here, pointing fs.defaultFS at the NameNode on hop01.
vim hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hop03:50090</value>
</property>
Set the replication factor to 3 and place the SecondaryNameNode on hop03; apply the same configuration on all three servers.
vim yarn-site.xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hop02</value>
</property>
Specify the ResourceManager service on hop02.
vim mapred-site.xml
<!-- JobHistory server address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hop01:10020</value>
</property>
<!-- JobHistory web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hop01:19888</value>
</property>
This places the MapReduce JobHistory server and its web UI on hop01.
3. Cluster service configuration
Path: /opt/hadoop2.7/etc/hadoop
File: vim slaves
hop01
hop02
hop03
List the three cluster nodes here, then synchronize the same configuration to the other servers.
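The synchronization step above can be done with a small copy loop. This is a minimal sketch that assumes rsync is available and the same install path /opt/hadoop2.7 on every node; it prints the commands rather than executing them:

```shell
#!/bin/sh
# Push the edited Hadoop configuration from hop01 to the other nodes.
CONF_DIR=/opt/hadoop2.7/etc/hadoop
for host in hop02 hop03; do
  # Dry run: remove the echo to actually copy the files.
  echo rsync -av "$CONF_DIR/" "$host:$CONF_DIR/"
done
```

scp -r works as well if rsync is not installed.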
4. Format the NameNode
Note that the NameNode is configured on the hop01 service.
[root@hop01 hadoop2.7]# bin/hdfs namenode -format
5. Start HDFS
[root@hop01 hadoop2.7]# sbin/start-dfs.sh
Starting namenodes on [hop01]
hop01: starting namenode
hop03: starting datanode
hop02: starting datanode
hop01: starting datanode
Starting secondary namenodes [hop03]
hop03: starting secondarynamenode
Note the startup log: it matches the configuration exactly. The NameNode starts on hop01 and the SecondaryNameNode on hop03. Each service can be verified with the jps command.
6. Start YARN
Note that the ResourceManager is configured on hop02, so the startup command must be executed on the hop02 server.
[root@hop02 hadoop2.7]# sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager
hop03: starting nodemanager
hop01: starting nodemanager
hop02: starting nodemanager
Check the startup log; at this point all the services planned for the cluster are running.
[root@hop01 hadoop2.7]# jps
4306 NodeManager
4043 DataNode
3949 NameNode
[root@hop02 hadoop2.7]# jps
3733 ResourceManager
3829 NodeManager
3613 DataNode
[root@hop03 hadoop2.7]# jps
3748 DataNode
3928 NodeManager
3803 SecondaryNameNode
The processes running on each server match the planned configuration.
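Running jps host by host can be collapsed into one loop executed from hop01, using the passwordless SSH configured earlier. A sketch, printing the commands as a dry run:

```shell
#!/bin/sh
# Collect the Java process list from every node in one pass.
NODES="hop01 hop02 hop03"
for host in $NODES; do
  echo "== $host =="
  # Dry run: remove the echo to execute the check over SSH.
  echo ssh "$host" jps
done
```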
7. Web interface
NameNode:http://hop01:50070
SecondaryNameNode:http://hop03:50090
3. Source code address
GitHub address
https://github.com/cicadasmile/big-data-parent
GitEE address
https://gitee.com/cicadasmile/big-data-parent