Linux Cluster Architecture: Configuring a High-Availability Cluster with keepalived

I. Introduction to Clusters

1. Overview

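  • Clusters fall into two broad categories by function: high availability and load balancing.
  • A high-availability cluster usually consists of two servers: one does the work and the other stands by as a spare. When the working machine goes down, the spare takes over and keeps the service running.
  • Open-source software for high availability includes heartbeat and keepalived.
  • A load-balancing cluster needs one server acting as the dispatcher, which distributes user requests to the back-end servers. Apart from the dispatcher, every machine in the cluster is a server that handles user requests, and there must be at least two of them.
  • Open-source software for load balancing includes LVS, keepalived, haproxy, and nginx; commercial products include F5 and Netscaler.
    Large companies such as Alibaba and Tencent cannot tolerate service outages, so their core services run on high-availability setups. heartbeat has many bugs on CentOS 6 and has not been updated for a long time. keepalived offers load balancing in addition to high availability. The strength of commercial load balancers is that they sustain very high concurrency and remain stable under heavy traffic, whereas the stability of a load balancer built from open-source software depends on the stability of the servers it runs on.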

2. Introduction to keepalived

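  • Here we use keepalived to build the high-availability cluster, because heartbeat has some problems on CentOS 6 that would interfere with the experiment.
  • keepalived implements high availability through VRRP (Virtual Router Redundancy Protocol).
  • The protocol groups several routers with identical functions into one group containing one master and N (N>=1) backups.
  • The master sends VRRP packets to the backups by multicast. When a backup stops receiving VRRP packets from the master, it assumes the master is down, and a new master is then chosen according to the priorities of the backups.
  • keepalived has three main modules: core, check, and vrrp. The core module is the heart of keepalived and handles starting and maintaining the main process as well as loading and parsing the global configuration file; the check module performs health checks; the vrrp module implements the VRRP protocol.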
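
The priority-based choice of a new master described above can be sketched in a few lines of shell. This is only an illustration of the rule, not how keepalived implements it: the function name elect_master and the node names backup-a and backup-b are hypothetical, and ties are simply resolved in favor of the first entry, which is a simplification.

```shell
#!/bin/bash
# Hypothetical helper: pick the new master from "name:priority" pairs.
# Mirrors the rule in the text: the candidate with the highest priority wins.
elect_master() {
    local best_name="" best_prio=-1 entry name prio
    for entry in "$@"; do
        name=${entry%%:*}     # text before the first colon
        prio=${entry##*:}     # text after the last colon
        if [ "$prio" -gt "$best_prio" ]; then
            best_name=$name
            best_prio=$prio
        fi
    done
    printf '%s\n' "$best_name"
}

# Example: the old master is gone and two backups remain.
elect_master "backup-a:90" "backup-b:80"   # prints: backup-a
```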

3. Configuring a high-availability cluster with keepalived

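  1. Prepare two machines, 130 and 136 (192.168.75.130 and 192.168.75.136); 136 is the master and 130 is the backup.
  2. Run yum install -y keepalived on both machines.
  3. Install nginx on both machines. 136 already has nginx compiled from source; on 130 install it with yum: yum install -y nginx
    Using keepalived for high availability really means making a particular service on the server highly available. Here that service is nginx, chosen because many companies use nginx as their load balancer. The following commands check whether nginx is already installed on 130:
# rpm -qa |grep nginx         # check whether an nginx rpm package is installed
# sudo yum install epel-release         # if yum cannot find an nginx package, run these two commands first
# yum update
# yum install -y nginx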
  1. Set the virtual IP (VIP) to 192.168.75.100.
  2. Edit the keepalived configuration file on 136.
    A keepalived.conf file already exists under /etc/keepalived/. First empty its original contents with the following command:
# > /etc/keepalived/keepalived.conf

The result:

[root@aminglinux ~]# > !$
> /etc/keepalived/keepalived.conf
[root@aminglinux ~]# cat /etc/keepalived/keepalived.conf
[root@aminglinux ~]# 

Here we use a different configuration template, available at https://coding.net/u/aminglinux/p/aminglinux-book/git/blob/master/D21Z/master_keepalived.conf . Write the following content into /etc/keepalived/keepalived.conf:

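global_defs {  # global parameters
    notification_email {  # address to notify when a problem occurs
        [email protected]
    }
    notification_email_from [email protected]  # sender address for notification mail
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id LVS_DEVEL
}
vrrp_script chk_nginx {  # checks whether the service is healthy
    script "/usr/local/sbin/check_ng.sh"  # check script; it tries to start nginx if it is not running
    interval 3  # run the check every 3 seconds
}
vrrp_instance VI_1 {  # settings for this VRRP instance
    state MASTER  # role of this machine; use BACKUP on the backup
    interface ens33  # network interface used to send VRRP advertisements
    virtual_router_id 51  # virtual router ID; must match the value on the backup
    priority 100  # priority; the backup uses a different (lower) value
    advert_int 1
    authentication {  # authentication settings
        auth_type PASS  # password authentication
        auth_pass aminglinux>com  # the password
    }
    virtual_ipaddress {  # shared virtual IP, the one the domain name resolves to; if the master goes down, the backup binds it immediately
        192.168.75.100
    }
    track_script {  # run the check script defined above
        chk_nginx
    }
}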

  1. On 136, create the monitoring script /usr/local/sbin/check_ng.sh; its content comes from https://coding.net/u/aminglinux/p/aminglinux-book/git/blob/master/D21Z/master_check_ng.sh

#!/bin/bash
# Timestamp used in log entries
d=$(date --date today +%Y%m%d_%H:%M:%S)
# Count nginx processes
n=$(ps -C nginx --no-heading | wc -l)
# If the count is 0, start nginx and count again.
# If it is still 0, nginx cannot start, so stop keepalived.
if [ "$n" -eq 0 ]; then
    /etc/init.d/nginx start
    n2=$(ps -C nginx --no-heading | wc -l)
    if [ "$n2" -eq 0 ]; then
        echo "$d nginx down,keepalived will stop" >> /var/log/check_ng.log
        systemctl stop keepalived
    fi
fi
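
The branching in the script above can be summarized as a small pure function, which makes the three possible outcomes easy to see without touching a real nginx. This is a sketch only; the function name check_decision and the outcome labels are hypothetical and do not appear in the original script.

```shell
#!/bin/bash
# Hypothetical summary of the check script's logic.
# Arguments: nginx process count before, and after, a restart attempt.
check_decision() {
    local before=$1 after=$2
    if [ "$before" -gt 0 ]; then
        echo "healthy"            # nginx is running, nothing to do
    elif [ "$after" -gt 0 ]; then
        echo "restarted"          # the restart attempt succeeded
    else
        echo "stop-keepalived"    # nginx cannot start, so keepalived is stopped
    fi
}

check_decision 2 2   # prints: healthy
check_decision 0 3   # prints: restarted
check_decision 0 0   # prints: stop-keepalived
```

Only the last outcome stops keepalived on the node, which is what allows the other machine to take over the VIP.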

  1. Give the script 755 permissions. Without execute permission keepalived cannot run the check script, and the service will not work properly.
[root@aminglinux ~]# chmod 755 !$    
chmod 755 /usr/local/sbin/check_ng.sh   
  1. Start the service on 136: systemctl start keepalived
[root@aminglinux ~]# systemctl start keepalived   
[root@aminglinux ~]# ps aux |grep keepalived  
root      1740  0.0  0.0 120740  1404 ?        Ss   01:31   0:00 /usr/sbin/keepalived -D   
root      1741  0.0  0.1 127476  3272 ?        S    01:31   0:00 /usr/sbin/keepalived -D   
root      1744  0.0  0.1 131780  3116 ?        S    01:31   0:00 /usr/sbin/keepalived -D   
root      1878  0.0  0.0 112676   988 pts/0    S+   01:32   0:00 grep --color=auto keepalived   
[root@aminglinux ~]# ps aux |grep nginx    
root      2365  0.0  0.0  20616   712 ?        Ss   01:33   0:00 nginx: master process /usr/local/nginx/sbin/nginx -c /usr/local/nginx/conf/nginx.conf   
nobody    2366  0.2  0.1  23060  3292 ?        S    01:33   0:00 nginx: worker process   
nobody    2367  0.2  0.1  23060  3296 ?        S    01:33   0:00 nginx: worker process   
root      2381  0.0  0.0 112676   984 pts/0    S+   01:33   0:00 grep --color=auto nginx   

If we now stop nginx, keepalived's check script restarts it automatically:

[root@aminglinux ~]# /etc/init.d/nginx stop
Stopping nginx (via systemctl):                            [  OK  ]
[root@aminglinux ~]# ps aux |grep nginx
root     15301  0.0  0.0  20616   708 ?        Ss   02:41   0:00 nginx: master process /usr/local/nginx/sbin/nginx -c /usr/local/nginx/conf/nginx.conf
nobody   15302  0.0  0.1  23060  3292 ?        S    02:41   0:00 nginx: worker process
nobody   15303  0.0  0.1  23060  3280 ?        S    02:41   0:00 nginx: worker process
root     15308  0.0  0.0 112676   984 pts/0    S+   02:41   0:00 grep --color=auto nginx
[root@aminglinux ~]# 

The ip addr command also shows that the ens33 interface now has an additional IP, 192.168.75.100, bound to it. This is the IP defined in keepalived.conf.

[root@aminglinux ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:21:5e:c0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.75.136/24 brd 192.168.75.255 scope global ens33
       valid_lft forever preferred_lft forever
    inet 192.168.75.100/32 scope global ens33
       valid_lft forever preferred_lft forever
    inet 192.168.75.150/24 brd 192.168.75.255 scope global secondary ens33:0
       valid_lft forever preferred_lft forever
    inet6 fe80::d652:b567:6190:8f28/64 scope link 
       valid_lft forever preferred_lft forever
3: ens37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:21:5e:ca brd ff:ff:ff:ff:ff:ff
[root@aminglinux ~]#
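
Rather than reading the ip addr output by eye, the same check can be scripted. The sketch below assumes a hypothetical helper named vip_bound that reads ip addr output on standard input; the sample text is copied from the output above so the example runs without a real interface.

```shell
#!/bin/bash
# Hypothetical helper: read `ip addr` output on stdin and succeed
# if the given IPv4 address is bound to some interface.
vip_bound() {
    local vip=$1
    # -F treats the pattern as a fixed string, so dots are not wildcards;
    # the trailing slash avoids matching a longer address with the same prefix.
    grep -qF "inet ${vip}/"
}

# Self-contained demonstration using two lines from the output above.
sample='    inet 192.168.75.136/24 brd 192.168.75.255 scope global ens33
    inet 192.168.75.100/32 scope global ens33'
if printf '%s\n' "$sample" | vip_bound 192.168.75.100; then
    echo "VIP bound"
else
    echo "VIP not bound"
fi

# On a real machine one might run:
#   ip -4 addr show dev ens33 | vip_bound 192.168.75.100 && echo "VIP bound"
```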

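Next we configure the backup. Before doing so, check whether a firewall or SELinux is enabled on the master and the backup; both need to be turned off.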

  1. On 130, edit the configuration file; its content comes from https://coding.net/u/aminglinux/p/aminglinux-book/git/blob/master/D21Z/backup_keepalived.conf

global_defs {
    notification_email {
        [email protected]
    }
    notification_email_from [email protected]
    smtp_server 127.0.0.1
    smtp_connect_timeout 30
    router_id LVS_DEVEL
}
vrrp_script chk_nginx {
    script "/usr/local/sbin/check_ng.sh"
    interval 3
}
vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 51
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass aminglinux>com
    }
    virtual_ipaddress {
        192.168.75.100
    }
    track_script {
        chk_nginx
    }
}
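
The master and backup files differ only in state and priority; the virtual_router_id, password, and VIP must be identical on both. One way to keep them in step is to generate the vrrp_instance block from a single template. The following is a sketch under that assumption; the function name render_vrrp_instance and its argument order are hypothetical.

```shell
#!/bin/bash
# Hypothetical generator: print a vrrp_instance block for one node.
# Arguments: state (MASTER or BACKUP), priority, interface, VIP.
render_vrrp_instance() {
    local state=$1 priority=$2 iface=$3 vip=$4
    cat <<EOF
vrrp_instance VI_1 {
    state $state
    interface $iface
    virtual_router_id 51
    priority $priority
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass aminglinux>com
    }
    virtual_ipaddress {
        $vip
    }
    track_script {
        chk_nginx
    }
}
EOF
}

# Print the block for the backup node used in this walkthrough.
render_vrrp_instance BACKUP 90 ens33 192.168.75.100
```

Calling it with MASTER 100 instead yields the master's block, so the shared values cannot drift apart between the two machines.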

  1. On 130, create the monitoring script /usr/local/sbin/check_ng.sh; its content comes from https://coding.net/u/aminglinux/p/aminglinux-book/git/blob/master/D21Z/backup_check_ng.sh

#!/bin/bash
# Timestamp used in log entries
d=$(date --date today +%Y%m%d_%H:%M:%S)
# Count nginx processes
n=$(ps -C nginx --no-heading | wc -l)
# If the count is 0, start nginx and count again.
# If it is still 0, nginx cannot start, so stop keepalived.
if [ "$n" -eq 0 ]; then
    systemctl start nginx
    n2=$(ps -C nginx --no-heading | wc -l)
    if [ "$n2" -eq 0 ]; then
        echo "$d nginx down,keepalived will stop" >> /var/log/check_ng.log
        systemctl stop keepalived
    fi
fi

  1. Give the script 755 permissions.
[root@localhost ~]# chmod 755 !$
chmod 755 /usr/local/sbin/check_ng.sh
[root@localhost ~]# 
  1. Start the service on 130 as well: systemctl start keepalived
[root@localhost ~]# systemctl start keepalived
[root@localhost ~]# ps aux |grep keep
root     10928  0.2  0.0 120740  1404 ?        Ss   02:20   0:00 /usr/sbin/keepalived -D
root     10929  0.2  0.1 120740  2756 ?        S    02:20   0:00 /usr/sbin/keepalived -D
root     10930  0.2  0.1 125104  2840 ?        S    02:20   0:00 /usr/sbin/keepalived -D
root     10945  0.0  0.0 112676   984 pts/1    S+   02:20   0:00 grep --color=auto keep
[root@localhost ~]# 

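With the configuration complete, enter 192.168.75.136 in a browser to reach the master; the page served is /data/wwwroot/default/index.html. Then visit 192.168.75.130 to reach the backup, whose default page is /usr/share/nginx/html/index.html. Finally, visit the VIP 192.168.75.100: the page served is the master's.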
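
The same comparison can be made from the command line by looking at the Server response header of each address. The sketch below assumes a hypothetical helper named server_header that reads HTTP response headers on standard input; the header values in the demonstration are made up and are not taken from the machines in this walkthrough.

```shell
#!/bin/bash
# Hypothetical helper: read HTTP response headers on stdin (for example
# from `curl -sI`) and print the value of the Server header.
server_header() {
    # Strip carriage returns, then match the header name case-insensitively.
    tr -d '\r' | awk -F': ' 'tolower($1) == "server" { print $2; exit }'
}

# Self-contained demonstration with made-up headers.
printf 'HTTP/1.1 200 OK\r\nServer: nginx/1.12.2\r\nContent-Type: text/html\r\n\r\n' | server_header

# Against the three addresses used above one might run:
#   for ip in 192.168.75.136 192.168.75.130 192.168.75.100; do
#       curl -sI "http://$ip/" | server_header
#   done
```

If the two nginx installations report different versions, the value returned for the VIP shows which machine is currently answering.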

4. Testing high availability

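  1. First establish how the nginx instances on the two machines differ, for example by checking the nginx version with curl -I.
  2. Test 1: stop the nginx service on the master.
    Whether nginx is stopped on the master or the backup, keepalived restarts it automatically, thanks to the check script we wrote.
  3. Test 2: add an iptables rule on the master:
    iptables -I OUTPUT -p vrrp -j DROP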
[root@aminglinux ~]# iptables -I OUTPUT -p vrrp -j DROP
[root@aminglinux ~]# iptables -nvL
Chain INPUT (policy ACCEPT 165 packets, 12440 bytes)
 pkts bytes target     prot opt in     out     source               destination         
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
Chain OUTPUT (policy ACCEPT 105 packets, 11068 bytes)
 pkts bytes target     prot opt in     out     source               destination         
   81  3240 DROP       112  --  *      *       0.0.0.0/0            0.0.0.0/0 

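Here we drop the VRRP packets leaving the master. This does not bring about a master/backup switch: keepalived is still running on the master, so the master does not release the VIP.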

  1. Test 3: stop the keepalived service on the master (after first flushing the iptables rule from test 2)
[root@aminglinux ~]# iptables -F
[root@aminglinux ~]# iptables -nvL
Chain INPUT (policy ACCEPT 12 packets, 840 bytes)
 pkts bytes target     prot opt in     out     source               destination         
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
Chain OUTPUT (policy ACCEPT 11 packets, 948 bytes)
 pkts bytes target     prot opt in     out     source               destination         
[root@aminglinux ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:21:5e:c0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.75.136/24 brd 192.168.75.255 scope global ens33
       valid_lft forever preferred_lft forever
    inet 192.168.75.100/32 scope global ens33
       valid_lft forever preferred_lft forever
    inet 192.168.75.150/24 brd 192.168.75.255 scope global secondary ens33:0
       valid_lft forever preferred_lft forever
    inet6 fe80::d652:b567:6190:8f28/64 scope link 
       valid_lft forever preferred_lft forever
3: ens37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:21:5e:ca brd ff:ff:ff:ff:ff:ff
[root@aminglinux ~]# systemctl stop keepalived
[root@aminglinux ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:21:5e:c0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.75.136/24 brd 192.168.75.255 scope global ens33
       valid_lft forever preferred_lft forever
    inet 192.168.75.150/24 brd 192.168.75.255 scope global secondary ens33:0
       valid_lft forever preferred_lft forever
    inet6 fe80::d652:b567:6190:8f28/64 scope link 
       valid_lft forever preferred_lft forever
3: ens37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:21:5e:ca brd ff:ff:ff:ff:ff:ff
[root@aminglinux ~]#

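As we can see, stopping keepalived on the master is much like the master going down: the IP 192.168.75.100 is no longer bound on the master. Running ip addr on the backup shows that it has now bound 192.168.75.100, and visiting this IP in a browser returns the backup's page, which confirms that the master and backup have switched.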

[root@localhost ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:0c:20:c9 brd ff:ff:ff:ff:ff:ff
    inet 192.168.75.130/24 brd 192.168.75.255 scope global ens33
       valid_lft forever preferred_lft forever
    inet 192.168.75.100/32 scope global ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::b44e:aca4:f738:7833/64 scope link 
       valid_lft forever preferred_lft forever
[root@localhost ~]# tail /var/log/messages
Apr 10 03:04:15 localhost Keepalived_vrrp[12554]: Sending gratuitous ARP on ens33 for 192.168.75.100
Apr 10 03:04:15 localhost Keepalived_vrrp[12554]: Sending gratuitous ARP on ens33 for 192.168.75.100
Apr 10 03:04:15 localhost Keepalived_vrrp[12554]: Sending gratuitous ARP on ens33 for 192.168.75.100
Apr 10 03:04:15 localhost Keepalived_vrrp[12554]: Sending gratuitous ARP on ens33 for 192.168.75.100
Apr 10 03:04:20 localhost Keepalived_vrrp[12554]: Sending gratuitous ARP on ens33 for 192.168.75.100
Apr 10 03:04:20 localhost Keepalived_vrrp[12554]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on ens33 for 192.168.75.100
Apr 10 03:04:20 localhost Keepalived_vrrp[12554]: Sending gratuitous ARP on ens33 for 192.168.75.100
Apr 10 03:04:20 localhost Keepalived_vrrp[12554]: Sending gratuitous ARP on ens33 for 192.168.75.100
Apr 10 03:04:20 localhost Keepalived_vrrp[12554]: Sending gratuitous ARP on ens33 for 192.168.75.100
Apr 10 03:04:20 localhost Keepalived_vrrp[12554]: Sending gratuitous ARP on ens33 for 192.168.75.100
[root@localhost ~]# 
  1. Test 4: start the keepalived service on the master
[root@aminglinux ~]# systemctl start keepalived
[root@aminglinux ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:21:5e:c0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.75.136/24 brd 192.168.75.255 scope global ens33
       valid_lft forever preferred_lft forever
    inet 192.168.75.100/32 scope global ens33
       valid_lft forever preferred_lft forever
    inet 192.168.75.150/24 brd 192.168.75.255 scope global secondary ens33:0
       valid_lft forever preferred_lft forever
    inet6 fe80::d652:b567:6190:8f28/64 scope link 
       valid_lft forever preferred_lft forever
3: ens37: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:21:5e:ca brd ff:ff:ff:ff:ff:ff
[root@aminglinux ~]# 

Then check the bound IPs on the backup. The VIP is no longer bound there, so the master has taken it back:

[root@localhost ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:0c:20:c9 brd ff:ff:ff:ff:ff:ff
    inet 192.168.75.130/24 brd 192.168.75.255 scope global ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::b44e:aca4:f738:7833/64 scope link 
       valid_lft forever preferred_lft forever
[root@localhost ~]# 

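In a real production environment there may be one master and several backups. In that case, give each backup a different priority in keepalived.conf; the higher the value, the higher the precedence.
Besides nginx, the same approach can provide high availability for MySQL. If you do this for MySQL, you must make sure the data on the master and the backup stays consistent.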

Reposted from my.oschina.net/u/3746774/blog/1792457