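Keepalived is a high-performance solution for server high availability (hot standby). It guards against single points of failure and, combined with Nginx, provides high availability for a web front end.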
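Keepalived is built on VRRP (Virtual Router Redundancy Protocol) and uses it to provide high availability (HA). VRRP is a protocol for router redundancy: it groups two or more routers into a single virtual device that exposes one or more virtual router IPs. Within the group, the router that actually owns the virtual IP acts as MASTER as long as it is working; otherwise the MASTER is chosen by election. The MASTER handles all network functions for the virtual router IP, such as answering ARP requests, ICMP, and forwarding traffic. The other routers do not hold the virtual IP and are in the BACKUP state; apart from receiving the MASTER's VRRP advertisements they perform no external network functions. When the MASTER fails, a BACKUP takes over its network functions.
VRRP sends its packets by multicast, using a special virtual source MAC address rather than the MAC address of the physical NIC. At run time only the MASTER sends periodic advertisements, which signal that the MASTER is healthy and carry the virtual router IP(s); BACKUP routers only listen and send nothing. If no advertisement from the MASTER arrives within a certain time, each BACKUP declares itself MASTER and starts sending advertisements, and a new MASTER election takes place.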
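Two of the protocol details above can be made concrete with a little arithmetic. The sketch below assumes the VRRPv2 definitions from RFC 3768: the virtual MAC is 00:00:5e:00:01 followed by the virtual router ID in hex, and a BACKUP waits 3 * advert_int + (256 - priority) / 256 seconds before declaring the MASTER down. The function names are made up for illustration, and whether a particular keepalived setup actually sends from the virtual MAC depends on its configuration (for example the use_vmac option), which this article does not show.

```shell
# Virtual MAC defined by VRRP for IPv4: 00:00:5e:00:01:<VRID as two hex digits>.
vrrp_vmac() {
    printf '00:00:5e:00:01:%02x\n' "$1"
}

# Master-down interval in seconds (VRRPv2):
#   3 * advert_int + skew_time, where skew_time = (256 - priority) / 256.
master_down_interval() {
    awk -v a="$1" -v p="$2" 'BEGIN { printf "%.3f\n", 3 * a + (256 - p) / 256 }'
}

vrrp_vmac 51                 # virtual_router_id 51, as in the configuration used later
master_down_interval 1 99    # advert_int 1, priority 99 (the node2 values used later)
```

The skew term means a higher-priority BACKUP times out slightly earlier than a lower-priority one, which is how the election tends to favor it.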
The IP plan is as follows. The VIP is 172.16.23.132.
nginx1: 172.16.23.129, keepalived: 172.16.23.129
nginx2: 172.16.23.130, keepalived: 172.16.23.130
httpd1: 172.16.23.128
httpd2: 172.16.23.131
In this plan nginx acts only as a load balancer and does not serve web content itself. The nginx nodes in the Ansible inventory:
[root@master ~]# cat /etc/ansible/hosts|grep "^\[nodes" -A 2
[nodes]
172.16.23.129
172.16.23.130
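The article does not show the nginx configuration itself. As a rough sketch of what a pure load-balancing setup for the two httpd backends might look like, the function below prints an upstream block plus a proxying server block. The upstream name backend_pool, port 80, and the function name are assumptions, not taken from the original hosts; with no balancing method specified, nginx distributes requests round-robin.

```shell
# Print an nginx upstream + proxy server block for the given backend IPs.
# "backend_pool" and port 80 are hypothetical choices for this sketch.
render_lb_conf() {
    echo "upstream backend_pool {"
    for backend in "$@"; do
        echo "    server ${backend}:80;"
    done
    echo "}"
    echo "server {"
    echo "    listen 80;"
    echo "    location / {"
    echo "        proxy_pass http://backend_pool;"
    echo "    }"
    echo "}"
}

render_lb_conf 172.16.23.128 172.16.23.131
```

The output would belong inside the http context, for example in a file under /etc/nginx/conf.d/, if that directory is included by the main configuration.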
Check the nginx service status (the 二 in the timestamps is the Chinese-locale abbreviation for Tuesday):
[root@master ~]# ansible nodes -m shell -a "systemctl status nginx"|grep running
Active: active (running) since 二 2018-12-18 16:33:04 CST; 12min ago
Active: active (running) since 二 2018-12-18 16:35:51 CST; 10min ago
nginx is running on both nodes. Next, look at the backend httpd servers:
[root@master ~]# cat /etc/ansible/hosts|grep "^\[backend_nodes" -A 2
[backend_nodes]
172.16.23.128
172.16.23.131
Check the httpd service status:
[root@master ~]# ansible backend_nodes -m shell -a "systemctl status httpd"|grep running
Active: active (running) since 二 2018-12-18 16:29:36 CST; 22min ago
Active: active (running) since 二 2018-12-18 16:30:03 CST; 21min ago
Then test load balancing on each of the two nginx servers:
[root@master ~]# ansible 172.16.23.129 -m get_url -a "url=http://172.16.23.129/index.html dest=/tmp"|grep status_code
"status_code": 200,
[root@master ~]# ansible 172.16.23.129 -m shell -a "cat /tmp/index.html"
172.16.23.129 | CHANGED | rc=0 >>
172.16.23.128
[root@master ~]# ansible 172.16.23.129 -m get_url -a "url=http://172.16.23.129/index.html dest=/tmp"|grep status_code
"status_code": 200,
[root@master ~]# ansible 172.16.23.129 -m shell -a "cat /tmp/index.html"
172.16.23.129 | CHANGED | rc=0 >>
172.16.23.131
As shown above, requests made on nginx1 (172.16.23.129) return the pages of the backend httpd servers 172.16.23.128 and 172.16.23.131 in turn, so both access and load balancing work.
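Two requests are enough to see the alternation, but a larger sample makes the distribution easier to judge. The following is a minimal sketch, assuming each backend's index.html contains only its own IP as in this setup; tally_backends and probe_lb are hypothetical helper names, and probe_lb needs curl and a reachable URL.

```shell
# Count how often each backend answered. Reads one response body per line on stdin.
tally_backends() {
    awk 'NF { count[$1]++ } END { for (b in count) print b, count[b] }' | sort
}

# Fetch a URL n times and tally the responses.
probe_lb() {
    url=$1
    n=$2
    i=0
    while [ "$i" -lt "$n" ]; do
        curl -s "$url"
        echo            # make sure every body ends with a newline
        i=$((i + 1))
    done | tally_backends
}

# Offline demonstration with three sample responses.
printf '172.16.23.128\n172.16.23.131\n172.16.23.128\n' | tally_backends
```

Against the real setup this could be called as probe_lb http://172.16.23.129/index.html 20; with round-robin balancing the two counts should come out roughly equal.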
[root@master ~]# ansible 172.16.23.130 -m get_url -a "url=http://172.16.23.130/index.html dest=/tmp"|grep status_code
"status_code": 200,
[root@master ~]# ansible 172.16.23.130 -m shell -a "cat /tmp/index.html"
172.16.23.130 | CHANGED | rc=0 >>
172.16.23.128
[root@master ~]# ansible 172.16.23.130 -m get_url -a "url=http://172.16.23.130/index.html dest=/tmp"|grep status_code
"status_code": 200,
[root@master ~]# ansible 172.16.23.130 -m shell -a "cat /tmp/index.html"
172.16.23.130 | CHANGED | rc=0 >>
172.16.23.131
nginx2 reaches the backend httpd servers just as well, so load balancing works on both nginx nodes. Next, install keepalived on the two nginx servers and check that it is running:
[root@master ~]# ansible nodes -m shell -a "systemctl status keepalived"|grep running
Active: active (running) since 二 2018-12-18 16:06:38 CST; 52min ago
Active: active (running) since 二 2018-12-18 16:05:04 CST; 54min ago
Check where the VIP is; it turns out to be on node1:
[root@master ~]# ansible nodes -m shell -a "hostname;ip a|grep ens33|grep -Po '(?<=inet ).*(?=\/)'"
172.16.23.129 | CHANGED | rc=0 >>
node1
172.16.23.129
172.16.23.132
172.16.23.130 | CHANGED | rc=0 >>
node2
172.16.23.130
The VIP sits on nginx1, that is, node1. Now access the VIP to check load balancing:
[root@master ~]# curl http://172.16.23.132
172.16.23.131
[root@master ~]# curl http://172.16.23.132
172.16.23.128
The responses look fine. Now take one nginx server out of service (stop nginx on node2) and see how keepalived and access through the VIP behave:
[root@master ~]# ansible 172.16.23.130 -m shell -a "systemctl stop nginx"
172.16.23.130 | CHANGED | rc=0 >>
Check the keepalived service status and the VIP:
[root@master ~]# ansible nodes -m shell -a "systemctl status keepalived"|grep running
Active: active (running) since 二 2018-12-18 16:05:04 CST; 1h 4min ago
Active: active (running) since 二 2018-12-18 16:06:38 CST; 1h 3min ago
[root@master ~]# ansible nodes -m shell -a "hostname;ip a|grep ens33|grep -Po '(?<=inet ).*(?=\/)'"
172.16.23.130 | CHANGED | rc=0 >>
node2
172.16.23.130
172.16.23.129 | CHANGED | rc=0 >>
node1
172.16.23.129
172.16.23.132
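The VIP has not moved and keepalived is still running on both nodes: nginx was stopped on node2, which did not hold the VIP, and keepalived itself was not touched. Now access the VIP: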
[root@master ~]# curl http://172.16.23.132
172.16.23.128
[root@master ~]# curl http://172.16.23.132
172.16.23.131
Accessing the web service through the VIP still works.
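In this test nginx was stopped on the node without the VIP. The keepalived configuration used in this article contains no nginx health check, so stopping nginx on the VIP holder would presumably not move the VIP either. One common way to tie the two together is a vrrp_script check; the sketch below is an assumption about how that could look and is not part of the original setup. The script path, the name chk_nginx, and the PROBE override are all hypothetical.

```shell
# Health check meant to be called from a keepalived vrrp_script block, e.g.
#   vrrp_script chk_nginx {
#       script "/etc/keepalived/check_nginx.sh"   # hypothetical path
#       interval 2
#       fall 2
#   }
#   vrrp_instance VI_1 { ... track_script { chk_nginx } }
# keepalived treats exit status 0 as healthy and non-zero as failed.

# PROBE can be overridden for testing; by default look for a process named exactly "nginx".
nginx_alive() {
    ${PROBE:-pgrep -x nginx} >/dev/null 2>&1
}

if nginx_alive; then
    echo "nginx is running"
else
    echo "nginx is down"
fi
```

Saved as a script whose last command is nginx_alive, the exit status is what keepalived would evaluate; after the configured number of failures the instance would leave the MASTER state and release the VIP.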
Now start nginx again and stop keepalived on one node:
[root@master ~]# ansible 172.16.23.130 -m shell -a "systemctl start nginx"
172.16.23.130 | CHANGED | rc=0 >>
[root@master ~]# ansible nodes -m shell -a "systemctl status nginx"|grep running
Active: active (running) since 二 2018-12-18 17:15:48 CST; 18s ago
Active: active (running) since 二 2018-12-18 16:33:04 CST; 43min ago
[root@master ~]# ansible 172.16.23.130 -m shell -a "systemctl stop keepalived"
172.16.23.130 | CHANGED | rc=0 >>
Then watch the log on that node with tail -f /var/log/messages:
Dec 18 17:16:50 node2 systemd: Stopping LVS and VRRP High Availability Monitor...
Dec 18 17:16:50 node2 Keepalived[12981]: Stopping
Dec 18 17:16:50 node2 Keepalived_healthcheckers[12982]: Stopped
Dec 18 17:16:51 node2 Keepalived_vrrp[12983]: Stopped
Dec 18 17:16:51 node2 Keepalived[12981]: Stopped Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Dec 18 17:16:52 node2 systemd: Stopped LVS and VRRP High Availability Monitor.
[root@master ~]# ansible nodes -m shell -a "systemctl status keepalived"|grep running
Active: active (running) since 二 2018-12-18 16:06:38 CST; 1h 10min ago
[root@master ~]# ansible nodes -m shell -a "hostname;ip a|grep ens33|grep -Po '(?<=inet ).*(?=\/)'"
172.16.23.130 | CHANGED | rc=0 >>
node2
172.16.23.130
172.16.23.129 | CHANGED | rc=0 >>
node1
172.16.23.129
172.16.23.132
Because keepalived was stopped on nginx2 (node2), which did not hold the VIP, the VIP stays on node1 and does not move to node2. Now look at the keepalived configuration files on node1 and node2:
[root@master ~]# ansible nodes -m shell -a "cat /etc/keepalived/keepalived.conf"
172.16.23.129 | CHANGED | rc=0 >>
! Configuration File for keepalived
global_defs {
    notification_email {
        [email protected]
    }
    notification_email_from [email protected]
    smtp_server smtp.163.com
    smtp_connect_timeout 30
    router_id test
}
vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 51
    priority 100
    nopreempt    # non-preemptive mode
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        172.16.23.132/24 dev ens33
    }
}
172.16.23.130 | CHANGED | rc=0 >>
! Configuration File for keepalived
global_defs {
    notification_email {
        [email protected]
    }
    notification_email_from [email protected]
    smtp_server smtp.163.com
    smtp_connect_timeout 30
    router_id test
}
vrrp_instance VI_1 {
    state BACKUP
    interface ens33
    virtual_router_id 51
    priority 99
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        172.16.23.132/24 dev ens33
    }
}
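To make the effect of priority and nopreempt in these two instances concrete, here is a deliberately simplified model of the takeover decision a node makes when it comes up. It is a sketch of the documented behavior, not keepalived's actual code: ties between equal priorities (which VRRP breaks by IP address) are ignored, and the function name is made up.

```shell
# Decide whether a node that has just come (back) up takes over the VIP.
#   $1 = this node's priority
#   $2 = priority of the current MASTER, empty if no MASTER is advertising
#   $3 = "yes" if this node is configured with nopreempt
takes_over() {
    [ -z "$2" ] && return 0          # nobody is MASTER: become MASTER
    [ "$3" = "yes" ] && return 1     # nopreempt: never displace a running MASTER
    [ "$1" -gt "$2" ]                # otherwise only a strictly higher priority wins
}

# node2 (priority 99) restarts while node1 (priority 100) is MASTER:
takes_over 99 100 no && echo "node2 takes over" || echo "node2 stays BACKUP"
# node1 (priority 100, nopreempt) returns while node2 (priority 99) is MASTER:
takes_over 100 99 yes && echo "node1 takes over" || echo "node1 stays BACKUP"
```

In both cases the returning node stays BACKUP. The second case is the point of nopreempt: even the higher-priority node1 does not pull the VIP back once node2 holds it.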
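As the configuration shows, the two nodes differ only in priority and in that node1 sets nopreempt (non-preemptive mode). Now start keepalived on node2 again, then stop keepalived on node1 and check the VIP: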
[root@master ~]# ansible 172.16.23.130 -m shell -a "systemctl start keepalived"
172.16.23.130 | CHANGED | rc=0 >>
Check the node2 log:
Dec 18 17:23:14 node2 systemd: Starting LVS and VRRP High Availability Monitor...
Dec 18 17:23:14 node2 Keepalived[15994]: Starting Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
Dec 18 17:23:14 node2 Keepalived[15994]: Opening file '/etc/keepalived/keepalived.conf'.
Dec 18 17:23:14 node2 Keepalived[15995]: Starting Healthcheck child process, pid=15996
Dec 18 17:23:14 node2 Keepalived_healthcheckers[15996]: Opening file '/etc/keepalived/keepalived.conf'.
Dec 18 17:23:14 node2 Keepalived[15995]: Starting VRRP child process, pid=15997
Dec 18 17:23:14 node2 systemd: Started LVS and VRRP High Availability Monitor.
Dec 18 17:23:14 node2 Keepalived_vrrp[15997]: Registering Kernel netlink reflector
Dec 18 17:23:14 node2 Keepalived_vrrp[15997]: Registering Kernel netlink command channel
Dec 18 17:23:14 node2 Keepalived_vrrp[15997]: Registering gratuitous ARP shared channel
Dec 18 17:23:14 node2 Keepalived_vrrp[15997]: Opening file '/etc/keepalived/keepalived.conf'.
Dec 18 17:23:24 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) removing protocol VIPs.
Dec 18 17:23:24 node2 Keepalived_vrrp[15997]: Using LinkWatch kernel netlink reflector...
Dec 18 17:23:24 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) Entering BACKUP STATE
Dec 18 17:23:24 node2 Keepalived_vrrp[15997]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
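The line that matters in this log is the state change ("Entering BACKUP STATE"). When following failover tests it can help to filter the log down to just those lines. The sketch below assumes the message format shown above for Keepalived v1.3.5 and that other states are logged in the same "Entering ... STATE" form; vrrp_transitions is a hypothetical helper name.

```shell
# Print "<instance> <state>" for every VRRP state change found in log lines on stdin.
vrrp_transitions() {
    sed -n 's/.*VRRP_Instance(\([^)]*\)) Entering \([A-Z]*\) STATE.*/\1 \2/p'
}

# One line copied from the node2 log above.
echo 'Dec 18 17:23:24 node2 Keepalived_vrrp[15997]: VRRP_Instance(VI_1) Entering BACKUP STATE' |
    vrrp_transitions
```

On a node it could be fed from the live log, for example tail -f /var/log/messages | vrrp_transitions.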
The keepalived status on both nodes, and the VIP:
[root@master ~]# ansible nodes -m shell -a "systemctl status keepalived"|grep running
Active: active (running) since 二 2018-12-18 17:23:14 CST; 56s ago
Active: active (running) since 二 2018-12-18 16:06:38 CST; 1h 17min ago
[root@master ~]# ansible nodes -m shell -a "hostname;ip a|grep ens33|grep -Po '(?<=inet ).*(?=\/)'"
172.16.23.129 | CHANGED | rc=0 >>
node1
172.16.23.129
172.16.23.132
172.16.23.130 | CHANGED | rc=0 >>
node2
172.16.23.130
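The VIP check used throughout this walkthrough relies on reading the address list by eye. A small helper can turn it into a yes/no answer per node. This is a sketch: holds_vip is a hypothetical name, it reads ip a or ip -o -4 addr style text on stdin, and the sample lines in the demonstration are illustrative, not captured from the hosts above.

```shell
# Succeed if the given IPv4 address appears after an "inet" keyword on stdin.
holds_vip() {
    awk -v vip="$1" '
        { for (i = 1; i < NF; i++)
              if ($i == "inet") { split($(i + 1), a, "/"); if (a[1] == vip) found = 1 } }
        END { exit !found }'
}

# Illustrative sample in the style of `ip -o -4 addr show dev ens33`.
sample='2: ens33    inet 172.16.23.129/24 brd 172.16.23.255 scope global ens33
2: ens33    inet 172.16.23.132/24 scope global secondary ens33'

if printf '%s\n' "$sample" | holds_vip 172.16.23.132; then
    echo "VIP present"
else
    echo "VIP absent"
fi
```

On a real node the input would come from the ip command itself, for example ip -o -4 addr show dev ens33 | holds_vip 172.16.23.132. The address is compared as a whole, so 172.16.23.13 does not match 172.16.23.132.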