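18.1 Cluster overview

Overview of Linux clusters
By function, Linux clusters fall into two categories: high availability (HA) and load balancing.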
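A high-availability cluster usually consists of two servers: one serves requests and the other stands by as a redundant node. When the active machine goes down, the standby takes over and continues to provide the service.
Large enterprises typically require availability of 99.99% (four nines) or even 99.999% (five nines).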
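To make the "number of nines" concrete, the allowed downtime per year follows directly from the availability percentage. A minimal bash sketch (awk does the floating-point arithmetic), assuming a 365-day year of 525,600 minutes:

```bash
#!/bin/bash
# Allowed downtime per year for a given availability percentage.
# Assumption: a 365-day year (525600 minutes); leap years are ignored.
downtime_minutes_per_year() {
    local availability="$1"   # e.g. 99.99
    awk -v a="$availability" 'BEGIN { printf "%.2f\n", 525600 * (100 - a) / 100 }'
}

for a in 99.9 99.99 99.999; do
    echo "$a% -> $(downtime_minutes_per_year "$a") min/year"
done
```

Under that assumption four nines allow about 52.6 minutes of downtime per year and five nines about 5.3 minutes.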
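Open-source HA software: heartbeat and keepalived //heartbeat has many bugs on CentOS 6 and has not been updated for a long time, so it is no longer recommended; keepalived provides load balancing in addition to high availability
A load-balancing cluster needs one server acting as the dispatcher (director), which distributes user requests to the backend servers for processing. Apart from the dispatcher, the cluster consists of the servers that actually serve users, and there must be at least 2 of them.
Load-balancing software: open-source LVS, keepalived, haproxy and nginx; commercial F5 and Netscaler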
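To illustrate what the dispatcher does, the bash sketch below shows round-robin scheduling, one common policy offered by LVS, nginx and haproxy alike. It only models the choice of backend, not packet forwarding, and the backend addresses are hypothetical placeholders from the 192.0.2.0/24 documentation range:

```bash
#!/bin/bash
# Conceptual round-robin dispatch: request i goes to backend (i mod N).
# The addresses are hypothetical placeholders, not part of this setup.
backends=(192.0.2.11 192.0.2.12 192.0.2.13)

pick_backend() {
    local request_no="$1"
    # Array subscripts are evaluated arithmetically in bash
    echo "${backends[request_no % ${#backends[@]}]}"
}

for i in 0 1 2 3 4; do
    echo "request $i -> $(pick_backend "$i")"
done
```

Requests 0 to 4 go to .11, .12, .13, .11 and .12 in turn; real dispatchers add weights, health checks and other scheduling policies on top of this.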
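18.2 Introduction to keepalived
Here we use keepalived to build the HA cluster, because heartbeat has some problems on CentOS 6 that would affect the experiment.
keepalived implements high availability through VRRP (Virtual Router Redundancy Protocol).
VRRP combines several routers with the same function into a group, which has one master and N (N>=1) backups.
The master sends VRRP packets to the backups via multicast. When the backups stop receiving VRRP packets from the master, they assume the master is down, and the backups' priorities then decide which one becomes the new master.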
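A simplified bash sketch of that election: among the surviving backups, the one with the highest priority becomes master. The node names and priorities are hypothetical; real VRRP breaks priority ties by the higher primary IP address, which this sketch ignores:

```bash
#!/bin/bash
# Simplified VRRP election: read "name priority" lines from stdin and
# print the name with the highest priority (ties are not handled here).
elect_master() {
    sort -k2,2nr | head -n 1 | awk '{ print $1 }'
}

# Hypothetical backups: test222 (priority 90) and test223 (priority 80)
printf '%s\n' "test222 90" "test223 80" | elect_master
```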
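keepalived consists of three modules: core, check and vrrp. core is the heart of keepalived and is responsible for starting and maintaining the main process and for loading and parsing the global configuration file; check performs health checks; vrrp implements the VRRP protocol.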
18.3/18.4/18.5 Configuring an HA cluster with keepalived

Environment:
OS: CentOS 7
IP: 172.16.22.221 test221 master
IP: 172.16.22.222 test222 backup
VIP: 172.16.22.219

Install keepalived on both nodes:

# yum install -y keepalived

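Both nodes also need nginx; here it has already been compiled from source.

If it is missing, you can install it quickly with yum (on CentOS 7 the nginx package comes from the EPEL repository).
Create the monitoring script: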

[root@test221 ~]# vim /usr/local/sbin/check_ng.sh
#!/bin/bash
# Timestamp used in the log message
d=$(date +%Y%m%d_%H:%M:%S)
# Count the running nginx processes
n=$(ps -C nginx --no-heading | wc -l)
# If no nginx process is running, try to start nginx and count again.
# If the count is still 0, nginx cannot be started, so stop keepalived;
# this releases the VIP and lets the backup take over.
if [ "$n" -eq 0 ]; then
        /etc/init.d/nginx start
        n2=$(ps -C nginx --no-heading | wc -l)
        if [ "$n2" -eq 0 ]; then
                echo "$d nginx down,keepalived will stop" >> /var/log/check_ng.log
                systemctl stop keepalived
        fi
fi

[root@test221 ~]# chmod +x !$
chmod +x /usr/local/sbin/check_ng.sh

Edit the keepalived configuration file on the master:

[root@test221 ~]# cat /etc/keepalived/keepalived.conf       
global_defs {
   notification_email {
     [email protected]
   }
   notification_email_from root@test221
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id LVS_DEVEL
}

vrrp_script chk_nginx {
    script "/usr/local/sbin/check_ng.sh"
    interval 3
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51    
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass aiker>com
    }
    virtual_ipaddress {
        172.16.22.219
    }

    track_script {
        chk_nginx
    }

}

[root@test221 ~]# systemctl stop nginx
[root@test221 ~]# systemctl start keepalived.service
[root@test221 ~]# ps aux | grep nginx
root     28553  0.0  0.0   4188   348 ?        Ss   3月17   0:00 runsv nginx
root     28554  0.0  0.0   4332   348 ?        S    3月17   0:00 svlogd -tt /var/log/gitlab/nginx
root     29249  0.0  0.0  37616  3012 ?        Ss   3月17   0:00 nginx: master process /opt/gitlab/embedded/sbin/gitlab-web -p /var/opt/gitlab/nginx
gitlab-+ 29250  0.0  0.0  41708  5436 ?        S    3月17   0:00 nginx: worker process
gitlab-+ 29251  0.0  0.0  41708  5436 ?        S    3月17   0:17 nginx: worker process
gitlab-+ 29252  0.0  0.0  41708  5436 ?        S    3月17   0:17 nginx: worker process
gitlab-+ 29253  0.0  0.0  41708  5436 ?        S    3月17   0:17 nginx: worker process
gitlab-+ 29254  0.0  0.0  37768  1492 ?        S    3月17   0:01 nginx: cache manager process
root     29940  0.0  0.0 111180  9920 ?        Ss   22:39   0:00 nginx: master process /www/server/nginx/sbin/nginx -c /www/server/nginx/conf/nginx.conf
www      29941  0.0  0.2 141900 34916 ?        S    22:39   0:00 nginx: worker process
www      29943  0.0  0.2 141900 34916 ?        S    22:39   0:00 nginx: worker process
www      29946  0.0  0.2 141900 34912 ?        S    22:39   0:00 nginx: worker process
www      29947  0.0  0.2 141900 34916 ?        S    22:39   0:00 nginx: worker process
www      29948  0.0  0.0 111180 10256 ?        S    22:39   0:00 nginx: cache manager process
www      29949  0.0  0.0 111180 10256 ?        S    22:39   0:00 nginx: cache loader process
root     30043  0.0  0.0 110248   900 pts/1    S+   22:39   0:00 grep --color=auto nginx

[root@test221 ~]# systemctl status keepalived.service
● keepalived.service - LVS and VRRP High Availability Monitor
   Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; vendor preset: disabled)
   Active: active (running) since 日 2018-03-25 22:39:41 CST; 1min 22s ago
  Process: 29926 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 29927 (keepalived)
   CGroup: /system.slice/keepalived.service
           ├─29927 /usr/sbin/keepalived -D
           ├─29928 /usr/sbin/keepalived -D
           ├─29929 /usr/sbin/keepalived -D
           ├─29940 nginx: master process /www/server/nginx/sbin/nginx -c /www/server/nginx/conf/nginx.conf
           ├─29941 nginx: worker process
           ├─29943 nginx: worker process
           ├─29946 nginx: worker process
           ├─29947 nginx: worker process
           └─29948 nginx: cache manager process

3月 25 22:39:42 test221 Keepalived_vrrp[29929]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:39:42 test221 Keepalived_vrrp[29929]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:39:42 test221 Keepalived_vrrp[29929]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:39:42 test221 Keepalived_vrrp[29929]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:39:47 test221 Keepalived_vrrp[29929]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:39:47 test221 Keepalived_vrrp[29929]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on eth0 for 172.16.22.219
3月 25 22:39:47 test221 Keepalived_vrrp[29929]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:39:47 test221 Keepalived_vrrp[29929]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:39:47 test221 Keepalived_vrrp[29929]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:39:47 test221 Keepalived_vrrp[29929]: Sending gratuitous ARP on eth0 for 172.16.22.219
[root@test221 ~]#

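Configuration on the backup, identical except for state BACKUP and the lower priority 90 (with auth_type PASS only the first 8 characters of auth_pass are used, hence the "Truncating auth_pass to 8 characters" message in the log below):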

[root@test222 ~]# cat /etc/keepalived/keepalived.conf
global_defs {
   notification_email {
     [email protected]
   }
   notification_email_from root@test222
   smtp_server 127.0.0.1
   smtp_connect_timeout 30
   router_id LVS_DEVEL
}

vrrp_script chk_nginx {
    script "/usr/local/sbin/check_ng.sh"
    interval 3
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51    
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass aiker>com
    }
    virtual_ipaddress {
        172.16.22.219
    }

    track_script {
        chk_nginx
    }

}

[root@test222 ~]# mv backup_check_ng.sh /usr/local/sbin/check_ng.sh
[root@test222 ~]# vim !$
vim /usr/local/sbin/check_ng.sh
#!/bin/bash
# Timestamp used in the log message
d=$(date +%Y%m%d_%H:%M:%S)
# Count the running nginx processes
n=$(ps -C nginx --no-heading | wc -l)
# If no nginx process is running, try to start nginx and count again.
# If the count is still 0, nginx cannot be started, so stop keepalived;
# this releases the VIP.
if [ "$n" -eq 0 ]; then
        service nginx start
        n2=$(ps -C nginx --no-heading | wc -l)
        if [ "$n2" -eq 0 ]; then
                echo "$d nginx down,keepalived will stop" >> /var/log/check_ng.log
                systemctl stop keepalived
        fi
fi

[root@test222 ~]# chmod  +x !$
chmod  +x /usr/local/sbin/check_ng.sh

[root@test222 ~]# service nginx stop
Stoping nginx...  done
[root@test222 ~]# ps aux | grep nginx
root     16906  0.0  0.0 110244   896 pts/2    S+   22:47   0:00 grep --color=auto nginx
[root@test222 ~]# systemctl start keepalived
[root@test222 ~]# systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
   Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; vendor preset: disabled)
   Active: active (running) since 日 2018-03-25 22:47:48 CST; 31s ago
  Process: 16913 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 16914 (keepalived)
   CGroup: /system.slice/keepalived.service
           ├─10576 nginx: master process /www/server/nginx/sbin/nginx -c /www/server/nginx/conf/nginx.con
           ├─10577 nginx: worker process                                            
           ├─10578 nginx: worker process                                            
           ├─10579 nginx: worker process                                            
           ├─10580 nginx: worker process                                            
           ├─10581 nginx: cache manager process                                     
           ├─10582 nginx: cache loader process                                      
           ├─10593 /usr/sbin/keepalived -D
           ├─10594 /usr/sbin/keepalived -D
           └─10595 /usr/sbin/keepalived -D

3月 25 22:47:48 test222 Keepalived_vrrp[16916]: Registering Kernel netlink command channel
3月 25 22:47:48 test222 Keepalived_vrrp[16916]: Registering gratuitous ARP shared channel
3月 25 22:47:48 test222 Keepalived_vrrp[16916]: Opening file '/etc/keepalived/keepalived.conf'.
3月 25 22:47:48 test222 Keepalived_vrrp[16916]: Truncating auth_pass to 8 characters
3月 25 22:47:48 test222 Keepalived_vrrp[16916]: VRRP_Instance(VI_1) removing protocol VIPs.
3月 25 22:47:48 test222 Keepalived_vrrp[16916]: SECURITY VIOLATION - scripts are being executed but script_security not enabled.
3月 25 22:47:48 test222 Keepalived_vrrp[16916]: Using LinkWatch kernel netlink reflector...
3月 25 22:47:48 test222 Keepalived_vrrp[16916]: VRRP_Instance(VI_1) Entering BACKUP STATE
3月 25 22:47:48 test222 Keepalived_vrrp[16916]: VRRP sockpool: [ifindex(2), proto(112), unicast(0), fd(10,11)]
3月 25 22:47:48 test222 Keepalived_vrrp[16916]: VRRP_Script(chk_nginx) succeeded
Hint: Some lines were ellipsized, use -l to show in full.
[root@test222 ~]# ps aux | grep nginx
root     16930  0.0  0.0 108620  3180 ?        Ss   22:47   0:00 nginx: master process /www/server/nginx/sbin/nginx -c /www/server/nginx/conf/nginx.conf
www      16931  0.0  0.1 139340 29248 ?        S    22:47   0:00 nginx: worker process
www      16932  0.0  0.1 139340 29248 ?        S    22:47   0:00 nginx: worker process
www      16933  0.0  0.1 139340 29248 ?        S    22:47   0:00 nginx: worker process
www      16934  0.0  0.1 139340 29248 ?        S    22:47   0:00 nginx: worker process
www      16935  0.0  0.0 108620  3868 ?        S    22:47   0:00 nginx: cache manager process
www      16936  0.0  0.0 108620  3616 ?        S    22:47   0:00 nginx: cache loader process
root     17052  0.0  0.0 110248   900 pts/2    S+   22:48   0:00 grep --color=auto nginx
[root@test222 ~]#

1. Test: stop keepalived on the master

[root@test221 ~]# systemctl stop keepalived.service
[root@test221 ~]# systemctl status keepalived.service
● keepalived.service - LVS and VRRP High Availability Monitor
   Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; vendor preset: disabled)
   Active: inactive (dead) since 日 2018-03-25 22:51:35 CST; 6s ago
  Process: 29926 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 29927 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/keepalived.service
           ├─29940 nginx: master process /www/server/nginx/sbin/nginx -c /www/server/nginx/conf/nginx.conf
           ├─29941 nginx: worker process
           ├─29943 nginx: worker process
           ├─29946 nginx: worker process
           ├─29947 nginx: worker process
           └─29948 nginx: cache manager process

3月 25 22:39:47 test221 Keepalived_vrrp[29929]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:39:47 test221 Keepalived_vrrp[29929]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:51:34 test221 systemd[1]: Stopping LVS and VRRP High Availability Monitor...
3月 25 22:51:34 test221 Keepalived[29927]: Stopping
3月 25 22:51:34 test221 Keepalived_healthcheckers[29928]: Stopped
3月 25 22:51:34 test221 Keepalived_vrrp[29929]: VRRP_Instance(VI_1) sent 0 priority
3月 25 22:51:34 test221 Keepalived_vrrp[29929]: VRRP_Instance(VI_1) removing protocol VIPs.
3月 25 22:51:35 test221 Keepalived_vrrp[29929]: Stopped
3月 25 22:51:35 test221 Keepalived[29927]: Stopped Keepalived v1.3.5 (03/19,2017), git commit v1.3.5-6-g6fa32f2
3月 25 22:51:35 test221 systemd[1]: Stopped LVS and VRRP High Availability Monitor.
[root@test221 ~]#

backup:

[root@test222 ~]# systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
   Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; vendor preset: disabled)
   Active: active (running) since 日 2018-03-25 22:47:48 CST; 4min 11s ago
  Process: 16913 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 16914 (keepalived)
   CGroup: /system.slice/keepalived.service
           ├─10576 nginx: master process /www/server/nginx/sbin/nginx -c /www/server/nginx/conf/nginx.con
           ├─10577 nginx: worker process                                            
           ├─10578 nginx: worker process                                            
           ├─10579 nginx: worker process                                            
           ├─10580 nginx: worker process                                            
           ├─10581 nginx: cache manager process                                     
           ├─10582 nginx: cache loader process                                      
           ├─10593 /usr/sbin/keepalived -D
           ├─10594 /usr/sbin/keepalived -D
           └─10595 /usr/sbin/keepalived -D

3月 25 22:51:35 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:51:35 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:51:35 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:51:35 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:51:40 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:51:40 test222 Keepalived_vrrp[16916]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on eth0 for 172.16.22.219
3月 25 22:51:40 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:51:40 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:51:40 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:51:40 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219

keepalived on the backup has taken over the VIP.
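One way to confirm which node currently holds the VIP is to look for it in the output of "ip -o -4 addr show". A small bash helper, assuming the interface is eth0 as in the configuration above (the function name has_vip is made up for this example):

```bash
#!/bin/bash
# Return 0 if the given VIP appears in "ip -o -4 addr show" output read
# from stdin. Field 4 of each line is ADDRESS/PREFIX, e.g. 172.16.22.219/32.
has_vip() {
    local vip="$1"
    awk -v vip="$vip" '{ split($4, a, "/"); if (a[1] == vip) found = 1 }
                      END { exit !found }'
}

# Hypothetical usage on either node:
# ip -o -4 addr show dev eth0 | has_vip 172.16.22.219 && echo "VIP is on $(hostname)"
```

The exact comparison on the address part avoids false matches that a plain substring grep could produce.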
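After keepalived is restarted on the master, the VIP moves back to the master, because its priority (100) is higher than the backup's (90):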

[root@test221 ~]# systemctl status keepalived.service
● keepalived.service - LVS and VRRP High Availability Monitor
   Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; vendor preset: disabled)
   Active: active (running) since 日 2018-03-25 22:57:13 CST; 10min ago
  Process: 11476 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 11477 (keepalived)
   CGroup: /system.slice/keepalived.service
           ├─11477 /usr/sbin/keepalived -D
           ├─11478 /usr/sbin/keepalived -D
           ├─11479 /usr/sbin/keepalived -D
           ├─16046 nginx: master process /www/server/nginx/sbin/nginx -c /www/server/nginx/conf/nginx.conf
           ├─16047 nginx: worker process
           ├─16048 nginx: worker process
           ├─16049 nginx: worker process
           ├─16050 nginx: worker process
           └─16052 nginx: cache manager process

3月 25 22:57:15 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:57:15 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:57:15 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:57:15 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:57:20 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:57:20 test221 Keepalived_vrrp[11479]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on eth0 for 172.16.22.219
3月 25 22:57:20 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:57:20 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:57:20 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 25 22:57:20 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219

2. Test: stop nginx on the master

[root@test221 ~]# service nginx stop
Stoping nginx...  done
[root@test221 ~]# ps aux | grep nginx
root     16037  0.0  0.0 110244   900 pts/1    S+   23:02   0:00 grep --color=auto nginx
root     28553  0.0  0.0   4188   348 ?        Ss   3月17   0:00 runsv nginx
root     28554  0.0  0.0   4332   348 ?        S    3月17   0:00 svlogd -tt /var/log/gitlab/nginx
[root@test221 ~]# ps aux | grep nginx
root     16046  0.0  0.0 111180  4692 ?        Ss   23:02   0:00 nginx: master process /www/server/nginx/sbin/nginx -c /www/server/nginx/conf/nginx.conf
www      16047  0.0  0.1 141900 29232 ?        S    23:02   0:00 nginx: worker process
www      16048  0.0  0.1 141900 29232 ?        S    23:02   0:00 nginx: worker process
www      16049  0.0  0.1 141900 29232 ?        S    23:02   0:00 nginx: worker process
www      16050  0.0  0.1 141900 29232 ?        S    23:02   0:00 nginx: worker process
www      16052  0.0  0.0 111180  5028 ?        S    23:02   0:00 nginx: cache manager process
www      16055  0.0  0.0 111180  5028 ?        S    23:02   0:00 nginx: cache loader process
root     16106  0.0  0.0 110244   900 pts/1    S+   23:02   0:00 grep --color=auto nginx
root     28553  0.0  0.0   4188   348 ?        Ss   3月17   0:00 runsv nginx
root     28554  0.0  0.0   4332   348 ?        S    3月17   0:00 svlogd -tt /var/log/gitlab/nginx
[root@test221 ~]#

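After nginx is stopped, the check script run by keepalived (every 3 seconds, per interval 3) finds no nginx process and starts nginx again.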
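To watch this recovery from the shell, a hypothetical helper can poll until an nginx process reappears. A minimal bash sketch using pgrep -x, which matches on the process name the same way ps -C does in check_ng.sh:

```bash
#!/bin/bash
# Poll once per second until a process with exactly this name exists,
# or give up after TIMEOUT seconds. Returns 0 if found, 1 otherwise.
wait_for_process() {
    local name="$1" timeout="${2:-10}" i
    for (( i = 0; i < timeout; i++ )); do
        pgrep -x "$name" > /dev/null && return 0
        sleep 1
    done
    return 1
}

# Hypothetical usage on the master (check_ng.sh runs every 3 s, so 10 s is ample):
# service nginx stop && wait_for_process nginx 10 && echo "nginx restarted by check_ng.sh"
```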

3. Test: block outgoing VRRP packets on the master with iptables

[root@test221 ~]# iptables -I OUTPUT -p vrrp -j DROP 
[root@test221 ~]# systemctl status keepalived.service -l
● keepalived.service - LVS and VRRP High Availability Monitor
   Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; vendor preset: disabled)
   Active: active (running) since 日 2018-03-25 22:57:13 CST; 18h ago
  Process: 11476 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 11477 (keepalived)
   CGroup: /system.slice/keepalived.service
           ├─11477 /usr/sbin/keepalived -D
           ├─11478 /usr/sbin/keepalived -D
           ├─11479 /usr/sbin/keepalived -D
           ├─16046 nginx: master process /www/server/nginx/sbin/nginx -c /www/server/nginx/conf/nginx.con
           ├─16047 nginx: worker process                                            
           ├─16048 nginx: worker process                                            
           ├─16049 nginx: worker process                                            
           ├─16050 nginx: worker process                                            
           └─16052 nginx: cache manager process                                     

3月 26 17:24:40 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:24:40 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:24:40 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:24:40 test221 Keepalived_vrrp[11479]: VRRP_Instance(VI_1) Received advert with lower priority 90, ours 100, forcing new election
3月 26 17:24:40 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:24:40 test221 Keepalived_vrrp[11479]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on eth0 for 172.16.22.219
3月 26 17:24:40 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:24:40 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:24:40 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:24:40 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219

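Check keepalived status on the backup: since it no longer receives the master's advertisements, it has also switched to master and claimed the VIP (the gratuitous ARPs at 17:17:54). Both nodes now hold the VIP, i.e. a split-brain situation: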

[root@test222 ~]# systemctl status keepalived
● keepalived.service - LVS and VRRP High Availability Monitor
   Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; vendor preset: disabled)
   Active: active (running) since 日 2018-03-25 22:47:48 CST; 18h ago
  Process: 16913 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 16914 (keepalived)
   CGroup: /system.slice/keepalived.service
           ├─10576 nginx: master process /www/server/nginx/sbin/nginx -c /www/server/nginx/conf/nginx.con
           ├─10577 nginx: worker process                                            
           ├─10578 nginx: worker process                                            
           ├─10579 nginx: worker process                                            
           ├─10580 nginx: worker process                                            
           ├─10581 nginx: cache manager process                                     
           ├─10582 nginx: cache loader process                                      
           ├─10593 /usr/sbin/keepalived -D
           ├─10594 /usr/sbin/keepalived -D
           └─10595 /usr/sbin/keepalived -D

3月 26 17:17:54 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:17:54 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:17:54 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:17:54 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:17:59 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:17:59 test222 Keepalived_vrrp[16916]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on eth0 for 172.16.22.219
3月 26 17:17:59 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:17:59 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:17:59 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:17:59 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219

On the master, delete the iptables rule added above (it is rule 1 of the OUTPUT chain):

[root@test221 ~]# iptables -D OUTPUT 1
[root@test221 ~]# systemctl status keepalived.service -l
● keepalived.service - LVS and VRRP High Availability Monitor
   Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; vendor preset: disabled)
   Active: active (running) since 日 2018-03-25 22:57:13 CST; 18h ago
  Process: 11476 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 11477 (keepalived)
   CGroup: /system.slice/keepalived.service
           ├─11477 /usr/sbin/keepalived -D
           ├─11478 /usr/sbin/keepalived -D
           ├─11479 /usr/sbin/keepalived -D
           ├─16046 nginx: master process /www/server/nginx/sbin/nginx -c /www/server/nginx/conf/nginx.con
           ├─16047 nginx: worker process                                            
           ├─16048 nginx: worker process                                            
           ├─16049 nginx: worker process                                            
           ├─16050 nginx: worker process                                            
           └─16052 nginx: cache manager process                                     

3月 26 17:33:07 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:33:07 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:33:07 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:33:07 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:33:08 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:33:08 test221 Keepalived_vrrp[11479]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on eth0 for 172.16.22.219
3月 26 17:33:08 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:33:08 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:33:08 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:33:08 test221 Keepalived_vrrp[11479]: Sending gratuitous ARP on eth0 for 172.16.22.219

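Check keepalived status on the backup: once it receives the master's advertisement with the higher priority 100, it returns to BACKUP state and removes the VIP: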

[root@test222 ~]# systemctl status keepalived -l
● keepalived.service - LVS and VRRP High Availability Monitor
   Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; vendor preset: disabled)
   Active: active (running) since 日 2018-03-25 22:47:48 CST; 18h ago
  Process: 16913 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 16914 (keepalived)
    CGroup: /system.slice/keepalived.service
           ├─10576 nginx: master process /www/server/nginx/sbin/nginx -c /www/server/nginx/conf/nginx.con
           ├─10577 nginx: worker process                                            
           ├─10578 nginx: worker process                                            
           ├─10579 nginx: worker process                                            
           ├─10580 nginx: worker process                                            
           ├─10581 nginx: cache manager process                                     
           ├─10582 nginx: cache loader process                                      
           ├─10593 /usr/sbin/keepalived -D
           ├─10594 /usr/sbin/keepalived -D
           └─10595 /usr/sbin/keepalived -D

3月 26 17:17:54 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:17:59 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:17:59 test222 Keepalived_vrrp[16916]: VRRP_Instance(VI_1) Sending/queueing gratuitous ARPs on eth0 for 172.16.22.219
3月 26 17:17:59 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:17:59 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:17:59 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:17:59 test222 Keepalived_vrrp[16916]: Sending gratuitous ARP on eth0 for 172.16.22.219
3月 26 17:33:03 test222 Keepalived_vrrp[16916]: VRRP_Instance(VI_1) Received advert with higher priority 100, ours 90
3月 26 17:33:03 test222 Keepalived_vrrp[16916]: VRRP_Instance(VI_1) Entering BACKUP STATE
3月 26 17:33:03 test222 Keepalived_vrrp[16916]: VRRP_Instance(VI_1) removing protocol VIPs.

Extras

Comparison of heartbeat and keepalived

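For a recent project I briefly tried two open-source HA solutions, Keepalived and Heartbeat. Both are popular, but they differ considerably. Below is a short summary of my impressions and the related background, as a reference when choosing a solution.

1) Keepalived is simpler to use: in installation, configuration, use and maintenance, Keepalived is much simpler than Heartbeat. In particular, since Heartbeat was split into three sub-projects after version 2.1.4, installing, configuring and using it has become complicated, and when something goes wrong it is hard to tell which subsystem is at fault. Keepalived has only one package and one configuration file, and that configuration file is much simpler.

2) Heartbeat is more powerful: although complex, Heartbeat has more features and more complete tooling and is suited to managing large clusters, whereas Keepalived is mainly used for cluster failover and has essentially no management features.

3) Different protocols: Keepalived uses VRRP for communication and election, while Heartbeat uses heartbeat messages. Besides the network, Heartbeat can also communicate over a serial link, which appears more reliable.

4) Similar usage: to build an HA solution on either one, you eventually have to write custom scripts for your business needs. Keepalived places no constraints on these scripts, so you can write them any way you like; Heartbeat scripts must support service-style start/stop/restart invocation, and Heartbeat ships many default scripts that already cover simple tasks such as binding an IP or starting Apache.

Recommendation: prefer Keepalived, and choose Heartbeat only when Keepalived is not enough.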

http://blog.csdn.net/yunhua_lee/article/details/9788433
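How DRBD works and how to configure it
1. Introduction to DRBD

DRBD (Distributed Replicated Block Device) consists of a kernel module and related scripts and is used to build highly available clusters. It works by mirroring an entire block device over the network, so you can think of it as network RAID: it lets you keep a real-time mirror of a local block device on a remote machine.

2. How does DRBD work?

The primary node (DRBD Primary) receives data, writes it to its local disk and sends it to the other host (DRBD Secondary), which then stores it on its own disk. In the usual single-primary mode only one node at a time has read/write access, which is sufficient for a typical failover HA cluster. DRBD 8.x can also run in dual-primary mode (the allow-two-primaries net option), which requires a cluster file system such as GFS2 or OCFS2.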

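3. DRBD and HA

A DRBD system consists of two nodes and, as in an HA cluster, has a primary node and a standby node. On the node holding the primary device, applications and the operating system can run and access the DRBD device (/dev/drbd*). Data written on the primary node goes through the DRBD device to the primary's disk and, at the same time, is automatically sent to the corresponding DRBD device on the standby node, where it is finally written to the standby's disk; on the standby node, DRBD only writes data from the DRBD device to the local disk. Most HA clusters today use shared storage, and DRBD can serve as a shared storage device without much hardware investment: because it runs over an ordinary TCP/IP network, it is much cheaper than a dedicated storage network, and its performance and stability are good.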

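4. DRBD replication protocols

Protocol A:

Asynchronous replication. A write is considered complete as soon as the local disk write has finished and the replication packet has been placed in the send queue. If a node fails, data loss can occur, because data destined for the remote node may still be in the send queue. The data on the failover node is consistent, but not up to date. This protocol is typically used for geographically separated nodes.

Protocol B:

Memory-synchronous (semi-synchronous) replication. A write on the primary node is considered complete once the local disk write has finished and the replication packet has reached the peer node. Data loss can occur only if both nodes fail at the same time, because data in transit may not yet have been committed to disk.

Protocol C:

Synchronous replication. A write is considered complete only after both the local and the remote disk have confirmed it. The failure of a single node therefore loses no data, which makes this the most popular protocol for cluster nodes, but I/O throughput depends on network bandwidth.

Protocol C is the usual choice, but every write then waits for the network round trip and the peer's disk, which increases write latency. With data reliability in mind, choose the protocol carefully for production use.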
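The latency impact can be seen with a deliberately simplified model of when a write is acknowledged: protocol A waits only for the local disk, B also for the data to reach the peer (one round trip), and C additionally for the peer's disk write. A bash sketch with hypothetical integer timings in milliseconds, ignoring CPU, queueing and TCP buffering:

```bash
#!/bin/bash
# Simplified model of write-acknowledgement latency per DRBD protocol.
# Arguments (integers, ms, hypothetical): local disk write time,
# network round-trip time to the peer, peer's disk write time.
max() { if (( $1 > $2 )); then echo "$1"; else echo "$2"; fi; }

ack_latency() {
    local proto="$1" disk="$2" rtt="$3" peer_disk="$4"
    case "$proto" in
        A) echo "$disk" ;;                          # local write only
        B) max "$disk" "$rtt" ;;                    # data reached the peer
        C) max "$disk" $(( rtt + peer_disk )) ;;    # data on the peer's disk
        *) echo "unknown protocol: $proto" >&2; return 1 ;;
    esac
}

for p in A B C; do
    echo "protocol $p: $(ack_latency "$p" 2 10 2) ms"
done
```

With a 2 ms disk and a 10 ms round trip the model gives 2 ms for A, 10 ms for B and 12 ms for C, which is why protocol C is the most sensitive to network latency.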

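5. How DRBD works

DRBD is a distributed storage system in the storage layer of the Linux kernel. It can be used to share block devices, file systems and data between two Linux servers, functioning much like a network RAID-1.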

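6. Environment and preparation

Environment:

OS version: CentOS 6.4 x86_64

DRBD packages: drbd-8.4.3-33.el6.x86_64 and drbd-kmdl-2.6.32-358.el6-8.4.3-33.el6.x86_64, downloaded from http://rpmfind.net

Note: both packages must be of the same DRBD version, and the drbd-kmdl package must match the running kernel version (check it with "uname -r"). In practice, download the package versions that fit your own platform.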
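Before installing, the version rule can be checked mechanically. A bash sketch, assuming the EL6 file naming shown above (drbd-kmdl-<kernel>-<drbd version>.el6.<arch>.rpm); the function name kmdl_matches is hypothetical:

```bash
#!/bin/bash
# Hypothetical check: does the drbd-kmdl RPM match both the running kernel
# and the drbd userland RPM? Assumes EL6-style names such as
#   drbd-8.4.3-33.el6.x86_64.rpm
#   drbd-kmdl-2.6.32-358.el6-8.4.3-33.el6.x86_64.rpm
kmdl_matches() {
    local kmdl_rpm drbd_rpm kernel arch kver drbd_ver
    kmdl_rpm=$(basename "$1")
    drbd_rpm=$(basename "$2")
    kernel="${3:-$(uname -r)}"        # e.g. 2.6.32-358.el6.x86_64
    arch="${4:-$(uname -m)}"          # e.g. x86_64
    kver="${kernel%."$arch"}"         # -> 2.6.32-358.el6
    drbd_ver="${drbd_rpm#drbd-}"      # -> 8.4.3-33.el6.x86_64.rpm
    drbd_ver="${drbd_ver%%.el*}"      # -> 8.4.3-33
    [[ "$kmdl_rpm" == drbd-kmdl-"$kver"-"$drbd_ver".* ]]
}

# Usage in the download directory on each node:
# kmdl_matches drbd-kmdl-*.rpm drbd-8*.rpm && echo "versions match"
```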

Preparation:

1. Each node's hostname must match the output of the "uname -n" command.

######Run on NOD1

sed -i 's@\(HOSTNAME=\).*@\1nod1.allen.com@g' /etc/sysconfig/network
hostname nod1.allen.com

######Run on NOD2

sed -i 's@\(HOSTNAME=\).*@\1nod2.allen.com@g' /etc/sysconfig/network
hostname nod2.allen.com

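Note: a change to the file only takes effect after a reboot; running the hostname command as well applies the new name immediately, so no reboot is needed.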
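A quick consistency check for this step, as a hypothetical bash helper: it compares the HOSTNAME= entry in /etc/sysconfig/network with uname -n, so a node where only one of the two was changed is caught before DRBD is configured:

```bash
#!/bin/bash
# Hypothetical check: does HOSTNAME= in the network file match uname -n?
hostname_consistent() {
    local file="${1:-/etc/sysconfig/network}"
    local expected="${2:-$(uname -n)}"
    local configured
    configured=$(sed -n 's/^HOSTNAME=//p' "$file")
    [ "$configured" = "$expected" ]
}

# Usage on each node after the two commands above:
# hostname_consistent && echo "hostname OK: $(uname -n)"
```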

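2. The hostnames of both nodes must resolve to their IP addresses. Append the entries to /etc/hosts (overwriting the file would drop the existing localhost lines):

######Run on both NOD1 and NOD2

cat >> /etc/hosts << EOF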
192.168.137.225 nod1.allen.com nod1
192.168.137.222 nod2.allen.com nod2
EOF
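To verify name resolution on each node, a hypothetical bash helper can query every name through "getent hosts", which consults /etc/hosts as well as DNS according to /etc/nsswitch.conf:

```bash
#!/bin/bash
# Hypothetical helper: fail if any of the given names cannot be resolved.
check_resolves() {
    local host
    for host in "$@"; do
        if ! getent hosts "$host" > /dev/null; then
            echo "cannot resolve: $host" >&2
            return 1
        fi
    done
    return 0
}

# Usage on each node:
# check_resolves nod1.allen.com nod2.allen.com nod1 nod2 && echo "all names resolve"
```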

3. Configure the EPEL yum repository by downloading and installing the epel-release package

######Install on both NOD1 and NOD2

rpm -ivh epel-release-6-8.noarch.rpm

4. Each node needs a partition of the same size

######Create the partition on NOD1; its size must be the same as on NOD2

[root@nod1 ~]# fdisk /dev/sda
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (7859-15665, default 7859):
Using default value 7859
Last cylinder, +cylinders or +size{K,M,G} (7859-15665, default 15665): +2G
Command (m for help): w

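[root@nod1 ~]# partx /dev/sda #make the kernel re-read the partition table
######Check whether the kernel has recognized the new partition; if not, a reboot is required. Here sda3 is not listed, so reboot the system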

[root@nod1 ~]# cat /proc/partitions
major minor  #blocks  name
   8        0  125829120 sda
   8        1     204800 sda1
   8        2   62914560 sda2
 253        0   20971520 dm-0
 253        1    2097152 dm-1
 253        2   10485760 dm-2
 253        3   20971520 dm-3
[root@nod1 ~]# reboot

######Create the partition on NOD2; its size must be the same as on NOD1

[root@nod2 ~]# fdisk /dev/sda
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (7859-15665, default 7859):
Using default value 7859
Last cylinder, +cylinders or +size{K,M,G} (7859-15665, default 15665): +2G
Command (m for help): w

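[root@nod2 ~]# partx /dev/sda #make the kernel re-read the partition table
######Check whether the kernel has recognized the new partition; if not, a reboot is required. Here sda3 is not listed, so reboot the system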

[root@nod2 ~]# cat /proc/partitions
major minor  #blocks  name
   8        0  125829120 sda
   8        1     204800 sda1
   8        2   62914560 sda2
 253        0   20971520 dm-0
 253        1    2097152 dm-1
 253        2   10485760 dm-2
 253        3   20971520 dm-3
[root@nod2 ~]# reboot
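Since the partitions must be the same size on both nodes, it helps to compare the #blocks column of /proc/partitions after the reboots. A bash sketch; the ssh usage assumes root SSH access from nod1 to nod2, which this guide does not set up:

```bash
#!/bin/bash
# Print the size in 1 KiB blocks of a partition, read from a file in
# /proc/partitions format (columns: major minor #blocks name).
partition_blocks() {
    local name="$1" file="${2:-/proc/partitions}"
    awk -v n="$name" '$4 == n { print $3 }' "$file"
}

# Hypothetical usage on nod1 (assumes root SSH to nod2):
# local_size=$(partition_blocks sda3)
# peer_size=$(ssh nod2 cat /proc/partitions | partition_blocks sda3 /dev/stdin)
# [ "$local_size" = "$peer_size" ] && echo "sda3 sizes match: $local_size blocks"
```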

7. Install and configure DRBD

1. Install the DRBD packages on NOD1 and NOD2

######NOD1

[root@nod1 ~]# ls drbd-*
drbd-8.4.3-33.el6.x86_64.rpm  drbd-kmdl-2.6.32-358.el6-8.4.3-33.el6.x86_64.rpm
[root@nod1 ~]# yum -y install drbd-*.rpm
######NOD2
[root@nod2 ~]# ls drbd-*
drbd-8.4.3-33.el6.x86_64.rpm  drbd-kmdl-2.6.32-358.el6-8.4.3-33.el6.x86_64.rpm
[root@nod2 ~]# yum -y install drbd-*.rpm

2. Look at the DRBD configuration files

ll /etc/drbd.conf;ll /etc/drbd.d/
-rw-r--r-- 1 root root 133 May 14 21:12 /etc/drbd.conf #main configuration file
total 4
-rw-r--r-- 1 root root 1836 May 14 21:12 global_common.conf #global configuration file
######View the contents of the main configuration file

cat /etc/drbd.conf

######The main configuration file includes the global configuration file and all files ending in .res in the drbd.d/ directory

# You can find an example in  /usr/share/doc/drbd.../drbd.conf.example
include "drbd.d/global_common.conf";
include "drbd.d/*.res";

3. Edit the configuration file as follows:

[root@nod1 ~]# vim /etc/drbd.d/global_common.conf
global {
    usage-count no;    #whether to take part in DRBD usage statistics (default: yes)
    # minor-count dialog-refresh disable-ip-verification
}
common {
    protocol C;      #use DRBD's synchronous replication protocol (protocol C)
    handlers {
        # These are EXAMPLE handlers only.
        # They may have severe implications,
        # like hard resetting the node under certain circumstances.
        # Be careful when chosing your poison.
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
        # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
        # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
    }
    startup {
        # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
    }
    options {
        # cpu-mask on-no-data-accessible
    }
    disk {
        on-io-error detach; #on disk I/O errors, detach the backing device
        # size max-bio-bvecs on-io-error fencing disk-barrier disk-flushes
        # disk-drain md-flushes resync-rate resync-after al-extents
                # c-plan-ahead c-delay-target c-fill-target c-max-rate
                # c-min-rate disk-timeout
    }
    net {
        cram-hmac-alg "sha1";       #HMAC algorithm used to authenticate the peer
        shared-secret "allendrbd"; #shared secret used for peer authentication
        # protocol timeout max-epoch-size max-buffers unplug-watermark
        # connect-int ping-int sndbuf-size rcvbuf-size ko-count
        # allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
        # after-sb-1pri after-sb-2pri always-asbp rr-conflict
        # ping-timeout data-integrity-alg tcp-cork on-congestion
        # congestion-fill congestion-extents csums-alg verify-alg
        # use-rle
    }
    syncer {
        rate 1024M;    #bandwidth limit for resynchronization between the primary and secondary nodes
    }
}

Note: on-io-error <strategy> can be one of the following:

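detach: This is the default and recommended option. If a lower-level disk I/O error occurs on the node, DRBD detaches the backing device and keeps running in Diskless mode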

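pass_on: DRBD reports the I/O error to the upper layers. On the primary node it is reported to the mounted file system; on the secondary node it is usually ignored, because there is no upper layer to report it to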

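call-local-io-error: Runs the command defined for the local-io-error handler. This requires a local-io-error handler in the resource's handlers section, and gives the administrator full freedom to handle I/O errors with any command or script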

4. Add the resource file:

[root@nod1 ~]# vim /etc/drbd.d/drbd.res
resource drbd {
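  on nod1.allen.com {    #each host section starts with "on" followed by the host name (must match uname -n)
    device    /dev/drbd0; #DRBD device name
    disk      /dev/sda3; #drbd0 is backed by the partition /dev/sda3
    address   192.168.137.225:7789; #address and port DRBD listens on for this host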
    meta-disk internal;
  }
  on nod2.allen.com {
    device    /dev/drbd0;
    disk      /dev/sda3;
    address   192.168.137.222:7789;
    meta-disk internal;
  }
}
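When the same resource file has to be written for several clusters, it can be generated from a few parameters. A minimal bash sketch that reproduces the layout of drbd.res above; `make_drbd_res` and its argument order are hypothetical, and the host names passed in must still match each node's `uname -n`.

```shell
#!/bin/bash
# Hypothetical helper: print a two-node DRBD resource definition in the same
# layout as /etc/drbd.d/drbd.res.
# Args: resource device disk host1 addr1 host2 addr2   (addr is ip:port)
make_drbd_res() {
    local res="$1" dev="$2" disk="$3"
    local h1="$4" a1="$5" h2="$6" a2="$7"
    cat <<EOF
resource ${res} {
  on ${h1} {
    device    ${dev};
    disk      ${disk};
    address   ${a1};
    meta-disk internal;
  }
  on ${h2} {
    device    ${dev};
    disk      ${disk};
    address   ${a2};
    meta-disk internal;
  }
}
EOF
}
```

For example `make_drbd_res drbd /dev/drbd0 /dev/sda3 nod1.allen.com 192.168.137.225:7789 nod2.allen.com 192.168.137.222:7789 > /etc/drbd.d/drbd.res`, followed by `drbdadm dump drbd` to let drbdadm parse the result.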

5. Copy the configuration files to NOD2

[root@nod1 ~]# scp /etc/drbd.d/{global_common.conf,drbd.res} nod2:/etc/drbd.d/
The authenticity of host 'nod2 (192.168.137.222)' can't be established.
RSA key fingerprint is 29:d3:28:85:20:a1:1f:2a:11:e5:88:cd:25:d0:95:c7.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'nod2' (RSA) to the list of known hosts.
root@nod2's password:
global_common.conf                                                             100% 1943     1.9KB/s   00:00 
drbd.res                                                                       100%  318     0.3KB/s   00:00

6. Initialize the resource and start the service

######Initialize the resource and start the service on NOD1

[root@nod1 ~]# drbdadm create-md drbd
Writing meta data...
initializing activity log
NOT initializing bitmap
lk_bdev_save(/var/lib/drbd/drbd-minor-0.lkbd) failed: No such file or directory
New drbd meta data block successfully created.  #metadata was created successfully
lk_bdev_save(/var/lib/drbd/drbd-minor-0.lkbd) failed: No such file or directory

######Start the service

[root@nod1 ~]# service drbd start
Starting DRBD resources: [
     create res: drbd
   prepare disk: drbd
    adjust disk: drbd
     adjust net: drbd
]
..........

DRBD's startup script waits for the peer node(s) to appear.
 - In case this node was already a degraded cluster before the
   reboot the timeout is 0 seconds. [degr-wfc-timeout]
 - If the peer was available before the reboot the timeout will
   expire after 0 seconds. [wfc-timeout]
   (These values are for resource 'drbd'; 0 sec -> wait forever)
 To abort waiting enter 'yes' [  12]: yes

######Check the listening port

[root@nod1 ~]# ss -tanl |grep 7789
LISTEN     0      5           192.168.137.225:7789                     *:*

######Initialize the resource and start the service on NOD2

[root@nod2 ~]# drbdadm create-md drbd
Writing meta data...
initializing activity log
NOT initializing bitmap
lk_bdev_save(/var/lib/drbd/drbd-minor-0.lkbd) failed: No such file or directory
New drbd meta data block successfully created.
lk_bdev_save(/var/lib/drbd/drbd-minor-0.lkbd) failed: No such file or directory

######Start the service

[root@nod2 ~]# service drbd start
Starting DRBD resources: [
     create res: drbd
   prepare disk: drbd
    adjust disk: drbd
     adjust net: drbd
]

######Check the listening address and port

[root@nod2 ~]# netstat -anput|grep 7789
tcp        0      0 192.168.137.222:42345       192.168.137.225:7789        ESTABLISHED -                
tcp        0      0 192.168.137.222:7789        192.168.137.225:42325       ESTABLISHED -

######Check the DRBD status after startup

[root@nod2 ~]# drbd-overview
  0:drbd/0  Connected Secondary/Secondary Inconsistent/Inconsistent C r-----

7. Resource connection states in detail

7.1 How to view a resource's connection state

[root@nod1 ~]# drbdadm cstate drbd   #drbd is the resource name
Connected

7.2 Connection states: a resource is always in one of the following connection states

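StandAlone: No network configuration is available. The resource has not yet been connected, has been administratively disconnected (with drbdadm disconnect), or has dropped its connection because of failed authentication or a split brain

Disconnecting: Temporary state during disconnection; the next state is StandAlone

Unconnected: Temporary state prior to a connection attempt; possible next states are WFConnection and WFReportParams

Timeout: Temporary state after the connection to the peer timed out; the next state is Unconnected

BrokenPipe: Temporary state after the connection to the peer was lost; the next state is Unconnected

NetworkFailure: Temporary state after the connection to the peer was lost; the next state is Unconnected

ProtocolError: Temporary state after the connection to the peer was lost; the next state is Unconnected

TearDown: Temporary state; the peer is closing the connection. The next state is Unconnected

WFConnection: Waiting for a network connection to the peer to be established

WFReportParams: The TCP connection has been established; this node is waiting for the first network packet from the peer

Connected: The DRBD connection has been established and data mirroring is now active. This is the normal state

StartingSyncS: A full synchronization, initiated by the administrator, is just starting; possible next states are SyncSource or PausedSyncS

StartingSyncT: A full synchronization, initiated by the administrator, is just starting; the next state is WFSyncUUID

WFBitMapS: A partial synchronization is just starting; possible next states are SyncSource or PausedSyncS

WFBitMapT: A partial synchronization is just starting; the possible next state is WFSyncUUID

WFSyncUUID: Synchronization is about to begin; possible next states are SyncTarget or PausedSyncT

SyncSource: Synchronization is in progress, with the local node as the source

SyncTarget: Synchronization is in progress, with the local node as the target

PausedSyncS: The local node is the source of an ongoing synchronization, but synchronization is currently paused, either because another synchronization is in progress or because it was paused with drbdadm pause-sync

PausedSyncT: The local node is the target of an ongoing synchronization, but synchronization is currently paused, either because another synchronization is in progress or because it was paused with drbdadm pause-sync

VerifyS: On-line device verification is running, with the local node as the verification source

VerifyT: On-line device verification is running, with the local node as the verification target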
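Because most of these states are transient, a script that needs a working connection is better off waiting for Connected than acting on a single reading. A minimal bash sketch, assuming `drbdadm cstate` prints one state name as in 7.1; `wait_for_cstate` and its one-second polling interval are illustrative choices, not part of DRBD.

```shell
#!/bin/bash
# Hypothetical helper: poll `drbdadm cstate <res>` until it prints the wanted
# state or the timeout (in seconds) expires.
wait_for_cstate() {
    local res="$1" want="$2" timeout="${3:-30}" state i
    for ((i = 0; ; i++)); do
        state=$(drbdadm cstate "$res" 2>/dev/null)
        [ "$state" = "$want" ] && return 0
        # Give up once the timeout is reached; otherwise wait and retry
        (( i >= timeout )) && break
        sleep 1
    done
    echo "resource $res is still '${state:-unknown}', not '$want'" >&2
    return 1
}
```

For example, `wait_for_cstate drbd Connected 60 && drbd-overview` after `service drbd start`.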

7.3 Resource roles

Command to view resource roles:

[root@nod1 ~]# drbdadm role drbd
Secondary/Secondary
[root@nod1 ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by gardner@, 2013-05-27 04:30:21
 0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
 ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:2103412

Notes:

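Primary: The resource is currently primary and may be read from or written to. Unless dual-primary mode is used, this role appears on only one of the two nodes

Secondary: The resource is currently secondary and receives updates from the peer normally

Unknown: The resource's role is currently unknown. The local resource is never in this role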
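The role output is always written as local/peer, so a script can split it with plain parameter expansion. A minimal bash sketch; `local_role` and `peer_role` are hypothetical helper names.

```shell
#!/bin/bash
# Hypothetical helpers: `drbdadm role <res>` prints "local/peer",
# e.g. Secondary/Secondary; these return one side of it.
local_role() {
    local roles
    roles=$(drbdadm role "$1") || return 1
    echo "${roles%%/*}"   # text before the first "/"
}

peer_role() {
    local roles
    roles=$(drbdadm role "$1") || return 1
    echo "${roles#*/}"    # text after the first "/"
}
```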

7.4 Disk states

Command to view disk states:

[root@nod1 ~]# drbdadm dstate drbd
Inconsistent/Inconsistent

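The local and peer disks can each be in one of the following states:

Diskless: No local block device is assigned to DRBD. Either no backing device is available, it was detached manually with drbdadm, or it was detached automatically after a lower-level I/O error

Attaching: Transient state while reading metadata

Failed: Transient state after the local block device reports an I/O error; the next state is Diskless

Negotiating: Transient state when an attach is performed on an already-connected DRBD device

Inconsistent: The data is inconsistent. This state appears on both nodes immediately after a new resource is created (before the initial full sync), and on the synchronization target during a sync

Outdated: The resource data is consistent but outdated

DUnknown: Shown for the peer disk when no network connection to the peer is available

Consistent: The data of a node without a connection is consistent. When the connection is established, DRBD decides whether the data is UpToDate or Outdated

UpToDate: Consistent, up-to-date data. This is the normal state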
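A common pre-flight check before a planned switchover is that both disks are UpToDate. A minimal bash sketch, assuming the `drbdadm dstate` output format shown above; `disks_uptodate` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical check: succeed only when `drbdadm dstate <res>` reports
# UpToDate for both the local and the peer disk.
disks_uptodate() {
    local ds
    ds=$(drbdadm dstate "$1") || return 1
    [ "$ds" = "UpToDate/UpToDate" ]
}
```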

7.5 Enabling and disabling resources

######Manually enable a resource

drbdadm up <resource>

######Manually disable a resource

drbdadm down <resource>

Notes:
<resource> is the resource name; all can also be used to enable or disable every resource
7.6 Promoting and demoting resources

######Promote a resource

drbdadm primary <resource>

######Demote a resource

drbdadm secondary <resource>

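Note: In single-primary mode, with both nodes connected, either node can become primary at a given time, but only one node can be primary at once. If a primary already exists, it must be demoted before the other node can be promoted. Dual-primary mode has no such restriction.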
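The single-primary rule can be enforced in a small wrapper before promoting. A minimal bash sketch, assuming the `drbdadm role` output format from 7.3; `safe_promote` is a hypothetical helper, not part of drbdadm.

```shell
#!/bin/bash
# Hypothetical wrapper for single-primary mode: refuse to promote the local
# node while the connected peer still reports Primary.
safe_promote() {
    local res="$1" roles
    roles=$(drbdadm role "$res") || return 1
    if [ "${roles#*/}" = "Primary" ]; then
        echo "peer is still Primary; run 'drbdadm secondary $res' there first" >&2
        return 1
    fi
    drbdadm primary "$res"
}
```

The check only sees the peer while the nodes are connected. When the peer shows as Unknown the wrapper lets the promotion through, and that is exactly how the split brain in section VII is produced.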
8. Initial device synchronization

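8.1 Choose an initial synchronization source. If the resource is newly initialized or the disks are empty, the choice is arbitrary. If one node is already in use and holds useful data, however, choosing the source is critical: picking the wrong direction for the initial sync causes data loss, so be very careful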

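8.2 Start the initial full synchronization. Run this on only one node that has the resource initialized: the node chosen as the synchronization source. The command is: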

[root@nod1 ~]# drbdadm -- --overwrite-data-of-peer primary drbd
[root@nod1 ~]# cat /proc/drbd     #check synchronization progress
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by gardner@, 2013-05-27 04:30:21
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r---n-
    ns:1897624 nr:0 dw:0 dr:1901216 al:0 bm:115 lo:0 pe:3 ua:3 ap:0 ep:1 wo:f oos:207988
    [=================>..] sync'ed: 90.3% (207988/2103412)K
    finish: 0:00:07 speed: 26,792 (27,076) K/sec

######When synchronization is complete, the status looks like this:

version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by gardner@, 2013-05-27 04:30:21
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:2103412 nr:0 dw:0 dr:2104084 al:0 bm:129 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

Note: drbd is the resource name
######Synchronization progress can also be checked with:

drbd-overview
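The progress line in /proc/drbd can also be read from a script, for example to wait for the initial sync to finish. A minimal bash sketch, assuming the `sync'ed: NN.N%` format shown above; `sync_percent` is a hypothetical helper and takes the file as a parameter so saved output can be checked.

```shell
#!/bin/bash
# Hypothetical helper: print the resync percentage from /proc/drbd-style text.
# Reads $1 (default /proc/drbd); prints nothing when no resync is running.
sync_percent() {
    sed -n "s/.*sync'ed: *\([0-9.]*\)%.*/\1/p" "${1:-/proc/drbd}"
}
```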

9. Create a file system

9.1 The file system can only be mounted on the Primary node, so format the DRBD device only after a primary node has been set

######Format the file system

[root@nod1 ~]# mkfs.ext4 /dev/drbd0

######Mount the file system

[root@nod1 ~]# mount /dev/drbd0 /mnt/

######Check the mount

[root@nod1 ~]# mount |grep drbd0
/dev/drbd0 on /mnt type ext4 (rw)

Notes:
"/dev/drbd0" is the DRBD device defined in the resource file
######Check DRBD status

[root@nod1 ~]# drbd-overview
  0:drbd/0  Connected Primary/Secondary UpToDate/UpToDate C r-----

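Notes:

Primary: the current node is primary; the role listed first always belongs to the local node
Secondary: the standby (peer) node is secondary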

9.2 Create a test directory under the mount point, then unmount it

[root@nod1 ~]# mkdir /mnt/test
[root@nod1 ~]# ls /mnt/
lost+found  test

######The resource must not be in use when switching the primary node

[root@nod1 ~]# umount /mnt/

9.3 Switch the primary and secondary nodes

######First demote the current primary node to secondary

[root@nod1 ~]# drbdadm secondary drbd

######Check DRBD status

[root@nod1 ~]# drbd-overview
  0:drbd/0  Connected Secondary/Secondary UpToDate/UpToDate C r-----

######Promote NOD2

[root@nod2 ~]# drbdadm primary drbd

######Check DRBD status

[root@nod2 ~]# drbd-overview
  0:drbd/0  Connected Primary/Secondary UpToDate/UpToDate C r-----

9.4 Mount the device and verify that the test directory exists

[root@nod2 ~]# mount /dev/drbd0 /mnt/
[root@nod2 ~]# ls /mnt/
lost+found  test
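The switchover in 9.2 to 9.4 can be wrapped in two helpers, one per node, so that the order (unmount, demote, then promote, mount) is always kept. A minimal bash sketch; `release_primary` and `take_primary` are hypothetical names, and running each one on the correct node is left to the operator.

```shell
#!/bin/bash
# Hypothetical helpers for a manual switchover.
# On the current primary: release the device, then demote.
release_primary() {
    local res="$1" mnt="$2"
    # The device must not be in use before demotion
    umount "$mnt" || return 1
    drbdadm secondary "$res"
}

# On the other node: promote, then mount the device.
take_primary() {
    local res="$1" dev="$2" mnt="$3"
    drbdadm primary "$res" || return 1
    mount "$dev" "$mnt"
}
```

With the values used here: `release_primary drbd /mnt` on NOD1, then `take_primary drbd /dev/drbd0 /mnt` on NOD2.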

VII. Simulating and repairing a DRBD split brain

Note: This continues the experiment above; NOD2 is now the primary node and NOD1 the secondary

1. Cut off the primary node. Shutting it down, disconnecting its network, or giving it a different IP all work; here the network is disconnected

2. Check the status of both nodes

[root@nod2 ~]# drbd-overview
  0:drbd/0  WFConnection Primary/Unknown UpToDate/DUnknown C r----- /mnt ext4 2.0G 68M 1.9G 4%
[root@nod1 ~]# drbd-overview
  0:drbd/0  StandAlone Secondary/Unknown UpToDate/DUnknown r-----

######As shown above, the two nodes can no longer communicate; NOD2 is primary and NOD1 is secondary
3. Promote NOD1 to primary and mount the resource

[root@nod1 ~]# drbdadm primary drbd
[root@nod1 ~]# drbd-overview
  0:drbd/0  StandAlone Primary/Unknown UpToDate/DUnknown r-----
[root@nod1 ~]# mount /dev/drbd0 /mnt/
[root@nod1 ~]# mount | grep drbd0
/dev/drbd0 on /mnt type ext4 (rw)

4. Suppose the original primary node is repaired and comes back online: a split brain now occurs

[root@nod2 ~]# tail -f /var/log/messages
Sep 19 01:56:06 nod2 kernel: d-con drbd: Terminating drbd_a_drbd
Sep 19 01:56:06 nod2 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
Sep 19 01:56:06 nod2 kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
Sep 19 01:56:06 nod2 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
Sep 19 01:56:06 nod2 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
Sep 19 01:56:06 nod2 kernel: d-con drbd: conn( NetworkFailure -> Disconnecting )
Sep 19 01:56:06 nod2 kernel: d-con drbd: error receiving ReportState, e: -5 l: 0!
Sep 19 01:56:06 nod2 kernel: d-con drbd: Connection closed
Sep 19 01:56:06 nod2 kernel: d-con drbd: conn( Disconnecting -> StandAlone )
Sep 19 01:56:06 nod2 kernel: d-con drbd: receiver terminated
Sep 19 01:56:06 nod2 kernel: d-con drbd: Terminating drbd_r_drbd
Sep 19 01:56:18 nod2 kernel: block drbd0: role( Primary -> Secondary )
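The key line in this log is `Split-Brain detected but unresolved`. A minimal bash sketch that searches for it, assuming the syslog path used above; `split_brain_lines` is a hypothetical helper. For automatic notification, the commented-out `split-brain` handler in global_common.conf is the hook DRBD itself provides.

```shell
#!/bin/bash
# Hypothetical helper: print DRBD split-brain messages from a syslog file
# (default /var/log/messages); exits non-zero if none are found.
split_brain_lines() {
    grep 'Split-Brain detected' "${1:-/var/log/messages}"
}
```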

5. Check the roles of both nodes again

[root@nod1 ~]# drbdadm role drbd
Primary/Unknown
[root@nod2 ~]# drbdadm role drbd
Primary/Unknown

6. Check the connection state of NOD1 and NOD2

[root@nod1 ~]# drbd-overview
  0:drbd/0  StandAlone Primary/Unknown UpToDate/DUnknown r----- /mnt ext4 2.0G 68M 1.9G 4%
[root@nod2 ~]# drbd-overview
  0:drbd/0  WFConnection Primary/Unknown UpToDate/DUnknown C r----- /mnt ext4 2.0G 68M 1.9G 4%

######As shown above, in the StandAlone state the two nodes do not communicate with each other
7. Check the DRBD service status

[root@nod1 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by gardner@, 2013-05-27 04:30:21
m:res   cs          ro               ds                 p       mounted  fstype
0:drbd  StandAlone  Primary/Unknown  UpToDate/DUnknown  r-----  ext4
[root@nod2 ~]# service drbd status
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by gardner@, 2013-05-27 04:30:21
m:res   cs            ro               ds                 p  mounted  fstype
0:drbd  WFConnection  Primary/Unknown  UpToDate/DUnknown  C  /mnt     ext4

8. Recovery on NOD1, the node whose changes will be discarded

[root@nod1 ~]# umount /mnt/
[root@nod1 ~]# drbdadm disconnect drbd
drbd: Failure: (162) Invalid configuration request
additional info from kernel:
unknown connection
Command 'drbdsetup disconnect ipv4:192.168.137.225:7789 ipv4:192.168.137.222:7789' terminated with exit code 10
[root@nod1 ~]# drbdadm secondary drbd
[root@nod1 ~]# drbd-overview
  0:drbd/0  StandAlone Secondary/Unknown UpToDate/DUnknown r-----
[root@nod1 ~]# drbdadm connect --discard-my-data drbd
######After the steps above, the resource is still not usable: NOD1 is only waiting for the peer (WFConnection)
[root@nod1 ~]# drbd-overview
  0:drbd/0  WFConnection Secondary/Unknown UpToDate/DUnknown C r-----

9. Re-establish the resource connection on NOD2

[root@nod2 ~]# drbdadm connect drbd

######Check the connection state of both nodes

[root@nod2 ~]# drbd-overview
  0:drbd/0  Connected Primary/Secondary UpToDate/UpToDate C r----- /mnt ext4 2.0G 68M 1.9G 4%
[root@nod1 ~]# drbd-overview
  0:drbd/0  Connected Secondary/Primary UpToDate/UpToDate C r-----

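######As shown above, normal operation has been restored
Note: In single-primary mode the resource can only be mounted and used on the Primary node, and switching the primary and secondary nodes manually is not recommended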
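The split-brain recovery above can be summarized as two helpers. A minimal bash sketch; `sb_discard_local` and `sb_reconnect_survivor` are hypothetical names. The `drbdadm disconnect` step is left out because, as the transcript shows, it fails with "unknown connection" when the node is already StandAlone. Choosing which node's changes to discard remains a manual decision.

```shell
#!/bin/bash
# Hypothetical helper for the split-brain victim (NOD1 here): release the
# device, demote, and reconnect while discarding local changes.
sb_discard_local() {
    local res="$1" mnt="$2"
    if mountpoint -q "$mnt"; then
        umount "$mnt" || return 1
    fi
    drbdadm secondary "$res" || return 1
    drbdadm connect --discard-my-data "$res"
}

# Hypothetical helper for the surviving node (NOD2 here): reconnect.
sb_reconnect_survivor() {
    drbdadm connect "$1"
}
```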

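This concludes the installation, configuration, and failure recovery of DRBD. Dual-primary mode is rarely needed, so its configuration is not covered here. This post was written on Mid-Autumn Festival day; Happy Mid-Autumn Festival, everyone!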
http://502245466.blog.51cto.com/7559397/1298945

mysql+keepalive

Environment:

OS：CentOS6.5_X64

MASTER：192.168.0.202

BACKUP：192.168.0.203

VIP：192.168.0.204

1. Configure master-master replication between the two MySQL servers
[root@master ~]# yum install mysql-server mysql -y

[root@master ~]# service mysqld start

[root@master ~]# mysqladmin -u root password 123.com

[root@master ~]# vi /etc/my.cnf #enable the binary log and set the server ID

[mysqld]

server-id = 1 #set to 2 on backup

log-bin = mysql-bin

binlog-ignore-db = mysql,information_schema #databases not written to the binlog

auto-increment-increment = 2 #step between generated auto-increment values

auto-increment-offset = 1 #first auto-increment value; use 2 on backup so the two servers never generate the same ID

slave-skip-errors = all #ignore all errors raised by replication

[root@master ~]# service mysqld restart
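With auto-increment-increment = 2, each server hands out every second ID starting from its auto-increment-offset, so the two masters only avoid collisions if their offsets differ. A minimal bash sketch of that arithmetic, assuming offset 1 on master and offset 2 on backup; `auto_ids` is a hypothetical helper, not a MySQL tool.

```shell
#!/bin/bash
# Hypothetical illustration: print the first N auto-increment IDs a server
# generates for a given offset and increment (offset + k * increment).
auto_ids() {
    local offset="$1" increment="$2" n="$3" i
    local -a ids=()
    for ((i = 0; i < n; i++)); do
        ids+=( $((offset + i * increment)) )
    done
    echo "${ids[*]}"
}
```

`auto_ids 1 2 4` prints `1 3 5 7` (master) and `auto_ids 2 2 4` prints `2 4 6 8` (backup): odd and even IDs never overlap.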
#Before configuring replication, note the current binlog file and position on each server (SHOW MASTER STATUS)

Configuration on master:


[root@master ~]# mysql -u root -p123.com
mysql> GRANT  REPLICATION SLAVE ON *.* TO 'replication'@'192.168.0.%' IDENTIFIED  BY 'replication';
mysql> flush  privileges;
mysql> change  master to
    ->  master_host='192.168.0.203',
    ->  master_user='replication',
    ->  master_password='replication',
    ->  master_log_file='mysql-bin.000002',
    ->  master_log_pos=106;  #File and Position reported by SHOW MASTER STATUS on the peer
mysql> start  slave;         #start replication

Configuration on backup:


[root@backup ~]#  mysql -u root -p123.com
mysql> GRANT  REPLICATION SLAVE ON *.* TO 'replication'@'192.168.0.%' IDENTIFIED  BY 'replication';
mysql> flush  privileges;
mysql> change  master to
    ->  master_host='192.168.0.202',
    ->  master_user='replication',
    ->  master_password='replication',
    ->  master_log_file='mysql-bin.000002',
    ->  master_log_pos=106;
mysql> start  slave;

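#Master-master replication is now configured. Check SHOW SLAVE STATUS on both servers: if Slave_IO_Running and Slave_SQL_Running are both Yes, master-master replication is working.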
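The Yes/Yes check can be scripted, for example for a cron job. A minimal bash sketch, assuming the vertical (`\G`) output format of SHOW SLAVE STATUS; `replication_ok` is a hypothetical helper that reads that output on stdin.

```shell
#!/bin/bash
# Hypothetical check: succeed when both replication threads report Yes in
# `SHOW SLAVE STATUS\G` output read from stdin.
replication_ok() {
    awk -F': ' '
        $1 ~ /Slave_IO_Running$/  { io  = $2 }
        $1 ~ /Slave_SQL_Running$/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }'
}
```

Usage on either server: `mysql -u root -p123.com -e 'SHOW SLAVE STATUS\G' | replication_ok && echo "replication OK"`.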

Insert some test data on master:

Check on backup whether the data was replicated:

The data has been replicated. Likewise, rows inserted into the user table on backup are replicated to master, so master-master replication works.

2. Configure keepalived for hot standby

[root@backup ~]# yum install -y pcre-devel openssl-devel popt-devel #install build dependencies (needed on both servers)
[root@master ~]# wget http://www.keepalived.org/software/keepalived-1.2.7.tar.gz
[root@master ~]# tar zxvf keepalived-1.2.7.tar.gz
[root@master ~]# cd keepalived-1.2.7
[root@master keepalived-1.2.7]# ./configure --prefix=/usr/local/keepalived
[root@master keepalived-1.2.7]# make && make install

#Register keepalived as a system service

[root@master ~]# cp /usr/local/keepalived/etc/rc.d/init.d/keepalived /etc/init.d/
[root@master ~]# cp /usr/local/keepalived/etc/sysconfig/keepalived /etc/sysconfig/
[root@master ~]# mkdir /etc/keepalived/
[root@master ~]# cp /usr/local/keepalived/etc/keepalived/keepalived.conf /etc/keepalived/
[root@master ~]# cp /usr/local/keepalived/sbin/keepalived /usr/sbin/

[root@master ~]# vi /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
notification_email {
[email protected]
 }
notification_email_from  [email protected]
smtp_server 127.0.0.1
smtp_connect_timeout 30
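router_id MYSQL_HA      #identifier, the same on both nodes
 }
vrrp_instance VI_1 {
 state BACKUP           #BACKUP on both servers (nopreempt only works with state BACKUP)
 interface eth0
 virtual_router_id 51       #must be the same on master and backup
 priority 100           #priority; set 90 on backup
 advert_int 1
 nopreempt             #do not take over the VIP actively; set only on master (the higher-priority node), not on backup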
 authentication {
 auth_type PASS
 auth_pass 1111
 }
 virtual_ipaddress {
 192.168.0.204
 }
}
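virtual_server 192.168.0.204 3306 {
 delay_loop 2
 #lb_algo rr              #LVS scheduling algorithm; not needed, so disabled
 #lb_kind DR              #LVS forwarding mode; if enabled, the backup server cannot reach the master MySQL through the VIP
 persistence_timeout 50  #connections from the same client IP go to the same real server for 50 seconds
 protocol TCP
 real_server 192.168.0.202 3306 {   #check the local MySQL; on backup, point this at its own local MySQL as well
 weight 3
 notify_down /usr/local/keepalived/mysql.sh    #when MySQL goes down, run this script to kill keepalived and trigger failover
 TCP_CHECK {
 connect_timeout 3    #connection timeout (seconds)
 nb_get_retry 3       #number of retries
 delay_before_retry 3 #delay between retries (seconds)
  }
 }
}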

[root@master ~]# vi /usr/local/keepalived/mysql.sh
#!/bin/bash
pkill keepalived
[root@master ~]# chmod +x /usr/local/keepalived/mysql.sh
[root@master ~]# /etc/init.d/keepalived start

#On backup, change only the following: priority 90, no nopreempt, and real_server set to its own local IP.

#Allow root to log in remotely on both MySQL servers, so logins can be tested from other hosts:

mysql> grant all on *.* to 'root'@'192.168.0.%' identified by '123.com';

mysql> flush privileges;

3. Test high availability

1. Connect through the VIP with a MySQL client and check that the connection succeeds.

2. Stop MySQL on master and check whether the service fails over; use the ip addr command to see which server holds the VIP.

3. Follow the failover process in /var/log/messages.

4. After master recovers, check whether it takes the VIP back and becomes the active server again.
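For test step 2, the VIP holder can be checked with a small helper instead of reading `ip addr` by eye. A minimal bash sketch, assuming keepalived adds the VIP as an extra `inet` address on the interface; `vip_here` is a hypothetical helper, and the optional second argument lets saved `ip addr` output be checked.

```shell
#!/bin/bash
# Hypothetical check: succeed if the given VIP appears as an "inet <vip>/..."
# entry in `ip -4 addr show` output (or in the text passed as $2).
vip_here() {
    local vip="$1" out="${2-$(ip -4 addr show)}"
    # Fixed-string match; the trailing "/" prevents 192.168.0.20 matching .204
    grep -Fq "inet ${vip}/" <<< "$out"
}
```

Running `vip_here 192.168.0.204 && echo "VIP is on $(hostname)"` on both servers should report the VIP on exactly one of them.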
http://lizhenliang.blog.51cto.com/7876557/1362313

July 3 tasks

18.1 Cluster overview

18.2 Introduction to keepalived

18.3/18.4/18.5 Configuring a high-availability cluster with keepalived

Further reading

Comparison of heartbeat and keepalived

mysql+keepalive
