Article Directory
1. Introduction to Pacemaker
1.1 Introduction to Pacemaker
Pacemaker is the most widely used open-source cluster resource manager in the Linux environment. It builds on the messaging and membership services of a cluster infrastructure layer (Corosync or Heartbeat) to detect failures at both the node and resource level and to recover resources, thereby maximizing the availability of cluster services. Functionally, Pacemaker manages the full life cycle of the software services in the cluster, driven by resource rules defined by the cluster administrator; this management covers individual services as well as the interactions between them. In practice, Pacemaker can manage clusters of almost any size. Its powerful resource dependency model lets administrators precisely describe the relationships between cluster resources, including their start order and placement. Almost any software can be managed as a Pacemaker resource by supplying custom startup and management scripts (resource agents). Finally, note that Pacemaker is only a resource manager and does not itself provide cluster heartbeat messaging. Because every highly available cluster needs a heartbeat monitoring mechanism, beginners often mistakenly assume that Pacemaker has built-in heartbeat detection; in fact, Pacemaker relies on Corosync or Heartbeat for its heartbeat mechanism.
- pcsd.service: the service name of the pcs daemon
1.2 Introduction to pcs
pcs: Pacemaker cluster management tool
- The Pacemaker community provides two commonly used command-line cluster management tools: pcs and crmsh.
- Note that the pcs command line has version requirements for the installed software: Pacemaker 1.1.8 or later and Corosync 2.0 or later are needed to manage the cluster with pcs.
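These version floors can be checked mechanically. A minimal sketch, assuming GNU coreutils `sort -V` is available; the `version_ge` helper is a hypothetical name introduced here for illustration:

```shell
# Hypothetical helper: succeeds when the installed version is >= the
# required one. Assumes GNU coreutils `sort -V` (version sort).
version_ge() {
    required="$1"
    installed="$2"
    # If the required version sorts first (or equal), the check passes.
    [ "$(printf '%s\n' "$required" "$installed" | sort -V | head -n1)" = "$required" ]
}

# Example: compare reported versions against the pcs minimums.
version_ge 1.1.8 1.1.19 && echo "pacemaker version OK"
version_ge 2.0 2.4.3 && echo "corosync version OK"
```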
1.3 pcs commonly used management commands
cluster: configure cluster options and nodes
status: view the current status of cluster resources, nodes, and processes
resource: create and manage cluster resources
constraint: manage cluster resource constraints
property: manage cluster node and resource properties
config: display the complete cluster configuration in a user-readable format
2. Installation and configuration
Experimental environment: 1. the firewall is turned off; 2. SELinux is disabled
Two servers (with passwordless SSH configured between them):
server1: 192.168.17.1
server4: 192.168.17.4
- Software warehouse configuration
vim /etc/yum.repos.d/westos.repo
[dvd]
name=rhel7.6 BaseOS
baseurl=http://192.168.17.1/rhel7.6/
gpgcheck=0
[HighAvailability]
name=rhel7.6
baseurl=http://192.168.17.1/rhel7.6/addons/HighAvailability
gpgcheck=0
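The same repo file can also be generated from a script. A sketch that writes to a temp path for illustration; on a real node the target would be /etc/yum.repos.d/westos.repo:

```shell
# Sketch: generate the repo file from a script instead of editing by hand.
REPO_HOST=192.168.17.1
REPO_FILE=$(mktemp)   # in production: /etc/yum.repos.d/westos.repo

cat > "$REPO_FILE" <<EOF
[dvd]
name=rhel7.6 BaseOS
baseurl=http://$REPO_HOST/rhel7.6/
gpgcheck=0

[HighAvailability]
name=rhel7.6
baseurl=http://$REPO_HOST/rhel7.6/addons/HighAvailability
gpgcheck=0
EOF

grep -c '^\[' "$REPO_FILE"   # prints 2 (two repo sections)
```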
- Install pacemaker and pcs and psmisc and policycoreutils-python
yum install -y pacemaker pcs psmisc policycoreutils-python
ssh server4 yum install -y pacemaker pcs psmisc policycoreutils-python
- Permanently open the service
systemctl enable --now pcsd.service
ssh server4 systemctl enable --now pcsd.service
- set password
echo westos | passwd --stdin hacluster
ssh server4 'echo westos | passwd --stdin hacluster'
- Configure cluster node authentication
pcs cluster auth server1 server4
- Create a two-node cluster
pcs cluster setup --name mycluster server1 server4
- Start the cluster
pcs cluster start --all
pcs cluster enable --all
- Disable STONITH component function
pcs property set stonith-enabled=false
If STONITH is not disabled:
pcs status shows a warning when checking the status
crm_verify -LV reports errors when validating the cluster configuration
- Create VIP
pcs resource create vip ocf:heartbeat:IPaddr2 ip=192.168.17.100 op monitor interval=30s
# all parameters can be viewed with pcs resource create --help
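Once the resource is created, whichever node holds the VIP will show the address under `ip addr`. A small hypothetical helper, `has_ip`, can confirm this on a node (assumes the iproute2 `ip` command):

```shell
# Hypothetical helper: succeeds when the given address is configured on a
# local interface -- useful to see which node currently holds the VIP.
has_ip() {
    ip -o addr show | grep -qwF "$1"
}

if has_ip 192.168.17.100; then
    echo "this node holds the VIP"
else
    echo "the VIP is on another node"
fi
```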
3. Testing
- Checking on node server1 shows that server1 is online and the VIP has been added automatically
- When node server1 is stopped, the other node server4 automatically takes over and adds the VIP
pcs cluster stop server1
- After node server1 is restarted, the resources do not switch back; server4 continues to hold them
pcs cluster start server1
4. Pacemaker and HAProxy
4.1 Configuration
- Install haproxy on both server nodes with identical configuration, and keep the haproxy service stopped (Pacemaker will manage it)
yum install -y haproxy
systemctl disable --now haproxy.service
- Create resources
pcs resource create haproxy systemd:haproxy op monitor interval=30s
- Create a group (make two resources managed by the same node)
pcs resource group add hagroup vip haproxy
: Resources in the group start in the order listed: vip first, then haproxy
- Visit: http://192.168.17.100/status (the VIP created earlier)
4.2 Test 1: Put a node into standby
- When node server4 is put into standby, the idle node server1 takes over automatically. After server4 is brought back with unstandby, the resources do not switch back.
pcs node standby
pcs node unstandby
4.3 Test 2: Close the haproxy service
- After the haproxy service is stopped, the cluster detects this and automatically starts it again; pcs status then shows a log entry noting that haproxy had stopped.
4.4 Test 3: Delete VIP
- After the VIP is deleted, the cluster detects the loss and automatically adds it back
4.5 Test 4: Disable hardware devices
ip link set down <network interface>
- The resources are automatically taken over by the idle node server4, but server1 still believes it holds them; the true state is only shown after server1 is restarted.
5. Fence (prevent cluster server from suspended animation)
Fence function: in an HA cluster, backup server B sends heartbeat packets to check whether primary server A is still alive. If server A receives a flood of client requests and its CPU load reaches 100%, it may be too exhausted to answer B's heartbeat packets (or it answers late). B then assumes A is down, takes over the resources, and promotes itself to primary. When A recovers, both servers consider themselves the primary and fight over the resources: the cluster resources are held by multiple nodes at once, and both servers write to them simultaneously, destroying the safety and consistency of the data. This situation is called "split brain". With a fence mechanism in place, when server A is overloaded and unresponsive, the fence device forcibly kills server A and thus prevents split brain.
Principle: when the primary host fails or goes down unexpectedly, the backup machine first calls the FENCE device, which restarts the failed host or isolates it from the network. Once the FENCE operation succeeds, the device reports back to the backup machine, which then takes over the services and resources of the host. In this way, the FENCE device releases the resources held by the failed node, guaranteeing that resources and services run on exactly one node at a time.
Types: hardware fence (a power fence cuts power to evict the failed server) and software fence (a fence card, i.e. a smart card, evicts the failed server via cable and software)
5.1 Test host installation and configuration
fence host: 192.168.17.250
- Install fence-virtd (the fence daemon), fence-virtd-multicast (the network listener), and fence-virtd-libvirt (the libvirt backend for kernel-level control of virtual machines)
yum install -y fence-virtd
yum install -y fence-virtd-libvirt
yum install -y fence-virtd-multicast
- Configure the fence daemon interactively (key file path, multicast interface, backend)
fence_virtd -c
- Create the /etc/cluster directory, change into it, and generate the key file
mkdir /etc/cluster
cd /etc/cluster
dd if=/dev/urandom of=fence_xvm.key bs=128 count=1
: Generate key file
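The dd command above produces a 128-byte random key. A sanity check, writing to a temp directory here for illustration (the real location is /etc/cluster):

```shell
# Generate the key into a temp dir and verify it is exactly 128 bytes.
KEY_DIR=$(mktemp -d)   # on the fence host this would be /etc/cluster
dd if=/dev/urandom of="$KEY_DIR/fence_xvm.key" bs=128 count=1 2>/dev/null
stat -c %s "$KEY_DIR/fence_xvm.key"   # prints 128
```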
- Restart the fence_virtd service; port 1229 should now be listening
systemctl restart fence_virtd
- Transfer the key file to the monitored node (create the /etc/cluster directory in advance on all nodes)
scp fence_xvm.key [email protected]:/etc/cluster/
scp fence_xvm.key [email protected]:/etc/cluster/
5.2 Monitored node configuration (cluster servers)
- Install fence-virt on all cluster servers
yum install fence-virt.x86_64 -y
ssh server4 yum install fence-virt.x86_64 -y
- List the available fence agents
stonith_admin -I
- Add fence
pcs stonith create vmfence fence_xvm pcmk_host_map="server1:vm1;server4:vm4" op monitor interval=60s
All options can be viewed with pcs stonith describe fence_xvm
# the value of pcmk_host_map is written as key:value pairs, as follows:
# hostname:virtual machine name
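The pcmk_host_map value is a semicolon-separated list of hostname:vmname pairs; a small shell illustration of the format:

```shell
# Illustration of the pcmk_host_map format:
# semicolon-separated hostname:vmname pairs.
HOST_MAP="server1:vm1;server4:vm4"

# Print one "hostname -> vm" line per pair.
echo "$HOST_MAP" | tr ';' '\n' | while IFS=: read -r host vm; do
    echo "$host -> $vm"
done
```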
- Enable STONITH component function
pcs property set stonith-enabled=true
- Verify cluster configuration information
crm_verify -LV
5.3 Test 1: Crash the kernel
- Test on node server4: crash the kernel; server4 is automatically powered off and rebooted by the fence device, demonstrating the monitoring function
echo c > /proc/sysrq-trigger
5.4 Test 2: Disable hardware devices
- Take down server4's network interface; server4 is automatically powered off and rebooted by the fence device, demonstrating the monitoring function
ip link set down eth0
6. lvs and nginx
- Close pcs
pcs cluster stop --all
pcs cluster disable --all
- Install nginx from source
tar zxf nginx-1.18.0.tar.gz
cd nginx-1.18.0
yum install -y gcc pcre-devel openssl-devel
: install gcc, pcre-devel, and openssl-devel
vim auto/cc/gcc
#CFLAGS="$CFLAGS -g"
# commenting out this line (line 127) disables debug symbols, making the installed binaries smaller
./configure --prefix=/usr/local/nginx --with-http_ssl_module
: Configure script, specify the installation path and other parameters
make && make install
- Configure environment variables and start the service
cd /usr/local/nginx/sbin/
: this directory contains the nginx binary and needs to be added to PATH
vim ~/.bash_profile
: add the variable
PATH=$PATH:$HOME/bin:/usr/local/nginx/sbin
source ~/.bash_profile
: re-read the file so the variable takes effect
nginx
: Start service
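The PATH change can be sanity-checked without the real nginx install. A sketch using a stub binary in a temp directory standing in for /usr/local/nginx/sbin:

```shell
# A stub directory stands in for /usr/local/nginx/sbin.
SBIN_DIR=$(mktemp -d)
printf '#!/bin/sh\necho stub\n' > "$SBIN_DIR/nginx"
chmod +x "$SBIN_DIR/nginx"

PATH=$PATH:$SBIN_DIR   # same form as the .bash_profile line
command -v nginx       # the nginx command now resolves via PATH
```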
- Configuration file
vim /usr/local/nginx/conf/nginx.conf
upstream westos {
    server 172.25.17.2:80;
    server 172.25.17.3:80;
}

server {
    listen 80;
    server_name demo.westos.org;
    location / {
        proxy_pass http://westos;
    }
}
nginx -t
: Check for syntax errors
nginx -s reload
: Reread the configuration file
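The same upstream/proxy block can be written and sanity-checked from a script; a sketch using a temp file (the real file is /usr/local/nginx/conf/nginx.conf):

```shell
# Sketch: write the upstream/proxy config to a temp file and sanity-check it.
CONF=$(mktemp)   # in production: /usr/local/nginx/conf/nginx.conf
cat > "$CONF" <<'EOF'
upstream westos {
    server 172.25.17.2:80;
    server 172.25.17.3:80;
}

server {
    listen 80;
    server_name demo.westos.org;
    location / {
        proxy_pass http://westos;
    }
}
EOF

grep -c '172\.25\.17\.' "$CONF"   # prints 2 (two backend servers)
```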
- Test: add a name resolution entry on the test host, then visit the site
vim /etc/hosts
192.168.17.1 demo.westos.org