SSDB cluster + keepalived in practice, part 6: repairing master/slave synchronization failures
Environment
Operating system: CentOS Linux release 7.6.1810 (Core)
ssdb: 1.9.7
keepalived: keepalived-2.0.16
IP:
master: 10.11.100.87
slave: 10.11.100.88
vip: 10.11.100.89
Failure repair experiment
Network fluctuations and downtime often cause synchronization between the master and the slave to fail. This experiment simulates such a failure and then repairs it.
1. Simulating the failure
Stop the slave, write some data to the master, and then start the slave again. Now check the replication status on the master:
ssdb 10.11.100.87:8888> info
version
1.9.7
links
1
total_calls
43
dbsize
141
binlogs
capacity : 20000000
min_seq : 0
max_seq : 13
replication
client 10.11.100.88:35708
type : sync
status : OUT_OF_SYNC
last_seq : 16
serv_key_range
kv : "" - ""
hash: "" - ""
zset: "" - ""
list: "" - ""
data_key_range
kv : "a" - "z"
hash: "" - ""
zset: "" - ""
list: "" - ""
leveldb.stats
Compactions
Level Files Size(MB) Time(sec) Read(MB) Write(MB)
--------------------------------------------------
2 1 0 0 0 0
19 result(s) (0.001 sec)
(0.001 sec)
Now check the replication status on the slave:
ssdb 10.11.100.88:8888> info
version
1.9.7
links
1
total_calls
5
dbsize
739
binlogs
capacity : 20000000
min_seq : 1
max_seq : 17
replication
slaveof 10.11.100.87:8888
id : svc_1
type : sync
status : OUT_OF_SYNC
last_seq : 16
copy_count : 0
sync_count : 0
serv_key_range
kv : "" - ""
hash: "" - ""
zset: "" - ""
list: "" - ""
data_key_range
kv : "a" - "d"
hash: "" - ""
zset: "" - ""
list: "" - ""
leveldb.stats
Compactions
Level Files Size(MB) Time(sec) Read(MB) Write(MB)
--------------------------------------------------
0 2 0 0 0 0
19 result(s) (0.002 sec)
(0.002 sec)
Both sides report OUT_OF_SYNC, so synchronization has failed. Note that the slave's last_seq (16) is greater than the master's max_seq (13).
From this point on, new data written to the master is no longer replicated to the slave; replication is broken.
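The status check above can be scripted. The following is only a minimal sketch: it assumes the text printed by info has been captured somehow (for example saved to a file), and the helper name repl_status is illustrative, not part of SSDB.

```shell
# Print every replication "status" value found in SSDB `info` output on stdin.
# repl_status is a hypothetical helper name, not an SSDB command.
repl_status() {
    awk -F':' '/^[ \t]*status[ \t]*:/ {
        gsub(/[ \t\r]/, "", $2)   # strip padding around the value
        print $2
    }'
}
```

With the master's output saved to a file (say info.txt, a made-up name), running repl_status < info.txt prints one line per status entry, which a monitoring script can compare against SYNC.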
2. Repair
In this case, the data on the master has to be copied to the slave manually.
On the master, create a backup with the ssdb-dump command:
[root@localhost ssdb]# ./ssdb-dump 10.11.100.87 8888 ./backup.ssdb
ssdb-dump - SSDB backup command
Copyright (c) 2012-2015 ssdb.io
recv begin...
received 1 entry(s)
received 5 entry(s)
recv end
total dumped 5 entry(s)
Compactions
Level Files Size(MB) Time(sec) Read(MB) Write(MB)
--------------------------------------------------
compacting data...
Compactions
Level Files Size(MB) Time(sec) Read(MB) Write(MB)
--------------------------------------------------
2 1 0 0 0 0
backup has been made to folder: ./backup.ssdb
Stop the ssdb service on the slave:
[root@localhost ssdb]# ./ssdb-server ssdb.conf -s stop
ssdb-server 1.9.7
Copyright (c) 2012-2015 ssdb.io
On the slave, rename the var directory to var.bak as a backup, and create a new, empty var directory:
[root@localhost ssdb]# mv var var.bak
[root@localhost ssdb]# mkdir var
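One caveat with the two commands above: if var.bak already exists from an earlier repair, mv moves var inside it instead of renaming it. A minimal sketch of a variant that avoids this by adding a timestamp to the backup name; the function name rotate_var and the var.bak.<timestamp> naming are assumptions made for illustration, not something the original procedure uses.

```shell
# Rename <ssdb_dir>/var to a timestamped backup and recreate an empty var.
# rotate_var is a hypothetical helper; run it only while ssdb-server is stopped.
rotate_var() {
    ssdb_dir="$1"
    stamp="$(date +%Y%m%d%H%M%S)"
    # Refuse to continue if there is nothing to back up.
    [ -d "$ssdb_dir/var" ] || return 1
    mv "$ssdb_dir/var" "$ssdb_dir/var.bak.$stamp" || return 1
    mkdir "$ssdb_dir/var"
}
```

On the slave in this article it would be called as rotate_var /usr/local/ssdb after the service has been stopped.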
On the master, copy backup.ssdb to the var directory on the slave:
[root@localhost ssdb]# scp -r ./backup.ssdb 10.11.100.88:/usr/local/ssdb/var
root@10.11.100.88's password:
LOG 100% 362 0.4KB/s 00:00
LOCK 100% 0 0.0KB/s 00:00
CURRENT 100% 16 0.0KB/s 00:00
MANIFEST-000002 100% 92 0.1KB/s 00:00
000004.log 100% 0 0.0KB/s 00:00
000005.ldb 100% 164 0.2KB/s 00:00
Start the ssdb service on the slave:
[root@localhost ssdb]# ./ssdb-server -d ssdb.conf
ssdb-server 1.9.7
Copyright (c) 2012-2015 ssdb.io
Then check the replication status on the master:
[root@localhost ssdb]# ./ssdb-cli -h 10.11.100.87 -p 8888
ssdb (cli) - ssdb command line tool.
Copyright (c) 2012-2016 ssdb.io
'h' or 'help' for help, 'q' to quit.
ssdb-server 1.9.7
ssdb 10.11.100.87:8888> info
version
1.9.7
links
1
total_calls
45
dbsize
141
binlogs
capacity : 20000000
min_seq : 0
max_seq : 13
replication
client 10.11.100.88:35734
type : sync
status : SYNC
last_seq : 13
serv_key_range
kv : "" - ""
hash: "" - ""
zset: "" - ""
list: "" - ""
data_key_range
kv : "a" - "z"
hash: "" - ""
zset: "" - ""
list: "" - ""
leveldb.stats
Compactions
Level Files Size(MB) Time(sec) Read(MB) Write(MB)
--------------------------------------------------
2 1 0 0 0 0
19 result(s) (0.001 sec)
(0.001 sec)
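The status is back to SYNC and the slave's last_seq (13) matches the master's max_seq (13), so replication is working normally again.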
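The final check (status is SYNC and last_seq equals max_seq) can also be automated. This is a rough sketch under the same assumption that the master's info output is available as text on stdin. The name repl_caught_up is illustrative, and with more than one slave attached only the last replication entry would be considered.

```shell
# Exit 0 if the master's `info` output (stdin) shows status SYNC and
# the replication last_seq equal to the binlog max_seq; exit 1 otherwise.
# repl_caught_up is a hypothetical helper name, not an SSDB command.
repl_caught_up() {
    awk -F':' '
        {
            key = $1; val = $2
            gsub(/[ \t\r]/, "", key)   # normalise "max_seq " to "max_seq"
            gsub(/[ \t\r]/, "", val)
        }
        key == "max_seq"  { max = val }
        key == "status"   { status = val }
        key == "last_seq" { last = val }
        END { exit !(status == "SYNC" && max != "" && last == max) }
    '
}
```

The exit code makes it easy to use in a loop or an alerting script, for example: repl_caught_up < info.txt && echo "replication OK" (info.txt again being a made-up file name).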