Brief introduction
First, note that RAID 10 or RAID 5 is generally not recommended for Ceph OSDs; running each OSD on a single disk is preferred. In our environment, however, to make full use of the RAID card's cache, we configure even a single disk as a RAID 0 volume on the RAID card.
Disk failures are therefore inevitable. When one occurs, we need to remove the corresponding OSD from Ceph and also rebuild the RAID.
After the disk is replaced and the RAID rebuilt, a new OSD must be added. Once the new OSD joins the cluster, Ceph automatically runs the recovery and backfill process, whose speed we can control by tuning the recovery and backfill parameters.
The details are explained below.
OSD replacement procedure
1. Locate the failed disk
In general, hardware monitoring tells us that a disk has failed, but not which device name it corresponds to in the system.
We can confirm this by checking the dmesg log:
[4814427.336053] print_req_error: 5 callbacks suppressed
[4814427.336055] print_req_error: I/O error, dev sdi, sector 0
[4814427.337422] sd 0:2:5:0: [sdi] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[4814427.337432] sd 0:2:5:0: [sdi] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[4814427.337434] print_req_error: I/O error, dev sdi, sector 0
[4814427.338901] buffer_io_error: 4 callbacks suppressed
[4814427.338904] Buffer I/O error on dev sdi, logical block 0, async page read
[4814749.780689] sd 0:2:5:0: [sdi] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[4814749.780694] sd 0:2:5:0: [sdi] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[4814749.780697] print_req_error: I/O error, dev sdi, sector 0
[4814749.781903] sd 0:2:5:0: [sdi] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[4814749.781905] sd 0:2:5:0: [sdi] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[4814749.781906] print_req_error: I/O error, dev sdi, sector 0
[4814749.783105] Buffer I/O error on dev sdi, logical block 0, async page read
From the log we can see that the failed disk is /dev/sdi.
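Scanning a long dmesg buffer by eye is error-prone; a small pipeline can count I/O errors per device instead. A minimal sketch (the log excerpt is hard-coded here for illustration; on a real host, pipe dmesg straight into the grep):

```shell
# Excerpt of the kernel log; in practice use: dmesg | grep -oE 'dev sd[a-z]+' ...
log='[4814427.336055] print_req_error: I/O error, dev sdi, sector 0
[4814749.780697] print_req_error: I/O error, dev sdi, sector 0'

# Count I/O-error lines per device; the failing disk appears first
bad_dev=$(printf '%s\n' "$log" \
    | grep -oE 'dev sd[a-z]+' \
    | sort | uniq -c | sort -rn \
    | awk 'NR==1{print $3}')
echo "$bad_dev"   # prints: sdi
```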
Next, we need to find the OSD corresponding to /dev/sdi. In Ceph Luminous, BlueStore is the default, and OSDs are mounted as follows:
root@ctnr:~# df -hT
Filesystem Type Size Used Avail Use% Mounted on
...
tmpfs tmpfs 63G 48K 63G 1% /var/lib/ceph/osd/ceph-2
tmpfs tmpfs 63G 48K 63G 1% /var/lib/ceph/osd/ceph-3
tmpfs tmpfs 63G 48K 63G 1% /var/lib/ceph/osd/ceph-5
tmpfs tmpfs 63G 48K 63G 1% /var/lib/ceph/osd/ceph-6
tmpfs tmpfs 63G 48K 63G 1% /var/lib/ceph/osd/ceph-7
tmpfs tmpfs 63G 48K 63G 1% /var/lib/ceph/osd/ceph-8
The df output alone cannot tell us which disk backs each OSD.
We can view the LVM volume on each disk with the following command:
root@ctnr:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdf 8:80 0 1.8T 0 disk
└─ceph--295361e9--45ed--4f85--be6a--a3eb06ba8341-osd--block--e2e485b7--65c0--49ad--a37c--24eaefbc3343 253:4 0 1.8T 0 lvm
sdd 8:48 0 1.8T 0 disk
└─ceph--20b494d7--bcd0--4f60--bee0--900edd843b26-osd--block--620cf64c--e76a--44d4--b308--87a0e78970cb 253:2 0 1.8T 0 lvm
sdb 8:16 0 1.8T 0 disk
└─ceph--1c9e3474--e080--478c--aa50--d9e2cc9900e1-osd--block--33dccd23--a7c4--416d--8a22--1787f98c243f 253:0 0 1.8T 0 lvm
sdk 8:160 0 476.4G 0 disk
└─ceph--a3f4913b--d3e1--4c51--9d4d--87340e1d4271-osd--block--f9d7958b--8a66--41e4--8964--8e5cb95e6d09 253:9 0 476.4G 0 lvm
sdg 8:96 0 1.8T 0 disk
└─ceph--36092d1e--4e85--49a1--8378--14b432d1c3d0-osd--block--9da0cba0--0a12--4e32--bed6--438f4db71e69 253:5 0 1.8T 0 lvm
sde 8:64 0 1.8T 0 disk
└─ceph--a21e1b26--0c40--4a36--b6ad--39a2b9920fe7-osd--block--b55e0ccd--cd1e--4067--9299--bb709e64765b 253:3 0 1.8T 0 lvm
sdc 8:32 0 1.8T 0 disk
└─ceph--5ac4fc0f--e517--4a0b--ba50--586707f582b4-osd--block--ab1cb37e--6612--4d18--a045--c2375af9012c 253:1 0 1.8T 0 lvm
sda 8:0 0 3.7T 0 disk
├─sda2 8:2 0 279.4G 0 part /
├─sda3 8:3 0 3.4T 0 part /home
└─sda1 8:1 0 1M 0 part
sdj 8:144 0 476.4G 0 disk
└─ceph--9c93296c--ff24--4ed7--8227--eae40dda38fc-osd--block--5ea3c735--3770--4b42--87aa--12bbe9885bdb 253:8 0 476.4G 0 lvm
Then list the LVM volume used by each OSD:
root@ctnr:~# ll /var/lib/ceph/osd/ceph-*/block
lrwxrwxrwx 1 ceph ceph 93 Jun 18 18:49 /var/lib/ceph/osd/ceph-10/block -> /dev/ceph-a3f4913b-d3e1-4c51-9d4d-87340e1d4271/osd-block-f9d7958b-8a66-41e4-8964-8e5cb95e6d09
lrwxrwxrwx 1 ceph ceph 93 Mar 18 18:18 /var/lib/ceph/osd/ceph-2/block -> /dev/ceph-1c9e3474-e080-478c-aa50-d9e2cc9900e1/osd-block-33dccd23-a7c4-416d-8a22-1787f98c243f
lrwxrwxrwx 1 ceph ceph 93 Mar 18 18:19 /var/lib/ceph/osd/ceph-3/block -> /dev/ceph-5ac4fc0f-e517-4a0b-ba50-586707f582b4/osd-block-ab1cb37e-6612-4d18-a045-c2375af9012c
lrwxrwxrwx 1 ceph ceph 93 Mar 18 18:19 /var/lib/ceph/osd/ceph-5/block -> /dev/ceph-20b494d7-bcd0-4f60-bee0-900edd843b26/osd-block-620cf64c-e76a-44d4-b308-87a0e78970cb
lrwxrwxrwx 1 ceph ceph 93 Mar 18 18:20 /var/lib/ceph/osd/ceph-6/block -> /dev/ceph-a21e1b26-0c40-4a36-b6ad-39a2b9920fe7/osd-block-b55e0ccd-cd1e-4067-9299-bb709e64765b
lrwxrwxrwx 1 ceph ceph 93 Mar 18 18:20 /var/lib/ceph/osd/ceph-7/block -> /dev/ceph-295361e9-45ed-4f85-be6a-a3eb06ba8341/osd-block-e2e485b7-65c0-49ad-a37c-24eaefbc3343
lrwxrwxrwx 1 ceph ceph 93 Mar 18 18:21 /var/lib/ceph/osd/ceph-8/block -> /dev/ceph-36092d1e-4e85-49a1-8378-14b432d1c3d0/osd-block-9da0cba0-0a12-4e32-bed6-438f4db71e69
lrwxrwxrwx 1 ceph ceph 93 Jun 18 18:49 /var/lib/ceph/osd/ceph-9/block -> /dev/ceph-9c93296c-ff24-4ed7-8227-eae40dda38fc/osd-block-5ea3c735-3770-4b42-87aa-12bbe9885bdb
By matching the LVM names, we can identify the OSD corresponding to the failed disk.
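Cross-referencing the two listings by hand is tedious; it can be scripted. A sketch under the ceph-volume LVM layout shown above (map_osds is our own helper name, not a Ceph tool):

```shell
# Print "osd-dir -> resolved block device" for every OSD under the given root.
# Assumes the standard /var/lib/ceph/osd/ceph-N/block symlink layout.
map_osds() {
    root=${1:-/var/lib/ceph/osd}
    for blk in "$root"/ceph-*/block; do
        [ -e "$blk" ] || continue
        printf '%s -> %s\n' "$(basename "$(dirname "$blk")")" "$(readlink -f "$blk")"
    done
}
```

On a real node, each printed device can additionally be passed through lsblk -no pkname to resolve the LVM device to its physical disk.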
2. Remove the failed OSD
After identifying the failed disk and its corresponding OSD as above, we need to remove it:
- Remove the OSD from Ceph:
# Run on a monitor node
ceph osd out osd.9
# Stop the service on the OSD's host
systemctl stop ceph-osd@9
# Run on a monitor node
ceph osd crush remove osd.9
ceph auth del osd.9
ceph osd rm osd.9
- Unmount the disk:
umount /var/lib/ceph/osd/ceph-9
3. Rebuild RAID 0
Rebuilding the RAID relies on the MegaCLI toolset. The following shows installation on Ubuntu:
wget -O - http://hwraid.le-vert.net/debian/hwraid.le-vert.net.gpg.key | sudo apt-key add -
echo "deb http://hwraid.le-vert.net/ubuntu precise main" >> /etc/apt/sources.list
apt-get update
apt-get install megacli megactl megaraid-status
View the RAID status:
megacli -PDList -aALL | egrep 'Adapter|Enclosure|Slot|Inquiry|Firmware'
Adapter #0
...
Enclosure Device ID: 32
Slot Number: 9
Enclosure position: 1
Firmware state: Online, Spun Up
Device Firmware Level: GS0F
Inquiry Data: SEAGATE ST2000NM0023 GS0FZ1X2Q5P6
Enclosure Device ID: 32
Slot Number: 10
Enclosure position: 1
Firmware state: Unconfigured(good), Spun Up
Device Firmware Level: 004C
Inquiry Data: PHLA914001Y6512DGN INTEL SSDSC2KW512G8 LHF004C
Field descriptions:
- Adapter: the RAID controller number
- Enclosure Device ID: the ID of the disk enclosure
- Slot Number: the slot number within the enclosure
- Firmware state: the state of the disk's firmware. "Online, Spun Up" indicates a normal, configured disk; "Unconfigured(good), Spun Up" indicates a healthy disk that is not yet configured into any RAID volume (e.g. a newly replaced disk).
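The fields of interest can also be pulled out mechanically. A sketch (find_unconfigured is our own helper name) that extracts enclosure:slot pairs for disks in the Unconfigured(good) state from the megacli output:

```shell
# Read `megacli -PDList -aALL` output on stdin and print "enclosure:slot"
# for every disk whose firmware state is Unconfigured(good).
find_unconfigured() {
    awk '/Enclosure Device ID:/          { enc  = $NF }
         /Slot Number:/                  { slot = $NF }
         /Firmware state: Unconfigured/  { print enc ":" slot }'
}
# Usage on a real node:  megacli -PDList -aALL | find_unconfigured
```

The printed pair plugs straight into megacli's -CfgLdAdd '[enclosure:slot]' argument.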
We need to rebuild the RAID on the disk that is in the Unconfigured state:
# Create a RAID 0 volume from the disk with enclosure ID 32, slot number 10
root@ctnr:~# megacli -CfgLdAdd -r0'[32:10]' -a0
Adapter 0: Created VD 7
Adapter 0: Configured the Adapter!!
Running fdisk -l again, we can now see the newly added disk:
fdisk -l
...
Disk /dev/sdj: 476.4 GiB, 511503761408 bytes, 999030784 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
4. Recreate the OSD
Use ceph-deploy to list the disks, zap the new disk, and create the OSD on it:
ceph-deploy disk list ctnr.a1-56-14.pub.unp
ceph-deploy disk zap ctnr.a1-56-14.pub.unp /dev/sdj
ceph-deploy osd create --data /dev/sdj ctnr.a1-56-14.pub.unp
Controlling data recovery and backfill speed
After the new OSD joins the cluster, the speed of recovery and backfill can be tuned at runtime with the following parameters:
# Raise the priority of recovery operations to the maximum (63)
ceph tell osd.* injectargs "--osd_recovery_op_priority=63"
# Lower the priority of client I/O operations to 3
ceph tell osd.* injectargs "--osd_client_op_priority=3"
# Raise the number of concurrent backfill operations per OSD from the default 1 to 50
ceph tell osd.* injectargs "--osd_max_backfills=50"
# Raise the number of concurrent recovery operations per OSD from the default 3 to 50
ceph tell osd.* injectargs "--osd_recovery_max_active=50"
# Raise the number of recovery threads per OSD from the default 1 to 10
ceph tell osd.* injectargs "--osd_recovery_threads=10"
Note: all of the settings above aim to recover data as quickly as possible. After recovery completes, they need to be reverted. If client service quality must remain the priority during recovery, skip these adjustments and keep the default values.
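Once recovery has finished, the same injectargs mechanism restores the defaults. A sketch using the default values cited in the comments above (verify the defaults against your own Ceph release before applying):

```shell
# Revert the recovery/backfill tuning to the defaults cited above.
# Run on a monitor node after recovery completes.
ceph tell osd.* injectargs "--osd_recovery_op_priority=3"
ceph tell osd.* injectargs "--osd_client_op_priority=63"
ceph tell osd.* injectargs "--osd_max_backfills=1"
ceph tell osd.* injectargs "--osd_recovery_max_active=3"
ceph tell osd.* injectargs "--osd_recovery_threads=1"
```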