Replacing a Failed Ceph OSD Disk

Brief introduction

First, note that running Ceph OSDs on RAID 10 or RAID 5 is generally not recommended; the usual recommendation is one OSD per single disk. In our environment, however, in order to make full use of the RAID card's cache, we configure every disk as a single-disk RAID 0 on the RAID card.

Disk failures are therefore inevitable, and when one occurs we need to remove the failed disk from Ceph and also rebuild the RAID.

After the disk has been replaced and the RAID rebuilt, a new OSD has to be added. Once the new OSD joins the cluster, Ceph automatically runs its recovery and backfill process, and we can control the speed of that process by adjusting the recovery and backfill parameters.

The following is a detailed explanation.

OSD Replacement Procedure

1. Locate the failed disk

In general, hardware monitoring will alert us that a disk has failed, but it does not tell us which device name the failed disk has in the operating system.

We can confirm this by checking the dmesg log:

[4814427.336053] print_req_error: 5 callbacks suppressed
[4814427.336055] print_req_error: I/O error, dev sdi, sector 0
[4814427.337422] sd 0:2:5:0: [sdi] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[4814427.337432] sd 0:2:5:0: [sdi] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[4814427.337434] print_req_error: I/O error, dev sdi, sector 0
[4814427.338901] buffer_io_error: 4 callbacks suppressed
[4814427.338904] Buffer I/O error on dev sdi, logical block 0, async page read
[4814749.780689] sd 0:2:5:0: [sdi] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[4814749.780694] sd 0:2:5:0: [sdi] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[4814749.780697] print_req_error: I/O error, dev sdi, sector 0
[4814749.781903] sd 0:2:5:0: [sdi] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[4814749.781905] sd 0:2:5:0: [sdi] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[4814749.781906] print_req_error: I/O error, dev sdi, sector 0
[4814749.783105] Buffer I/O error on dev sdi, logical block 0, async page read

From the log we can see that the failed disk is /dev/sdi.
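Optionally, if smartmontools is installed, SMART data gives additional confirmation that the disk itself is failing. This check is a sketch and not part of the original procedure; because the disk sits behind a MegaRAID controller here, smartctl needs the -d megaraid,N option, where N is the Device Id that the controller's management tool (megacli, installed in step 3 below) reports for that physical disk:

# Sketch: query the SMART health of the physical disk through the MegaRAID controller.
# "9" is a placeholder for the Device Id shown by "megacli -PDList -aALL".
smartctl -H -d megaraid,9 /dev/sdi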

Next, we need to find out which OSD /dev/sdi corresponds to. In the Luminous (L) release of Ceph, with the default BlueStore backend, OSDs are mounted as follows:

root@ctnr:~# df -hT
Filesystem     Type      Size  Used Avail Use% Mounted on
...
tmpfs          tmpfs      63G   48K   63G   1% /var/lib/ceph/osd/ceph-2
tmpfs          tmpfs      63G   48K   63G   1% /var/lib/ceph/osd/ceph-3
tmpfs          tmpfs      63G   48K   63G   1% /var/lib/ceph/osd/ceph-5
tmpfs          tmpfs      63G   48K   63G   1% /var/lib/ceph/osd/ceph-6
tmpfs          tmpfs      63G   48K   63G   1% /var/lib/ceph/osd/ceph-7
tmpfs          tmpfs      63G   48K   63G   1% /var/lib/ceph/osd/ceph-8

This output alone does not tell us which disk each OSD lives on.

We can see which LVM volume sits on each disk with the following command:

root@ctnr:~# lsblk 
NAME                                                                                                  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sdf                                                                                                     8:80   0   1.8T  0 disk 
└─ceph--295361e9--45ed--4f85--be6a--a3eb06ba8341-osd--block--e2e485b7--65c0--49ad--a37c--24eaefbc3343 253:4    0   1.8T  0 lvm  
sdd                                                                                                     8:48   0   1.8T  0 disk 
└─ceph--20b494d7--bcd0--4f60--bee0--900edd843b26-osd--block--620cf64c--e76a--44d4--b308--87a0e78970cb 253:2    0   1.8T  0 lvm  
sdb                                                                                                     8:16   0   1.8T  0 disk 
└─ceph--1c9e3474--e080--478c--aa50--d9e2cc9900e1-osd--block--33dccd23--a7c4--416d--8a22--1787f98c243f 253:0    0   1.8T  0 lvm  
sdk                                                                                                     8:160  0 476.4G  0 disk 
└─ceph--a3f4913b--d3e1--4c51--9d4d--87340e1d4271-osd--block--f9d7958b--8a66--41e4--8964--8e5cb95e6d09 253:9    0 476.4G  0 lvm  
sdg                                                                                                     8:96   0   1.8T  0 disk 
└─ceph--36092d1e--4e85--49a1--8378--14b432d1c3d0-osd--block--9da0cba0--0a12--4e32--bed6--438f4db71e69 253:5    0   1.8T  0 lvm  
sde                                                                                                     8:64   0   1.8T  0 disk 
└─ceph--a21e1b26--0c40--4a36--b6ad--39a2b9920fe7-osd--block--b55e0ccd--cd1e--4067--9299--bb709e64765b 253:3    0   1.8T  0 lvm  
sdc                                                                                                     8:32   0   1.8T  0 disk 
└─ceph--5ac4fc0f--e517--4a0b--ba50--586707f582b4-osd--block--ab1cb37e--6612--4d18--a045--c2375af9012c 253:1    0   1.8T  0 lvm  
sda                                                                                                     8:0    0   3.7T  0 disk 
├─sda2                                                                                                  8:2    0 279.4G  0 part /
├─sda3                                                                                                  8:3    0   3.4T  0 part /home
└─sda1                                                                                                  8:1    0     1M  0 part 
sdj                                                                                                     8:144  0 476.4G  0 disk 
└─ceph--9c93296c--ff24--4ed7--8227--eae40dda38fc-osd--block--5ea3c735--3770--4b42--87aa--12bbe9885bdb 253:8    0 476.4G  0 lvm 

Then we can see which LVM volume each OSD uses:

root@ctnr:~# ll /var/lib/ceph/osd/ceph-*/block
lrwxrwxrwx 1 ceph ceph 93 Jun 18 18:49 /var/lib/ceph/osd/ceph-10/block -> /dev/ceph-a3f4913b-d3e1-4c51-9d4d-87340e1d4271/osd-block-f9d7958b-8a66-41e4-8964-8e5cb95e6d09
lrwxrwxrwx 1 ceph ceph 93 Mar 18 18:18 /var/lib/ceph/osd/ceph-2/block -> /dev/ceph-1c9e3474-e080-478c-aa50-d9e2cc9900e1/osd-block-33dccd23-a7c4-416d-8a22-1787f98c243f
lrwxrwxrwx 1 ceph ceph 93 Mar 18 18:19 /var/lib/ceph/osd/ceph-3/block -> /dev/ceph-5ac4fc0f-e517-4a0b-ba50-586707f582b4/osd-block-ab1cb37e-6612-4d18-a045-c2375af9012c
lrwxrwxrwx 1 ceph ceph 93 Mar 18 18:19 /var/lib/ceph/osd/ceph-5/block -> /dev/ceph-20b494d7-bcd0-4f60-bee0-900edd843b26/osd-block-620cf64c-e76a-44d4-b308-87a0e78970cb
lrwxrwxrwx 1 ceph ceph 93 Mar 18 18:20 /var/lib/ceph/osd/ceph-6/block -> /dev/ceph-a21e1b26-0c40-4a36-b6ad-39a2b9920fe7/osd-block-b55e0ccd-cd1e-4067-9299-bb709e64765b
lrwxrwxrwx 1 ceph ceph 93 Mar 18 18:20 /var/lib/ceph/osd/ceph-7/block -> /dev/ceph-295361e9-45ed-4f85-be6a-a3eb06ba8341/osd-block-e2e485b7-65c0-49ad-a37c-24eaefbc3343
lrwxrwxrwx 1 ceph ceph 93 Mar 18 18:21 /var/lib/ceph/osd/ceph-8/block -> /dev/ceph-36092d1e-4e85-49a1-8378-14b432d1c3d0/osd-block-9da0cba0-0a12-4e32-bed6-438f4db71e69
lrwxrwxrwx 1 ceph ceph 93 Jun 18 18:49 /var/lib/ceph/osd/ceph-9/block -> /dev/ceph-9c93296c-ff24-4ed7-8227-eae40dda38fc/osd-block-5ea3c735-3770-4b42-87aa-12bbe9885bdb

By comparing the LVM names in the two outputs, we can identify which OSD belongs to the failed disk.
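Alternatively, since these are BlueStore OSDs deployed with ceph-volume, the mapping can be printed directly on the OSD node. This is a shortcut under that assumption rather than part of the original procedure:

# List all ceph-volume managed OSDs together with their underlying devices
ceph-volume lvm list

# If the failed device is still visible to the kernel, the output can be narrowed to it
ceph-volume lvm list /dev/sdi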

2. Remove the failed disk

After confirming the failed disk and its corresponding OSD with the method above, we remove them as follows:

  1. Delete the OSD from Ceph
# Run on a monitor node
ceph osd out osd.9
# Stop the service on the corresponding OSD node
systemctl stop ceph-osd@9
# Run on a monitor node
ceph osd crush remove osd.9
ceph auth del osd.9
ceph osd rm osd.9
  2. Unmount the disk
umount /var/lib/ceph/osd/ceph-9
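Before physically pulling the disk, it is worth verifying that the OSD is really gone from the cluster and watching the rebalance that follows. These checks are optional suggestions, not part of the original procedure:

# osd.9 should no longer appear in the CRUSH tree
ceph osd tree | grep osd.9

# Watch cluster health; some degraded/misplaced objects are expected while data is redistributed
ceph -s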

3. Rebuild the RAID 0

Rebuilding the RAID relies on the MegaCLI toolset. The following shows how to install it on Ubuntu:

wget -O - http://hwraid.le-vert.net/debian/hwraid.le-vert.net.gpg.key | sudo apt-key add -
echo "deb http://hwraid.le-vert.net/ubuntu precise main" >> /etc/apt/sources.list
apt-get update
apt-get install megacli megactl megaraid-status

View the RAID status:

megacli -PDList -aALL | egrep 'Adapter|Enclosure|Slot|Inquiry|Firmware'

Adapter #0
...
Enclosure Device ID: 32
Slot Number: 9
Enclosure position: 1
Firmware state: Online, Spun Up
Device Firmware Level: GS0F
Inquiry Data: SEAGATE ST2000NM0023    GS0FZ1X2Q5P6     

Enclosure Device ID: 32
Slot Number: 10
Enclosure position: 1
Firmware state: Unconfigured(good), Spun Up
Device Firmware Level: 004C
Inquiry Data: PHLA914001Y6512DGN  INTEL SSDSC2KW512G8                      LHF004C

Explanation of the fields:

  • Adapter: the RAID controller number
  • Enclosure Device ID: the ID of the disk enclosure
  • Slot Number: the slot number of the disk
  • Firmware state: the state of the disk's firmware. Online, Spun Up indicates a disk in the normal state; Unconfigured(good), Spun Up indicates a disk that is not yet configured into any virtual drive (for example, a newly inserted replacement disk).
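In this example the replacement disk in slot 10 already shows up as Unconfigured(good), so a new RAID 0 can be created on it directly. On some controllers, however, the old virtual drive of the failed disk is still defined, or the replacement disk shows up as Unconfigured(bad) or Foreign. The following is a sketch of the extra cleanup needed in that case, not part of the original procedure; the virtual drive number is a placeholder:

# List the existing virtual drives and their states
megacli -LDInfo -Lall -aALL

# If the failed disk's old virtual drive is still defined, delete it first;
# "5" is a placeholder for the VD number shown by -LDInfo
megacli -CfgLdDel -L5 -a0

# If the replacement disk is reported as Unconfigured(bad) or Foreign,
# mark it good and clear the foreign configuration
megacli -PDMakeGood -PhysDrv '[32:10]' -a0
megacli -CfgForeign -Clear -a0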

We need to rebuild the RAID 0 on the disk that is not in the normal state:

# Create a RAID 0 for the disk with enclosure ID 32 and slot number 10

root@ctnr:~# megacli  -CfgLdAdd -r0'[32:10]' -a0 
                                     
Adapter 0: Created VD 7

Adapter 0: Configured the Adapter!!

Running fdisk -l again at this point shows the newly added disk:

fdisk -l 

...
Disk /dev/sdj: 476.4 GiB, 511503761408 bytes, 999030784 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

4. Rebuild the OSD

ceph-deploy disk list ctnr.a1-56-14.pub.unp
ceph-deploy disk zap ctnr.a1-56-14.pub.unp /dev/sdj
ceph-deploy osd create --data /dev/sdj ctnr.a1-56-14.pub.unp
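If ceph-deploy is not available, the same result can be achieved with ceph-volume directly on the OSD node. This is a sketch of the equivalent commands, using the device name from the example above:

# Wipe any leftover partition/LVM data on the new disk
ceph-volume lvm zap /dev/sdj --destroy

# Create and activate a new BlueStore OSD on the disk
ceph-volume lvm create --data /dev/sdj

# Confirm the new OSD has joined the cluster
ceph osd tree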

Data recovery and backfill speed control

# Raise the priority of data recovery operations to the highest level
ceph tell osd.* injectargs "--osd_recovery_op_priority=63"

# Lower the priority of client I/O operations to 3
ceph tell osd.* injectargs "--osd_client_op_priority=3"

# Raise the number of concurrent backfill operations per OSD from the default of 1 to 50
ceph tell osd.* injectargs "--osd_max_backfills=50"

# Raise the number of concurrent recovery operations per OSD from the default of 3 to 50
ceph tell osd.* injectargs "--osd_recovery_max_active=50"

# Raise the number of recovery threads per OSD from the default of 1 to 10
ceph tell osd.* injectargs "--osd_recovery_threads=10"

Note: all of the settings above are meant to make data recovery as fast as possible, and they should be reverted once recovery is complete. If you need to prioritize client service quality during recovery, you can skip these adjustments and keep the default values.
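Once recovery has finished, the same injectargs mechanism can be used to put the values back. The numbers below use the defaults mentioned in the comments above for the last three options and the usual Ceph defaults for the two priority options; verify them against your own cluster's configuration before applying. Also note that injectargs only changes the running daemons, so the values are not persisted across restarts unless they are also set in ceph.conf.

# Restore the default recovery/backfill settings after recovery completes
ceph tell osd.* injectargs "--osd_recovery_op_priority=3"
ceph tell osd.* injectargs "--osd_client_op_priority=63"
ceph tell osd.* injectargs "--osd_max_backfills=1"
ceph tell osd.* injectargs "--osd_recovery_max_active=3"
ceph tell osd.* injectargs "--osd_recovery_threads=1"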


Origin www.cnblogs.com/breezey/p/11080534.html