SUSE Ceph add nodes to reduce the node, delete OSD disk and other operations - Storage6

 First, the description of the test environment

Before we have a good set of rapid deployment Ceph cluster (Node 3), is to be tested increase in existing node in the cluster online

  • As can be seen in the following table configuration increases the specific node node004

CPU name

Public Network

Network Management

Cluster network

Explanation

admin

192.168.2.39

172.200.50.39

---

Management Node

node001

192.168.2.40

172.200.50.40

192.168.3.40

MON, OSD

node002

192.168.2.41

172.200.50.41

192.168.3.41

MON, OSD

node003

192.168.2.42

172.200.50.42

192.168.3.42

MON, OSD

node004

192.168.2.43

172.200.50.43 

192.168.3.43

OSD

 

  • Test cluster architecture diagram

We can see architecture diagram adds node004 node, and just as OSD node004 node node, or MGR MON no service

 

Second, increase the cluster nodes node004

1. Collect cluster information
(1) cluster status

# ceph -s
  cluster:
    id:     f7b451b3-4a4c-4681-a4ef-4b5359242a92
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum node001,node002,node003 (age 90m)
    mgr: node001(active, since 89m), standbys: node002, node003
    osd: 6 osds: 6 up (since 90m), 6 in (since 23h)
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   12 GiB used, 48 GiB / 60 GiB avail
    pgs:  

(2)集群OSD磁盘信息

# ceph osd tree
ID CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF
-1       0.05878 root default                             
-5       0.01959     host node001                         
 2   hdd 0.00980         osd.2        up  1.00000 1.00000
 5   hdd 0.00980         osd.5        up  1.00000 1.00000
-3       0.01959     host node002                         
 0   hdd 0.00980         osd.0        up  1.00000 1.00000
 3   hdd 0.00980         osd.3        up  1.00000 1.00000
-7       0.01959     host node003                         
 1   hdd 0.00980         osd.1        up  1.00000 1.00000
 4   hdd 0.00980         osd.4        up  1.00000 1.00000 

(3)查看新增节点磁盘

# lsblk
NAME            MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda               8:0    0   20G  0 disk
├─sda1            8:1    0    1G  0 part /boot
└─sda2            8:2    0   19G  0 part
  ├─vgoo-lvswap 254:0    0    2G  0 lvm  [SWAP]
  └─vgoo-lvroot 254:1    0   17G  0 lvm  /
sdb               8:16   0   10G  0 disk
sdc               8:32   0   10G  0 disk
sr0              11:0    1 1024M  0 rom  
nvme0n1         259:0    0   20G  0 disk
nvme0n2         259:1    0   20G  0 disk
nvme0n3         259:2    0   20G  0 disk 

2、初始化操作系统
(1)初始化步骤
参考快速部署 storage6 文档

(2)显示仓库

# zypper lr
Repository priorities are without effect. All enabled repositories share the same priority.

#  | Alias                                              | Name                                               | Enabled | GPG Check | Refresh
---+----------------------------------------------------+----------------------------------------------------+---------+-----------+--------
 1 | SLE-Module-Basesystem-SLES15-SP1-Pool              | SLE-Module-Basesystem-SLES15-SP1-Pool              | Yes     | (r ) Yes  | No     
 2 | SLE-Module-Basesystem-SLES15-SP1-Upadates          | SLE-Module-Basesystem-SLES15-SP1-Upadates          | Yes     | (r ) Yes  | No     
 3 | SLE-Module-Legacy-SLES15-SP1-Pool                  | SLE-Module-Legacy-SLES15-SP1-Pool                  | Yes     | (r ) Yes  | No     
 4 | SLE-Module-Legacy-SLES15-SP1-Updates               | SLE-Module-Legacy-SLES15-SP1-Updates               | Yes     | ( p) Yes  | No     
 5 | SLE-Module-Server-Applications-SLES15-SP1-Pool     | SLE-Module-Server-Applications-SLES15-SP1-Pool     | Yes     | (r ) Yes  | No     
 6 | SLE-Module-Server-Applications-SLES15-SP1-Upadates | SLE-Module-Server-Applications-SLES15-SP1-Upadates | Yes     | (r ) Yes  | No     
 7 | SLE-Product-SLES15-SP1-Pool                        | SLE-Product-SLES15-SP1-Pool                        | Yes     | (r ) Yes  | No     
 8 | SLE-Product-SLES15-SP1-Updates                     | SLE-Product-SLES15-SP1-Updates                     | Yes     | (r ) Yes  | No     
 9 | SUSE-Enterprise-Storage-6-Pool                     | SUSE-Enterprise-Storage-6-Pool                     | Yes     | (r ) Yes  | No     
10 | SUSE-Enterprise-Storage-6-Updates                  | SUSE-Enterprise-Storage-6-Updates                  | Yes     | (r ) Yes  | No   

(3)hosts文件

192.168.2.39    admin.example.com admin
192.168.2.40    node001.example.com node001
192.168.2.41    node002.example.com node002
192.168.2.42    node003.example.com node003
192.168.2.43    node004.example.com node004
192.168.2.44    node005.example.com node005

3、安装 satlt-minion

  • node004节点
zypper -n in salt-minion
sed -i '17i\master: 192.168.2.39' /etc/salt/minion
systemctl restart salt-minion.service
systemctl enable salt-minion.service
systemctl status salt-minion.service
  • admin节点
# salt-key
Accepted Keys:
admin.example.com
node001.example.com
node002.example.com
node003.example.com
Denied Keys:
Unaccepted Keys:
node004.example.com   <==== 新加节点
Rejected Keys:
  • 接受key
# salt-key -A
  • 测试node004
# salt "node004*" test.ping
node004.example.com:
    True

4、预防集群数据平衡

  • 以前增加节点的时候,一直使用norebalance方式来预防数据平衡,这种方式比较简单粗暴,(不建议使用)
# ceph osd set norebalance
norebalance is set
admin:/etc/salt/pki/master # ceph -s
  cluster:
    id:     f7b451b3-4a4c-4681-a4ef-4b5359242a92
    health: HEALTH_WARN
            norebalance flag(s) set
  services:
    mon: 3 daemons, quorum node001,node002,node003 (age 2h)
    mgr: node001(active, since 2h), standbys: node002, node003
    osd: 6 osds: 6 up (since 2h), 6 in (since 24h)
         flags norebalance

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   12 GiB used, 48 GiB / 60 GiB avail
    pgs:  
  • 建议使用 “osd_crush_initial_weight”参数,结合salt工具批量执行

(1)新建 global.conf 文件(admin节点)

# vim /srv/salt/ceph/configuration/files/ceph.conf.d/global.conf
osd_crush_initial_weight = 0

(2)创建新配置文件(admin节点)

# salt '*' state.apply ceph.configuration.create

注意:执行时有报错,可忽略

node003.example.com:
  Name: /var/cache/salt/minion/files/base/ceph/configuration - Function: file.absent -
  Result: Changed Started: - 15:42:45.362265 Duration: 22.133 ms
----------
          ID: /srv/salt/ceph/configuration/cache/ceph.conf
    Function: file.managed
      Result: False
     Comment: Unable to manage file: Jinja error: 'select.minions'
              Traceback (most recent call last):
                File "/usr/lib/python3.6/site-pack

(3)执行新配置文件,并且只在node001 node002 node003节点上生效 (admin节点)

# salt 'node00[1-3]*' state.apply ceph.configuration

(4)检查个节点配置文件 (node001,node002,node003)

# cat /etc/ceph/ceph.conf
osd crush initial weight = 0

5、执行stage0 1 2 (admin节点)

# salt-run state.orch ceph.stage.0
# salt-run state.orch ceph.stage.1
# salt-run state.orch ceph.stage.2
# salt 'node004*' pillar.items           # 查看pillar设置是否正确
    public_network:
        192.168.2.0/24
    roles:
        - storage                    # 仅仅是 storage 角色
    time_server:
        admin.example.com

6、检查生成 OSD 报告(admin节点)

# salt-run disks.report 
  node004.example.com:
      |_
        - 0
        -
          Total OSDs: 2
          
          Solid State VG:
            Targets:   block.db                  Total size: 19.00 GB                 
            Total LVs: 2                         Size per LV: 1.86 GB                  
            Devices:   /dev/nvme0n2
          
            Type            Path                         LV Size         % of device
          -------------------------------------------------------------------------
            [data]          /dev/sdb                     9.00 GB         100.0%
            [block.db]      vg: vg/lv                    1.86 GB         10%
          -------------------------------------------------------------------------
            [data]          /dev/sdc                     9.00 GB         100.0%
            [block.db]      vg: vg/lv                    1.86 GB         10%

7、运行stage3, 把node004节点添加进来,并自动创建OSD (admin节点)

# salt-run state.orch ceph.stage.3

8、执行后检查集群OSD状态 (admin节点)

可以发现新增节点权重都是0,这是由于之前配置的“osd_crush_initial_weight”参数,预防新增节点或磁盘进来时进行数据平衡。

# ceph osd tree
ID CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF
-1       0.05878 root default                             
-7       0.01959     host node001                         
 2   hdd 0.00980         osd.2        up  1.00000 1.00000
 5   hdd 0.00980         osd.5        up  1.00000 1.00000
-3       0.01959     host node002                         
 0   hdd 0.00980         osd.0        up  1.00000 1.00000
 3   hdd 0.00980         osd.3        up  1.00000 1.00000
-5       0.01959     host node003                         
 1   hdd 0.00980         osd.1        up  1.00000 1.00000
 4   hdd 0.00980         osd.4        up  1.00000 1.00000
-9             0     host node004                         
 6   hdd       0         osd.6        up  1.00000 1.00000  <=== 新增节点OSD权重为0
 7   hdd       0         osd.7        up  1.00000 1.00000 

9、手动增加OSD磁盘权重 (admin节点)
注意:生产环境请在变更时间执行,执行时会数据平衡,影响读写。当然也可以通过Ceph参数或QOS来控制读写速率,后续文档中会提到。

# ceph osd crush reweight osd.6 0.00980
# ceph osd crush reweight osd.7 0.00980

 

# ceph osd tree        
ID CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF 
-1       0.07837 root default                             
-7       0.01959     host node001                         
 2   hdd 0.00980         osd.2        up  1.00000 1.00000 
 5   hdd 0.00980         osd.5        up  1.00000 1.00000 
-3       0.01959     host node002                         
 0   hdd 0.00980         osd.0        up  1.00000 1.00000 
 3   hdd 0.00980         osd.3        up  1.00000 1.00000 
-5       0.01959     host node003                         
 1   hdd 0.00980         osd.1        up  1.00000 1.00000 
 4   hdd 0.00980         osd.4        up  1.00000 1.00000 
-9       0.01959     host node004                         
 6   hdd 0.00980         osd.6        up  1.00000 1.00000 
 7   hdd 0.00980         osd.7        up  1.00000 1.00000 

三、增加 OSD 磁盘操作

1、首先我们通过 VMware workstation 的虚拟机 node004 节点上添加一块10G大小的磁盘

 

 2、开启虚拟机后,node004主机终端中查看新增磁盘

# lsblk                                
NAME                                    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                                       8:0    0   20G  0 disk 
├─sda1                                    8:1    0    1G  0 part /boot
└─sda2                                    8:2    0   19G  0 part 
  ├─vgoo-lvroot                         254:0    0   17G  0 lvm  /
  └─vgoo-lvswap                         254:1    0    2G  0 lvm  [SWAP]
sdb                                       8:16   0   10G  0 disk 
└─ceph--block--0515f9d7--3407--46a5--   254:4    0    9G  0 lvm  
sdc                                       8:32   0   10G  0 disk 
└─ceph--block--9f7394b2--3ad3--4cd8--   254:5    0    9G  0 lvm  
sdd                                       8:48   0   10G  0 disk  <== 新增磁盘
sr0                                      11:0    1 1024M  0 rom  
nvme0n1                                 259:0    0   20G  0 disk 
nvme0n2                                 259:1    0   20G  0 disk 
├─ceph--block--dbs--57d07a01--4440--4   254:2    0    1G  0 lvm  
└─ceph--block--dbs--57d07a01--4440--4   254:3    0    1G  0 lvm  
nvme0n3                                 259:2    0   20G  0 disk 

3、查看 VG LV 信息

# lvs
  LV                                                VG                                                  Attr       LSize  
  osd-block-9a914f7d-ae9c-451a-ac7e-bcb6cb1fc926    ceph-block-0515f9d7-3407-46a5-be68-db80fc789dcc     -wi-ao----  9.00g                                                    
  osd-block-79f5920f-b41c-4dd0-94e9-dc85dbb2e7e4    ceph-block-9f7394b2-3ad3-4cd8-8267-7e5993af1271     -wi-ao----  9.00g                                                    
  osd-block-db-2244293e-ca96-4847-a5cb-9112f59836fa ceph-block-dbs-57d07a01-4440-4892-b44c-eae536613586 -wi-ao----  1.00g                                                    
  osd-block-db-2b295cc9-caff-45ad-a179-d7e3ba46a39d ceph-block-dbs-57d07a01-4440-4892-b44c-eae536613586 -wi-ao----  1.00g                                                    
  osd-block-db-test                                 ceph-block-dbs-57d07a01-4440-4892-b44c-eae536613586 -wi-a-----  2.00g                                                    
  lvroot                                            vgoo                                                -wi-ao---- 17.00g                                                    
  lvswap                                            vgoo                                                -wi-ao----  2.00g 

4、创建 OSD 磁盘的 VG 和 LV

# vgcreate ceph-block-0 /dev/sdd
# lvcreate -l 100%FREE -n block-0 ceph-block-0

5、在已在nvme0n2磁盘上创建的 VG 上创建 LV

我们从第3个步骤中可以看到,nvme0n2磁盘已经被 VG ceph-block-dbs-57d07a01-4440-4892-b44c-eae536613586所使用,我们要在该VG上创建LV,因为一般一块PCIE SSD磁盘可以承担10块OSD数据磁盘,作为他们WAL和DB加速磁盘使用。

# lvcreate -L 2GB -n db-0 ceph-block-dbs-57d07a01-4440-4892-b44c-eae536613586

6、显示 VG LV 信息

# lvs
  LV                                                VG                                                  Attr       LSize  
  block-0                                           ceph-block-0                                        -wi-a----- 10.00g                                                    
  osd-block-9a914f7d-ae9c-451a-ac7e-bcb6cb1fc926    ceph-block-0515f9d7-3407-46a5-be68-db80fc789dcc     -wi-ao----  9.00g                                                    
  osd-block-79f5920f-b41c-4dd0-94e9-dc85dbb2e7e4    ceph-block-9f7394b2-3ad3-4cd8-8267-7e5993af1271     -wi-ao----  9.00g   
  db-0                                              ceph-block-dbs-57d07a01-4440-4892-b44c-eae536613586 -wi-a-----  2.00g  
  osd-block-db-2244293e-ca96-4847-a5cb-9112f59836fa ceph-block-dbs-57d07a01-4440-4892-b44c-eae536613586 -wi-ao----  1.00g                                                    
  osd-block-db-2b295cc9-caff-45ad-a179-d7e3ba46a39d ceph-block-dbs-57d07a01-4440-4892-b44c-eae536613586 -wi-ao----  1.00g                                                    
  lvroot                                            vgoo                                                -wi-ao---- 17.00g                                                    
  lvswap                                            vgoo                                                -wi-ao----  2.00g 

7、使用 ceph-volume 方式创建

  •  这次我们通过另一种方式来创建,而不是使用 drive group 方式,因为你需要了解到一旦自动化工具出现问题时如何处理和创建OSD
# ceph-volume lvm create --bluestore --data ceph-block-0/block-0 --block.db ceph-block-dbs-57d07a01-4440-4892-b44c-eae536613586/db-0
  • 管理节点上,查看OSD输出信息
# ceph osd tree
ID CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF 
-1       0.07837 root default                             
-7       0.01959     host node001                         
 2   hdd 0.00980         osd.2        up  1.00000 1.00000 
 5   hdd 0.00980         osd.5        up  1.00000 1.00000 
-3       0.01959     host node002                         
 0   hdd 0.00980         osd.0        up  1.00000 1.00000 
 3   hdd 0.00980         osd.3        up  1.00000 1.00000 
-5       0.01959     host node003                         
 1   hdd 0.00980         osd.1        up  1.00000 1.00000 
 4   hdd 0.00980         osd.4        up  1.00000 1.00000 
-9       0.01959     host node004                         
 6   hdd 0.00980         osd.6        up  1.00000 1.00000 
 7   hdd 0.00980         osd.7        up  1.00000 1.00000 
 8   hdd       0         osd.8        up  1.00000 1.00000   <==== 可以看到 osd.8 被创建出来

8、设置权重

注意:生产环境操作时,会数据平衡会影响到读写性能。

# ceph osd crush reweight osd.8 0.00980

 

# ceph osd tree
ID CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF 
-1       0.08817 root default                             
-7       0.01959     host node001                         
 2   hdd 0.00980         osd.2        up  1.00000 1.00000 
 5   hdd 0.00980         osd.5        up  1.00000 1.00000 
-3       0.01959     host node002                         
 0   hdd 0.00980         osd.0        up  1.00000 1.00000 
 3   hdd 0.00980         osd.3        up  1.00000 1.00000 
-5       0.01959     host node003                         
 1   hdd 0.00980         osd.1        up  1.00000 1.00000 
 4   hdd 0.00980         osd.4        up  1.00000 1.00000 
-9       0.02939     host node004                         
 6   hdd 0.00980         osd.6        up  1.00000 1.00000 
 7   hdd 0.00980         osd.7        up  1.00000 1.00000 
 8   hdd 0.00980         osd.8        up  1.00000 1.00000 

 四、删除OSD磁盘

 语法: salt-run osd.remove OSD_ID

 1、批量删除 node004 节点上 OSD.7 OSD.8

admin:~ # ceph osd tree
ID CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF 
-1       0.08817 root default                             
-7       0.01959     host node001                         
 2   hdd 0.00980         osd.2        up  1.00000 1.00000 
 5   hdd 0.00980         osd.5        up  1.00000 1.00000 
-3       0.01959     host node002                         
 0   hdd 0.00980         osd.0        up  1.00000 1.00000 
 3   hdd 0.00980         osd.3        up  1.00000 1.00000 
-5       0.01959     host node003                         
 1   hdd 0.00980         osd.1        up  1.00000 1.00000 
 4   hdd 0.00980         osd.4        up  1.00000 1.00000 
-9       0.02939     host node004                         
 6   hdd 0.00980         osd.6        up  1.00000 1.00000 
 7   hdd 0.00980         osd.7        up  1.00000 1.00000 
 8   hdd 0.00980         osd.8        up  1.00000 1.00000 

 

admin:~ # salt-run osd.remove 7 8
Removing osd 7 on host node004.example.com
Draining the OSD
Waiting for ceph to catch up.
osd.7 is safe to destroy
Purging from the crushmap
Zapping the device

Removing osd 8 on host node004.example.com
Draining the OSD
Waiting for ceph to catch up.
osd.8 is safe to destroy
Purging from the crushmap
Zapping the device

2、显示 osd 信息,node004主机上 osd.7 和 osd.8 已被删除

admin:~ # ceph osd tree
ID CLASS WEIGHT  TYPE NAME        STATUS REWEIGHT PRI-AFF 
-1       0.06857 root default                             
-7       0.01959     host node001                         
 2   hdd 0.00980         osd.2        up  1.00000 1.00000 
 5   hdd 0.00980         osd.5        up  1.00000 1.00000 
-3       0.01959     host node002                         
 0   hdd 0.00980         osd.0        up  1.00000 1.00000 
 3   hdd 0.00980         osd.3        up  1.00000 1.00000 
-5       0.01959     host node003                         
 1   hdd 0.00980         osd.1        up  1.00000 1.00000 
 4   hdd 0.00980         osd.4        up  1.00000 1.00000 
-9       0.00980     host node004                         
 6   hdd 0.00980         osd.6        up  1.00000 1.00000 

 3、删除 OSD 其他命令
(1)删除节点上所有osd

# salt-run osd.remove OSD_HOST_NAME

(2)当 WAL 或 DB 设备损坏时,移除破损的磁盘

# salt-run osd.remove OSD_ID force=True

四、减少集群节点

从集群中移出node004 osd节点,移除前请确保集群有足够的空间容纳node004上的数据
1、手动方式


2、DeepSea 方式
(1)管理节点上修改 policy.cfg 文件

vim /srv/pillar/ceph/proposals/policy.cfg
## Cluster Assignment
#cluster-ceph/cluster/*.sls            <=== 注释掉
cluster-ceph/cluster/node00[1-3]*.sls  <=== 匹配 target
cluster-ceph/cluster/admin*.sls        <=== 匹配 target

## Roles
# ADMIN  
role-master/cluster/admin*.sls
role-admin/cluster/admin*.sls

# Monitoring
role-prometheus/cluster/admin*.sls
role-grafana/cluster/admin*.sls

# MON
role-mon/cluster/node00[1-3]*.sls

# MGR (mgrs are usually colocated with mons)
role-mgr/cluster/node00[1-3]*.sls

# COMMON
config/stack/default/global.yml
config/stack/default/ceph/cluster.yml

# Storage   # 定义为 storage 角色
#role-storage/cluster/node00*.sls      <=== 注释掉
role-storage/cluster/node00[1-3]*.sls  <=== 匹配 target

(2)修改 drive_group.yml 文件

# vim /srv/salt/ceph/configuration/files/drive_groups.yml
# This is the default configuration and
# will create an OSD on all available drives
drive_group_hdd_nvme:     
  target: 'node00[1-3]*'    <== 匹配 target
  data_devices:
    size: '9GB:12GB'              
  db_devices:
    rotational: 0      
    limit: 1    
  block_db_size: '2G'

(3)执行salt命令,stage2和stage5

# salt-run state.orch ceph.stage.2
# salt-run state.orch ceph.stage.5

 

 


 

Guess you like

Origin www.cnblogs.com/alfiesuse/p/11562336.html