Common operation and maintenance commands for a Ceph cluster

Cluster operations
Start the cluster

# Start the MON service
sudo service ceph-mon@ceph1 start
# Start the MGR service
sudo service ceph-mgr@ceph1 start
# Start a specific OSD service
sudo service ceph-osd@0 start
# Start all OSD services
sudo service ceph-osd@* start
# Start the MDS service
sudo service ceph-mds@ceph1 start
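
On distributions where Ceph is managed by systemd, the same daemons can usually be controlled through systemctl as well (the unit names below assume the standard packaging):

# Start a single daemon
sudo systemctl start ceph-mon@ceph1
sudo systemctl start ceph-osd@0
# Start every Ceph daemon on this node
sudo systemctl start ceph.target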

View the real-time running status of ceph

ceph -w
  cluster:
    id:     46634c97-5c3c-424b-b2d9-653a15849c61
    health: HEALTH_OK
 
  services:
    mon: 1 daemons, quorum node1 (age 41m)
    mgr: node1(active, since 41m)
    osd: 3 osds: 3 up (since 41m), 3 in (since 3w)
 
  data:
    pools:   1 pools, 128 pgs
    objects: 94 objects, 275 MiB
    usage:   3.8 GiB used, 14 GiB / 18 GiB avail
    pgs:     128 active+clean
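
ceph -w prints the status once and then keeps streaming cluster log events; for a one-shot snapshot, ceph -s (an alias of ceph status) can be used instead:

ceph -s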

Remove all Ceph packages from a node

ceph-deploy purge node1     # remove all packages and data
ceph-deploy purgedata node1 # remove data only

View ceph storage space

ceph df
RAW STORAGE:
    CLASS     SIZE       AVAIL      USED        RAW USED     %RAW USED 
    hdd       18 GiB     14 GiB     824 MiB      3.8 GiB         21.15 
    TOTAL     18 GiB     14 GiB     824 MiB      3.8 GiB         21.15 
 
POOLS:
    POOL     ID     PGS     STORED      OBJECTS     USED        %USED     MAX AVAIL 
    rbd       1     128     260 MiB          94     787 MiB      5.47       4.4 GiB
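
For a per-OSD and per-pool breakdown of the same figures, ceph osd df and rados df are also available:

# Usage and PG count per OSD
ceph osd df
# Usage per pool as reported by RADOS
rados df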

Create an admin user with a key and save the keyring to the /etc/ceph directory

ceph auth get-or-create client.admin mds 'allow *' osd 'allow *' mon 'allow *' mgr 'allow *' \
-o /etc/ceph/ceph.client.admin.keyring
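
As a quick sanity check (not part of the original procedure), the entity and its key can be read back from the cluster:

# Show the admin entity with its key and caps
ceph auth get client.admin
# Print only the secret key
ceph auth print-key client.admin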

Create a user and key for mds.ceph1

ceph auth get-or-create mds.ceph1 mon 'allow profile mds' osd 'allow rwx' mds 'allow *' \
    -o /var/lib/ceph/mds/ceph-node1/keyring

View the authenticated users and related keys in the ceph cluster

ceph auth list

View the detailed configuration of the cluster

ceph daemon mon.ceph1 config show | more

View cluster health status details

ceph health detail
HEALTH_OK # if there are faults or warnings, much more detail is printed here

MON operation
View MON status information

ceph mon stat
e1: 1 mons at {node1=[v2:10.0.0.131:3300/0,v1:10.0.0.131:6789/0]}, election epoch 21, leader 0 node1, quorum 0 node1

Check the election status of MON

ceph quorum_status
{
    "election_epoch": 21,
    "quorum": [0],
    "quorum_names": ["node1"],
    "quorum_leader_name": "node1",
    "quorum_age": 2619,
    "monmap": {
        "epoch": 1,
        "fsid": "46634c97-5c3c-424b-b2d9-653a15849c61",
        "modified": "2021-01-29 17:59:58.311910",
        "created": "2021-01-29 17:59:58.311910",
        "min_mon_release": 14,
        "min_mon_release_name": "nautilus",
        "features": {
            "persistent": ["kraken", "luminous", "mimic", "osdmap-prune", "nautilus"],
            "optional": []
        },
        "mons": [
            {
                "rank": 0,
                "name": "node1",
                "public_addrs": {
                    "addrvec": [
                        {"type": "v2", "addr": "10.0.0.131:3300", "nonce": 0},
                        {"type": "v1", "addr": "10.0.0.131:6789", "nonce": 0}
                    ]
                },
                "addr": "10.0.0.131:6789/0",
                "public_addr": "10.0.0.131:6789/0"
            }
        ]
    }
}

View the mapping information of MON

ceph mon dump

dumped monmap epoch 1
epoch 1
fsid 46634c97-5c3c-424b-b2d9-653a15849c61
last_changed 2021-01-29 17:59:58.311910
created 2021-01-29 17:59:58.311910
min_mon_release 14 (nautilus)
0: [v2:10.0.0.131:3300/0,v1:10.0.0.131:6789/0] mon.node1

Delete a MON node

ceph mon remove ceph1
# If this is the deployment node, ceph-deploy can also be used to remove it
ceph-deploy mon remove ceph1

Get a running mon map and save it in the specified file

ceph mon getmap -o mon.txt
got monmap epoch 1

View the map obtained above

monmaptool --print mon.txt
monmaptool: monmap file mon.txt
epoch 1
fsid 0862c251-2970-4329-b171-53a77d52b2d4
last_changed 2020-05-07 02:16:50.749480
created 2020-05-07 02:16:50.749480
0: 172.31.5.182:6789/0 mon.ceph1

Inject the above mon map into the newly added node

ceph-mon -i ceph1 --inject-monmap mon.txt
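
Note that --inject-monmap writes to the monitor's local store, so the mon daemon should be stopped before injecting and started again afterwards, roughly:

sudo service ceph-mon@ceph1 stop
ceph-mon -i ceph1 --inject-monmap mon.txt
sudo service ceph-mon@ceph1 start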

View the MON's admin socket

ceph-conf --name mon.ceph1 --show-config-value admin_socket

/var/run/ceph/ceph-mon.ceph1.asok

Check the detailed status of MON

ceph daemon mon.ceph1 mon_status
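
The same information can be obtained by talking to the admin socket directly, which also lets you list every command the daemon exposes:

# Query the monitor through its admin socket
ceph --admin-daemon /var/run/ceph/ceph-mon.ceph1.asok mon_status
# List all commands supported over the socket
ceph daemon mon.ceph1 help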

OSD operation
View ceph osd running status

ceph osd stat
3 osds: 3 up (since 45m), 3 in (since 3w); epoch: e96

View osd mapping information

ceph osd dump

View the OSD tree (CRUSH hierarchy)

ceph osd tree

Delete an OSD

# 1. Mark the OSD down
ceph osd down osd.0
# 2. Mark it out of the cluster
ceph osd out osd.0
# 3. Remove the OSD
ceph osd rm osd.0
# 4. Remove its authentication key
ceph auth rm osd.0
# 5. Remove it from the CRUSH map
ceph osd crush rm osd.0
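
On Luminous and later releases, steps 3-5 can usually be collapsed into a single purge command; stopping the daemon on its host first is assumed here:

# Stop the OSD daemon on its host
sudo service ceph-osd@0 stop
# Remove the OSD, its CRUSH entry and its auth key in one step
ceph osd purge osd.0 --yes-i-really-mean-it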

Set the maximum number of OSDs

# Get the maximum number of OSDs
ceph osd getmaxosd
# Set the maximum number of OSDs
ceph osd setmaxosd 10

Set the CRUSH weight of an OSD

ceph osd crush set 3 3.0 host=ceph2

set item id 3 name 'osd.3' weight 3 at location {host=ceph2} to crush map
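
If only the weight of an existing item needs to change, without re-declaring its CRUSH location, crush reweight is the more common form:

# Change the CRUSH weight of osd.3
ceph osd crush reweight osd.3 3.0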

View the configuration parameters of an osd in the cluster

ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | less
# Another way
ceph -n osd.0 --show-config | grep objectstore

Dynamically set OSD configuration parameters in the cluster

# Set a single OSD
ceph tell osd.0 injectargs "--osd_recovery_op_priority 63"
# Set all OSDs
ceph tell osd.* injectargs "--osd_recovery_op_priority 63"
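
injectargs only affects the running daemons; on Mimic and later the same option can be persisted in the cluster configuration database (shown here with the same value as an example):

# Persist the option for all OSDs
ceph config set osd osd_recovery_op_priority 63
# Confirm what is stored
ceph config dump | grep osd_recovery_op_priority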

MDS Operation
View MDS Status

ceph mds stat

View the mapping information of MDS

ceph mds dump
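
On recent releases (Nautilus and later) ceph mds dump is deprecated; the equivalent information comes from the file system commands:

ceph fs dump
ceph fs status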

Delete MDS node

# Remove the first MDS node
ceph mds rm 0

mds gid 0 dne

Storage pool operation
List the pools in the Ceph cluster

ceph osd lspools

Create storage pool

ceph osd pool create testpool 128 128 # 128 is the number of PGs (pg_num and pgp_num)
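
Since Luminous, a new pool should also be tagged with the application that will use it, otherwise the cluster raises a health warning; for an RBD pool, for example:

ceph osd pool application enable testpool rbd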

Configure quotas for a ceph pool

ceph osd pool set-quota testpool max_objects 10000
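
Quotas can also be expressed in bytes, and the current settings can be read back with get-quota (the 10 GiB value below is only an example):

# Limit the pool to 10 GiB of data
ceph osd pool set-quota testpool max_bytes 10737418240
# Show the quotas currently set on the pool
ceph osd pool get-quota testpool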

Delete a pool in the cluster

# First, allow pool deletion in the ceph.conf file
mon_allow_pool_delete = true
# Then restart the MON process
sudo service ceph-mon@ceph1 restart
# Then delete the pool
ceph osd pool delete testpool testpool --yes-i-really-really-mean-it
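
If restarting the MON is undesirable, the flag can usually be enabled at runtime instead of editing ceph.conf, and switched off again afterwards:

# Enable pool deletion without restarting the MONs
ceph tell mon.* injectargs '--mon_allow_pool_delete=true'
ceph osd pool delete testpool testpool --yes-i-really-really-mean-it
# Disable it again
ceph tell mon.* injectargs '--mon_allow_pool_delete=false'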

PG operation
View PG status

ceph pg stat

View the mapping information of PGs

ceph pg dump

View PGs stuck in a particular state

ceph pg dump_stuck unclean
ceph pg dump_stuck inactive
ceph pg dump_stuck stale

Get pg_num/pgp_num

ceph osd pool get mytestpool pg_num
ceph osd pool get mytestpool pgp_num

Set pg_num/pgp_num

ceph osd pool set mytestpool pg_num 512
ceph osd pool set mytestpool pgp_num 512

Restore a lost PG

ceph pg {pg-id} mark_unfound_lost revert
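
Before reverting, it is worth confirming which PGs actually report unfound objects and why; both commands below are standard:

# PGs with unfound objects are listed here
ceph health detail
# The recovery_state section explains why objects are unfound
ceph pg {pg-id} query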

Repair PG data

ceph pg crush repair {pg_id}
ceph pg repair {pg_id}
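
To locate the PGs that need repair after a scrub error, the inconsistent PGs and objects can be listed first:

# List inconsistent PGs in a pool
rados list-inconsistent-pg testpool
# List inconsistent objects inside a PG (the PG must have been scrubbed)
rados list-inconsistent-obj {pg_id}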

Show PGs in abnormal states

ceph pg dump_stuck inactive|unclean|stale

Origin blog.csdn.net/HYXRX/article/details/113931099