Fixing node OSD services that fail to come up after a Ceph cluster reboot, as seen with ceph osd status and ceph -s

Preface

  • In a lab environment running a Ceph cluster alongside a multi-node OpenStack deployment, the OSD services on the node servers ran into problems after a reboot; the fix is documented below.
  • For the multi-node Ceph + OpenStack deployment itself, see my other post: https://blog.csdn.net/CN_TangZheng/article/details/104745364

1: The error

  • [root@ct ~]# ceph -s	'//check the Ceph cluster status'
      cluster:
        id:     8c9d2d27-492b-48a4-beb6-7de453cf45d6
        health: HEALTH_WARN	'//health check reports HEALTH_WARN'
                1 osds down	
                1 host (1 osds) down	
                Reduced data availability: 192 pgs inactive
                Degraded data redundancy: 812/1218 objects degraded (66.667%), 116 pgs degraded, 192 pgs undersized
                clock skew detected on mon.c1, mon.c2
     
      services:
        mon: 3 daemons, quorum ct,c1,c2
        mgr: ct(active), standbys: c1, c2
        osd: 3 osds: 1 up, 2 in	'//two of the OSDs are down'
     
      data:
        pools:   3 pools, 192 pgs
        objects: 406  objects, 1.8 GiB
        usage:   2.8 GiB used, 1021 GiB / 1024 GiB avail
        pgs:     100.000% pgs not active
                 812/1218 objects degraded (66.667%)
                 116 undersized+degraded+peered
                 76  undersized+peered
     
    [root@ct ~]# ceph osd status	'//check the OSD status: the OSDs on the two compute nodes are not healthy'
    +----+------+-------+-------+--------+---------+--------+---------+----------------+
    | id | host |  used | avail | wr ops | wr data | rd ops | rd data |     state      |
    +----+------+-------+-------+--------+---------+--------+---------+----------------+
    | 0  |  ct  | 2837M | 1021G |    0   |     0   |    0   |     0   |   exists,up    |
    | 1  |      |    0  |    0  |    0   |     0   |    0   |     0   |     exists     |
    | 2  |      |    0  |    0  |    0   |     0   |    0   |     0   | autoout,exists |
    +----+------+-------+-------+--------+---------+--------+---------+----------------+
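
  • Before restarting anything, it helps to confirm from the cluster and from the compute nodes which OSD daemons are down and why. A minimal diagnostic sketch, assuming the same ct/c1/c2 hosts and that chrony handles time sync in this deployment:

    [root@ct ~]# ceph health detail	'//lists exactly which OSDs and PGs are affected'
    [root@ct ~]# ceph osd tree	'//shows which host each down OSD belongs to'
    [root@c1 ~]# systemctl status ceph-osd.target	'//on each compute node: are the OSD daemons actually running?'
    [root@c1 ~]# chronyc sources	'//the clock skew on mon.c1/mon.c2 usually points at time sync'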
    
    

1.1: Solution

  • It turns out that Neutron's Open vSwitch agents are down

    [root@ct ~]# source keystonerc_admin 
    [root@ct ~(keystone_admin)]# openstack network agent list	'//after some troubleshooting: the Open vSwitch agents on c1/c2 and the L3 agent are dead (Alive shows XXX)'
    +--------------------------------------+----------------------+------+-------------------+-------+-------+---------------------------+
    | ID                                   | Agent Type           | Host | Availability Zone | Alive | State | Binary                    |
    +--------------------------------------+----------------------+------+-------------------+-------+-------+---------------------------+
    | 12dd5b51-1344-4c29-8974-e5d8e0e65d2e | Open vSwitch agent   | c1   | None              | XXX   | UP    | neutron-openvswitch-agent |
    | 20829a10-4a26-4317-8175-4534ac0b01e1 | Open vSwitch agent   | c2   | None              | XXX   | UP    | neutron-openvswitch-agent |
    | 25c121ec-b761-4e7b-bfbf-9601993ebb54 | Metadata agent       | ct   | None              | :-)   | UP    | neutron-metadata-agent    |
    | 47c878ee-93f0-4960-baa1-1cc92476ed2a | DHCP agent           | ct   | nova              | :-)   | UP    | neutron-dhcp-agent        |
    | 57647383-7106-46b6-971f-2398457e5179 | Loadbalancerv2 agent | ct   | None              | :-)   | UP    | neutron-lbaasv2-agent     |
    | 92d49052-0b4f-467c-a92c-1743d891043f | Metering agent       | ct   | None              | :-)   | UP    | neutron-metering-agent    |
    | c2f7791c-96ed-472b-abda-509a3ff125b5 | L3 agent             | ct   | nova              | XXX   | UP    | neutron-l3-agent          |
    | e48269d8-e4f1-424b-bc3e-4c0d13757e8a | Open vSwitch agent   | ct   | None              | :-)   | UP    | neutron-openvswitch-agent |
    +--------------------------------------+----------------------+------+-------------------+-------+-------+---------------------------+
    
    
  • Bring the L3 agent back up on the controller node

    [root@ct ~(keystone_admin)]# systemctl start neutron-l3-agent
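
  • systemctl start is enough here because the agent simply was not running after the reboot; if it is in a failed state, systemctl restart works just as well. A quick check (a sketch) that the agent is now active:

    [root@ct ~(keystone_admin)]# systemctl status neutron-l3-agent	'//should report active (running)'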
    
    
  • Restart the Open vSwitch agent on the compute nodes

    [root@c1 ceph]# systemctl restart neutron-openvswitch-agent
    [root@c2 ceph]# systemctl restart neutron-openvswitch-agent
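
  • Since the underlying problem is that these agents did not come back after the reboot, it may also be worth making sure they are enabled at boot. A sketch under that assumption (whether they were actually disabled in this deployment is not confirmed):

    [root@ct ~(keystone_admin)]# systemctl enable neutron-l3-agent
    [root@c1 ceph]# systemctl enable neutron-openvswitch-agent
    [root@c2 ceph]# systemctl enable neutron-openvswitch-agent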
    
    
  • Once the restarts are done, run openstack network agent list again and confirm that every agent is alive
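
  • A quick way to spot any agent that is still down (agents showing XXX in the Alive column are dead), as a sketch:

    [root@ct ~(keystone_admin)]# openstack network agent list | grep XXX	'//should print nothing once every agent is alive'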

  • Then log in to the compute nodes and restart the OSD services

    [root@c1 ceph]# systemctl restart  ceph-osd.target
    [root@c2 ceph]# systemctl restart  ceph-osd.target
    [root@c1 ceph]# systemctl restart  ceph-mgr.target
    [root@c2 ceph]# systemctl restart  ceph-mgr.target
    '//after restarting the OSD services, check the cluster with ceph -s; if the mgr service on a compute node is not running, restart it as well'
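
  • After the restart the placement groups need a moment to peer and recover. The following checks (a sketch) watch the OSDs come back; enabling the Ceph targets is an extra precaution so they start on the next boot:

    [root@ct ~]# ceph osd tree	'//all three OSDs should report up'
    [root@ct ~]# ceph -w	'//watch the PGs return to active+clean, Ctrl+C to exit'
    [root@c1 ceph]# systemctl enable ceph-osd.target ceph-mgr.target	'//precaution: make sure they start after the next reboot'
    [root@c2 ceph]# systemctl enable ceph-osd.target ceph-mgr.target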
    

1.2: Check again and confirm the problem is resolved

  • [root@ct ~(keystone_admin)]# ceph -s
      cluster:
        id:     8c9d2d27-492b-48a4-beb6-7de453cf45d6
        health: HEALTH_OK	'//health check is now OK'
     
      services:	'//all the services below are back to normal'
        mon: 3 daemons, quorum ct,c1,c2
        mgr: ct(active), standbys: c2, c1
        osd: 3 osds: 3 up, 3 in
     
      data:
        pools:   3 pools, 192 pgs
        objects: 406  objects, 1.8 GiB
        usage:   8.3 GiB used, 3.0 TiB / 3.0 TiB avail
        pgs:     192 active+clean
     
      io:
        client:   1.5 KiB/s rd, 1 op/s rd, 0 op/s wr
    [root@ct ~(keystone_admin)]# ceph osd status	'//the OSD status is fine as well'
    +----+------+-------+-------+--------+---------+--------+---------+-----------+
    | id | host |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
    +----+------+-------+-------+--------+---------+--------+---------+-----------+
    | 0  |  ct  | 2837M | 1021G |    0   |     0   |    0   |     0   | exists,up |
    | 1  |  c1  | 2837M | 1021G |    0   |     0   |    0   |     0   | exists,up |
    | 2  |  c2  | 2837M | 1021G |    0   |     0   |    0   |     0   | exists,up |
    +----+------+-------+-------+--------+---------+--------+---------+-----------+
    
    
Reposted from blog.csdn.net/CN_TangZheng/article/details/104755431