Kolla-ansible: notes on repairing an OpenStack cluster after abnormal downtime

After an unexpected power outage in the office server room, the OpenStack cluster in our test environment stopped working properly. These notes record how the kolla-ansible tool was used to bring the cluster back.

1. Environment

[root@kolla-ansible-master ~]# cat /etc/centos-release
CentOS Linux release 7.8.2003 (Core)

[root@kolla-ansible-master ~]# ansible --version
ansible 2.7.18

[root@kolla-ansible-master ~]# pip list | grep kolla-ansible
kolla-ansible                    7.2.2.dev9

[root@kolla-ansible-master ~]# openstack --version
openstack 5.2.1

2. Recovery record

1. Status

After power was restored and the cluster machines were restarted, some OpenStack containers kept restarting and the cluster could not work normally:

 kolla-ansible-master:4000/kolla/centos-source-heat-engine:rocky               "dumb-init --single-…"   15 months ago       Up About a minute                                           heat_engine
c07e5d01adce        kolla-ansible-master:4000/kolla/centos-source-heat-api-cfn:rocky              "dumb-init --single-…"   15 months ago       Restarting (1) 1 second ago                                 heat_api_cfn
88b7a106dcd8        kolla-ansible-master:4000/kolla/centos-source-heat-api:rocky                  "dumb-init --single-…"   15 months ago       Restarting (1) Less than a second ago                       heat_api
82b5983614e0        kolla-ansible-master:4000/kolla/centos-source-neutron-server:rocky            "dumb-init --single-…"   15 months ago       Up About a minute                                           neutron_server
feaf96f16403        kolla-ansible-master:4000/kolla/centos-source-nova-compute-ironic:rocky       "dumb-init --single-…"   15 months ago       Up About a minute                                           nova_compute_ironic
cb9184ff5506        kolla-ansible-master:4000/kolla/centos-source-nova-novncproxy:rocky           "dumb-init --single-…"   15 months ago       Up About a minute                                           nova_novncproxy
17bf7758070d        kolla-ansible-master:4000/kolla/centos-source-nova-consoleauth:rocky          "dumb-init --single-…"   15 months ago       Up About a minute                                           nova_consoleauth
619d66b56612        kolla-ansible-master:4000/kolla/centos-source-nova-conductor:rocky            "dumb-init --single-…"   15 months ago       Up About a minute                                           nova_conductor
249b423c2728        kolla-ansible-master:4000/kolla/centos-source-nova-scheduler:rocky            "dumb-init --single-…"   15 months ago       Up About a minute                                           nova_scheduler
beace5f229e2        kolla-ansible-master:4000/kolla/centos-source-nova-api:rocky                  "dumb-init --single-…"   15 months ago       Restarting (1) 5 seconds ago                                nova_api

2. Locate the problem

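Checking the container list and the logs shows that nova_api is failing and keeps restarting:

Restarting (1) 5 seconds ago    nova_api

The other nova services (nova_scheduler, nova_conductor, nova_consoleauth, nova_novncproxy, nova_compute_ironic) are up and running normally.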
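The following sketch finds every container stuck in a restart loop and prints its recent log output in one pass. Run it on the affected host. It relies only on the standard `docker ps --filter` / `--format` and `docker logs --tail` options. The function name `show_restarting_logs` is hypothetical. In a kolla deployment the service log files are typically also kept in the `kolla_logs` Docker volume, which may contain more detail than the container's stdout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last N log lines of every container
# that Docker currently reports in the "restarting" state.
show_restarting_logs() {
  local lines="${1:-50}" name
  # status=restarting keeps only containers in a restart loop;
  # --format prints just the container names, one per line.
  for name in $(docker ps --filter "status=restarting" --format '{{.Names}}'); do
    echo "===== ${name} ====="
    # docker logs forwards the container's stderr to our stderr, so merge it.
    docker logs --tail "$lines" "$name" 2>&1
  done
}
```

Given the container list above, running `show_restarting_logs 100` on the controller would be expected to report nova_api, heat_api and heat_api_cfn.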

3. Attempt a fix

3.1 Stop the virtual machine

[root@kolla-ansible-master ~]# openstack server list
+--------------------------------------+-------+--------+-----------------------+--------+---------+
| ID                                   | Name  | Status | Networks              | Image  | Flavor  |
+--------------------------------------+-------+--------+-----------------------+--------+---------+
| b4634124-a315-4fd8-aa4a-3df8cade2335 | demo1 | ACTIVE | demo-net=192.168.19.8 | cirros | m1.tiny |
+--------------------------------------+-------+--------+-----------------------+--------+---------+

[root@kolla-ansible-master ~]# openstack server stop demo1
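`openstack server stop` returns as soon as the request is accepted. It can therefore help to wait until the server actually reports SHUTOFF before touching the Nova containers. The following is a minimal polling sketch that assumes a 5-second interval is acceptable. The function name `wait_for_server_status` is hypothetical, and `-f value -c status` is the standard client option pair for printing a single field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a server until it reaches the wanted status.
# Returns 0 on success, 1 if the timeout (in seconds) expires first.
wait_for_server_status() {
  local server="$1" want="$2" timeout="${3:-120}" interval=5 waited=0 status
  while [ "$waited" -lt "$timeout" ]; do
    # -f value -c status prints only the bare status string, e.g. SHUTOFF.
    status="$(openstack server show "$server" -f value -c status)"
    if [ "$status" = "$want" ]; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out waiting for ${server} to reach ${want} (last: ${status:-unknown})" >&2
  return 1
}
```

For example: `openstack server stop demo1 && wait_for_server_status demo1 SHUTOFF`.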

3.2 Stop the Nova services

[root@kolla-ansible-master ~]# kolla-ansible -i ./multinode05  stop --tags nova
Stop Kolla containers : ansible-playbook -i ./multinode05 -e @/etc/kolla/globals.yml -e @/etc/kolla/passwords.yml -e CONFIG_DIR=/etc/kolla  --tags nova /usr/share/kolla-ansible/ansible/stop.yml 

PLAY [all] ******************************************************************************************************************************************************************************************************

TASK [Gathering Facts] ******************************************************************************************************************************************************************************************
ok: [localhost]
ok: [compute01]
ok: [compute03]
ok: [compute02]
ok: [network01]
ok: [controller01]

PLAY RECAP ******************************************************************************************************************************************************************************************************
compute01                  : ok=1    changed=0    unreachable=0    failed=0   
compute02                  : ok=1    changed=0    unreachable=0    failed=0   
compute03                  : ok=1    changed=0    unreachable=0    failed=0   
controller01               : ok=1    changed=0    unreachable=0    failed=0   
localhost                  : ok=1    changed=0    unreachable=0    failed=0   
network01                  : ok=1    changed=0    unreachable=0    failed=0
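The recap above lists only the "Gathering Facts" task, with changed=0 on every host. That suggests the stop run may not have stopped anything for this tag. It is worth confirming on each node whether any nova container is still running before redeploying. The following is a small sketch using the `docker ps` name filter, which is a substring match. The function name `running_nova_containers` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list running containers whose name contains "nova".
# Prints nothing and returns 0 when no nova container is running.
running_nova_containers() {
  local names
  # docker ps shows running containers only; the name filter matches substrings.
  names="$(docker ps --filter "name=nova" --format '{{.Names}}')"
  if [ -n "$names" ]; then
    echo "$names"
    return 1
  fi
  return 0
}
```

Run it on controller01 and on each compute node. An empty result means no nova container is left running there.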

3.3 Redeploy Nova

[root@kolla-ansible-master ~]# kolla-ansible -i ./multinode05  deploy --tags nova


PLAY RECAP ******************************************************************************************************************************************************************************************************
compute01                  : ok=42   changed=0    unreachable=0    failed=0   
compute02                  : ok=42   changed=0    unreachable=0    failed=0   
compute03                  : ok=42   changed=0    unreachable=0    failed=0   
controller01               : ok=56   changed=2    unreachable=0    failed=0   
localhost                  : ok=2    changed=0    unreachable=0    failed=0   
network01                  : ok=2    changed=0    unreachable=0    failed=0
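After the redeploy, two quick checks would show whether Nova has recovered. First, nova_api should stay in the `running` state instead of `restarting`. Second, no compute service should report `down`. `openstack compute service list` itself goes through the Nova API, so a successful call is already a sign that nova_api answers again. The sketch below assumes the value-format columns come out as Binary, Host, State. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical post-deploy checks.

# Print Docker's view of a container's state: running, restarting, exited, ...
container_state() {
  docker inspect -f '{{.State.Status}}' "$1"
}

# Print "binary host" for every compute service whose State column is "down".
down_compute_services() {
  # -f value prints whitespace-separated columns: Binary Host State.
  openstack compute service list -f value -c Binary -c Host -c State |
    awk '$3 == "down" { print $1, $2 }'
}
```

If `container_state nova_api` keeps printing `running` and `down_compute_services` prints nothing, the Nova control plane is likely healthy again.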

3.4 Start the virtual machine

[root@kolla-ansible-master ~]# openstack server list
+--------------------------------------+-------+---------+-----------------------+--------+---------+
| ID                                   | Name  | Status  | Networks              | Image  | Flavor  |
+--------------------------------------+-------+---------+-----------------------+--------+---------+
| b4634124-a315-4fd8-aa4a-3df8cade2335 | demo1 | SHUTOFF | demo-net=192.168.19.8 | cirros | m1.tiny |
+--------------------------------------+-------+---------+-----------------------+--------+---------+
[root@kolla-ansible-master ~]# openstack server start demo1
[root@kolla-ansible-master ~]#
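This cluster has a single VM. With more VMs, a loop over every SHUTOFF server in the project saves repeating the command by hand. The following is a minimal sketch using the client's `--status` filter. The function name `start_shutoff_servers` is hypothetical. It also starts servers that were shut off on purpose, so review the list first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start every server in the current project that is SHUTOFF.
# Careful: this also starts servers that were shut off intentionally.
start_shutoff_servers() {
  local id
  # --status SHUTOFF filters the list; -f value -c ID prints bare server IDs.
  for id in $(openstack server list --status SHUTOFF -f value -c ID); do
    echo "starting ${id}"
    openstack server start "$id"
  done
}
```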

Closing note: in production, an unexpected power failure is very unlikely, and routine maintenance more often means replacing a device or a host. A test environment can also simply be redeployed. These notes only record one way to repair the cluster.
