Ansible best practices: managing rolling updates with playbooks

Preface


  • If anything here is wrong or incomplete, corrections are welcome.

In the evening I sat under the eaves, watching the sky slowly grow dark, lonely and desolate at heart, feeling that my life had been taken from me. I was a young man then, but I was afraid of living like this and growing old like this. To me, that was more terrible than death. --------Wang Xiaobo

Ansible rolling updates

What is a rolling update

Normally, when Ansible runs a play, it makes sure that every managed host has completed a task before any host starts the next task. Once all hosts have completed all tasks, Ansible runs any notified handlers.

Pushing every task to all hosts at the same time can lead to undesirable behavior. For example, if a play updates the web servers behind a load balancer and updates all of them at once, every web server could be out of service at the same time.

Ansible supports rolling updates, which update a large number of hosts in batches. This has two benefits:

  • Only some of the servers are being updated at any given time, so the others can keep serving clients.
  • If the update of one batch fails, the servers in the other batches can still provide service.

For this reason, an update playbook should generally also:

  • Monitor the update process and test the update results.
  • If the update fails, isolate the affected hosts to analyze the failed deployment, or roll back the host configuration in the affected batch.
  • Send deployment results to relevant personnel.
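The first two recommendations could be sketched in a play like the one below. This is a minimal illustration under assumptions, not the course's reference solution: the host group webservers, the package and service httpd, and the health-check URL are hypothetical placeholders. The block updates the host and then tests it over HTTP from the control node; if any step fails, the rescue section reports the host and then fails it again with the fail module. Without that last task the host would count as "rescued" rather than failed, and Ansible's batch failure checks would not see it. Sending the results to people (for example by mail) is left out.

```yaml
---
# Minimal sketch. The group "webservers", the package/service "httpd"
# and the health-check URL are hypothetical placeholders.
- name: Rolling update of web servers
  hosts: webservers
  serial: 2                      # update two hosts per batch
  tasks:
    - name: Update and test one batch of web servers
      block:
        - name: Update the web server package
          yum:
            name: httpd
            state: latest

        - name: Restart the web service
          service:
            name: httpd
            state: restarted

        - name: Test from the control node that the host answers HTTP
          uri:
            url: "http://{{ inventory_hostname }}/"
            status_code: 200
          delegate_to: localhost
      rescue:
        - name: Report the failed host so it can be isolated or rolled back
          debug:
            msg: "Update failed on {{ inventory_hostname }}"

        # Re-fail the host so it still counts as failed for the batch checks
        - name: Keep the host marked as failed
          fail:
            msg: "Rolling update failed on {{ inventory_hostname }}"
```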

Control batch size

By default, Ansible pushes all hosts in a play through each task before starting the next one. If a task fails partway through, every host will have completed only part of the play, which means possibly none of them works correctly, and this can cause an outage. Ideally, one batch of hosts should make it through the whole play successfully before the next batch starts, and if too many hosts fail, the entire play should be aborted.

Set a fixed batch size

Use the serial keyword in a play to specify how many hosts should be in each batch.

Ansible runs each batch of hosts through the entire play before starting the next batch. If all hosts in the current batch fail, the entire play is aborted and Ansible does not start the next batch.

[student@workstation task-execution]$ cat serial.yaml
---
- name: 滚动更新
  hosts: all
  serial: 2
  tasks:
    - name: update web
      shell: sleep 2
[student@workstation task-execution]$ ansible-playbook  serial.yaml

PLAY [滚动更新] **************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************
ok: [servera]
ok: [serverb]

TASK [update web] ********************************************************************************************
changed: [servera]
changed: [serverb]

PLAY [滚动更新] **************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************
ok: [serverd]
ok: [serverc]

TASK [update web] ********************************************************************************************
changed: [serverc]
changed: [serverd]

PLAY [滚动更新] **************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************
ok: [servere]
ok: [serverf]

TASK [update web] ********************************************************************************************
changed: [servere]
changed: [serverf]

PLAY RECAP ***************************************************************************************************
servera                    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
serverb                    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
serverc                    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
serverd                    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
servere                    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
serverf                    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

In the example above (the play name 滚动更新 means "rolling update"), the serial keyword tells Ansible to process all hosts in the play (hosts: all) in batches of two. Each batch runs through the whole play; as long as not every host in the batch fails, the play is then repeated with the next batch.

If the total number of hosts in the play is not evenly divisible by the batch size, the last batch contains the remaining hosts, which are fewer than the serial value. In this example serial is an integer, and since six hosts divide evenly by two, every batch has exactly two hosts.
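Because each batch goes through the whole play before the next one starts, handlers notified during a batch run at the end of that batch rather than once at the very end of the run. This is what makes serial useful for rolling restarts. A minimal sketch, where the group webservers, the service httpd, and the template httpd.conf.j2 are hypothetical placeholders:

```yaml
---
# Hypothetical names: "webservers", "httpd" and the template httpd.conf.j2.
- name: Rolling configuration change, two hosts at a time
  hosts: webservers
  serial: 2
  tasks:
    - name: Deploy the web server configuration
      template:
        src: httpd.conf.j2
        dest: /etc/httpd/conf/httpd.conf
      notify: restart httpd

  handlers:
    # Runs at the end of each batch, so at most two hosts restart at once
    - name: restart httpd
      service:
        name: httpd
        state: restarted
```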

Set batch size as a percentage

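The serial keyword can also be set to a percentage of the total number of hosts in the play. Ansible converts the percentage to a host count by rounding down, using at least one host. With the six hosts here, 25% is 1.5 hosts, which becomes 1, so the play runs in six batches of one host each: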

[student@workstation task-execution]$ cat serial.yaml
---
- name: 滚动更新
  hosts: all
  serial: 25%
  tasks:
    - name: update web
      shell: sleep 2
[student@workstation task-execution]$ ansible-playbook  serial.yaml

PLAY [滚动更新] **************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************
ok: [servera]

TASK [update web] ********************************************************************************************
changed: [servera]

PLAY [滚动更新] **************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************
ok: [serverb]

TASK [update web] ********************************************************************************************
changed: [serverb]

PLAY [滚动更新] **************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************
ok: [serverc]

TASK [update web] ********************************************************************************************
changed: [serverc]

PLAY [滚动更新] **************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************
ok: [serverd]

TASK [update web] ********************************************************************************************
changed: [serverd]

PLAY [滚动更新] **************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************
ok: [servere]

TASK [update web] ********************************************************************************************
changed: [servere]

PLAY [滚动更新] **************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************
ok: [serverf]

TASK [update web] ********************************************************************************************
changed: [serverf]

PLAY RECAP ***************************************************************************************************
servera                    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
serverb                    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
serverc                    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
serverd                    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
servere                    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
serverf                    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

[student@workstation task-execution]$

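Set a different size for each batch

The serial keyword also accepts a list of batch sizes. Each item sets the size of one batch, and the last item is reused for any further batches until all hosts have been processed. With the six hosts below, 25% gives a first batch of one host, the second batch has three hosts, and 100% (six hosts) takes the remaining two: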

[student@workstation task-execution]$ cat serial.yaml
---
- name: 滚动更新
  hosts: all
  serial:
    - 25%
    - 3
    - 100%
  tasks:
    - name: update web
      shell: sleep 2
[student@workstation task-execution]$
[student@workstation task-execution]$ vim  serial.yaml
[student@workstation task-execution]$ ansible-playbook  serial.yaml

PLAY [滚动更新] **************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************
ok: [servera]

TASK [update web] ********************************************************************************************
changed: [servera]

PLAY [滚动更新] **************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************
ok: [serverc]
ok: [serverb]
ok: [serverd]

TASK [update web] ********************************************************************************************
changed: [serverb]
changed: [serverc]
changed: [serverd]

PLAY [滚动更新] **************************************************************************************************

TASK [Gathering Facts] ***************************************************************************************
ok: [servere]
ok: [serverf]

TASK [update web] ********************************************************************************************
changed: [servere]
changed: [serverf]

PLAY RECAP ***************************************************************************************************
servera                    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
serverb                    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
serverc                    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
serverd                    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
servere                    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
serverf                    : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Abort the play

By default, Ansible tries to get as many hosts as possible through the play. If a task fails on a host, that host is dropped from the play, but Ansible keeps running the remaining tasks on the other hosts. The play stops only when all hosts have failed.

However, if the hosts are split into batches with the serial keyword and all hosts in the current batch fail, Ansible stops the play for all remaining hosts, not only those left in the current batch. In other words, when a batch fails completely, the next batch is never started.
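To see this behavior, a test play can force every host in the first batch to fail. This is a sketch that assumes the same six-host inventory as the runs above, where the first batch of two is servera and serverb; the failure condition is purely for demonstration. Both hosts of the first batch fail, so Ansible should stop there and never start the batches containing serverc to serverf.

```yaml
---
- name: Stop when a whole batch fails
  hosts: all
  serial: 2
  tasks:
    # Demonstration only: force a failure on both hosts of the first batch
    - name: Fail on purpose on servera and serverb
      fail:
        msg: "Simulated update failure on {{ inventory_hostname }}"
      when: inventory_hostname in ['servera', 'serverb']

    - name: update web
      shell: sleep 2
```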

Ansible keeps the list of active hosts in the current batch in the ansible_play_batch variable. Any host that fails a task is removed from ansible_play_batch, and Ansible updates this list after every task.
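The variable can be printed with the debug module to watch failed hosts drop out of a batch. A minimal sketch, again assuming the six-host inventory above so that the first batch of three is servera, serverb and serverc; the forced failure on serverb is only for demonstration. In the first batch the printed list should contain only servera and serverc, while the second batch should list all three of its hosts.

```yaml
---
- name: Show the active hosts of each batch
  hosts: all
  serial: 3
  tasks:
    # Demonstration only: remove serverb from the first batch
    - name: Fail on purpose on serverb
      fail:
        msg: "Simulated failure on {{ inventory_hostname }}"
      when: inventory_hostname == 'serverb'

    # run_once prints the list a single time per batch
    - name: Print the hosts still active in this batch
      debug:
        var: ansible_play_batch
      run_once: true
```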

Specify fault tolerance with max_fail_percentage

You can also set a fault tolerance so that the play ends early. To change Ansible's failure behavior, add the max_fail_percentage keyword to the play:

- name: 滚动更新
  hosts: all
  max_fail_percentage: 30%
  serial:
    - 25%
    - 3
    - 100%
  tasks:
    - name: update web
      shell: sleep 2

With the configuration above, if more than 30% of the hosts in the current batch fail, Ansible aborts the play instead of continuing with the remaining hosts and batches.
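Note that the failure percentage must be exceeded, not just reached, and it is evaluated against the current batch. The Ansible documentation gives the case of serial: 4 where the play should stop as soon as two hosts in a batch fail: two of four is exactly 50%, so max_fail_percentage has to be set to 49 rather than 50. A sketch of that setting, reusing the demo task from above:

```yaml
---
- name: Stop once 2 of the 4 hosts in a batch fail
  hosts: all
  serial: 4
  # Failures must EXCEED this value: 2 of 4 hosts is 50%, which exceeds 49.
  # With 50 here, two failures in a batch of four would not stop the play.
  max_fail_percentage: 49
  tasks:
    - name: update web
      shell: sleep 2
```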

References

"Red Hat Ansible Engine 2.8 DO447"


Origin blog.csdn.net/sanhewuyang/article/details/130369690