Using Health Check in Rolling Update - 5 Minutes a Day with Docker Containers (146)

The previous section discussed the application of Health Check in Scale Up. Another important application scenario of Health Check is Rolling Update. Consider the following situation:

An existing multi-replica application is running normally. The application is then updated (for example, to a higher-version image), Kubernetes starts creating new replicas, and the following happens:

  1. Under normal circumstances, a new replica needs 10 seconds to finish its preparation work and cannot serve business requests before then.

  2. However, because of a human configuration error, the new replicas can never finish their preparation work (for example, they cannot connect to the backend database).

Before reading on, take a minute to think about this question: what happens if Health Check is not configured?

Because the new replicas themselves do not exit abnormally, the default Health Check mechanism considers the containers ready and gradually replaces the old replicas with new ones. The result: once all the old replicas have been replaced, the entire application can no longer process requests or provide service to the outside world. If this happened on a critical production system, the consequences would be dire.

If Health Check is configured correctly, a new replica is added to the Service only after it passes the readiness probe; if it fails the probe, the existing replicas are not all replaced and the business continues to run normally.

The following example demonstrates how Health Check works in a Rolling Update.

Use the configuration file app.v1.yml below to simulate a 10-replica application:
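A minimal sketch of what app.v1.yml could look like, assuming a busybox container that creates /tmp/healthy 10 seconds after it starts and an exec readiness probe that simply checks for that file (the image, labels, and probe timings are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 10
  selector:
    matchLabels:
      run: app
  template:
    metadata:
      labels:
        run: app
    spec:
      containers:
      - name: app
        image: busybox
        args:
        - /bin/sh
        - -c
        # create the readiness flag after 10 seconds, then keep the container running
        - sleep 10; touch /tmp/healthy; sleep 30000
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 10
          periodSeconds: 5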

Each replica can pass the readiness probe 10 seconds after it starts.

Next, perform a rolling update of the application using the configuration file app.v2.yml:
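A matching sketch of app.v2.yml under the same assumptions: the only meaningful change is the container command, which no longer creates /tmp/healthy, so the readiness probe can never succeed:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 10
  selector:
    matchLabels:
      run: app
  template:
    metadata:
      labels:
        run: app
    spec:
      containers:
      - name: app
        image: busybox
        args:
        - /bin/sh
        - -c
        # the faulty new version never creates /tmp/healthy,
        # so the readiness probe below keeps failing
        - sleep 30000
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 10
          periodSeconds: 5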

Obviously, because /tmp/healthy does not exist in the new replicas, they can never pass the readiness probe. Verify this as follows:
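The commands and output below are an illustrative reconstruction (Pod names, hashes, and ages will differ in your environment); the replica counts match the analysis that follows:

kubectl apply -f app.v2.yml --record

kubectl get pod
NAME                   READY   STATUS    RESTARTS   AGE
app-58d77c6d4-2w9bq    0/1     Running   0          40s     <- the 5 new replicas look like this
...
app-7b9b7f9c66-xk2pd   1/1     Running   0          12m     <- the 8 old replicas look like this
...

kubectl get deployment app
NAME      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
app       10        13        5            8           12m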

This output contains a lot of information and deserves a detailed analysis.

Focus on the  kubectl get pod output first:

  1. Judging from the AGE column, the last 5 Pods are new replicas and are currently NOT READY.

  2. The old replicas have been reduced from the original 10 to 8.

Next, look at the kubectl get deployment app output:

  1. DESIRED 10 means that the desired state is 10 READY replicas.

  2. CURRENT 13 represents the total number of current replicas: 8 old replicas + 5 new replicas.

  3. UP-TO-DATE 5 represents the number of replicas that have been updated so far: 5 new replicas.

  4. AVAILABLE 8 represents the number of replicas currently in READY state: 8 old replicas.

In our setup, the new replicas never pass the readiness probe, so this state persists indefinitely.

Above we simulated a failed rolling update. Fortunately, Health Check screened out the flawed replicas while keeping most of the old ones, so the business was not affected by the failed update.

Next we need to answer: why were 5 new replicas created, while only 2 old replicas were destroyed?

The reason: a rolling update controls how replicas are replaced through the parameters maxSurge and maxUnavailable.

maxSurge

This parameter controls how far the total number of replicas may exceed DESIRED during a rolling update. maxSurge can be a specific integer (such as 3) or a percentage, rounded up. Its default value is 25%.

In the above example, DESIRED is 10, so the maximum total number of replicas during the update is:
roundUp(10 + 10 * 25%) = 13

So we see  CURRENT 13.

maxUnavailable

This parameter controls the maximum number of replicas (as a proportion of DESIRED) that may be unavailable during a rolling update. maxUnavailable can be a specific integer (such as 3) or a percentage, rounded down. Its default value is 25%.

In the above example, DESIRED is 10, so the number of available replicas must be at least:
10 - roundDown(10 * 25%) = 8

So we see AVAILABLE 8.

The higher maxSurge is, the more new replicas are created initially; the higher maxUnavailable is, the more old replicas are destroyed initially.

Ideally, the rolling update process for our case should look like this:

  1. First create 3 new replicas to bring the total number of replicas to 13.

  2. Then destroy 2 old replicas, bringing the number of available replicas down to 8.

  3. When those 2 old replicas have been destroyed, 2 more new replicas can be created, keeping the total number of replicas at 13.

  4. When new replicas pass the readiness probe, the number of available replicas rises above 8.

  5. More old replicas can then be destroyed, bringing the number of available replicas back down to 8.

  6. Destroying old replicas brings the total number of replicas below 13, which allows more new replicas to be created.

  7. This process continues until all old replicas have been replaced by new ones and the rolling update is complete.

Our actual situation is stuck at step 4: the new replicas never pass the readiness probe. This process can be observed in the events section of kubectl describe deployment app.

If the rolling update fails, you can use kubectl rollout undo to roll back to the previous version.
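For example, assuming the deployments were applied with --record so that revisions are tracked (revision numbers will vary):

kubectl rollout history deployment app
kubectl rollout undo deployment app --to-revision=1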

If you want to customize maxSurge and maxUnavailable, you can configure them as follows:
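A sketch of the relevant part of the Deployment spec; the 35% values are only an example:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 35%
      maxUnavailable: 35%
  # replicas, selector, and template are unchanged from app.v2.yml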

Summary

In this chapter we discussed the two Kubernetes health check mechanisms, the Liveness probe and the Readiness probe, and practiced applying Health Check in the Scale Up and Rolling Update scenarios.

In the next section we start discussing how Kubernetes manages data. 

