Kubernetes (k8s) Health Check

1. The default k8s health check

By default, k8s only watches the container's main process, i.e. the one started by CMD or ENTRYPOINT in the Dockerfile. If that process exits with a non-zero return code, the container is considered to have failed, and k8s restarts it according to the Pod's restartPolicy.

1) Docker has 4 restart policies:

always: The Docker daemon restarts the container whatever its exit code. After a manual stop, the policy no longer applies until the daemon is restarted or the container is started by hand again.
on-failure: Restarts the container only if it stopped due to an error (non-zero exit code). An optional max-retries value, written on-failure:max-retries, caps the number of restart attempts.
unless-stopped: Similar to always. The difference is that after the container has been stopped manually, it is not restarted even when the Docker daemon restarts.
no: Do not restart automatically (the default).

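2) Format in k8s

k8s has its own restartPolicy field in the Pod spec. Its values are Always (the default), OnFailure, and Never.

...
spec:
  restartPolicy: OnFailure   # restart policy
  containers:
...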
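
As a rough illustration of this default mechanism, the minimal Pod below runs a process that exits with a non-zero code, so k8s should restart it under OnFailure. The Pod name, the container name, and the busybox command are assumptions made for this sketch and do not come from the original setup.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: restart-demo            # hypothetical name
spec:
  restartPolicy: OnFailure      # restart only after a non-zero exit code
  containers:
  - name: worker                # hypothetical name
    image: busybox
    # Simulates a failing process: sleep 10 seconds, then exit with code 1
    command: ["sh", "-c", "sleep 10; exit 1"]
```

After applying the manifest with kubectl apply -f, the RESTARTS column of kubectl get pod restart-demo should grow over time, with k8s waiting longer between successive restarts.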

3) Drawbacks

The drawback of this mechanism is that a container can fail without its process exiting. For example, the web service inside the container may be returning 500 errors, or the system may be overloaded, yet the httpd process has not exited abnormally, so k8s still treats the container as running normally. In short, a live process does not prove that the service in the container is working.

2. Health Check -> liveness detection

1) The purpose of liveness

Users can define their own condition for judging whether a container is healthy. If the liveness check fails, k8s restarts the container. In other words, liveness tells k8s when to restart a container in order to achieve self-healing.

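2) The fields of livenessProbe

...
spec:
  containers:
  ...
    livenessProbe:                  # the health check mechanism
      httpGet:                      # probe method: HTTP GET
        path: /example/index.html   # URL path to request
        port: 8080                  # port the service listens on
        scheme: HTTP                # protocol to use (HTTP or HTTPS)
      initialDelaySeconds: 5        # start liveness probing 5 seconds after the container starts; if a container needs 30 seconds to start, set this above 30
      periodSeconds: 10             # interval between liveness probes
      failureThreshold: 3           # number of consecutive failures; after 3 in a row, the container is killed and restarted
      successThreshold: 1           # number of consecutive successes needed to count as healthy; must be 1 for liveness
      timeoutSeconds: 5             # probe timeout; no result within 5 seconds counts as a failed probe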
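
To show where these fields sit in a complete manifest, here is a minimal sketch of a Pod with an HTTP liveness probe. The Pod name, container name, and the image example/web:1.0 are hypothetical; the sketch assumes the image serves /example/index.html on port 8080.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: liveness-demo             # hypothetical name
spec:
  containers:
  - name: web                     # hypothetical name
    image: example/web:1.0        # hypothetical image serving HTTP on 8080
    ports:
    - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /example/index.html
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 5      # wait 5 s after container start
      periodSeconds: 10           # probe every 10 s
      failureThreshold: 3         # restart after 3 consecutive failures
      successThreshold: 1
      timeoutSeconds: 5           # each probe may take at most 5 s
```

With these values, a service that stops responding is restarted after three consecutive failed probes spaced 10 seconds apart, so roughly 30 seconds after it goes bad in the worst case.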

3. Health Check -> readiness detection

1) The purpose of readiness

Readiness detection tells k8s when a container can be added to the Service (svc) load-balancing pool. It is configured under the readinessProbe field, and its syntax is exactly the same as that of liveness.
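
A minimal sketch of how this looks in practice: a Pod with a readinessProbe, plus a Service that selects it. All names, the label, and the image example/web:1.0 are hypothetical and chosen only for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-demo                  # hypothetical name
  labels:
    app: web-demo                 # label the Service selects on
spec:
  containers:
  - name: web
    image: example/web:1.0        # hypothetical image serving HTTP on 8080
    ports:
    - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /example/index.html
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: web-demo-svc              # hypothetical name
spec:
  selector:
    app: web-demo                 # matches the Pod label above
  ports:
  - port: 80                      # port exposed by the Service
    targetPort: 8080              # container port that receives traffic
```

While the probe is failing, kubectl get pod should show the Pod as READY 0/1 and the Service should not route traffic to it; the container itself is left running.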

4. Summary

If liveness and readiness are not explicitly configured, k8s falls back to the default method: it judges health only by whether the container's startup process exits with a return value of zero.
Liveness and readiness are configured in the same way, with the same syntax and parameters. The difference is what happens after a check fails: liveness restarts the container, while readiness marks the container as unavailable so that it no longer receives requests forwarded by the Service.
Liveness and readiness are executed independently, with no dependency between the two. They can be used alone or together:
Liveness determines whether the container needs to be restarted to achieve self-healing;
Readiness determines whether the container is ready to serve external requests.

5. Liveness and readiness detection methods

HTTP GET: a response status code of at least 200 and below 400 counts as success; anything else counts as failure;
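TCP socket: succeeds if the TCP port you specify is open, for example if you can telnet to it;
Exec command: runs a command inside the container; an exit code of 0 counts as success.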
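
The TCP socket and exec methods can be sketched as the following containers fragment, which belongs under a Pod's spec. The container names, the image example/web:1.0, and the /tmp/healthy marker file are assumptions made for this example.

```yaml
containers:
- name: tcp-checked               # hypothetical name
  image: example/web:1.0          # hypothetical image listening on 8080
  livenessProbe:
    tcpSocket:
      port: 8080                  # succeeds if a TCP connection can be opened
    initialDelaySeconds: 5
    periodSeconds: 10
- name: exec-checked              # hypothetical name
  image: busybox
  # Creates the marker file, then keeps the container running
  command: ["sh", "-c", "touch /tmp/healthy; sleep 3600"]
  livenessProbe:
    exec:
      # Exit code 0 while the file exists; non-zero once it is removed
      command: ["cat", "/tmp/healthy"]
    initialDelaySeconds: 5
    periodSeconds: 10
```

Deleting /tmp/healthy inside the second container makes cat exit with a non-zero code, which is one way to watch a failing exec probe trigger a restart.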