Liveness Probe

Kubernetes provides self-healing capabilities: it can detect that a container has crashed and restart it. Sometimes, however, an application stops working while its process keeps running. For example, a Java program leaks memory and can no longer work properly, but the JVM process is still alive. For such application-level problems, Kubernetes provides the Liveness Probe mechanism. It checks whether the container responds normally and decides whether to restart it, which makes it a good health-check mechanism.

It is best to define a Liveness Probe for every Pod. Otherwise Kubernetes cannot tell whether the Pod is actually working properly.

Kubernetes supports the following three probe mechanisms:

  • HTTP GET: send an HTTP GET request to the container. If the probe receives a 2xx or 3xx status code, the container is healthy.
  • TCP Socket: try to open a TCP connection to the specified port of the container. If the connection is established, the container is healthy.
  • Exec: run a command inside the container and check its exit status code. If the exit status code is 0, the container is healthy.

Besides the Liveness Probe, there is also a Readiness Probe, which is described in detail in the Readiness Probe section.

HTTP GET

HTTP GET is the most common probe method. The probe sends an HTTP GET request to the container, and if it receives a 2xx or 3xx status code, the container is healthy. It is defined as follows.

apiVersion: v1
kind: Pod
metadata:
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: nginx:alpine
    livenessProbe:           # liveness probe
      httpGet:               # HTTP GET definition
        path: /
        port: 80
  imagePullSecrets: 
  - name: default-secret

Create this Pod.

$ kubectl create -f liveness-http.yaml
pod/liveness-http created

This probe sends an HTTP GET request to port 80 of the container. If the request fails (by default, 3 times in a row), Kubernetes restarts the container.
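The following minimal Python sketch makes the success rule concrete. It sends one GET request with a timeout. Any status from 200 to 399 counts as healthy, and any error (connection refused, timeout) counts as a failure. The helper `http_get_probe` and the local demo server are hypothetical illustrations, not the kubelet's actual implementation.

```python
import http.client
import http.server
import threading


def http_get_probe(host: str, port: int, path: str = "/", timeout: float = 1.0) -> bool:
    """Hypothetical helper: True if GET http://host:port/path returns 2xx or 3xx."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
        return 200 <= status < 400  # 2xx and 3xx count as healthy
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout or a malformed response: probe failure.
        return False
    finally:
        conn.close()


class DemoHandler(http.server.BaseHTTPRequestHandler):
    """Demo server: "/" answers 200, every other path answers 500."""

    def do_GET(self):
        self.send_response(200 if self.path == "/" else 500)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging


server = http.server.HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

healthy = http_get_probe("127.0.0.1", port, "/")
unhealthy = http_get_probe("127.0.0.1", port, "/broken")

server.shutdown()
server.server_close()
refused = http_get_probe("127.0.0.1", port, "/")  # nothing listens any more

print(healthy, unhealthy, refused)  # True False False
```

`http.client` does not follow redirects, so in this sketch a 3xx response is itself treated as success.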

View Pod details.

$ kubectl describe po liveness-http
Name:               liveness-http
......
Containers:
  liveness:
    ......
    State:          Running
      Started:      Mon, 03 Aug 2020 03:08:55 +0000
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:80/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-vssmw (ro)
......

The container's State is Running and its Restart Count is 0, meaning it has not been restarted. A non-zero Restart Count means the container has been restarted.

TCP Socket

The TCP Socket probe tries to open a TCP connection to the specified port of the container. If the connection is established, the container is healthy. It is defined as follows.

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-tcp
spec:
  containers:
  - name: liveness
    image: nginx:alpine
    livenessProbe:           # liveness probe
      tcpSocket:
        port: 80
  imagePullSecrets: 
  - name: default-secret
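The TCP check only asks whether something accepts connections on the port. It does not check whether the application answers correctly. Below is a minimal Python sketch in which `tcp_socket_probe` is a hypothetical name and a local listening socket stands in for the container's port 80.

```python
import socket


def tcp_socket_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Hypothetical helper: True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True  # connection established; closed again on leaving the block
    except OSError:
        return False


# A listening socket plays the role of the container's port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()
port = listener.getsockname()[1]

while_listening = tcp_socket_probe("127.0.0.1", port)
listener.close()
after_close = tcp_socket_probe("127.0.0.1", port)

print(while_listening, after_close)  # True False
```

The connection is accepted even though the demo never calls `accept()`. This shows that a TCP check can pass while the application behind a still-listening port is hung.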

Exec

The Exec probe runs a specified command inside the container and checks the command's exit status code. An exit status code of 0 means the container is healthy. It is defined as follows.

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: nginx:alpine
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:           # liveness probe
      exec:                  # Exec definition
        command:
        - cat
        - /tmp/healthy
  imagePullSecrets: 
  - name: default-secret

This definition runs cat /tmp/healthy in the container. If the command exits with status 0, the container is healthy. The container's startup command deletes /tmp/healthy after 30 seconds, so from then on the probe fails. The Liveness Probe then judges the container unhealthy and restarts it.
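The following minimal Python sketch shows the same exit-code rule, assuming a POSIX system with `cat` on the PATH. The helper `exec_probe` is hypothetical, and a temporary file stands in for /tmp/healthy.

```python
import os
import subprocess
import tempfile
from typing import Sequence


def exec_probe(command: Sequence[str], timeout: float = 1.0) -> bool:
    """Hypothetical helper: run command, healthy only if it exits with status 0."""
    try:
        result = subprocess.run(
            list(command),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Command too slow or not found: count as a failure.
        return False
    return result.returncode == 0


# A temporary file stands in for /tmp/healthy in the Pod above.
fd, healthy_file = tempfile.mkstemp()
os.close(fd)

before_delete = exec_probe(["cat", healthy_file])  # file exists: cat exits 0
os.remove(healthy_file)                           # like "rm -rf /tmp/healthy"
after_delete = exec_probe(["cat", healthy_file])   # file gone: cat exits non-zero

print(before_delete, after_delete)  # True False
```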

Liveness Probe advanced configuration

The output of the describe command for liveness-http above contains the following line.

Liveness: http-get http://:80/ delay=0s timeout=1s period=10s #success=1 #failure=3

This line shows the parameter configuration of the Liveness Probe. The parameters mean the following:

  • delay: delay=0s means probing starts immediately after the container starts, with no delay.
  • timeout: timeout=1s means the container must respond within 1s; otherwise the probe is recorded as a failure.
  • period: period=10s means the container is probed every 10s.
  • success: #success=1 means a single success counts as success.
  • failure: #failure=3 means the container is restarted after 3 consecutive failures.

Taken together, this Liveness Probe starts probing as soon as the container starts, records a failure if the container does not respond within 1s, probes every 10s, and restarts the container after 3 consecutive failures.
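The interplay of delay, period and failure threshold can be modeled in a few lines. The sketch below is a simplified model, not kubelet code. It assumes probes run exactly every periodSeconds starting at initialDelaySeconds, ignores timing jitter, and takes a list of probe outcomes (True for success). successThreshold must be 1 for a Liveness Probe, so a single success resets the failure count. `ProbeConfig` and `restart_time` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ProbeConfig:
    initial_delay_seconds: int = 0   # "delay" in kubectl describe
    period_seconds: int = 10         # "period"
    failure_threshold: int = 3       # "#failure"
    # successThreshold ("#success") must be 1 for liveness probes, so one
    # success is enough to reset the consecutive-failure counter.


def restart_time(results: Sequence[bool], cfg: ProbeConfig) -> Optional[int]:
    """Seconds after container start at which the container is restarted, or None."""
    consecutive_failures = 0
    for i, ok in enumerate(results):
        t = cfg.initial_delay_seconds + i * cfg.period_seconds  # time of probe i
        if ok:
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= cfg.failure_threshold:
                return t
    return None


defaults = ProbeConfig()
# Probes at 0, 10, 20 pass; 30, 40, 50 fail, so the third failure triggers a restart.
print(restart_time([True, True, True, False, False, False], defaults))  # 50
# Two failures followed by a success never reach the threshold.
print(restart_time([False, False, True, False, False], defaults))       # None
```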

These parameters get default values when the probe is created. You can also configure them manually, as shown below.

apiVersion: v1
kind: Pod
metadata:
  name: liveness-http
spec:
  containers:
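  - name: liveness
    image: k8s.gcr.io/liveness
    livenessProbe:
      httpGet:
        path: /
        port: 8080
      initialDelaySeconds: 10    # how long after container start probing begins
      timeoutSeconds: 2          # the container must respond to the probe within 2s, otherwise the probe fails
      periodSeconds: 30          # probe period: probe every 30s
      successThreshold: 1        # 1 consecutive success counts as success
      failureThreshold: 3        # 3 consecutive failures count as failure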
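Here is a rough worked estimate for the configuration above. It uses a simplified model in which probes run exactly every periodSeconds, and it ignores scheduling jitter and the time the restart itself takes.

Let $p$ = periodSeconds = 30 s, $f$ = failureThreshold = 3 and $t_o$ = timeoutSeconds = 2 s. Once the application stops responding, $f$ consecutive failed probes are needed to trigger a restart. If the failure happens just before a probe, the last of those probes runs about $(f-1)p$ later. If it happens just after a successful probe, the last probe runs about $fp$ later, and a hung application makes that probe wait up to $t_o$ before it counts as failed:

```latex
(f-1)\,p \;\lesssim\; T_{\text{restart}} \;\lesssim\; f\,p + t_o
\qquad\Longrightarrow\qquad
60\,\text{s} \;\lesssim\; T_{\text{restart}} \;\lesssim\; 92\,\text{s}
```

Lowering periodSeconds or failureThreshold shortens this window. The cost is more frequent probing or less tolerance for transient failures.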

initialDelaySeconds is usually set to a value greater than 0. In many cases the application needs some time to become ready even after the container has started successfully. Probes sent before it is ready fail, so without a delay the probe will often fail during startup.
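One way to pick initialDelaySeconds uses the same simplified model of probes spaced exactly periodSeconds apart. Suppose the application needs S seconds after container start before it answers probes. Then the first failureThreshold probes must not all fall before S, that is, initialDelaySeconds + (failureThreshold - 1) × periodSeconds ≥ S. The helper name `min_initial_delay` is hypothetical, and S has to be measured for your own application.

```python
import math


def min_initial_delay(startup_seconds: float, period_seconds: int, failure_threshold: int) -> int:
    """Smallest whole-second initialDelaySeconds that avoids a restart during startup.

    Simplified model: probe k runs at delay + k * period and fails while the
    application is still starting (time < startup_seconds). A restart happens only
    if probes 0 .. failure_threshold - 1 all fail, i.e. if
    delay + (failure_threshold - 1) * period < startup_seconds.
    """
    needed = startup_seconds - (failure_threshold - 1) * period_seconds
    return max(0, math.ceil(needed))


# An application that needs 60 s to start, probed with the default period/threshold.
print(min_initial_delay(60, period_seconds=10, failure_threshold=3))  # 40
# With the manual configuration above (period 30 s, threshold 3), 60 s needs no delay.
print(min_initial_delay(60, period_seconds=30, failure_threshold=3))  # 0
```

In practice, leave some margin on top of this value, because startup time varies from run to run.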

In addition, failureThreshold already makes Kubernetes probe several times before declaring a failure. The health-check code in the application therefore does not need its own retry loop. Keep this in mind when developing applications.

Configure an effective Liveness Probe

  • What should a Liveness Probe check?
    A good Liveness Probe checks whether all key parts of the application are healthy through a dedicated URL such as /health. When /health is accessed, the application runs these checks and returns the result. This URL must not require authentication; otherwise the probe keeps failing and the container falls into an endless restart loop.

    In addition, the check should be limited to the application itself and must not cover external dependencies. For example, if the web server cannot connect to the database, the web server should not be reported as unhealthy, because restarting it would not fix the database.

  • A Liveness Probe must be lightweight.
    A Liveness Probe must not consume too many resources or take too long. Otherwise resources are spent on health checks instead of real work, which defeats the purpose. For Java applications, for example, HTTP GET is the best choice. If an Exec probe runs a Java command, every probe starts a new JVM, which consumes a lot of resources.
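The following minimal sketch combines these guidelines into a /health endpoint. It uses Python's standard http.server for illustration, though the same idea applies to a Java service. The handler requires no authentication, checks only in-process state and not the database, and does cheap in-memory work. `app_state` and `internal_checks` are hypothetical placeholders for whatever internal state your application tracks.

```python
import http.client
import http.server
import threading
from typing import Callable, Dict

# Hypothetical in-process checks: each must be cheap and must not call
# external dependencies such as the database.
app_state = {"worker_alive": True, "queue_size": 3}

internal_checks: Dict[str, Callable[[], bool]] = {
    "worker": lambda: app_state["worker_alive"],
    "queue": lambda: app_state["queue_size"] < 1000,
}


class HealthHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_response(404)
            self.end_headers()
            return
        # No authentication: a probe answered with 401/403 would fail forever.
        failed = [name for name, check in internal_checks.items() if not check()]
        body = ("FAIL " + ",".join(failed) if failed else "OK").encode()
        self.send_response(500 if failed else 200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def get_status(port: int, path: str = "/health") -> int:
    """Fetch path from the local demo server and return the HTTP status code."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=1)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        return resp.status
    finally:
        conn.close()


server = http.server.HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

status_ok = get_status(port)               # all internal checks pass
app_state["worker_alive"] = False          # simulate an internal fault
status_broken = get_status(port)           # the worker check now fails

server.shutdown()
server.server_close()
print(status_ok, status_broken)  # 200 500
```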

Origin blog.51cto.com/14051317/2553687