Liveness Probes, Readiness Probes, and Startup Probes

The biggest difference between the readiness probe and the liveness probe

When a liveness probe fails, the kubelet kills the failing container and starts a new one so that the pod keeps working normally. When a readiness probe fails, the pod is removed from the service's endpoints instead of the container being restarted. This ensures that only ready pods in a service receive traffic, so clients interact only with healthy pods and never notice that part of the system has a problem.
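To make the contrast concrete, a container can define both probes side by side. The sketch below is not from the original text; it assumes a plain nginx container and illustrative timing values:

```yaml
containers:
- image: nginx:1.7.9
  name: nginx-container
  livenessProbe:               # failure => the kubelet restarts the container
    httpGet: {path: /, port: 80}
    periodSeconds: 10
    failureThreshold: 3
  readinessProbe:              # failure => the pod is removed from the service's
    httpGet: {path: /, port: 80}   # endpoints; the container is NOT restarted
    periodSeconds: 10
    failureThreshold: 3
```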

1. Liveness Probe

survival probe

A liveness probe judges whether the container is alive (in the Running state). If the probe detects that the container is unhealthy, the kubelet kills the container and handles it according to the container's restart policy. If a container does not define a liveness probe, the kubelet treats the probe result as always "Success".

Detection mechanisms

Kubernetes supports the following three detection mechanisms.

  • HTTP GET: Send an HTTP GET request to the container. If the probe receives a 2xx or 3xx response, the container is healthy.

  • TCP Socket: Attempt to establish a TCP connection to the specified port of the container. If the connection is successfully established, the container is healthy.

  • Exec: Execute a command inside the container and check the command's exit status code. If the status code is 0, the container is healthy.

HTTP GET (the httpGet probe is recommended for production environments)

HTTP GET is the most common detection method: the kubelet sends an HTTP GET request to the container, and if it receives a 2xx or 3xx response, the container is considered healthy. It is defined as follows.

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: rs-nginx-http
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-rs
    matchExpressions:
      - {key: env, operator: In, values: [dev]}
  template:
    metadata:
      labels:
        app: nginx-rs
        env: dev
    spec:
      containers:
      - image: nginx:1.7.9
        name: nginx-container
        ports:
        - containerPort: 80
        livenessProbe:              # define a liveness probe for the container
          httpGet:                  # probe type: httpGet
    #       host: 10.244.0.49       # target host; defaults to the pod IP, usually omitted
            path: /                 # request path
            port: 80                # request port
          initialDelaySeconds: 15   # wait 15 s after container start before probing
          periodSeconds: 5          # probe every 5 s
          timeoutSeconds: 2         # each probe must get a response within 2 s
          failureThreshold: 9       # consecutive failures before the container is restarted

Liveness probe properties

initialDelaySeconds: how many seconds to wait after the container starts before the first probe;
periodSeconds: how often to probe, in seconds; the default interval is 10 seconds, the minimum is 1 second;
timeoutSeconds: probe timeout, default 1 second, minimum 1 second; the container must respond within this time, otherwise the probe counts as failed;
successThreshold: minimum number of consecutive successes for the probe to be considered successful; default 1, and for liveness it must be 1, minimum 1;
failureThreshold: number of consecutive failures before the probe is considered failed; default 3, and after 3 consecutive failures k8s decides what to do with the container according to the pod's restart policy;
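As a rough worked example (not from the original text), these settings bound how long a dead container can linger before the kubelet restarts it. For the httpGet manifest above:

```yaml
livenessProbe:
  httpGet: {path: /, port: 80}
  initialDelaySeconds: 15   # first probe at t = 15 s
  periodSeconds: 5          # probes at t = 15, 20, 25, ...
  failureThreshold: 9       # restart after 9 consecutive failures
# worst case: the container is restarted about
#   initialDelaySeconds + failureThreshold * periodSeconds = 15 + 9 * 5 = 60 s
# after it starts (each failing probe may also take up to timeoutSeconds to time out,
# so the true upper bound is slightly higher)
```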

Note: when defining a liveness probe, be sure to set the initialDelaySeconds property, the initial delay. If it is not set, probing starts as soon as the container starts, which may be before the application is ready; the probe then fails, and k8s kills and recreates the container according to the pod's restart policy, producing hard-to-diagnose restart loops.
In a production environment, always define a liveness probe.

exec probe

The exec probe executes a shell command inside the container and judges success or failure by whether the command's exit status code is 0.

[root@master ~]# cat rs_nginx.yaml 
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: rs-nginx-http
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-rs
    matchExpressions:
      - {key: env, operator: In, values: [dev]}
  template:
    metadata:
      labels:
        app: nginx-rs
        env: dev
    spec:
      containers:
      - image: nginx:1.7.9
        name: nginx-container
        ports:
        - containerPort: 80
        livenessProbe:              # define a liveness probe
          exec:                     # probe type: exec
            command:                # run "cat /usr/share/nginx/html/index.html";
            - cat                   # a non-zero exit status marks the probe as failed
            - /usr/share/nginx/html/index.html
          initialDelaySeconds: 8    # wait 8 s after container start before probing
          periodSeconds: 3          # probe every 3 s
          timeoutSeconds: 2         # no response within 2 s counts as a failure
          failureThreshold: 1       # a single failure is enough; k8s then decides whether
                                    # to restart the container per the pod's restart policy

# Now delete /usr/share/nginx/html/index.html inside one container to see the probe take effect
[root@master ~]# kubectl exec rs-nginx-http-2rng2 -it -- bash              # enter the container
root@rs-nginx-http-2rng2:/# rm -rf /usr/share/nginx/html/index.html       # delete the file
root@rs-nginx-http-2rng2:/# exit                                          # leave the container
[root@master ~]# kubectl get pods                                         # check the pods
NAME                      READY   STATUS    RESTARTS      AGE
pod/rs-nginx-http-2rng2   1/1     Running   1 (11s ago)   74s             # this pod's container has restarted once
pod/rs-nginx-http-bgr4m   1/1     Running   0             74s
pod/rs-nginx-http-cv9bv   1/1     Running   0             74s
[root@master ~]# kubectl describe pod/rs-nginx-http-2rng2                 # the events show the probe failed and the container was restarted
................
    Restart Count:  1
    Liveness:       exec [cat /usr/share/nginx/html/index.html] delay=8s timeout=2s period=3s #success=1 #failure=1
................................
  Normal   Pulled     9m32s (x2 over 10m)  kubelet            Container image "nginx:1.7.9" already present on machine
  Normal   Created    9m32s (x2 over 10m)  kubelet            Created container nginx-container
  Warning  Unhealthy  9m32s                kubelet            Liveness probe failed: cat: /usr/share/nginx/html/index.html: No such file or directory
  Normal   Killing    9m32s                kubelet            Container nginx-container failed liveness probe, will be restarted
  Normal   Started    9m31s (x2 over 10m)  kubelet            Started container nginx-container
[root@master ~]# 

tcpSocket probe

The tcpSocket probe attempts to open a TCP connection to the container's IP address and port; the probe succeeds if the connection can be established.

[root@master ~]# cat rs_nginx.yaml 
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: rs-nginx-http
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-rs
    matchExpressions:
      - {key: env, operator: In, values: [dev]}
  template:
    metadata:
      labels:
        app: nginx-rs
        env: dev
    spec:
      containers:
      - image: nginx:1.7.9
        name: nginx-container
        ports:
        - containerPort: 80
        livenessProbe:              # define a liveness probe
          tcpSocket:                # probe type: tcpSocket
    #       host: 10.244.0.49       # target host; defaults to the pod IP, usually omitted
            port: 80                # check that a TCP connection to port 80 can be established
          initialDelaySeconds: 8
          periodSeconds: 3
          timeoutSeconds: 2
          failureThreshold: 1

2. Readiness Probe

readiness probe

A readiness probe judges whether the container has started (is in the Ready state) and can receive requests. If the probe fails, the pod's state is updated and the endpoint controller removes the pod's address from the service's endpoints.

When a pod starts, it is immediately added to the service's endpoint IP list and begins receiving client connections. If the business process inside the container has not finished initializing at that point, those client connections will fail. Kubernetes provides readiness probes to solve this problem.

A readiness probe defined on a container is checked periodically. If the check fails, the pod is considered not ready and cannot accept client connections, so it is removed from the service's endpoint list and receives no traffic. The probe keeps checking, and once the container is ready the pod is added back to the endpoint list.

Be sure to define a readiness probe

Without a readiness probe, a newly created pod is added to the service's endpoint list immediately. If the business program in the container takes a long time before it can handle incoming connections, any client connection the service forwards to the pod in the meantime fails with "connection refused" or similar errors. It is therefore necessary to define a readiness probe for every container, even a simple HTTP GET probe.
In production, developers are usually asked to provide a health-check interface for the container (Spring Boot, for example, ships with a native health-check endpoint). Once that interface exists, we define our probe against it using the httpGet method.
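A minimal sketch of such a probe, assuming a hypothetical Spring Boot application that exposes Actuator health endpoints on port 8080 (the path and port are illustrative, not from the original text):

```yaml
readinessProbe:
  httpGet:
    path: /actuator/health/readiness   # Spring Boot Actuator readiness endpoint (assumed)
    port: 8080                         # assumed application port
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3
```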

Three types of readiness probes

There are also three types of readiness probes: exec, httpGet and tcpSocket.

exec: executes a command inside the container and checks its exit status code; status code 0 means the container is ready;
httpGet: sends an HTTP GET request to the container's IP, port and path; an HTTP status code in the 200-399 range means the container is ready;
tcpSocket: opens a TCP socket connection to the container's IP address and port; if the connection can be established, the container is considered ready.

Properties of the Readiness Probe

The readiness probe's properties are as follows:

initialDelaySeconds: delay in seconds after the container starts before probing begins; if omitted, probing starts as soon as the container starts;
periodSeconds: how often to probe, in seconds; default 10, minimum 1;
timeoutSeconds: timeout within which each probe must get a response, otherwise it counts as failed; default 1 second, minimum 1;
failureThreshold: number of consecutive failures before the probe is considered failed; default 3, minimum 1;
successThreshold: number of consecutive successes before the probe is considered successful; default 1, minimum 1;

Create an exec readiness probe

The exec readiness probe executes a command inside the container and checks the command's exit status code; status code 0 means the container is ready. Next, use a deployment to deploy
3 pods with exec readiness probes (the service already exists), as follows:

[root@master ~]# cat deplyment_nginx.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-nginx
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:1.7.9
        name: nginx-container
        readinessProbe:             # define an exec readiness probe
          initialDelaySeconds: 5    # wait 5 s after container start before probing
          periodSeconds: 10         # probe every 10 s
          timeoutSeconds: 2         # no response within 2 s counts as a failure
          failureThreshold: 3       # 3 consecutive failures mark the probe as failed
          successThreshold: 1       # 1 success marks the probe as successful
          exec:                     # probe type: exec
            command:                # the command executed is "ls /var/ready"
            - ls
            - /var/ready
        ports:
        - name: http
          containerPort: 80
[root@master ~]# kubectl get pods                        # check the pods: none of them is ready
NAME                                READY   STATUS    RESTARTS   AGE
deployment-nginx-68bb45dd46-5rj7c   0/1     Running   0          3m1s
deployment-nginx-68bb45dd46-78ld2   0/1     Running   0          3m1s
deployment-nginx-68bb45dd46-gnnhn   0/1     Running   0          3m1s
[root@master ~]# kubectl describe pods deployment-nginx-68bb45dd46-5rj7c |tail -n 3   # the pod details show the readiness probe failed
  Normal   Created    5m7s                   kubelet            Created container nginx-container
  Normal   Started    5m7s                   kubelet            Started container nginx-container
  Warning  Unhealthy  2m18s (x22 over 5m6s)  kubelet            Readiness probe failed: ls: cannot access /var/ready: No such file or directory
[root@master ~]# 
[root@master ~]# kubectl get ep svc-rc-nginx-nodeport    # the service's endpoint list contains no pod IPs
NAME                    ENDPOINTS   AGE
svc-rc-nginx-nodeport               6h32m

# Now manually create /var/ready in one of the pods; that pod becomes ready and the service adds it
[root@master ~]# kubectl exec deployment-nginx-68bb45dd46-5rj7c -- touch /var/ready   # create /var/ready in the pod
[root@master ~]# kubectl get pods,ep -o wide             # the pod is now ready
NAME                                    READY   STATUS    RESTARTS   AGE     IP            NODE    NOMINATED NODE   READINESS GATES
pod/deployment-nginx-68bb45dd46-5rj7c   1/1     Running   0          8m52s   10.244.1.77   node1   <none>           
pod/deployment-nginx-68bb45dd46-78ld2   0/1     Running   0          8m52s   10.244.1.76   node1   <none>           
pod/deployment-nginx-68bb45dd46-gnnhn   0/1     Running   0          8m52s   10.244.2.38   node2   <none>           

NAME                              ENDPOINTS              AGE     # the endpoints now list this pod's IP and port
endpoints/svc-rc-nginx-nodeport   10.244.1.77:80         6h36m
[root@master ~]#

Create an httpGet readiness probe

The httpGet readiness probe sends an HTTP GET request to the container and judges whether the container is ready by the HTTP status code of the response.
The following uses a deployment to deploy three pods with httpGet readiness probes, as shown below:

..........
containers:
         - image: nginx:1.7.9
           name: nginx-container
           readinessProbe:          # define the probe
              httpGet:              # probe type: httpGet
       #       host: 10.244.0.49    # target host; defaults to the pod IP, usually omitted
               path: /read
               port: 80
...........

Create a tcpSocket readiness probe

The tcpSocket readiness probe opens a TCP connection to the specified port of the container. If the connection is established, the container is considered ready.
Next, use a deployment to deploy pods with tcpSocket readiness probes, as follows:

..........
containers:
         - image: nginx:1.7.9
           name: nginx-container
           readinessProbe:          # define the probe
              tcpSocket:            # probe type: tcpSocket
        #       host: 10.244.0.49   # target host; defaults to the pod IP, usually omitted
                port: 80
...........

3. Startup Probe

startup probe

A startup probe indicates whether the application in the container has started. If a startup probe is provided, all other probes are disabled until it succeeds. If the startup probe fails, the kubelet kills the container, and the container is restarted according to its restart policy. If the container does not define a startup probe, the default status is Success.

startupProbe was added in Kubernetes 1.16. It determines whether the container's process has started, and exists to handle applications that start slowly or take a long time to initialize. When a startupProbe is configured, the other probes are disabled until it succeeds; once it succeeds, it stops and never probes again. If the startup probe keeps failing, the pod's container is restarted.
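The original text gives no manifest for the startup probe; a minimal sketch, following the nginx examples above with illustrative timing values, might look like this:

```yaml
containers:
- image: nginx:1.7.9
  name: nginx-container
  startupProbe:               # runs first; other probes are disabled until it succeeds
    httpGet:
      path: /
      port: 80
    periodSeconds: 10
    failureThreshold: 30      # allow up to 30 * 10 = 300 s for a slow-starting app
  livenessProbe:              # takes over once the startup probe has succeeded
    httpGet:
      path: /
      port: 80
    periodSeconds: 5
    failureThreshold: 3
```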

Origin blog.csdn.net/fhw925464207/article/details/131771286