Sometimes an application stops responding due to an infinite loop or deadlock. To ensure that the application can be restarted in such cases, there needs to be a mechanism for checking the health of the application rather than relying on instrumentation within the application.
Kubernetes (K8s) provides three kinds of probes for this purpose:
- Liveness probe: Checks whether the container is still running properly. If the liveness probe fails, the kubelet kills the container, and the container is then restarted according to the pod's restart policy.
- Readiness probe: Checks whether the container is ready to receive traffic. If a container is not ready, K8s does not route Service traffic to it.
- Startup probe: Checks whether the application in the container has started. Liveness and readiness probes do not run until the startup probe succeeds. Unlike a liveness probe, a startup probe runs only during startup: once it succeeds, it does not run again for the lifetime of the container.
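As a rough sketch of how the three probes can sit side by side on one container, consider the fragment below. The container name and all timing values are assumptions made for illustration, not recommendations.

```yaml
# Hypothetical container spec fragment combining the three probe types.
containers:
- name: web                  # hypothetical container name
  image: nginx:latest
  startupProbe:              # the other two probes wait until this succeeds
    httpGet:
      path: /
      port: 80
    periodSeconds: 2
    failureThreshold: 30     # allows roughly 30 x 2 = 60s for startup
  livenessProbe:             # failure leads to a container restart
    httpGet:
      path: /
      port: 80
    periodSeconds: 10
  readinessProbe:            # failure removes the pod from Service traffic
    httpGet:
      path: /
      port: 80
    periodSeconds: 5
```

Giving the startup probe a generous failure budget lets a slow application start without being killed by the liveness probe.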
Probe check methods
- exec: Runs the specified command inside the container. An exit status of 0 indicates success.
- httpGet: Sends an HTTP GET request to the container's IP address on the specified port and path. A response status code from 200 to 399 indicates success.
- tcpSocket: Attempts to open a TCP connection to the container's IP address on the specified port. If the connection is established, the check succeeds.
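The three check methods could be written roughly as follows. Each YAML document below is a separate fragment showing one method; the file path, URL path, and port numbers are assumptions chosen for the example.

```yaml
# exec: succeeds when the command exits with status 0
livenessProbe:
  exec:
    command: ["cat", "/tmp/healthy"]   # hypothetical marker file
---
# httpGet: succeeds when the response status is between 200 and 399
livenessProbe:
  httpGet:
    path: /healthz                     # hypothetical health endpoint
    port: 8080                         # hypothetical port
---
# tcpSocket: succeeds when a TCP connection can be opened
livenessProbe:
  tcpSocket:
    port: 3306                         # hypothetical port
```

A single probe uses exactly one of these methods.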
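Configuration options
- initialDelaySeconds: Number of seconds to wait after the container starts before the first probe runs. The default is 0.
- periodSeconds: Interval between probes, in seconds. The default is 10.
- timeoutSeconds: Number of seconds to wait for a probe response. A probe that exceeds this time counts as a failure. The default is 1.
- successThreshold: Minimum number of consecutive successes for the probe to be considered successful after it has failed. The default is 1, and it must be 1 for liveness and startup probes.
- failureThreshold: Number of consecutive failures after which the probe is considered failed. The default is 3.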
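To make the interaction of these options concrete, here is a small sketch of a liveness probe with every timing field set explicitly. The endpoint, port, and values are assumptions for illustration only.

```yaml
# Illustrative timing settings for a liveness probe (all values are assumptions).
livenessProbe:
  httpGet:
    path: /healthz          # hypothetical health endpoint
    port: 8080              # hypothetical port
  initialDelaySeconds: 5    # wait 5s after the container starts
  periodSeconds: 10         # then probe every 10s
  timeoutSeconds: 1         # no response within 1s counts as a failure
  successThreshold: 1       # must be 1 for liveness probes
  failureThreshold: 3       # 3 consecutive failures trigger a restart
# With these values, an unresponsive container is restarted after roughly
# failureThreshold x periodSeconds = 3 x 10 = 30s.
```

Lowering periodSeconds or failureThreshold shortens the detection time but makes the probe more sensitive to brief hiccups.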
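Startup probe
apiVersion: v1  # Required. API version
kind: Pod  # Required. Resource type: Pod
metadata:  # Required. Metadata
  name: nginx  # Required. Pod name; must be a valid DNS subdomain name (RFC 1123)
  #namespace: default  # Optional. Namespace of the Pod; defaults to "default" and can also be set with -n
  labels:  # Optional. Labels, typically used to select and group Pods
    app: nginx-ready
spec:  # Required. Detailed definition of the Pod
  containers:  # Required. List of containers
  - name: nginx  # Required. Container name; must be a valid DNS label (RFC 1123)
    image: nginx:latest  # Required. Container image
    imagePullPolicy: Always  # Optional. IfNotPresent: pull only if the image is not already on the node. Always: always pull. Never: never pull, whether or not the image is present
    ports:  # Optional. Ports exposed by the container
    - name: http  # Port name
      containerPort: 80  # Port number
      protocol: TCP  # Port protocol; defaults to TCP
    startupProbe:  # Optional. Checks whether the process in the container has finished starting. A probe can use only one check method at a time
      exec:
        command: ['/bin/sh','-c','echo Hello World']
      initialDelaySeconds: 3  # Seconds to wait after the container starts before the first probe
      timeoutSeconds: 2  # Seconds to wait for a probe response; default 1
      periodSeconds: 1  # Interval between probes, in seconds; default 10
      successThreshold: 1  # One success marks the probe as passed
      failureThreshold: 3  # Three consecutive failures mark the probe as failed
  restartPolicy: Always  # Optional. Always (default): restart the container whenever it fails or does not start successfully. OnFailure: restart only when the container exits with a non-zero status. Never: never restart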
# kubectl apply -f pod.yaml
pod/nginx created
# kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx 0/1 ContainerCreating 0 4s
# kubectl describe pod nginx
Readiness probe
# grep -v '^#' pod.yaml
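apiVersion: v1  # Required. API version
kind: Pod  # Required. Resource type: Pod
metadata:  # Required. Metadata
  name: nginx  # Required. Pod name; must be a valid DNS subdomain name (RFC 1123)
  #namespace: default  # Optional. Namespace of the Pod; defaults to "default" and can also be set with -n
  labels:  # Optional. Labels, typically used to select and group Pods
    app: nginx-ready
spec:  # Required. Detailed definition of the Pod
  containers:  # Required. List of containers
  - name: nginx  # Required. Container name; must be a valid DNS label (RFC 1123)
    image: nginx:latest  # Required. Container image
    imagePullPolicy: Always  # Optional. IfNotPresent: pull only if the image is not already on the node. Always: always pull. Never: never pull, whether or not the image is present
    ports:  # Optional. Ports exposed by the container
    - name: http  # Port name
      containerPort: 80  # Port number
      protocol: TCP  # Port protocol; defaults to TCP
    readinessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 3  # Seconds to wait after the container starts before the first probe
      timeoutSeconds: 2  # Seconds to wait for a probe response; default 1
      periodSeconds: 1  # Interval between probes, in seconds; default 10
      successThreshold: 1  # One success marks the probe as passed
      failureThreshold: 3  # Three consecutive failures mark the probe as failed
  restartPolicy: Always  # Optional. Always (default): restart the container whenever it fails or does not start successfully. OnFailure: restart only when the container exits with a non-zero status. Never: never restart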
As can be seen, the probe on port 80 succeeds.
Add a NodePort Service to verify traffic access:
# grep -v '^#' pod.yaml
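apiVersion: v1  # Required. API version
kind: Pod  # Required. Resource type: Pod
metadata:  # Required. Metadata
  name: nginx  # Required. Pod name; must be a valid DNS subdomain name (RFC 1123)
  #namespace: default  # Optional. Namespace of the Pod; defaults to "default" and can also be set with -n
  labels:  # Optional. Labels, typically used to select and group Pods
    app: nginx-ready
spec:  # Required. Detailed definition of the Pod
  containers:  # Required. List of containers
  - name: nginx  # Required. Container name; must be a valid DNS label (RFC 1123)
    image: nginx:latest  # Required. Container image
    imagePullPolicy: Always  # Optional. IfNotPresent: pull only if the image is not already on the node. Always: always pull. Never: never pull, whether or not the image is present
    ports:  # Optional. Ports exposed by the container
    - name: http  # Port name
      containerPort: 80  # Port number
      protocol: TCP  # Port protocol; defaults to TCP
    readinessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 3  # Seconds to wait after the container starts before the first probe
      timeoutSeconds: 2  # Seconds to wait for a probe response; default 1
      periodSeconds: 1  # Interval between probes, in seconds; default 10
      successThreshold: 1  # One success marks the probe as passed
      failureThreshold: 3  # Three consecutive failures mark the probe as failed
  restartPolicy: Always  # Optional. Always (default): restart the container whenever it fails or does not start successfully. OnFailure: restart only when the container exits with a non-zero status. Never: never restart
---
apiVersion: v1
kind: Service
metadata:
  name: ready-nodeport
  labels:
    name: ready-nodeport
spec:
  type: NodePort
  ports:
  - port: 88
    protocol: TCP
    targetPort: 80
    nodePort: 30880
  selector:
    app: nginx-ready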
Access verification
# kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx 1/1 Running 0 15s
[root@k8s-master01 ~]# kubectl get svc -owide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 72d <none>
ready-nodeport NodePort 10.96.93.159 <none> 88:30880/TCP 20s app=nginx-ready
[root@k8s-master01 ~]# curl http://192.168.10.10:30880
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
Change the httpGet or tcpSocket port to 81 to simulate a probe failure, then check whether traffic is still routed to a pod whose readiness probe fails.
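The modified probe might look like the fragment below. It is a sketch that assumes nothing in the container listens on port 81 (true for the stock nginx image used here), so every check fails.

```yaml
# Fragment of the container spec: readiness probe pointed at port 81,
# where nginx is not listening, so the pod never becomes ready.
readinessProbe:
  httpGet:
    path: /
    port: 81                # changed from 80 to force probe failures
  initialDelaySeconds: 3
  timeoutSeconds: 2
  periodSeconds: 1
  successThreshold: 1
  failureThreshold: 3
```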
# kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx 0/1 Running 0 22s
# kubectl get svc -owide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 72d <none>
ready-nodeport NodePort 10.96.115.11 <none> 88:30880/TCP 25s app=nginx-ready
[root@k8s-master01 ~]# curl http://192.168.10.10:30880
curl: (7) Failed connect to 192.168.10.10:30880; Connection refused
Inspect the pod with kubectl describe pod nginx.
The output shows that port 81 is unavailable. The pod is Running but its READY count is 0/1, and the request is refused, so no traffic reaches the pod. When the readiness probe fails, no traffic is routed to the pod.
Liveness probe
# grep -v '^#' pod.yaml
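apiVersion: v1  # Required. API version
kind: Pod  # Required. Resource type: Pod
metadata:  # Required. Metadata
  name: nginx  # Required. Pod name; must be a valid DNS subdomain name (RFC 1123)
  #namespace: default  # Optional. Namespace of the Pod; defaults to "default" and can also be set with -n
  labels:  # Optional. Labels, typically used to select and group Pods
    app: nginx-ready
spec:  # Required. Detailed definition of the Pod
  containers:  # Required. List of containers
  - name: nginx  # Required. Container name; must be a valid DNS label (RFC 1123)
    image: nginx:latest  # Required. Container image
    imagePullPolicy: Always  # Optional. IfNotPresent: pull only if the image is not already on the node. Always: always pull. Never: never pull, whether or not the image is present
    ports:  # Optional. Ports exposed by the container
    - name: http  # Port name
      containerPort: 80  # Port number
      protocol: TCP  # Port protocol; defaults to TCP
    livenessProbe:
      httpGet:
        path: /
        port: 80
        scheme: HTTP
      initialDelaySeconds: 3  # Seconds to wait after the container starts before the first probe
      timeoutSeconds: 2  # Seconds to wait for a probe response; default 1
      periodSeconds: 1  # Interval between probes, in seconds; default 10
      successThreshold: 1  # One success marks the probe as passed
      failureThreshold: 3  # Three consecutive failures mark the probe as failed
  restartPolicy: Always  # Optional. Always (default): restart the container whenever it fails or does not start successfully. OnFailure: restart only when the container exits with a non-zero status. Never: never restart
---
apiVersion: v1
kind: Service
metadata:
  name: ready-nodeport
  labels:
    name: ready-nodeport
spec:
  type: NodePort
  ports:
  - port: 88
    protocol: TCP
    targetPort: 80
    nodePort: 30880
  selector:
    app: nginx-ready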
Create the pod and the Service, then test access:
# kubectl apply -f pod.yaml
pod/nginx created
service/ready-nodeport created
[root@k8s-master01 ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx 1/1 Running 0 4s
# kubectl get svc -owide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 72d <none>
ready-nodeport NodePort 10.96.54.68 <none> 88:30880/TCP 12s app=nginx-ready
[root@k8s-master01 ~]# curl 192.168.10.10:30880
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
Change the probe port to 81 to simulate a probe failure.
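A sketch of the changed liveness probe is shown below, again assuming nothing in the container listens on port 81. The timing note in the trailing comment is an approximation derived from the configured values, not a measured result.

```yaml
# Fragment of the container spec: liveness probe pointed at port 81,
# where nginx is not listening, so every check fails.
livenessProbe:
  httpGet:
    path: /
    port: 81                # changed from 80 to force probe failures
    scheme: HTTP
  initialDelaySeconds: 3
  timeoutSeconds: 2
  periodSeconds: 1
  successThreshold: 1
  failureThreshold: 3
# Probing starts about 3s after the container starts and repeats every 1s,
# so 3 consecutive failures accumulate within a few seconds of startup.
```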
When the liveness probe fails, the kubelet kills the container and then acts according to restartPolicy. With the default policy, Always, the container is restarted automatically whenever it fails or does not start successfully.
Recreate the pod:
# kubectl get pod
NAME READY STATUS RESTARTS AGE
nginx 1/1 Running 0 4s
# kubectl get svc -owide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 72d <none>
ready-nodeport NodePort 10.96.54.68 <none> 88:30880/TCP 7m28s app=nginx-ready
# curl http://192.168.10.10:30880
curl: (7) Failed connect to 192.168.10.10:30880; Connection refused
Check whether the pod has been restarted. You can see that the pod has been restarted 5 times and its status has changed to CrashLoopBackOff.
Follow-up
Of course, in addition to probes, Kubernetes also has the following mechanisms to ensure the availability of containers:
- ReplicationController (RC) or ReplicaSet: Used to ensure that the specified number of Pods are running in the cluster. If Pods fail on a node or are deleted, the RC or ReplicaSet starts new Pods to take their place.
- Deployment: Used to manage updates of Pods and ReplicaSets. Deployments can keep applications available during rolling updates and provide a rollback mechanism to revert to a previous version.
- Service: Used to route traffic to running Pods. A Service provides a stable IP address and DNS name for the application, and can load balance traffic across multiple Pods as needed.
- Namespace: Used to isolate and organize resources in the cluster. By using Namespaces, you can isolate different applications or teams and control the resources they can access.
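As an illustration of how a Deployment and a readiness probe can work together, here is a minimal sketch. The Deployment name, replica count, and rolling-update setting are assumptions; the label matches the one used by the pods in this article, so a Service selecting app: nginx-ready would cover these Pods too.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy          # hypothetical name
spec:
  replicas: 3                 # the ReplicaSet keeps 3 Pods running
  selector:
    matchLabels:
      app: nginx-ready
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1       # at most 1 Pod down during an update
  template:
    metadata:
      labels:
        app: nginx-ready
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
        readinessProbe:       # new Pods receive traffic only once ready
          httpGet:
            path: /
            port: 80
          periodSeconds: 5
```

During a rolling update, the Deployment waits for new Pods to pass their readiness probe before counting them as available.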