This article walks through troubleshooting a multi-container Pod that fails to start, and introduces the methods commonly used when diagnosing such Pods.
Environment preparation
This article uses Kubernetes 1.17.2. The environment used here can be confirmed as follows:
[root@host131 Pod]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
192.168.163.131 Ready <none> 4h3m v1.17.2 192.168.163.131 <none> CentOS Linux 7 (Core) 3.10.0-957.el7.x86_64 docker://19.3.5
[root@host131 Pod]#
YAML preparation & Pod creation
- Prepare the following YAML
[root@host131 Pod]# cat multi-pods.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: multi-pods
spec:
  containers:
  - name: blue-pod-container
    image: nginx:latest
    ports:
    - containerPort: 80
  - name: green-pod-container
    image: nginx:latest
    ports:
    - containerPort: 80
...
[root@host131 Pod]#
- Create the Pod
[root@host131 Pod]# kubectl create -f multi-pods.yaml
pod/multi-pods created
[root@host131 Pod]#
- Symptom: the Pod briefly shows 2/2 containers running, then stays at 1/2
[root@host131 Pod]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-pods 2/2 Running 0 10s 10.254.176.3 192.168.163.131 <none> <none>
[root@host131 Pod]#
[root@host131 Pod]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-pods 1/2 Error 0 13s 10.254.176.3 192.168.163.131 <none> <none>
[root@host131 Pod]#
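A small convenience before digging in: instead of repeatedly running kubectl get pods to catch the transition, the -w (watch) flag streams status changes as they happen. This is optional and not required for the diagnosis:
kubectl get pods -w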
Troubleshooting
kubectl get + kubectl describe
First, use kubectl get + kubectl describe to confirm the Pod's basic information:
[root@host131 Pod]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-pods 1/2 CrashLoopBackOff 4 2m32s 10.254.176.3 192.168.163.131 <none> <none>
[root@host131 Pod]# kubectl describe pod multi-pods
Name: multi-pods
Namespace: default
Priority: 0
Node: 192.168.163.131/192.168.163.131
Start Time: Sun, 09 Feb 2020 05:22:51 -0500
Labels: <none>
Annotations: <none>
Status: Running
IP: 10.254.176.3
IPs:
IP: 10.254.176.3
Containers:
blue-pod-container:
Container ID: docker://ec206e4846a1b6b168fcf032f97998e2e3d3a42f67430cf2987781a09130f500
Image: nginx:latest
Image ID: docker-pullable://nginx@sha256:ad5552c786f128e389a0263104ae39f3d3c7895579d45ae716f528185b36bc6f
Port: 80/TCP
Host Port: 0/TCP
State: Running
Started: Sun, 09 Feb 2020 05:22:55 -0500
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-kzxwt (ro)
green-pod-container:
Container ID: docker://174bdc6537867fe0fbacd1fa7a7db39841344aea6ac0ad512b88159c8d4986ef
Image: nginx:latest
Image ID: docker-pullable://nginx@sha256:ad5552c786f128e389a0263104ae39f3d3c7895579d45ae716f528185b36bc6f
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Sun, 09 Feb 2020 05:24:58 -0500
Finished: Sun, 09 Feb 2020 05:25:00 -0500
Ready: False
Restart Count: 4
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-kzxwt (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-kzxwt:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-kzxwt
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m44s default-scheduler Successfully assigned default/multi-pods to 192.168.163.131
Normal Pulling 2m43s kubelet, 192.168.163.131 Pulling image "nginx:latest"
Normal Pulled 2m40s kubelet, 192.168.163.131 Successfully pulled image "nginx:latest"
Normal Created 2m40s kubelet, 192.168.163.131 Created container blue-pod-container
Normal Started 2m40s kubelet, 192.168.163.131 Started container blue-pod-container
Normal Pulling 100s (x4 over 2m40s) kubelet, 192.168.163.131 Pulling image "nginx:latest"
Normal Pulled 97s (x4 over 2m37s) kubelet, 192.168.163.131 Successfully pulled image "nginx:latest"
Normal Created 97s (x4 over 2m37s) kubelet, 192.168.163.131 Created container green-pod-container
Normal Started 97s (x4 over 2m37s) kubelet, 192.168.163.131 Started container green-pod-container
Warning BackOff 80s (x5 over 2m27s) kubelet, 192.168.163.131 Back-off restarting failed container
[root@host131 Pod]#
As shown above, only one of the two containers is up. The second container fails to start and has already been restarted repeatedly: 4 times in the output above, with the count still climbing. The failing container is green-pod-container, and it is stuck in CrashLoopBackOff (last termination reason: Error, exit code 1).
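If only the per-container status is needed, it can also be pulled straight from the Pod object with a jsonpath query; the expression below is just one possible sketch:
kubectl get pod multi-pods -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.ready}{"\t"}{.restartCount}{"\n"}{end}'
This prints one line per container with its name, readiness, and restart count.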
Checking event information
In fact, the information from kubectl get events is already contained in the describe output:
5m50s Normal Scheduled pod/multi-pods Successfully assigned default/multi-pods to 192.168.163.131
5m49s Normal Pulling pod/multi-pods Pulling image "nginx:latest"
5m46s Normal Pulled pod/multi-pods Successfully pulled image "nginx:latest"
5m46s Normal Created pod/multi-pods Created container blue-pod-container
5m46s Normal Started pod/multi-pods Started container blue-pod-container
4m46s Normal Pulling pod/multi-pods Pulling image "nginx:latest"
4m43s Normal Pulled pod/multi-pods Successfully pulled image "nginx:latest"
4m43s Normal Created pod/multi-pods Created container green-pod-container
4m43s Normal Started pod/multi-pods Started container green-pod-container
47s Warning BackOff pod/multi-pods Back-off restarting failed container
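To limit the listing to this Pod's events rather than everything in the namespace, a field selector can be added; a sketch:
kubectl get events --field-selector involvedObject.name=multi-pods --sort-by=.lastTimestamp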
Checking log information
When a Pod contains multiple containers, viewing logs requires specifying the container name with -c, for example:
kubectl logs pod/multi-pods -c green-pod-container
Sample output:
[root@host131 Pod]# kubectl logs pod/multi-pods -c green-pod-container
2020/02/09 10:29:18 [emerg] 1#1: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2020/02/09 10:29:18 [emerg] 1#1: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2020/02/09 10:29:18 [emerg] 1#1: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2020/02/09 10:29:18 [emerg] 1#1: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2020/02/09 10:29:18 [emerg] 1#1: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2020/02/09 10:29:18 [emerg] 1#1: still could not bind()
nginx: [emerg] still could not bind()
[root@host131 Pod]#
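Since the container is in CrashLoopBackOff, each running instance only lives for a few seconds. If the current instance has produced no output yet, the log of the previous (crashed) instance can be fetched with the --previous flag:
kubectl logs pod/multi-pods -c green-pod-container --previous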
This pinpoints the cause: thanks to the pause container, all containers in a Pod share the same namespaces, including the network namespace. Both nginx containers try to serve on port 80, so once they are placed in the same Pod, and hence the same network namespace, the second bind() inevitably conflicts with the first. This can be confirmed from inside the one nginx container that did start:
[root@host131 Pod]# kubectl exec -it pod/multi-pods -c blue-pod-container sh
# cd /etc/nginx/conf.d
# grep listen *
listen 80;
# proxy the PHP scripts to Apache listening on 127.0.0.1:80
# pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000
#
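One way to resolve the conflict, shown here only as a sketch (the Pod name multi-pods-fixed, port 8080, and the sed rewrite are illustrative choices, not from the original article), is to keep both containers but have the second nginx listen on a different port by rewriting its default config at startup:
---
apiVersion: v1
kind: Pod
metadata:
  name: multi-pods-fixed
spec:
  containers:
  - name: blue-pod-container
    image: nginx:latest
    ports:
    - containerPort: 80
  - name: green-pod-container
    image: nginx:latest
    # Rewrite the listen port before starting nginx, so the two containers
    # no longer collide inside the Pod's shared network namespace.
    command: ["/bin/sh", "-c"]
    args:
    - sed -i 's/listen  *80;/listen 8080;/' /etc/nginx/conf.d/default.conf && exec nginx -g 'daemon off;'
    ports:
    - containerPort: 8080
...
Alternatively, each nginx could simply run in its own Pod; containers usually belong in the same Pod only when they genuinely need to share namespaces.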
Summary
kubectl get/describe and kubectl logs are the basic operations commonly used when troubleshooting; with multi-container Pods, they usually need to be combined with -c to confirm information for a specific container.