This article walks through troubleshooting a multi-container Pod that fails to start, and introduces the methods commonly used when diagnosing such Pods.
Environment preparation
This article uses Kubernetes 1.17.2. The environment used here can be confirmed as follows:
[root@host131 Pod]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
192.168.163.131 Ready <none> 4h3m v1.17.2 192.168.163.131 <none> CentOS Linux 7 (Core) 3.10.0-957.el7.x86_64 docker://19.3.5
[root@host131 Pod]#
YAML preparation & Pod creation
- Prepare the following YAML
[root@host131 Pod]# cat multi-pods.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: multi-pods
spec:
  containers:
  - name: blue-pod-container
    image: nginx:latest
    ports:
    - containerPort: 80
  - name: green-pod-container
    image: nginx:latest
    ports:
    - containerPort: 80
...
[root@host131 Pod]#
- Create the Pod
[root@host131 Pod]# kubectl create -f multi-pods.yaml
pod/multi-pods created
[root@host131 Pod]#
- Symptom: the Pod briefly shows 2/2 containers running, then stays at 1/2
[root@host131 Pod]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-pods 2/2 Running 0 10s 10.254.176.3 192.168.163.131 <none> <none>
[root@host131 Pod]#
[root@host131 Pod]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-pods 1/2 Error 0 13s 10.254.176.3 192.168.163.131 <none> <none>
[root@host131 Pod]#
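A small convenience before digging in: instead of repeatedly running kubectl get pods to catch the transition, the -w (watch) flag streams status changes as they happen. This is optional and not required for the diagnosis:
kubectl get pods -w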
Troubleshooting
kubectl get + kubectl describe
First, use kubectl get + kubectl describe to confirm the Pod's basic information:
[root@host131 Pod]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
multi-pods 1/2 CrashLoopBackOff 4 2m32s 10.254.176.3 192.168.163.131 <none> <none>
[root@host131 Pod]# kubectl describe pod multi-pods
Name: multi-pods
Namespace: default
Priority: 0
Node: 192.168.163.131/192.168.163.131
Start Time: Sun, 09 Feb 2020 05:22:51 -0500
Labels: <none>
Annotations: <none>
Status: Running
IP: 10.254.176.3
IPs:
IP: 10.254.176.3
Containers:
blue-pod-container:
Container ID: docker://ec206e4846a1b6b168fcf032f97998e2e3d3a42f67430cf2987781a09130f500
Image: nginx:latest
Image ID: docker-pullable://nginx@sha256:ad5552c786f128e389a0263104ae39f3d3c7895579d45ae716f528185b36bc6f
Port: 80/TCP
Host Port: 0/TCP
State: Running
Started: Sun, 09 Feb 2020 05:22:55 -0500
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-kzxwt (ro)
green-pod-container:
Container ID: docker://174bdc6537867fe0fbacd1fa7a7db39841344aea6ac0ad512b88159c8d4986ef
Image: nginx:latest
Image ID: docker-pullable://nginx@sha256:ad5552c786f128e389a0263104ae39f3d3c7895579d45ae716f528185b36bc6f
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Sun, 09 Feb 2020 05:24:58 -0500
Finished: Sun, 09 Feb 2020 05:25:00 -0500
Ready: False
Restart Count: 4
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-kzxwt (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-kzxwt:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-kzxwt
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m44s default-scheduler Successfully assigned default/multi-pods to 192.168.163.131
Normal Pulling 2m43s kubelet, 192.168.163.131 Pulling image "nginx:latest"
Normal Pulled 2m40s kubelet, 192.168.163.131 Successfully pulled image "nginx:latest"
Normal Created 2m40s kubelet, 192.168.163.131 Created container blue-pod-container
Normal Started 2m40s kubelet, 192.168.163.131 Started container blue-pod-container
Normal Pulling 100s (x4 over 2m40s) kubelet, 192.168.163.131 Pulling image "nginx:latest"
Normal Pulled 97s (x4 over 2m37s) kubelet, 192.168.163.131 Successfully pulled image "nginx:latest"
Normal Created 97s (x4 over 2m37s) kubelet, 192.168.163.131 Created container green-pod-container
Normal Started 97s (x4 over 2m37s) kubelet, 192.168.163.131 Started container green-pod-container
Warning BackOff 80s (x5 over 2m27s) kubelet, 192.168.163.131 Back-off restarting failed container
[root@host131 Pod]#
As shown above, only one of the two containers is up. The second container fails to start and has already been restarted repeatedly: 4 times in the output above, with the count still climbing. The failing container is green-pod-container, and it is stuck in CrashLoopBackOff (last termination reason: Error, exit code 1).
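If only the per-container status is needed, it can also be pulled straight from the Pod object with a jsonpath query; the expression below is just one possible sketch:
kubectl get pod multi-pods -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.ready}{"\t"}{.restartCount}{"\n"}{end}'
This prints one line per container with its name, readiness, and restart count.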
Checking event information
In fact, the information from kubectl get events is already contained in the describe output:
5m50s Normal Scheduled pod/multi-pods Successfully assigned default/multi-pods to 192.168.163.131
5m49s Normal Pulling pod/multi-pods Pulling image "nginx:latest"
5m46s Normal Pulled pod/multi-pods Successfully pulled image "nginx:latest"
5m46s Normal Created pod/multi-pods Created container blue-pod-container
5m46s Normal Started pod/multi-pods Started container blue-pod-container
4m46s Normal Pulling pod/multi-pods Pulling image "nginx:latest"
4m43s Normal Pulled pod/multi-pods Successfully pulled image "nginx:latest"
4m43s Normal Created pod/multi-pods Created container green-pod-container
4m43s Normal Started pod/multi-pods Started container green-pod-container
47s Warning BackOff pod/multi-pods Back-off restarting failed container
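To limit the listing to this Pod's events rather than everything in the namespace, a field selector can be added; a sketch:
kubectl get events --field-selector involvedObject.name=multi-pods --sort-by=.lastTimestamp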
Checking log information
When a Pod contains multiple containers, viewing logs requires specifying the container name with -c, for example:
kubectl logs pod/multi-pods -c green-pod-container
Sample output:
[root@host131 Pod]# kubectl logs pod/multi-pods -c green-pod-container
2020/02/09 10:29:18 [emerg] 1#1: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2020/02/09 10:29:18 [emerg] 1#1: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2020/02/09 10:29:18 [emerg] 1#1: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2020/02/09 10:29:18 [emerg] 1#1: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2020/02/09 10:29:18 [emerg] 1#1: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2020/02/09 10:29:18 [emerg] 1#1: still could not bind()
nginx: [emerg] still could not bind()
[root@host131 Pod]#
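Since the container is in CrashLoopBackOff, each running instance only lives for a few seconds. If the current instance has produced no output yet, the log of the previous (crashed) instance can be fetched with the --previous flag:
kubectl logs pod/multi-pods -c green-pod-container --previous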
This pinpoints the cause: thanks to the pause container, all containers in a Pod share the same namespaces, including the network namespace. Both nginx containers try to serve on port 80, so once they are placed in the same Pod, and hence the same network namespace, the second bind() inevitably conflicts with the first. This can be confirmed from inside the one nginx container that did start:
[root@host131 Pod]# kubectl exec -it pod/multi-pods -c blue-pod-container sh
# cd /etc/nginx/conf.d
# grep listen *
listen 80;
# proxy the PHP scripts to Apache listening on 127.0.0.1:80
# pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000
#
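One way to resolve the conflict, shown here only as a sketch (the Pod name multi-pods-fixed, port 8080, and the sed rewrite are illustrative choices, not from the original article), is to keep both containers but have the second nginx listen on a different port by rewriting its default config at startup:
---
apiVersion: v1
kind: Pod
metadata:
  name: multi-pods-fixed
spec:
  containers:
  - name: blue-pod-container
    image: nginx:latest
    ports:
    - containerPort: 80
  - name: green-pod-container
    image: nginx:latest
    # Rewrite the listen port before starting nginx, so the two containers
    # no longer collide inside the Pod's shared network namespace.
    command: ["/bin/sh", "-c"]
    args:
    - sed -i 's/listen  *80;/listen 8080;/' /etc/nginx/conf.d/default.conf && exec nginx -g 'daemon off;'
    ports:
    - containerPort: 8080
...
Alternatively, each nginx could simply run in its own Pod; containers usually belong in the same Pod only when they genuinely need to share namespaces.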
Summary
kubectl get/describe and kubectl logs are the basic operations commonly used when troubleshooting; with multi-container Pods, they usually need to be combined with -c to confirm information for a specific container.