Kubernetes Pod fault classification and troubleshooting methods

Pod concept

  • A Pod is the smallest unit that Kubernetes deploys and manages; the containers inside a Pod are co-located and co-scheduled.
  • A Pod is an abstraction over one or a group of containers that together provide one service (one or a group of processes).
  • The containers in a Pod share storage and the network (a Pod can loosely be thought of as a logical host, but it is not a virtual machine).
  • A Pod is uniquely identified by a UID after it is created; when a Pod's lifecycle ends it is replaced by an equivalent Pod with a newly generated UID.

Docker is the container runtime most commonly used with Kubernetes Pods, but Pods also support other container runtimes, such as rkt and podman.

Pods in a Kubernetes cluster are used in two main ways:

  • Pods that run a single container. The "one container per Pod" model is the most common Kubernetes use case; in this case the Pod can be thought of as a wrapper around a single container, and Kubernetes manages the Pod directly rather than the container.
  • Pods that run multiple containers that need to work together. A Pod can encapsulate an application composed of multiple tightly coupled containers that need to share resources. These co-located containers form a single cohesive unit of service: for example, one container serves files from a shared volume to the public, while a separate "sidecar" container refreshes or updates those files. The Pod wraps these containers and storage resources together as one manageable entity (see the sketch below).
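As a minimal sketch of the second pattern (the Pod, container, and volume names here are illustrative, not from the article), a web server and a "sidecar" content refresher can share an emptyDir volume inside one Pod:

    apiVersion: v1
    kind: Pod
    metadata:
      name: web-with-sidecar            # illustrative name
    spec:
      volumes:
      - name: shared-content
        emptyDir: {}                    # scratch volume shared by both containers
      containers:
      - name: web
        image: nginx                    # serves the files from the shared volume
        volumeMounts:
        - name: shared-content
          mountPath: /usr/share/nginx/html
      - name: content-refresher
        image: busybox                  # periodically rewrites the shared files
        command: ["sh", "-c", "while true; do date > /content/index.html; sleep 60; done"]
        volumeMounts:
        - name: shared-content
          mountPath: /content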

Pod Controller

A controller can create and manage multiple Pods, handling replication and rollout for you and providing self-healing capabilities within the cluster. For example, if a node fails, the controller can automatically schedule an identical replacement Pod on a different node.

Some examples of controllers that manage one or more Pods include:

  • Deployment: the most commonly used Kubernetes controller, for running stateless applications
  • StatefulSet: for running stateful applications
  • DaemonSet: acts like a daemon on a machine; it is used to run cluster storage, log collection, and monitoring "daemons" on every node

A controller usually creates the Pods it is responsible for from a Pod template that you provide.
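For example, a minimal Deployment (the names are illustrative) embeds a Pod template under spec.template; the controller creates and replaces Pods from that template:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: demo-deployment             # illustrative name
    spec:
      replicas: 3                       # the controller keeps three Pods running
      selector:
        matchLabels:
          app: demo
      template:                         # the Pod template used for every Pod it creates
        metadata:
          labels:
            app: demo
        spec:
          containers:
          - name: demo
            image: nginx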

Pod fault classification

  • Pod stuck in Pending state
  • Pod stuck in Waiting state
  • Pod stuck in ContainerCreating state
  • Pod in ImagePullBackOff state
  • Pod in CrashLoopBackOff state
  • Pod in Error state
  • Pod stuck in Terminating state
  • Pod in Unknown state

The above is my own summary and may not be complete, so please forgive any omissions!

Pod troubleshooting commands

  • kubectl get pod <pod-name> -o yaml # check whether the Pod's configuration is correct
  • kubectl describe pod <pod-name> # view the Pod's detailed event information
  • kubectl logs <pod-name> [-c <container-name>] # view the container logs

Pod failures and troubleshooting methods

  • Pod stuck in Pending state

    Pending means that the Pod's YAML has been submitted to Kubernetes and the API object has been created and saved in etcd, but some of the Pod's containers cannot be created for some reason, for example because scheduling failed (run kubectl describe pod to view the current Pod's events and find out why it has not been scheduled). Possible causes: insufficient resources (no Node in the cluster satisfies the Pod's requests for CPU, memory, GPU, and other resources); the requested HostPort is already in use (it is generally recommended to expose services through a Service instead of HostPort).
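    As a quick debugging sketch (placeholder names; the exact output depends on your cluster), look at the scheduler's events and at each Node's allocatable resources:

     # Events usually contain the FailedScheduling reason (e.g. Insufficient cpu)
     $ kubectl describe pod <pod-name> | grep -A 10 Events

     # Compare a Node's capacity with what is already requested on it
     $ kubectl describe node <node-name> | grep -A 6 "Allocated resources"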

  • Pod stuck in Waiting or ContainerCreating state

    First, run kubectl describe pod to view the current Pod's events. Possible causes include:

    1. Image pull failure, for example: the image address is misconfigured; an external registry (such as gcr.io) cannot be reached; the pull secret for a private image is misconfigured; or the image is too large and the pull times out (the kubelet's --image-pull-progress-deadline and --runtime-request-timeout options can be adjusted accordingly); and so on.

    2. CNI network error. You generally need to check the CNI network plugin's configuration, for example when the Pod's network cannot be configured or an IP address cannot be allocated.

    3. The container cannot start. Check whether the correct image is packaged and whether the container parameters are configured correctly.

    4. "Failed create pod sandbox". Check the kubelet log (see the sketch after this list); this is often caused by disk bad sectors (input/output error).
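    A short debugging sketch for these causes (it assumes you can SSH to the node and that the kubelet runs under systemd):

     # The Pod's events usually name the failing step (image pull, CNI, sandbox)
     $ kubectl describe pod <pod-name>

     # On the node: kubelet log entries around sandbox, CNI, or image-pull failures
     $ journalctl -u kubelet --since "30 minutes ago" | grep -iE "sandbox|cni|pull"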

  • Pod in ImagePullBackOff state

    This is usually caused by a misconfigured image name or a misconfigured pull secret for a private image. In this situation you can use docker pull to verify that the image can be pulled correctly.
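    For instance (placeholder names, assuming a Docker runtime on the node), confirm the exact image reference the Pod uses and try pulling it by hand:

     # The image name/tag the Pod is actually trying to pull
     $ kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].image}'

     # On the node: verify the image can be pulled with the same credentials
     $ docker pull <registry>/<image>:<tag>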

    If the pull secret for a private image is missing or misconfigured, check it as follows:

    1. Query the docker-registry type Secret

     # View the docker-registry Secret
     $ kubectl get secrets my-secret -o yaml | grep 'dockerconfigjson:' | awk '{print $NF}' | base64 -d
    

    2. Create a docker-registry type Secret and reference it in the workload

    # First create a docker-registry type Secret
    $ kubectl create secret docker-registry my-secret --docker-server=DOCKER_REGISTRY_SERVER --docker-username=DOCKER_USER --docker-password=DOCKER_PASSWORD --docker-email=DOCKER_EMAIL
    
    # Then reference this Secret in the Deployment
    spec:
      containers:
      - name: private-reg-container
        image: <your-private-image>
      imagePullSecrets:
      - name: my-secret
    
  • Pod in CrashLoopBackOff state

    CrashLoopBackOff means that the container did start, but then exited unexpectedly. In this case, first look at the container's logs.

    The commands kubectl logs and kubectl logs --previous can usually reveal why the container exited, for example the container process exited on its own or a health check failed and the container was killed. If that gives no clues, you can also exec into the container to check the exit reason (for example kubectl exec cassandra -- cat /var/log/cassandra/system.log). If there is still no clue, you will need to SSH to the Node where the Pod is running and inspect the kubelet or Docker logs for further investigation.
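    A small sketch of that workflow (placeholder names):

     # Logs from the current container instance
     $ kubectl logs <pod-name> -c <container-name>

     # Logs from the previous, crashed instance
     $ kubectl logs <pod-name> -c <container-name> --previous

     # Restart count plus the last termination reason and exit code
     $ kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'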

  • Pod in Error state

    An Error status usually means that an error occurred while the Pod was starting. Common causes include: a dependent ConfigMap, Secret, or PV does not exist; the requested resources exceed the limits set by the administrator, for example a LimitRange; a cluster security policy is violated, for example a PodSecurityPolicy; or the container does not have permission to operate on cluster resources, for example when RBAC is enabled the ServiceAccount must be bound to the appropriate role.
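    For the RBAC case, a minimal sketch (the namespace, Role, and ServiceAccount names are illustrative) of binding a ServiceAccount to a Role so its Pods can read ConfigMaps:

     # Create a Role that allows reading ConfigMaps in the namespace
     $ kubectl create role configmap-reader --verb=get,list --resource=configmaps -n my-namespace

     # Bind the application's ServiceAccount to that Role
     $ kubectl create rolebinding app-configmap-reader --role=configmap-reader --serviceaccount=my-namespace:my-app -n my-namespace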

  • Pod in Terminating or Unknown state

    Starting with v1.5, Kubernetes no longer automatically deletes the Pods running on a lost Node; instead it marks them as Terminating or Unknown. There are three ways to delete Pods in these states:

    1. Remove the Node from the cluster. When using a public cloud, kube-controller-manager automatically deletes the corresponding Node object after the VM is deleted. In a cluster deployed on physical machines, the administrator needs to delete the Node manually (for example with kubectl delete node).

    2. The Node recovers. The kubelet re-communicates with kube-apiserver to confirm the expected state of these Pods and then decides whether to delete them or keep running them.

    3. The user force-deletes them. The user can run kubectl delete pods <pod-name> --grace-period=0 --force to forcibly delete a Pod. Unless you know for certain that the Pod really has stopped (for example the VM or physical machine hosting its Node has been shut down), this method is not recommended; in particular, for Pods managed by a StatefulSet, force deletion can easily lead to data loss or split-brain problems (see the sketch below).
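    Before resorting to force deletion, a quick sketch (placeholder names; assumes SSH access to the node and a systemd-managed kubelet) to confirm the Node really is unreachable:

     # Which Node was the Pod scheduled to?
     $ kubectl get pod <pod-name> -o wide

     # A NotReady or Unknown Node status suggests it has stopped reporting
     $ kubectl get node <node-name>

     # On the node itself: is the kubelet still running?
     $ systemctl status kubelet

     # Only force delete once you are sure the Pod is no longer running
     $ kubectl delete pod <pod-name> --grace-period=0 --force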

    The Unknown state in particular means that the Pod's status can no longer be reported by the kubelet to kube-apiserver, which most likely indicates a communication problem between the master and the node (between the apiserver and the kubelet).

  • Pod behaves abnormally

    "Abnormal behavior" here means that the Pod does not behave as expected, for example it does not run the command-line arguments set in the podSpec. This is usually caused by an error in the contents of the podSpec YAML file; you can try recreating the Pod with the --validate flag, for example:

    Run kubectl delete pod mypod and then kubectl create --validate -f mypod.yaml. You can also check whether the created Pod matches the podSpec you intended, for example with kubectl get pod mypod -o yaml. A related case is a static Pod that is not rebuilt automatically after its Manifest is modified: the kubelet uses inotify to watch the /etc/kubernetes/manifests directory (specified by the kubelet's --pod-manifest-path option) for changes and re-creates the corresponding static Pod after the file changes. Sometimes, however, the Pod is not re-created automatically after the Manifest is modified; in that case a simple fix is to restart the kubelet (see the sketch below).
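    A brief sketch of this recovery path (it assumes a systemd-managed kubelet and the default manifest directory):

     # Recreate a misbehaving Pod from a corrected manifest, with validation
     $ kubectl delete pod mypod
     $ kubectl create --validate -f mypod.yaml

     # Static Pods: if editing the manifest under /etc/kubernetes/manifests (or the
     # directory given by --pod-manifest-path) does not rebuild the Pod, restart the kubelet
     $ systemctl restart kubelet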


