Master Kubernetes Pod Troubleshooting: Advanced Strategies and Scenarios

Kubernetes (K8s) deployments can run into problems in many areas, including Pods, Services, Ingress, unresponsive clusters, control planes, and high-availability setups. A Pod is the smallest deployable unit in Kubernetes: it encapsulates one or more containers that share storage and network resources. Pods are designed to run a single instance of an application or process and are created and disposed of as needed. This makes them essential for scaling, updating, and maintaining applications in a K8s environment.

Translated from Master Kubernetes Pods: Advanced Troubleshooting Strategies.

This article walks through common Kubernetes Pod problems and the steps to troubleshoot them. Error messages you may encounter when running Pods include:

  • ImagePullBackOff
  • ErrImagePull
  • InvalidImageName
  • CrashLoopBackOff

Sometimes you won't see any of these errors, yet your Pod still fails. Before debugging any Kubernetes resource, it helps to understand the API reference on the Kubernetes website. It explains how the Kubernetes APIs are defined and how the objects in a Pod or Deployment fit together. When debugging a Pod, select the Pod object in the API reference to learn how it works. The reference defines the top-level fields of a Pod: apiVersion, kind, metadata, spec, and status. Kubernetes also provides a kubectl cheat sheet that lists the commands you will need most often.
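The short Python sketch below makes those top-level fields concrete by building a minimal Pod object as a plain dictionary. The helper name `minimal_pod_manifest` is hypothetical. The field names (`apiVersion`, `kind`, `metadata`, `spec`) follow the Pod API. `status` is deliberately absent because the cluster writes it. Since kubectl accepts JSON as well as YAML, the printed output could be piped to `kubectl apply -f -`.

```python
import json


def minimal_pod_manifest(name: str, image: str) -> dict:
    """Build a minimal Pod object (hypothetical helper).

    apiVersion/kind identify the API type, metadata names the object, and
    spec describes the desired containers. status is omitted because
    Kubernetes fills it in; users do not write it.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "containers": [
                {"name": name, "image": image},
            ],
        },
    }


if __name__ == "__main__":
    # e.g. python pod.py | kubectl apply -f -
    print(json.dumps(minimal_pod_manifest("nginx", "nginx"), indent=2))
```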

Prerequisites

This article assumes that you have:

  • kind installed, to run the demonstration cluster
  • An intermediate understanding of Kubernetes architecture
  • The kubectl command-line tool

Kubernetes Pod Error - ImagePullBackOff

This error has three common causes:

  • Invalid image
  • Invalid tag
  • Invalid permissions

In each case, the kubelet cannot fetch the image you specified. Either the name or tag is wrong, or the image lives in a private repository you don't have permission to pull from. To demonstrate, first create a working nginx deployment:

➜ ~ kubectl create deploy nginx --image=nginx
deployment.apps/nginx created

After the Pod is running, get the pod name:

➜ ~ kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
nginx-8f458dc5b-hcrsh   1/1     Running   0          100s

Copy the name of the running pod and get more information about it:

➜ ~ kubectl describe pod nginx-8f458dc5b-hcrsh
Name:             nginx-8f458dc5b-hcrsh
...
Events:
 Type    Reason     Age    From               Message
 ----    ------     ----   ----               -------
 Normal  Scheduled  2m43s  default-scheduler  Successfully assigned default/nginx-8f458dc5b-hcrsh to k8s-troubleshooting-control-plane
 Normal  Pulling    2m43s  kubelet            Pulling image "nginx"
 Normal  Pulled     100s   kubelet            Successfully pulled image "nginx" in 1m2.220189835s
 Normal  Created    100s   kubelet            Created container nginx
 Normal  Started    100s   kubelet            Started container nginx

The image has been successfully pulled. Your Kubernetes pod is running without errors.

To trigger ImagePullBackOff, edit the Deployment and change the image to one that does not exist:

➜ ~ kubectl edit deploy nginx
 containers:
 - image: nginxdoesntexist
   imagePullPolicy: Always
   name: nginx

The new Pod fails to start:

➜ ~ kubectl get pods
NAME                     READY   STATUS         RESTARTS   AGE
nginx-5b847fdb95-mx4pq   0/1     ErrImagePull   0          3m40s
nginx-8f458dc5b-hcrsh    1/1     Running        0          38m
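The STATUS column largely reflects each container's waiting state in the Pod's `status` field. Reasons such as ErrImagePull, ImagePullBackOff, InvalidImageName, and CrashLoopBackOff are recorded there. The hypothetical Python helpers below fetch a Pod as JSON and list its waiting containers. This is a sketch that assumes Python 3.9+ and a kubectl already configured for the cluster.

```python
import json
import subprocess

# Waiting reasons discussed in this article.
KNOWN_ERRORS = {"ErrImagePull", "ImagePullBackOff", "InvalidImageName", "CrashLoopBackOff"}


def waiting_reasons(pod: dict) -> list[tuple[str, str, str]]:
    """Return (container, reason, message) for every waiting container.

    Reads status.containerStatuses[*].state.waiting from a Pod object.
    """
    results = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting")
        if waiting:
            results.append((cs["name"], waiting.get("reason", ""), waiting.get("message", "")))
    return results


def fetch_pod(name: str, namespace: str = "default") -> dict:
    """Load a Pod object through kubectl (needs a reachable cluster)."""
    out = subprocess.run(
        ["kubectl", "get", "pod", name, "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return json.loads(out)
```

For example, `waiting_reasons(fetch_pod("nginx-5b847fdb95-mx4pq"))` would return the container name, the ErrImagePull or ImagePullBackOff reason, and the kubelet's message for it. Checking membership in `KNOWN_ERRORS` tells you which section of this article applies.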

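The old Pod keeps running because the rolling update never completes. Describing the failing Pod shows the ImagePullBackOff error. Each failed pull is reported as ErrImagePull, and between attempts the kubelet waits progressively longer before retrying (ImagePullBackOff):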

➜  ~ kubectl describe pod nginx-6f46cbfbcb-c92bl
Events:
 Type     Reason     Age                From               Message
 ----     ------     ----               ----               -------
 Normal   Scheduled  88s                default-scheduler  Successfully assigned default/nginx-6f46cbfbcb-c92bl to k8s-troubleshooting-control-plane
 Normal   Pulling    40s (x3 over 88s)  kubelet            Pulling image "nginxdoesntexist"
 Warning  Failed     37s (x3 over 85s)  kubelet            Failed to pull image "nginxdoesntexist": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginxdoesntexist:latest": failed to resolve reference "docker.io/library/nginxdoesntexist:latest": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
 Warning  Failed     37s (x3 over 85s)  kubelet            Error: ErrImagePull
 Normal   BackOff    11s (x4 over 85s)  kubelet            Back-off pulling image "nginxdoesntexist"
 Warning  Failed     11s (x4 over 85s)  kubelet            Error: ImagePullBackOff
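The event shows that the bare name `nginxdoesntexist` was expanded to `docker.io/library/nginxdoesntexist:latest` before the pull. When you check for an invalid image name or tag, it helps to expand a reference the same way. The Python sketch below approximates Docker Hub's defaulting rules. `normalize_image` is a hypothetical helper, and it is simplified. For example, it does not validate characters, so it will not catch every InvalidImageName case.

```python
def normalize_image(ref: str) -> str:
    """Expand a short image reference using Docker Hub defaults (simplified).

    A first path component containing '.' or ':' (or equal to 'localhost')
    is treated as a registry host; otherwise docker.io is assumed and
    single-name images get the 'library/' prefix. A missing tag becomes
    ':latest' unless a digest (@sha256:...) is given.
    """
    name, digest = (ref.split("@", 1) + [""])[:2]
    first, _, rest = name.partition("/")
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, path = first, rest
    else:
        registry, path = "docker.io", name
        if "/" not in path:
            path = "library/" + path
    # A tag is a ':' in the last path component; registry ports were split off above.
    last = path.rsplit("/", 1)[-1]
    if ":" not in last and not digest:
        path += ":latest"
    result = f"{registry}/{path}"
    return f"{result}@{digest}" if digest else result
```

`normalize_image("nginxdoesntexist")` yields the same `docker.io/library/nginxdoesntexist:latest` reference that appears in the event above. Comparing that string against what actually exists in the registry separates a wrong name from a wrong tag.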

Kubernetes Pod Error - Image pulled, but the Pod is pending

In production, Kubernetes administrators usually allocate a resource quota to each namespace, based on the requirements of the workloads running in it. Namespaces provide logical separation within a cluster.

The "image pulled, but the Pod is still pending" problem occurs when the Pod's resource specification does not fit within the namespace's resource quota. To demonstrate, create a namespace called payments:

➜ ~ kubectl create ns payments
namespace/payments created

Create a ResourceQuota with the following compute limits:

➜  ~ cat resourcequota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
 name: compute-resources
spec:
 hard:
   requests.cpu: "1"
   requests.memory: 1Gi
   limits.cpu: "2"
   limits.memory: 4Gi

Apply the quota to the payments namespace:

➜ ~ kubectl apply -f resourcequota.yaml -n payments
resourcequota/compute-resources created

Create a new deployment in the quota-restricted namespace:

➜ ~ kubectl create deploy nginx --image=nginx -n payments
deployment.apps/nginx created

Although the deployment was created successfully, no Pods exist:

➜ ~ kubectl get pods -n payments

No resources found in payments namespace

The deployment is created, but there are no Pods in the ready state, no Pods to update, and no Pods available:

➜  ~ kubectl get deploy -n payments
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   0/1     0            0           7m4s

To dig deeper, describe the nginx Deployment. Its conditions show that Pod creation failed (ReplicaFailure is True with reason FailedCreate):

➜  ~ kubectl describe deploy nginx -n payments
Name:                   nginx
Namespace:              payments
CreationTimestamp:      Wed, 24 May 2023 21:37:55 +0300
Labels:                 app=nginx
Annotations:            deployment.kubernetes.io/revision: 1
Selector:               app=nginx
Replicas:               1 desired | 0 updated | 0 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
 Labels:  app=nginx
 Containers:
  nginx:
   Image:        nginx
   Port:         <none>
   Host Port:    <none>
   Environment:  <none>
   Mounts:       <none>
 Volumes:        <none>
Conditions:
 Type             Status  Reason
 ----             ------  ------
 Available        False   MinimumReplicasUnavailable
 ReplicaFailure   True    FailedCreate
 Progressing      False   ProgressDeadlineExceeded
OldReplicaSets:    <none>
NewReplicaSet:     nginx-8f458dc5b (0/1 replicas created)
Events:
 Type    Reason             Age   From                   Message
 ----    ------             ----  ----                   -------
 Normal  ScalingReplicaSet  10m   deployment-controller  Scaled up replica set nginx-8f458dc5b to 1
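`kubectl describe` prints only the type, status, and reason of each Deployment condition. The condition's `message` field usually carries the underlying error, and you can see it in the output of `kubectl get deploy nginx -n payments -o json`. The hypothetical Python helper below picks out the conditions that indicate trouble from that JSON, once it has been parsed with `json.loads`. It assumes the standard Deployment condition types Available, Progressing, and ReplicaFailure.

```python
def unhealthy_conditions(deployment: dict) -> list[str]:
    """List Deployment conditions that point to a problem.

    Available and Progressing are healthy when their status is "True";
    ReplicaFailure signals a problem when its status is "True".
    """
    problems = []
    for cond in deployment.get("status", {}).get("conditions", []):
        ctype, status = cond.get("type"), cond.get("status")
        bad = (ctype == "ReplicaFailure" and status == "True") or (
            ctype in ("Available", "Progressing") and status == "False"
        )
        if bad:
            reason = cond.get("reason", "")
            message = cond.get("message", "")
            problems.append(f"{ctype}={status} ({reason}): {message}")
    return problems
```

For the Deployment above, this should surface the ReplicaFailure/FailedCreate condition together with its message, next to the Available and Progressing conditions that are False.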

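Listing the events in the payments namespace, sorted by creation time, shows the underlying cause: the ReplicaSet's FailedCreate event should report that the compute-resources quota rejected the Pod. This quota constrains requests.cpu, requests.memory, limits.cpu, and limits.memory. As a result, every new Pod in the namespace must declare those values (or receive defaults from a LimitRange), and their totals must fit within the quota. The nginx Pod template above sets no resources at all.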

➜ ~ kubectl get events -n payments --sort-by=/metadata.creationTimestamp
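The quota check can be approximated offline. In the Python sketch below, the hypothetical helpers `parse_quantity` and `quota_violations` report two things: which tracked requests and limits a Pod's containers leave unset, and whether their totals exceed the quota's `hard` values. The sketch is deliberately simplified. It handles only common quantity suffixes, and it ignores usage by Pods already in the namespace, LimitRange defaults, and init containers.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes; two-letter binary suffixes are checked first.
_SUFFIXES = [
    ("Ki", Decimal(2**10)), ("Mi", Decimal(2**20)), ("Gi", Decimal(2**30)), ("Ti", Decimal(2**40)),
    ("k", Decimal(10**3)), ("M", Decimal(10**6)), ("G", Decimal(10**9)), ("T", Decimal(10**12)),
    ("m", Decimal("0.001")),
]


def parse_quantity(q: str) -> Decimal:
    """Convert '500m', '1Gi', '2', ... to a plain Decimal (cores or bytes)."""
    for suffix, factor in _SUFFIXES:
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * factor
    return Decimal(q)


def quota_violations(containers: list[dict], hard: dict[str, str]) -> list[str]:
    """Explain why a new Pod would not fit a compute ResourceQuota.

    Assumes every key in `hard` looks like "<requests|limits>.<resource>",
    as in resourcequota.yaml, and that no other Pod is using quota yet.
    """
    problems = []
    totals = {key: Decimal(0) for key in hard}
    for container in containers:
        resources = container.get("resources", {})
        for key in hard:
            section, resource = key.split(".", 1)  # e.g. ("requests", "cpu")
            value = resources.get(section, {}).get(resource)
            if value is None:
                problems.append(f"{container['name']}: no {key} set")
            else:
                totals[key] += parse_quantity(value)
    for key, limit in hard.items():
        if totals[key] > parse_quantity(limit):
            problems.append(f"{key}: {totals[key]} exceeds {limit}")
    return problems
```

Called with the nginx container from the Deployment above, which declares no resources, the function reports all four tracked keys as missing. Adding requests and limits that fit within the quota clears the list. One way to add them is `kubectl set resources deployment nginx -n payments --requests=cpu=500m,memory=256Mi --limits=cpu=1,memory=512Mi`.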

Kubernetes Pod Error - CrashLoopBackOff

This error occurs when your image is pulled successfully and your container is created, but the application fails at runtime, often because of its configuration. For example, a Python application that works elsewhere may try to write to a folder that does not exist or that it has no permission to write to. The application starts, hits the error, and exits. A panic or unhandled exception in your application logic stops the container in the same way. Kubernetes restarts the container, it fails again, and it goes into CrashLoopBackOff. Eventually the Deployment has no available Pods: the Pod exists, but it is not running, and its status shows CrashLoopBackOff.
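The "BackOff" in CrashLoopBackOff refers to the kubelet's restart delay. According to the Kubernetes Pod lifecycle documentation, the delay starts at 10 seconds, doubles after each crash, and is capped at five minutes. The small Python function below (the name `crashloop_delays` is hypothetical) only computes that schedule. It shows why a crashing Pod restarts quickly at first and then only every few minutes.

```python
def crashloop_delays(restarts: int, initial: float = 10.0, cap: float = 300.0) -> list[float]:
    """Delay (in seconds) before each successive restart of a crashing container.

    Models the documented kubelet behavior: start at 10 s, double after each
    crash, never exceed 5 minutes. The kubelet also resets the back-off after
    a container has run for 10 minutes without problems; that is not modeled.
    """
    delays = []
    delay = min(initial, cap)
    for _ in range(restarts):
        delays.append(delay)
        delay = min(delay * 2, cap)
    return delays
```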

Liveness and readiness probes failed

A liveness probe detects whether a container has entered a broken state and can no longer serve traffic. When the probe fails, the kubelet restarts the container for you. A readiness probe checks whether your application is ready to handle traffic, for example whether it has fetched all required configuration from its ConfigMap and started its threads. The Pod receives traffic from its Services only after the readiness probe succeeds. A failing readiness probe does not restart the container, but a failing liveness probe does. An application that keeps failing its liveness probe, or keeps exiting during startup, therefore also ends up in CrashLoopBackOff.
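To illustrate the split between the two probes, here is a minimal Python sketch of an application exposing separate liveness and readiness endpoints. The paths `/healthz` and `/ready`, the port 8080, and all function names are assumptions for illustration. The Pod spec's `livenessProbe.httpGet` and `readinessProbe.httpGet` would need to point at whatever the real application serves. For HTTP probes, the kubelet treats any status from 200 to 399 as success.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class ProbeState:
    """Flags the application flips as it starts up and runs."""

    def __init__(self) -> None:
        self.alive = True   # set to False if the process detects it is wedged
        self.ready = False  # set to True once configuration is loaded


def probe_status(path: str, state: ProbeState) -> int:
    """Map a probe path to an HTTP status code."""
    if path == "/healthz":  # hypothetical liveness endpoint
        return 200 if state.alive else 500
    if path == "/ready":  # hypothetical readiness endpoint
        return 200 if state.ready else 503
    return 404


def make_handler(state: ProbeState) -> type:
    class ProbeHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            self.send_response(probe_status(self.path, state))
            self.end_headers()

        def log_message(self, format: str, *args) -> None:
            pass  # keep frequent probe requests out of the log

    return ProbeHandler


def start_probe_server(state: ProbeState, port: int = 8080) -> HTTPServer:
    """Serve the probe endpoints from a background thread."""
    server = HTTPServer(("", port), make_handler(state))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

An application would call `start_probe_server(state)` early, load its configuration, and only then set `state.ready = True`. Keeping readiness separate from liveness means that slow configuration loading holds traffic back instead of triggering restarts.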

Start troubleshooting!

This article gave an overview of troubleshooting techniques for Kubernetes Pods. It covered common errors encountered when deploying Pods and practical ways to resolve them. It also pointed to the reference pages and cheat sheets that help you understand how Kubernetes works and pinpoint issues. With this guidance, you can sharpen your troubleshooting skills and simplify the deployment and management of your Kubernetes Pods.

This article was first published on Yunyunzhongsheng (https://yylives.cc/).

Source: my.oschina.net/u/6919515/blog/11054466