Kubernetes Application Troubleshooting: Debugging Services

For a newly installed Kubernetes cluster, a common problem is that a Service does not work as expected. You've run your Pods through a Deployment (or other workload controller) and created a Service, but nothing responds when you try to access it. Hopefully this document will help you figure out what's going wrong.

Running commands in a Pod

For many of the steps here, you'll want to see the cluster from the point of view of a Pod running in it. The simplest way to do that is to run an interactive busybox Pod:

kubectl run -it --rm --restart=Never busybox --image=gcr.io/google-containers/busybox sh

Note: If you don't see a command prompt, try pressing Enter.

If you already have a running Pod that you prefer to use, you can run a command in it with:

kubectl exec <POD-NAME> -c <CONTAINER-NAME> -- <COMMAND>
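For example, to open an interactive shell in a Pod (the Pod name below is illustrative; substitute the name of one of your own Pods, and note that this only works if the container image includes a shell):

kubectl exec -it hostnames-632524106-bbpiw -- sh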

Setup

For the purposes of this walkthrough, let's first run some Pods. Since you're probably debugging your own Service, you can substitute your own details, or you can follow along and get a second data point.

kubectl create deployment hostnames --image=registry.k8s.io/serve_hostname
deployment.apps/hostnames created

kubectl commands print the type and name of the resource created or changed, which can then be used in subsequent commands. Let's scale the Deployment up to 3 replicas.

kubectl scale deployment hostnames --replicas=3
deployment.apps/hostnames scaled

Note that this is the same as if you had started the Deployment with the following YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: hostnames
  name: hostnames
spec:
  selector:
    matchLabels:
      app: hostnames
  replicas: 3
  template:
    metadata:
      labels:
        app: hostnames
    spec:
      containers:
      - name: hostnames
        image: registry.k8s.io/serve_hostname

The "app" label is  kubectl create deployment automatically set based on the Deployment name.

Make sure your Pod is running:

kubectl get pods -l app=hostnames
NAME                        READY     STATUS    RESTARTS   AGE
hostnames-632524106-bbpiw   1/1       Running   0          2m
hostnames-632524106-ly40y   1/1       Running   0          2m
hostnames-632524106-tlaok   1/1       Running   0          2m

You can also confirm that your Pods are serving. You can get the list of Pod IP addresses and test them directly:

kubectl get pods -l app=hostnames \
    -o go-template='{{range .items}}{{.status.podIP}}{{"\n"}}{{end}}'
10.244.0.5
10.244.0.6
10.244.0.7

The example container used for this tutorial serves its own hostname over HTTP on port 9376, but if you want to debug your own application, you'll need to use the port number your Pod is listening on.

Run inside the Pod:

for ep in 10.244.0.5:9376 10.244.0.6:9376 10.244.0.7:9376; do
    wget -qO- $ep
done

The output looks like this:

hostnames-632524106-bbpiw
hostnames-632524106-ly40y
hostnames-632524106-tlaok

If you don't get the expected response at this point, your Pods may not be healthy, or they may not be listening on the port you think they are. You may find kubectl logs useful to see what's going on, or you may need kubectl exec to get into a Pod and debug from there.
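For example (a quick sketch; the Pod name is one of the example Pods above, and whether an interactive shell is available depends on the image):

# Look at what the application in one Pod is logging
kubectl logs hostnames-632524106-bbpiw

# If the image includes a shell, exec in and inspect things from inside the Pod
kubectl exec -it hostnames-632524106-bbpiw -- sh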

Assuming everything has gone according to plan so far, you can start investigating why your Service isn't working as it should.

Does the Service exist?

Careful readers will have noticed that we haven't actually created a Service yet -- this is intentional. This is a step that sometimes gets forgotten, and it's the first thing to check.

So what happens if you try to access a Service that doesn't exist? Assuming you have another Pod that consumes this Service by name, you would get something like:

wget -O- hostnames
Resolving hostnames (hostnames)... failed: Name or service not known.
wget: unable to resolve host address 'hostnames'

The first thing to check is whether the Service actually exists:

kubectl get svc hostnames
No resources found.
Error from server (NotFound): services "hostnames" not found

Let's create the Service. As before, this is for the walkthrough -- you can substitute your own Service's details here.

kubectl expose deployment hostnames --port=80 --target-port=9376
service/hostnames exposed

Rerun the query command:

kubectl get svc hostnames
NAME        TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
hostnames   ClusterIP   10.0.1.175   <none>        80/TCP    5s

Now you know that the Service exists.

As before, this is the same as if you had started the Service with the following YAML:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: hostnames
  name: hostnames
spec:
  selector:
    app: hostnames
  ports:
  - name: default
    protocol: TCP
    port: 80
    targetPort: 9376

To highlight the full range of configuration, the Service you created here uses a different port number than the Pods. For many real-world Services, these values can be the same.

Are there network policy ingress rules affecting the target Pods?

If you have deployed any network policy ingress rules that could affect incoming traffic to the hostnames-* Pods, these need to be reviewed.
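A quick way to see whether any such policies exist (a sketch, assuming the Pods live in the default namespace):

# List NetworkPolicies in the namespace of the Pods
kubectl get networkpolicy -n default

# Inspect a specific policy to see which Pods and ports it selects
kubectl describe networkpolicy <POLICY-NAME> -n default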

Is the Service reachable by DNS name?

Clients usually consume a Service through its DNS name.

From a Pod in the same namespace, run:

nslookup hostnames
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local

Name:      hostnames
Address 1: 10.0.1.175 hostnames.default.svc.cluster.local

If that fails, perhaps your Pod and Service are in different namespaces; try a namespace-qualified name (again, from inside a Pod):

nslookup hostnames.default
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local

Name:      hostnames.default
Address 1: 10.0.1.175 hostnames.default.svc.cluster.local

If this works, you'll need to adjust your application to use a cross-namespace name, or run your application and Service in the same namespace. If it still fails, try a fully qualified name:

nslookup hostnames.default.svc.cluster.local
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local

Name:      hostnames.default.svc.cluster.local
Address 1: 10.0.1.175 hostnames.default.svc.cluster.local

Note the suffix here: "default.svc.cluster.local". "default" is the namespace we are operating in. "svc" indicates that this is a Service. "cluster.local" is your cluster domain, which may be different in your own cluster.

You can also try this on a node in the cluster:

Note:

10.0.0.10 is the DNS service IP of the cluster, yours may be different.

nslookup hostnames.default.svc.cluster.local 10.0.0.10
Server:         10.0.0.10
Address:        10.0.0.10#53

Name:   hostnames.default.svc.cluster.local
Address: 10.0.1.175

If you are able to do a fully qualified name lookup but not a relative one, you need to check that the /etc/resolv.conf file in your Pod is correct. Run the following command inside the Pod:

cat /etc/resolv.conf

You should see output similar to this:

nameserver 10.0.0.10
search default.svc.cluster.local svc.cluster.local cluster.local example.com
options ndots:5

The nameserver line must point at your cluster's DNS Service; this address is passed to the kubelet via the --cluster-dns flag.

The search line must include an appropriate suffix for the Service name to be found. In this case it looks for Services in the local namespace (default.svc.cluster.local), in all namespaces (svc.cluster.local), and lastly for names in the cluster as a whole (cluster.local). Depending on your installation, there may be additional records after that (up to 6 total). The cluster suffix is passed to the kubelet via the --cluster-domain flag. Throughout this document we assume the suffix is "cluster.local". Your cluster may be configured differently, in which case you should change it in all of the previous commands.

The options line must set ndots high enough that the DNS client library considers search paths at all. Kubernetes sets this to 5 by default, which is high enough to cover all of the DNS names it generates.

Does any Service work by DNS name?

If the above still fails and DNS cannot resolve the Service you need, you can take a step back and see what else is not working. The Kubernetes master Service should always work. Run the following command inside the Pod:

nslookup kubernetes.default
Server:    10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 10.0.0.1 kubernetes.default.svc.cluster.local

If this fails, see the kube-proxy section of this document, or even go back to the top of the document and start over, but instead of debugging your own Service, debug the DNS Service.
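Before diving into kube-proxy, it can help to check the DNS components themselves (a sketch; the k8s-app=kube-dns label is the conventional one used by both kube-dns and CoreDNS, but your installation may differ):

# Is the cluster DNS Service present, and does it have endpoints?
kubectl get svc -n kube-system kube-dns
kubectl get endpoints -n kube-system kube-dns

# Are the DNS Pods running?
kubectl get pods -n kube-system -l k8s-app=kube-dns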

Can the Service be accessed by IP?

Assuming you have confirmed that DNS works, the next thing to test is whether your Service can be reached by its IP address. From a Pod in your cluster, access the Service's IP (obtained from kubectl get svc above):

for i in $(seq 1 3); do 
    wget -qO- 10.0.1.175:80
done

The output should look something like this:

hostnames-632524106-bbpiw
hostnames-632524106-ly40y
hostnames-632524106-tlaok

If the Service status is OK, you should get the correct response. If not, there are many things that could go wrong, so read on.

Is the Service configured correctly?

It might sound silly, but you should double and even triple check that your Service is configured correctly and matches your Pods. Read back your Service configuration and verify it:

kubectl get service hostnames -o json
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "hostnames",
        "namespace": "default",
        "uid": "428c8b6c-24bc-11e5-936d-42010af0a9bc",
        "resourceVersion": "347189",
        "creationTimestamp": "2015-07-07T15:24:29Z",
        "labels": {
            "app": "hostnames"
        }
    },
    "spec": {
        "ports": [
            {
                "name": "default",
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376,
                "nodePort": 0
            }
        ],
        "selector": {
            "app": "hostnames"
        },
        "clusterIP": "10.0.1.175",
        "type": "ClusterIP",
        "sessionAffinity": "None"
    },
    "status": {
        "loadBalancer": {}
    }
}
  • Is the Service port you are trying to access listed in spec.ports[]?
  • Is the targetPort correct for your Pods (many Pods use a different port than their Service)? One way to cross-check the Pod side is sketched right after this list.
  • If you meant to use a numeric port, is it specified as a number (9376) rather than the string "9376"?
  • If you meant to use a named port, do your Pods expose a port with the same name?
  • Does the port's protocol match what the Pods expect?
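As promised above, here is one way to print the ports each Pod declares (a sketch; note that the example Deployment in this walkthrough declares no ports, which is fine for a numeric targetPort, but a named targetPort must match a declared, named containerPort):

kubectl get pods -l app=hostnames \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].ports}{"\n"}{end}'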

Does Service have Endpoints?

If you've gotten this far, you've confirmed that your Service is properly defined and resolvable via DNS. Now, let's check that the Pod you're running is indeed picked up by the Service.

Earlier we saw that the Pods were running. We can check again:

kubectl get pods -l app=hostnames
NAME                        READY     STATUS    RESTARTS   AGE
hostnames-632524106-bbpiw   1/1       Running   0          1h
hostnames-632524106-ly40y   1/1       Running   0          1h
hostnames-632524106-tlaok   1/1       Running   0          1h

The -l app=hostnames argument is the label selector configured on the Service.

The "AGE" column shows that these pods have been up for an hour, which means they are running fine without crashing.

The "RESTARTS" column indicates that the Pod is not crashing or restarting frequently. Frequent crashes can cause intermittent connection issues. If the number of restarts is too large, learn about related technologies by debugging Pods.

There is a control loop in the Kubernetes system that evaluates the selectors for each Service and saves the result into an Endpoints object.

kubectl get endpoints hostnames
NAME        ENDPOINTS
hostnames   10.244.0.5:9376,10.244.0.6:9376,10.244.0.7:9376

This confirms that the endpoints controller has found the correct Pods for your Service. If the ENDPOINTS column shows <none>, you should check the spec.selector field of the Service and compare it with the metadata.labels values on the Pods you actually want to select. A common mistake is a typo or other mismatch, such as the Service selecting app=hostnames while the Deployment specifies run=hostnames. (In versions prior to 1.18, kubectl run could also be used to create a Deployment.)
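A quick way to compare the two sides (a sketch):

# What the Service's selector is
kubectl get svc hostnames -o jsonpath='{.spec.selector}{"\n"}'

# What labels the Pods actually carry (compare by eye)
kubectl get pods --show-labels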

Are the pods working fine?

At this point, you know that your Service exists and has matched your Pods. At the beginning of this walkthrough, you verified the Pods themselves. Let's check again that the Pods are actually working - you can bypass the Service mechanism and go straight to the Pods, as listed by the Endpoints above.

Note:

These commands use the Pod port (9376), not the Service port (80).

Run in the Pod:

for ep in 10.244.0.5:9376 10.244.0.6:9376 10.244.0.7:9376; do
    wget -qO- $ep
done

The output should look something like this:

hostnames-632524106-bbpiw
hostnames-632524106-ly40y
hostnames-632524106-tlaok

You expect each Pod in the endpoints list to return its own hostname. If that's not what happens (or whatever the correct behavior is for your own Pods), you should investigate what's going on there.

Is kube-proxy working properly?

If you get here, your Service is running, has Endpoints, and your Pods are actually serving. At this point, the whole Service proxy mechanism is suspect. Let's confirm it, piece by piece.

The default implementation of Services, used on most clusters, is kube-proxy. This is a program that runs on every node and configures one of a small set of mechanisms that provide the Service abstraction. If your cluster does not use kube-proxy, the following sections will not apply, and you will have to investigate whatever Service implementation you are using.

Is kube-proxy running?

Confirm that kube-proxy is running on your nodes. Running directly on a node, you should get output similar to the following:

ps auxw | grep kube-proxy
root  4194  0.4  0.1 101864 17696 ?    Sl Jul04  25:43 /usr/local/bin/kube-proxy --master=https://kubernetes-master --kubeconfig=/var/lib/kube-proxy/kubeconfig --v=2

Next, confirm that it is not failing in an obvious way, such as being unable to contact the master. To do this, you have to look at the logs. How you access the logs depends on your node's operating system. On some it is a file, such as /var/log/kube-proxy.log, while others use journalctl to access logs. You should see output similar to:

I1027 22:14:53.995134    5063 server.go:200] Running in resource-only container "/kube-proxy"
I1027 22:14:53.998163    5063 server.go:247] Using iptables Proxier.
I1027 22:14:54.038140    5063 proxier.go:352] Setting endpoints for "kube-system/kube-dns:dns-tcp" to [10.244.1.3:53]
I1027 22:14:54.038164    5063 proxier.go:352] Setting endpoints for "kube-system/kube-dns:dns" to [10.244.1.3:53]
I1027 22:14:54.038209    5063 proxier.go:352] Setting endpoints for "default/kubernetes:https" to [10.240.0.2:443]
I1027 22:14:54.038238    5063 proxier.go:429] Not syncing iptables until Services and Endpoints have been received from master
I1027 22:14:54.040048    5063 proxier.go:294] Adding new service "default/kubernetes:https" at 10.0.0.1:443/TCP
I1027 22:14:54.040154    5063 proxier.go:294] Adding new service "kube-system/kube-dns:dns" at 10.0.0.10:53/UDP
I1027 22:14:54.040223    5063 proxier.go:294] Adding new service "kube-system/kube-dns:dns-tcp" at 10.0.0.10:53/TCP

If you see an error message about being unable to connect to the master node, you should double-check the node configuration and installation steps.
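On many clusters kube-proxy runs as a DaemonSet rather than a bare process on the host, in which case its logs are easiest to reach through kubectl (a sketch; the k8s-app=kube-proxy label is conventional for kubeadm-style installs but not guaranteed):

kubectl get pods -n kube-system -l k8s-app=kube-proxy -o wide
kubectl logs -n kube-system <KUBE-PROXY-POD-NAME>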

One possible reason kube-proxy fails to run correctly is that the required conntrack binary cannot be found. This can happen on some Linux systems, depending on how you installed the cluster, for example if you installed Kubernetes from scratch. If this is the case, you need to manually install the conntrack package (for example, sudo apt install conntrack on Ubuntu) and try again.

Kube-proxy can run in one of several modes. In the log above, the line Using iptables Proxier indicates that kube-proxy is running in "iptables" mode. The other most common mode is "ipvs".
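If kube-proxy's metrics endpoint is reachable on the node (it binds to 127.0.0.1:10249 by default), you can also ask it directly which mode it is using; a hedged check from the node itself:

curl -s http://localhost:10249/proxyMode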

Iptables mode

In "iptables" mode, you should see the following output on the node:

iptables-save | grep hostnames
-A KUBE-SEP-57KPRZ3JQVENLNBR -s 10.244.3.6/32 -m comment --comment "default/hostnames:" -j MARK --set-xmark 0x00004000/0x00004000
-A KUBE-SEP-57KPRZ3JQVENLNBR -p tcp -m comment --comment "default/hostnames:" -m tcp -j DNAT --to-destination 10.244.3.6:9376
-A KUBE-SEP-WNBA2IHDGP2BOBGZ -s 10.244.1.7/32 -m comment --comment "default/hostnames:" -j MARK --set-xmark 0x00004000/0x00004000
-A KUBE-SEP-WNBA2IHDGP2BOBGZ -p tcp -m comment --comment "default/hostnames:" -m tcp -j DNAT --to-destination 10.244.1.7:9376
-A KUBE-SEP-X3P2623AGDH6CDF3 -s 10.244.2.3/32 -m comment --comment "default/hostnames:" -j MARK --set-xmark 0x00004000/0x00004000
-A KUBE-SEP-X3P2623AGDH6CDF3 -p tcp -m comment --comment "default/hostnames:" -m tcp -j DNAT --to-destination 10.244.2.3:9376
-A KUBE-SERVICES -d 10.0.1.175/32 -p tcp -m comment --comment "default/hostnames: cluster IP" -m tcp --dport 80 -j KUBE-SVC-NWV5X2332I4OT4T3
-A KUBE-SVC-NWV5X2332I4OT4T3 -m comment --comment "default/hostnames:" -m statistic --mode random --probability 0.33332999982 -j KUBE-SEP-WNBA2IHDGP2BOBGZ
-A KUBE-SVC-NWV5X2332I4OT4T3 -m comment --comment "default/hostnames:" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-X3P2623AGDH6CDF3
-A KUBE-SVC-NWV5X2332I4OT4T3 -m comment --comment "default/hostnames:" -j KUBE-SEP-57KPRZ3JQVENLNBR

For each port of each Service, there should be 1 rule in KUBE-SERVICES and one KUBE-SVC-<hash> chain. For each Pod endpoint, there should be a small number of rules in that KUBE-SVC-<hash> chain and one corresponding KUBE-SEP-<hash> chain with a few rules in it. The exact rules will vary depending on your configuration (including node-ports and load-balancers).
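To follow a single Service through these chains by hand, you can grep for the chain names referenced by its KUBE-SERVICES rule (the hashes below are the ones from the example output above; yours will differ):

# The Service-level chain for default/hostnames
iptables-save | grep KUBE-SVC-NWV5X2332I4OT4T3

# One of its per-endpoint chains
iptables-save | grep KUBE-SEP-WNBA2IHDGP2BOBGZ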

IPVS mode

In "ipvs" mode, you should see the following output under the node:

ipvsadm -ln
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
...
TCP  10.0.1.175:80 rr
  -> 10.244.0.5:9376               Masq    1      0          0
  -> 10.244.0.6:9376               Masq    1      0          0
  -> 10.244.0.7:9376               Masq    1      0          0
...

For each port of each Service, plus any NodePorts, external IPs, and load-balancer IPs, kube-proxy creates a virtual server. For each Pod endpoint, it creates a corresponding real server. In this example, the hostnames Service (10.0.1.175:80) has 3 endpoints (10.244.0.5:9376, 10.244.0.6:9376, and 10.244.0.7:9376).

Is kube-proxy performing proxy operations?

Assuming you do encounter one of the above situations, please retry accessing your Service via IP from the node:

curl 10.0.1.175:80
hostnames-632524106-bbpiw

If this still fails, look at the kube-proxy logs for specific lines such as:

Setting endpoints for default/hostnames:default to [10.244.0.5:9376 10.244.0.6:9376 10.244.0.7:9376]

If you don't see these, try restarting kube-proxy with the -v flag set to 4, and then look at the logs again.
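How you raise the verbosity depends on how kube-proxy is managed. If it runs as the usual kube-system DaemonSet (an assumption; adjust for your installation), one approach is:

# Add --v=4 to the kube-proxy container's arguments
kubectl -n kube-system edit daemonset kube-proxy

# Then delete the kube-proxy Pod on the node you are debugging so it restarts with the new flag
kubectl -n kube-system delete pod <KUBE-PROXY-POD-NAME>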

Edge case: Pod cannot connect to itself via Service IP

This might sound unlikely, but it can happen, and it should work.

This can happen when the network is not properly configured for "hairpin" traffic, usually when kube-proxy is running in iptables mode and Pods are connected via a bridge network. The kubelet exposes a hairpin-mode flag that allows endpoints of a Service to load-balance back to themselves if they try to access their own Service VIP. The hairpin-mode flag must be set to either hairpin-veth or promiscuous-bridge.

Common steps in diagnosing this type of problem are as follows:

  • Confirm that hairpin-mode is set to hairpin-veth or promiscuous-bridge. You should see something like the output below. In this example, hairpin-mode is set to promiscuous-bridge.

    ps auxw | grep kubelet
    
    root      3392  1.1  0.8 186804 65208 ?        Sl   00:51  11:11 /usr/local/bin/kubelet --enable-debugging-handlers=true --config=/etc/kubernetes/manifests --allow-privileged=True --v=4 --cluster-dns=10.0.0.10 --cluster-domain=cluster.local --configure-cbr0=true --cgroup-root=/ --system-cgroups=/system --hairpin-mode=promiscuous-bridge --runtime-cgroups=/docker-daemon --kubelet-cgroups=/kubelet --babysit-daemons=true --max-pods=110 --serialize-image-pulls=false --outofdisk-transition-frequency=0
    
  • Confirm the effective hairpin-mode. To do this, you have to look at the kubelet logs. How you access the logs depends on the node's operating system. On some it is a file, such as /var/log/kubelet.log, while others use journalctl to access logs. Note that the effective hairpin mode may not match the --hairpin-mode flag, for compatibility reasons. Check kubelet.log for log lines with the keyword hairpin. There should be a line indicating the effective hairpin mode, like the one below.

    I0629 00:51:43.648698    3252 kubelet.go:380] Hairpin mode set to "promiscuous-bridge"
    
  • If the effective hairpin mode is hairpin-veth, ensure that the kubelet has permission to operate in /sys on the node. If everything is working properly, you should see the following output:

    for intf in /sys/devices/virtual/net/cbr0/brif/*; do cat $intf/hairpin_mode; done
    
    1
    1
    1
    1
    
  • If the effective hairpin mode is promiscuous-bridge, ensure that the kubelet has permission to manipulate the Linux bridge on the node. If the cbr0 bridge is used and properly configured, you should see:

    ifconfig cbr0 |grep PROMISC
    
    UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1460  Metric:1
    
  • If none of the steps above resolve the issue, please seek assistance.
