For a newly installed Kubernetes cluster, a common problem is that a Service is not working properly. You've run your Pods through a Deployment (or other workload controller) and created a Service, but nothing responds when you try to access it. Hopefully this document will help you figure out what's going wrong.
Running commands in a Pod
For many of the steps here, you'll probably want to know what a Pod running in your cluster looks like. The easiest way is to run an interactive busybox pod:
kubectl run -it --rm --restart=Never busybox --image=gcr.io/google-containers/busybox sh
Note: If you don't see a command prompt, try pressing Enter.
If you already have a running pod that you want to use, you can run the following command to access it:
kubectl exec <POD-NAME> -c <CONTAINER-NAME> -- <COMMAND>
Setup
For the purposes of this walkthrough, let's run some Pods. Since you're probably debugging your own Service, you can substitute your own details, or you can follow along and get a second data point.
kubectl create deployment hostnames --image=registry.k8s.io/serve_hostname
deployment.apps/hostnames created
kubectl commands like this print the type and name of the resource created or changed, which can then be used in subsequent commands. Let's scale the Deployment up to 3 replicas.
kubectl scale deployment hostnames --replicas=3
deployment.apps/hostnames scaled
Note that this is similar to how you start a Deployment with the following YAML:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: hostnames
  name: hostnames
spec:
  selector:
    matchLabels:
      app: hostnames
  replicas: 3
  template:
    metadata:
      labels:
        app: hostnames
    spec:
      containers:
      - name: hostnames
        image: registry.k8s.io/serve_hostname
The "app" label is set automatically by kubectl create deployment, based on the Deployment name.
Make sure your Pod is running:
kubectl get pods -l app=hostnames
NAME READY STATUS RESTARTS AGE
hostnames-632524106-bbpiw 1/1 Running 0 2m
hostnames-632524106-ly40y 1/1 Running 0 2m
hostnames-632524106-tlaok 1/1 Running 0 2m
You can also confirm that your Pods are serving. You can get a list of Pod IP addresses and test them directly:
kubectl get pods -l app=hostnames \
    -o go-template='{{range .items}}{{.status.podIP}}{{"\n"}}{{end}}'
10.244.0.5
10.244.0.6
10.244.0.7
The example container used for this tutorial serves its own hostname over HTTP on port 9376, but if you want to debug your own application, you'll need to use the port number your Pod is listening on.
Run inside the Pod:
for ep in 10.244.0.5:9376 10.244.0.6:9376 10.244.0.7:9376; do
wget -qO- $ep
done
The output looks like this:
hostnames-632524106-bbpiw
hostnames-632524106-ly40y
hostnames-632524106-tlaok
If you don't get the expected responses at this point, your Pods may not be healthy, or they may not be listening on the port you think they are. You may find kubectl logs useful to see what's going on, or you may need to kubectl exec directly into your Pods and debug from there.
Assuming everything has gone according to plan so far, you can start investigating why your Service isn't working as it should.
Does the Service exist?
The astute reader will have noticed that we haven't actually created a Service yet -- this is intentional. This is a step that sometimes gets forgotten, and it is the first thing to check.
So what happens when you try to access a Service that doesn't exist? Assuming you have another Pod that consumes this Service by name, you would get something like:
wget -O- hostnames
Resolving hostnames (hostnames)... failed: Name or service not known.
wget: unable to resolve host address 'hostnames'
The first thing to check is whether the Service actually exists:
kubectl get svc hostnames
No resources found.
Error from server (NotFound): services "hostnames" not found
Let's create the Service. As before, this is just for the walkthrough -- you can substitute your own Service's details here.
kubectl expose deployment hostnames --port=80 --target-port=9376
service/hostnames exposed
Rerun the query command:
kubectl get svc hostnames
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hostnames ClusterIP 10.0.1.175 <none> 80/TCP 5s
Now you know that Service does exist.
As before, this step has the same effect as starting the Service through YAML:
apiVersion: v1
kind: Service
metadata:
  labels:
    app: hostnames
  name: hostnames
spec:
  selector:
    app: hostnames
  ports:
  - name: default
    protocol: TCP
    port: 80
    targetPort: 9376
To highlight the full range of configuration, the Service you created here uses a different port number than the Pods. For many real-world Services, these values may be the same.
Does any Network Policy Ingress rule affect the target Pods?
If you have deployed any Network Policy Ingress rules that could affect incoming traffic to hostnames-* Pods, they need to be reviewed.
Is the Service reachable by DNS name?
Clients usually access a Service via its DNS name.
Run the following command from a Pod in the same namespace:
nslookup hostnames
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local
Name: hostnames
Address 1: 10.0.1.175 hostnames.default.svc.cluster.local
If that fails, your Pod and Service may be in different namespaces; try a namespace-qualified name (again from inside a Pod):
nslookup hostnames.default
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local
Name: hostnames.default
Address 1: 10.0.1.175 hostnames.default.svc.cluster.local
If successful, then you need to adjust your application to use a cross-namespace name to access it, or run the application and Service in the same namespace. If that still fails, try a fully qualified name:
nslookup hostnames.default.svc.cluster.local
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local
Name: hostnames.default.svc.cluster.local
Address 1: 10.0.1.175 hostnames.default.svc.cluster.local
Note the suffix here: "default.svc.cluster.local". "default" is the namespace we are operating in. "svc" indicates that this is a Service. "cluster.local" is your cluster domain, it may be different in your own cluster.
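The fully qualified name is assembled mechanically from the Service name, its namespace, and the cluster domain. A minimal sketch, assuming the default "cluster.local" domain:

```python
def service_fqdn(service, namespace="default", cluster_domain="cluster.local"):
    """Build the fully qualified DNS name for a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"

print(service_fqdn("hostnames"))  # hostnames.default.svc.cluster.local
print(service_fqdn("kube-dns", namespace="kube-system"))  # kube-dns.kube-system.svc.cluster.local
```

If your cluster uses a different domain, substitute it for cluster.local throughout.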
You can also try this on a node in the cluster:
Note: 10.0.0.10 is the cluster's DNS Service IP; yours may be different.
nslookup hostnames.default.svc.cluster.local 10.0.0.10
Server: 10.0.0.10
Address: 10.0.0.10#53
Name: hostnames.default.svc.cluster.local
Address: 10.0.1.175
If you are able to do a fully-qualified-name lookup but not a relative one, you need to check that the /etc/resolv.conf file in your Pod is correct. Run the following command in the Pod:
cat /etc/resolv.conf
You should see output similar to this:
nameserver 10.0.0.10
search default.svc.cluster.local svc.cluster.local cluster.local example.com
options ndots:5
The nameserver line must point to your cluster's DNS Service, which is passed to the kubelet via the --cluster-dns flag.
The search line must include an appropriate suffix for you to look up the Service name. In this case it looks for Services in the local namespace (default.svc.cluster.local), in all namespaces (svc.cluster.local), and lastly for names in the cluster as a whole (cluster.local). Depending on your own installation, there may be additional records after that (up to 6 total). The cluster suffix is passed to the kubelet via the --cluster-domain flag. Throughout this document we assume the suffix is "cluster.local"; your cluster may be configured differently, in which case you should change it in all of the commands above.
The options line must set ndots high enough that your DNS client library considers the search paths at all. Kubernetes sets this to 5 by default, which is high enough to cover all of the DNS names it generates.
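The interaction between search and ndots can be sketched as follows: a relative name with fewer dots than ndots is tried against each search suffix before being tried as-is, which is how the bare name "hostnames" resolves inside the default namespace. A simplified illustration (not a real resolver):

```python
def candidate_names(name, search_domains, ndots=5):
    """Return the order in which a stub resolver would try a name.

    Names with fewer than `ndots` dots are expanded with each search
    suffix first; a trailing dot marks the name as fully qualified.
    """
    if name.endswith("."):  # already fully qualified: no search expansion
        return [name]
    candidates = []
    if name.count(".") < ndots:
        candidates += [f"{name}.{d}" for d in search_domains]
    candidates.append(name)
    return candidates

search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
print(candidate_names("hostnames", search))
```

The first candidate tried is hostnames.default.svc.cluster.local, which is why the relative lookup works when resolv.conf is correct, and fails when the search line or ndots value is wrong.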
Does any Service work in DNS?
If the above still fails -- DNS lookups do not find the Service you need -- you can take a step back and see what else is not working. The Kubernetes master Service should always work. Run the following command in the Pod:
nslookup kubernetes.default
Server: 10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local
Name: kubernetes.default
Address 1: 10.0.0.1 kubernetes.default.svc.cluster.local
If that fails, you may need to go to the kube-proxy section of this document, or even go back to the top of the document and start over -- but instead of debugging your own Service, debug the DNS Service.
Can the Service be accessed by IP?
Assuming you have confirmed that DNS works, the next thing to test is whether your Service can be accessed by its IP address. From a Pod in your cluster, access the Service's IP (obtained from the kubectl get svc command above):
for i in $(seq 1 3); do
wget -qO- 10.0.1.175:80
done
The output should look something like this:
hostnames-632524106-bbpiw
hostnames-632524106-ly40y
hostnames-632524106-tlaok
If the Service status is OK, you should get the correct response. If not, there are many things that could go wrong, so read on.
Is the Service configured correctly?
It might sound silly, but you should really double and triple check that your Service is correct and matches your Pods. Read back your Service configuration and verify it:
kubectl get service hostnames -o json
{
    "kind": "Service",
    "apiVersion": "v1",
    "metadata": {
        "name": "hostnames",
        "namespace": "default",
        "uid": "428c8b6c-24bc-11e5-936d-42010af0a9bc",
        "resourceVersion": "347189",
        "creationTimestamp": "2015-07-07T15:24:29Z",
        "labels": {
            "app": "hostnames"
        }
    },
    "spec": {
        "ports": [
            {
                "name": "default",
                "protocol": "TCP",
                "port": 80,
                "targetPort": 9376,
                "nodePort": 0
            }
        ],
        "selector": {
            "app": "hostnames"
        },
        "clusterIP": "10.0.1.175",
        "type": "ClusterIP",
        "sessionAffinity": "None"
    },
    "status": {
        "loadBalancer": {}
    }
}
- Is the Service port you are trying to access listed in spec.ports[]?
- Is the targetPort correct for your Pods (many Pods use a different port than the Service)?
- If you meant to use a numeric port, is it a number (9376) rather than the string "9376"?
- If you meant to use a named port, do your Pods expose a port with the same name?
- Is the port's protocol correct for your Pods?
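The numeric-vs-string distinction in targetPort matters because a number is used directly, while a string is treated as a port name to look up in the Pod spec. A simplified sketch of that resolution logic (an illustration, not the real apiserver code):

```python
def resolve_target_port(target_port, container_ports):
    """Resolve a Service targetPort against a Pod's container ports.

    A number is used as-is; a string is looked up as a named port.
    """
    if isinstance(target_port, int):
        return target_port
    for p in container_ports:
        if p.get("name") == target_port:
            return p["containerPort"]
    raise ValueError(f"no container port named {target_port!r}")

# Hypothetical Pod spec fragment with one named port.
ports = [{"name": "http", "containerPort": 9376}]
print(resolve_target_port(9376, ports))    # numeric: used directly
print(resolve_target_port("http", ports))  # named: looked up in the Pod
```

Both calls resolve to 9376 here, but a string like "9376" with no matching port name would fail, which is why the number/string distinction is worth triple-checking.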
Does the Service have Endpoints?
If you got this far, you have confirmed that your Service is correctly defined and resolves via DNS. Now let's check that the Pods you ran are actually being selected by the Service.
Earlier we saw that the Pods were running. We can check again:
kubectl get pods -l app=hostnames
NAME READY STATUS RESTARTS AGE
hostnames-632524106-bbpiw 1/1 Running 0 1h
hostnames-632524106-ly40y 1/1 Running 0 1h
hostnames-632524106-tlaok 1/1 Running 0 1h
The -l app=hostnames argument is a label selector configured on the Service.
The "AGE" column says these Pods have been up for about an hour, which implies they are running fine and not crashing.
The "RESTARTS" column says the Pods are not crashing or being restarted frequently. Frequent restarts can cause intermittent connectivity issues. If the restart count is high, read more about how to debug Pods.
There is a control loop in the Kubernetes system that evaluates the selectors for each Service and saves the result into an Endpoints object.
kubectl get endpoints hostnames
NAME ENDPOINTS
hostnames 10.244.0.5:9376,10.244.0.6:9376,10.244.0.7:9376
This confirms that the endpoints controller has found the correct Pods for your Service. If the ENDPOINTS column shows <none>, you should check that the spec.selector field of your Service actually selects the metadata.labels values on your Pods. A common mistake is a typo or other mismatch, such as the Service selecting app=hostnames while the Deployment specifies run=hostnames, as kubectl run could also be used to create Deployments in versions before 1.18.
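The control loop's selector matching is essentially a subset test on labels: every key/value pair in the Service selector must appear in the Pod's labels. A minimal sketch (an illustration of the matching rule, not the real controller):

```python
def matches(selector, labels):
    """True if every key/value in the Service selector appears in the Pod labels."""
    return all(labels.get(k) == v for k, v in selector.items())

selector = {"app": "hostnames"}
pods = [
    {"name": "hostnames-632524106-bbpiw", "labels": {"app": "hostnames"}},
    {"name": "mislabeled-pod", "labels": {"run": "hostnames"}},  # the typo'd label: not selected
]
endpoints = [p["name"] for p in pods if matches(selector, p["labels"])]
print(endpoints)  # ['hostnames-632524106-bbpiw']
```

Note how the mislabeled Pod (run=hostnames instead of app=hostnames) silently drops out of the endpoints list, which is exactly the <none> symptom described above.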
Are the Pods working?
At this point, you know your Service exists and is selecting your Pods. At the beginning of this walkthrough you verified the Pods themselves. Let's check again that the Pods are actually working -- you can bypass the Service mechanism and go straight to the Pods, as listed in the Endpoints above.
Note: these commands use the Pod port (9376), not the Service port (80).
Run in the Pod:
for ep in 10.244.0.5:9376 10.244.0.6:9376 10.244.0.7:9376; do
wget -qO- $ep
done
The output should look something like this:
hostnames-632524106-bbpiw
hostnames-632524106-ly40y
hostnames-632524106-tlaok
You expect each Pod in the endpoints list to return its own hostname. If this is not what happens (or whatever the correct behavior is for your own Pods), you should investigate what's happening there.
Is kube-proxy working properly?
If you get here, your Service is running, has Endpoints, and Pods are actually providing services. At this point, the entire Service proxy mechanism is suspect. Let's go step by step to confirm it's ok.
The default implementation of Service (as used on most clusters) is kube-proxy. This is a program that runs on each node and is responsible for configuring one of the mechanisms used to provide the Service abstraction. If your cluster does not use kube-proxy, the following sections will not apply and you will have to check the implementation of the Service you are using.
Is kube-proxy running?
Confirm that kube-proxy is running on your nodes. Running directly on a node, you should get output like the following:
ps auxw | grep kube-proxy
root 4194 0.4 0.1 101864 17696 ? Sl Jul04 25:43 /usr/local/bin/kube-proxy --master=https://kubernetes-master --kubeconfig=/var/lib/kube-proxy/kubeconfig --v=2
Next, confirm that it is not failing in some obvious way, like failing to contact the master node. To do this, you have to look at the logs. How you access the logs depends on your node's operating system. On some operating systems the log is a file, such as /var/log/kube-proxy.log, while others use journalctl to access logs. You should see output like:
I1027 22:14:53.995134 5063 server.go:200] Running in resource-only container "/kube-proxy"
I1027 22:14:53.998163 5063 server.go:247] Using iptables Proxier.
I1027 22:14:54.038140 5063 proxier.go:352] Setting endpoints for "kube-system/kube-dns:dns-tcp" to [10.244.1.3:53]
I1027 22:14:54.038164 5063 proxier.go:352] Setting endpoints for "kube-system/kube-dns:dns" to [10.244.1.3:53]
I1027 22:14:54.038209 5063 proxier.go:352] Setting endpoints for "default/kubernetes:https" to [10.240.0.2:443]
I1027 22:14:54.038238 5063 proxier.go:429] Not syncing iptables until Services and Endpoints have been received from master
I1027 22:14:54.040048 5063 proxier.go:294] Adding new service "default/kubernetes:https" at 10.0.0.1:443/TCP
I1027 22:14:54.040154 5063 proxier.go:294] Adding new service "kube-system/kube-dns:dns" at 10.0.0.10:53/UDP
I1027 22:14:54.040223 5063 proxier.go:294] Adding new service "kube-system/kube-dns:dns-tcp" at 10.0.0.10:53/TCP
If you see an error message about being unable to connect to the master node, you should double-check the node configuration and installation steps.
One possible reason kube-proxy fails to run correctly is that the required conntrack binary cannot be found. This can happen on some Linux systems, depending on how you installed the cluster, for example if you installed Kubernetes from scratch by hand. If this is the case, you need to manually install the conntrack package (for example, sudo apt install conntrack on Ubuntu) and try again.
Kube-proxy can run in one of several modes. In the log above, the line Using iptables Proxier indicates that kube-proxy is running in "iptables" mode. The other most common mode is "ipvs".
Iptables mode
In "iptables" mode, you should see the following output on the node:
iptables-save | grep hostnames
-A KUBE-SEP-57KPRZ3JQVENLNBR -s 10.244.3.6/32 -m comment --comment "default/hostnames:" -j MARK --set-xmark 0x00004000/0x00004000
-A KUBE-SEP-57KPRZ3JQVENLNBR -p tcp -m comment --comment "default/hostnames:" -m tcp -j DNAT --to-destination 10.244.3.6:9376
-A KUBE-SEP-WNBA2IHDGP2BOBGZ -s 10.244.1.7/32 -m comment --comment "default/hostnames:" -j MARK --set-xmark 0x00004000/0x00004000
-A KUBE-SEP-WNBA2IHDGP2BOBGZ -p tcp -m comment --comment "default/hostnames:" -m tcp -j DNAT --to-destination 10.244.1.7:9376
-A KUBE-SEP-X3P2623AGDH6CDF3 -s 10.244.2.3/32 -m comment --comment "default/hostnames:" -j MARK --set-xmark 0x00004000/0x00004000
-A KUBE-SEP-X3P2623AGDH6CDF3 -p tcp -m comment --comment "default/hostnames:" -m tcp -j DNAT --to-destination 10.244.2.3:9376
-A KUBE-SERVICES -d 10.0.1.175/32 -p tcp -m comment --comment "default/hostnames: cluster IP" -m tcp --dport 80 -j KUBE-SVC-NWV5X2332I4OT4T3
-A KUBE-SVC-NWV5X2332I4OT4T3 -m comment --comment "default/hostnames:" -m statistic --mode random --probability 0.33332999982 -j KUBE-SEP-WNBA2IHDGP2BOBGZ
-A KUBE-SVC-NWV5X2332I4OT4T3 -m comment --comment "default/hostnames:" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-X3P2623AGDH6CDF3
-A KUBE-SVC-NWV5X2332I4OT4T3 -m comment --comment "default/hostnames:" -j KUBE-SEP-57KPRZ3JQVENLNBR
For each port of each Service, there should be one rule in the KUBE-SERVICES chain and one KUBE-SVC-<hash> chain. For each Pod endpoint, there should be a rule in that KUBE-SVC-<hash> chain jumping to a corresponding KUBE-SEP-<hash> chain, which contains a few rules of its own. The exact number of rules will vary based on your configuration (including NodePort and LoadBalancer Services).
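The --probability values in the KUBE-SVC chain above look uneven (roughly 1/3, then 1/2, then unconditional), but they produce a uniform choice, because each rule only sees the traffic that earlier rules did not match. A sketch of the arithmetic (an illustration, not kube-proxy code):

```python
import random

def pick_endpoint(endpoints):
    """Mimic kube-proxy's sequential `-m statistic --mode random` rules.

    With n endpoints, rule i matches with probability 1/(n - i) of the
    remaining traffic, which makes every endpoint equally likely overall:
    1/3, then (2/3)*(1/2) = 1/3, then the final 1/3 falls through.
    """
    n = len(endpoints)
    for i, ep in enumerate(endpoints[:-1]):
        if random.random() < 1.0 / (n - i):
            return ep
    return endpoints[-1]  # final rule matches unconditionally

random.seed(0)  # deterministic for the demonstration
counts = {ep: 0 for ep in ["10.244.1.7", "10.244.2.3", "10.244.3.6"]}
for _ in range(30000):
    counts[pick_endpoint(list(counts))] += 1
print(counts)  # each endpoint gets roughly 10000 of 30000 connections
```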
IPVS mode
In "ipvs" mode, you should see output like the following on a node:
ipvsadm -ln
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
...
TCP 10.0.1.175:80 rr
-> 10.244.0.5:9376 Masq 1 0 0
-> 10.244.0.6:9376 Masq 1 0 0
-> 10.244.0.7:9376 Masq 1 0 0
...
For each port of each Service, plus any NodePorts, external IPs, and load-balancer IPs, kube-proxy creates a virtual server. For each Pod endpoint, it creates a corresponding real server. In this example, the hostnames Service (10.0.1.175:80) has 3 endpoints (10.244.0.5:9376, 10.244.0.6:9376, and 10.244.0.7:9376).
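The rr in the ipvsadm output above is the round-robin scheduler: successive connections to the virtual server are handed to the real servers in rotation. A toy sketch of that behavior (an illustration, not IPVS itself):

```python
from itertools import cycle

# Real servers from the ipvsadm output for the hostnames virtual server.
real_servers = ["10.244.0.5:9376", "10.244.0.6:9376", "10.244.0.7:9376"]

# The "rr" scheduler simply cycles through the real servers.
scheduler = cycle(real_servers)
sequence = [next(scheduler) for _ in range(6)]
print(sequence)  # each backend appears twice, in rotation
```

Other IPVS schedulers (such as least-connection) pick differently, but rr is the default and explains why repeated requests rotate through the Pods.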
Is kube-proxy performing proxy operations?
Assuming you do see one of the above cases, retry accessing your Service by IP from one of your nodes:
curl 10.0.1.175:80
hostnames-632524106-bbpiw
If this still fails, look at the kube-proxy logs for specific lines like:
Setting endpoints for default/hostnames:default to [10.244.0.5:9376 10.244.0.6:9376 10.244.0.7:9376]
If you don't see these, try restarting kube-proxy with the -v flag set to 4, and then look at the logs again.
Edge case: Pod cannot connect to itself via Service IP
This might sound unlikely, but it can happen, and it should work.
This can happen when the network is not properly configured for "hairpin" traffic, usually when kube-proxy is running in iptables mode and Pods are connected to a bridge network. The kubelet exposes a hairpin-mode flag that allows endpoints of a Service to load-balance back to themselves if they try to access their own Service VIP. The hairpin-mode flag must be set to either hairpin-veth or promiscuous-bridge.
Common steps in diagnosing this type of problem are as follows:
- Confirm that hairpin-mode is set to hairpin-veth or promiscuous-bridge. You should see something like the following. In this example, hairpin-mode is set to promiscuous-bridge:
ps auxw | grep kubelet
root 3392 1.1 0.8 186804 65208 ? Sl 00:51 11:11 /usr/local/bin/kubelet --enable-debugging-handlers=true --config=/etc/kubernetes/manifests --allow-privileged=True --v=4 --cluster-dns=10.0.0.10 --cluster-domain=cluster.local --configure-cbr0=true --cgroup-root=/ --system-cgroups=/system --hairpin-mode=promiscuous-bridge --runtime-cgroups=/docker-daemon --kubelet-cgroups=/kubelet --babysit-daemons=true --max-pods=110 --serialize-image-pulls=false --outofdisk-transition-frequency=0
- Confirm the effective hairpin-mode. To do this, you have to look at the kubelet logs. How you access the logs depends on the node's operating system. On some operating systems the log is a file, such as /var/log/kubelet.log, while others use journalctl to access logs. Note that the effective hairpin mode may not match the --hairpin-mode flag, for compatibility reasons. Check kubelet.log for log lines with the keyword hairpin. There should be lines indicating the effective hairpin mode, like the one below:
I0629 00:51:43.648698 3252 kubelet.go:380] Hairpin mode set to "promiscuous-bridge"
- If the effective hairpin mode is hairpin-veth, ensure that the Kubelet has permission to operate in /sys on the node. If everything works properly, you should see something like:
for intf in /sys/devices/virtual/net/cbr0/brif/*; do cat $intf/hairpin_mode; done
1 1 1 1
- If the effective hairpin mode is promiscuous-bridge, ensure that the Kubelet has permission to manipulate the Linux bridge on the node. If the cbr0 bridge is used and configured properly, you should see:
ifconfig cbr0 |grep PROMISC
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1460 Metric:1
- If none of the steps above resolve the issue, please seek assistance.