minikube DNS fails after SRV query

minikube DNS fails after SRV query

(Jin Qing’s Column, Dec., 2021)

My program is using K8s DNS SRV query to discovery service,
and when it’s deployed on minikube, I find DNS failure.

I can use nslookup to reproduce the failure.

Querying a FQDN is OK. But after querying a non-existing SRV short name, the ping fails.

root@web-0:/# ping google.com
PING google.com (142.250.66.110) 56(84) bytes of data.
64 bytes from hkg12s28-in-f14.1e100.net (142.250.66.110): icmp_seq=1 ttl=108 time=33.7 ms
64 bytes from hkg12s28-in-f14.1e100.net (142.250.66.110): icmp_seq=2 ttl=108 time=33.8 ms
^C
--- google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 33.779/33.834/33.889/0.055 ms
root@web-0:/# nslookup
> set type=srv
> nosuch-nosuch-nosuch-1234567890abcdefg.cn
Server:         10.96.0.10
Address:        10.96.0.10#53

** server can't find nosuch-nosuch-nosuch-1234567890abcdefg.cn: NXDOMAIN
> exit

root@web-0:/# ping google.com
PING google.com (142.250.66.110) 56(84) bytes of data.
64 bytes from hkg12s28-in-f14.1e100.net (142.250.66.110): icmp_seq=1 ttl=108 time=33.7 ms
64 bytes from hkg12s28-in-f14.1e100.net (142.250.66.110): icmp_seq=2 ttl=108 time=33.7 ms
^C
--- google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 33.730/33.735/33.741/0.183 ms
root@web-0:/# nslookup
> set type=srv
> nginx-wrong
Server:         10.96.0.10
Address:        10.96.0.10#53

** server can't find nginx-wrong: SERVFAIL
> exit

root@web-0:/# ping google.com
ping: unknown host google.com
root@web-0:/# 

The ping will recover to normal after about 1 minute.

If I query a existing internal service name, and nslookup returns correctly, then DNS is OK after I quit nslookup.

root@web-0:/# ping google.com
PING google.com (142.250.66.110) 56(84) bytes of data.
64 bytes from hkg12s28-in-f14.1e100.net (142.250.66.110): icmp_seq=1 ttl=108 time=33.6 ms
64 bytes from hkg12s28-in-f14.1e100.net (142.250.66.110): icmp_seq=2 ttl=108 time=34.8 ms
^C
--- google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 33.648/34.260/34.872/0.612 ms
root@web-0:/# nslookup
> set type=srv
> nginx
Server:         10.96.0.10
Address:        10.96.0.10#53

nginx.default.svc.cluster.local service = 0 25 80 web-1.nginx.default.svc.cluster.local.
nginx.default.svc.cluster.local service = 0 25 80 web-2.nginx.default.svc.cluster.local.
nginx.default.svc.cluster.local service = 0 25 80 web-0.nginx.default.svc.cluster.local.
nginx.default.svc.cluster.local service = 0 25 80 web-3.nginx.default.svc.cluster.local.
> exit

root@web-0:/# ping google.com
PING google.com (142.250.66.110) 56(84) bytes of data.
64 bytes from hkg12s28-in-f14.1e100.net (142.250.66.110): icmp_seq=1 ttl=108 time=33.5 ms
^C
--- google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 33.529/33.529/33.529/0.000 ms
root@web-0:/# 

When DNS fails, the whole cluster can not query any domain name outside,
but internal name is OK.

https://github.com/kubernetes/minikube/issues/13137

Guess you like

Origin blog.csdn.net/jq0123/article/details/121850921