Problem discovery:
Over the last two days I have been migrating the development environments of all of the company's projects from the old low-version cluster to a 1.9 cluster. After a whole morning's work the code had been pushed into place via jenkins, yet testing showed most projects would not run. On inspection, pods could not resolve domain names and could only reach the outside world by IP. Since nearly all inter-service calls in the current environment go through domain names, this had to be fixed.
Note:
In the DNS records kept inside k8s, the FQDN consists of five parts, and the corresponding IP is the svc's IP:
#Format as follows; the trailing two labels, cluster.local, are the value of the domain parameter given to kubelet at startup
SVC_NAME.NAMESPACE.svc.cluster.local SVC_IP
#For example:
kube-dns.kube-system.svc.cluster.local 10.96.0.10
#Inside the cluster you can use kube-dns.kube-system.svc.cluster.local; the last two labels can be omitted by default, so kube-dns.kube-system works as a hostname as well
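The naming scheme above can be sketched as a tiny shell snippet (the variable names here are only for illustration):

```shell
# Assemble a service FQDN from its five parts; cluster.local is whatever
# domain the kubelet was started with.
SVC_NAME=kube-dns
NAMESPACE=kube-system
CLUSTER_DOMAIN=cluster.local
FQDN="${SVC_NAME}.${NAMESPACE}.svc.${CLUSTER_DOMAIN}"
echo "$FQDN"   # kube-dns.kube-system.svc.cluster.local
```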
Into the rabbit hole:
Inside a pod, pinging a domain name fails while pinging an IP works:
root@wishadmintest-7bdd887b68-sgmcp:/# ping www.baidu.com
ping: unknown host www.baidu.com
root@wishadmintest-7bdd887b68-sgmcp:/# ping 114.114.114.114
PING 114.114.114.114 (114.114.114.114) 56(84) bytes of data.
64 bytes from 114.114.114.114: icmp_seq=1 ttl=85 time=26.8 ms
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 26.588/26.727/26.866/0.139 ms
The coredns configuration checks out fine:
root@yksp009029:~# kubectl edit configmap coredns -n kube-system
data:
  Corefile: |
    .:53 {
        errors
        log
        health
        kubernetes cluster.local 10.96.0.0/12 {
           pods insecure
        }
        prometheus
        proxy . /etc/resolv.conf
        cache 30
    }
The svc and endpoints also check out fine:
root@yksp009029:~# kubectl get ep kube-dns -n kube-system
NAME ENDPOINTS AGE
kube-dns 172.26.0.162:53,172.26.2.171:53,172.26.0.162:53 + 1 more... 1h
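Beyond the endpoint list, it is worth confirming the dns pods themselves are Running and not logging errors; a sketch of those checks (the k8s-app=kube-dns label selector is the one used by the manifest attached at the end):

```shell
# List the coredns pods behind the kube-dns service, with node and pod IP
kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
# Tail their logs for resolution or startup errors
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=20
```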
Run a busybox pod and use nslookup to check resolution; the dns service cannot resolve:
root@yksp009029:~# kubectl run busybox --rm -ti --image=busybox /bin/sh
If you don't see a command prompt, try pressing enter.
/ # nslookup kube-dns.kube-system
Server: 10.96.0.10
Address 1: 10.96.0.10
nslookup: can't resolve 'kube-dns.kube-system'
Back on the node, further digging turned up something truly odd:
#here is the svc info
root:~# kubectl get svc -o wide -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
heapster ClusterIP 10.111.110.137 <none> 80/TCP 67d k8s-app=heapster
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 68d k8s-app=kube-dns
kubernetes-dashboard NodePort 10.104.130.140 <none> 443:30843/TCP 67d k8s-app=kubernetes-dashboard
monitoring-grafana ClusterIP 10.109.146.115 192.168.9.60 80/TCP 67d k8s-app=grafana
monitoring-influxdb ClusterIP 10.101.110.69 <none> 8086/TCP 67d k8s-app=influxdb
traefik-web-ui ClusterIP 10.98.101.187 <none> 80/TCP 67d k8s-app=traefik-ingress-lb
#all SVCs work normally except kube-dns
root:~# telnet 10.111.110.137 80
Trying 10.111.110.137...
Connected to 10.111.110.137.
Escape character is '^]'.
^C
Connection closed by foreign host.
#The kube-dns svc port is not accepting connections, but the endpoint pod behind it is in fact listening
root:~# telnet 10.96.0.10 53
Trying 10.96.0.10...
telnet: Unable to connect to remote host: No route to host
#ping is ok
root:~# ping 10.96.0.10
PING 10.96.0.10 (10.96.0.10) 56(84) bytes of data.
64 bytes from 10.96.0.10: icmp_seq=1 ttl=64 time=0.055 ms
64 bytes from 10.96.0.10: icmp_seq=2 ttl=64 time=0.038 ms
#the kube-dns endpoint pod is in listening state
root:~# telnet 172.26.2.164 53
Trying 172.26.2.164...
Connected to 172.26.2.164.
Escape character is '^]'.
Connection closed by foreign host.
Every svc works normally with its ports reachable, except the kube-dns svc: its yaml defines it to serve on tcp/udp port 53, yet telnet shows port 53 is simply unreachable, while the endpoint pods behind the service work fine. Testing with dig confirms it: with the svc IP as the nameserver the query times out; with a pod IP it resolves normally:
dig @10.96.0.10 kubernetes-dashboard.kube-system
;; connection timed out; no servers could be reached
dig @172.26.2.164 kubernetes-dashboard.kube-system
;kubernetes-dashboard.kube-system. IN A
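To run the same comparison against every backend at once, one can loop over the endpoint IPs pulled via jsonpath (a sketch; assumes dig is installed on the node):

```shell
# Query each kube-dns endpoint pod directly; each should answer
# even while the service VIP 10.96.0.10 times out.
for ip in $(kubectl get ep kube-dns -n kube-system \
    -o jsonpath='{.subsets[*].addresses[*].ip}'); do
  echo "== ${ip} =="
  dig +short +time=2 @"${ip}" kubernetes.default.svc.cluster.local
done
```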
The logic of the problem is simple: the dns svc has stopped serving and no longer forwards requests to the endpoint pods.
A round of googling turned up almost no similar reports or useful solutions, so I tried the following:
1. kubectl delete the configmap, pod and svc, then re-create them; problem unchanged.
2. Suspecting an iptables forwarding problem, I ran iptables -F and wiped out every firewall rule k8s had generated. Normally kube-proxy watches the svcs and dynamically regenerates the iptables rules, and this had worked many times before in the test environment, but this time the rules were never regenerated: once deleted, they were simply gone. I was stuck in this hole for quite a while before climbing out.
3. Tore down all coredns components and started over: deleted every existing coredns resource (serviceaccount, clusterrole, clusterrolebinding, configmap, deploy, svc), pulled the deployment template from the project's github page, and adjusted the variables; the template is attached below.
Redeployed with kubectl create -f coredns.yaml; same problem. Fine, deleted everything and did it all once more, and, WTF, this time it was fixed and resolution worked!
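The attempts above can be sketched as shell, with the iptables theory tested read-only instead of flushing (the kube-proxy label selector assumes a kubeadm-style cluster; resource names match the manifest attached below):

```shell
# Inspect, without deleting, the NAT rules kube-proxy generated for kube-dns
iptables-save -t nat | grep 'kube-dns'
# If the rules look missing or stale, bounce kube-proxy so it rewrites them
kubectl delete pod -n kube-system -l k8s-app=kube-proxy

# Full coredns teardown, then re-create from the template
kubectl delete -n kube-system serviceaccount coredns
kubectl delete clusterrole system:coredns
kubectl delete clusterrolebinding system:coredns
kubectl delete -n kube-system configmap coredns
kubectl delete -n kube-system deployment coredns
kubectl delete -n kube-system service kube-dns
kubectl create -f coredns.yaml
```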
As an extra note, two handy dns debugging tools, both packaged as pods that can be pulled and used directly:
#dnstools, provided by the coredns project
root@yksp009028:~# kubectl run -it --restart=Never --image=infoblox/dnstools:latest dnstools
If you don't see a command prompt, try pressing enter.
dnstools# host kube-dns.kube-system
kube-dns.kube-system.svc.cluster.local has address 10.96.0.10
#busybox, a very common k8s debugging pod with the usual tools
root@yksv001238:~# kubectl run busybox --rm -ti --image=busybox /bin/sh
If you don't see a command prompt, try pressing enter.
/ # nslookup otimelinedbdm.default.svc.cluster.local
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: otimelinedbdm.default.svc.cluster.local
Address 1: 10.111.237.221 otimelinedbdm.default.svc.cluster.local
k8s is great to use and just as great at biting you: all these open-source components glued (and jammed) together always have odd little quirks whose fixes even google struggles to surface. But it is the way things are going, so we push on. Along the way I filed an issue on github, hoping it gets resolved in a later version.
Attached below is the complete coredns deployment yaml for future reference:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: coredns
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:coredns
rules:
- apiGroups:
  - ""
  resources:
  - endpoints
  - services
  - pods
  - namespaces
  verbs:
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: system:coredns
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:coredns
subjects:
- kind: ServiceAccount
  name: coredns
  namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        log
        health
        kubernetes cluster.local 10.96.0.0/12 {
           pods insecure
        }
        prometheus
        proxy . /etc/resolv.conf
        cache 30
    }
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
  labels:
    k8s-app: kube-dns
    kubernetes.io/name: "CoreDNS"
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      labels:
        k8s-app: kube-dns
    spec:
      serviceAccountName: coredns
      tolerations:
      - key: "CriticalAddonsOnly"
        operator: "Exists"
      containers:
      - name: coredns
        image: coredns/coredns:1.1.2
        imagePullPolicy: IfNotPresent
        args: [ "-conf", "/etc/coredns/Corefile" ]
        volumeMounts:
        - name: config-volume
          mountPath: /etc/coredns
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        - containerPort: 9153
          name: metrics
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 5
      dnsPolicy: Default
      volumes:
      - name: config-volume
        configMap:
          name: coredns
          items:
          - key: Corefile
            path: Corefile
---
apiVersion: v1
kind: Service
metadata:
  name: kube-dns
  namespace: kube-system
  annotations:
    prometheus.io/scrape: "true"
  labels:
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: "CoreDNS"
spec:
  selector:
    k8s-app: kube-dns
  clusterIP: 10.96.0.10
  ports:
  - name: dns
    port: 53
    protocol: UDP
  - name: dns-tcp
    port: 53
    protocol: TCP
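After applying the manifest above, a quick end-to-end sanity check might look like this (a sketch; assumes dig is available on the node):

```shell
# The service VIP should now answer queries directly from a node
dig +short @10.96.0.10 kube-dns.kube-system.svc.cluster.local
# And a throwaway pod should resolve both cluster and external names
kubectl run busybox --rm -ti --restart=Never --image=busybox -- \
  nslookup www.baidu.com
```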