k8s errors

 Error syncing pod, skipping: failed to "SetupNetwork" for "my-nginx-1948696469-ph11m_default" with SetupNetworkError: "Failed to setup network for pod \"my-nginx-1948696469-ph11m_default(3700d74a-cc12-11e6-8c42-00163e1001d7)\" using network plugins \"cni\": no IP addresses available in network: cbr0; Skipping pod"

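 Every IP in the 10.244.1.x range has already been allocated, so naturally there is no available IP left for new pods. Why the range filled up is still unclear. The following two open issues are related to this problem: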

https://github.com/containernetworking/cni/issues/306

https://github.com/kubernetes/kubernetes/issues/21656

Change into the /var/lib/cni/networks/cbr0 directory and run the following command to release the IPs that were probably leaked by kubelet:

for hash in $(tail -n +1 * | grep '^[A-Za-z0-9]*$' | cut -c 1-8); do if [ -z "$(docker ps -a | grep "$hash" | awk '{print $1}')" ]; then grep -irl "$hash" ./; fi; done | xargs rm

After running it, the file list in the directory became:

ls -l
total 32
drw-r--r-- 2 root root 12288 Dec 27 17:11 ./
drw-r--r-- 3 root root  4096 Dec 27 13:52 ../
-rw-r--r-- 1 root root    64 Dec 27 17:11 10.244.1.2
-rw-r--r-- 1 root root    64 Dec 27 17:11 10.244.1.3
-rw-r--r-- 1 root root    64 Dec 27 17:11 10.244.1.4
-rw-r--r-- 1 root root    10 Dec 27 17:11 last_reserved_ip
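The one-liner above deletes files as soon as it finds them. A more cautious variant, sketched below, only lists the reservation files whose container no longer exists, so the list can be reviewed before anything is removed. It assumes that each reservation file holds the full container ID on its first line (the 64-byte file sizes in the listing suggest this) and that the IDs of existing containers have been saved to a file beforehand. The function name list_leaked and both of its arguments are hypothetical.

```shell
#!/bin/sh
# list_leaked DIR LIVE_IDS_FILE
# Print every reservation file in DIR whose first line (assumed to be a
# full container ID) does not appear in LIVE_IDS_FILE (one ID per line).
list_leaked() {
    dir=$1
    live=$2
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # last_reserved_ip is bookkeeping, not a per-container reservation.
        case "$(basename "$f")" in
            last_reserved_ip*) continue ;;
        esac
        id=$(head -n 1 "$f" | tr -d '[:space:]')
        [ -n "$id" ] || continue
        # -x: whole-line match, -F: fixed string, -q: quiet.
        if ! grep -qxF "$id" "$live"; then
            printf '%s\n' "$f"
        fi
    done
}
```

One possible way to use it: save the IDs with docker ps -aq --no-trunc > /tmp/live_ids, run list_leaked /var/lib/cni/networks/cbr0 /tmp/live_ids, inspect the output, and only then remove the listed files.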

The pod was still failing, however, and this time the reason had changed:

Events:
  FirstSeen    LastSeen    Count    From                    SubObjectPath    Type        Reason        Message
  ---------    --------    -----    ----                    -------------    --------    ------        -------
  23s        23s        1    {default-scheduler }                    Normal        Scheduled    Successfully assigned my-nginx-1948696469-7p4nn to iz2ze39jeyizepdxhwqci6z
  22s        1s        22    {kubelet iz2ze39jeyizepdxhwqci6z}            Warning        FailedSync    Error syncing pod, skipping: failed to "SetupNetwork" for "my-nginx-1948696469-7p4nn_default" with SetupNetworkError: "Failed to setup network for pod \"my-nginx-1948696469-7p4nn_default(a40fe652-cc14-11e6-8c42-00163e1001d7)\" using network plugins \"cni\": \"cni0\" already has an IP address different from 10.244.1.1/24; Skipping pod"

Meanwhile, the files under /var/lib/cni/networks/cbr0 started multiplying rapidly again! The problem had reached a dead end.
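The error in the event log says the address on cni0 differs from the one the CNI plugin expects. A rough way to confirm such a mismatch is to compare the bridge's address with the subnet flannel has leased for the node. The sketch below assumes a typical flannel deployment that writes a FLANNEL_SUBNET=... line to an environment file (commonly /run/flannel/subnet.env); that path and the helper names are assumptions, not something taken from the original post.

```shell
#!/bin/sh
# Extract the IPv4 CIDR from `ip -o -4 addr show dev <iface>` output on stdin.
addr_from_ip_output() {
    awk '$3 == "inet" { print $4; exit }'
}

# Read the FLANNEL_SUBNET value from a subnet.env-style file.
flannel_subnet() {
    sed -n 's/^FLANNEL_SUBNET=//p' "$1"
}

# check_bridge ENV_FILE
# Reads `ip -o -4 addr show` output on stdin and compares the address
# found there with the subnet recorded in ENV_FILE.
check_bridge() {
    want=$(flannel_subnet "$1")
    have=$(addr_from_ip_output)
    if [ "$want" = "$have" ]; then
        echo "ok: $have"
    else
        echo "mismatch: bridge has ${have:-none}, flannel expects $want"
        return 1
    fi
}
```

It could be invoked as ip -o -4 addr show dev cni0 | check_bridge /run/flannel/subnet.env; a mismatch would point at a stale cni0 bridge left over from an earlier configuration.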

5. flannel vxlan has no connectivity; switching the backend to udp does not help either

By this point I was pretty much exhausted, so I ran kubeadm reset on both nodes and prepared to start over.

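After kubeadm reset, the bridge device cni0 and the network interface flannel.1 that flannel had created were both still present. To make sure the environment is fully restored to its initial state, we can delete the two devices with the following commands: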

# ifconfig  cni0 down
# brctl delbr cni0
# ip link delete flannel.1
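On hosts without ifconfig or brctl, the same cleanup can be done with ip alone. The sketch below is a dry-run helper: it reads interface names from standard input and prints the ip commands that would remove cni0 and flannel.1 if they are present, without executing anything. The function name leftover_cleanup_cmds is hypothetical, and using ip link in place of ifconfig and brctl is a substitution, not the commands from the original post.

```shell
#!/bin/sh
# leftover_cleanup_cmds
# Reads interface names (one per line) on stdin and prints, without
# running them, the commands that would remove leftover CNI devices.
leftover_cleanup_cmds() {
    while IFS= read -r name; do
        name=${name%%@*}    # drop an "@parent" suffix such as veth123@if4
        case "$name" in
            cni0|flannel.1)
                printf 'ip link set %s down\n' "$name"
                printf 'ip link delete %s\n' "$name"
                ;;
        esac
    done
}
```

For example, ip -o link show | awk -F': ' '{print $2}' | leftover_cleanup_cmds prints the commands for review; piping that output to sh as root would then execute them.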

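After being "tempered" by the earlier problems, re-running init and join for the k8s cluster went exceptionally smoothly. This time the minion node showed no further anomalies.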

 

http://www.tuicool.com/articles/rYRzY3q

Reposted from m635674608.iteye.com/blog/2359903