K8S: Common errors and solutions in K8S deployment

Table of contents

1. The kubelet service on a node cannot start

2. After installing the CNI network plug-in, kubectl get node always shows the master and nodes as NotReady

3. Deployment error: kubectl get nodes returns No resources found

4. k8s deployment error: kubectl get csr returns No resources found


1. The kubelet service on a node cannot start

Problem: The kubelet on a node cannot start. The server does not have enough memory, so the kubelet service on the node fails to start or does not stay running.

Solution: Run top and press Shift+M to sort processes by memory usage. Find the process that uses the most memory, run lsof | grep <process> to see what it is doing, and stop any unnecessary processes. Alternatively, add memory to the server.
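The memory check above can also be done non-interactively. The following is a minimal sketch, assuming a Linux host with procps ps and a kernel that reports MemAvailable; the function names mem_available_mb and top_mem_processes are illustrative and not part of any tool.

```shell
#!/usr/bin/env bash

# Print available memory in MiB, read from a meminfo-style file.
# The file defaults to /proc/meminfo; a path can be passed for testing.
mem_available_mb() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "$meminfo"
}

# List the processes with the largest resident memory (default: top 10).
# The extra line accounts for the ps header row.
top_mem_processes() {
    local count="${1:-10}"
    ps -eo pid,rss,comm --sort=-rss | head -n "$((count + 1))"
}
```

Calling mem_available_mb and then top_mem_processes 5 gives roughly the same view as top sorted with Shift+M.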

2. After installing the CNI network plug-in, kubectl get node always shows the master and nodes as NotReady

① The status update is delayed; wait about 10 minutes. If the nodes are still NotReady after 15 minutes, there is a problem.

② The kubelet service status shows the error failed to find plugin "flannel" in path [/opt/cni/bin], because the flannel executable is missing from /opt/cni/bin.

Solution: Re-upload the CNI plug-in package, or download it again from a network source.
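Before reinstalling, it can help to confirm that the binary really is missing. This is a small sketch under the assumption that plug-ins live in /opt/cni/bin as in the error message; cni_plugin_present is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# Succeed (exit status 0) only if an executable file named $1 exists
# in the plug-in directory $2 (default: /opt/cni/bin).
cni_plugin_present() {
    local plugin="$1"
    local dir="${2:-/opt/cni/bin}"
    [ -f "$dir/$plugin" ] && [ -x "$dir/$plugin" ]
}
```

For example, cni_plugin_present flannel || echo "flannel is missing from /opt/cni/bin" reports the condition that produces the kubelet error.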
 

③ Run kubectl get pods -n kube-system to check the pods. If the three pods whose names begin with kube-flannel-ds all show 0/1, the CNI network plug-in failed to pull its image. Use kubectl describe pod kube-flannel-ds-rxh5w -n kube-system (substitute your own pod name) to view the details.
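Spotting the 0/1 pods can be scripted. The sketch below assumes the default table output of kubectl get pods, where the second column is READY in ready/total form; not_ready_pods is an illustrative name.

```shell
#!/usr/bin/env bash

# Read `kubectl get pods` table output on stdin and print the names of
# pods whose READY column shows fewer ready containers than the total.
not_ready_pods() {
    awk 'NR > 1 { split($2, r, "/"); if (r[1] + 0 < r[2] + 0) print $1 }'
}
```

A possible use is kubectl get pods -n kube-system | not_ready_pods, then passing each printed name to kubectl describe pod.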

Solution: Pull the image manually with docker pull quay.io/coreos/flannel:v0.14.0

After the pull is complete, rename the image with docker tag <source name> <target name> (the target is the image name that failed to pull).
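The pull and retag steps can be wrapped in one function. This is only a sketch: pull_and_retag and the DOCKER override variable are assumptions introduced here so the commands can be previewed with DOCKER=echo before running them for real.

```shell
#!/usr/bin/env bash

# Pull $1, then tag it as $2 (the image name the pod failed to pull).
# Set DOCKER=echo to print the commands instead of running docker.
pull_and_retag() {
    local source_image="$1"
    local target_image="$2"
    local docker_cmd="${DOCKER:-docker}"
    "$docker_cmd" pull "$source_image" || return 1
    "$docker_cmd" tag "$source_image" "$target_image"
}
```

The target name to pass as the second argument is the image shown in the kubectl describe pod output.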

 

3. Deployment error: kubectl get nodes returns No resources found

kubectl get nodes
# View node information
Error: kubectl get nodes returns No resources found

Troubleshooting steps:
On all nodes:
1. Stop and disable the firewall:
systemctl stop firewalld
systemctl disable firewalld

2. Disable SELinux:
sed -i 's/enforcing/disabled/' /etc/selinux/config
setenforce 0

3. Turn off swap:
swapoff -a        # temporary, until the next reboot
vim /etc/fstab    # permanent: comment out the swap entry

4. Add the hostname-to-IP mappings (remember to set the hostname on each machine):
cat /etc/hosts
192.168.30.11 master
192.168.30.12 node1
192.168.30.13 node2

5. On the worker nodes:
vim /usr/lib/systemd/system/docker.service
# Add one line under the [Service] section
......
[Service]
ExecStartPost=/usr/sbin/iptables -P FORWARD ACCEPT
......

systemctl daemon-reload
systemctl restart docker
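Step 3 in the list above (turning off swap permanently) can be done without opening an editor. The following sketch assumes GNU sed and a standard fstab layout; swap_active and comment_out_swap are hypothetical helper names, and the file arguments exist only so the functions can be tried on a copy first.

```shell
#!/usr/bin/env bash

# Succeed if any swap device is listed after the header line of a
# /proc/swaps-style file (default: /proc/swaps).
swap_active() {
    local swaps="${1:-/proc/swaps}"
    awk 'NR > 1 { found = 1 } END { exit !found }' "$swaps"
}

# Comment out every uncommented swap entry in an fstab-style file
# (default: /etc/fstab). A backup is kept next to it with a .bak suffix.
comment_out_swap() {
    local fstab="${1:-/etc/fstab}"
    sed -i.bak -E '/^[[:space:]]*#/!s/^(.*[[:space:]]swap[[:space:]].*)$/#\1/' "$fstab"
}
```

Running comment_out_swap on a copy of /etc/fstab and comparing the result with the original is a safe way to check the pattern before applying it to the real file.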

4. k8s deployment error: kubectl get csr returns No resources found

kubectl get csr
# View the certificate signing requests (CSRs) submitted by the nodes
No resources found.
# Error message
Cause: the original SSL certificates became invalid after the reboot. Unless they are deleted, the kubelet cannot communicate with the master even after it is restarted.
Solution:
On the node: delete all certificates
cd /opt/kubernetes/ssl
ls
kubelet-client-2023-05-11-08-41-36.pem  kubelet-client-current.pem  kubelet.crt  kubelet.key
# Delete all certificates
rm -rf *
# Stop the running kubelet
systemctl stop kubelet
On the master: delete the kubelet-bootstrap cluster role binding and recreate it
kubectl delete clusterrolebinding kubelet-bootstrap
clusterrolebinding.rbac.authorization.k8s.io "kubelet-bootstrap" deleted

kubectl create clusterrolebinding kubelet-bootstrap --clusterrole=system:node-bootstrapper --user=kubelet-bootstrap
clusterrolebinding.rbac.authorization.k8s.io/kubelet-bootstrap created
On the nodes: run the kubelet installation again
#node01
bash kubelet.sh 192.168.30.11
#node02
bash kubelet.sh 192.168.30.12
On the master: test whether it worked
kubectl get csr
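Once the nodes re-register, their requests should appear in the kubectl get csr output. The sketch below filters that output for requests that are still waiting; it assumes the default table layout, where CONDITION is the last column, and pending_csrs is an illustrative name.

```shell
#!/usr/bin/env bash

# Read `kubectl get csr` table output on stdin and print the names of
# requests whose last column (CONDITION) is exactly "Pending".
pending_csrs() {
    awk 'NR > 1 && $NF == "Pending" { print $1 }'
}
```

For example, kubectl get csr | pending_csrs lists the pending requests; each one can then be approved with kubectl certificate approve <name> if your setup does not approve them automatically.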

 


Origin blog.csdn.net/weixin_67287151/article/details/130630470