Summary of Kubernetes installation process issues

1. Summary of HAProxy installation problems

1.1 HAProxy fails to start when binding the VIP

  • Problem description
    The VIP binding is configured in /etc/haproxy/haproxy.cfg, and an error message about failing to bind the address appears when the haproxy service is started (an example binding is shown at the end of this subsection).
  • Solution
    Modify the configuration file /etc/sysctl.conf and add the following content:
net.ipv4.ip_nonlocal_bind=1

Then make the setting take effect

sysctl -p 

Then start haproxy again

systemctl restart haproxy
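For reference, the error typically appears when a frontend is bound to a VIP that is not yet assigned to any local interface. A hypothetical fragment of /etc/haproxy/haproxy.cfg (the VIP 192.168.1.100, the port, and the backend name are examples only):

frontend kube-apiserver
    # binding a non-local VIP fails unless net.ipv4.ip_nonlocal_bind=1 is set
    bind 192.168.1.100:6443
    mode tcp
    default_backend kube-apiserver-backend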

2. Summary of etcd installation problems

2.1 etcd node restart failed

  • Problem Description
    The etcd data on one node is damaged, so the node cannot start. The etcdctl and kubectl client tools are unusable and both time out, and restarting the etcd service on that node directly also fails with an exception.
  • Solution
    Step 1: Delete the etcd data directory on the node; the ETCD_DATA_DIR variable holds the data storage path
    Step 2: Set ETCD_INITIAL_CLUSTER_STATE="existing"
    Step 3: Restart etcd on the node: systemctl restart etcd (see the sketch below)
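A minimal sketch of these three steps, assuming the etcd options live in the environment file /etc/etcd/etcd.conf and ETCD_DATA_DIR points to /var/lib/etcd (both paths depend on your deployment):

systemctl stop etcd
# Step 1: remove the damaged data directory (the path configured in ETCD_DATA_DIR)
rm -rf /var/lib/etcd/*
# Step 2: rejoin the existing cluster instead of bootstrapping a new one
sed -i 's/^ETCD_INITIAL_CLUSTER_STATE=.*/ETCD_INITIAL_CLUSTER_STATE="existing"/' /etc/etcd/etcd.conf
# Step 3: restart the member
systemctl restart etcd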

3. Summary of Kubernetes installation problems

3.1 namespace cannot be deleted

  • Problem description
    The status of the namespace is Terminating, so resources cannot be deployed or deleted in this namespace.
  • Solution
cd /opt
kubectl get namespace <namespace> -o json > namespace.json

Edit namespace.json and delete the values of the spec and status fields, then execute the following commands (a scripted variant is shown after them):

kubectl proxy --port=9988 &
curl -k -H "Content-Type: application/json" -X PUT --data-binary @namespace.json 127.0.0.1:9988/api/v1/namespaces/<namespace>/finalize
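If jq is installed, the editing step can be scripted; a sketch assuming the stuck namespace is called demo (replace it with the real name):

NS=demo
kubectl proxy --port=9988 &
# remove the spec and status fields as described above, then PUT the result to the finalize endpoint
kubectl get namespace "$NS" -o json | jq 'del(.spec, .status)' > "${NS}.json"
curl -k -H "Content-Type: application/json" -X PUT --data-binary @"${NS}.json" 127.0.0.1:9988/api/v1/namespaces/${NS}/finalize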

3.2 A large number of Pods are in Terminating state

  • Problem Description
    Due to some node failures, a large number of Pods are stuck in the Terminating state:
istio-system           jaeger-5994d55ffc-nmhq6                     0/1     Terminating         0          13h
istio-system           jaeger-5994d55ffc-pjj5m                     0/1     Terminating         0          11h
istio-system           kiali-64df7bf7cc-29kxl                      0/1     Terminating         0          12h
istio-system           kiali-64df7bf7cc-2bk77                      0/1     Terminating         0          11h
istio-system           kiali-64df7bf7cc-4wwhg                      0/1     Terminating         0          14h
istio-system           kiali-64df7bf7cc-8cfsh                      0/1     Terminating         0          13h
istio-system           kiali-64df7bf7cc-dks5w                      0/1     Terminating         0          15h
istio-system           kiali-64df7bf7cc-dkzgc                      0/1     Terminating         0          15h
  • Solution
kubectl get pods -n <namespace> | grep Terminating | awk '{print $1}' | xargs kubectl delete pod -n <namespace> --force --grace-period=0

If this happens to a large number of Pods, a script can be written and run periodically, as in the sketch below.
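A minimal cleanup script (the namespace istio-system, the script path, and the schedule are assumptions; adjust them to your cluster):

#!/bin/bash
# force-delete every Pod stuck in Terminating in the given namespace
NAMESPACE=istio-system
kubectl get pods -n "$NAMESPACE" | grep Terminating | awk '{print $1}' \
  | xargs -r kubectl delete pod -n "$NAMESPACE" --force --grace-period=0

It can then be scheduled with cron, for example every five minutes: */5 * * * * /opt/clean-terminating.sh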

3.3 Pod logs cannot be viewed

  • Problem description
    The following error message appears when viewing a log with kubectl logs -f PodName:
Error from server (Forbidden): Forbidden (user=kubernetes, verb=get, resource=nodes, subresource=proxy)
  • Solution
kubectl create clusterrolebinding kube-apiserver:kubelet-apis --clusterrole=system:kubelet-api-admin --user kubernetes
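After creating the binding, a quick check can confirm that the kubernetes user is now allowed to reach the kubelet API (the user name must match the one shown in the error message):

kubectl auth can-i get nodes --subresource=proxy --as=kubernetes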

3.4 Pod container initialization failed

  • Problem Description
Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.6": 
Error response from daemon: Get https://k8s.gcr.io/v2/: 
net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  • Solution
    Due to network access restrictions, images under the k8s.gcr.io / registry.k8s.io domains normally cannot be pulled, so download the image from a domestic mirror registry first and then rename it with the docker tag command.
docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6
docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6 registry.k8s.io/pause:3.6
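If the container runtime still requests the image under the old k8s.gcr.io name, as in the error message above, add a tag for that name as well:

docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/pause:3.6 k8s.gcr.io/pause:3.6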

3.5 Pods are evicted

  • Problem description
    A large number of Pods are in the Evicted state.

  • Solution
    Delete the Pods whose state is Evicted:

kubectl get pods -A | grep Evicted | awk '{print $1, $2}' | while read ns pod; do kubectl delete pod -n "$ns" "$pod"; done

3.6 Node error: container runtime network not ready

  • Problem Description
"Error syncing pod, skipping" err="network is not ready: container runtime network not ready
  • Solution
    The Kubernetes CNI network plug-in failed to install. If the CNI is Calico, confirm that the calico-node Pods start successfully, for example with the command below.
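A quick check, assuming the standard Calico manifests that deploy the DaemonSet to kube-system with the label k8s-app=calico-node:

kubectl get pods -n kube-system -l k8s-app=calico-node -o wide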

3.7 View kubelet logs

journalctl -u kubelet --since today |less

3.8 The master node cannot be scheduled

  • Problem Description
0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }

The k8s node cannot be scheduled; use the kubectl tool to view the node status:

kubectl get nodes -o wide

The displayed results are as follows:

NAME          STATUS                        ROLES    AGE    VERSION
k8s-master1   NotReady,SchedulingDisabled   <none>   43h    v1.24.2
k8s-master2   Ready                         <none>   4d6h   v1.24.2
k8s-node1     NotReady,SchedulingDisabled   <none>   44h    v1.24.2
  • Solution
# Disable scheduling on the node (cordon)
kubectl cordon <node-name>
# Re-enable scheduling (uncordon)
kubectl uncordon <node-name>
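Note that the untolerated taint in the error above, node.kubernetes.io/not-ready, is added and removed automatically by Kubernetes as the node's Ready condition changes; once the kubelet and the CNI plug-in on the node are healthy again, the taint disappears. The current taints on a node can be inspected with:

kubectl describe node k8s-master1 | grep -i taint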

4. Summary of Calico installation problems

4.1 Access Timeout between nodes

  • Problem Description
cni-installer/<nil> <nil>: Unable to create token for CNI kubeconfig error=Post 
"https://10.255.0.1:443/api/v1/namespaces/kube-system/serviceaccounts/calico-node/token": 
dial tcp 10.255.0.1:443: i/o timeout
  • Solution
    Check whether the cluster CNI plug-in is working, for example whether calico-node can start normally. Also check whether the --service-cluster-ip-range and --cluster-cidr ranges overlap, which effectively turns the cluster into a stand-alone environment (an example of non-overlapping ranges follows).
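For reference, a non-overlapping pair of ranges might look like the following; the Service range 10.255.0.0/16 matches the one used elsewhere in this article, while the Pod range 172.16.0.0/16 is only an example:

# kube-apiserver
--service-cluster-ip-range=10.255.0.0/16
# kube-controller-manager and kube-proxy
--cluster-cidr=172.16.0.0/16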

4.2 Calico-node Pod failed to start

  • Problem Description
calico-node fails to start, and the following messages appear in the event information:
Back-off restarting failed container
invalid capacity 0 on image filesystem
Node k8s-node2 status is now: NodeHasNoDiskPressure
Updated Node Allocatable limit across pods
Node k8s-node2 status is now: NodeHasSufficientPID
  • Log analysis
# view the install-cni container log of the calico-node Pod
kubectl logs -n kube-system calico-node-wzq2p -c install-cni
# view the events of the calico-node Pod
kubectl describe pod calico-node-wzq2p -n kube-system
# follow the kubelet logs
journalctl -u kubelet -f

The events above do not necessarily mean that calico-node failed to start because of insufficient disk space; you need to check the detailed log information. Use kubectl logs -n kube-system calico-node-wzq2p -c install-cni to view the specific error message and analyze from there. The author initially assumed that calico-node could not start because of insufficient disk space; after checking the detailed logs, it turned out that the --cluster-cidr parameter configured for kube-proxy was inconsistent with the --service-cluster-ip-range set in kube-controller-manager and kube-apiserver.

5. Summary of CoreDNS installation problems

5.1 DNS domain name service IP address adjustment

  • Problem Description
    The default CoreDNS configuration is inconsistent with the current Kubernetes cluster configuration. The kubelet startup parameter clusterDNS must point to the IP address of the CoreDNS domain name Service; when the clusterDNS value set at kubelet startup differs from the Service IP configured in the CoreDNS deployment, service access by domain name times out.
  • Solution
    Execute the following commands to generate and apply a CoreDNS manifest with the desired DNS Service IP address:
cd /opt
git clone https://github.com/coredns/deployment
cd /opt/deployment/kubernetes
./deploy.sh -r 10.255.0.0/16 -i 10.255.0.2 > coredns.yaml
kubectl apply -f coredns.yaml

In the commands above, -i sets the IP address of the DNS domain name Service; it must match the clusterDNS value configured for the kubelet, which can be checked as shown below.
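A quick consistency check, assuming the kubelet reads a KubeletConfiguration file at /var/lib/kubelet/config.yaml (the path depends on how the kubelet was installed) and that CoreDNS is exposed through the kube-dns Service in kube-system:

# clusterDNS configured on the node
grep -A1 clusterDNS /var/lib/kubelet/config.yaml
# ClusterIP actually assigned to the DNS Service
kubectl get svc -n kube-system kube-dns -o jsonpath='{.spec.clusterIP}'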

6. Summary of Istio installation issues

6.1 Kiali cannot connect to Istiod

  • Problem Description
unable to proxy Istiod pods. 
Make sure your Kubernetes API server has access to the Istio control plane through 8080 port
  • Solution
    Install socat on the nodes:
yum install socat -y

6.2 Istio Ingress modify network type

  • Problem Description
    By default, the Istio ingress gateway Service is of type LoadBalancer; in an environment without a load balancer implementation it needs to be changed to NodePort.
  • Solution
kubectl patch svc -n istio-ingress istio-ingress  -p '{"spec": {"type": "NodePort"}}'
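To confirm the change, the Service (name and namespace as in the patch command above) should now show type NodePort:

kubectl get svc -n istio-ingress istio-ingress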

6.3 Restricting Istio egress traffic

Setting meshConfig.outboundTrafficPolicy.mode to REGISTRY_ONLY only allows outbound traffic to hosts that are in the service registry:

helm upgrade --set meshConfig.outboundTrafficPolicy.mode=REGISTRY_ONLY istiod  istio/istiod -n istio-system
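To lift the restriction again, set the mode back to ALLOW_ANY (the Istio default):

helm upgrade --set meshConfig.outboundTrafficPolicy.mode=ALLOW_ANY istiod  istio/istiod -n istio-system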
