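Case 1:
Before v1.8, with RBAC enabled, the apiserver bound the system:nodes group to the system:node ClusterRole by default. From v1.8 onward this binding no longer exists by default and has to be created manually; otherwise kubelet reports an authorization (forbidden) error after it starts, and kubectl get nodes never shows the node as Ready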
Unable to register node "192.168.122.3" with API server: nodes is forbidden: User "system:node:192.168.122.3" cannot create nodes at the cluster scope
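Solution:
Use kubectl get clusterrolebinding and kubectl get clusterrole to list the role bindings and roles in the system
Use kubectl describe clusterrolebindings system:node to view the details of the system:node role binding
Create the role binding
This grants the ClusterRole across the entire cluster, including all namespaces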
kubectl create clusterrolebinding kubelet-node-clusterbinding --clusterrole=system:node --group=system:nodes
kubectl describe clusterrolebindings kubelet-node-clusterbinding
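The create command above fails if the binding already exists, so a re-runnable variant can be useful in setup scripts. The following is only a minimal sketch: the function name ensure_node_binding and the KUBECTL override variable are made up for illustration, while the binding name, ClusterRole and group are the ones used above.

```shell
# Assumption: KUBECTL is a hypothetical override for testing; it defaults to kubectl.
KUBECTL="${KUBECTL:-kubectl}"

# Create the binding from this case only if it is not already present.
ensure_node_binding() {
  binding="kubelet-node-clusterbinding"
  if "$KUBECTL" get clusterrolebinding "$binding" >/dev/null 2>&1; then
    echo "clusterrolebinding $binding already exists"
  else
    "$KUBECTL" create clusterrolebinding "$binding" \
      --clusterrole=system:node --group=system:nodes
  fi
}
```

Calling ensure_node_binding on a machine with cluster-admin access should leave the binding in place whether or not it existed before.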
Case 2:
When Kubernetes starts a container, the container stays in ContainerCreating and never reaches Running
Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "image pull failed for registry.access.redhat.com/rhel7/pod-infrastructure:latest, this may be because there are no credentials on this request. details: (open /etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt: no such file or directory)"
Solution: yum install python-rhsm-certificates
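After installing the package it is worth confirming that the certificate path from the error message now resolves to a real file. A minimal sketch, assuming a hypothetical helper name check_ca_cert; the default path is the one quoted in the error above, and the test follows symlinks, so a dangling link is reported as missing.

```shell
# Report whether a CA certificate path resolves to a readable, non-empty file.
check_ca_cert() {
  cert="${1:-/etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt}"
  if [ -s "$cert" ] && [ -r "$cert" ]; then
    echo "ok: $cert"
  else
    echo "missing: $cert"
    return 1
  fi
}
```

Run check_ca_cert with no argument on the affected node; if it still prints "missing", the image pull is likely to fail again with the same message.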
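Case 3:
The following problem is caused by an incompatibility between the Kubernetes version and the docker version, which leaves kubelet unable to read cgroup stats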
Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Solution: add the following two parameters to /usr/lib/systemd/system/kubelet.service
--runtime-cgroups=/systemd/system.slice
--kubelet-cgroups=/systemd/system.slice
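A quick check that both parameters actually landed in the unit file can save a restart cycle. This is a sketch under assumptions: has_cgroup_flags is a hypothetical helper, and it only looks for the two literal strings anywhere in the file, not for their exact position on the kubelet command line.

```shell
# Succeed only if the unit file contains both cgroup flags from this case.
has_cgroup_flags() {
  unit="${1:-/usr/lib/systemd/system/kubelet.service}"
  grep -qF -e '--runtime-cgroups=/systemd/system.slice' "$unit" &&
    grep -qF -e '--kubelet-cgroups=/systemd/system.slice' "$unit"
}
```

Once has_cgroup_flags succeeds, systemd still has to pick up the edited unit: run systemctl daemon-reload followed by systemctl restart kubelet.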
Case 4:
When the specified image contains no runnable program, the pod restarts endlessly
Aug 15 16:15:19 k8s-node31 kubelet: I0815 16:15:19.475703 12531 kuberuntime_manager.go:513] Container {Name:centos7 Image:172.23.210.31:5000/nginx_2 Command:[] Args:[] WorkingDir: Ports:[{Name: HostPort:0 ContainerPort:rces:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:default-token-wstcp ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil>}] VolumeDevices:[] LivenessProbe:nil Readine/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:Always SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Aug 15 16:15:19 k8s-node31 kubelet: I0815 16:15:19.475788 12531 kuberuntime_manager.go:757] checking backoff for container "centos7" in pod "centos7-7b557486d9-26r84_default(57aba4ee-a063-11e8-966d-000c293dc062)"
Aug 15 16:15:19 k8s-node31 kubelet: I0815 16:15:19.475859 12531 kuberuntime_manager.go:767] Back-off 10s restarting failed container=centos7 pod=centos7-7b557486d9-26r84_default(57aba4ee-a063-11e8-966d-000c293dc062)
Aug 15 16:15:19 k8s-node31 kubelet: E0815 16:15:19.475885 12531 pod_workers.go:186] Error syncing pod 57aba4ee-a063-11e8-966d-000c293dc062 ("centos7-7b557486d9-26r84_default(57aba4ee-a063-11e8-966d-000c293dc062)"), skipth CrashLoopBackOff: "Back-off 10s restarting failed container=centos7 pod=centos7-7b557486d9-26r84_default(57aba4ee-a063-11e8-966d-000c293dc062)"
Solution: rebuild the image so that it includes a program to run
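Because the log shows Command:[] and Args:[], the pod relies entirely on the entrypoint and command baked into the image, so inspecting those is a reasonable first step before rebuilding. A rough sketch, assuming docker is the container runtime; image_has_command and the DOCKER override variable are hypothetical names used only for illustration.

```shell
# Assumption: DOCKER is a hypothetical override for testing; it defaults to docker.
DOCKER="${DOCKER:-docker}"

# Print the image's entrypoint and command, and fail if neither is set.
image_has_command() {
  out="$("$DOCKER" inspect --format '{{json .Config.Entrypoint}} {{json .Config.Cmd}}' "$1")" || return 2
  case "$out" in
    "null null"|"[] []"|"null []"|"[] null")
      echo "no start command: $1"
      return 1
      ;;
    *)
      echo "start command: $out"
      ;;
  esac
}
```

For example, image_has_command 172.23.210.31:5000/nginx_2 checks the image from the log above. A start command being present is not sufficient on its own: a command that exits immediately, such as a shell started without a TTY, produces the same restart loop.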