Solving insufficient root directory space on a Rancher node

1. Problem description


When the Rancher node server was created, disk space was not planned. With the default layout the root partition (/) is only 50G, and Docker's default installation path is /var/lib/docker, which drove root usage up to 85%. The Rancher node kept raising disk pressure alarms (kubelet has disk pressure), so Docker's default data directory needed to be moved.
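
Before making changes it helps to confirm where the space is actually going. A minimal check, using the paths from this setup (adjust for your host):

# overall usage of the root partition
df -h /
# how much of it is Docker data
du -sh /var/lib/docker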

2. Modify the default Docker data directory


  1. First, check the current data directory
docker info | grep 'Docker Root Dir'
Docker Root Dir: /var/lib/docker
  2. Stop the docker service
systemctl stop docker
  3. Create a new directory on a filesystem with enough free space
mkdir -p /home/docker/
  4. Copy the data into the new directory (the trailing /. copies the contents of /var/lib/docker so the layout lands directly under the new data root, and -a preserves ownership and permissions)
cp -a /var/lib/docker/. /home/docker/
  5. Modify the configuration file
vi /etc/docker/daemon.json

Configure the "graph" parameter. My Docker version is 20.10; other versions may need a different key, so check the method for your release (see the note after these steps).

{
  "graph": "/home/docker"
}
  6. Reload the configuration and restart the service
systemctl daemon-reload
systemctl restart docker
  7. After the change, check the Docker info again
docker info | grep 'Docker Root Dir'
Docker Root Dir: /home/docker
  8. After confirming everything is correct, delete the old data from the root partition
rm -rf /var/lib/docker/
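
One note on the "graph" key: it is a legacy option, deprecated in favor of "data-root" since Docker 17.05 and removed in recent engine releases. On a newer Docker the equivalent daemon.json would be, as a minimal sketch using the same path as above:

{
  "data-root": "/home/docker"
}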

3. Problems encountered


Because workloads had previously been deployed to this node through Rancher, the following error came up when the containers were started again:

failed to create sandbox: 
ResolvConfPath /home/docker/container/xxx/resolv.conf does not exist

I tried many approaches without solving it, and in the end had to remove the node and rejoin it.
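
For anyone debugging the same error: the path in the message points at the per-container metadata under the data root. Docker records each container's configuration, including ResolvConfPath, in config.v2.json there, so that is where the stale references live. A way to inspect them, assuming the standard layout:

# list container metadata directories under the new data root
ls /home/docker/containers/
# show the resolv.conf path recorded for each container
grep ResolvConfPath /home/docker/containers/*/config.v2.json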

  1. Drain the node
    In Rancher's node management, drain the node; when draining completes the status shows Drained, then delete the node. A command-line equivalent is sketched below.
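
For reference, the UI drain corresponds to a kubectl drain; a minimal sketch with a placeholder node name (older kubectl spells the last flag --delete-local-data):

# evict pods, ignoring DaemonSet-managed ones
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# delete the node object once it is drained
kubectl delete node <node-name>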
  2. Clean up the node
    Run the official Rancher node cleanup script on the node:

#!/bin/bash

KUBE_SVC='
kubelet
kube-scheduler
kube-proxy
kube-controller-manager
kube-apiserver
'

for kube_svc in ${KUBE_SVC};
do
  # Stop the service if it is running
  if [[ `systemctl is-active ${kube_svc}` == 'active' ]]; then
    systemctl stop ${kube_svc}
  fi
  # Disable the service from starting at boot
  if [[ `systemctl is-enabled ${kube_svc}` == 'enabled' ]]; then
    systemctl disable ${kube_svc}
  fi
done

# Stop all containers
docker stop $(docker ps -aq)

# Remove all containers
docker rm -f $(docker ps -qa)

# Remove all container volumes
docker volume rm $(docker volume ls -q)

# Unmount kubelet and rancher mount points
for mount in $(mount | grep tmpfs | grep '/var/lib/kubelet' | awk '{ print $3 }') /var/lib/kubelet /var/lib/rancher;
do
  umount $mount;
done

# Back up directories
mv /etc/kubernetes /etc/kubernetes-bak-$(date +"%Y%m%d%H%M")
mv /var/lib/etcd /var/lib/etcd-bak-$(date +"%Y%m%d%H%M")
mv /var/lib/rancher /var/lib/rancher-bak-$(date +"%Y%m%d%H%M")
mv /opt/rke /opt/rke-bak-$(date +"%Y%m%d%H%M")

# Remove leftover paths
rm -rf /etc/ceph \
    /etc/cni \
    /opt/cni \
    /run/secrets/kubernetes.io \
    /run/calico \
    /run/flannel \
    /var/lib/calico \
    /var/lib/cni \
    /var/lib/kubelet \
    /var/log/containers \
    /var/log/kube-audit \
    /var/log/pods \
    /var/run/calico \
    /usr/libexec/kubernetes

# Clean up network interfaces
no_del_net_inter='
lo
docker0
eth
ens
bond
'

network_interface=`ls /sys/class/net`

for net_inter in $network_interface;
do
  # keep interfaces whose first three characters match the keep list above
  if ! echo "${no_del_net_inter}" | grep -qE ${net_inter:0:3}; then
    ip link delete $net_inter
  fi
done

# Kill leftover processes listening on Kubernetes/Rancher ports
port_list='
80
443
6443
2376
2379
2380
8472
9099
10250
10254
'

for port in $port_list;
do
  pid=`netstat -atlnup | grep $port | awk '{print $7}' | awk -F '/' '{print $1}' | grep -v - | sort -rnk2 | uniq`
  if [[ -n $pid ]]; then
    kill -9 $pid
  fi
done

kube_pid=`ps -ef | grep -v grep | grep kube | awk '{print $2}'`

if [[ -n $kube_pid ]]; then
  kill -9 $kube_pid
fi

# Flush iptables tables
## Note: if the node has custom iptables rules, run the following commands with caution
sudo iptables --flush
sudo iptables --flush --table nat
sudo iptables --flush --table filter
sudo iptables --table nat --delete-chain
sudo iptables --table filter --delete-chain
systemctl restart docker
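
A possible way to run the script above (the filename is only an example): save it, make it executable, and execute it as root on the node being cleaned:

chmod +x rancher-node-cleanup.sh
sudo ./rancher-node-cleanup.sh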

  3. Rejoin the node
    Re-add the cleaned node to the cluster from the Rancher UI; a sketch of the registration command follows.
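
For a custom (RKE) cluster, Rancher's UI provides the node registration command. It is a docker run of the rancher-agent image; the version, token and checksum below are placeholders that come from your own Rancher server:

sudo docker run -d --privileged --restart=unless-stopped --net=host \
  -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run \
  rancher/rancher-agent:<version> --server https://<rancher-server> \
  --token <token> --ca-checksum <checksum> --worker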

Origin: blog.csdn.net/qq12547345/article/details/128470553