[Cloud native] Kubernetes cluster installation and configuration: node initialization (master and node)

1. Initialize the master

(1) Generate a default configuration file.

kubeadm config print init-defaults > init.default.yaml

(2) Modify the configuration file.

# Node IP address
localAPIEndpoint.advertiseAddress: 192.168.11.190
# CRI socket (cri-dockerd)
nodeRegistration.criSocket: unix:///var/run/cri-dockerd.sock
# Node name
nodeRegistration.name: k8s-master1
# Switch the image repository to a domestic mirror
imageRepository: registry.aliyuncs.com/google_containers
# Kubernetes version
kubernetesVersion: 1.24.1
# Add podSubnet. The flannel network plugin will be installed later, and it requires the pod CIDR
# to be specified at cluster initialization time. 10.244.0.0/16 is flannel's default podSubnet;
# the cluster configuration must match the network component's configuration.
networking.podSubnet: 10.244.0.0/16
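For reference, a minimal sketch of how those fields sit in the generated init.default.yaml (kubeadm.k8s.io/v1beta3 API; the other defaulted fields that kubeadm prints are omitted, and values such as the node name and IP are the examples used in this setup):

apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.11.190              # node IP address
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/cri-dockerd.sock   # cri-dockerd socket
  name: k8s-master1                             # node name
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
imageRepository: registry.aliyuncs.com/google_containers   # domestic mirror
kubernetesVersion: 1.24.1
networking:
  podSubnet: 10.244.0.0/16                      # must match flannel's net-conf.json Network value
  serviceSubnet: 10.96.0.0/12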

1.1. Pull related images.

sudo kubeadm config images pull --config=init.default.yaml

If a failed to pull image "registry.aliyuncs.com/google_containers/kube-apiserver:v1.24.1" error occurs, the complete error output looks like this:

fly@fly:~$ sudo kubeadm config images pull --config=init.default.yaml
failed to pull image "registry.aliyuncs.com/google_containers/kube-apiserver:v1.24.1": output: E0317 08:32:39.321814   14391 remote_image.go:171] "PullImage from image service failed" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/cri-dockerd.sock: connect: connection refused\"" image="registry.aliyuncs.com/google_containers/kube-apiserver:v1.24.1"
time="2023-03-17T08:32:39Z" level=fatal msg="pulling image: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/cri-dockerd.sock: connect: connection refused\""
, error: exit status 1
To see the stack trace of this error execute with --v=5 or higher

The prompt says pulling the image failed. Use the kubeadm config images list --config init.default.yaml command to list the images that need to be downloaded:

fly@fly:~$ kubeadm config images list --config init.default.yaml 
registry.aliyuncs.com/google_containers/kube-apiserver:v1.24.1
registry.aliyuncs.com/google_containers/kube-controller-manager:v1.24.1
registry.aliyuncs.com/google_containers/kube-scheduler:v1.24.1
registry.aliyuncs.com/google_containers/kube-proxy:v1.24.1
registry.aliyuncs.com/google_containers/pause:3.7
registry.aliyuncs.com/google_containers/etcd:3.5.3-0
registry.aliyuncs.com/google_containers/coredns:v1.8.6

The cause is a version incompatibility between docker and cri-dockerd; upgrade whichever of the two is too old. I am using docker v20.10.21 with cri-dockerd 0.3.1.
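Before upgrading, it can also be worth confirming that the cri-dockerd service is actually running and that its socket exists, since the error above is simply a connection refused on the socket. A quick check, assuming the upstream systemd unit name cri-docker.service:

# check that cri-dockerd is active and its socket is present
sudo systemctl status cri-docker.service
ls -l /var/run/cri-dockerd.sock
# restart it if it is stopped
sudo systemctl restart cri-docker.service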

1.2. Initialize the cluster

You can initialize either from the configuration file (recommended) or with command-line flags.

# Initialize from the configuration file (recommended)
sudo kubeadm init --config=init.default.yaml
# Initialize with command-line flags
kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version=v1.24.1 --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.11.190 --cri-socket unix:///var/run/cri-dockerd.sock
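Before running the real initialization, a dry run can catch configuration mistakes (such as the == typo mentioned in the summary) without changing anything on the node; this is just a sanity check using the same configuration file:

sudo kubeadm init --config=init.default.yaml --dry-run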

After initialization succeeds, kubeadm prints follow-up instructions; execute them as prompted.
If the current user is a normal (non-root) user, run the following commands:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

If the current user is root, configure the environment variable instead:

# Append the environment variable to the end of /etc/profile
export KUBECONFIG=/etc/kubernetes/admin.conf
# Reload the profile so it takes effect immediately
source /etc/profile

Note: apply whichever of the two cases above matches your user.
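Either way, a quick check that kubectl can now reach the API server:

kubectl cluster-info
# the master will show NotReady until the network plugin is installed in a later step
kubectl get nodes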

To add nodes to the cluster, use the kubeadm join command printed by init. Note: with cri-dockerd you must append --cri-socket unix:///var/run/cri-dockerd.sock. For example:

kubeadm join 192.168.11.190:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:625234ded039aa1ae29608e46bbd0381499b6e95c296d99a3f7e64f5e4053851 --cri-socket unix:///var/run/cri-dockerd.sock

kubeadm join 192.168.11.190:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:ce8db5a985c3c9739f3f0d187c5675d9c5f05a2994dbbde33324f2b93c66048e --cri-socket unix:///var/run/cri-dockerd.sock

If the token has expired, run the following command on the master node to generate a new join command:

sudo kubeadm token create --print-join-command

View cluster status:

# List all pods
sudo kubectl get pods --all-namespaces
# List nodes
sudo kubectl get node
# List component statuses
sudo kubectl get cs

1.3. Add pod network components

Related documentation:

  1. Kubernetes network add-on documentation
  2. flannel component

1.3.1. Method 1:

(1) Download the flannel network component manifest to the local machine (or copy the file's contents into a local file).

https://github.com/flannel-io/flannel/blob/master/Documentation/kube-flannel.yml
---
kind: Namespace
apiVersion: v1
metadata:
  name: kube-flannel
  labels:
    pod-security.kubernetes.io/enforce: privileged
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
- apiGroups:
  - "networking.k8s.io"
  resources:
  - clustercidrs
  verbs:
  - list
  - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-flannel
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-flannel
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-flannel
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-flannel
  labels:
    tier: node
    app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      hostNetwork: true
      priorityClassName: system-node-critical
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni-plugin
        image: docker.io/flannel/flannel-cni-plugin:v1.1.2
       #image: docker.io/rancher/mirrored-flannelcni-flannel-cni-plugin:v1.1.2
        command:
        - cp
        args:
        - -f
        - /flannel
        - /opt/cni/bin/flannel
        volumeMounts:
        - name: cni-plugin
          mountPath: /opt/cni/bin
      - name: install-cni
        image: docker.io/flannel/flannel:v0.21.3
       #image: docker.io/rancher/mirrored-flannelcni-flannel:v0.21.3
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        image: docker.io/flannel/flannel:v0.21.3
       #image: docker.io/rancher/mirrored-flannelcni-flannel:v0.21.3
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN", "NET_RAW"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: EVENT_QUEUE_DEPTH
          value: "5000"
        volumeMounts:
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
        - name: xtables-lock
          mountPath: /run/xtables.lock
      volumes:
      - name: run
        hostPath:
          path: /run/flannel
      - name: cni-plugin
        hostPath:
          path: /opt/cni/bin
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
      - name: xtables-lock
        hostPath:
          path: /run/xtables.lock
          type: FileOrCreate

(2) Apply the resources.

kubectl apply -f kube-flannel.yml

Applying the manifest creates the namespace, cluster role, role binding, service account, config map, and DaemonSet:

fly@k8s-master1:~$ kubectl apply -f kube-flannel.yml
namespace/kube-flannel created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
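Once the flannel pods are Running, the master node should switch from NotReady to Ready; you can verify with:

kubectl get pods -n kube-flannel
kubectl get nodes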

1.3.2. Method 2

Install the network plugin by applying the flannel manifest directly from GitHub:

# Install the network plugin
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml

An exception may occur where the raw.githubusercontent.com domain name cannot be resolved.

To resolve it:
(1) Use a third-party IP lookup website to find the IP address currently serving the raw.githubusercontent.com domain name.

(2) Modify /etc/hosts to add the following content:

185.199.108.133 raw.githubusercontent.com

(3) Run the network plugin installation again.

kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml

1.4. Enable ipvs mode for kube-proxy

# Edit the kube-proxy ConfigMap
kubectl edit -n kube-system cm kube-proxy
# Change: mode: "ipvs"

# Restart the kube-proxy daemonset
kubectl rollout restart -n kube-system daemonset kube-proxy
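To confirm that ipvs mode actually took effect after the restart (assuming the ipvsadm tool is installed on the node):

# list the ipvs virtual servers; entries for the service ClusterIPs indicate ipvs is active
sudo ipvsadm -Ln
# or check the kube-proxy logs for the proxier mode it chose
kubectl -n kube-system logs -l k8s-app=kube-proxy | grep -i ipvs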

View cluster status:

kubectl get pod --all-namespaces 
# or
kubectl get pod -A

At this point, initialization of the master node is complete.
If a pod's status is abnormal, you can get error information from its logs:

kubectl -n kube-system logs <pod>
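kubectl describe often surfaces scheduling and image-pull events that the container logs alone do not show:

kubectl -n kube-system describe pod <pod>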

2. Worker node initialization

2.1. Environment installation

  1. Install docker.
  2. Install the cri-dockerd runtime, and start the service.
  3. Install tools such as kubeadm, kubelet, and kubectl (a version-pinned install sketch follows this list).
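As a rough reference, on Ubuntu the tools in step 3 are usually pinned to the control-plane version; a sketch that assumes the Kubernetes apt repository was already configured when the master was installed:

# install the same versions as the master and hold them
sudo apt-get install -y kubelet=1.24.1-00 kubeadm=1.24.1-00 kubectl=1.24.1-00
sudo apt-mark hold kubelet kubeadm kubectl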

2.2. Node environment modifications

  1. Modify the Cgroup Driver of docker.
  2. Turn off the firewall.
  3. Disable selinux.
  4. Disable swap.
  5. Adjust the hostname if necessary (typical commands for these adjustments are sketched after this list).
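A sketch of the typical commands for these adjustments (assuming an Ubuntu node with ufw; selinux mainly applies to CentOS/RHEL, and the hostname k8s-node1 is just an example):

# 1. set docker's Cgroup Driver to systemd in /etc/docker/daemon.json, then restart docker
#    { "exec-opts": ["native.cgroupdriver=systemd"] }
sudo systemctl restart docker
# 2. turn off the firewall
sudo ufw disable
# 3. disable selinux (CentOS/RHEL)
sudo setenforce 0
# 4. disable swap now and on reboot
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
# 5. adjust the hostname if necessary
sudo hostnamectl set-hostname k8s-node1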

2.3. Add nodes to the cluster

Execute the following command on the master node to obtain the join command:

sudo kubeadm token create --print-join-command

After obtaining the join command from the command above, append the parameter --cri-socket unix:///var/run/cri-dockerd.sock before running it on the node.

3. Reset the node

(1) Reset command:

sudo kubeadm reset
# or
sudo kubeadm reset --cri-socket unix:///var/run/cri-dockerd.sock

(2) Delete related files.

sudo rm -rf /var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni /etc/cni/net.d $HOME/.kube/config

(3) Clear ipvs rules:

sudo ipvsadm --clear

(4) Delete the cni0 network interface:

sudo ifconfig cni0 down
sudo ip link delete cni0
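If flannel was installed, the flannel.1 vxlan interface may be left behind as well; it can be removed the same way (an optional extra cleanup step):

sudo ip link delete flannel.1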

Summary

There are many steps in configuring a Kubernetes cluster, but when a problem appears it almost always comes down to configuration, version compatibility, or machine resources. For example, during kubernetes initialization I once wrote a double equals sign (==) where a single equals sign (=) belonged in the configuration file, and the node could not be found during the initialization stage; there are many such details.


Origin blog.csdn.net/Long_xu/article/details/129626253