k8s Beginner Practice 12 -- Pod Scheduling

1 Introduction

As k8s clusters grow larger and more diverse, scheduling management becomes increasingly important. Scheduling in k8s is handled by kube-scheduler, which uses topology-aware algorithms to decide which nodes can run a pod.
The scheduler tracks the set of nodes in the cluster, filters them against a number of predicates, and then uses priority functions to decide which node each pod should be scheduled onto.
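
To observe the outcome of a scheduling decision, you can check which node a pod landed on and read its events; the Scheduled event names the chosen node. For example (using the vip pod created in section 2.1 below):

    $ kubectl get pod vip -o wide    # the NODE column shows where the pod was placed
    $ kubectl describe pod vip       # the Events section records the "Scheduled" decision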

k8s offers several ways to control pod scheduling, including labels, taints, podAffinity, and podAntiAffinity; this article introduces the two most common ones (labels and taints) and verifies them experimentally.
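
podAffinity and podAntiAffinity are not exercised below, but for reference, a minimal podAntiAffinity sketch looks as follows (the pod name is hypothetical; topologyKey assumes the standard kubernetes.io/hostname node label). It keeps two pods carrying the same app label off the same node:

    apiVersion: v1
    kind: Pod
    metadata:
      name: anti-demo        # hypothetical name, for illustration only
      labels:
        app: anti-demo
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: anti-demo    # avoid nodes already running a pod with this label
            topologyKey: kubernetes.io/hostname
      containers:
      - name: main
        image: busybox:1.31
        args:
        - sleep
        - "1000000"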

2 Controlling scheduling with labels and taints

2.1 Assigning pods via labels

In k8s you can attach labels to nodes and then use nodeSelector in a pod spec to determine which nodes the pod may be scheduled onto; this example controls pod placement by labeling nodes (a nodeAffinity equivalent is sketched after the steps).

  1. View the current node information
    $ kubectl  get nodes
    NAME      STATUS   ROLES    AGE     VERSION
    kmaster   Ready    master   5h      v1.19.4
    knode01   Ready    <none>   4h47m   v1.19.4
    knode02   Ready    <none>   4h45m   v1.19.4
    knode03   Ready    <none>   4h42m   v1.19.4
    
  2. View node labels and taints
    $ kubectl describe nodes|grep -A5 -i label
    Labels:             beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=linux
                        kubernetes.io/arch=amd64
                        kubernetes.io/hostname=kmaster
                        kubernetes.io/os=linux
                        node-role.kubernetes.io/master=
    --
    Labels:             beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=linux
                        kubernetes.io/arch=amd64
                        kubernetes.io/hostname=knode01
                        kubernetes.io/os=linux
    Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
    ......
    $ kubectl describe nodes|grep -i taint
    Taints:             node-role.kubernetes.io/master:NoSchedule
    Taints:             <none>
    Taints:             <none>
    Taints:             <none>
    
    The master node carries an extra node-role.kubernetes.io/master label, together with a matching taint whose effect is NoSchedule.
    
  3. Check the number of running containers on each node
    $ docker ps -a|grep Up|wc -l     # run on each node
    kmaster node: 20
    knode01 node: 8
    knode02 node: 8
    knode03 node: 8
    
  4. Label the nodes
    $ kubectl label nodes knode01 status=vip
    $ kubectl label nodes knode02 status=other
    
    $ kubectl get nodes --show-labels 
    NAME      STATUS   ROLES    AGE     VERSION   LABELS
    kmaster   Ready    master   5h8m    v1.19.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=kmaster,kubernetes.io/os=linux,node-role.kubernetes.io/master=
    knode01   Ready    <none>   4h55m   v1.19.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=knode01,kubernetes.io/os=linux,status=vip
    knode02   Ready    <none>   4h53m   v1.19.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=knode02,kubernetes.io/os=linux,status=other
    knode03   Ready    <none>   4h49m   v1.19.4   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=knode03,kubernetes.io/os=linux
    
  5. Use nodeSelector to deploy 3 containers on the vip node and 2 containers on the other node
    Create the vip pod
    $ vim vip.yaml 
    apiVersion: v1
    kind: Pod
    metadata:
      name: vip
    spec:
      nodeSelector:
        status: vip 
      containers:
      - name: vip1
        image: busybox:1.31
        args:
        - sleep
        - "1000000"
      - name: vip2
        image: busybox:1.31
        args:
        - sleep
        - "1000000"
      - name: vip3
        image: busybox:1.31
        args:
        - sleep
        - "1000000"
    $ kubectl apply -f vip.yaml
    
    Create the other pod
    $ vim other.yaml 
    apiVersion: v1
    kind: Pod 
    metadata:
      name: other
    spec:
      nodeSelector:
        status: other 
      containers:
      - name: other1
        image: busybox:1.31
        args:
        - sleep
        - "1000000"
      - name: other2
        image: busybox:1.31
        args:
        - sleep
        - "1000000"
    
    $ kubectl apply -f other.yaml 
    pod/other created
    
  6. Check the number of running containers on each node again
    $ docker ps -a|grep Up|wc -l     # run on each node
    kmaster node: 20
    knode01 node: 12 (3 busybox containers + 1 pause container added)
    knode02 node: 11 (2 busybox containers + 1 pause container added)
    knode03 node: 8
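
Note that nodeSelector is a hard constraint: if no node carries all of the listed labels, the pod stays Pending. The same vip placement can also be written with the more expressive nodeAffinity; a minimal sketch, assuming the status=vip label from step 4 (the pod name is hypothetical):

    apiVersion: v1
    kind: Pod
    metadata:
      name: vip-affinity     # hypothetical name, for illustration only
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: status
                operator: In    # matches any of the listed values
                values:
                - vip
      containers:
      - name: vip1
        image: busybox:1.31
        args:
        - sleep
        - "1000000"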
    

2.2 Controlling pods via taints

k8s currently supports three taint effects: NoSchedule, PreferNoSchedule, and NoExecute; each is put through an experiment below, and a toleration sketch follows the steps.

  1. Keep the NoSchedule taint
    $ kubectl delete -f vip.yaml 
    $ kubectl delete -f other.yaml 
    Container counts on kmaster knode01 knode02 knode03: 20 8 8 8
    $ vim common.yaml 
    apiVersion: apps/v1
    kind: Deployment 
    metadata:
      labels:
        app: common
      name: common
    spec:
      replicas: 4
      selector:
        matchLabels:
          app: common
      template:
        metadata:
          labels:
            app: common
        spec:
          containers:
          - image: busybox:1.31
            name: busybox
            command: [sh, -c, 'sleep 10000']
    
    $ kubectl apply -f common.yaml 
    deployment.apps/common created
    Container counts on kmaster knode01 knode02 knode03 are now 20 10 10 12; kmaster received no pods, while knode01|02|03 received 1|1|2 pods respectively;
    
  2. Remove the NoSchedule taint
    $ kubectl delete -f common.yaml 
    deployment.apps "common" deleted
    $ kubectl taint node kmaster node-role.kubernetes.io/master:NoSchedule-
    node/kmaster untainted
    $ kubectl apply -f common.yaml 
    $ docker ps -a|grep Up|wc -l     # run on each node
    Container counts on kmaster knode01 knode02 knode03 are now 22 10 10 10; every node received one pod;
    
  3. Restore kmaster to unschedulable
    $ kubectl taint node kmaster node-role.kubernetes.io/master:NoSchedule
    $ kubectl delete -f common.yaml 
    
  4. Set a vip PreferNoSchedule taint on knode01
    $ kubectl taint node knode01 status=vip:PreferNoSchedule
    node/knode01 tainted
    $ kubectl apply -f common.yaml 
    $ docker ps -a|grep Up|wc -l     # run on each node
    Container counts on kmaster knode01 knode02 knode03 are now 20 10 10 12; knode01|02|03 received 1|1|2 pods respectively;
    apart from the master (NoSchedule), each worker node received 1 pod, and the extra pod was preferentially kept off knode01;
    $ kubectl scale deployment common --replicas=8
    Container counts are now 20 12 14 14; knode01|02|03 received 2|3|3 pods respectively;
    apart from the master (NoSchedule), each worker node received 2 pods, and the extra 2 pods were preferentially kept off knode01;
    $ kubectl delete -f common.yaml 
    
  5. Set knode01 to NoExecute
    $ kubectl taint node knode01 status=vip:PreferNoSchedule-
    node/knode01 untainted
    $ kubectl taint node knode01 status=vip:NoExecute
    node/knode01 tainted
    
  6. Test knode01 to see whether its pods are evicted
    $ kubectl apply -f common.yaml 
    deployment.apps/common created
    $ docker ps -a|grep Up|wc -l     # run on each node
    Container counts on kmaster knode01 knode02 knode03 are now 20 4 12 14; knode01 lost 2 pods, while knode02|03 gained 1|3 pods respectively;
    on knode01, every container except k8s_calico-node_calico-node and k8s_kube-proxy_kube-proxy was removed;
    
  7. Restore the nodes to their normal state
    $ kubectl taint node knode01 status=vip:NoExecute-
    node/knode01 untainted
    $ kubectl label nodes knode01 status-
    $ kubectl label nodes knode02 status- 
    After a while, the non-core DaemonSet pod node-exporter gets scheduled onto knode01 again; 
    Clean up the test resources:
    $ kubectl delete -f common.yaml 
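
The taints above repel pods; the complementary mechanism is a toleration in the pod spec, which allows (but does not force) a pod to run on a tainted node. A minimal sketch, assuming knode01 still carried the status=vip:NoExecute taint from step 5 (the pod name is hypothetical):

    apiVersion: v1
    kind: Pod
    metadata:
      name: vip-tolerant     # hypothetical name, for illustration only
    spec:
      tolerations:
      - key: status
        operator: Equal      # must match the taint's key and value exactly
        value: vip
        effect: NoExecute    # tolerates the eviction taint from step 5
      containers:
      - name: main
        image: busybox:1.31
        args:
        - sleep
        - "1000000"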
    

3 Notes

To be added

4 References

1 Concepts -> Scheduling and Eviction -> Assigning Pods to Nodes
2 Concepts -> Scheduling and Eviction -> Taints and Tolerations

Reposted from blog.csdn.net/u011127242/article/details/113099709