k8s-cluster scheduling

One: Introduction to the Scheduler

1.1 Introduction

The Scheduler is the Kubernetes component responsible for scheduling. Its main task is to assign the Pods that have been defined to nodes in the cluster. This sounds simple, but there are many issues to consider:

 

Fairness: ensure that every node gets a fair chance to be allocated resources

Efficient resource usage: maximize the utilization of all resources in the cluster

Efficiency: scheduling should perform well and be able to place large batches of Pods as quickly as possible

Flexibility: allow users to control the scheduling logic according to their own needs.

The Scheduler runs as a separate program. After it starts, it continuously watches the API Server for Pods whose PodSpec.NodeName is empty, and for each such Pod it creates a binding that specifies which node the Pod should be placed on.

1.2 Scheduling process

Scheduling is divided into several steps. First, nodes that do not satisfy the conditions are filtered out; this stage is called predicate. Then the nodes that pass are ranked; this stage is called priority. Finally, the node with the highest priority is selected. If any step returns an error, the error is returned directly. If no node passes the predicate stage, the Pod stays in the Pending state and scheduling keeps being retried until some node satisfies the conditions.

After the predicate stage, if several nodes satisfy the conditions, the priority process continues: the nodes are sorted by priority and the highest-ranked one is chosen.
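One practical way to observe the outcome of this process is to inspect the Pod's status and events: a Pod that fails the predicate stage stays Pending and its events contain a FailedScheduling message. The Pod name myapp-pod below is only an illustrative assumption.

kubectl get pod -o wide                # the NODE column shows where each Pod was placed
kubectl describe pod myapp-pod         # for a Pending Pod, the Events section explains why scheduling failed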

Two: Node affinity

  • pod.spec.nodeAffinity:

preferredDuringSchedulingIgnoredDuringExecution: soft strategy

requiredDuringSchedulingIgnoredDuringExecution: hard strategy (must be satisfied)

 

  • Key-value operators (a combined example of how they are used follows this list)

In: the label's value is in a given list

NotIn: the label's value is not in a given list

Gt: the label's value is greater than a given value

Lt: the label's value is less than a given value

Exists: the label exists (its value does not matter)

DoesNotExist: the label does not exist
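As a sketch of how these operators appear in a manifest, the fragment below combines Exists and Gt inside a nodeAffinity rule; the label keys disktype and gpu-count are made-up assumptions, and Gt/Lt compare the label value as an integer.

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: disktype      # the node must carry this label, with any value
          operator: Exists
        - key: gpu-count     # the label value, read as an integer, must be greater than 1
          operator: Gt
          values:
          - "1"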

 

2.1 Hard strategy

Require the Pod to run on k8s-node2:

apiVersion: v1
kind: Pod
metadata:
  name: affinity
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: wangyanglinux/myapp:v1
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - k8s-node2
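One way to confirm the placement, assuming the manifest above is saved as pod1.yaml (the file name is only an assumption):

kubectl apply -f pod1.yaml
kubectl get pod affinity -o wide   # the NODE column should show k8s-node2; if that node does not exist, the Pod stays Pending because the rule is a hard requirement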

2.2 Soft strategy

Prefer to run the Pod on k8s-node3, even though no node named k8s-node3 exists in the cluster:

apiVersion: v1
kind: Pod
metadata:
  name: affinity1
  labels:
    app: node-affinity-pod
spec:
  containers:
  - name: with-node-affinity
    image: wangyanglinux/myapp:v1
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: kubernetes.io/hostname 
            operator: In
            values:
            - k8s-node3
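Because this is only a preference, the Pod is scheduled even when no node matches. Assuming the manifest above is saved as pod2.yaml (the file name is only an assumption):

kubectl apply -f pod2.yaml
kubectl get pod affinity1 -o wide   # the Pod lands on some available node even though k8s-node3 does not exist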

Three: Pod affinity

pod.spec.affinity.podAffinity/podAntiAffinity:

 

preferredDuringSchedulingIgnoredDuringExecution: soft strategy

requiredDuringSchedulingIgnoredDuringExecution: hard strategy

3.1 Hard strategy

The pod-3 Pod must run on the same node as a Pod labeled app=node-affinity-pod:

apiVersion: v1
kind: Pod
metadata:
  name: pod-3
  labels:
    app: pod-3
spec:
  containers:
  - name: pod-3
    image: wangyanglinux/myapp:v1
  affinity:
    podAffinity: # co-locate with Pods that match the selector below
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - node-affinity-pod
        topologyKey: kubernetes.io/hostname



kubectl get pod --show-labels
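The affinity rule can only be satisfied if a Pod carrying the app=node-affinity-pod label (for example the affinity Pod from section 2.1) is already running. One way to confirm that pod-3 landed on the same node:

kubectl get pod affinity pod-3 -o wide   # both Pods should show the same value in the NODE column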

3.2 Soft strategy

Prefer that the pod-4 Pod does not run on the same node as pod-3:

 

[root@k8s-master01 diaodu]# cat pod4.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: pod-4
  labels:
    app: pod-4
spec:
  containers:
  - name: pod-4
    image: wangyanglinux/myapp:v1
  affinity:
    podAntiAffinity: # prefer not to be co-located with Pods that match the selector below
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
         labelSelector:
           matchExpressions:
           - key: app
             operator: In
             values:
             - pod-3
         topologyKey: kubernetes.io/hostname


As a result, pod-3 and pod-4 end up on different nodes, for example one on k8s-node1 and one on k8s-node2.
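One way to confirm the anti-affinity took effect, assuming pod-3 from the previous section is still running:

kubectl get pod -l 'app in (pod-3, pod-4)' -o wide   # the two Pods should show different values in the NODE column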

Four: Taints and tolerations

4.1 Introduction

Node affinity is a property of Pods (either a preference or a hard requirement) that attracts Pods to a particular set of nodes. A taint is the opposite: it allows a Node to repel a particular set of Pods.

Taints and tolerations work together and can be used to keep Pods away from inappropriate nodes. One or more taints can be applied to a Node, which means the Node will not accept any Pod that does not tolerate its taints. If tolerations are applied to Pods, those Pods may (but are not required to) be scheduled onto Nodes with matching taints.

 

4.2 Taints

The composition of a taint

The kubectl taint command is used to set a taint on a Node. Once a Node is tainted, a repelling relationship exists between it and Pods, which lets the Node refuse to schedule Pods onto it and even evict Pods that are already running on it.

 

Each taint has a key and a value as its label, where the value can be empty, and an effect that describes what the taint does. The taint effect currently supports the following three options (the general command syntax is sketched after the list):

NoSchedule: Kubernetes will not schedule any Pod onto a Node that has this taint

PreferNoSchedule: Kubernetes will try to avoid scheduling Pods onto a Node that has this taint

NoExecute: Kubernetes will not schedule any Pod onto a Node that has this taint, and Pods already running on that Node will be evicted
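As a sketch, the general syntax for adding and removing a taint is shown below; the node name and the key1=value1 pair are placeholders:

kubectl taint nodes <node-name> key1=value1:NoSchedule    # add a taint in the form key=value:effect (the value may be empty)
kubectl taint nodes <node-name> key1=value1:NoSchedule-   # the trailing "-" removes that taint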

 

  • View taints

 

[root@k8s-master01 diaodu]# kubectl describe nodes k8s-master01|grep Taints
Taints:             node-role.kubernetes.io/master:NoSchedule

 

  • Set a taint

 

kubectl taint nodes k8s-node1 key=hu:NoExecute

After the taint is set, all Pods that were running on k8s-node1 are evicted. Compare the Pod list before and after setting the taint:
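A minimal way to observe the eviction is to list the Pods before and after running the taint command above:

kubectl get pod -o wide   # before: several Pods show k8s-node1 in the NODE column; after: none remain there (Pods owned by a Deployment are recreated on other nodes)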

 

  • Remove a taint

View the current taint, then remove it:

 

kubectl taint nodes k8s-node1 key:NoExecute-

 

Check again and the taint is gone.
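One way to run that check, using the same pattern shown earlier for the master node:

kubectl describe nodes k8s-node1 | grep Taints   # should now report Taints: <none>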

 

4.3 Tolerations

In one sentence: if a Pod is given a matching toleration, it can be scheduled onto a Node even though that Node carries a taint.

 

A Node with taints repels Pods according to each taint's effect (NoSchedule, PreferNoSchedule, NoExecute), so to some extent Pods will not be scheduled onto that Node. However, a toleration can be set on a Pod, which means the Pod tolerates the taint and can be scheduled onto a Node that carries it.

 

Example:

 

apiVersion: v1
kind: Pod
metadata:
  name: pod-3
  labels:
    app: pod-3
spec:
  containers:
  - name: pod-3
    image: wangyanglinux/myapp:v1
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "hu"
    effect: "NoExecute"
    tolerationSeconds: 3600 # the Pod may keep running for 3600 seconds before being evicted


The key, value, and effect must match the taint set on the Node.

When operator is Exists, the value field is ignored.

 

tolerationSeconds describes how long the Pod can keep running on the Node after it is marked for eviction.
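To try the toleration out, assuming k8s-node1 still carries the key=hu:NoExecute taint from section 4.2 and the manifest above is saved as pod3-toleration.yaml (the file name is only an assumption):

kubectl apply -f pod3-toleration.yaml
kubectl get pod pod-3 -o wide   # pod-3 is allowed onto the tainted node and is only evicted once the 3600-second tolerationSeconds window expires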

4.3.1 When no key is specified, all taint keys are tolerated:

 

tolerations:
- operator: "Exists"

4.3.2 When no effect is specified, all taint effects for the given key are tolerated:

 

tolerations:
- key: "key"
  operator: "Exists

 

4.3.3 When there are multiple Master nodes, the following setting prevents their resources from being wasted

Pods will preferably not be scheduled onto the master; only when the worker nodes run short of capacity are Pods placed there.

 

kubectl taint nodes k8s-master01 node-role.kubernetes.io/master=:PreferNoSchedule
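The taints on the master can be checked afterwards with the same grep used earlier:

kubectl describe nodes k8s-master01 | grep Taints   # lists every taint currently set on the master node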

 

Five: Scheduling to a fixed node

5.1 Selecting by node hostname

Pod.spec.nodeName schedules the Pod directly onto the specified Node, bypassing the Scheduler's scheduling policies; the match is mandatory.

 

[root@k8s-master01 diaodu]# cat pod5.yaml 
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myweb
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: myweb
    spec:
      nodeName: k8s-node1
      containers:
      - name: myweb
        image: wangyanglinux/myapp:v1
        ports:
        - containerPort: 80


Then check the result:
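A quick check, assuming the Deployment above was applied:

kubectl get pod -o wide | grep myweb   # all three replicas should show k8s-node1 in the NODE column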

 

5.2 Selecting by node label

 

Set the label:

kubectl label node k8s-node2 disk=ssd

 

View the labels:
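One way to confirm the disk=ssd label was applied:

kubectl get node k8s-node2 --show-labels   # disk=ssd should appear in the LABELS column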

 

Example:

 

[root@k8s-master01 diaodu]# cat label.yaml 
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: myweb
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: myweb
    spec:
      nodeSelector:
        disk: ssd # match nodes that carry the disk=ssd label
      containers:
      - name: myweb
        image: wangyanglinux/myapp:v1
        ports:
        - containerPort: 80


Check the result: all replicas are running on k8s-node2.

 
