[Cloud native k8s] K8s affinity, anti-affinity, taints, and tolerations

Table of contents

1. K8s scheduling

2. Affinity and anti-affinity

1. Pods and Nodes

2. Hard Affinity and Soft Affinity

3. Taints and Tolerations

1. Taint

1.1 Composition of a taint

1.2 Setting and removing taints

2. Tolerations

2.1 Basic usage of Toleration

2.2 Toleration example

3. Multiple taints and multiple tolerations


1. K8s scheduling

  1. The scheduler uses the Kubernetes list-watch mechanism to discover Pods that have been created in the cluster but not yet bound to a Node, and schedules each such Pod onto a suitable Node to run.
  2. kube-scheduler is the default scheduler of a Kubernetes cluster and is part of the control plane. It is designed so that, if you really need to, you can write your own scheduling component and use it in place of the original kube-scheduler (see the sketch below for how a Pod selects a scheduler).
  3. Factors considered when making scheduling decisions include: individual and aggregate resource requests, hardware/software/policy constraints, affinity and anti-affinity requirements, data locality, interference between workloads, and more.
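As a minimal sketch of the last point about custom schedulers: a Pod opts into a non-default scheduler through spec.schedulerName, while Pods that omit the field are handled by kube-scheduler. The scheduler name my-custom-scheduler below is a hypothetical example, not something from this article.

apiVersion: v1
kind: Pod
metadata:
  name: demo-custom-scheduling        # illustrative name only
spec:
  schedulerName: my-custom-scheduler  # assumption: a custom scheduler registered under this name
  containers:
  - name: app
    image: nginx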

2. Affinity and anti-affinity

1. Pods and Nodes

  1. Starting from the Pod, affinity and anti-affinity are expressed with podAffinity and podAntiAffinity respectively.
  2. Starting from the Node, only nodeAffinity exists as a field; node "anti-affinity" is expressed inside nodeAffinity by using negative operators such as NotIn and DoesNotExist.
  3. The available operators include In, NotIn, Exists, DoesNotExist and so on.

For affinity, In means "I want to be scheduled to a position with this label".
For anti-affinity, In means "I do not want to be scheduled to a position with this label"; see the sketch below.
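A minimal sketch of podAntiAffinity, assuming a Deployment named web with the label app=web and kubernetes.io/hostname as the topology key (all names are illustrative): the anti-affinity rule keeps the replicas from landing on the same node.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In          # "do not co-locate with Pods carrying this label"
                values:
                - web
            topologyKey: kubernetes.io/hostname   # spread replicas across nodes
      containers:
      - name: web
        image: nginx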

2. Hard Affinity and Soft Affinity

preferredDuringSchedulingIgnoredDuringExecution (soft affinity)

Soft strategy: combined with "operator: NotIn" in the example below, it means the Pod should preferably not be scheduled to a matching node; if no other node is available, it can still be scheduled to a matching one.

requiredDuringSchedulingIgnoredDuringExecution (hard affinity)

Hard strategy: combined with "operator: In" in the example below, it means the Pod must be scheduled to a node that meets the condition; otherwise it stays in Pending.

Either way, scheduling ultimately depends on node labels.
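For reference, node labels such as the one used in the example below are applied and inspected with kubectl label and kubectl get nodes (the node name node1 here is illustrative; the gray label mirrors the example output):

kubectl label nodes node1 node-role.kubernetes.io/gray=gray
kubectl get nodes --show-labels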

kubectl get pods -n company ai-action-statistic-gray-86465f9c4b-hdfk4 -oyaml | grep nodeSelector -B 5 -A 5
  uid: ed47f094-f70a-45ed-b7dd-d46f2d01986f
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:    # hard strategy
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-role.kubernetes.io/gray
            operator: In
            values:
            - gray
      preferredDuringSchedulingIgnoredDuringExecution:   # soft strategy
      - weight: 1
        preference:
          matchExpressions:
          - key: pc-app
            operator: NotIn
            values:
            - luna

3. Taints and Tolerations

One or more taints can be applied to each node in a Kubernetes cluster. A taint means that Pods which cannot tolerate it will not be accepted by that node. If a toleration is applied to a Pod, that Pod can (but is not required to) be scheduled onto nodes that carry the corresponding taint.

1. Taint

  1. If a node is marked with a taint, Pods are not allowed to be scheduled onto that node unless they declare a matching toleration.
  2. In a cluster deployed with kubeadm you will find that, under normal circumstances, application Pods are not scheduled onto the master node, because kubeadm adds a taint to the master node by default.

  

1.1 Composition of a taint

Use the kubectl taint command to set a taint on a node. Once a node is tainted, it has a repelling relationship with Pods: the node can refuse to schedule Pods, and can even evict Pods that are already running on it.

Each taint has the following form:

key=value:effect

Each taint has a key and a value as its label, where the value can be empty, and the effect describes what the taint does.

Taints support three effects:

  1. PreferNoSchedule: the soft version of NoSchedule; the scheduler tries to avoid placing Pods on the tainted node, but it is not mandatory.
  2. NoExecute: once this taint takes effect, any Pod already running on the node that does not have a matching toleration is evicted.
  3. NoSchedule: Kubernetes will not schedule Pods onto a node with this taint.

1.2 Setting and removing taints

Examples of commands to set and remove taints using kubectl are as follows:

# Set taint
kubectl taint nodes node1 key1=value1:NoSchedule
# Remove taint
kubectl taint nodes node1 key1:NoSchedule-
 

Next, let's look at a concrete example. When a Kubernetes cluster is deployed and initialized with kubeadm, the master node is given a node-role.kubernetes.io/master:NoSchedule taint, which can be viewed with the kubectl describe node command. This taint means that, by default, the master node does not schedule ordinary Pods, i.e. it does not run workloads. For a cluster deployed manually from binaries, the commands to set and remove this taint are as follows:

kubectl taint nodes <node-name> node-role.kubernetes.io/master=:NoSchedule
kubectl taint nodes <node-name> node-role.kubernetes.io/master:NoSchedule-
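To confirm which taints a node currently carries, the Taints field of kubectl describe node can be filtered with grep (the exact output layout may differ slightly between versions):

kubectl describe node <node-name> | grep Taints
# Expected to show something like:  Taints: node-role.kubernetes.io/master:NoSchedule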

2. Tolerations

A node with taints repels Pods according to the taints' effects (NoSchedule, PreferNoSchedule, NoExecute), so to some extent Pods will not be scheduled onto that node. However, we can set a toleration on a Pod: a Pod with a matching toleration can tolerate the taint and can be scheduled onto the tainted node.

2.1 Basic usage of Toleration

The key and effect in a Pod's toleration must be consistent with the taint set on the node, and one of the following conditions must hold:

  1. The operator is Exists, in which case no value needs to be specified.
  2. The operator is Equal and the values are equal.
  3. If no operator is specified, it defaults to Equal.

There are also two special cases:

  1. An empty key with the Exists operator matches all keys and values.
  2. An empty effect matches all effects.

A brief description of the effect values:

  1. NoSchedule: if a Pod does not declare a toleration for this taint, the scheduler will not place the Pod on the node that carries it.
  2. PreferNoSchedule: the soft version of NoSchedule; if a Pod does not tolerate this taint, the scheduler tries to avoid the node, but it is not mandatory.
  3. NoExecute: in addition to blocking scheduling, this effect also acts on Pods already running on the node:
     - A Pod without a matching toleration is evicted immediately.
     - A Pod with a matching toleration but no tolerationSeconds value stays on the node indefinitely.
     - A Pod with a matching toleration and a tolerationSeconds value is evicted after the specified time.

2.2 Toleration example

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
  tolerationSeconds: 3600
- key: "key2"
  operator: "Exists"
  effect: "NoSchedule"

  • The key, value, and effect must be consistent with the taint set on the node.
  • When the operator is Exists, the value is ignored.
  • tolerationSeconds describes how long the Pod can keep running on the node after a matching NoExecute taint is added, before it is evicted.

Let's look at two special cases of setting tolerance on Pods:

Example 1: When no key is specified, all taint keys and values are tolerated:

tolerations:
- operator: "Exists"

 Example 2: When no effect value is specified, it means to tolerate all taint effects:

tolerations:
- key: "key"
  operator: "Exists"

Note that when a node fails, the system adds taints to the node in a rate-limited way in order to preserve the existing rate limits on Pod eviction, which prevents large numbers of Pods from being evicted at once. This works together with tolerationSeconds, which lets a Pod define how long a node failure must last before the Pod is evicted.
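For reference, Kubernetes itself uses this mechanism: by default the DefaultTolerationSeconds admission plugin adds tolerations like the following to Pods that do not set them, so a Pod survives roughly five minutes of node failure before being evicted (a sketch of the automatically added tolerations):

tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300   # evicted 5 minutes after the node becomes NotReady
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300   # evicted 5 minutes after the node becomes unreachable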

3. Multiple taints and multiple tolerations

**The system allows multiple taints on the same node, and multiple tolerations on the same Pod.** The Kubernetes scheduler ignores every taint that is matched by one of the Pod's tolerations; the remaining unmatched taints determine the effect on the Pod. In particular:

  1. If one of the remaining taints has effect NoSchedule, the scheduler will not schedule the Pod onto this node.
  2. If none of the remaining taints has effect NoSchedule but at least one has effect PreferNoSchedule, the scheduler will try not to assign the Pod to this node.
  3. If one of the remaining taints has effect NoExecute and the Pod is already running on the node, it will be evicted; if it is not yet running on the node, it will not be scheduled there.

For example, add the following three taints to node1:

kubectl taint nodes node1 key1=value1:NoSchedule
kubectl taint nodes node1 key1=value1:NoExecute
kubectl taint nodes node1 key2=value2:NoSchedule
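Now suppose the Pod declares tolerations for only the two key1 taints (a sketch mirroring the commands above):

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"

The third taint, key2=value2:NoSchedule, is not matched by any toleration and its effect is NoSchedule, so the Pod will not be scheduled onto node1. A Pod that was already running on node1 before the taints were added keeps running, because the only unmatched taint does not have the NoExecute effect.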

Origin blog.csdn.net/weixin_71438279/article/details/127743154