K8S Taints and Tolerations - A Primer

Taint and Toleration
Node affinity (described elsewhere) is a property of pods (either a preference or a hard requirement) that attracts them to a particular class of nodes. Taints are the opposite: they allow a node to repel a specific class of pods.

Taints and tolerations work together to keep pods off nodes that are inappropriate for them. You can apply one or more taints to a node; the node will then refuse any pod that does not tolerate those taints. If tolerations are applied to a pod, the pod is allowed (but not required) to be scheduled onto nodes with matching taints.

Contents:
Concepts
Example use cases
Taint-based evictions
Adding taints based on node conditions

Concepts
You can add a taint to a node with kubectl taint. For example,

kubectl taint nodes node1 key=value:NoSchedule
adds a taint to node node1. The taint has key key, value value, and effect NoSchedule. This means that no pod can be scheduled onto node1 unless it has a matching toleration.

To remove the taint added by the command above, you can run:

kubectl taint nodes node1 key:NoSchedule-
You specify tolerations for a pod in its PodSpec. Both of the tolerations below "match" the taint created by the kubectl taint command above, so a pod with either toleration could be scheduled onto node1:

tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoSchedule"

tolerations:
- key: "key"
  operator: "Exists"
  effect: "NoSchedule"
A toleration "matches" a taint if the keys are the same and the effects are the same, and:

the operator is Exists (in which case no value should be specified), or
the operator is Equal and the values are equal.
Note:
There are two special cases:

If the key is empty and the operator is Exists, the toleration matches every key, value, and effect, which means it will tolerate any taint.
tolerations:
- operator: "Exists"
If the effect is empty, the toleration matches any taint with key key, regardless of the taint's effect.

tolerations:
- key: "key"
  operator: "Exists"
The examples above use the effect value NoSchedule. Another value you can use is PreferNoSchedule. This is a "preference" or "soft" version of NoSchedule: the system will try to avoid placing a pod that does not tolerate the taint on the node, but it is not required to. The effect can also be set to NoExecute, which is described in detail below.
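For illustration only (not part of the original example), a soft taint can be applied with the same command pattern shown earlier; node1 and the key/value pair are placeholders:

kubectl taint nodes node1 key=value:PreferNoSchedule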

You can put multiple taints on the same node and multiple tolerations on the same pod. Kubernetes processes multiple taints and tolerations like a filter: it starts with all of a node's taints and ignores those for which the pod has a matching toleration; the effects of the remaining, un-ignored taints determine what happens to the pod. In particular:

If at least one un-ignored taint has effect NoSchedule, Kubernetes will not schedule the pod onto that node.
If no un-ignored taint has effect NoSchedule but at least one has effect PreferNoSchedule, Kubernetes will try not to schedule the pod onto the node.
If at least one un-ignored taint has effect NoExecute, Kubernetes will not schedule the pod onto the node (if it is not yet running there) and will evict it from the node (if it is already running there).
For example, suppose you add the following taints to a node:

kubectl taint nodes node1 key1=value1:NoSchedule
kubectl taint nodes node1 key1=value1:NoExecute
kubectl taint nodes node1 key2=value2:NoSchedule
And suppose a pod has two tolerations:

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
In this case, the pod cannot be scheduled onto the node, because there is no toleration matching the third taint. But if the pod was already running on the node before the taints were added, it can continue to run there, because the third taint is the only one of the three that the pod does not tolerate.

Normally, if a taint with effect NoExecute is added to a node, any pod that does not tolerate the taint is evicted immediately, and any pod that does tolerate it is never evicted. However, if a toleration with effect NoExecute specifies the optional tolerationSeconds field, that value dictates how long the pod may keep running on the node after the taint is added. For example,

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
  tolerationSeconds: 3600
means that if this pod is running and a matching taint is added to the node, the pod will keep running on the node for 3600 seconds and then be evicted. If the taint is removed before that time, the pod will not be evicted.

Example use cases
Taints and tolerations are a flexible way to steer pods away from certain nodes or to evict pods from them. A few example use cases:

Dedicated nodes: If you want to dedicate a set of nodes to a particular group of users, you can add a taint to those nodes (i.e. kubectl taint nodes nodename dedicated=groupName:NoSchedule) and then add a corresponding toleration to that group's pods (this is easiest to do by writing a custom admission controller). Pods with the toleration can be scheduled onto the dedicated nodes, but they can also be scheduled onto any other node in the cluster. If you want those pods to be schedulable only onto the dedicated nodes, you additionally need to add a label similar to the taint (for example: dedicated=groupName) to the dedicated nodes, and have the admission controller also add node affinity to the pods requiring that they only be scheduled onto nodes labeled dedicated=groupName. See the first sketch after this list.
Nodes with special hardware: In a cluster where some nodes have special hardware (such as GPUs), you want to keep pods that do not need that hardware off those nodes, so that the resources stay reserved for pods that do need it. To do this, add a taint to the special-hardware nodes (e.g. kubectl taint nodes nodename special=true:NoSchedule or kubectl taint nodes nodename special=true:PreferNoSchedule) and add a matching toleration to the pods that use the special hardware. As with the dedicated-nodes use case, the easiest way to add the toleration is with a custom admission controller. For example, it is recommended to use Extended Resources to represent the special hardware: taint the special-hardware nodes with the extended resource name, and run the ExtendedResourceToleration admission controller. Because the nodes are tainted, no pod without a corresponding toleration will be scheduled onto them. But when a pod that requests the extended resource is created, the ExtendedResourceToleration admission controller automatically adds the right toleration to it, so the pod is scheduled onto the special-hardware nodes automatically. This ensures that the special-hardware nodes are dedicated to pods that need the hardware, without requiring you to add the tolerations manually. See the second sketch after this list.
Taint-based evictions (beta feature): per-pod-configurable eviction behavior when there are node problems, described in the next section.
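Below are two minimal sketches of the use cases above. The pod names, container names, images, and the extended resource name are placeholders chosen for illustration, not values from the original text.

A pod for the dedicated-nodes case, combining the dedicated=groupName toleration with node affinity so it can only land on the labeled nodes:

apiVersion: v1
kind: Pod
metadata:
  name: team-a-app              # placeholder name
spec:
  containers:
  - name: app                   # placeholder container
    image: busybox:1.36         # assumed image
    command: ["sleep", "3600"]
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "groupName"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: "dedicated"
            operator: In
            values: ["groupName"]

A pod for the special-hardware case. Assuming the ExtendedResourceToleration admission controller is enabled and the nodes are tainted with the (hypothetical) extended resource name example.com/gpu, the pod only needs to request the resource; the matching toleration is injected automatically:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload            # placeholder name
spec:
  containers:
  - name: worker                # placeholder container
    image: registry.example.com/gpu-app:latest   # placeholder image
    resources:
      limits:
        example.com/gpu: 1      # hypothetical extended resource name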
Taint-based evictions
Earlier we mentioned the NoExecute taint effect, which affects pods that are already running on the node as follows:

Pods that do not tolerate the taint are evicted immediately.
Pods that tolerate the taint but do not specify tolerationSeconds in their toleration remain bound to the node.
Pods that tolerate the taint and specify tolerationSeconds remain bound for the specified amount of time.

In addition, Kubernetes 1.6 added alpha support for representing node problems as taints. In other words, the node controller automatically taints a node when certain conditions are true. The following taints are built in:

node.kubernetes.io/not-ready: the node is not ready. This corresponds to the node's Ready condition being "False".
node.kubernetes.io/unreachable: the node controller cannot reach the node. This corresponds to the node's Ready condition being "Unknown".
node.kubernetes.io/out-of-disk: the node has run out of disk space.
node.kubernetes.io/memory-pressure: the node is under memory pressure.
node.kubernetes.io/disk-pressure: the node is under disk pressure.
node.kubernetes.io/network-unavailable: the node's network is unavailable.
node.kubernetes.io/unschedulable: the node is unschedulable.
node.cloudprovider.kubernetes.io/uninitialized: when the kubelet is started with an "external" cloud provider, this taint is set on the node to mark it as unusable. After a controller from cloud-controller-manager initializes the node, the kubelet removes the taint.
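To check which of these taints are currently set on a node, you can inspect the node object; <node-name> is a placeholder:

kubectl describe node <node-name> | grep Taints
kubectl get node <node-name> -o jsonpath='{.spec.taints}'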
In version 1.13, the TaintBasedEvictions feature was promoted to beta and is enabled by default, so these taints are added to nodes automatically, and the older logic of evicting pods based on the node's Ready condition is disabled.

Note:
To keep the existing rate-limiting behavior of pod evictions caused by node problems, the system actually adds the taints in a rate-limited way. This prevents massive pod evictions in scenarios such as the master losing communication with the nodes.

This beta feature, combined with tolerationSeconds, lets a pod specify how long it should keep running on a node that has one or more of these problems.

For example, an application that uses a lot of local state might want to stay bound to the node for a long time when the network is disconnected, in the hope that the network will recover and the eviction can be avoided. The toleration such a pod would use might look like this:

tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 6000
Note that Kubernetes automatically adds a toleration for node.kubernetes.io/not-ready with tolerationSeconds=300 to a pod, unless the pod configuration provided by the user already has a toleration for the node.kubernetes.io/not-ready key. Likewise, Kubernetes adds a toleration for node.kubernetes.io/unreachable with tolerationSeconds=300, unless the pod configuration provided by the user already has a toleration for the node.kubernetes.io/unreachable key.

These automatically added tolerations ensure the default behavior: a pod stays bound to the node for 5 minutes after one of these problems is detected. The two default tolerations are added by the DefaultTolerationSeconds admission controller.
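For illustration, the tolerations injected by the DefaultTolerationSeconds admission controller are equivalent to the following (a sketch based on the description above):

tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300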

When DaemonSet pods are created, NoExecute tolerations without tolerationSeconds are automatically added for the following taints:

node.kubernetes.io/unreachable
node.kubernetes.io/not-ready
This ensures that DaemonSet pods are never evicted when these problems occur, which matches their behavior when the TaintBasedEvictions feature is disabled.

Adding taints based on node conditions
In version 1.12, the TaintNodesByCondition feature was promoted to beta, so the node lifecycle controller automatically creates taints corresponding to node conditions. Likewise, the scheduler does not check node conditions; it checks taints instead. This ensures that node conditions do not directly affect what gets scheduled onto a node; users can choose to ignore some of a node's problems (represented as node conditions) by adding the appropriate tolerations to their pods. Note that TaintNodesByCondition taints nodes with the NoSchedule effect only; the NoExecute effect is controlled by TaintBasedEvictions, a beta feature enabled by default since version 1.13.

To keep DaemonSets from breaking, the DaemonSet controller automatically adds the following NoSchedule tolerations to all daemon pods:

node.kubernetes.io/memory-pressure
node.kubernetes.io/disk-pressure
node.kubernetes.io/out-of-disk (only for critical pods)
node.kubernetes.io/unschedulable (version 1.10 or later)
node.kubernetes.io/network-unavailable (host network only)

Adding these tolerations ensures backward compatibility. You can also add arbitrary tolerations to DaemonSets.
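For example, a minimal sketch of a DaemonSet that declares one extra toleration on top of the automatic ones; the names, image, and the taint key example.com/maintenance are hypothetical placeholders:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent               # placeholder name
spec:
  selector:
    matchLabels:
      app: node-agent
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      containers:
      - name: agent              # placeholder container
        image: busybox:1.36      # assumed image
        command: ["sleep", "3600"]
      tolerations:
      - key: "example.com/maintenance"   # hypothetical taint key
        operator: "Exists"
        effect: "NoSchedule"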

Document Source:
https://kubernetes.io/zh/docs/concepts/configuration/taint-and-toleration/

Origin blog.51cto.com/breaklinux/2445611