Kubernetes scheduling: taints and tolerations


Node affinity is a property of pods that attracts them to a set of nodes, either as a preference or as a hard requirement. A taint is the opposite: it lets a node repel pods, so that pods which do not tolerate the taint are not scheduled onto that node.

Taints and tolerations work together to ensure that pods are not scheduled onto unsuitable nodes. When one or more taints are applied to a node, the node will not accept any pod that does not tolerate those taints. Tolerations are applied to pods and allow (but do not require) them to be scheduled onto nodes with matching taints.

Concepts

You can use kubectl taint to add a taint to a node, for example:

kubectl taint nodes node1 key=value:NoSchedule

This adds a taint with key key, value value, and effect NoSchedule to the node node1. Unless a pod has a matching toleration, it will not be scheduled onto this node.

You can remove the taint you just added with the following command:

kubectl taint nodes node1 key:NoSchedule-

You can specify tolerations for a pod in the PodSpec of its YAML. Either of the two tolerations below matches the taint created above, so a pod with either of them can be scheduled onto node1:

tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoSchedule"
tolerations:
- key: "key"
  operator: "Exists"
  effect: "NoSchedule"

A toleration matches a taint only if both the key and the effect are the same, and one of the following conditions also holds:

  • operator is Exists (in which case no value should be specified)
  • operator is Equal and the value is the same

If operator is not specified, it defaults to Equal.
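
For example, the following toleration omits operator, so it behaves exactly like the Equal toleration above (a minimal illustration):

tolerations:
- key: "key"
  value: "value"
  effect: "NoSchedule"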

There are two special cases:

1) If key is empty (meaning no key is specified at all, rather than a key that is an empty string) and operator is Exists, the toleration matches every key, value, and effect, i.e. it tolerates all taints on any node:

tolerations:
- operator: "Exists"

2) An empty effect matches every effect for the given key:

tolerations:
- key: "key"
  operator: "Exists"

The toleration above matches all taints whose key is key, regardless of value or effect.

The examples above use the NoSchedule effect. You can also use the PreferNoSchedule effect, which is a "preferred" or "soft" version of NoSchedule: the scheduler tries to avoid placing pods that do not tolerate the taint onto the node, but this is not a hard requirement. The third effect type, NoExecute, is described later.
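
As a sketch, a toleration for a PreferNoSchedule taint looks the same as the earlier NoSchedule toleration with only the effect changed (this assumes the node was tainted with key=value:PreferNoSchedule instead of NoSchedule):

tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "PreferNoSchedule"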

You can add multiple taints to one node and multiple tolerations to one pod. Kubernetes processes multiple taints and tolerations like a filter: it starts with all of the node's taints, ignores the ones matched by the pod's tolerations, and the remaining un-ignored taints determine the effect on the pod. In particular:

1) If at least one un-ignored taint has effect NoSchedule, Kubernetes will not schedule the pod onto the node.

2) If there is no un-ignored taint with effect NoSchedule but at least one with effect PreferNoSchedule, Kubernetes will try not to schedule the pod onto the node.

3) If at least one un-ignored taint has effect NoExecute, the pod is evicted from the node (if it is already running on it), and it is not scheduled onto the node (if it is not yet running on it).

"Evicted" here means the pod is removed from the node and scheduled onto another node.

For example, suppose you taint a node like this:

kubectl taint nodes node1 key1=value1:NoSchedule
kubectl taint nodes node1 key1=value1:NoExecute
kubectl taint nodes node1 key2=value2:NoSchedule

and a pod has the following tolerations:

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"

In this case the pod will not be scheduled onto node1, because there is no toleration matching the third taint. But if it is already running on the node when the taints are added, it can keep running, because the third taint is the only one it does not tolerate, and that taint's effect is NoSchedule, which only prevents scheduling and does not evict running pods.
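
To double-check which taints are currently on the node from this example, you can inspect the node object directly (standard kubectl commands; the exact output formatting depends on your kubectl version):

# list the node's taints in human-readable form
kubectl describe node node1 | grep -A 2 Taints
# or print the raw taint objects from the node spec
kubectl get node node1 -o jsonpath='{.spec.taints}'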

Normally, once a taint with effect NoExecute is added to a node, every pod that does not tolerate it is evicted immediately, while pods that tolerate it are never evicted. However, a toleration with effect NoExecute can specify an optional tolerationSeconds field, which states how long the pod may keep running on the node after the taint is added.

For example:

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
  tolerationSeconds: 3600

This means that if the pod is running on the node and tolerates the taint, it can keep running for 3600 seconds and is evicted after that. If the taint is removed within that period, the pod is not evicted.

You can think of this as a conditional toleration: even a matching pod that is already running on the node may only stay there for the specified period.
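
For instance, removing the NoExecute taint from node1 before the 3600 seconds elapse cancels the pending eviction (a sketch reusing the key and value from the example above):

kubectl taint nodes node1 key1=value1:NoExecute-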

Example use cases

Taints and tolerations are a flexible way to steer pods away from unsuitable nodes or to evict pods that should not be running on a node. Some use cases include:

  • Dedicated nodes: if you want to reserve certain nodes for a particular group of users, you can add a taint to those nodes (e.g. kubectl taint nodes nodename dedicated=groupName:NoSchedule) and then add a matching toleration to that group's pods. Pods with the toleration are allowed onto the dedicated (tainted) nodes, but they can also be scheduled onto any other node in the cluster (nodes without taints, or whose taints they tolerate). If you want the pods to be scheduled only onto the dedicated nodes, you also need node affinity, using labels as mentioned earlier.

Node affinity is pod-centric, while taints are node-centric: to attract a pod to specific nodes you use affinity, and to make a node repel pods that do not belong on it you use taints. Used together, affinity and taints guarantee that the dedicated nodes are used only by the dedicated pods and that the dedicated pods use only those nodes (see the sketch after this list).

  • Nodes with special hardware: in a cluster where some nodes have special hardware (e.g. GPUs), you ideally want pods that do not need the hardware to stay off those nodes, keeping room for pods that do need it. You can achieve this by tainting the special nodes (for example kubectl taint nodes nodename special=true:NoSchedule or kubectl taint nodes nodename special=true:PreferNoSchedule) and adding a matching toleration to the pods that need the special hardware.

  • Taint-based evictions (a beta feature): when a node has problems, evict the pods that do not tolerate the corresponding taints.
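
As a sketch of the dedicated-node pattern, assuming the dedicated nodes are also labeled dedicated=groupName (the pod name and image below are illustrative placeholders), a pod that both tolerates the dedicated taint and requires the dedicated label could look like this:

apiVersion: v1
kind: Pod
metadata:
  name: dedicated-demo               # hypothetical pod name
spec:
  containers:
  - name: app
    image: nginx                     # placeholder image
  tolerations:                       # allows scheduling onto the tainted dedicated nodes
  - key: "dedicated"
    operator: "Equal"
    value: "groupName"
    effect: "NoSchedule"
  affinity:                          # restricts scheduling to nodes carrying the dedicated label
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: dedicated
            operator: In
            values:
            - groupName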

Taint-based evictions

As mentioned earlier, a taint with effect NoExecute affects pods that are already running on the node:

  • Pods that do not tolerate the taint are evicted immediately.

  • Pods that tolerate the taint but do not specify tolerationSeconds stay bound to the node forever.

  • Pods that tolerate the taint and specify tolerationSeconds stay bound to the node for the specified amount of time (the toleration is conditional: they are only allowed to stay for that period).

The implication of the third point is that even a tolerating pod is still evicted once the toleration period has expired.

In addition, Kubernetes 1.6 introduced support for representing node problems as taints: when certain conditions are met, the node controller automatically adds the corresponding taint to the node. Some of the built-in taints are:

  • node.kubernetes.io/not-ready : the node is not ready; its Ready condition is False.

  • node.kubernetes.io/unreachable : the node is unreachable from the node controller; its Ready condition is Unknown.

  • node.kubernetes.io/out-of-disk : the node is out of disk space.

  • node.kubernetes.io/memory-pressure : the node has memory pressure.

  • node.kubernetes.io/disk-pressure : the node has disk pressure.

  • node.kubernetes.io/network-unavailable : the node's network is unavailable.

  • node.kubernetes.io/unschedulable : the node is not schedulable.

  • node.cloudprovider.kubernetes.io/uninitialized : the node was started with an external cloud provider and has not yet been initialized by the cloud controller.

In Kubernetes 1.13 the taint-based eviction feature was promoted to beta and enabled by default, so these taints are added automatically by the node controller (or the kubelet), and the earlier eviction logic based on the node's Ready condition is disabled.

Combined with this beta feature, tolerationSeconds lets a pod specify how long it should stay bound to a node that has one or more of these problems.

For example: an application with a lot of local state may want to stay bound to the node for a while when a network partition occurs, hoping the network recovers within the specified period so that the pod avoids eviction. The toleration for this case would look like:

tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 6000

Note that unless the pod configuration already provides one, Kubernetes automatically adds a toleration for node.kubernetes.io/not-ready with tolerationSeconds=300. Likewise, a toleration for node.kubernetes.io/unreachable with tolerationSeconds=300 is added automatically if none is configured.
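
In other words, a pod that does not declare these tolerations itself ends up with roughly the following in its spec (a sketch of the automatically added tolerations):

tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300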

When DaemonSet pods are created, NoExecute tolerations for the following two taints are added automatically, with no tolerationSeconds:

  • node.kubernetes.io/unreachable

  • node.kubernetes.io/not-ready

This ensures that DaemonSet pods are not evicted even when these node problems occur.

Taint nodes by condition

In Kubernetes 1.12 the TaintNodesByCondition feature was promoted to beta: the node lifecycle controller automatically adds taints that correspond to node conditions. Accordingly, the scheduler no longer checks node conditions; it checks taints instead. This ensures that node conditions do not directly affect what can be scheduled onto the node, and users can choose to ignore a particular node problem (reflected by a node condition) by adding the appropriate toleration. Note that TaintNodesByCondition only adds taints with effect NoSchedule; the NoExecute effect is controlled by TaintBasedEviction (beta since version 1.13).
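
For example, if a pod really should be scheduled onto nodes under memory pressure anyway, a toleration like the following (a sketch) tells the scheduler to ignore that condition taint:

tolerations:
- key: "node.kubernetes.io/memory-pressure"
  operator: "Exists"
  effect: "NoSchedule"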

Since Kubernetes 1.8, the DaemonSet controller automatically adds the following NoSchedule tolerations to all daemon pods, to prevent DaemonSets from being broken:

  • node.kubernetes.io/memory-pressure

  • node.kubernetes.io/disk-pressure

  • node.kubernetes.io/out-of-disk (only for critical pods)

  • node.kubernetes.io/unschedulable (1.10 or later)

  • node.kubernetes.io/network-unavailable (host network only)

These tolerations are added for backward compatibility. You can also add arbitrary tolerations to a DaemonSet.


Origin: www.cnblogs.com/tylerzhou/p/11026364.html