Node affinity is a property of pods that attracts them to a set of nodes, either as a soft preference or a hard requirement. Taints are the opposite: they allow a node to repel a set of pods (a pod that does not tolerate a node's taints will not be scheduled onto that node).
Taints and tolerations work together to ensure that pods are not scheduled onto unsuitable nodes. When one or more taints are applied to a node, the node will not accept any pods that do not tolerate those taints.
Tolerations are applied to pods and allow (but do not require) the pods to be scheduled onto nodes with matching taints.
Concepts
You can use kubectl taint to add a taint to a node, for example:
kubectl taint nodes node1 key=value:NoSchedule
This adds a taint with key key, value value, and effect NoSchedule to node node1. Unless a pod has a matching toleration, it will not be scheduled onto this node.
You can remove the taint just added with the following command:
kubectl taint nodes node1 key:NoSchedule-
You specify a toleration for a pod in the PodSpec of its YAML. Both of the tolerations below match the taint created above, so a pod with either of them could be scheduled onto node1:
tolerations:
- key: "key"
operator: "Equal"
value: "value"
effect: "NoSchedule"
tolerations:
- key: "key"
operator: "Exists"
effect: "NoSchedule"
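For context, a toleration sits under spec.tolerations in a complete pod manifest. Below is a minimal sketch of where the snippet above fits; the pod name and nginx image are placeholders, not part of the original example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo        # placeholder name
spec:
  containers:
  - name: app
    image: nginx               # any image works; nginx is just an example
  tolerations:                 # matches the taint added to node1 above
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
```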
A toleration matches a taint if the key and effect are the same and one of the following holds:
- the operator is Exists (in which case no value should be specified), or
- the operator is Equal and the values are equal.
If operator is not specified, it defaults to Equal.
There are two special cases:
1) An empty key (meaning no key is specified, not an empty-string key) with operator Exists matches all keys, values, and effects, i.e. it tolerates every taint:
tolerations:
- operator: "Exists"
2) An empty effect matches every effect for taints with the given key:
tolerations:
- key: "key"
operator: "Exists"
The toleration above matches every taint whose key is key, on any node, regardless of effect.
The previous examples used the NoSchedule effect. Another effect type is PreferNoSchedule, a "preference" or "soft" version of NoSchedule: the scheduler will try to avoid placing pods that do not tolerate the taint onto the node, but this is not a hard requirement. The third effect type, NoExecute, is described later.
You can apply multiple taints to one node and multiple tolerations to one pod. Kubernetes processes multiple taints and tolerations like a filter: start with all of a node's taints, then ignore the ones for which the pod has a matching toleration; the remaining un-ignored taints determine the effect on the pod. Specifically:
1) If there is at least one un-ignored taint with effect NoSchedule, Kubernetes will not schedule the pod onto the node.
2) If there is no un-ignored taint with effect NoSchedule but at least one with effect PreferNoSchedule, Kubernetes will try to avoid scheduling the pod onto the node.
3) If there is at least one un-ignored taint with effect NoExecute, the pod is evicted from the node (if it is already running there) and will not be scheduled onto the node (if it is not).
"Evicted" means the pod is removed from the node and scheduled onto another node.
For example, suppose you taint a node like this:
kubectl taint nodes node1 key1=value1:NoSchedule
kubectl taint nodes node1 key1=value1:NoExecute
kubectl taint nodes node1 key2=value2:NoSchedule
And a pod has the following tolerations:
tolerations:
- key: "key1"
operator: "Equal"
value: "value1"
effect: "NoSchedule"
- key: "key1"
operator: "Equal"
value: "value1"
effect: "NoExecute"
In this case, the pod will not be scheduled onto node1, because it has no toleration matching the third taint. But if the pod is already running on the node when that taint is added, it may continue to run there: the only taint it fails to tolerate is the third one, whose effect is NoSchedule, which only blocks scheduling, not execution.
Normally, once a taint with effect NoExecute is added to a node, any pod that does not tolerate the taint is evicted immediately, and any pod that does tolerate it is never evicted. However, a toleration with effect NoExecute may specify an optional tolerationSeconds field, which dictates how long the pod may remain bound to the node after the taint is added.
For example:
tolerations:
- key: "key1"
operator: "Equal"
value: "value1"
effect: "NoExecute"
tolerationSeconds: 3600
This means that if the pod is running on the node and tolerates the taint, it may keep running for 3600 seconds and is then evicted. If the taint is removed before that time elapses, the pod is not evicted.
You can think of this as a conditional toleration: even a matching pod already running on the node may only stay there for the specified period.
Example use cases
Taints and tolerations are a flexible way to keep pods away from unsuitable nodes or to evict pods that should no longer be running on a node. Some use cases:
- Dedicated nodes: to reserve a set of nodes for a particular group of users, add a taint to those nodes (e.g. kubectl taint nodes nodename dedicated=groupName:NoSchedule), then add a matching toleration to that group's pods. Pods with the toleration are allowed onto the dedicated (tainted) nodes, but can also be scheduled onto other nodes in the cluster (untainted nodes, or nodes whose taints they tolerate). If you want the pods scheduled only onto the dedicated nodes, additionally give them a node affinity for those nodes (using labels, as mentioned earlier). Affinity is pod-centric, while a taint is node-centric: to make a pod land on specific nodes you need affinity; to keep other pods off those nodes you need taints. Combining affinity and taints guarantees both that the dedicated nodes are used only by the dedicated pods and that those pods use only the dedicated nodes.
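The combination described above can be sketched as follows, assuming the dedicated nodes also carry a label dedicated=groupName (the label mirrors the taint key and value; the pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dedicated-pod          # placeholder name
spec:
  containers:
  - name: app
    image: nginx               # placeholder image
  # Toleration: allows scheduling onto the tainted dedicated nodes
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "groupName"
    effect: "NoSchedule"
  # Node affinity: restricts the pod to the labeled dedicated nodes
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: dedicated
            operator: In
            values: ["groupName"]
```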
- Nodes with special hardware: in a cluster where a few nodes have specialized hardware (e.g. GPUs), it is desirable to keep pods that do not need the hardware off those nodes, reserving capacity for pods that do. This can be achieved by tainting those nodes (e.g. kubectl taint nodes nodename special=true:NoSchedule or kubectl taint nodes nodename special=true:PreferNoSchedule) and adding a matching toleration to the pods that need the special hardware.
- Taint-based evictions (beta feature): when a node has problems, pods that do not tolerate the corresponding taints are evicted; described below.
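A pod that needs the special hardware would carry a toleration matching the special=true taint, for example (the pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod                # placeholder name
spec:
  containers:
  - name: cuda-app
    image: nvidia/cuda         # placeholder image
  tolerations:                 # matches the special=true:NoSchedule taint
  - key: "special"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
```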
Taint-based eviction
As mentioned earlier, the NoExecute taint effect affects pods already running on the node as follows:
- pods that do not tolerate the taint are evicted immediately
- pods that tolerate the taint without specifying tolerationSeconds remain bound to the node forever
- pods that tolerate the taint and specify tolerationSeconds remain bound for the specified amount of time (tolerated, but only conditionally, for a limited period)
The third point implies that even a tolerating pod is still evicted once the tolerated time expires.
In addition, Kubernetes 1.6 introduced support for representing node problems as taints: when certain conditions hold, the node controller automatically adds the matching taint to the node. The built-in taints include:
- node.kubernetes.io/not-ready: the node is not ready; corresponds to the node condition Ready being False
- node.kubernetes.io/unreachable: the node is unreachable from the node controller; corresponds to the node condition Ready being Unknown
- node.kubernetes.io/out-of-disk: the node is out of disk space
- node.kubernetes.io/memory-pressure: the node has memory pressure
- node.kubernetes.io/disk-pressure: the node has disk pressure
- node.kubernetes.io/network-unavailable: the node's network is unavailable
- node.kubernetes.io/unschedulable: the node is unschedulable
- node.cloudprovider.kubernetes.io/uninitialized: the node has not yet been initialized by the cloud provider
In Kubernetes 1.13, taint-based eviction was promoted to beta and enabled by default: the taints above are added automatically by the node controller, and the older eviction logic based on the node's Ready condition is disabled.
With this beta feature, combined with tolerationSeconds, a pod can specify how long it stays bound to a node despite one or more of these problems.
For example, an application with a lot of local state might want to stay bound to the node for a while in the event of a network partition, hoping the network recovers within the specified period so that eviction can be avoided. The toleration would look like this:
tolerations:
- key: "node.kubernetes.io/unreachable"
operator: "Exists"
effect: "NoExecute"
tolerationSeconds: 6000
Note that if the pod's configuration does not specify a toleration for node.kubernetes.io/not-ready, Kubernetes automatically adds one with tolerationSeconds=300. Likewise, if no toleration for node.kubernetes.io/unreachable is configured, one with tolerationSeconds=300 is added automatically.
Pods created by a DaemonSet automatically get NoExecute tolerations, with no tolerationSeconds, for the following two taints:
node.kubernetes.io/unreachable
node.kubernetes.io/not-ready
This ensures that DaemonSet pods are never evicted due to these node problems.
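Written out explicitly, the tolerations injected into a DaemonSet pod's spec look like this (you do not need to add them yourself; this is only for illustration):

```yaml
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"          # no tolerationSeconds: never evicted for this taint
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"          # no tolerationSeconds: never evicted for this taint
```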
Taint nodes by condition
In Kubernetes 1.12, the TaintNodesByCondition feature was promoted to beta: the node lifecycle controller automatically adds taints to nodes according to their conditions. Correspondingly, the scheduler no longer checks node conditions; it checks taints instead. This ensures that node conditions do not directly affect what can be scheduled onto the node; users can choose to ignore specific node problems (reflected as node conditions) by adding the appropriate tolerations. Note that TaintNodesByCondition only adds NoSchedule taints; the NoExecute effect is controlled by TaintBasedEviction (a separate feature, beta since 1.13).
Since Kubernetes 1.8, the DaemonSet controller automatically adds the following NoSchedule tolerations to all daemons, to keep DaemonSets from breaking:
node.kubernetes.io/memory-pressure
node.kubernetes.io/disk-pressure
node.kubernetes.io/out-of-disk (only for critical pods)
node.kubernetes.io/unschedulable (1.10 or later)
node.kubernetes.io/network-unavailable (host network only)
These tolerations are added for backward compatibility. You can also add arbitrary tolerations to a DaemonSet.
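For example, a node-level agent that must run on every node regardless of taints can add a blanket toleration using the empty-key special case described earlier (the DaemonSet name and image below are illustrative):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent              # placeholder name
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
      - name: agent
        image: fluentd         # placeholder image
      tolerations:
      - operator: "Exists"     # empty key + Exists: tolerates every taint
```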