Concepts

Add a taint:

kubectl taint nodes <node-name> key=value:NoSchedule

Remove a taint:

kubectl taint nodes <node-name> key:NoSchedule-

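A pod that declares either of the following tolerations matches the taint added above and can be scheduled onto that node: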

tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoSchedule"

tolerations:
- key: "key"
  operator: "Exists"
  effect: "NoSchedule"

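Use operator Exists when the toleration gives no value, and Equal when it gives one. If operator is omitted, it defaults to Equal.

A toleration with an empty key and operator Exists matches every key, value, and effect, so it tolerates everything: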

tolerations:
- operator: "Exists"

A toleration with an empty effect matches all effects for the given key:

tolerations:
- key: "key"
  operator: "Exists"

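The three effects:

NoSchedule: pods that do not tolerate the taint are not scheduled onto the node
PreferNoSchedule: the scheduler tries to avoid placing such pods on the node, but this is not guaranteed
NoExecute: such pods are not scheduled onto the node, and pods already running there are evicted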

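A node can have multiple taints and a pod can have multiple tolerations. Kubernetes processes them like a filter: start with all of the node's taints, ignore the ones matched by one of the pod's tolerations, and let the remaining unignored taints determine the effect on the pod. In particular:

If at least one unignored taint has effect NoSchedule, the pod is not scheduled onto the node.
If no unignored taint has effect NoSchedule but at least one has effect PreferNoSchedule, Kubernetes tries not to schedule the pod onto the node.
If at least one unignored taint has effect NoExecute, the pod is evicted from the node if it is already running there, and is not scheduled onto the node if it is not yet running there.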
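
The filtering process can be written out as a small Python sketch. It assumes taints and tolerations are plain dicts with the same field names as the YAML above; `tolerates` and `evaluate` are hypothetical helper names for this example, not Kubernetes APIs.

```python
def tolerates(toleration, taint):
    """Return True if the toleration matches the taint."""
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    if toleration.get("key") and toleration["key"] != taint["key"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def evaluate(taints, tolerations):
    """Filter out tolerated taints and report what the remaining ones imply."""
    remaining = {
        taint["effect"]
        for taint in taints
        if not any(tolerates(tol, taint) for tol in tolerations)
    }
    return {
        # NoSchedule or NoExecute left over: the pod may not be placed here.
        "can_schedule": not ({"NoSchedule", "NoExecute"} & remaining),
        # PreferNoSchedule left over: the scheduler tries to avoid the node.
        "avoid_if_possible": "PreferNoSchedule" in remaining,
        # NoExecute left over: a pod already running here is evicted.
        "evict_if_running": "NoExecute" in remaining,
    }


node_taints = [
    {"key": "key1", "value": "value1", "effect": "NoSchedule"},
    {"key": "key1", "value": "value1", "effect": "NoExecute"},
    {"key": "key2", "value": "value2", "effect": "NoSchedule"},
]
pod_tolerations = [
    {"key": "key1", "operator": "Equal", "value": "value1", "effect": "NoSchedule"},
    {"key": "key1", "operator": "Equal", "value": "value1", "effect": "NoExecute"},
]
print(evaluate(node_taints, pod_tolerations))
```

In this example the pod tolerates both key1 taints but not the key2 one, so it cannot be scheduled onto the node. If it was already running there when the taints were added, it keeps running, because the only NoExecute taint is tolerated.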

tolerationSeconds specifies how long a pod may stay on the node after a NoExecute taint is added. If the taint is removed before that time expires, the pod is not evicted.

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
  tolerationSeconds: 3600
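
As a rough illustration of the timing, the sketch below computes when a pod with a matching NoExecute toleration would be evicted. The function name `eviction_time` and its parameters are hypothetical. It assumes the countdown starts when the taint is added and that removing the taint before the deadline cancels the eviction.

```python
from datetime import datetime, timedelta
from typing import Optional


def eviction_time(
    taint_added_at: datetime,
    toleration_seconds: Optional[int],
    taint_removed_at: Optional[datetime] = None,
) -> Optional[datetime]:
    """Return the eviction time, or None if the pod is not evicted."""
    # No tolerationSeconds: the taint is tolerated indefinitely.
    if toleration_seconds is None:
        return None
    deadline = taint_added_at + timedelta(seconds=toleration_seconds)
    # Taint removed before the deadline: the eviction is cancelled.
    if taint_removed_at is not None and taint_removed_at < deadline:
        return None
    return deadline


added = datetime(2024, 1, 1, 12, 0, 0)
print(eviction_time(added, 3600))                                 # one hour after the taint
print(eviction_time(added, 3600, added + timedelta(minutes=30)))  # None
```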

Use cases

- Dedicated nodes: taint the nodes and, together with a custom admission controller, reserve them for a particular set of users

kubectl taint nodes nodename dedicated=groupName:NoSchedule

- Nodes with special hardware

kubectl taint nodes nodename special=true:NoSchedule

- Taint-based evictions

Taint-based evictions

Since Kubernetes 1.6, the node controller can automatically add taints when certain conditions are true:

node.kubernetes.io/not-ready: the Ready NodeCondition is False
node.kubernetes.io/unreachable: the Ready NodeCondition is Unknown
node.kubernetes.io/out-of-disk: the node is running out of disk space
node.kubernetes.io/memory-pressure
node.kubernetes.io/disk-pressure
node.kubernetes.io/network-unavailable
node.kubernetes.io/unschedulable
node.cloudprovider.kubernetes.io/uninitialized

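For example, a pod may not want to be evicted immediately when its node has a network problem, in the hope that the network recovers shortly: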

tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 6000

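The DefaultTolerationSeconds admission controller automatically adds tolerations for node.kubernetes.io/not-ready:NoExecute and node.kubernetes.io/unreachable:NoExecute, each with tolerationSeconds set to 300, to pods that do not already specify them.

DaemonSet pods get tolerations for these two taints without tolerationSeconds, so they are never evicted because of them.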
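
The defaulting applied to ordinary pods can be approximated as follows. This is a simplified sketch of the behavior described above, not the admission controller's real code; the function names are hypothetical, and the rule used to decide whether a pod "already tolerates" a taint (same or empty key, NoExecute or empty effect) is an assumption.

```python
NOT_READY = "node.kubernetes.io/not-ready"
UNREACHABLE = "node.kubernetes.io/unreachable"
DEFAULT_SECONDS = 300


def _covers(toleration, key):
    """Assumed rule: same (or empty) key and NoExecute (or empty) effect."""
    key_ok = toleration.get("key", "") in ("", key)
    effect_ok = toleration.get("effect", "") in ("", "NoExecute")
    return key_ok and effect_ok


def add_default_tolerations(tolerations):
    """Return a copy of the pod's tolerations with missing defaults appended."""
    result = [dict(t) for t in tolerations]
    for key in (NOT_READY, UNREACHABLE):
        if not any(_covers(t, key) for t in tolerations):
            result.append({
                "key": key,
                "operator": "Exists",
                "effect": "NoExecute",
                "tolerationSeconds": DEFAULT_SECONDS,
            })
    return result


# A pod that sets its own unreachable toleration keeps it and only
# receives the not-ready default.
custom = [{
    "key": UNREACHABLE,
    "operator": "Exists",
    "effect": "NoExecute",
    "tolerationSeconds": 6000,
}]
for toleration in add_default_tolerations(custom):
    print(toleration)
```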

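Taint nodes by condition

The node lifecycle controller automatically creates taints with effect NoSchedule that correspond to node conditions. The scheduler checks these taints rather than the node conditions themselves, so users can choose to ignore a node problem (represented as a node condition) by adding the matching toleration.

Since Kubernetes 1.8, the DaemonSet controller automatically adds the following NoSchedule tolerations to all DaemonSet pods, so that these taints do not keep them from being scheduled: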

node.kubernetes.io/memory-pressure
node.kubernetes.io/disk-pressure
node.kubernetes.io/out-of-disk (only for critical pods)
node.kubernetes.io/unschedulable (1.10 or later)
node.kubernetes.io/network-unavailable (host network only)
