Introduction to pod scheduling
In k8s, pod scheduling is handled by the kube-scheduler component. Scheduling means placing the pods to be created onto appropriate nodes. The general process is that the scheduler selects the most suitable Node for each pod in the list of pods awaiting scheduling, according to the scheduling algorithm and scheduling policy; kube-scheduler then issues a Pod binding event, and the kubelet on the chosen node reacts to it and creates the pod.
The scheduling algorithm generally involves two steps: filtering out candidate nodes, and then determining the optimal node; the latter step involves scoring the nodes, among other things.
Common scheduling policies include selectors, specifying a node directly, and host (node) affinity; at the same time, node cordon and drain markings need to be taken into account. What I'm sharing with you today is one such scheduling policy: using Pod topology spread constraints to schedule and distribute Pods evenly across the nodes of a cluster.
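For context, here is a minimal sketch of one of those simpler policies; the pod name, image, and the disktype=ssd node label are illustrative assumptions, and nodeName is shown as a commented-out alternative that bypasses the scheduler entirely:
apiVersion: v1
kind: Pod
metadata:
  name: nginx-on-ssd
spec:
  # nodeSelector: only consider nodes that carry this label
  nodeSelector:
    disktype: ssd
  # nodeName: node1   # specifying a node directly skips kube-scheduler
  containers:
    - name: nginx
      image: nginx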
Why do you need to evenly schedule and distribute Pods across cluster nodes?
We know that in k8s, if you just want each node to run exactly one pod, you can achieve that with a DaemonSet. If there are multiple pods, Pod topology spread constraints are needed to schedule the pods so that they are distributed evenly across the cluster, avoiding overloading some nodes while leaving others underused, and thereby achieving high availability and efficient use of cluster resources. In k8s, Pod topology spread constraints (PodTopologySpread) are the mechanism for this even scheduling. The feature has been stable since v1.19, with some fields added in v1.25 and v1.26.
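As a point of comparison, a minimal DaemonSet sketch (the name and the pause image mirror the examples later in this post) that runs exactly one pod on every schedulable node:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: test-ds
spec:
  selector:
    matchLabels:
      app: test-ds
  template:
    metadata:
      labels:
        app: test-ds
    spec:
      containers:
        - name: pause
          image: registry.aliyuncs.com/google_containers/pause:3.5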
It should be noted that even scheduling here does not mean that only the pods currently being scheduled are spread evenly while the pods that already exist on the nodes are ignored. The distribution is computed against the worker nodes, counting the matching pods already present on each of them. In other words, the so-called even scheduling distribution is measured per worker node, even though the topology spread constraints themselves are defined on the Pod.
How to evenly distribute pods across nodes
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  # Configure a topology spread constraint
  topologySpreadConstraints:
    - maxSkew: <integer>
      minDomains: <integer> # optional; beta since v1.25
      topologyKey: <string>
      whenUnsatisfiable: <string>
      labelSelector: <object>
      matchLabelKeys: <list> # optional; alpha since v1.25
      nodeAffinityPolicy: [Honor|Ignore] # optional; beta since v1.26
      nodeTaintsPolicy: [Honor|Ignore] # optional; beta since v1.26
  ### other Pod fields go here
Example 1:
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  # Configure a topology spread constraint
  topologySpreadConstraints:
    - maxSkew: 1 # spread Pods as evenly as possible: the pod-count difference between domains must not exceed this value
      topologyKey: kubernetes.io/hostname # use the hostname label as the topology domain
      whenUnsatisfiable: ScheduleAnyway # always schedule the pod, even if an even distribution cannot be satisfied
      labelSelector: <object> # count only the Pods that match this selector
  ### other Pod fields go here
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 10
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: test
      containers:
        - name: pause
          image: registry.aliyuncs.com/google_containers/pause:3.5
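Assuming the Deployment above has been applied, the resulting spread can be checked roughly like this (a sketch; the pipeline just counts pods per node from the NODE column of the -o wide output):
# list the matching pods together with the node each one landed on
kubectl get pods -l app=test -o wide
# count pods per node (NODE is the 7th column of the -o wide output)
kubectl get pods -l app=test -o wide --no-headers | awk '{print $7}' | sort | uniq -c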
It should be noted that scaling a Deployment down does not guarantee an even distribution and may leave the Pods unevenly spread.
maxSkew: describes the degree to which Pods may be distributed unevenly. This field is required and its value must be greater than zero. Its semantics depend on the value of whenUnsatisfiable; put simply, it is the maximum allowed difference in the number of matching pods between any two topology domains.
If you choose whenUnsatisfiable: DoNotSchedule, maxSkew defines the maximum allowed difference between the number of matching Pods in the target topology domain and the global minimum. For example, if you have 3 availability zones with 2, 2, and 1 matching Pods respectively and maxSkew is set to 1, the global minimum is 1. If you choose whenUnsatisfiable: ScheduleAnyway, the scheduler gives higher preference to topology domains that would reduce the skew.
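To make the availability-zone example concrete, here is a hedged sketch of such a constraint using the standard topology.kubernetes.io/zone node label (the app: test selector follows the other examples in this post):
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone # with 2, 2 and 1 matching pods per zone, the global minimum is 1
    whenUnsatisfiable: DoNotSchedule         # a new pod may only go to the zone holding 1 pod; elsewhere the skew would become 2
    labelSelector:
      matchLabels:
        app: test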
topologyKey: the key of a node label. Nodes that carry a label with this key and have identical values for it are considered to be in the same topology domain; each instance of the key-value pair is called a domain. The scheduler tries to place a balanced number of Pods in each topology domain. In addition, a domain is considered eligible only if its nodes satisfy the nodeAffinityPolicy and nodeTaintsPolicy requirements. For example, when topologyKey is kubernetes.io/hostname, each node is its own domain.
whenUnsatisfiable: indicates how to deal with a Pod that does not satisfy the spread constraint:
DoNotSchedule (the default) tells the scheduler not to schedule it.
ScheduleAnyway tells the scheduler to schedule the Pod anyway, while ranking nodes so as to minimize the skew.
labelSelector: used to find matching Pods. Pods matching this selector are counted to determine the number of Pods in each topology domain.
When a Pod defines more than one topologySpreadConstraint, the constraints are combined with a logical AND: the kube-scheduler looks for a node that satisfies all of them for the new Pod.
Multiple Topological Distribution Constraints
Before that, some preparation is needed: add the label disktype=node-group1 to every worker node. Then create a new Pod with two topology spread constraints, requiring the pod to be spread both across the topologyKey: kubernetes.io/hostname domain and across the topologyKey: disktype domain.
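The labels can be applied with kubectl; the node names node1, node2, and node3 are assumptions, so substitute your own worker node names:
# put every worker node into the same disktype topology domain
kubectl label nodes node1 node2 node3 disktype=node-group1
# confirm the label is present
kubectl get nodes -L disktype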
kind: Pod
apiVersion: v1
metadata:
  name: mypod
  labels:
    app: test
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: test
    - maxSkew: 1
      topologyKey: disktype
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: test
  containers:
    - name: pause
      image: registry.aliyuncs.com/google_containers/pause:3.5
Conflicting Topological Distribution Constraints
If no node can satisfy the pod's topology spread constraints, scheduling of that pod fails. Below, a new patch file changes the replica count to 5 and keeps kubernetes.io/hostname as the topology domain.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: test
  name: test
  namespace: test-topo-namespace
spec:
  replicas: 5
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
        - image: registry.aliyuncs.com/google_containers/pause:3.5
          name: pause
      topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              app: test
          maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
With whenUnsatisfiable: DoNotSchedule, a pod is simply not scheduled when the constraint cannot be satisfied. The master node carries taints, so no pods are scheduled onto it by default, yet it still counts as a kubernetes.io/hostname domain with zero matching pods, which keeps the global minimum at 0. With 5 replicas, each worker node therefore receives at most one pod, and placing any of the remaining pods anywhere would exceed maxSkew: 1, so they stay Pending.
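To reproduce this, a sketch assuming the patched manifest above is saved as test-deploy.yaml:
# apply the 5-replica Deployment
kubectl apply -f test-deploy.yaml
# observe the result: the excess replicas remain in Pending
kubectl get pods -n test-topo-namespace -o wide
# the FailedScheduling event on a pending pod explains which constraint was violated
kubectl describe pod -n test-topo-namespace -l app=test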
This post mainly records how to schedule and distribute pods across nodes as evenly as possible.
References
https://kubernetes.io/zh-cn/docs/concepts/scheduling-eviction/pod-priority-preemption/
https://medium.com/geekculture/kubernetes-distributing-pods-evenly-across-cluster-c6bdc9b49699