K8s Pod Even Scheduling and Distribution - The Road to Dream Building

Introduction to pod scheduling

  In Kubernetes, Pod scheduling is handled by the kube-scheduler component. Scheduling means placing Pods that are waiting to be created onto suitable nodes. Roughly, the scheduler selects the most suitable node for each Pod in the pending list through its scheduling algorithm and scheduling policies; kube-scheduler then issues a Pod binding event, and the kubelet on the chosen node takes over and runs the Pod.

The scheduling algorithm generally runs in two steps: filtering out candidate nodes, and then picking the optimal node; picking the optimal node involves node scoring, among other things.

Common scheduling policies include Pod selectors (nodeSelector), directly specifying a node (nodeName), and node/host affinity; at the same time, node cordon/drain markings and taints have to be taken into account. What I want to share today is one such policy: Pod topology spread constraints, which can be used to distribute Pods evenly across the nodes of a cluster.
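
For reference, here is a minimal sketch of those common policies on a single Pod. The disktype=ssd label, the node name node-1, and the zone value zone-a are hypothetical, and only one of these mechanisms would normally be used at a time:

apiVersion: v1
kind: Pod
metadata:
  name: scheduling-policy-demo
spec:
  nodeSelector:                 # selector: only nodes carrying this label are candidates
    disktype: ssd
  # nodeName: node-1            # specifying a node directly bypasses the scheduler entirely
  affinity:
    nodeAffinity:               # host/node affinity: prefer nodes in a given zone
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["zone-a"]
  containers:
  - name: pause
    image: registry.aliyuncs.com/google_containers/pause:3.5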

Why do you need to evenly schedule and distribute Pods across cluster nodes? 

   We know that in Kubernetes, if you only want exactly one Pod of a kind on every node, a DaemonSet is enough. When there are multiple replicas, however, Pod topology spread constraints are needed to schedule them so that they end up evenly distributed across the cluster, avoiding as far as possible that some nodes are overloaded while others sit idle, and thereby achieving both high availability and efficient use of cluster resources.
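
For comparison, a minimal DaemonSet sketch that places exactly one such Pod on every schedulable node (the name one-per-node and its labels are made up for illustration):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: one-per-node            # hypothetical name
spec:
  selector:
    matchLabels:
      app: one-per-node
  template:
    metadata:
      labels:
        app: one-per-node
    spec:
      containers:
      - name: pause
        image: registry.aliyuncs.com/google_containers/pause:3.5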

  In Kubernetes, Pod topology spread constraints (PodTopologySpread) are what make even Pod scheduling possible. The feature has been stable since v1.19, and additional fields were added in v1.25 and v1.26.

Note that "even scheduling" here does not mean that only the Pods currently waiting to be scheduled are balanced across the worker nodes while the Pods already running on those nodes are ignored; the existing matching Pods on each node are counted as well. In other words, the even distribution is measured against the worker nodes, even though the topology spread constraints themselves are defined on the Pod.

How to evenly distribute pods across nodes

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  # Configure a topology spread constraint
  topologySpreadConstraints:
    - maxSkew: <integer>
      minDomains: <integer> # optional; Beta since v1.25
      topologyKey: <string>
      whenUnsatisfiable: <string>
      labelSelector: <object>
      matchLabelKeys: <list> # optional; Alpha since v1.25
      nodeAffinityPolicy: [Honor|Ignore] # optional; Beta since v1.26
      nodeTaintsPolicy: [Honor|Ignore] # optional; Beta since v1.26
  ### other Pod fields go here

Example 1:

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  # Configure a topology spread constraint
  topologySpreadConstraints:
    - maxSkew: 1 # the difference in matching Pod counts between nodes must not exceed this value, i.e. spread Pods as evenly as possible
      topologyKey: kubernetes.io/hostname # use the hostname label as the topology domain
      whenUnsatisfiable: ScheduleAnyway # always schedule the Pod, even if it cannot be spread evenly
      labelSelector: <object> # counts the Pods matching this selector
  ### other Pod fields go here
The same constraint applied to a Deployment with 10 replicas:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 10
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: test
      containers:
      - name: pause
        image: registry.aliyuncs.com/google_containers/pause:3.5

 It should be noted that scaling a Deployment down does not guarantee an even distribution and may leave the Pod spread imbalanced.

maxSkew: describes the maximum degree to which Pods may be unevenly distributed. This field is required and its value must be greater than zero. Its semantics depend on the value of whenUnsatisfiable; simply put, it is the largest difference in matching Pod counts allowed between any two topology domains (for example, between two nodes).

  • If you choose whenUnsatisfiable: DoNotSchedule, maxSkew defines the maximum allowed difference between the number of matching Pods in the target topology domain and the global minimum across all domains. For example, with 3 availability zones holding 2, 2, and 1 matching Pods, the global minimum is 1, so with maxSkew set to 1 a new Pod may only be placed in the zone that currently has 1 (see the sketch after this list).

  • If you choose whenUnsatisfiable: ScheduleAnyway, the scheduler gives higher precedence to topology domains that would reduce the skew.
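
As a hedged illustration of the DoNotSchedule case above (the Pod name zone-spread-example is made up; topology.kubernetes.io/zone is the standard zone label, and app: test matches the examples in this post), a constraint that spreads Pods across availability zones could look like this:

apiVersion: v1
kind: Pod
metadata:
  name: zone-spread-example
  labels:
    app: test
spec:
  topologySpreadConstraints:
  - maxSkew: 1                                 # zones may differ by at most 1 matching Pod
    topologyKey: topology.kubernetes.io/zone   # each distinct zone value is one topology domain
    whenUnsatisfiable: DoNotSchedule           # keep the Pod Pending rather than violate the constraint
    labelSelector:
      matchLabels:
        app: test
  containers:
  - name: pause
    image: registry.aliyuncs.com/google_containers/pause:3.5

With the 2, 2, 1 distribution described above, this Pod could only be placed in the zone that currently holds 1 matching Pod.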

topologyKey: the key of a node label. Nodes that carry a label with this key and identical values are considered to be in the same topology domain; each instance of the topology (that is, each key-value pair) is called a domain. The scheduler tries to place a balanced number of Pods in every domain. In addition, only domains whose nodes satisfy the Pod's nodeAffinityPolicy and nodeTaintsPolicy requirements count as eligible domains. For example, when topologyKey is kubernetes.io/hostname, each node is a domain of its own.
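
To make the idea of a domain concrete, here is a hypothetical excerpt of two Node objects: both carry topology.kubernetes.io/zone=zone-a, so they form one zone domain, while each kubernetes.io/hostname value is a single-node domain (node names and zone value are made up):

apiVersion: v1
kind: Node
metadata:
  name: node-1                              # hypothetical node name
  labels:
    kubernetes.io/hostname: node-1          # domain "node-1" when topologyKey is kubernetes.io/hostname
    topology.kubernetes.io/zone: zone-a     # domain "zone-a" when topologyKey is topology.kubernetes.io/zone
---
apiVersion: v1
kind: Node
metadata:
  name: node-2
  labels:
    kubernetes.io/hostname: node-2
    topology.kubernetes.io/zone: zone-a     # same zone as node-1, so both nodes fall into one zone domain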

whenUnsatisfiable: indicates how to deal with a Pod that does not satisfy the spread constraint:

  • DoNotSchedule (default) tells the scheduler not to schedule the Pod.

  • ScheduleAnyway tells the scheduler to schedule the Pod anyway, while ranking nodes so as to minimize the skew.

labelSelector: used to find matching Pods. Pods matching this label selector are counted to determine the number of Pods in the corresponding topology domain.

When a Pod defines more than one topologySpreadConstraint, the constraints are combined with a logical AND: kube-scheduler looks for a node that satisfies all of them for the new Pod.

Multiple Topology Spread Constraints

Before that, some preparation is needed: add the label disktype=node-group1 to each worker node.
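
As a sketch of the prepared state (the node name worker-1 is hypothetical; in practice the label is usually applied with kubectl label nodes <node-name> disktype=node-group1), a worker Node would then carry:

apiVersion: v1
kind: Node
metadata:
  name: worker-1                      # hypothetical worker node name
  labels:
    kubernetes.io/hostname: worker-1
    disktype: node-group1             # label added as preparation, used below as a topology domain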

Now create a new Pod with two topology spread constraints, requiring it to be spread both across the topologyKey: kubernetes.io/hostname domains and across the topologyKey: disktype domains:

kind: Pod
apiVersion: v1
metadata:
  name: mypod
  labels:
    app: test
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: test
  - maxSkew: 1
    topologyKey: disktype
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: test
  containers:
  - name: pause
    image: registry.aliyuncs.com/google_containers/pause:3.5

Conflicting Topology Spread Constraints

If no node satisfies a Pod's topology spread constraints, scheduling of that Pod fails. The following creates a new manifest that sets the replica count to 5 and uses kubernetes.io/hostname as the topology domain:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: test
  name: test
  namespace: test-topo-namespace
spec:
  replicas: 5
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - image: registry.aliyuncs.com/google_containers/pause:3.5
        name: pause
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app: test
        maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule

 With whenUnsatisfiable: DoNotSchedule, a Pod that cannot satisfy the constraint is simply not scheduled. The master node carries taints, so by default no Pod lands on it, yet it still counts as a topology domain with zero matching Pods. As a result, with 5 replicas, each worker node receives one Pod, and any remaining Pod would push the skew relative to the empty master domain beyond maxSkew: 1 wherever it went, so it stays Pending.
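
One way to address this (a hedged sketch, not part of the original example): since v1.26 the nodeTaintsPolicy field listed at the top of this post can be set to Honor, so that nodes whose taints the Pod does not tolerate (such as the master) are excluded from the skew calculation. The Pod name taint-aware-spread is made up, and the field requires a cluster where it is available (Beta since v1.26):

apiVersion: v1
kind: Pod
metadata:
  name: taint-aware-spread          # hypothetical name, for illustration only
  labels:
    app: test
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    nodeTaintsPolicy: Honor         # exclude nodes with taints this Pod does not tolerate from the skew calculation
    labelSelector:
      matchLabels:
        app: test
  containers:
  - name: pause
    image: registry.aliyuncs.com/google_containers/pause:3.5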

This post mainly records how to schedule and distribute Pods across nodes as evenly as possible.

References

https://kubernetes.io/zh-cn/docs/concepts/scheduling-eviction/pod-priority-preemption/

https://medium.com/geekculture/kubernetes-distributing-pods-evenly-across-cluster-c6bdc9b49699
