The k8s Scheduler: Predicate Policies and Scheduling Methods

I. k8s Scheduling Flow

1. (Predicate step) First rule out every node that cannot possibly satisfy the pod's requirements.
2. (Priority step) Score the remaining nodes with a set of priority functions; if a single node has the highest score, it is selected directly.
3. If several nodes tie for the highest score, one of them is picked at random.

II. Scheduling Methods

1. Node selection (constrain which nodes the pod may run on).
2. Pod selection (run the pod alongside some other pod (pod affinity), or keep it off nodes that run some other pod (pod anti-affinity)).
3. Taints (if the pod tolerates a node's taints it can be scheduled there; if not, it cannot, and with the NoExecute effect existing pods that do not tolerate the taint are evicted). A toleration time (tolerationSeconds) can also be defined.

III. Common Predicate Policies

Scheduler:
Predicate policies (a partial list):

CheckNodeCondition: #check that the node itself is in a healthy state (network, disk, etc.)
GeneralPredicates
	HostName: #if the Pod defines pod.spec.hostname, check whether the node's hostname matches it
	PodFitsHostPorts: #the hostPorts the pod requests (pods.spec.containers.ports.hostPort, ports bound directly on the node) must still be free on the node
	MatchNodeSelector: #check the node's labels against pods.spec.nodeSelector
	PodFitsResources: #check whether the node can satisfy the pod's resource requests
NoDiskConflict: #check whether the volumes the pod depends on conflict with volumes already in use on the node (not enabled by default)
PodToleratesNodeTaints: #check whether the pod's spec.tolerations cover every taint on the node
PodToleratesNodeNoExecuteTaints: #the same check for NoExecute taints (not enabled by default)
CheckNodeLabelPresence: #check whether the specified labels exist on the node
CheckServiceAffinity: #try to place pods that belong to the same Service together (not enabled by default)
MaxEBSVolumeCount: #upper limit on the number of EBS (AWS) volumes attached to the node
MaxGCEPDVolumeCount: #upper limit on the number of GCE persistent disks
MaxAzureDiskVolumeCount: #upper limit on the number of Azure disks
CheckVolumeBinding: #check the node's bound and unbound PVCs
NoVolumeZoneConflict: #check for zone conflicts between the pod and its volumes
CheckNodeMemoryPressure: #check whether the node is under memory pressure
CheckNodePIDPressure: #check whether the node is under PID pressure
CheckNodeDiskPressure: #check whether the node is under disk (I/O) pressure
MatchInterPodAffinity: #check whether the node can satisfy the pod's affinity and anti-affinity rules
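
As a quick illustration of PodFitsHostPorts: a container that asks for a hostPort occupies that port directly on the node, so no second pod requesting the same hostPort can land there. A minimal sketch (the pod name and port numbers are made up for illustration):

apiVersion: v1
kind: Pod
metadata:
  name: hostport-demo          #hypothetical name, for illustration only
spec:
  containers:
  - name: web
    image: nginx
    ports:
    - containerPort: 80
      hostPort: 8080           #pods.spec.containers.ports.hostPort; only one pod per node can claim this node port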

  

IV. Common Priority Functions

LeastRequested: #the more free capacity a node has, the higher its score (see the worked example after this list)
(cpu((capacity-sum(requested))*10/capacity)+memory((capacity-sum(requested))*10/capacity))/2
BalancedResourceAllocation: #nodes whose CPU and memory utilization rates are closest to each other win;
NodePreferAvoidPods: #based on the node annotation "scheduler.alpha.kubernetes.io/preferAvoidPods"
TaintToleration: #match the Pod's spec.tolerations list against the node's taints list; the more entries that match, the lower the score;

SelectorSpreading: #spread by label selector; nodes that already run more pods selected by the same selectors as the current pod score lower
InterPodAffinity: #iterate over the pod's affinity terms; the more terms the node satisfies, the higher its score
NodeAffinity: #node affinity
MostRequested: #the less free capacity, the higher the score; the opposite of LeastRequested (not enabled by default)
NodeLabel: #score by whether the node carries the given labels (not enabled by default)
ImageLocality: #score by the total size of the images the Pod needs that are already present on the node (not enabled by default)
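
A worked example of the LeastRequested formula above, with made-up numbers: on a node with 4 CPU cores and 8Gi of memory where the pods already placed request 1 core and 2Gi in total, the CPU term is (4-1)*10/4 = 7.5 and the memory term is (8-2)*10/8 = 7.5, so the node scores (7.5+7.5)/2 = 7.5; a completely idle node would score 10.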

  

V. Advanced Scheduling Configuration

1. The nodeSelector field

#View node labels
[root@k8s-m ~]# kubectl get  nodes --show-labels
NAME      STATUS    ROLES     AGE       VERSION   LABELS
k8s-m     Ready     master    120d      v1.11.2   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=k8s-m,node-role.kubernetes.io/master=
node1     Ready     <none>    120d      v1.11.2   app=myapp,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disk=ssd,disktype=ssd,kubernetes.io/hostname=node1,test_node=k8s-node1

#Use nodeSelector to select nodes labeled disk=ssd


#Check
[root@k8s-m schedule]# kubectl  get pod  -o wide
NAME                     READY     STATUS              RESTARTS   AGE       IP            NODE      NOMINATED NODE
nginx-pod                1/1       Running             0          49s       10.244.1.92   node1     <none>
[root@k8s-m schedule]# cat my-pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels: 
    app: my-pod
     
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  nodeSelector:
    disk: ssd

#If no node carries the label specified in nodeSelector, the pod stays Pending (it fails the predicate step)
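
#node1 already carries disk=ssd in the listing above; on a node that does not, the label would be added like this (removing it again would leave newly created pods with this nodeSelector Pending)
kubectl label nodes node1 disk=ssd
#kubectl label nodes node1 disk-     #removes the label again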

  

2. affinity

2.1. nodeAffinity with preferredDuringSchedulingIgnoredDuringExecution (soft affinity: nodes that match more of the preferences are favored, but the pod is still scheduled even if no node matches)

#Usage
[root@k8s-m schedule]# cat  my-affinity-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels: 
    app: my-pod
     
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: test_node1 #label key
            operator: In #In means the node's value for this label must be one of the values below
            values:
            - k8s-node1 #a value of the test_node1 label
            - test1     #a value of the test_node1 label
        weight: 60 #weight given to this preference term, 1-100

##Check (no node has this label, but the pod is still created and running)
[root@k8s-m schedule]# kubectl  get pod  
NAME                     READY     STATUS              RESTARTS   AGE
affinity-pod             1/1       Running             0          16s
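
When several preferred terms are listed, each node adds up the weight of every term it matches and the scheduler favors the highest total. A minimal sketch of just the affinity stanza, assuming the disk=ssd label from earlier plus a hypothetical zone label:

  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80                  #matching disk=ssd contributes 80 to a node's score
        preference:
          matchExpressions:
          - key: disk
            operator: In
            values:
            - ssd
      - weight: 20                  #matching the hypothetical zone label contributes only 20
        preference:
          matchExpressions:
          - key: zone               #hypothetical label key, for illustration
            operator: In
            values:
            - zone-a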

  

2.2. nodeAffinity with requiredDuringSchedulingIgnoredDuringExecution (hard affinity, similar to nodeSelector: a hard requirement; the pod is only scheduled onto nodes that satisfy the conditions, otherwise it stays Pending)

[root@k8s-m schedule]# cat my-affinity-pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels: 
    app: my-pod
     
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: test_node1 #label key
            operator: In #In means the node's value for this label must be one of the values below
            values:
            - k8s-node1 #a value of the test_node1 label
            - test1     #a value of the test_node1 label

#Check (no node carries the test_node1 label, so the pod stays Pending)
[root@k8s-m schedule]# kubectl  get pod 
NAME                     READY     STATUS              RESTARTS   AGE
affinity-pod             0/1       Pending             0          4s
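
matchExpressions also accepts operators other than In (NotIn, Exists, DoesNotExist, Gt, Lt). With Exists only the key has to be present, so the requirement sketched below would match node1 from the label listing earlier (test_node=k8s-node1) and the pod would be scheduled; only the nodeAffinity requirement is shown:

      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: test_node          #node1 carries this label key (value k8s-node1)
            operator: Exists        #no values list is given when using Exists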

  

VI. Pod Affinity and Anti-affinity

1. podAffinity (place the pod in the same location as some other pod; "same location" does not have to mean the same node, it depends on the topology label you choose)

#Usage (co-locate affinity-pod with my-pod1)
[root@k8s-m schedule]# cat my-affinity-pod2.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: my-pod1
  labels: 
    app1: my-pod1
     
spec:
  containers:
  - name: my-pod1
    image: nginx
    ports:
    - name: http
      containerPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels: 
    app: my-pod
     
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app1 #label key, defined on the pod above
            operator: In #In means the target pod's value for this label must be one of the values below
            values:
            - my-pod1 #value of the app1 label
        topologyKey: kubernetes.io/hostname #pods running on nodes that share the same value of kubernetes.io/hostname count as co-located; topologyKey defines what "same location" means for this rule

#Check
[root@k8s-m schedule]# kubectl  get pod   -o wide
NAME                     READY     STATUS              RESTARTS   AGE       IP            NODE      NOMINATED NODE
affinity-pod             1/1       Running             0          54s       10.244.1.98   node1     <none>
my-pod1                  1/1       Running             0          54s       10.244.1.97   node1     <none>
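
topologyKey decides what "same location" means: with kubernetes.io/hostname it is the same node, but any node label can be used. A hedged sketch that would instead require the two pods to share a zone, assuming the nodes carried the failure-domain.beta.kubernetes.io/zone label (the single node in this example does not):

  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app1
            operator: In
            values:
            - my-pod1
        topologyKey: failure-domain.beta.kubernetes.io/zone  #same zone rather than same node; requires the nodes to carry this label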

  

2. podAntiAffinity (keep the pod off any node that runs a certain pod; the opposite of the above)

[root@k8s-m schedule]# cat  my-affinity-pod3.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: my-pod1
  labels: 
    app1: my-pod1
     
spec:
  containers:
  - name: my-pod1
    image: nginx
    ports:
    - name: http
      containerPort: 80
---
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
  labels: 
    app: my-pod
     
spec:
  containers:
  - name: affinity-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  affinity:
    podAntiAffinity:  #the only change from the previous example
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app1 #label key, defined on the pod above
            operator: In #In means the target pod's value for this label must be one of the values below
            values:
            - my-pod1 #value of the app1 label
        topologyKey: kubernetes.io/hostname #the pod must NOT be co-located with matching pods on nodes sharing the same kubernetes.io/hostname value

#Check (I only have one worker node, so affinity-pod is Pending)
[root@k8s-m schedule]# kubectl  get pod 
NAME                     READY     STATUS              RESTARTS   AGE
affinity-pod             0/1       Pending             0          1m
my-pod1                  1/1       Running             0          1m
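
On a cluster with a single worker node a hard (required) anti-affinity rule can never be satisfied. A soft version lets the scheduler break the rule rather than leave the pod Pending; a sketch of just the stanza, using the same labels as above:

  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app1
              operator: In
              values:
              - my-pod1
          topologyKey: kubernetes.io/hostname  #avoid nodes running my-pod1 if possible, but schedule anyway if there is no alternative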

  

VII. Taint-based Scheduling

A taint's effect defines how Pods that do not tolerate it are repelled:
NoSchedule: #only affects scheduling; Pods already running on the node are not affected;
NoExecute: #affects both scheduling and Pods already on the node; Pods that do not tolerate the taint are evicted;
PreferNoSchedule: #a soft NoSchedule: the scheduler tries to avoid the node, but will still place a Pod there if nowhere else fits

1. Viewing and managing taints

#View a node's taints
[root@k8s-m schedule]# kubectl  describe  node  k8s-m |grep Taints
Taints:             node-role.kubernetes.io/master:NoSchedule

[root@k8s-m schedule]# kubectl  describe  node  node1 |grep Taints
Taints:             <none>

#Manage taints
kubectl  taint node  -h

#Add a taint to a node (format key=value:effect; here key node-type, value PreferNoSchedule, effect NoSchedule)
kubectl  taint    node node1 node-type=PreferNoSchedule:NoSchedule 
#Check
[root@k8s-m schedule]# kubectl  describe  node  node1 |grep Taints
Taints:             node-type=PreferNoSchedule:NoSchedule
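
#To remove a taint later, append "-" to the key (optionally with the effect)
kubectl taint node node1 node-type:NoSchedule-
#kubectl taint node node1 node-type-     #removes every effect of the node-type key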

  

2. Using taints and tolerations

#Create a pod with no tolerations
[root@k8s-m ~]# cat  mypod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels: 
    app: my-pod
     
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80

#Check the pod (it is Pending)
[root@k8s-m ~]# kubectl  get pod 
NAME                     READY     STATUS              RESTARTS   AGE
nginx-pod                0/1       Pending             0          32s

#The pod cannot tolerate the taints
[root@k8s-m ~]# kubectl  describe pod nginx-pod|tail  -1
  Warning  FailedScheduling  3s (x22 over 1m)  default-scheduler  0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.


###Now add a toleration
[root@k8s-m ~]# cat mypod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels: 
    app: my-pod
     
spec:
  containers:
  - name: my-pod
    image: nginx
    ports:
    - name: http
      containerPort: 80
  tolerations: #taints this pod tolerates
  - key: "node-type" #the taint key defined earlier
    operator: "Equal" #Equal requires key and value to match exactly; Exists tolerates the taint as long as the node-type key exists, whatever its value
    value: "PreferNoSchedule" #the taint value
    effect: "NoSchedule" #the taint effect
    #tolerationSeconds: 3600  #how long the pod may keep running after a matching taint appears before it is evicted; only valid when effect is NoExecute

#Check (the pod has been scheduled)
[root@k8s-m ~]# kubectl  get pod  -o wide
NAME                     READY     STATUS              RESTARTS   AGE       IP             NODE      NOMINATED NODE
nginx-pod                1/1       Running             0          3m        10.244.1.100   node1     <none>
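
For reference, a toleration with operator Exists needs no value, and tolerationSeconds only applies together with the NoExecute effect. A hedged sketch of an alternative tolerations block (the node in this example only carries a NoSchedule taint, so the NoExecute entry is purely illustrative):

  tolerations:
  - key: "node-type"
    operator: "Exists"          #tolerate the taint as long as the key exists, whatever its value
    effect: "NoSchedule"
  - key: "node-type"
    operator: "Exists"
    effect: "NoExecute"         #if a NoExecute taint with this key were added later...
    tolerationSeconds: 3600     #...the pod could stay on the node for 3600s before being evicted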

  


Reposted from www.cnblogs.com/zhangb8042/p/10203266.html