Docker (20) -- Docker k8s -- Kubernetes scheduling

1 Introduction

  • The scheduler uses the Kubernetes watch mechanism to discover newly created pods in the cluster that have not yet been assigned to a node. For each unscheduled pod it finds, the scheduler selects a suitable node and schedules the pod to run there.

  • kube-scheduler is the default scheduler for Kubernetes clusters and is part of the cluster control plane. If you really want or need to, kube-scheduler is designed so that you can write your own scheduling component and use it in place of the default one (a minimal sketch follows this list).

  • Factors considered in scheduling decisions include: individual and overall resource requests, hardware/software/policy constraints, affinity and anti-affinity requirements (the most commonly used), data locality, interference between workloads, and so on.
    For the default policies, refer to the scheduling framework documentation.
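
As a hedged illustration of the custom-scheduler point above: a pod can opt into a different scheduler through the standard spec.schedulerName field. The scheduler name my-custom-scheduler below is hypothetical.

apiVersion: v1
kind: Pod
metadata:
  name: demo-custom-scheduler
spec:
  schedulerName: my-custom-scheduler   ## hypothetical custom scheduler; omit this field to use the default kube-scheduler
  containers:
  - name: nginx
    image: myapp:v1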

2. Factors affecting Kubernetes scheduling

2.1 nodeName

  • nodeName is the simplest form of node selection constraint, but it is generally not recommended. If nodeName is specified in the PodSpec, it takes precedence over all other node selection methods.

  • Limitations of using nodeName to select a node:
    If the specified node does not exist, the pod cannot run and an error is reported.
    If the specified node does not have enough resources to accommodate the pod, scheduling fails.
    Node names in cloud environments are not always predictable or stable.

Example

[root@server2 ~]# vim pod1.yml 
[root@server2 ~]# cat pod1.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: myapp:v1
  nodeName: server3   ## pin the pod to server3


[root@server2 ~]# kubectl apply -f pod1.yml 
pod/nginx created
[root@server2 ~]# kubectl get pod -o wide       ## view pod details


2.2 nodeSelector

  • nodeSelector is the simplest recommended form of node selection constraint: the pod is scheduled preferentially to a node that carries the matching label. If two nodes both carry the label but one has insufficient resources, the pod is scheduled to the other host.

  • Add labels to the selected nodes:
    kubectl label nodes server2 disktype=ssd

Example

[root@server2 ~]# kubectl label nodes server4 disktype=ssd   ## add a label
[root@server2 ~]# kubectl get node --show-labels 

[root@server2 ~]# vim pod1.yml 
[root@server2 ~]# cat pod1.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: myapp:v1
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 100m
  nodeSelector:
    disktype: ssd


2.3 Affinity and Anti-Affinity

If node affinity and pod affinity are specified at the same time and their rules conflict, an error is reported and the pod cannot be scheduled.

2.3.1 node affinity

  • Affinity and anti-affinity: nodeSelector provides a very simple way to constrain pods to nodes with specific labels, while the affinity/anti-affinity feature greatly expands the types of constraints you can express.
    Rules can be "soft"/"preferences" rather than hard requirements, so the pod can still be scheduled even if the scheduler cannot satisfy them.
    You can also constrain against the labels of pods already running on a node, rather than the node's own labels, to control which pods can or cannot be placed together.

  • Node affinity (only takes effect during scheduling):
    requiredDuringSchedulingIgnoredDuringExecution: the rule must be satisfied.
    preferredDuringSchedulingIgnoredDuringExecution: the rule is preferred but not required.

  • IgnoredDuringExecution means that if a node's labels change while a pod is running, so that the affinity rule is no longer satisfied, the pod keeps running on that node (see the quick check below).
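
A quick way to observe this, assuming the node-affinity pod from Example 1 below is already running on server3 (a sketch, not part of the original session):

[root@server2 ~]# kubectl label nodes server3 disktype=sata --overwrite   ## change the label while the pod is running
[root@server2 ~]# kubectl get pod -o wide                                 ## the pod stays on server3; only new scheduling decisions are affected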

    See the official Kubernetes documentation for details.

  • nodeAffinity supports several operators for matching rules:
    In: the label value is in the given list.
    NotIn: the label value is not in the given list.
    Gt: the label value is greater than the given value (not supported for pod affinity).
    Lt: the label value is less than the given value (not supported for pod affinity).
    Exists: the label key exists on the node.
    DoesNotExist: the label key does not exist on the node.
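
A minimal sketch of the less common operators, assuming nodes carry a numeric gpu-count label and possibly a dedicated label (both label names are made up for illustration):

  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: gpu-count          ## hypothetical numeric label
            operator: Gt            ## matches nodes whose gpu-count is greater than 1
            values:
            - "1"
          - key: dedicated          ## hypothetical label
            operator: DoesNotExist  ## matches nodes that do not carry this label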

Example 1

[root@server2 ~]# kubectl label nodes server3 disktype=ssd  ## add the same label as server4
[root@server2 ~]# kubectl label nodes server4 role=db   ## add a role label to server4

[root@server2 ~]# cat pod1.yml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: myapp:v1
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 100m
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: role
            operator: In
            values:
            - db 

[root@server2 ~]# kubectl apply -f pod1.yml 
[root@server2 ~]# kubectl get pod -o wide 


2.3.2 Pod affinity


  • Pod affinity and anti-affinity: podAffinity decides which pods a pod may be deployed into the same topology domain with (a topology domain is defined by node labels and can be a single host, or a zone or cluster made up of multiple hosts).
    podAntiAffinity decides which pods a pod must not share a topology domain with. Both describe relationships between pods within the Kubernetes cluster.
    Inter-pod affinity and anti-affinity are most useful together with higher-level controllers such as ReplicaSets, StatefulSets, and Deployments, making it easy to configure a set of workloads that should share (or avoid sharing) the same topology domain, for example the same node (see the Deployment sketch below).
    Inter-pod affinity and anti-affinity require significant processing and may noticeably slow down scheduling in large clusters.
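
A hedged sketch of that pattern: a Deployment whose replicas must land on different nodes via podAntiAffinity (the name nginx-spread is made up; the label and image follow the examples in this article):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-spread
spec:
  replicas: 2
  selector:
    matchLabels:
      run: nginx
  template:
    metadata:
      labels:
        run: nginx
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: run
                operator: In
                values:
                - nginx
            topologyKey: kubernetes.io/hostname   ## no two run=nginx pods may share a node
      containers:
      - name: nginx
        image: myapp:v1
        imagePullPolicy: IfNotPresent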

Example: Affinity

[root@server2 ~]# kubectl run demo --image=busyboxplus -it   ## run a pod
[root@server2 ~]# kubectl  get pod -o wide    ## check which server the pod landed on

[root@server2 ~]# vim pod2.yaml 
[root@server2 ~]# cat pod2.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: myapp:v1
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 100m
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: run
            operator: In
            values:
            - demo
        topologyKey: kubernetes.io/hostname


[root@server2 ~]# kubectl apply -f pod2.yaml 
[root@server2 ~]# kubectl get pod -o wide    ## both pods are on the same node
 


Example: Anti-Affinity

[root@server2 ~]# vim pod2.yaml 
[root@server2 ~]# cat pod2.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: myapp:v1
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 100m
  affinity:
    podAntiAffinity:    ## for anti-affinity, only this field changes
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: run
            operator: In
            values:
            - demo
        topologyKey: kubernetes.io/hostname

[root@server2 ~]# kubectl apply  -f pod2.yaml  
[root@server2 ~]# kubectl get pod -o wide   ## check that the pods are no longer on the same node


2.4 Taints

  • Node affinity (NodeAffinity) is a property defined on a pod that draws the pod toward the nodes we want. Taints are the opposite: they are a node property that lets a node repel, or even evict, pods.

  • Taints are an attribute of a node. Once a taint is set, Kubernetes will not schedule pods onto that node. Kubernetes therefore gives pods a tolerations attribute: as long as a pod tolerates the node's taints, Kubernetes ignores those taints and may (but will not necessarily) schedule the pod onto that node.

  • Use the kubectl taint command to manage taints on a node:
    $ kubectl taint nodes node1 key=value:NoSchedule    //create
    $ kubectl describe nodes server1 | grep Taints      //query
    $ kubectl taint nodes node1 key:NoSchedule-         //delete
    Possible effect values: [NoSchedule | PreferNoSchedule | NoExecute]
    NoSchedule: pods will not be scheduled to a node that carries this taint.
    PreferNoSchedule: the soft-policy version of NoSchedule.
    NoExecute: once the taint takes effect, pods already running on the node that do not have a matching toleration are evicted immediately.

  • The key, value, and effect defined in tolerations must be consistent with the taint set on the node:
    If the operator is Exists, the value can be omitted.
    If the operator is Equal, the key and value must match exactly.
    If the operator attribute is not specified, it defaults to Equal.
    There are also two special values (sketched below):
    when no key is specified, operator Exists matches all keys and values and therefore tolerates every taint;
    when no effect is specified, all effects are matched.
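
A minimal sketch of those two special values as they would appear under a pod spec (the key name key1 matches the taint used in the examples below):

      ## 1) no key + Exists: tolerates every taint on the node
      tolerations:
      - operator: "Exists"

      ## 2) key without effect: tolerates key1 taints with any effect
      tolerations:
      - key: "key1"
        operator: "Exists"

The first form is exactly what the "two special values" example further down uses.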

Example: a taint and a matching toleration

[root@server2 ~]# kubectl describe nodes server2 | grep Taints   ## the master node has one taint
Taints:             node-role.kubernetes.io/master:NoSchedule
[root@server2 ~]# kubectl describe nodes server3 | grep Taints
Taints:             <none>
[root@server2 ~]# kubectl describe nodes server4 | grep Taints


[root@server2 ~]# kubectl taint node server3 key1=v1:NoExecute  ## add a taint to server3
[root@server2 ~]# vim pod.yml 
[root@server2 ~]# cat pod.yml    ## set tolerations to tolerate the taint
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      run: nginx
  template:
    metadata:
      labels:
        run: nginx
    spec:
      hostNetwork: true
      containers:
      - name: nginx
        image: myapp:v1
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 0.5
            memory: 512Mi
      tolerations:
      - key: "key1"
        operator: "Equal"
        value: "v1"
        effect: "NoExecute"
[root@server2 ~]# kubectl apply -f pod.yml 
[root@server2 ~]# kubectl get pod -o wide    ## both pods run on server3 because the calico network plugin is in use


Example: Two special values

## taints are set on both server3 and server4
[root@server2 ~]# kubectl describe nodes server3 | grep Taints
Taints:             key1=v1:NoExecute
[root@server2 ~]# kubectl taint node server4 key2=v2:NoSchedule
node/server4 tainted
[root@server2 ~]# kubectl describe nodes server4 | grep Taints
Taints:             key2=v2:NoSchedule

[root@server2 ~]# vim pod.yml 
[root@server2 ~]# cat pod.yml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      run: nginx
  template:
    metadata:
      labels:
        run: nginx
    spec:
      hostNetwork: true
      containers:
      - name: nginx
        image: myapp:v1
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 0.5
            memory: 200Mi
      tolerations:
      - operator: "Exists"

[root@server2 ~]# kubectl apply -f pod.yml 
[root@server2 ~]# kubectl get pod
[root@server2 ~]# kubectl get pod -o wide   ## check whether pods run on both hosts


2.5 Commands that affect pod scheduling

  • There are also commands that affect pod scheduling: cordon, drain, and delete. After any of them, newly created pods will not be scheduled to the node, but they differ in how disruptive they are.

2.5.1 cordon

  • cordon stops scheduling and has the least impact: the node is only marked SchedulingDisabled, newly created pods will not be scheduled to it, pods already on the node are unaffected, and the node continues to serve traffic normally.
    $ kubectl cordon server3
    $ kubectl get node
    NAME      STATUS                     AGE   VERSION
    server1   Ready                      29m   v1.17.2
    server2   Ready                      12d   v1.17.2
    server3   Ready,SchedulingDisabled   9d    v1.17.2
    $ kubectl uncordon server3    //restore scheduling
[root@server2 ~]#  kubectl taint node server3 key1=v1:NoExecute-   ## remove the taint
[root@server2 ~]# kubectl taint node server4 key2=v2:NoSchedule-   ## remove the taint
[root@server2 ~]# kubectl cordon server3    ## stop scheduling to server3
[root@server2 ~]# kubectl get node      ## check node status

[root@server2 ~]# cat pod.yml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      run: nginx
  template:
    metadata:
      labels:
        run: nginx
    spec:
      hostNetwork: true
      containers:
      - name: nginx
        image: myapp:v1
        imagePullPolicy: IfNotPresent
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
          limits:
            cpu: 0.5
            memory: 200Mi

[root@server2 ~]# kubectl apply -f pod.yml 
[root@server2 ~]# kubectl get pod -o wide    ## all replicas run on server4
NAME                   READY   STATUS    RESTARTS   AGE   IP            NODE      NOMINATED NODE   READINESS GATES
nginx-b49457b9-7h2q5   1/1     Running   0          3s    172.25.13.4   server4   <none>           <none>
nginx-b49457b9-b6bbl   1/1     Running   0          3s    172.25.13.4   server4   <none>           <none>

[root@server2 ~]# kubectl uncordon server3   ## resume scheduling
[root@server2 ~]# kubectl uncordon server4


2.5.2 drain

  • drain evicts the node's pods first so they are recreated on other nodes, and then marks the node SchedulingDisabled.
    $ kubectl drain server3          ## drain the node
    node/server3 cordoned
    evicting pod "web-1"
    evicting pod "coredns-9d85f5447-mgg2k"
    pod/coredns-9d85f5447-mgg2k evicted
    pod/web-1 evicted
    node/server3 evicted
    $ kubectl uncordon server3       ## resume scheduling
[root@server2 ~]# kubectl drain server4 --ignore-daemonsets   ## drain server4, ignoring DaemonSet-managed pods
[root@server2 ~]# kubectl get node 
server4   Ready,SchedulingDisabled   <none>                 10d   v1.20.2
[root@server2 ~]# kubectl apply -f pod.yml 
[root@server2 ~]# kubectl get pod -o wide 


[root@server2 ~]# kubectl uncordon server4     ## resume scheduling on server4


2.5.3 delete

  • delete removes the node and is the most drastic option: the pods on the node are evicted and recreated on other nodes, then the node is deleted from the cluster and the master loses control of it. To resume scheduling, log in to the node and restart the kubelet service.
    $ kubectl delete node server3
    $ systemctl restart kubelet      //the node re-registers itself and can be used again
[root@server2 ~]# kubectl delete nodes server3     ## delete node server3
[root@server2 ~]# kubectl get nodes 
NAME      STATUS   ROLES                  AGE   VERSION
server2   Ready    control-plane,master   10d   v1.20.2
server4   Ready    <none>                 10d   v1.20.2

[root@server3 ~]# systemctl restart kubelet.service   ## restart kubelet on the deleted node

[root@server2 ~]# kubectl get node    ## node server3 is restored
NAME      STATUS   ROLES                  AGE   VERSION
server2   Ready    control-plane,master   10d   v1.20.2
server3   Ready    <none>                 11s   v1.20.2
server4   Ready    <none>                 10d   v1.20.2

