Kubernetes development [5]: building custom scheduling plugins with the Scheduling Framework

Table of contents

Environment

Background

Foreword

Main framework

Getting the clientSet/informer in a plugin

Setting plugin scheduling parameters

Summary


Environment

Kubernetes: 1.22

CentOS: 7.9

apiVersion: kubescheduler.config.k8s.io/v1beta2 (removed in Kubernetes 1.25)

Background

When a Pod is created, it goes through the familiar scheduling and binding process.

At different stages of the scheduling/binding cycles, we can inject custom plugins to intervene in the scheduling process.

The Scheduling Framework is exactly the tool for implementing this.

Since scheduling algorithms are a deep topic of their own, this article does not delve into them; it only records how to build and make simple use of the Scheduling Framework.

Foreword

The basic descriptions of the different plugin extension points below are taken from the official documentation.

PreFilter

These plugins are used to pre-process information about the Pod, or to check certain conditions that the cluster or the Pod must meet. If a PreFilter plugin returns an error, the scheduling cycle is aborted.

Filter

These plugins are used to filter out nodes that cannot run the pod. For each node, the scheduler will invoke these filter plugins in the order they are configured. If any filter plugin marks a node as infeasible, the remaining filter plugins are not called for that node. Nodes can be evaluated concurrently.

PostFilter

These plugins are invoked after the Filter phase, but only when no feasible node was found for the Pod. Plugins are invoked in their configured order. If any PostFilter plugin marks a node as Schedulable, the remaining plugins are not called. A typical PostFilter implementation is preemption, which tries to make the Pod schedulable by preempting resources held by other Pods.

PreScore

These plugins perform "pre-scoring" work, that is, they generate a shareable state for the Score plugins to use. If a PreScore plugin returns an error, the scheduling cycle is aborted.

Score

These plugins are used to sort the nodes that pass the filter stage. The scheduler will call each scoring plugin for each node. There will be a well-defined range of integers representing the minimum and maximum scores. After the normalized scoring phase, the scheduler will combine the node scores of all plugins according to the configured plugin weights.
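Each extension point corresponds to an interface in the scheduler's framework package (k8s.io/kubernetes/pkg/scheduler/framework); a plugin opts into a phase by implementing the matching interface. For orientation, the shapes of a few of them as they appear in the 1.22 source tree (check your vendored version, signatures can change across releases):

type PreFilterPlugin interface {
	Plugin
	PreFilter(ctx context.Context, state *CycleState, p *v1.Pod) *Status
	PreFilterExtensions() PreFilterExtensions
}

type FilterPlugin interface {
	Plugin
	Filter(ctx context.Context, state *CycleState, pod *v1.Pod, nodeInfo *NodeInfo) *Status
}

type ScorePlugin interface {
	Plugin
	Score(ctx context.Context, state *CycleState, p *v1.Pod, nodeName string) (int64, *Status)
	ScoreExtensions() ScoreExtensions
}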

Main framework

First, refer to the official sample code:

GitHub - kubernetes-sigs/scheduler-plugins: Repository for out-of-tree scheduler plugins based on scheduler framework.

We build main.go

package main

import (
	"fmt"
	"k8s.io/component-base/logs"
	"k8s.io/kubernetes/cmd/kube-scheduler/app"
	"myscheduler/lib"
	"os"
)

func main() {
	// adapted from cmd/scheduler/main.go in the scheduler-plugins repo
	command := app.NewSchedulerCommand(
		// variadic arguments: the list of plugins to register
		app.WithPlugin(lib.TestSchedulingName, lib.NewTestScheduling),
	)
	logs.InitLogs()
	defer logs.FlushLogs()

	if err := command.Execute(); err != nil {
		_, _ = fmt.Fprintf(os.Stderr, "%v\n", err)
		os.Exit(1)
	}
}
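One practical note: k8s.io/kubernetes is not published as a normal library, so go.mod needs a replace directive for every k8s.io/* staging module it pulls in (they carry v0.0.0 placeholders). A trimmed sketch, assuming versions matching Kubernetes 1.22.0; the real file needs one line per staging module:

require k8s.io/kubernetes v1.22.0

replace (
	k8s.io/api => k8s.io/api v0.22.0
	k8s.io/apimachinery => k8s.io/apimachinery v0.22.0
	k8s.io/client-go => k8s.io/client-go v0.22.0
	k8s.io/component-base => k8s.io/component-base v0.22.0
	k8s.io/kube-scheduler => k8s.io/kube-scheduler v0.22.0
	// ...one replace line per remaining k8s.io staging module
)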

In it we need to define the plugin's name and related implementation, so the following code skeleton is introduced:

// adapted from pkg/capacityscheduling in the scheduler-plugins repo; only the main skeleton is kept, most of it simplified

package lib

import (
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

const TestSchedulingName = "test-scheduling" // remember this scheduler name

type TestScheduling struct{}

func (*TestScheduling) Name() string { // implements the framework.Plugin interface method
	return TestSchedulingName
}

func NewTestScheduling(configuration runtime.Object, f framework.Handle) (framework.Plugin, error) {
	return &TestScheduling{}, nil
}

Plugin methods

Plugins for the different phases are in fact different interfaces in the framework package. To inject a plugin into a given phase, we must implement the corresponding interface methods. Following the official style, we assert the interface so that GoLand can quickly generate the PreFilter interface methods:

var _ framework.PreFilterPlugin = &TestScheduling{}

This generates two interface methods:

// the business method
func PreFilter(ctx context.Context, state *framework.CycleState, p *v1.Pod) *framework.Status
// returns hooks evaluated when Pods are added or removed; the return value is itself an interface, so you can return the plugin itself and quickly generate its methods
func PreFilterExtensions() framework.PreFilterExtensions

Implementing the intervention we want inside these methods is all that is required.
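A minimal sketch of the two generated methods, assuming no incremental AddPod/RemovePod evaluation is needed (PreFilterExtensions may simply return nil in that case; the klog import is assumed):

func (s *TestScheduling) PreFilter(ctx context.Context, state *framework.CycleState, p *v1.Pod) *framework.Status {
	klog.V(3).Infof("pre-filtering") // matches the log line seen later
	return framework.NewStatus(framework.Success) // allow everything for now
}

func (s *TestScheduling) PreFilterExtensions() framework.PreFilterExtensions {
	return nil // no AddPod/RemovePod hooks required
}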

Registering the scheduler

The steps of compiling the code and packaging the image are omitted~
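For reference only: since the Deployment below runs the binary from a hostPath mount on an alpine base image, a static build dropped into the mounted directory is enough. The output path here is an assumption matching that Deployment:

CGO_ENABLED=0 GOOS=linux go build -o /root/schedular/test-scheduling .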

Because the scheduler Pod needs to access the apiserver, we must specify a ServiceAccount and bind the required permissions:

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: test-scheduling-clusterrole
rules:
  - apiGroups:
      - ""
    resources:
      - endpoints
      - events
    verbs:
      - create
      - get
      - update
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - delete
      - get
      - list
      - watch
      - update
  - apiGroups:
      - ""
    resources:
      - bindings
      - pods/binding
    verbs:
      - create
  - apiGroups:
      - ""
    resources:
      - pods/status
    verbs:
      - patch
      - update
  - apiGroups:
      - ""
    resources:
      - replicationcontrollers
      - services
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - apps
      - extensions
    resources:
      - replicasets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - apps
    resources:
      - statefulsets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - policy
    resources:
      - poddisruptionbudgets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - persistentvolumeclaims
      - persistentvolumes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - namespaces
      - configmaps
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "storage.k8s.io"
    resources: ['*']
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "coordination.k8s.io"
    resources:
      - leases
    verbs:
      - create
      - get
      - list
      - update
  - apiGroups:
      - "events.k8s.io"
    resources:
      - events
    verbs:
      - create
      - patch
      - update

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: test-scheduling-sa
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: test-scheduling-clusterrolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: test-scheduling-clusterrole
subjects:
  - kind: ServiceAccount
    name: test-scheduling-sa
    namespace: kube-system

When the scheduler starts, it needs a configuration file to register the plugin types, set parameters, and so on. We mount this configuration into the container via a ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: test-scheduling-config
  namespace: kube-system
data:
   config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1beta2
    kind: KubeSchedulerConfiguration
    leaderElection:
      leaderElect: false
    profiles:
      - schedulerName: test-scheduling
        plugins:
          preFilter:
            enabled:
            - name: "test-scheduling"

Next is the definition of the scheduler itself, pinned to a node and running an executable mounted from the host (for testing only):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-scheduling
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-scheduling
  template:
    metadata:
      labels:
        app: test-scheduling
    spec:
      nodeName: master-01
      serviceAccountName: test-scheduling-sa
      containers:
        - name: test-scheduling
          image: alpine:3.12
          imagePullPolicy: IfNotPresent
          command: ["/app/test-scheduling"]
          args:
            - --config=/etc/kubernetes/config.yaml
            - --v=3
          volumeMounts:
            - name: config
              mountPath: /etc/kubernetes
            - name: app
              mountPath: /app
      volumes:
        - name: config
          configMap:
            name: test-scheduling-config
        - name: app
          hostPath:
            path: /root/schedular

Check the scheduler Pod's status. Once it is Running, workloads can use it by specifying the scheduler in the Pod spec's schedulerName field.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: testngx
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: testngx
  template:
    metadata:
      labels:
        app: testngx
    spec:
      schedulerName: test-scheduling
      containers:
        - image: nginx:1.18-alpine
          imagePullPolicy: IfNotPresent
          name: testngx
          ports:
            - containerPort: 80

Observing the scheduler's log, we can see the intervention succeeded:

kubectl logs test-scheduling-54fd7c585f-gmbb6 -n kube-system -f

I1117 08:51:46.567953       1 eventhandlers.go:123] "Add event for unscheduled pod" pod="default/testngx-7cd55446f7-4cmgv"
I1117 08:51:46.568030       1 scheduler.go:516] "Attempting to schedule pod" pod="default/testngx-7cd55446f7-4cmgv"
I1117 08:51:46.568094       1 test-scheduling.go:57] pre-filtering

Getting the clientSet/informer in a plugin

Starting from this section, we introduce practices common to most plugins.

First up is obtaining a client. Example scenario: in the Filter plugin, filter out nodes carrying a certain label.

Notice that the plugin's constructor has this input parameter:

f framework.Handle

This Handle type can give us a clientSet or informers; here we take obtaining an informer as the example.
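Both accessors live directly on framework.Handle; a minimal sketch (method names as in the 1.22 framework package):

clientSet := f.ClientSet()           // kubernetes.Interface, for direct API calls
factory := f.SharedInformerFactory() // shared informer factory, backing cache-based listers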

We add a member variable and update the constructor accordingly:

type TestScheduling struct {
	fac informers.SharedInformerFactory
}

func NewTestScheduling(configuration runtime.Object, f framework.Handle) (framework.Plugin, error) {
	return &TestScheduling{
		fac: f.SharedInformerFactory(), // inject the informer factory
	}, nil
}

Then implement the logic in the plugin's interface method:

func (s *TestScheduling) Filter(ctx context.Context, state *framework.CycleState, pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
	klog.V(3).Infof("filtering nodes")
	for k, v := range nodeInfo.Node().Labels {
		if k == "scheduling" && v != "true" {
			return framework.NewStatus(framework.Unschedulable, "set unschedulable label")
		}
	}
	return framework.NewStatus(framework.Success)
}
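As with PreFilter, it helps to assert at compile time that the struct satisfies the Filter interface:

var _ framework.FilterPlugin = &TestScheduling{}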

This plugin must also be enabled in the scheduler's configuration file:

apiVersion: v1
kind: ConfigMap
metadata:
  name: test-scheduling-config
  namespace: kube-system
data:
   config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1beta2
    kind: KubeSchedulerConfiguration
    leaderElection:
      leaderElect: false
    profiles:
      - schedulerName: test-scheduling
        plugins:
          preFilter:
            enabled:
            - name: "test-scheduling"
          filter:
            enabled:
            - name: "test-scheduling"

Label a node in the cluster accordingly

kubectl label node node-01 scheduling=false

Then create the workload and observe that the Pod stays in the Pending state:

testngx-677b6896b-nqsk8   0/1     Pending   0          5s

Checking the Pod's events shows the node was filtered out because the unschedulable label is set:

Events:
  Type     Reason            Age   From             Message
  ----     ------            ----  ----             -------
  Warning  FailedScheduling  28s   test-scheduling  0/3 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 1 set unschedulable label.

This completes the demonstration of obtaining the clientSet/informer in the scheduler and of an induced scheduling failure.

It is worth mentioning that if the workload YAML pins a node via nodeName, the scheduler selected by schedulerName does not affect Pod scheduling at all, even if the Filter plugin's logic would have filtered out that node.

Setting plugin scheduling parameters

This section demonstrates how the scheduler reads plugin parameters set in its configuration file.

Example scenario: if the number of Pods in the namespace of the workload being created exceeds n, scheduling fails. Since this does not involve filtering individual nodes, the PreFilter plugin is the best place to implement it.

The scheduler configuration file has the following configuration:

apiVersion: v1
kind: ConfigMap
metadata:
  name: test-scheduling-config
  namespace: kube-system
data:
   config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1beta2
    kind: KubeSchedulerConfiguration
    leaderElection:
      leaderElect: false
    profiles:
      - schedulerName: test-scheduling
        plugins:
          preFilter:
            enabled:
            - name: "test-scheduling"
          filter:
            enabled:
            - name: "test-scheduling"
        pluginConfig:
          - name: test-scheduling
            args:
              maxPods: 5

The maximum number of Pods is set to 5.

Add this parameter as a member variable on the plugin struct:

type TestScheduling struct {
	fac  informers.SharedInformerFactory
	args *Args
}

type Args struct {
	MaxPods int `json:"maxPods,omitempty"`
}

Recall the constructor's other input parameter:

configuration runtime.Object

This carries our settings from the configuration file; decode it into our Args struct and assign it in the constructor:

func NewTestScheduling(configuration runtime.Object, f framework.Handle) (framework.Plugin, error) {
	args := &Args{}
	if err := frameworkruntime.DecodeInto(configuration, args); err != nil { // decode the pluginConfig args supplied via the config file
		return nil, err
	}
	return &TestScheduling{
		fac:  f.SharedInformerFactory(), // inject the informer factory
		args: args,
	}, nil
}

The value can then be read directly in the PreFilter interface method:

func (s *TestScheduling) PreFilter(ctx context.Context, state *framework.CycleState, p *v1.Pod) *framework.Status {
	klog.V(3).Infof("pre-filtering")
	pods, err := s.fac.Core().V1().Pods().Lister().Pods(p.Namespace).List(labels.Everything())
	if err != nil {
		return framework.NewStatus(framework.Error, err.Error())
	}
	if len(pods) > s.args.MaxPods {
		return framework.NewStatus(framework.Unschedulable, "pod count exceeds the maximum limit")
	}
	return framework.NewStatus(framework.Success)
}
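To trigger the limit, scale the test workload past the configured threshold, for example:

kubectl -n default scale deployment testngx --replicas=6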

The observations for scheduling failures are similar to those in the previous section and are omitted here.

Summary

So far we have covered some simple techniques in the hard-filtering plugins. Later we will demonstrate basic operations in the soft scoring plugins (PreScore/Score), including capturing and storing raw data in the PreScore pre-scoring phase, how that data is passed between the PreScore and Score plugins, and how NormalizeScore lets multiple plugins score cooperatively so that the final score falls within a calibrated range.


Original article: blog.csdn.net/kingu_crimson/article/details/127917631