Table of contents
The plugin gets clientSet/informer
Set plugin scheduling parameters
Environment
kubernetes:1.22
CentOS: 7.9
apiVersion: kubescheduler.config.k8s.io/v1beta2 (will be deprecated in 1.25)
background
When a Pod is created, it goes through a scheduling process before being bound to a node.
At the different stages of the scheduling/binding cycles, we can inject custom plugins to intervene in the scheduling process.
The Scheduling Framework is simply the mechanism that makes this possible.
Since schedulers involve scheduling algorithms, this article does not go deep into that aspect; it only records how to build and make simple use of the Scheduling Framework.
foreword
The basic descriptions of the different plugin extension points below are taken from the official documentation.
PreFilter
These plugins are used to preprocess information about Pods, or to check certain conditions that the cluster or the Pod must meet. If a PreFilter plugin returns an error, the scheduling cycle is terminated.
Filter
These plugins are used to filter out nodes that cannot run the pod. For each node, the scheduler will invoke these filter plugins in the order they are configured. If any filter plugin marks a node as infeasible, the remaining filter plugins are not called for that node. Nodes can be evaluated concurrently.
PostFilter
These plugins are invoked after the Filter phase, but only when there are no viable nodes for that Pod. Plugins are invoked in the order they are configured. If any PostFilter plugin marks a node as "Schedulable", the rest of the plugins will not be called. A typical PostFilter implementation is preemptive, trying to make the Pod schedulable by preempting resources from other Pods.
PreScore
These plugins are used to perform "pre-scoring" work, that is, to generate a sharable state for use by the Score plugin. If the PreScore plugin returns an error, the scheduling cycle will be terminated.
Score
These plugins are used to sort the nodes that pass the filter stage. The scheduler will call each scoring plugin for each node. There will be a well-defined range of integers representing the minimum and maximum scores. After the normalized scoring phase, the scheduler will combine the node scores of all plugins according to the configured plugin weights.
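To make the scoring pipeline concrete, here is a plain-Go sketch of the normalize-then-weight step described above. The `normalize` helper and the weighting loop are illustrative stand-ins, not the framework's API:

```go
package main

import "fmt"

// normalize rescales one plugin's raw node scores into [0, 100], the range
// the scheduler expects after the normalized scoring phase. Illustrative
// helper only, not the framework API.
func normalize(raw []int64) []int64 {
	var max int64
	for _, s := range raw {
		if s > max {
			max = s
		}
	}
	out := make([]int64, len(raw))
	if max == 0 {
		return out
	}
	for i, s := range raw {
		out[i] = s * 100 / max
	}
	return out
}

func main() {
	// raw scores for three nodes from one plugin, combined with weight 2
	var weight int64 = 2
	for i, s := range normalize([]int64{20, 40, 80}) {
		fmt.Printf("node%d: %d\n", i, s*weight)
	}
}
```

In the real framework, NormalizeScore runs per plugin, and the weighted sum across all scoring plugins decides the winning node.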
Main framework
First, referring to the official sample code, we build main.go:
package main

import (
	"fmt"
	"os"

	"k8s.io/component-base/logs"
	"k8s.io/kubernetes/cmd/kube-scheduler/app"

	"myscheduler/lib"
)

func main() {
	// from /blob/master/cmd/scheduler/main.go
	command := app.NewSchedulerCommand(
		// variadic parameter: the list of plugins to register
		app.WithPlugin(lib.TestSchedulingName, lib.NewTestScheduling),
	)
	logs.InitLogs()
	defer logs.FlushLogs()
	if err := command.Execute(); err != nil {
		_, _ = fmt.Fprintf(os.Stderr, "%v\n", err)
		os.Exit(1)
	}
}
Next we need to define the plugin's name and its implementation, so the following code skeleton is introduced in the lib package:
package lib

import (
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// adapted from /pkg/capacityscheduling; only the main skeleton is kept, most of it simplified
const TestSchedulingName = "test-scheduling" // remember this scheduler name

type TestScheduling struct{}

func (*TestScheduling) Name() string { // implements the framework.Plugin interface method
	return TestSchedulingName
}

func NewTestScheduling(configuration runtime.Object, f framework.Handle) (framework.Plugin, error) {
	return &TestScheduling{}, nil
}
plugin method
The plugins for the different stages are in fact different interfaces in the framework package. To inject a plugin for a given stage, we must implement the corresponding interface methods. Following the official style, a compile-time assertion lets GoLand generate the PreFilter interface methods quickly:
var _ framework.PreFilterPlugin = &TestScheduling{}
This generates two interface methods:
// the business method
func PreFilter(ctx context.Context, state *framework.CycleState, p *v1.Pod) *framework.Status
// this method produces state to be evaluated when pods are added or removed; its return
// value is itself an interface — return the receiver and generate its methods the same way
func PreFilterExtensions() framework.PreFilterExtensions
Implementing the intervention we want inside these methods is all that is needed.
register scheduler
The steps of compiling the code and packaging the image are omitted here.
Because the scheduler Pod needs to access the apiserver, we need to specify a ServiceAccount and bind the required permissions:
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: test-scheduling-clusterrole
rules:
  - apiGroups:
      - ""
    resources:
      - endpoints
      - events
    verbs:
      - create
      - get
      - update
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - delete
      - get
      - list
      - watch
      - update
  - apiGroups:
      - ""
    resources:
      - bindings
      - pods/binding
    verbs:
      - create
  - apiGroups:
      - ""
    resources:
      - pods/status
    verbs:
      - patch
      - update
  - apiGroups:
      - ""
    resources:
      - replicationcontrollers
      - services
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - apps
      - extensions
    resources:
      - replicasets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - apps
    resources:
      - statefulsets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - policy
    resources:
      - poddisruptionbudgets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - persistentvolumeclaims
      - persistentvolumes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - namespaces
      - configmaps
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "storage.k8s.io"
    resources: ['*']
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - "coordination.k8s.io"
    resources:
      - leases
    verbs:
      - create
      - get
      - list
      - update
  - apiGroups:
      - "events.k8s.io"
    resources:
      - events
    verbs:
      - create
      - patch
      - update
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: test-scheduling-sa
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: test-scheduling-clusterrolebinding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: test-scheduling-clusterrole
subjects:
  - kind: ServiceAccount
    name: test-scheduling-sa
    namespace: kube-system
When the scheduler starts, it reads a configuration file that registers the plugin stages, sets parameters, and so on. We mount this configuration into the container via a ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: test-scheduling-config
  namespace: kube-system
data:
  config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1beta2
    kind: KubeSchedulerConfiguration
    leaderElection:
      leaderElect: false
    profiles:
      - schedulerName: test-scheduling
        plugins:
          preFilter:
            enabled:
              - name: "test-scheduling"
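Note that this enables the custom plugin in addition to the default plugins for this profile. If you want your plugin to run alone at a stage, the same config can also disable the defaults; a hedged sketch, assuming the v1beta2 `disabled` field:

```yaml
profiles:
  - schedulerName: test-scheduling
    plugins:
      preFilter:
        enabled:
          - name: "test-scheduling"
        disabled:
          - name: "*"   # disable the default PreFilter plugins for this profile
```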
Next comes the scheduler Deployment itself; for testing only, it is pinned to a node and runs an executable mounted from the host:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-scheduling
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: test-scheduling
  template:
    metadata:
      labels:
        app: test-scheduling
    spec:
      nodeName: master-01
      serviceAccountName: test-scheduling-sa
      containers:
        - name: test-scheduling
          image: alpine:3.12
          imagePullPolicy: IfNotPresent
          command: ["/app/test-scheduling"]
          args:
            - --config=/etc/kubernetes/config.yaml
            - --v=3
          volumeMounts:
            - name: config
              mountPath: /etc/kubernetes
            - name: app
              mountPath: /app
      volumes:
        - name: config
          configMap:
            name: test-scheduling-config
        - name: app
          hostPath:
            path: /root/schedular
Check the scheduler's status. Once it is Running, you can point workloads at it via the schedulerName field in the Pod spec:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: testngx
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: testngx
  template:
    metadata:
      labels:
        app: testngx
    spec:
      schedulerName: test-scheduling
      containers:
        - image: nginx:1.18-alpine
          imagePullPolicy: IfNotPresent
          name: testngx
          ports:
            - containerPort: 80
Observing the scheduler's log, we can see the intervention succeeded:
kubectl logs test-scheduling-54fd7c585f-gmbb6 -n kube-system -f
I1117 08:51:46.567953 1 eventhandlers.go:123] "Add event for unscheduled pod" pod="default/testngx-7cd55446f7-4cmgv"
I1117 08:51:46.568030 1 scheduler.go:516] "Attempting to schedule pod" pod="default/testngx-7cd55446f7-4cmgv"
I1117 08:51:46.568094 1 test-scheduling.go:57] pre-filtering
The plugin gets clientSet/informer
Starting from this section, we introduce practices that are common to several plugins.
First is obtaining client-go objects. Example scenario: in the Filter plugin, filter out nodes that carry a certain label.
Observe that the plugin's constructor has this parameter:
f framework.Handle
This Handle type can give us a clientSet or an informer; here we take obtaining an informer as the example.
We add a member variable for it and fill it in the constructor:
type TestScheduling struct {
	fac informers.SharedInformerFactory
}

func NewTestScheduling(configuration runtime.Object, f framework.Handle) (framework.Plugin, error) {
	return &TestScheduling{
		fac: f.SharedInformerFactory(), // inject the informer factory
	}, nil
}
Then we can implement the logic in the plugin's interface method:
func (s *TestScheduling) Filter(ctx context.Context, state *framework.CycleState, pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
	klog.V(3).Infof("filtering nodes")
	for k, v := range nodeInfo.Node().Labels {
		if k == "scheduling" && v != "true" {
			return framework.NewStatus(framework.Unschedulable, "unschedulable label is set")
		}
	}
	return framework.NewStatus(framework.Success)
}
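Since the Filter above boils down to a single label comparison, the rule can be pulled into a tiny helper and checked in isolation. This is a plain-Go sketch, not framework code:

```go
package main

import "fmt"

// unschedulable mirrors the Filter logic above: a node is rejected when it
// carries the label "scheduling" with any value other than "true".
func unschedulable(nodeLabels map[string]string) bool {
	v, ok := nodeLabels["scheduling"]
	return ok && v != "true"
}

func main() {
	fmt.Println(unschedulable(map[string]string{"scheduling": "false"})) // node rejected
	fmt.Println(unschedulable(map[string]string{"scheduling": "true"}))  // node allowed
}
```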
We also need to enable this plugin in the scheduler's configuration file:
apiVersion: v1
kind: ConfigMap
metadata:
  name: test-scheduling-config
  namespace: kube-system
data:
  config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1beta2
    kind: KubeSchedulerConfiguration
    leaderElection:
      leaderElect: false
    profiles:
      - schedulerName: test-scheduling
        plugins:
          preFilter:
            enabled:
              - name: "test-scheduling"
          filter:
            enabled:
              - name: "test-scheduling"
Label one of the cluster's nodes accordingly:
kubectl label node node-01 scheduling=false
Then create a workload and observe that its Pod stays Pending:
testngx-677b6896b-nqsk8 0/1 Pending 0 5s
Checking the Pod's events shows the node was filtered out because the unschedulable label is set:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 28s test-scheduling 0/3 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 1 set unschedulable label.
This completes the demonstration of clientSet/informer acquisition and of a scheduling failure in the custom scheduler.
It is worth mentioning that if a workload's YAML pins the node via nodeName, the scheduler configured via schedulerName does not affect that Pod's scheduling at all, even if the Filter plugin's logic would filter out the pinned node.
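To illustrate the nodeName caveat, in a spec like the following hypothetical fragment, the kubelet on the named node starts the Pod directly and the schedulerName setting never comes into play:

```yaml
spec:
  nodeName: node-01              # pins the Pod; scheduling is bypassed
  schedulerName: test-scheduling # ignored when nodeName is set
```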
Set plugin scheduling parameters
This section demonstrates making the scheduler read parameters dynamically from its configuration file.
Example scenario: if the number of Pods in the namespace of the workload being created exceeds n, scheduling fails. Since no node filtering is involved, the PreFilter plugin is the best place to implement this.
Add the following to the scheduler's configuration file:
apiVersion: v1
kind: ConfigMap
metadata:
  name: test-scheduling-config
  namespace: kube-system
data:
  config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1beta2
    kind: KubeSchedulerConfiguration
    leaderElection:
      leaderElect: false
    profiles:
      - schedulerName: test-scheduling
        plugins:
          preFilter:
            enabled:
              - name: "test-scheduling"
          filter:
            enabled:
              - name: "test-scheduling"
        pluginConfig:
          - name: test-scheduling
            args:
              maxPods: 5
The maximum number of Pods is set to 5.
Add this parameter as a member variable of the plugin struct:
type TestScheduling struct {
	fac  informers.SharedInformerFactory
	args *Args
}

type Args struct {
	MaxPods int `json:"maxPods,omitempty"`
}
In the constructor, there is such an input parameter:
configuration runtime.Object
It carries the settings from the configuration file; decode it into our Args struct and assign it in the constructor:
func NewTestScheduling(configuration runtime.Object, f framework.Handle) (framework.Plugin, error) {
	args := &Args{}
	// parameters come from the config file and arrive via configuration
	if err := frameworkruntime.DecodeInto(configuration, args); err != nil {
		return nil, err
	}
	return &TestScheduling{
		fac:  f.SharedInformerFactory(), // inject the informer factory
		args: args,
	}, nil
}
The value can then be used directly in the PreFilter plugin's interface method:
func (s *TestScheduling) PreFilter(ctx context.Context, state *framework.CycleState, p *v1.Pod) *framework.Status {
	klog.V(3).Infof("pre-filtering")
	pods, err := s.fac.Core().V1().Pods().Lister().Pods(p.Namespace).List(labels.Everything())
	if err != nil {
		return framework.NewStatus(framework.Error, err.Error())
	}
	if len(pods) > s.args.MaxPods {
		return framework.NewStatus(framework.Unschedulable, "pod count exceeds the configured maximum")
	}
	return framework.NewStatus(framework.Success)
}
The observations for scheduling failures are similar to those in the previous section and are omitted here.
Summary
So far we have covered some simple techniques in the hard filtering plugins. Later we will demonstrate basic operations in the soft scoring plugins (PreScore/Score), including capturing and storing raw data in the PreScore pre-scoring plugin, how this data is passed between the PreScore and Score plugins, and how normalization via NormalizeScore lets multiple plugins score cooperatively so that the final score falls within a calibrated range.