In-depth analysis of cloud native: how to use Prometheus to extend the Kubernetes scheduler

1. kubernetes scheduling configuration

① Scheduler Configuration

  • kube-scheduler accepts a KubeSchedulerConfiguration resource as its configuration file, specified at startup with --config=<file>. The KubeSchedulerConfiguration API version used by each kubernetes release is:
    • Versions before 1.21 use v1beta1;
    • Version 1.22 uses v1beta2, but retains v1beta1;
    • Versions 1.23, 1.24, and 1.25 use v1beta3, keep v1beta2, and remove v1beta1;
  • Shown below is a simple example of a KubeSchedulerConfiguration, where kubeconfig plays the same role as the startup parameter --kubeconfig. KubeSchedulerConfiguration is similar to the configuration files of other components, for example the KubeletConfiguration used when running the kubelet as a service:
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/srv/kubernetes/kube-scheduler/kubeconfig
  • --kubeconfig and --config cannot be specified at the same time; if --config is specified, the other flags naturally become ineffective.

② KubeSchedulerConfiguration usage

  • Through the configuration file, users can customize multiple schedulers and configure the extension points of each stage; through these extension points, plugins provide the scheduling behavior in the whole scheduling context.
  • The configuration shown below is an example of configuring an extension point (if name="*" is used, all plugins for that extension point are disabled/enabled accordingly):
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
profiles:
  - plugins:
      score:
        disabled:
        - name: PodTopologySpread
        enabled:
        - name: MyCustomPluginA
          weight: 2
        - name: MyCustomPluginB
          weight: 1
  • Since kubernetes allows multiple schedulers, the configuration file naturally supports multiple profiles, which is why profiles is a list; you only need to specify multiple entries. The following is an example with multiple profiles; each scheduler can also configure multiple extension points:
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    plugins:
      preScore:
        disabled:
        - name: '*'
      score:
        disabled:
        - name: '*'
  - schedulerName: no-scoring-scheduler
    plugins:
      preScore:
        disabled:
        - name: '*'
      score:
        disabled:
        - name: '*'

③ scheduler scheduling plug-in

  • kube-scheduler provides many plugins as scheduling methods by default, and these plugins are enabled by default if not configured otherwise, for example:
    • ImageLocality: scheduling favors nodes that already have the container images; extension point: score;
    • TaintToleration: implements taints and tolerations; extension points: filter, preScore, score;
    • NodeName: implements the simplest scheduling strategy, NodeName; extension point: filter;
    • NodePorts: checks whether the node's ports are already occupied; extension points: preFilter, filter;
    • NodeAffinity: provides node affinity related functions; extension points: filter, score;
    • PodTopologySpread: implements Pod topology spreading; extension points: preFilter, filter, preScore, score;
    • NodeResourcesFit: checks whether the node has all the resources requested by the Pod, using one of three strategies: LeastAllocated (default), MostAllocated, and RequestedToCapacityRatio; extension points: preFilter, filter, score;
    • VolumeBinding: checks whether the node has, or can bind, the requested volumes; extension points: preFilter, filter, reserve, preBind, score;
    • VolumeRestrictions: checks whether the volumes mounted on the node satisfy volume-provider-specific restrictions; extension point: filter;
    • VolumeZone: checks whether requested volumes meet any zone requirements they may have; extension point: filter;
    • InterPodAffinity: implements inter-Pod affinity and anti-affinity; extension points: preFilter, filter, preScore, score;
    • PrioritySort: provides sorting based on default priority; extension point: queueSort.

2. How to extend kube-scheduler?

  • When thinking about writing a scheduler for the first time, extending kube-scheduler usually seems like a very difficult thing. In fact, the kubernetes project has already thought about this: in version 1.15 it introduced the concept of the scheduling framework, which aims to make the scheduler more extensible.
  • The framework redefines each extension point, exposing them as plugins, and supports registering out-of-tree extensions so that they can be registered into kube-scheduler.

① Define entry

  • The scheduler allows customization: you only need to reference the corresponding NewSchedulerCommand and implement the plugin logic:
import (
    "fmt"
    "os"

    scheduler "k8s.io/kubernetes/cmd/kube-scheduler/app"
)

func main() {
    command := scheduler.NewSchedulerCommand(
        scheduler.WithPlugin("example-plugin1", ExamplePlugin1),
        scheduler.WithPlugin("example-plugin2", ExamplePlugin2))
    if err := command.Execute(); err != nil {
        fmt.Fprintf(os.Stderr, "%v\n", err)
        os.Exit(1)
    }
}
  • NewSchedulerCommand allows injecting out-of-tree plugins, that is, external custom plugins. This way there is no need to modify the source code to define a scheduler; a custom scheduler can be completed simply by implementing the plugins yourself:
// WithPlugin is used to inject out-of-tree plugins, so the scheduler code itself has no reference to it.
func WithPlugin(name string, factory runtime.PluginFactory) Option {
    return func(registry runtime.Registry) error {
        return registry.Register(name, factory)
    }
}

② Plug-in implementation

  • Implementing a plugin only requires implementing the corresponding extension point interfaces. Looking at the structure of the built-in NodeAffinity plugin shows that implementing a plugin means implementing the abstract interfaces of the corresponding extension points.

  • Define the plugin structure: framework.FrameworkHandle is used for calls between the Kubernetes API and the scheduler; as can be seen from its definition, it includes listers, informers, and so on. This parameter must also be stored by the plugin:
type NodeAffinity struct {
    handle framework.FrameworkHandle
}
  • Implement the corresponding extension point:
func (pl *NodeAffinity) Score(ctx context.Context, state *framework.CycleState, pod *v1.Pod, nodeName string) (int64, *framework.Status) {
    nodeInfo, err := pl.handle.SnapshotSharedLister().NodeInfos().Get(nodeName)
    if err != nil {
        return 0, framework.NewStatus(framework.Error, fmt.Sprintf("getting node %q from Snapshot: %v", nodeName, err))
    }

    node := nodeInfo.Node()
    if node == nil {
        return 0, framework.NewStatus(framework.Error, fmt.Sprintf("getting node %q from Snapshot: %v", nodeName, err))
    }

    affinity := pod.Spec.Affinity

    var count int64
    // A nil element of PreferredDuringSchedulingIgnoredDuringExecution matches no objects.
    // An element of PreferredDuringSchedulingIgnoredDuringExecution that refers to an
    // empty PreferredSchedulingTerm matches all objects.
    if affinity != nil && affinity.NodeAffinity != nil && affinity.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution != nil {
        // Match PreferredDuringSchedulingIgnoredDuringExecution term by term.
        for i := range affinity.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution {
            preferredSchedulingTerm := &affinity.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution[i]
            if preferredSchedulingTerm.Weight == 0 {
                continue
            }

            // TODO: Avoid computing it for all nodes if this becomes a performance problem.
            nodeSelector, err := v1helper.NodeSelectorRequirementsAsSelector(preferredSchedulingTerm.Preference.MatchExpressions)
            if err != nil {
                return 0, framework.NewStatus(framework.Error, err.Error())
            }

            if nodeSelector.Matches(labels.Set(node.Labels)) {
                count += int64(preferredSchedulingTerm.Weight)
            }
        }
    }

    return count, nil
}
  • Finally, provide a New function as the way to register this extension; it can then be injected into the scheduler in main.go as an out-of-tree plugin:
// New initializes a new plugin and returns it.
func New(_ runtime.Object, h framework.FrameworkHandle) (framework.Plugin, error) {
    return &NodeAffinity{handle: h}, nil
}

3. Scheduling based on network traffic

  • With the above understanding of how to extend a scheduler plugin, the following completes an example of traffic-based scheduling: scheduling based on the network traffic a node has used over a period of time, which is a very common concern in production environments.
  • For example, among several hosts with identical configurations, host A runs an order-processing script while host B runs ordinary services. Because processing orders requires downloading large amounts of data while occupying few other hardware resources, if a Pod is scheduled onto that node, both workloads may suffer (the front-end proxy sees that the node has few connections and sends it a large share of traffic, while the order script loses efficiency because the network bandwidth is occupied).

① Environment configuration

  • A kubernetes cluster must have at least two nodes.
  • The kubernetes cluster needs prometheus node_exporter installed; it can run inside or outside the cluster, and here one outside the cluster is used.
  • Have an understanding of promQL and client_golang.
  • The example is roughly divided into the following steps:
    • Define the plugin API; the plugin is named NetworkTraffic;
    • Define the extension point; the Score extension point is used here, and the scoring algorithm is defined;
    • Define the way the score is obtained (getting the corresponding data from the prometheus metrics);
    • Define the parameters passed to the custom scheduler;
    • Deploy the project to the cluster (in-cluster and out-of-cluster deployment);
    • Verify the results.
  • The example follows the built-in nodeaffinity plugin to complete the code. This plugin was chosen because it is relatively simple and its purpose is basically the same as what is needed here; in fact, other plugins would serve equally well as a reference.

② Error handling

  • When initializing the project with go mod tidy and similar operations, you will run into many errors like the following:
go: github.com/GoogleCloudPlatform/spark-on-k8s-operator@v0.0.0-20210307184338-1947244ce5f4 requires
        k8s.io/apiextensions-apiserver@v0.0.0: reading k8s.io/apiextensions-apiserver/go.mod at revision v0.0.0: unknown revision v0.0.0
  • This problem was mentioned in kubernetes issue #79384. A quick read shows it does not really explain why the problem occurs, but at the bottom a commenter provided a script; when the problem cannot be solved otherwise, run the script with the target kubernetes version (it rewrites the replace directives in go.mod) and things work again:
#!/bin/sh
set -euo pipefail

VERSION=${1#"v"}
if [ -z "$VERSION" ]; then
    echo "Must specify version!"
    exit 1
fi
MODS=($(
    curl -sS https://raw.githubusercontent.com/kubernetes/kubernetes/v${VERSION}/go.mod |
    sed -n 's|.*k8s.io/\(.*\) => ./staging/src/k8s.io/.*|k8s.io/\1|p'
))
for MOD in "${MODS[@]}"; do
    V=$(
        go mod download -json "${MOD}@kubernetes-${VERSION}" |
        sed -n 's|.*"Version": "\(.*\)".*|\1|p'
    )
    go mod edit "-replace=${MOD}=${MOD}@${V}"
done
go get "k8s.io/kubernetes@v${VERSION}"

③ Define plug-in API

  • From the description above, we know that defining a plugin only requires implementing the abstract interfaces of the corresponding extension points, so the project file pkg/networktraffic/networktraffic.go can be created.
  • Define the plugin name and an assertion that the plugin type implements the ScorePlugin interface:
const Name = "NetworkTraffic"

var _ = framework.ScorePlugin(&NetworkTraffic{})
  • Define the structure of the plugin:
type NetworkTraffic struct {
    // prometheus is used later to fetch the node's network traffic.
    prometheus *PrometheusHandle
    // FrameworkHandle provides data and some tools that plugins can use.
    // It is passed to the plugin factory when the plugin is initialized;
    // the plugin must store and use this handle to call framework functions.
    handle framework.FrameworkHandle
}

④ Define extension points

  • Because the Score extension point is selected, the corresponding method needs to be defined to realize the corresponding abstraction:
func (n *NetworkTraffic) Score(ctx context.Context, state *framework.CycleState, p *corev1.Pod, nodeName string) (int64, *framework.Status) {
    // Get the node's network usage over a period of time from prometheus.
    nodeBandwidth, err := n.prometheus.GetGauge(nodeName)
    if err != nil {
        return 0, framework.NewStatus(framework.Error, fmt.Sprintf("error getting node bandwidth measure: %s", err))
    }
    bandWidth := int64(nodeBandwidth.Value)
    klog.Infof("[NetworkTraffic] node '%s' bandwidth: %d", nodeName, bandWidth)
    return bandWidth, nil // the raw value can simply be returned here
}
  • Next, the results need to be normalized. It can be seen from the source code that the Score extension point needs to implement more than just this single method:
// Run NormalizeScore method for each ScorePlugin in parallel.
parallelize.Until(ctx, len(f.scorePlugins), func(index int) {
    pl := f.scorePlugins[index]
    nodeScoreList := pluginToNodeScores[pl.Name()]
    if pl.ScoreExtensions() == nil {
        return
    }
    status := f.runScoreExtension(ctx, pl, state, pod, nodeScoreList)
    if !status.IsSuccess() {
        err := fmt.Errorf("normalize score plugin %q failed with error %v", pl.Name(), status.Message())
        errCh.SendErrorWithCancel(err, cancel)
        return
    }
})
  • From the code above, you can see that implementing Score also requires implementing ScoreExtensions; if it is not implemented, the framework simply returns. Following the example in nodeaffinity, this method only returns the extension point object itself, and the actual normalization, that is the real score adjustment, happens in NormalizeScore.
// NormalizeScore invoked after scoring all nodes.
func (pl *NodeAffinity) NormalizeScore(ctx context.Context, state *framework.CycleState, pod *v1.Pod, scores framework.NodeScoreList) *framework.Status {
    return pluginhelper.DefaultNormalizeScore(framework.MaxNodeScore, false, scores)
}

// ScoreExtensions of the Score plugin.
func (pl *NodeAffinity) ScoreExtensions() framework.ScoreExtensions {
    return pl
}
  • In the scheduling framework, the method of actually performing the operation is also NormalizeScore():
func (f *frameworkImpl) runScoreExtension(ctx context.Context, pl framework.ScorePlugin, state *framework.CycleState, pod *v1.Pod, nodeScoreList framework.NodeScoreList) *framework.Status {
    if !state.ShouldRecordPluginMetrics() {
        return pl.ScoreExtensions().NormalizeScore(ctx, state, pod, nodeScoreList)
    }
    startTime := time.Now()
    status := pl.ScoreExtensions().NormalizeScore(ctx, state, pod, nodeScoreList)
    f.metricsRecorder.observePluginDurationAsync(scoreExtensionNormalize, pl.Name(), status, metrics.SinceInSeconds(startTime))
    return status
}
  • In NormalizeScore, the specific node-selection algorithm needs to be implemented. The formula used is: final score = maximum score - (current bandwidth / highest bandwidth × 100), which ensures that machines with larger bandwidth usage get lower scores. For example, if the highest bandwidth is 200,000 and the current node's bandwidth is 140,000, the node's score is 100 - (140,000 / 200,000 × 100) = 30:
// If framework.ScoreExtensions is returned, framework.ScoreExtensions must be implemented.
func (n *NetworkTraffic) ScoreExtensions() framework.ScoreExtensions {
    return n
}

// NormalizeScore and ScoreExtensions follow a fixed pattern.
func (n *NetworkTraffic) NormalizeScore(ctx context.Context, state *framework.CycleState, pod *corev1.Pod, scores framework.NodeScoreList) *framework.Status {
    var higherScore int64
    for _, node := range scores {
        if higherScore < node.Score {
            higherScore = node.Score
        }
    }
    // The formula is: full score - (current bandwidth / highest bandwidth * 100).
    // As a result, machines with higher bandwidth usage get lower scores.
    for i, node := range scores {
        scores[i].Score = framework.MaxNodeScore - (node.Score * 100 / higherScore)
        klog.Infof("[NetworkTraffic] Nodes final score: %v", scores)
    }

    klog.Infof("[NetworkTraffic] Nodes final score: %v", scores)
    return nil
}
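  • The arithmetic in NormalizeScore can be checked in isolation. The following standalone sketch (not part of the plugin, just an illustration of the formula above) applies it to the worked example from the text:
package main

import "fmt"

// normalize applies the formula used in NormalizeScore above:
// final = maxNodeScore - (raw * 100 / highest), so heavier traffic means a lower score.
func normalize(raw, highest, maxNodeScore int64) int64 {
    return maxNodeScore - raw*100/highest
}

func main() {
    // Worked example from the text: highest bandwidth 200,000, current node 140,000.
    fmt.Println(normalize(140000, 200000, 100)) // 30
    // A node with no traffic keeps the full score.
    fmt.Println(normalize(0, 200000, 100)) // 100
}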
  • Kubernetes supports up to 5,000 nodes in a cluster. Doesn't looping over all of them to find the maximum score hurt performance? There is actually no need to worry: the scheduler provides a parameter, percentageOfNodesToScore, which limits how many nodes take part in scoring.

⑤ Configure the plug-in name

  • In order for the plugin to be used when registering, it also needs to be configured with a name:
// Name returns name of the plugin. It is used in logs, etc.
func (n *NetworkTraffic) Name() string {
    return Name
}
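  • Putting this together with the entry point from earlier, registering the plugin in main.go might look like the following sketch (the import path github.com/example/customScheduler/pkg/networktraffic is a placeholder, and the exact factory signature expected by WithPlugin depends on the scheduler version):
package main

import (
    "fmt"
    "os"

    scheduler "k8s.io/kubernetes/cmd/kube-scheduler/app"

    // Placeholder import path for the plugin package described in this article.
    "github.com/example/customScheduler/pkg/networktraffic"
)

func main() {
    // Register NetworkTraffic as an out-of-tree plugin under its Name.
    command := scheduler.NewSchedulerCommand(
        scheduler.WithPlugin(networktraffic.Name, networktraffic.New),
    )
    if err := command.Execute(); err != nil {
        fmt.Fprintf(os.Stderr, "%v\n", err)
        os.Exit(1)
    }
}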

⑥ Define the parameters to be passed in

  • The plugin also contains a PrometheusHandle, which is what actually queries the prometheus server for the metrics. First, define the PrometheusHandle structure:
type PrometheusHandle struct {
    deviceName string        // network interface name
    timeRange  time.Duration // time range to scrape
    ip         string        // address of the prometheus server
    client     v1.API        // client used to talk to prometheus
}
  • With the structure in place, the query and the metric are needed. For the metric, node_network_receive_bytes_total is used here to calculate the node's network traffic. Since the environment is deployed outside the cluster, the node host name is not available directly and is joined in via promQL; the whole statement is as follows:
sum_over_time(node_network_receive_bytes_total{device="eth0"}[1s]) * on(instance) group_left(nodename) (node_uname_info{nodename="node01"})
The entire Prometheus part is as follows:

type PrometheusHandle struct {
    deviceName string
    timeRange  time.Duration
    ip         string
    client     v1.API
}

func NewProme(ip, deviceName string, timeRace time.Duration) *PrometheusHandle {
    client, err := api.NewClient(api.Config{Address: ip})
    if err != nil {
        klog.Fatalf("[NetworkTraffic] FatalError creating prometheus client: %s", err.Error())
    }
    return &PrometheusHandle{
        deviceName: deviceName,
        ip:         ip,
        timeRange:  timeRace,
        client:     v1.NewAPI(client),
    }
}

func (p *PrometheusHandle) GetGauge(node string) (*model.Sample, error) {
    value, err := p.query(fmt.Sprintf(nodeMeasureQueryTemplate, node, p.deviceName, p.timeRange))
    fmt.Println(fmt.Sprintf(nodeMeasureQueryTemplate, p.deviceName, p.timeRange, node))
    if err != nil {
        return nil, fmt.Errorf("[NetworkTraffic] Error querying prometheus: %w", err)
    }

    nodeMeasure := value.(model.Vector)
    if len(nodeMeasure) != 1 {
        return nil, fmt.Errorf("[NetworkTraffic] Invalid response, expected 1 value, got %d", len(nodeMeasure))
    }
    return nodeMeasure[0], nil
}

func (p *PrometheusHandle) query(promQL string) (model.Value, error) {
    // Run the promQL query and return the result.
    results, warnings, err := p.client.Query(context.Background(), promQL, time.Now())
    if len(warnings) > 0 {
        klog.Warningf("[NetworkTraffic Plugin] Warnings: %v\n", warnings)
    }

    return results, err
}
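  • The nodeMeasureQueryTemplate constant referenced in GetGauge is not shown in the original post. A plausible reconstruction based on the promQL statement above is the following sketch; note that the placeholder order here matches the fmt.Println call, while the first fmt.Sprintf call uses a different argument order, so the real template in the project may differ:
// Hypothetical reconstruction of the query template used by GetGauge, built from the
// promQL example shown earlier; the placeholder order (device, time range, node) is an assumption.
const nodeMeasureQueryTemplate = `sum_over_time(node_network_receive_bytes_total{device="%s"}[%s]) * on(instance) group_left(nodename) (node_uname_info{nodename="%s"})`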

⑦ Configure the parameters of the scheduler

  • Because the prometheus address, the network interface name, and the time range for the query need to be specified, the whole structure is as follows; in addition, the parameter struct must follow the naming convention of ending in Args:
type NetworkTrafficArgs struct {
    IP         string `json:"ip"`
    DeviceName string `json:"deviceName"`
    TimeRange  int    `json:"timeRange"`
}
  • To make this data a structure that KubeSchedulerConfiguration can parse, one more step is needed: extending the corresponding resource type, as is done when extending the APIServer. Kubernetes provides two ways to extend the resource types of KubeSchedulerConfiguration:
    • One is the framework.DecodeInto function provided in older versions, which can do this directly:
func New(plArgs *runtime.Unknown, handle framework.FrameworkHandle) (framework.Plugin, error) {
    args := Args{}
    if err := framework.DecodeInto(plArgs, &args); err != nil {
        return nil, err
    }
    ...
}
    • Another way is to implement the corresponding deep copy method, such as in NodeLabel:
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// NodeLabelArgs holds arguments used to configure the NodeLabel plugin.
type NodeLabelArgs struct {
    metav1.TypeMeta

    // PresentLabels should be present for the node to be considered a fit for hosting the pod
    PresentLabels []string
    // AbsentLabels should be absent for the node to be considered a fit for hosting the pod
    AbsentLabels []string
    // Nodes that have labels in the list will get a higher score.
    PresentLabelsPreference []string
    // Nodes that don't have labels in the list will get a higher score.
    AbsentLabelsPreference []string
}
  • Finally, register it in the register function; the whole procedure is similar to extending the APIServer:
// addKnownTypes registers known types to the given scheme
func addKnownTypes(scheme *runtime.Scheme) error {
    scheme.AddKnownTypes(SchemeGroupVersion,
        &KubeSchedulerConfiguration{},
        &Policy{},
        &InterPodAffinityArgs{},
        &NodeLabelArgs{},
        &NodeResourcesFitArgs{},
        &PodTopologySpreadArgs{},
        &RequestedToCapacityRatioArgs{},
        &ServiceAffinityArgs{},
        &VolumeBindingArgs{},
        &NodeResourcesLeastAllocatedArgs{},
        &NodeResourcesMostAllocatedArgs{},
    )
    scheme.AddKnownTypes(schema.GroupVersion{Group: "", Version: runtime.APIVersionInternal}, &Policy{})
    return nil
}
  • To generate deep-copy functions and other generated files, you can use the script hack/update-codegen.sh in the kubernetes code base. For convenience, the framework.DecodeInto approach is used here; a sketch of the resulting New function follows below.
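  • As a rough sketch only (not the original project code; the plugin factory signature and the unit of TimeRange are assumptions based on the examples above), the NetworkTraffic New function built on framework.DecodeInto could look like this:
// New builds the NetworkTraffic plugin from the profile's pluginConfig args.
func New(plArgs *runtime.Unknown, h framework.FrameworkHandle) (framework.Plugin, error) {
    args := NetworkTrafficArgs{}
    if err := framework.DecodeInto(plArgs, &args); err != nil {
        return nil, err
    }
    return &NetworkTraffic{
        handle: h,
        // TimeRange is assumed to be given in seconds in the configuration file.
        prometheus: NewProme(args.IP, args.DeviceName, time.Duration(args.TimeRange)*time.Second),
    }, nil
}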

⑧ Project deployment

  • Prepare the profile for the scheduler; you can see that the custom parameters are recognized as part of the KubeSchedulerConfiguration resource type:
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /mnt/d/src/go_work/customScheduler/scheduler.conf
profiles:
- schedulerName: custom-scheduler
  plugins:
    score:
      enabled:
      - name: "NetworkTraffic"
      disabled:
      - name: "*"
  pluginConfig:
    - name: "NetworkTraffic"
      args:
        ip: "http://10.0.0.4:9090"
        deviceName: "eth0"
        timeRange: 60
  • If it needs to be deployed inside the cluster, it can be packaged into an image:
FROM golang:alpine AS builder
MAINTAINER cylon
WORKDIR /scheduler
COPY ./ /scheduler
ENV GOPROXY https://goproxy.cn,direct
RUN \
    sed -i 's/dl-cdn.alpinelinux.org/mirrors.ustc.edu.cn/g' /etc/apk/repositories && \
    apk add upx  && \
    GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -ldflags "-s -w" -o scheduler main.go && \
    upx -1 scheduler && \
    chmod +x scheduler

FROM alpine AS runner
WORKDIR /go/scheduler
COPY --from=builder /scheduler/scheduler .
COPY --from=builder /scheduler/scheduler.yaml /etc/
VOLUME ["./scheduler"]
  • List of resources required for deployment inside the cluster:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: scheduler-sa
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: scheduler
subjects:
  - kind: ServiceAccount
    name: scheduler-sa
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:kube-scheduler
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-scheduler
  namespace: kube-system
  labels:
    component: custom-scheduler
spec:
  selector:
    matchLabels:
      component: custom-scheduler
  template:
    metadata:
      labels:
        component: custom-scheduler
    spec:
      serviceAccountName: scheduler-sa
      priorityClassName: system-cluster-critical
      containers:
        - name: scheduler
          image: cylonchau/custom-scheduler:v0.0.1
          imagePullPolicy: IfNotPresent
          command:
            - ./scheduler
            - --config=/etc/scheduler.yaml
            - --v=3
          livenessProbe:
            httpGet:
              path: /healthz
              port: 10251
            initialDelaySeconds: 15
          readinessProbe:
            httpGet:
              path: /healthz
              port: 10251
  • Start the custom scheduler; here it is started simply as a binary, so it needs a kubeconfig as the authentication file:
$ ./main --logtostderr=true \
 --address=127.0.0.1 \
 --v=3 \
 --config=`pwd`/scheduler.yaml \
 --kubeconfig=`pwd`/scheduler.conf
  • After startup, the original kube-scheduler service is stopped for convenience of verification, because the original kube-scheduler holds the leader role in the HA setup; otherwise the custom scheduler would not become active and the Pods would stay pending.

⑨ Verification result

  • Prepare a Pod that needs to be deployed, specifying the name of the scheduler to use:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2 
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
      schedulerName: custom-scheduler
  • The experimental environment here is a kubernetes cluster with two nodes, master and node01. Because the master runs more services than node01, in this situation the scheduling result should always be node01, no matter what:
$ kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE   IP             NODE     NOMINATED NODE   READINESS GATES
nginx-deployment-69f76b454c-lpwbl   1/1     Running   0          43s   192.168.0.17   node01   <none>           <none>
nginx-deployment-69f76b454c-vsb7k   1/1     Running   0          43s   192.168.0.16   node01   <none>           <none>
  • The log of the scheduler is as follows:
I0808 01:56:31.098189   27131 networktraffic.go:83] [NetworkTraffic] node 'node01' bandwidth: %!s(int64=12541068340)
I0808 01:56:31.098461   27131 networktraffic.go:70] [NetworkTraffic] Nodes final score: [{master-machine 0} {node01 12541068340}]
I0808 01:56:31.098651   27131 networktraffic.go:70] [NetworkTraffic] Nodes final score: [{master-machine 0} {node01 71}]
I0808 01:56:31.098911   27131 networktraffic.go:73] [NetworkTraffic] Nodes final score: [{master-machine 0} {node01 71}]
I0808 01:56:31.099275   27131 default_binder.go:51] Attempting to bind default/nginx-deployment-69f76b454c-vsb7k to node01
I0808 01:56:31.101414   27131 eventhandlers.go:225] add event for scheduled pod default/nginx-deployment-69f76b454c-lpwbl
I0808 01:56:31.101414   27131 eventhandlers.go:205] delete event for unscheduled pod default/nginx-deployment-69f76b454c-lpwbl
I0808 01:56:31.103604   27131 scheduler.go:609] "Successfully bound pod to node" pod="default/nginx-deployment-69f76b454c-lpwbl" node="node01" evaluatedNodes=2 feasibleNodes=2
I0808 01:56:31.104540   27131 scheduler.go:609] "Successfully bound pod to node" pod="default/nginx-deployment-69f76b454c-vsb7k" node="node01" evaluatedNodes=2 feasibleNodes=2


Origin blog.csdn.net/Forever_wj/article/details/131287697