Source code analysis of Kubernetes' Device Plugin mechanism

Introduction: The Device Plugin mechanism, introduced in Kubernetes 1.8, allows devices such as GPUs, FPGAs, high-performance NICs, and InfiniBand adapters to be integrated through extension. Device Manager is the Kubelet module responsible for interacting with Device Plugins and managing the device lifecycle. With its basic design understood, this article reads through the Device Manager source code to see how it operates.

The basic principle

First, clarify the goal:

The goal is not to understand the entire Kubelet implementation, but to understand how Device Manager handles resource discovery, Pod creation, and device health checks, and how it interacts with Kubelet. Operations unrelated to Device Manager are therefore ignored.

Here are my principles and some experience when reading the code:

  • Understand the interface and figure out how it interacts with external modules
  • Understand the structure that implements the interface
  • Connect method calls and data structures from the perspective of user scenarios, much like connecting plot and characters. Once you understand the character setup, you can quickly follow the code's call flow, and reading the calls in turn deepens your understanding of the data structure design
  • The Kubernetes code is fairly complicated, and it is hard to figure out the purpose of every data structure at a glance. When that happens, write down your questions and assumptions and don't get stuck on them; you can verify them later. As with a book, the meaning of code becomes clearer on rereading, and once you are familiar with the context some questions resolve themselves
  • Since Device Manager works inside Kubelet, understanding Kubelet's source code is the basis for understanding how specific modules operate

PS: The Kubernetes source code version analyzed in this article is 1.9.3.

The core DeviceManager code is under pkg/kubelet/cm/deviceplugin.

devicePlugin.png

DeviceManager interface definition

File

pkg/kubelet/cm/deviceplugin/types.go

Specific definition:

// Manager manages all the Device Plugins running on a node.
type Manager interface {
    // Start starts device plugin registration service.
    Start(activePods ActivePodsFunc, sourcesReady config.SourcesReady) error

    // Devices is the map of devices that have registered themselves
    // against the manager.
    // The map key is the ResourceName of the device plugins.
    Devices() map[string][]pluginapi.Device

    // Allocate configures and assigns devices to pods. The pods are provided
    // through the pod admission attributes in the attrs argument. From the
    // requested device resources, Allocate will communicate with the owning
    // device plugin to allow setup procedures to take place, and for the
    // device plugin to provide runtime settings to use the device (environment
    // variables, mount points and device files). The node object is provided
    // for the device manager to update the node capacity to reflect the
    // currently available devices.
    Allocate(node *schedulercache.NodeInfo, attrs *lifecycle.PodAdmitAttributes) error

    // Stop stops the manager.
    Stop() error

    // GetDeviceRunContainerOptions checks whether we have cached containerDevices
    // for the passed-in <pod, container> and returns its DeviceRunContainerOptions
    // for the found one. An empty struct is returned in case no cached state is found.
    GetDeviceRunContainerOptions(pod *v1.Pod, container *v1.Container) *DeviceRunContainerOptions

    // GetCapacity returns the amount of available device plugin resource capacity
    // and inactive device plugin resources previously registered on the node.
    GetCapacity() (v1.ResourceList, []string)
}

From the comments, you can see that DeviceManager is responsible for managing all Device Plugins running on the node. The interface exposes 6 methods for interacting with the outside world:

  • Start() and Stop() start the Device Plugin registration service and stop the manager, respectively. This is a common pattern in Kubernetes
  • Devices() returns the registered devices as a map keyed by resource name

The following 3 methods do the core work:

  • Allocate() allocates available devices to a Pod and calls the Device Plugin to perform any required device initialization
  • GetDeviceRunContainerOptions() returns the parameters needed to configure devices for a container, such as environment variables, mounts, and device files. It is used while the container is being created
  • GetCapacity() is used by the node to report the amount of Extended Resources to the API Server

To understand these methods more clearly, you also need to follow the call chain in specific scenarios. There are two implementations of the DeviceManager interface: ManagerImpl and ManagerStub. ManagerStub is an empty implementation and does not need a close look, so let's briefly go through ManagerImpl.
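
To make the relationship between one interface and its two implementations concrete, here is a minimal sketch. The manager interface, realManager, and stubManager are hypothetical, heavily simplified stand-ins and not the Kubelet types; the resource name example.com/gpu is also made up. The sketch only illustrates how a caller can hold an interface value without caring which implementation is behind it.

```go
package main

import "fmt"

// manager is a hypothetical, trimmed-down stand-in for the Manager interface.
type manager interface {
	Start() error
	Devices() map[string][]string
}

// realManager plays the role of ManagerImpl: it tracks registered devices.
type realManager struct {
	devices map[string][]string
}

func (m *realManager) Start() error { return nil }

func (m *realManager) Devices() map[string][]string { return m.devices }

// stubManager plays the role of ManagerStub: every method is a no-op.
type stubManager struct{}

func (s *stubManager) Start() error { return nil }

func (s *stubManager) Devices() map[string][]string { return map[string][]string{} }

// newManager picks an implementation; callers only ever see the interface.
func newManager(enabled bool) manager {
	if enabled {
		return &realManager{devices: map[string][]string{
			"example.com/gpu": {"gpu-0", "gpu-1"},
		}}
	}
	return &stubManager{}
}

func main() {
	for _, enabled := range []bool{true, false} {
		m := newManager(enabled)
		if err := m.Start(); err != nil {
			fmt.Println("start failed:", err)
			continue
		}
		fmt.Printf("enabled=%t devices=%d\n", enabled, len(m.Devices()["example.com/gpu"]))
	}
}
```

Because the stub satisfies the same interface, the calling code needs no special case for a disabled feature.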

DeviceManager interface implementation

File

pkg/kubelet/cm/deviceplugin/manager.go

Specific definition:

// ManagerImpl is the structure in charge of managing Device Plugins.
type ManagerImpl struct {
    socketname string
    socketdir  string

    endpoints map[string]endpoint // Key is ResourceName
    mutex     sync.Mutex

    server *grpc.Server

    // activePods is a method for listing active pods on the node
    // so the amount of pluginResources requested by existing pods
    // could be counted when updating allocated devices
    activePods ActivePodsFunc

    // sourcesReady provides the readiness of kubelet configuration sources such as apiserver update readiness.
    // We use it to determine when we can purge inactive pods from checkpointed state.
    sourcesReady config.SourcesReady

    // callback is used for updating devices' states in one time call.
    // e.g. a new device is advertised, two old devices are deleted and a running device fails.
    callback monitorCallback

    // allDevices contains all of registered resourceNames and their exported device IDs.
    allDevices map[string]sets.String

    // allocatedDevices contains allocated deviceIds, keyed by resourceName.
    allocatedDevices map[string]sets.String

    // podDevices contains pod to allocated device mapping.
    podDevices podDevices
}

From the definition and comments of ManagerImpl, you can roughly infer that it does three things:

  • Provides a gRPC service that supports registration of multiple Device Plugins
  • Provides a callback function of type monitorCallback for Device Plugins. When a device's state changes, Device Manager is notified and can react accordingly. For example, when a device stops working normally, the number of that resource available on the node must be reduced by one
  • Allocates and manages devices: it records all devices of a given resource and which of them have been allocated. This means a Device Plugin needs to provide an ID (such as a UUID) for each device that is unique and stable on the node. Device Manager maintains these sets of IDs and is responsible for updating and allocating devices
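
The bookkeeping in the last point can be sketched with two sets per resource: all known device IDs and the IDs already allocated. The code below is an illustrative assumption, not the Kubelet implementation: idSet is a minimal stand-in for sets.String, and allocate, the device IDs, and the resource name are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// idSet is a minimal string set, standing in for sets.String.
type idSet map[string]struct{}

func newIDSet(ids ...string) idSet {
	s := make(idSet)
	for _, id := range ids {
		s[id] = struct{}{}
	}
	return s
}

// difference returns the IDs in s that are not in other, sorted for stable output.
func (s idSet) difference(other idSet) []string {
	var out []string
	for id := range s {
		if _, taken := other[id]; !taken {
			out = append(out, id)
		}
	}
	sort.Strings(out)
	return out
}

// allocate picks n free device IDs for a resource and records them as allocated.
func allocate(all, allocated map[string]idSet, resource string, n int) ([]string, error) {
	if allocated[resource] == nil {
		allocated[resource] = newIDSet()
	}
	free := all[resource].difference(allocated[resource])
	if len(free) < n {
		return nil, fmt.Errorf("requested %d %s, only %d free", n, resource, len(free))
	}
	picked := free[:n]
	for _, id := range picked {
		allocated[resource][id] = struct{}{}
	}
	return picked, nil
}

func main() {
	all := map[string]idSet{"example.com/gpu": newIDSet("gpu-0", "gpu-1", "gpu-2")}
	allocated := map[string]idSet{}

	// The second request fails because only one device is still free.
	for i := 0; i < 2; i++ {
		ids, err := allocate(all, allocated, "example.com/gpu", 2)
		if err != nil {
			fmt.Println("allocation failed:", err)
			continue
		}
		fmt.Println("allocated:", ids)
	}
}
```

Keeping the IDs stable matters here: if a plugin renamed a device between reports, the allocated set would no longer match the set of all devices.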

Scenario classification

Five scenarios are mainly involved here:

  • Initialization and startup of Device Manager
  • Receive the endpoint registration of the Device Plugin, and query the endpoint for the Device ID list
  • Report device information on the node regularly
  • When creating a Pod, combine the device information with the Pod to generate the configuration (Environment, Device, Volume) needed to create the container
  • When the device status is unhealthy, notify Kubelet to update the status of available devices

This article first analyzes scenario one: the initialization and startup process of Device Manager

Device Manager initialization and startup process

Kubernetes has a huge amount of code, but on closer inspection the startup process of each module follows a fairly similar routine. Take Kubelet as an example:

  1. Create a KubeletServer configuration object that holds all the configuration needed to run the kubelet
  2. Parse the command line and update KubeletServer according to the command-line parameters
  3. Create the real kubelet runtime object based on the KubeletServer configuration
  4. Start the kubelet runtime object through its Start() method

The initialization of DeviceManager happens in steps 3 and 4.
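
The four-step routine above can be sketched in a few lines. Everything in this example is hypothetical: serverConfig stands in for the KubeletServer configuration object, runtimeObject for the kubelet runtime object, and the -device-plugins flag is invented for illustration and is not a real kubelet flag.

```go
package main

import (
	"flag"
	"fmt"
)

// serverConfig is a hypothetical stand-in for the KubeletServer configuration object.
type serverConfig struct {
	devicePluginsEnabled bool
}

// runtimeObject is a hypothetical stand-in for the kubelet runtime object.
type runtimeObject struct {
	cfg     serverConfig
	started bool
}

func (r *runtimeObject) Start() { r.started = true }

// build follows the four steps: default config, flag parsing, construction, Start.
func build(args []string) (*runtimeObject, error) {
	cfg := serverConfig{} // step 1: configuration object with defaults

	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	fs.BoolVar(&cfg.devicePluginsEnabled, "device-plugins", false, "illustrative flag only")
	if err := fs.Parse(args); err != nil { // step 2: apply the command line
		return nil, err
	}

	rt := &runtimeObject{cfg: cfg} // step 3: create the runtime object from the config
	rt.Start()                     // step 4: start it
	return rt, nil
}

func main() {
	rt, err := build([]string{"-device-plugins=true"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("started=%t devicePlugins=%t\n", rt.started, rt.cfg.devicePluginsEnabled)
}
```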

deviceManagerInit.png

  • app.kubelet corresponds to cmd/kubelet/kubelet.go
  • server corresponds to cmd/kubelet/app/server.go
  • kubelet corresponds to pkg/kubelet/kubelet.go
  • container_manager_linux corresponds to pkg/kubelet/cm/container_manager_linux.go
  • device.manager corresponds to pkg/kubelet/cm/deviceplugin/manager.go

The sequence diagram above shows how Kubelet initializes and starts DeviceManager (for ease of understanding, methods unrelated to DeviceManager are omitted).

As you can see, the run() method in server does two things: NewMainKubelet and startKubelet. The initialization and startup of Device Manager are completed in these two steps as well. The gRPC registration service is started at the same time, after which Device Plugins can register.

  1. DeviceManager is initialized when the ContainerManager object is created, and the ContainerManager object is then passed as a parameter to NewMainKubelet to create the Kubelet runtime object.

The actual definition is in pkg/kubelet/cm/container_manager_linux.go:

func NewContainerManager(mountUtil mount.Interface, cadvisorInterface cadvisor.Interface, nodeConfig NodeConfig, failSwapOn bool, devicePluginEnabled bool, recorder record.EventRecorder) (ContainerManager, error) {
    ...

    glog.Infof("Creating device plugin manager: %t", devicePluginEnabled)
    if devicePluginEnabled {
        cm.devicePluginManager, err = deviceplugin.NewManagerImpl()
    } else {
        cm.devicePluginManager, err = deviceplugin.NewManagerStub()
    }

    ...
}

Because this feature is still relatively new, it is off by default and must be turned on through a feature gate by configuring --feature-gates=DevicePlugins=true. When the gate is on, deviceplugin.NewManagerImpl() is called; otherwise the stub implementation is used and nothing is done.
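
As a rough illustration of how a value such as DevicePlugins=true can drive this choice, here is a simplified sketch. parseFeatureGates is a hypothetical helper written for this article and is not the parser Kubernetes uses; it only shows the idea of a comma-separated list of name=bool pairs where an absent gate counts as disabled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFeatureGates is a hypothetical, simplified parser for a value such as
// "DevicePlugins=true,Foo=false". It is not the Kubelet's own parser.
func parseFeatureGates(value string) (map[string]bool, error) {
	gates := make(map[string]bool)
	for _, pair := range strings.Split(value, ",") {
		pair = strings.TrimSpace(pair)
		if pair == "" {
			continue
		}
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("missing bool value for %q", pair)
		}
		enabled, err := strconv.ParseBool(strings.TrimSpace(kv[1]))
		if err != nil {
			return nil, fmt.Errorf("invalid value for %q: %v", kv[0], err)
		}
		gates[strings.TrimSpace(kv[0])] = enabled
	}
	return gates, nil
}

func main() {
	gates, err := parseFeatureGates("DevicePlugins=true")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// A gate that is absent from the map reads as false, which matches
	// the "off by default" behaviour described above.
	if gates["DevicePlugins"] {
		fmt.Println("would create the real device plugin manager")
	} else {
		fmt.Println("would create the stub manager")
	}
}
```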

deviceplugin.NewManagerImpl() is defined in pkg/kubelet/cm/deviceplugin/manager.go:

// NewManagerImpl creates a new manager.
func NewManagerImpl() (*ManagerImpl, error) {
    return newManagerImpl(pluginapi.KubeletSocket)
}

The actual initialization work is done in the following function:

func newManagerImpl(socketPath string) (*ManagerImpl, error) {
    glog.V(2).Infof("Creating Device Plugin manager at %s", socketPath)

    if socketPath == "" || !filepath.IsAbs(socketPath) {
        return nil, fmt.Errorf(errBadSocket+" %v", socketPath)
    }

    dir, file := filepath.Split(socketPath)
    manager := &ManagerImpl{
        endpoints:        make(map[string]endpoint),
        socketname:       file,
        socketdir:        dir,
        allDevices:       make(map[string]sets.String),
        allocatedDevices: make(map[string]sets.String),
        podDevices:       make(podDevices),
    }
    manager.callback = manager.genericDeviceUpdateCallback

    // The following structs are populated with real implementations in manager.Start()
    // Before that, initializes them to perform no-op operations.
    manager.activePods = func() []*v1.Pod { return []*v1.Pod{} }
    manager.sourcesReady = &sourcesReadyStub{}

    return manager, nil
}

This only initializes ManagerImpl, and only two of its tasks are meaningful:

  • Set socketPath, the file on which DeviceManager's built-in gRPC service listens. Because DeviceManager and the Device Plugins are deployed on the same node, communication over a Unix socket is sufficient
  • Set the device status callback function to genericDeviceUpdateCallback

As the comment "The following structs are populated with real implementations in manager.Start()" indicates, activePods and sourcesReady have no real implementation during the initialization phase; they are only set to no-op placeholders.
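
The socket path handling in newManagerImpl can be tried in isolation. The sketch below assumes a Unix-like system, and the concrete path is an assumption about the default value of pluginapi.KubeletSocket; splitSocketPath is a hypothetical helper that mirrors only the validation and the filepath.Split call.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// splitSocketPath mirrors the check in newManagerImpl: the path must be
// non-empty and absolute, and is then split into directory and file name.
func splitSocketPath(socketPath string) (dir, file string, err error) {
	if socketPath == "" || !filepath.IsAbs(socketPath) {
		return "", "", fmt.Errorf("bad socket path %q: must be an absolute path", socketPath)
	}
	dir, file = filepath.Split(socketPath)
	return dir, file, nil
}

func main() {
	// Assumed default location of the kubelet registration socket.
	dir, file, err := splitSocketPath("/var/lib/kubelet/device-plugins/kubelet.sock")
	fmt.Printf("dir=%q file=%q err=%v\n", dir, file, err)

	// A relative path is rejected before any splitting happens.
	_, _, err = splitSocketPath("kubelet.sock")
	fmt.Println("relative path rejected:", err != nil)
}
```

Note that filepath.Split keeps the trailing separator on the directory part, which is why the two parts can later be recombined with filepath.Join.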

  2. DeviceManager's Start() is called from initializeModules when the Kubelet runtime starts; more precisely, it runs as part of ContainerManager's startup.

func (cm *containerManagerImpl) Start(node *v1.Node,
    activePods ActivePodsFunc,
    sourcesReady config.SourcesReady,
    podStatusProvider status.PodStatusProvider,
    runtimeService internalapi.RuntimeService) error {

    ...

    // Starts device plugin manager.
    if err := cm.devicePluginManager.Start(deviceplugin.ActivePodsFunc(activePods), sourcesReady); err != nil {
        return err
    }
    return nil
}

Here, the active Pod list and the sources of Pod metadata (FILE, URL, api-server) are passed in as inputs when starting DeviceManager. Neither parameter is actually used during startup; they are only stored for later use.
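
The pattern of installing a no-op function at construction time and swapping in the real one at start can be sketched as follows. The types here are hypothetical simplifications: pod stands in for *v1.Pod, activePodsFunc only mirrors the shape of ActivePodsFunc, and activePodCount is an invented example of a later call site.

```go
package main

import "fmt"

// pod is a hypothetical stand-in for *v1.Pod.
type pod struct{ name string }

// activePodsFunc mirrors the shape of ActivePodsFunc: a function that lists pods.
type activePodsFunc func() []pod

type manager struct {
	activePods activePodsFunc
}

// newManager installs a no-op lister so the field is always safe to call.
func newManager() *manager {
	return &manager{activePods: func() []pod { return []pod{} }}
}

// start only stores the real lister; it does not call it.
func (m *manager) start(activePods activePodsFunc) {
	m.activePods = activePods
}

// activePodCount is the kind of later call site that finally uses the lister.
func (m *manager) activePodCount() int {
	return len(m.activePods())
}

func main() {
	m := newManager()
	fmt.Println("before start:", m.activePodCount())

	m.start(func() []pod { return []pod{{name: "pod-a"}, {name: "pod-b"}} })
	fmt.Println("after start:", m.activePodCount())
}
```

The benefit of the no-op default is that code running before start never has to check the field for nil.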

func (m *ManagerImpl) Start(activePods ActivePodsFunc, sourcesReady config.SourcesReady) error {
    glog.V(2).Infof("Starting Device Plugin manager")

    m.activePods = activePods
    m.sourcesReady = sourcesReady

    // Loads in allocatedDevices information from disk.
    err := m.readCheckpoint()
    if err != nil {
        glog.Warningf("Continue after failing to read checkpoint file. Device allocation info may NOT be up-to-date. Err: %v", err)
    }

    socketPath := filepath.Join(m.socketdir, m.socketname)
    os.MkdirAll(m.socketdir, 0755)

    // Removes all stale sockets in m.socketdir. Device plugins can monitor
    // this and use it as a signal to re-register with the new Kubelet.
    if err := m.removeContents(m.socketdir); err != nil {
        glog.Errorf("Fail to clean up stale contents under %s: %+v", m.socketdir, err)
    }

    s, err := net.Listen("unix", socketPath)
    if err != nil {
        glog.Errorf(errListenSocket+" %+v", err)
        return err
    }

    m.server = grpc.NewServer([]grpc.ServerOption{}...)

    pluginapi.RegisterRegistrationServer(m.server, m)
    go m.server.Serve(s)

    glog.V(2).Infof("Serving device plugin registration server on %q", socketPath)

    return nil
}

Start mainly does two things:

  • m.readCheckpoint() loads the registered and allocated device information from the local checkpoint (/var/lib/kubelet/device-plugins/kubelet_internal_checkpoint). Why is this needed? Kubelet is responsible for allocating and managing devices, and this information otherwise exists only in Kubelet's memory. If Kubelet restarted without a checkpoint, it would no longer know which devices had been allocated or which Pods they were associated with

After each device allocation to a Pod, DeviceManager records the mapping between Pods and devices to a local file in JSON format.

  • go m.server.Serve(s) starts the gRPC service in a background goroutine so that Device Plugins can register. How the gRPC service interacts with Device Plugins will be covered later.

Summary

Reading open source code helps us improve technically: it lets us dig into the underlying principles of a technology and quickly understand its architecture, and it also exposes us to good code style and design patterns. This article only covers the Device Manager initialization scenario. Other scenarios will be studied in follow-up articles to deepen the understanding of the Kubernetes Device Plugin mechanism.

Origin blog.csdn.net/zhangge3663/article/details/108290030