Installation and use of NVIDIA Container Runtime on Ubuntu

NVIDIA Container Runtime official website
GitHub repository: Docker is the container technology most widely used by developers. With the NVIDIA Container Runtime, developers only need to register a new runtime when creating a container to expose NVIDIA GPUs to the applications inside it. NVIDIA Container Runtime for Docker is an open source project hosted on GitHub.

Introduction

NVIDIA Container Runtime is a GPU-aware container runtime that is compatible with the Open Containers Initiative (OCI) specification used by Docker, CRI-O, and other popular container technologies. It simplifies the process of building containerized GPU-accelerated applications and deploying them to the desktop, cloud, or data center.

With container technologies such as Docker that support the NVIDIA Container Runtime, developers can package their GPU-accelerated applications together with their dependencies into a single package that is guaranteed to deliver optimal performance on NVIDIA GPUs, regardless of the deployment environment.

Install

This article follows the official NVIDIA Container Toolkit installation documentation to install it on Ubuntu 22.04.

Environment requirements

  • NVIDIA Linux driver is installed and version >= 418.81.07
  • Kernel version > 3.10 GNU/Linux x86_64
  • Docker >= 19.03
  • NVIDIA GPU with architecture >= Kepler (or Compute Capability 3.0)
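Before starting, you can optionally verify these prerequisites on the host; a minimal check sketch:

# driver version reported by the NVIDIA driver
nvidia-smi --query-gpu=driver_version --format=csv,noheader
# kernel version
uname -r
# Docker version
docker --version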

Start installation

  1. Set up package repository and GPG key
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
  2. Update the package index and install nvidia-docker2
sudo apt-get update

The apt-get update step may report an error:

sudo apt-get update
E: Conflicting values set for option Signed-By regarding source https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64/ /: /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg !=
E: The list of sources could not be read.

For the solution, see the official troubleshooting document "Conflicting values set for option Signed-By error when running apt update".
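A common cause is a stale repository list left over from an earlier nvidia-docker installation. A minimal sketch of the fix, assuming the conflict really does come from such a leftover file (inspect before deleting; the file name below is only an example):

# list every apt source file that references the NVIDIA repository
grep -l "nvidia.github.io" /etc/apt/sources.list.d/*.list
# remove the stale file(s), keeping only nvidia-container-toolkit.list, e.g.:
sudo rm /etc/apt/sources.list.d/nvidia-docker.list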

sudo apt-get install -y nvidia-docker2
sudo nvidia-ctk runtime configure --runtime=docker
  3. Restart the Docker daemon and test
sudo systemctl restart docker
sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi

If nvidia-smi prints its usual table of GPUs inside the container, the installation was successful.
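For reference, the nvidia-ctk command above registers the NVIDIA runtime in /etc/docker/daemon.json; on a default setup the file should end up looking roughly like this:

cat /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}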

Usage example

Refer to the official User Guide document.

Add NVIDIA Runtime

Since nvidia-docker2 was installed above, there is no need to add the NVIDIA runtime separately.

Set environment variables

Users can control the behavior of the NVIDIA container runtime through environment variables, specifically which GPUs are enumerated and which driver capabilities are enabled.
These environment variables are already set in the CUDA base images provided by NVIDIA.
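You can confirm this by printing the environment of a CUDA base image (the exact values may differ between image versions):

docker run --rm nvidia/cuda:11.0.3-base-ubuntu20.04 env | grep NVIDIA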

GPU enumeration

Use the --gpus option or the NVIDIA_VISIBLE_DEVICES environment variable to control which GPUs a container can use.

The possible values of NVIDIA_VISIBLE_DEVICES are as follows:

  • 0,1,2, or GPU-fef8089b
    a comma-separated list of GPU UUID(s) or index(es).
  • all
    all GPUs will be accessible; this is the default value in base CUDA container images.
  • none
    no GPU will be accessible, but driver capabilities will be enabled.
  • void, empty, or unset
    nvidia-container-runtime will have the same behavior as runc (i.e. neither GPUs nor capabilities are exposed).

When using --gpus to select specific GPUs, the device parameter should also be used, as in the following example:

docker run --gpus '"device=1,2"' \
    nvidia/cuda nvidia-smi --query-gpu=uuid --format=csv

Enable all GPUs

docker run --rm --gpus all nvidia/cuda nvidia-smi

Use NVIDIA_VISIBLE_DEVICES to enable all GPUs

docker run --rm --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=all nvidia/cuda nvidia-smi

Use NVIDIA_VISIBLE_DEVICES to enable specified GPUs

docker run --rm --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=1,2 \
    nvidia/cuda nvidia-smi --query-gpu=uuid --format=csv

Start a GPU-enabled container on two GPUs

docker run --rm --gpus 2 nvidia/cuda nvidia-smi

Use nvidia-smi to query the GPU UUID and assign it to the container

nvidia-smi -i 3 --query-gpu=uuid --format=csv

uuid
GPU-18a3e86f-4c0e-cd9f-59c3-55488c4b0c24

docker run --gpus device=GPU-18a3e86f-4c0e-cd9f-59c3-55488c4b0c24 \
     nvidia/cuda nvidia-smi

Driver capabilities

NVIDIA_DRIVER_CAPABILITIES controls which driver libraries/binaries are mounted into the container.
The possible values of NVIDIA_DRIVER_CAPABILITIES are as follows:

  • compute,video or graphics,utility
    a comma-separated list of driver features the container needs.
  • all
    enable all available driver capabilities.
  • empty or unset
    use the default driver capabilities: utility, compute.

The supported driver capabilities are as follows:

  • compute
    required for CUDA and OpenCL applications.
  • compat32
    required for running 32-bit applications.
  • graphics
    required for running OpenGL and Vulkan applications.
  • utility
    required for using nvidia-smi and NVML.
  • video
    required for using the Video Codec SDK.
  • display
    required for leveraging X11 display.

For example, to request the compute and utility capabilities, there are two ways to write it:

docker run --rm --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=2,3 \
    -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
    nvidia/cuda nvidia-smi
docker run --rm --gpus 'all,"capabilities=compute,utility"' \
    nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
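As an illustration of the graphics and display capabilities, the sketch below forwards the host's X11 socket into the container; the image name is a placeholder and must provide an OpenGL client such as glxinfo:

# hypothetical example: replace <opengl-image> with an image that ships glxinfo (e.g. via mesa-utils)
docker run --rm --gpus 'all,"capabilities=graphics,utility,display"' \
    -e DISPLAY=$DISPLAY \
    -v /tmp/.X11-unix:/tmp/.X11-unix:ro \
    <opengl-image> glxinfo -B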

Constraints

The NVIDIA runtime also provides the ability to define constraints on the configurations supported by the container, using environment variables.

NVIDIA_REQUIRE_* is a logical expression used to define constraints on the software versions or GPU architecture required by the container. The supported constraints are as follows:

  • cuda
    constraint on the CUDA driver version.
  • driver
    constraint on the driver version.
  • arch
    constraint on the compute architectures of the selected GPUs.
  • brand
    constraint on the brand of the selected GPUs (e.g. GeForce, Tesla, GRID).

Multiple constraints can be expressed in a single environment variable: space-separated constraints are ORed and comma-separated constraints are ANDed. For example:
NVIDIA_REQUIRE_CUDA "cuda>=11.0 driver>=450"
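A minimal sketch of passing such a constraint on the command line; if the host does not satisfy the expression, the container should fail to start with a requirement error:

docker run --rm --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -e "NVIDIA_REQUIRE_CUDA=cuda>=11.0 driver>=450" \
    nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi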

See the official documentation for more information.

Dockerfile

These can be set as environment variables in the Dockerfile, for example:

ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
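A minimal sketch of a complete Dockerfile built on a plain Ubuntu base image; the NVIDIA runtime injects the driver utilities at run time, so nvidia-smi works even though the image itself ships no NVIDIA software:

# plain base image, no CUDA toolkit installed
FROM ubuntu:20.04
# ask the NVIDIA runtime to expose all GPUs and mount the compute/utility driver files
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
CMD ["nvidia-smi"]

Build it with docker build and run it with --runtime=nvidia (or --gpus all); the image tag is arbitrary.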

Docker Compose

Refer to the tutorial in the official Docker documentation.

Compose file format v2.3 syntax:
services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    runtime: nvidia

Written this way, the Compose file itself cannot control specific properties of the GPU.
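One way to regain some control with this older syntax is to combine runtime: nvidia with the environment variables described earlier; a sketch:

services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=0
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility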

More granular control
  • capabilities
    values are specified as a list of strings (e.g. capabilities: [gpu]). This field must be set in the Compose file; otherwise an error is returned when the service is deployed.
  • count
    specified as an integer or the value all, indicating the number of GPU devices that should be reserved (provided the host has that many GPUs).
  • device_ids
    specified as a list of strings representing GPU device IDs from the host. The device IDs can be found in the output of nvidia-smi on the host machine.
  • driver
    specified as a string value (e.g. driver: 'nvidia').
  • options
    key-value pairs of driver-specific options.

count and device_ids are mutually exclusive. You can only define one field at a time.

For more information about these properties, see the deploy section of the Compose Specification.

For example, to use all GPUs on the host with specific driver capabilities (note: although NVIDIA_DRIVER_CAPABILITIES accepts the value all, writing all for capabilities here causes an error; each capability must be listed explicitly):

services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [compute,graphics,video,utility,display]
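For comparison, a sketch that reserves specific GPUs by ID with device_ids instead of count (assuming the host exposes devices 0 and 3):

services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0', '3']
              capabilities: [gpu]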

For more setting examples, see the official documentation


Origin blog.csdn.net/qq_35395195/article/details/131431872