CUDA production environment configuration notes under Debian

Recently I picked up a Tesla P4. Since it is a half-height card, I simply stuffed it into my NAS and tried to migrate a model that originally ran on the CPU via ONNX over to the GPU. I ran into a few problems along the way, which I record here.

Address of this article: blog.lucien.ink/archives/534

0. Preface

Since this is for a production environment, all of my services (including the model being migrated this time) run in containers, so I plan to use NVIDIA Docker 2 for this deployment as well.

My OS is Debian 11 x64, running on an i3-12100.

I assume that readers of this article already have Docker installed. If not, you can refer to the Docker Getting Started Notes.

1. Environment deployment

1.1 Download driver

Go to Official Drivers | NVIDIA to download the Tesla P4 driver. Remember not to select "Any" for the CUDA Toolkit option, otherwise it will give you a very old driver, which will get in the way of installing NVIDIA Docker (it requires CUDA >= 11.6).

Since the workload actually runs inside Docker, the host's CUDA version should be as new as possible while remaining stable. Here I chose CUDA 12.0.

1.2 System preparation

1.2.1 Disable Nouveau

This step is necessary because Nouveau is the open-source driver for NVIDIA GPUs and conflicts with the proprietary driver, see nouveau - Wikipedia.

  1. Create the file:
touch /etc/modprobe.d/blacklist-nouveau.conf
  2. Write the following into it:
blacklist nouveau
options nouveau modeset=0
  3. Regenerate the kernel initramfs:
update-initramfs -u
  4. Reboot (afterwards, see the check below this list):
reboot
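
A quick sanity check after the reboot (my own addition, not part of the original notes): if Nouveau has been blacklisted correctly, the following prints nothing.

# should produce no output once Nouveau is disabled
lsmod | grep nouveau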

1.2.2 Install the build environment

Reference: Nvidia unable to find kernel source tree

apt install linux-headers-`uname -r` build-essential
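
If the installer still complains about a missing kernel source tree, it usually means the installed headers do not match the running kernel. A quick check, assuming a standard Debian setup (my own suggestion, not from the original notes):

# confirms that headers for the currently running kernel are installed
dpkg -s linux-headers-$(uname -r) | grep Status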

1.3 Install the driver

Just chmod +x the driver installer and run it directly, so I won't go into details.
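
For reference, a minimal sketch of that step; the .run filename below is a placeholder for whatever version you actually downloaded:

# make the installer executable, then run it as root
chmod +x ./NVIDIA-Linux-x86_64-xxx.run
./NVIDIA-Linux-x86_64-xxx.run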

1.4 Install NVIDIA Docker

According to my observation, NVIDIA Docker 2 works like a plug-in for Docker: it registers an extra container runtime so that a stock Docker installation can reach the NVIDIA driver.
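
Concretely, installing nvidia-docker2 registers an "nvidia" runtime with the Docker daemon. The snippet below shows roughly what ends up in /etc/docker/daemon.json; it is only for illustration, since the package writes this configuration for you:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}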

1.4.1 Add source

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

1.4.2 Installation

apt update
apt install -y nvidia-docker2
systemctl restart docker

1.4.3 Testing

docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

If you can see the familiar nvidia-smi output, the installation was successful.
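
If you run your services with Docker Compose (mine run in containers, as mentioned in the preface), a GPU service can be declared roughly as follows. This is only a sketch; the service name and image are placeholders, not from my actual setup:

services:
  onnx-model:                       # placeholder service name
    image: my-onnx-service:latest   # placeholder image
    runtime: nvidia                 # runtime registered by nvidia-docker2
    environment:
      - NVIDIA_VISIBLE_DEVICES=all  # expose all GPUs to the container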

