Recently I picked up a Tesla P4. Since it is a half-height card, I simply put it in my NAS and tried to move a model that previously ran on the CPU with ONNX over to the GPU. I ran into a few problems along the way, so I'm recording them here.
Original article: blog.lucien.ink/archives/534
0. Preface
Since this is a production environment, all my services (including the model being migrated) run in containers, so this time I am also deploying with NVIDIA Docker 2.
My OS is Debian 11 x64, running on an i3-12100.
This article assumes that Docker is already installed. If it is not, you can refer to Docker Getting Started Notes.
1. Environment deployment
1.1 Download driver
Go to Official Drivers | NVIDIA to download the Tesla P4 driver. Remember not to leave CUDA Toolkit set to Any, otherwise the site offers a very old driver, which gets in the way of installing NVIDIA Docker (which needs CUDA >= 11.6).
Since the workloads actually run inside Docker, the host's CUDA version should be as new as stability allows: in general, a newer host driver can still run containers built for older CUDA versions, but not the other way round. I chose CUDA 12.0.
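The rule behind that choice can be written down as a small check. This is a minimal sketch, not part of the original setup: it assumes GNU `sort -V` for comparing dotted versions and the `CUDA Version: X.Y` field that nvidia-smi prints in its header, and the helper names `parse_cuda_version`, `host_cuda_version` and `cuda_compatible` are hypothetical. It ignores NVIDIA's forward-compatibility packages.

```shell
#!/bin/sh
# Sketch: can a container built for a given CUDA version run on this host's driver?
# Rule of thumb (ignoring NVIDIA's forward-compatibility packages): the
# container's CUDA version must not exceed the version the host driver reports.

# Extract the "CUDA Version: X.Y" field from nvidia-smi output read on stdin.
parse_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9.]*\).*/\1/p'
}

# Host CUDA version as reported by the installed driver (hypothetical helper).
host_cuda_version() {
    nvidia-smi | parse_cuda_version
}

# Succeed if container CUDA ($2) <= host CUDA ($1); GNU sort -V orders versions.
cuda_compatible() {
    host="$1"
    container="$2"
    lowest=$(printf '%s\n%s\n' "$container" "$host" | sort -V | head -n 1)
    [ "$lowest" = "$container" ]
}

# Usage (after the driver is installed):
#   cuda_compatible "$(host_cuda_version)" 11.6.2 && echo "CUDA 11.6.2 images should run"
```

With a host at 12.0, both 11.6.2 and 12.0 images pass the check, while a host stuck at an older version would reject the 11.6.2 test image.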
1.2 System preparation
1.2.1 Disable Nouveau
This step is necessary because Nouveau is the open-source driver for NVIDIA GPUs (see nouveau - Wikipedia), and it conflicts with the proprietary NVIDIA driver if it is left loaded.
- Create a file
touch /etc/modprobe.d/blacklist-nouveau.conf
- Write the following in the file:
blacklist nouveau
options nouveau modeset=0
- Regenerate the kernel initramfs
update-initramfs -u
- Reboot
reboot
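The blacklist file can also be written in one step with a heredoc instead of touch plus a manual edit, and after the reboot you can confirm that Nouveau is gone by looking for it in /proc/modules (the list lsmod reads). A minimal sketch; the function names are hypothetical, and the optional path arguments exist only so the helpers can be tried against scratch files.

```shell
#!/bin/sh
# Write the Nouveau blacklist in one step (run as root).
# $1 optionally overrides the target path, e.g. for a dry run in /tmp.
write_nouveau_blacklist() {
    conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    cat > "$conf" <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF
}

# Succeed if the nouveau module is currently loaded.
# /proc/modules lists loaded modules one per line, module name first.
nouveau_loaded() {
    grep -q '^nouveau ' "${1:-/proc/modules}"
}

# Usage:
#   write_nouveau_blacklist && update-initramfs -u && reboot
#   # after the reboot:
#   if nouveau_loaded; then echo "nouveau is still loaded"; else echo "nouveau is disabled"; fi
```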
1.2.2 Install the build environment
Reference: Nvidia unable to find kernel source tree
apt install linux-headers-`uname -r` build-essential
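The "unable to find kernel source tree" error typically appears when the headers for the running kernel are missing. On Debian, the linux-headers package for a kernel normally provides /lib/modules/<release>/build, so a quick pre-check before running the installer could look like this. It is a sketch: `kernel_headers_present` is a hypothetical name, and its optional arguments are only there for testing.

```shell
#!/bin/sh
# Succeed if kernel headers for the given (default: running) kernel are installed.
# $1: kernel release (default: output of uname -r)
# $2: modules root (default: /lib/modules); only overridden for testing.
kernel_headers_present() {
    release="${1:-$(uname -r)}"
    root="${2:-/lib/modules}"
    [ -d "$root/$release/build" ]
}

# Usage:
#   kernel_headers_present || apt install linux-headers-"$(uname -r)" build-essential
```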
1.3 Install the driver
Make the downloaded installer executable with chmod +x and then run it directly; this part is routine, so I won't go into details.
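For completeness, the step amounts to something like the following. The installer's file name depends on the driver version you downloaded, so it is passed in as an argument rather than guessed. `install_nvidia_driver` and `driver_loaded` are hypothetical helper names, and the check relies on nvidia-smi exiting with a non-zero status when it cannot talk to the driver.

```shell
#!/bin/sh
# Make the downloaded .run installer executable and run it (as root).
# $1: path to the installer; its exact name depends on the driver version,
#     so it is deliberately not hard-coded here.
install_nvidia_driver() {
    run_file="$1"
    if [ ! -f "$run_file" ]; then
        echo "installer not found: $run_file" >&2
        return 1
    fi
    # A bare file name would be looked up in PATH, so make it explicitly relative.
    case "$run_file" in
        */*) ;;
        *) run_file="./$run_file" ;;
    esac
    chmod +x "$run_file"
    "$run_file"
}

# After installation, nvidia-smi should be able to reach the driver.
driver_loaded() {
    nvidia-smi > /dev/null 2>&1
}

# Usage:
#   install_nvidia_driver ./<downloaded-installer>.run && driver_loaded && echo "driver OK"
```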
1.4 Install NVIDIA Docker
From what I can tell, NVIDIA Docker 2 works like a plug-in for Docker: it lets the stock Docker engine access the NVIDIA driver.
1.4.1 Add source
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
&& curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
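The distribution variable is simply $ID$VERSION_ID from /etc/os-release, which on Debian 11 should come out as debian11; that string selects which libnvidia-container.list is downloaded. A small sketch for checking both the value and that the resulting sources file carries the signed-by option inserted by the sed step. The helper names are hypothetical, and the optional arguments only serve to try them on scratch files.

```shell
#!/bin/sh
# Print the distribution string used in the repository URL (e.g. "debian11").
# The file is sourced in a subshell so ID/VERSION_ID do not leak into the caller.
distribution_id() {
    ( . "${1:-/etc/os-release}" && echo "$ID$VERSION_ID" )
}

# Succeed if the generated sources list references the keyring via signed-by.
sources_list_signed() {
    grep -q 'signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg' \
        "${1:-/etc/apt/sources.list.d/nvidia-container-toolkit.list}"
}

# Usage:
#   echo "https://nvidia.github.io/libnvidia-container/$(distribution_id)/libnvidia-container.list"
#   sources_list_signed && echo "sources list OK"
```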
1.4.2 Installation
apt update
apt install -y nvidia-docker2
systemctl restart docker
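Consistent with the plug-in view above, nvidia-docker2 registers an extra runtime named nvidia with the Docker daemon, normally through /etc/docker/daemon.json. One way to confirm the daemon picked it up after the restart is to ask docker info for its runtimes. This sketch assumes that approach; `docker_has_nvidia_runtime` is a hypothetical helper, and the DOCKER variable only exists so the command can be swapped out.

```shell
#!/bin/sh
# Succeed if the Docker daemon lists a runtime named "nvidia".
# DOCKER can point at another command; it defaults to the docker CLI.
docker_has_nvidia_runtime() {
    runtimes=$("${DOCKER:-docker}" info --format '{{json .Runtimes}}') || return 1
    printf '%s\n' "$runtimes" | grep -q '"nvidia"'
}

# Usage:
#   systemctl restart docker
#   docker_has_nvidia_runtime && echo "nvidia runtime registered"
```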
1.4.3 Testing
docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
If the usual nvidia-smi output is printed, the installation was successful.
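To script this check, for example before rolling out a GPU container in production, a sketch along these lines runs the same test image but calls nvidia-smi -L, which prints one line per visible GPU, and looks for the card's name. The helper name, the DOCKER override and the "Tesla P4" default are assumptions for illustration.

```shell
#!/bin/sh
# Run nvidia-smi -L inside a CUDA base image and check that the GPU is visible.
# $1: image (default: the test image used above)
# $2: text expected in the GPU list (default: "Tesla P4")
# DOCKER can point at another command; it defaults to the docker CLI.
gpu_smoke_test() {
    image="${1:-nvidia/cuda:11.6.2-base-ubuntu20.04}"
    expect="${2:-Tesla P4}"
    out=$("${DOCKER:-docker}" run --rm --gpus all "$image" nvidia-smi -L) || return 1
    printf '%s\n' "$out"
    printf '%s\n' "$out" | grep -q "$expect"
}

# Usage:
#   gpu_smoke_test && echo "GPU visible inside containers"
```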