Ubuntu 20.04 ROCm installation tutorial: AMD deep learning, PyTorch on a 6800 XT, CUDA-style API

Official documentation

Corresponding PyTorch download page

ROCm installation and configuration: pitfalls

  • Problems encountered
    • Installing Ubuntu and then updating the kernel can fail under a Windows/Ubuntu dual-boot setup; the kernel that finally worked for me is 5.13.39.
    • My kernel update failed because I did not partition manually when installing Ubuntu and simply erased the whole disk and installed directly, so do partition the disk manually during installation.
    • Disable Secure Boot in the BIOS and set Ubuntu's bootloader as the first boot entry.
    • A Navi 6800 XT (gfx1030) needs ROCm 5.0 or later.
    • ROCm 5.0 and above is what supports Navi cards. For a previous-generation card you can install the 4.5 series instead, because the PyTorch website offers a pre-compiled build for it that installs straight into the local environment, with no Docker image needed.
    • Reboot after installation.
    • (Update) One more thing: disable Secure Boot right at the start. When installing the ROCm driver there is a license-agreement step about GPU access permissions; if Secure Boot is not disabled, it makes you set a password and then work through BIOS steps I no longer remember exactly, which is always troublesome.
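Before starting, the kernel and Secure Boot prerequisites above can be sanity-checked from a terminal; `mokutil` is the usual tool for querying the Secure Boot state (guarded below in case it is not installed):

```shell
# Show the running kernel version (the kernel that worked for the author was 5.13.39)
kernel="$(uname -r)"
echo "Running kernel: $kernel"

# Secure Boot must be disabled before installing the ROCm driver;
# mokutil reports the firmware state when it is available.
if command -v mokutil >/dev/null 2>&1; then
  mokutil --sb-state   # should print: SecureBoot disabled
else
  echo "mokutil not installed; check Secure Boot in the BIOS setup instead"
fi
```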

ROCm installation

The version installed here is 5.1.0.

sudo apt update && sudo apt dist-upgrade
sudo apt-get install wget gnupg2
sudo usermod -a -G video $LOGNAME
echo 'ADD_EXTRA_GROUPS=1' | sudo tee -a /etc/adduser.conf
echo 'EXTRA_GROUPS=video' | sudo tee -a /etc/adduser.conf
echo 'EXTRA_GROUPS=render' | sudo tee -a /etc/adduser.conf
sudo wget https://repo.radeon.com/amdgpu-install/22.10/ubuntu/focal/amdgpu-install_22.10.50100-1_all.deb
sudo apt-get install ./amdgpu-install_22.10.50100-1_all.deb
sudo amdgpu-install --usecase=dkms
sudo amdgpu-install -y --usecase=rocm
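If you want to confirm the driver half of the install before rebooting, `dkms status` should list the amdgpu module that the `--usecase=dkms` step built (a guarded sketch; the exact output differs per system):

```shell
# List DKMS-managed kernel modules; after the install above this
# should include an "amdgpu" entry for the running kernel.
if command -v dkms >/dev/null 2>&1; then
  dkms status
else
  echo "dkms not installed yet"
fi
```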

Configure environment and permissions

sudo usermod -a -G video $LOGNAME 

sudo usermod -a -G render $LOGNAME

echo 'ADD_EXTRA_GROUPS=1' | sudo tee -a /etc/adduser.conf

echo 'EXTRA_GROUPS=video' | sudo tee -a /etc/adduser.conf

echo 'EXTRA_GROUPS=render' | sudo tee -a /etc/adduser.conf

echo 'export PATH=$PATH:/opt/rocm/bin:/opt/rocm/profiler/bin:/opt/rocm/opencl/bin' | sudo tee -a /etc/profile.d/rocm.sh
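The group changes and the new PATH only take effect for fresh login sessions. After logging out and back in (or rebooting), you can confirm both:

```shell
# Current user's groups; "video" and "render" should appear after re-login.
id -nG

# PATH entries, one per line; the /opt/rocm/* directories should be
# listed once /etc/profile.d/rocm.sh has been sourced.
echo "$PATH" | tr ':' '\n'
```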

Verify

# show GPU information
rocm-smi
# both of these should print GPU information
/opt/rocm/bin/rocminfo
/opt/rocm/opencl/bin/clinfo
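For a quick check that the 6800 XT was picked up, its ISA name (gfx1030) should appear in the rocminfo agent list; a guarded sketch:

```shell
# Grep the agent list for gfx* ISA names; a 6800 XT shows up as gfx1030.
if [ -x /opt/rocm/bin/rocminfo ]; then
  /opt/rocm/bin/rocminfo | grep -i 'gfx'
else
  echo "rocminfo not found; is ROCm installed?"
fi
```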

The next step is using ROCm to accelerate GPU computation. There are two ways; Method 1 is the recommended one.

Method 1, run in a Docker container

       First, install Docker following the tutorial below (the Alibaba Cloud mirror branch it recommends works well).

Docker installation

       After installation, pull the PyTorch or TensorFlow image. Both images come with torch or tf pre-installed, so they can be used directly. It seems that only ROCm 5.0 and above supports Navi cards, so this method is recommended for Navi cards: the newest pre-compiled build on the PyTorch website targets ROCm 4.5.2, so if your card is supported there you can pick your version on the site, copy the pip command it gives you, and install into the local environment with no Docker container or remote environment needed.
       With Docker installed, pull whichever image you need. One of these two images is 27 GB after decompression and the other is 22 GB, and Docker stores its data under the root directory by default, so make the / partition big enough when you size it during the Ubuntu install.

Download the PyTorch and TensorFlow images

sudo docker pull rocm/pytorch:latest

sudo docker pull rocm/tensorflow:latest

       After the downloads finish, run docker images to list the downloaded images.

Create a Pytorch or TensorFlow container

       In the command below you can drop --rm so the container is kept after exit; later you can start it again with docker start pytorch and enter it with docker attach pytorch. Once inside, you can run code directly, including code that calls cuda.

# use this command if you pulled the pytorch image
sudo docker run -it -v $HOME:/data --privileged --rm --device=/dev/kfd --device=/dev/dri --group-add video --name pytorch rocm/pytorch:latest
# use this one for tensorflow
sudo docker run -it -v $HOME:/data --privileged --rm --device=/dev/kfd --device=/dev/dri --group-add video rocm/tensorflow:latest

       If you are comfortable working in Docker, the tutorial is essentially over here. If not, read on.
       The command above drops you straight into the newly created container. You can press Ctrl+P then Ctrl+Q to detach temporarily, then open VS Code and install the Remote - Containers extension from the extension marketplace.




       After installing it, click the remote indicator in the lower-left corner of VS Code and choose Attach to Running Container.

       A new window pops up, and from there you can develop in the IDE. Note that some extensions are not enabled by default when VS Code attaches to a container, so install Python and the other supporting extensions inside it as well.
       To verify that everything works, run one of the official ROCm examples.

       Another method, without VS Code, is to enter the container and set up a remote Jupyter Notebook connection (search online for a guide). With the path mapped into the container, start the jupyter service inside it; you can then run notebooks in the local browser on Ubuntu, with files kept in sync.

Method 2, install into the local environment

Alternatively, create a new Python environment and install the torch build from the download page linked at the beginning of the article. This method suits people who are unfamiliar with Docker and whose ROCm version is 4.5.2 or below.

import torch
torch.cuda.is_available()
# output True means the GPU can be used
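The same check can be run non-interactively from the shell; this sketch also prints the device name, and it degrades politely if torch is not installed in the current environment:

```shell
python3 - <<'EOF'
# Verify that torch sees the GPU through ROCm's CUDA-compatible API.
try:
    import torch
except ImportError:
    print("torch is not installed in this environment")
else:
    ok = torch.cuda.is_available()
    print("cuda available:", ok)   # True means the GPU can be used
    if ok:
        print("device:", torch.cuda.get_device_name(0))
EOF
```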

This method currently only supports ROCm 4.5 (probably): on the torch download page linked at the beginning of the article, search for "rocm" to see which versions are supported.
Other versions are not in good shape at the moment; you would need to build a PyTorch for your own card. I tried compiling a few times and failed, so you can just use the Docker container instead. Although cuda.is_available() returns True after such an install, running training then fails with HIP compilation errors. If you are an expert in this area, the deep-learning section of the official documentation links the official PyTorch git repo with instructions for compiling from source. I failed every time anyway; if anyone manages to compile it successfully, please send me the method, haha.


Origin: blog.csdn.net/qq_51403540/article/details/123951460