Pitfalls when configuring a GPU server environment

Today I bought a Tencent Cloud server and accidentally chose the wrong environment, so I had to reinstall the system. The reinstalled system did not come with conda, so I had to install it manually. I chose Miniconda.

Conda installation

Download Miniconda3

wget -c https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh

Install Miniconda3

chmod +x Miniconda3-latest-Linux-x86_64.sh
sh Miniconda3-latest-Linux-x86_64.sh

Note: when the installer asks at the end whether to initialize Miniconda3, answer "no".
This avoids environment conflicts. The PATH is set manually in the next step instead.

Edit the .bashrc file with vim

This step adds Miniconda to the PATH environment variable.

vim ~/.bashrc

Add the following line at the bottom of the file. The path must match the installation directory you confirmed during installation; in this article it is "/home/ubuntu/miniconda3".

export PATH="/home/ubuntu/miniconda3/bin:$PATH"

Reload .bashrc so the change takes effect:

source ~/.bashrc
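If you set up servers often, the .bashrc edit can also be scripted. This is a minimal Python sketch, assuming the same install prefix as above; the helper name `add_conda_to_path` is made up for illustration. It appends the export line only if it is not already there, so running it twice does not duplicate the entry.

```python
from pathlib import Path


def add_conda_to_path(bashrc: Path, conda_prefix: str) -> bool:
    """Hypothetical helper: append `export PATH="<prefix>/bin:$PATH"` to bashrc.

    Returns True if the line was added, False if it was already present.
    """
    line = f'export PATH="{conda_prefix}/bin:$PATH"'
    text = bashrc.read_text() if bashrc.exists() else ""
    if line in text.splitlines():
        return False  # already configured, nothing to do
    # Make sure the new line starts on a line of its own.
    if text and not text.endswith("\n"):
        text += "\n"
    bashrc.write_text(text + line + "\n")
    return True
```

Called as `add_conda_to_path(Path.home() / ".bashrc", "/home/ubuntu/miniconda3")`; you still need to run `source ~/.bashrc` afterwards in the shell.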

After this, the conda command can be used directly.
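To confirm that the `conda` found on the PATH really is the freshly installed one (and not, say, a leftover from a system image), a small check like the following can help. It is a sketch that assumes the /home/ubuntu/miniconda3 prefix; `conda_on_path` is a hypothetical name.

```python
import shutil
from pathlib import Path
from typing import Optional


def conda_on_path(expected_prefix: str, search_path: Optional[str] = None) -> bool:
    """Hypothetical check: does `conda` resolve to <expected_prefix>/bin/conda?

    search_path defaults to the current PATH (shutil.which's own default).
    """
    found = shutil.which("conda", path=search_path)
    if found is None:
        return False  # conda is not on the PATH at all
    expected = Path(expected_prefix) / "bin" / "conda"
    return Path(found).resolve() == expected.resolve()
```

Run from a shell that has already sourced .bashrc, `conda_on_path("/home/ubuntu/miniconda3")` should return True.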

DETR environment configuration

I won't describe the earlier conda environment setup in detail; the main problem came later, when building the CUDA operators.
Run the following commands in sequence:

CUDA operator configuration

cd models/dino/ops
python setup.py build install
python test.py

An error occurred while executing test.py.

Traceback (most recent call last):
  File "test.py", line 18, in <module>
    from functions.ms_deform_attn_func import MSDeformAttnFunction, ms_deform_attn_core_pytorch
  File "/home/cse305/code/vidt-main/ops/functions/__init__.py", line 9, in <module>
    from .ms_deform_attn_func import MSDeformAttnFunction
  File "/home/cse305/code/vidt-main/ops/functions/ms_deform_attn_func.py", line 18, in <module>
    import MultiScaleDeformableAttention as MSDA
ImportError: libcudart.so.10.2: cannot open shared object file: No such file or directory

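This puzzled me: I was clearly using CUDA 11.4, so why did 10.2 show up? The error says the compiled MultiScaleDeformableAttention extension was linked against the CUDA 10.2 runtime, which suggests the build picked up a different CUDA toolkit than 11.4.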
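The mismatch can be spotted mechanically by comparing the version in the missing library name with the version in CUDA_HOME. This is a minimal sketch, assuming CUDA_HOME uses the common /usr/local/cuda-X.Y naming (a bare /usr/local/cuda symlink carries no version); the helper names are hypothetical, and a mismatched toolkit is only one common cause of this error.

```python
import re
from typing import Optional


def missing_cudart_version(error_message: str) -> Optional[str]:
    """Extract X.Y from '... libcudart.so.X.Y: cannot open shared object file ...'."""
    m = re.search(r"libcudart\.so\.(\d+\.\d+)", error_message)
    return m.group(1) if m else None


def cuda_home_version(cuda_home: str) -> Optional[str]:
    """Guess the toolkit version from a path such as /usr/local/cuda-11.4."""
    m = re.search(r"cuda-(\d+\.\d+)", cuda_home)
    return m.group(1) if m else None


def explain_mismatch(error_message: str, cuda_home: str) -> str:
    wanted = missing_cudart_version(error_message)
    have = cuda_home_version(cuda_home)
    if wanted is None or have is None:
        return "could not determine both versions"
    if wanted == have:
        # Same version: the library exists but the loader may not find it.
        return f"versions agree ({have}); check LD_LIBRARY_PATH instead"
    return (f"extension was built against CUDA {wanted}, but CUDA_HOME "
            f"points to {have}: delete build/ and rebuild")
```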
Solution: first delete the build folder that was previously compiled in ops, then run the following in the terminal:

export CUDA_HOME=/usr/local/cuda-11.4 # use the path of the CUDA version actually installed on your machine

You can also add this line to .bashrc so it persists across sessions.
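The clean-and-rebuild fix can be wrapped in one function. This is a sketch, not part of the DINO repository: `rebuild_ops` is a hypothetical name, it assumes you run it with the Python of the active conda environment (`sys.executable`), and it simply re-runs the same `setup.py build install` command with CUDA_HOME set explicitly.

```python
import os
import shutil
import subprocess
import sys
from pathlib import Path


def rebuild_ops(ops_dir, cuda_home: str, run: bool = True):
    """Hypothetical helper: delete ops_dir/build, then rebuild with CUDA_HOME set.

    Returns (command, environment) so the setup can be inspected with run=False.
    """
    ops = Path(ops_dir)
    build = ops / "build"
    if build.is_dir():
        shutil.rmtree(build)  # drop objects compiled against the wrong CUDA
    env = dict(os.environ, CUDA_HOME=cuda_home)
    cmd = [sys.executable, "setup.py", "build", "install"]
    if run:
        subprocess.run(cmd, cwd=ops, env=env, check=True)
    return cmd, env
```

For this article that would be `rebuild_ops("models/dino/ops", "/usr/local/cuda-11.4")`, followed by running test.py again.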
Then run it again. This time an error is reported while downloading the ResNet weights.
Instead, download the weights manually and put the file into the following directory:

windows:C:\Users\peng\.cache\torch\checkpoints (replace peng with your own user name)
linux:/home/ubuntu/.cache/torch/hub/checkpoints/
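Where this directory lives depends on environment variables. With PyTorch installed, `torch.hub.get_dir()` returns the hub directory and weights go into its `checkpoints` subfolder. The stdlib-only sketch below mirrors that default lookup in recent PyTorch versions (TORCH_HOME, then XDG_CACHE_HOME, then ~/.cache); `torch_checkpoint_dir` is a hypothetical name, and older PyTorch releases may use a different layout, which could explain why the Windows path above has no `hub` component.

```python
import os
from typing import Mapping, Optional


def torch_checkpoint_dir(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical mirror of torch.hub's default weight cache location."""
    env = os.environ if env is None else env
    cache = env.get("XDG_CACHE_HOME", "~/.cache")
    # TORCH_HOME takes precedence; otherwise <cache>/torch is used.
    torch_home = env.get("TORCH_HOME", os.path.join(cache, "torch"))
    return os.path.expanduser(os.path.join(torch_home, "hub", "checkpoints"))
```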

On Linux, change to this directory and run the download command:

sudo wget https://download.pytorch.org/models/resnet50-0676ba61.pth

However, the download on the server was extremely slow, for reasons I couldn't figure out, while downloading on my local machine was fast. So I downloaded the file locally and then uploaded it to the server.
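After copying the file to the server (for example with scp), it is worth checking that it arrived intact. The file must also keep its original name, since PyTorch looks up the cached file by the name in the download URL. The hex part of that name is a SHA-256 prefix; torch.hub verifies it in a similar way when hash checking is enabled. A minimal sketch, with a hypothetical helper name:

```python
import hashlib
import re
from pathlib import Path

HASH_PREFIX = re.compile(r"-([a-f0-9]+)\.")


def verify_checkpoint(path) -> bool:
    """Hypothetical check: does the file's SHA-256 start with the hex in its name?

    e.g. 'resnet50-0676ba61.pth' -> expected prefix '0676ba61'.
    """
    p = Path(path)
    m = HASH_PREFIX.search(p.name)
    if m is None:
        raise ValueError(f"no hash prefix in file name: {p.name}")
    sha = hashlib.sha256()
    with p.open("rb") as f:
        # Read in 1 MiB chunks so large weight files don't fill memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha.update(chunk)
    return sha.hexdigest().startswith(m.group(1))
```

A False result for `verify_checkpoint("/home/ubuntu/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth")` would mean the upload was truncated or corrupted.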


With that, the configuration is complete.


Original post: blog.csdn.net/pengxiang1998/article/details/130011141