[Super easy installation] Use conda to install a newer CUDA version (CUDA 11.8) and PyTorch 2.0 on a Linux cluster server


The project code requires PyTorch 2.0, and the PyTorch 2.0 build used here targets CUDA 11.8, which is higher than my previous CUDA version of 11.0.
I therefore used conda to create a new virtual environment and installed the newer versions of CUDA and PyTorch inside it.

0. Background analysis

I am using a multi-user Linux cluster server. The cluster uses a job scheduling system: jobs are submitted and run with the bsub command. On this kind of multi-user server, entering nvidia-smi directly on the command line does not work for checking the existing CUDA version; it fails with the error nvidia-smi: command not found. How do you check the CUDA version in this case? See my previous article, [nvidia-smi: command not found] How to use nvidia-smi to view GPU information on a cluster server

The upper right corner of the nvidia-smi output shows CUDA version 11.0, which is lower than the CUDA 11.8 needed for this PyTorch 2.0 build, so I decided to install a newer CUDA version. (I was unsure at first whether 11.0 was the highest version this GPU could support, but that does not seem to be the case: you can install a higher version yourself.)
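To make the version comparison above concrete, here is a minimal sketch that pulls the "CUDA Version" field out of nvidia-smi header text and compares it with the version needed. The header line is an illustrative sample (the driver number in it is an assumption, not taken from my screenshot), and parse_smi_cuda_version is a hypothetical helper name.

```python
import re

def parse_smi_cuda_version(smi_output):
    """Extract the 'CUDA Version: X.Y' field from nvidia-smi header text."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field found in the given text")
    return int(match.group(1)), int(match.group(2))

# Illustrative header line; the driver version is an assumed value.
SAMPLE_HEADER = (
    "| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |"
)

reported = parse_smi_cuda_version(SAMPLE_HEADER)
wanted = (11, 8)  # CUDA version targeted by the torch build chosen in this article

# Tuples compare element by element, so (11, 0) < (11, 8).
print("reported:", reported, "meets requirement:", reported >= wanted)
```

On a real cluster you would paste (or capture inside a submitted job) the actual nvidia-smi output instead of SAMPLE_HEADER.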

1. Create a new conda virtual environment

So that other CUDA versions are not affected, create a new virtual environment first. Python 3.10 is installed here.

conda create -n env_name python=3.10

Here env_name is the name of the virtual environment. Enter y when prompted during installation, then run conda activate env_name to enter the virtual environment.
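After activating the environment, it can be worth confirming that the python you are running really is the one from the new environment. The sketch below is one way to check, assuming conda has set the CONDA_DEFAULT_ENV variable on activation; environment_summary is a hypothetical helper name.

```python
import os
import sys

def environment_summary(expected=(3, 10)):
    """Report the active conda environment and the running Python version."""
    version = tuple(sys.version_info[:2])
    return {
        # Name of the activated conda environment; None if none is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Directory of the running interpreter; should be inside the new env.
        "prefix": sys.prefix,
        "python": version,
        "python_matches": version == tuple(expected),
    }

print(environment_summary())
```

If conda_env is not the name you passed to conda create, or python_matches is False, the wheel chosen later (built for Python 3.10) may not install.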

2. CUDA11.8 installation

There are many tutorials on the Internet, and they tend to be very complicated. However, I found that the conda official website provides a one-line command for installing a CUDA package, so I decided to give it a try.
You can filter for the required CUDA version by label, then copy the install command and run it directly on the command line.
After installation, enter nvcc -V to check the installed CUDA version.
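The nvcc -V check can also be scripted. The following is a small sketch, not the only way to do it: it finds whichever nvcc is first on PATH, runs it, and reads the "release X.Y" number from the output. The sample line is illustrative, and both function names are hypothetical.

```python
import re
import shutil
import subprocess

def parse_nvcc_release(text):
    """Return (major, minor) from the 'release X.Y' part of nvcc -V output."""
    match = re.search(r"release\s+(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None

def installed_nvcc_release():
    """Run the nvcc found on PATH, or return None if there is none."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "-V"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)

# Illustrative last line of nvcc -V output for a CUDA 11.8 toolkit.
SAMPLE = "Cuda compilation tools, release 11.8, V11.8.89"
print(parse_nvcc_release(SAMPLE))

# Shows which nvcc is actually being picked up and what it reports.
print(shutil.which("nvcc"), installed_nvcc_release())
```

Printing the path is useful on a shared server: if it points outside the conda environment, an older system-wide nvcc is being used instead of the one just installed.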

3. torch2.0 installation

Download

Here I chose to download the .whl installation package. Select the matching version of torch on this website. I chose the first one, which is torch 2.0 + CUDA 11.8 + Python 3.10 for Linux. (win_amd64 refers to Windows.)
Right-click the file and select copy link, then, in the conda environment created earlier, enter wget followed by the link to download it. For example:

wget https://download.pytorch.org/whl/cu118/torch-2.0.0%2Bcu118-cp310-cp310-linux_x86_64.whl#sha256=4b690e2b77f21073500c65d8bb9ea9656b8cb4e969f357370bbc992a3b074764
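The link itself carries the information needed to pick the right file: the wheel filename encodes the torch version, the Python and ABI tags, and the platform, and the part after # is the expected SHA-256 of the file. Here is a rough sketch that splits the link above into those pieces and shows how the download could be checked against the hash. It assumes a filename without the optional build tag, and describe_wheel_url and sha256_of are hypothetical helper names.

```python
import hashlib
from urllib.parse import unquote, urlsplit

URL = (
    "https://download.pytorch.org/whl/cu118/"
    "torch-2.0.0%2Bcu118-cp310-cp310-linux_x86_64.whl"
    "#sha256=4b690e2b77f21073500c65d8bb9ea9656b8cb4e969f357370bbc992a3b074764"
)

def describe_wheel_url(url):
    """Split a wheel download link into filename, tags and expected hash."""
    parts = urlsplit(url)
    # %2B in the link is an escaped '+', so the saved file is named with '+'.
    filename = unquote(parts.path.rsplit("/", 1)[-1])
    stem = filename[: -len(".whl")]
    # Assumes exactly five fields (no optional build tag in the name).
    name, version, python_tag, abi_tag, platform_tag = stem.split("-")
    algorithm, _, digest = parts.fragment.partition("=")
    return {
        "filename": filename,
        "name": name,
        "version": version,            # torch version plus CUDA build suffix
        "python_tag": python_tag,      # cp310 means CPython 3.10
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,  # linux_x86_64 vs. win_amd64
        "algorithm": algorithm,
        "digest": digest,
    }

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so a large wheel is not read into memory at once."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

info = describe_wheel_url(URL)
print(info)
```

Once the file has been downloaded, sha256_of(info["filename"]) == info["digest"] should be True; a mismatch suggests a truncated or corrupted download.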

Install

After downloading, install the package with pip install followed by the .whl file name:

pip install torch-2.0.0+cu118-cp310-cp310-linux_x86_64.whl 

Check whether the installation succeeded

Enter python to start the Python interpreter, import torch, then enter torch.__version__ to check the version

python
import torch
torch.__version__

The output should be the installed version string (2.0.0+cu118 for the wheel above).
At this point, CUDA and torch are installed in the new environment, and the project code runs without errors.
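For a slightly fuller check than the version string alone, something like the sketch below can be used. It reports the torch version, the CUDA version the wheel was built for, and whether a GPU is usable from the current process; torch_report is a hypothetical helper name, and it returns None if torch is not importable.

```python
import importlib.util

def torch_report():
    """Summarize the installed torch build, or return None if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    return {
        "torch": torch.__version__,              # e.g. the version of the wheel installed above
        "built_for_cuda": torch.version.cuda,    # CUDA version the wheel was compiled against
        "cuda_available": torch.cuda.is_available(),
    }

print(torch_report())
```

On a scheduled cluster, cuda_available may well be False when this is run directly from the command line, since the GPUs are typically only reachable from inside a submitted job; in that case, run the same script through bsub before concluding that something is wrong.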

Origin blog.csdn.net/a61022706/article/details/134903686