[Solved in one article] CUDA and PyTorch are installed but torch.cuda.is_available() is False

Problem Description

CUDA and PyTorch have been installed, but running the following Python script prints False:

import torch
print(torch.cuda.is_available())

There are many possible causes for this problem, and most articles do not analyze them comprehensively. When I ran into it, I had to piece together scattered information from across the Internet, which took a lot of effort, so I wrote this article. If you run into the same problem, this article aims to help you solve it. If it helps you, I hope you will give it a like as encouragement.

Note that this article addresses the case where import torch does not report an error, but PyTorch and CUDA are not correctly matched. If import torch reports an error, PyTorch is not installed correctly; please refer to this link to install an appropriate PyTorch version.

For the convenience of explanation, this article takes CUDA 10.2 and PyTorch 1.11.0 (GPU version) as examples.
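Before going through the causes one by one, it can help to print a few more facts than the single True/False above. The following is a minimal sketch, not an official diagnostic: the function name collect_torch_info is made up for illustration, and it only reads attributes that PyTorch exposes (torch.__version__, torch.version.cuda, torch.cuda.device_count()).

```python
import importlib.util


def collect_torch_info():
    """Return a small dict describing the installed PyTorch build, if any."""
    if importlib.util.find_spec("torch") is None:
        # `import torch` itself would fail: that is a different problem
        return {"torch_installed": False}

    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,     # e.g. "1.11.0+cu102"
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


for key, value in collect_torch_info().items():
    print(f"{key}: {value}")
```

If built_with_cuda prints None, the installed PyTorch is a CPU-only build, which points to Possibility 3 rather than to the driver or the paths.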

Overview: What's Causing the Problem

The causes of this problem fall into the following five categories. They are summarized here first and detailed later. Experienced readers can use this section for a quick check.

  1. The CUDA version is not compatible with the driver: the CUDA version must be compatible with the GPU driver.
  2. The CUDA library path is set incorrectly: if the CUDA library path is not configured correctly, PyTorch will not be able to find the CUDA library files.
  3. The PyTorch version does not match the CUDA version: the installed PyTorch version must match the CUDA version.
  4. Compilation issues: a precompiled PyTorch binary may have been built for a different CUDA version.
  5. Conflicts: other packages or libraries may conflict with PyTorch or CUDA.

Most of the problems can be solved by the first three steps.

Please troubleshoot in the order of 1-5 until the problem is solved.

Possibility 1: The CUDA version is not compatible with the driver

Driver compatibility with CUDA means that the GPU driver version supports the installed CUDA version, so that it can work with the CUDA libraries correctly. You can follow the steps below to check whether the two are compatible:

1. Check the CUDA version.

Enter the following command in the terminal:

nvcc -V

![nvcc -V](https://img-blog.csdnimg.cn/7aa95a08cc654edd83a126fd30b8953d.png)

or:

cat /usr/local/cuda/version.txt

For example, my CUDA version is 10.2.
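The same check can be scripted. This is a small sketch under the assumption that nvcc prints a line containing "release X.Y", as it does for CUDA 10.2; the function names are illustrative, and the script returns None when nvcc is not on PATH.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract (major, minor) from the text printed by `nvcc -V`."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Run `nvcc -V` if nvcc is on PATH; return (major, minor) or None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "-V"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


print(installed_cuda_release())
```

A result of None here while CUDA is installed suggests that the CUDA bin directory is missing from PATH, which is covered under Possibility 2.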

2. Check the GPU driver version.

Enter the following command in the terminal:

nvidia-smi

Find the Driver Version on the first line, for example, mine is 440.44.
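To read the driver version without looking through the full table output, nvidia-smi can be asked for that single field. The sketch below assumes nvidia-smi is installed along with the driver; the function name driver_version is illustrative.

```python
import shutil
import subprocess


def driver_version():
    """Return the NVIDIA driver version string, or None if it cannot be read."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    lines = result.stdout.strip().splitlines()
    # One line is printed per GPU; they all share the same driver
    return lines[0].strip() if lines else None


print(driver_version())
```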

3. Check whether the two are compatible.

Driver compatibility with CUDA is very important because PyTorch and other CUDA-based libraries need to interact with the GPU driver to function correctly. If the driver is not compatible with CUDA, CUDA functions may not work properly or errors may occur.

Compare the two versions found above to make sure they match. For example, CUDA 10.2 requires NVIDIA driver version 440.33 or later on Linux, and mine is 440.44, so they are compatible.

Compare your own versions against the compatibility table of CUDA versions and GPU drivers (for Linux, use the first and second columns; for Windows, the first and third columns). Note: refer to this link.
[Image: CUDA version and GPU driver compatibility table]
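The comparison itself is easy to get wrong if the versions are compared as plain strings, so here is a minimal sketch that compares them numerically. The helper names are illustrative, and the only values used are the ones from this article (driver 440.44, Linux minimum 440.33 for CUDA 10.2).

```python
def version_tuple(version):
    """Turn '440.44' into (440, 44) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def driver_meets_minimum(driver, minimum):
    """True if the installed driver is at least the minimum for the CUDA release."""
    return version_tuple(driver) >= version_tuple(minimum)


# Values from this article: driver 440.44, Linux minimum for CUDA 10.2 is 440.33
print(driver_meets_minimum("440.44", "440.33"))  # True
```

Look up the minimum for your own CUDA version in the compatibility table and pass it as the second argument.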

4. Update the driver or change the CUDA version.

If the two do not match, you can either update the driver or change the CUDA version.

An appropriate driver version can be downloaded from the official NVIDIA website. Before installing the new driver, make sure to uninstall the old one, and follow the installation guide; the details are not covered here. If you have no special requirement for the CUDA version, it is recommended to change the installed CUDA version first.

Possibility 2: There is a problem with the path setting of the CUDA library

Put simply, if the CUDA library path is not configured correctly, PyTorch will not be able to find the CUDA library files. In this case, torch.cuda.is_available() returns False even if both CUDA and PyTorch are installed correctly.

Follow the steps below to add the CUDA paths to the environment variables on Linux (the procedure on Windows is similar; Linux is used here as an example):

  1. Enter the following command in the terminal:
vim ~/.bashrc

That is, open the ~/.bashrc file with the vim text editor. You can also choose to open it with another text editor.

  2. Taking vim as an example, press i to enter insert mode and add the following lines at the end of the file:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Here, /usr/local/cuda is the CUDA installation path. If your CUDA is installed somewhere else, use your actual path.

These lines add the CUDA bin directory to the PATH environment variable so that the system can find the CUDA executables, and add the CUDA lib64 directory to the LD_LIBRARY_PATH environment variable so that the system can find the CUDA library files.
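Before saving, it is worth confirming that the two directories named in the export lines actually exist. The following is a small sketch with an illustrative function name; /usr/local/cuda is the default used in this article and should be replaced with your own installation path if it differs.

```python
import os


def check_cuda_layout(cuda_root="/usr/local/cuda"):
    """Report whether the directories used in the export lines exist."""
    return {
        "bin": os.path.isdir(os.path.join(cuda_root, "bin")),
        "lib64": os.path.isdir(os.path.join(cuda_root, "lib64")),
    }


print(check_cuda_layout())
```

If either entry is False, exporting that path will have no effect, and the installation path should be corrected first.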

  3. Press ESC, then type :wq and press Enter to save the ~/.bashrc file.
  4. In a terminal window, run the following command for the environment variable changes to take effect:
source ~/.bashrc

This command will reload the .bashrc file to make the added environment variables take effect.

  5. Check whether the CUDA path was successfully added to the environment variables.

In a terminal window, run the following command:

echo $PATH
echo $LD_LIBRARY_PATH

The above commands display the current values of the environment variables. Check that the CUDA paths (/usr/local/cuda/bin and /usr/local/cuda/lib64) appear in the output to confirm that they were added successfully.

If the output does not contain the CUDA path, or contains multiple CUDA paths, please continue to refer to the following steps.
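Reading a long PATH by eye is error-prone, so the check can be scripted. This sketch lists every entry of PATH and LD_LIBRARY_PATH that mentions CUDA; the function name cuda_entries is illustrative, and matching on the substring "cuda" is an assumption about how the directories are named.

```python
import os


def cuda_entries(path_value):
    """Return the entries of an os.pathsep-separated variable that mention CUDA."""
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return [entry for entry in entries if "cuda" in entry.lower()]


for name in ("PATH", "LD_LIBRARY_PATH"):
    found = cuda_entries(os.environ.get(name, ""))
    print(name, found)
```

An empty list means the path was not added; more than one distinct CUDA directory in the same variable means several installations are competing.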

  6. Troubleshoot environment variable issues.

If the current values of the environment variables are incorrect, the problem may come from other configuration files. Besides ~/.bashrc, other configuration files may also contain settings for the CUDA library path. Open these files and look for lines that set PATH or LD_LIBRARY_PATH to determine whether the /usr/local/cuda path is set elsewhere.
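The search through configuration files can be sketched as follows. The list of files is an assumption about a typical Linux setup, not something taken from this article, and the function name cuda_path_lines is illustrative; commented-out lines are skipped.

```python
import os
import re

# Assumed list of common shell start-up files; adjust for your system
CANDIDATE_FILES = [
    "~/.bashrc",
    "~/.bash_profile",
    "~/.profile",
    "/etc/profile",
    "/etc/environment",
]

PATTERN = re.compile(r"\b(PATH|LD_LIBRARY_PATH)\b.*cuda", re.IGNORECASE)


def cuda_path_lines(text):
    """Return (line_number, line) pairs that set PATH or LD_LIBRARY_PATH to a CUDA path."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if PATTERN.search(line) and not line.lstrip().startswith("#")
    ]


for name in CANDIDATE_FILES:
    path = os.path.expanduser(name)
    try:
        with open(path, errors="replace") as handle:
            content = handle.read()
    except OSError:
        continue  # file missing or unreadable: nothing to report
    for number, line in cuda_path_lines(content):
        print(f"{path}:{number}: {line}")
```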

  7. Check other environment variables.

Besides LD_LIBRARY_PATH itself, other CUDA-related environment variables may cause /usr/local/cuda/lib64 to be added to LD_LIBRARY_PATH. Run the following command to check whether any other CUDA-related environment variables are set:

env | grep CUDA

If there are any, they may be left over from other CUDA-related operations, and you can clear them as described in the next step.
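A Python equivalent of this check is sketched below; like grep, it matches the text CUDA in either the variable name or its value. The function name cuda_related_vars is illustrative. One variable worth a close look if it appears is CUDA_VISIBLE_DEVICES: when it is set to an empty value, the CUDA runtime hides every GPU from the process, and torch.cuda.is_available() returns False.

```python
import os


def cuda_related_vars(environ):
    """Return the variables whose NAME=value text contains 'CUDA'."""
    return {
        name: value
        for name, value in environ.items()
        if "CUDA" in f"{name}={value}"
    }


for name, value in sorted(cuda_related_vars(os.environ).items()):
    print(f"{name}={value!r}")
```

Printing the value with !r makes an empty string visible as '', which is easy to miss in plain env output.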

  8. Clear old environment variables.

If you have ever installed other versions of CUDA or performed other CUDA-related operations, old environment variable settings may still be present on the system. These environment variables can be manually cleared and then reset to the correct CUDA library path.

First, run the following command:

env

This lists all current environment variables. Look for variables related to CUDA or to older CUDA versions in the output, and remove them with the unset command. For example, if an environment variable named OLD_CUDA_PATH exists, you can remove it by running:

unset OLD_CUDA_PATH

Repeat this step for each environment variable that needs to be removed. Do not remove an environment variable unless you are sure it is causing a problem.
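If you are unsure whether a variable is the culprit, one cautious option is to test without it before removing it for real. The sketch below starts a separate Python process with a copy of the environment that lacks the suspect variable and asks it whether CUDA is available; the current shell is left untouched. OLD_CUDA_PATH is the hypothetical name used in this article, and the helper names are illustrative.

```python
import os
import subprocess
import sys


def without_vars(environ, names):
    """Return a copy of the environment with the given variables removed."""
    return {key: value for key, value in environ.items() if key not in names}


def cuda_available_in(env):
    """Ask a fresh Python process, started with `env`, whether CUDA is available."""
    code = "import torch; print(torch.cuda.is_available())"
    result = subprocess.run(
        [sys.executable, "-c", code], env=env, capture_output=True, text=True
    )
    return result.stdout.strip() or None  # None if the child printed nothing


# OLD_CUDA_PATH is the hypothetical variable name used in this article
clean_env = without_vars(os.environ, {"OLD_CUDA_PATH"})
print(cuda_available_in(clean_env))
```

If the child process reports True with the variable removed, that variable is a good candidate for unset.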

  9. Reload the configuration file.

Run the command:

source ~/.bashrc

Note: if the error /bin/lesspipe: 1: /bin/lesspipe: basename: not found is reported at this point, the PATH environment variable may have been deleted; please refer to this link.

  10. Check the environment variables again.

Run the env command again to make sure that the old environment variables have been removed.

Possibility 3: PyTorch version does not match CUDA version

Some readers may install PyTorch directly with one of the following commands:

pip install pytorch
conda install pytorch

These commands are inappropriate: they do not select a build that matches your CUDA version (and on pip the package is named torch, not pytorch), so the installed PyTorch version may not match the CUDA version. Some readers also visit the official PyTorch website and copy and paste the install command shown there without checking whether that PyTorch build is compatible with their CUDA version.

The recommended approach is to visit Previous PyTorch Versions | PyTorch, find the PyTorch version that matches your CUDA version, and then copy and run the corresponding install command.
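After installing, the version string often shows which CUDA build you ended up with. The sketch below reads the tag after the plus sign in a version string such as 1.11.0+cu102; the helper names are illustrative, and the assumption that the last digit of the tag is the minor version holds for tags like cu102 and cu113. Not every build carries such a tag, so torch.version.cuda remains the more direct check.

```python
def build_tag(torch_version):
    """Return the part after '+' in a PyTorch version string, or None if absent."""
    _, plus, tag = torch_version.partition("+")
    return tag if plus else None


def tag_to_cuda(tag):
    """Turn a tag such as 'cu102' into '10.2'; return None for 'cpu' or unknown tags."""
    if not tag or not tag.startswith("cu"):
        return None
    digits = tag[2:]
    if len(digits) < 2 or not digits.isdigit():
        return None
    # The last digit is the minor version: cu102 -> 10.2, cu113 -> 11.3
    return f"{digits[:-1]}.{digits[-1]}"


print(build_tag("1.11.0+cu102"), tag_to_cuda("cu102"))  # cu102 10.2
print(build_tag("1.11.0+cpu"), tag_to_cuda("cpu"))      # cpu None
```

A cpu tag means a CPU-only build was installed, in which case torch.cuda.is_available() is False regardless of the driver and the paths.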

Possibility 4: Compilation problem

I have not encountered this situation, nor do I recommend compiling PyTorch from source. If the problem is still not solved after checking up to this step, try changing how PyTorch is installed first (for example, switch from a conda installation to a pip installation). If that still does not help, refer to articles on compiling PyTorch from source.

Possibility 5: Package or library conflict

To avoid this situation, it is recommended to create a new conda environment and install PyTorch in it.
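When several environments exist, it is easy to install PyTorch in one and run Python from another. The following sketch prints which interpreter and conda environment the current process is using; the function name is illustrative, and CONDA_DEFAULT_ENV is the variable that conda sets on activation.

```python
import os
import sys


def describe_environment():
    """Report which interpreter and conda environment this process is using."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # Set by `conda activate`; absent outside conda
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Run it with the same python command you use for your own scripts and confirm that the reported environment is the new one.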

Closing Remarks

The five points above are summarized from my personal experience and related articles on the Internet. If you have anything to add, you are welcome to discuss it in the comments.

If this article has helped you, I hope you can like it or comment to support it. Your encouragement is the biggest motivation for me to continue creating.

Reposted from: blog.csdn.net/qq_41112170/article/details/131191827