Ubuntu 20.04 + 3090 Ti + cudatoolkit=11.3 + tensorflow-gpu=2.6 + pytorch=1.10 deep learning environment configuration: a record of the pitfalls, migratable and reproducible via the provided configuration files

Ubuntu 20.04 + 3090 Ti + cudatoolkit=11.3 + tensorflow-gpu=2.6.2 + pytorch=1.10.2 environment configuration

Our lab recently set up a server with an NVIDIA 3090 Ti for running experiments. After a few days of fiddling, the TensorFlow and PyTorch environments are finally working, so let me go over the pitfalls I stepped into, in the hope that it helps others who need to do a similar configuration.

Note: this post is not a tutorial, let alone a hand-holding one; some steps are not recorded, so don't follow the instructions below verbatim. Treat it as a reference and a set of pointers.

1. Basic conditions:

        CPU: Intel i9-12900KF
        GPU: MSI 3090 Ti, 24 GB VRAM
        OS: Ubuntu 20.04
        GPU driver: 510.54

2. The first pit:

You do not have to install the latest graphics card driver, and you do not have to install the driver from NVIDIA's official website.

Use the command to check the driver version of your graphics card:

nvidia-smi

The graphics card driver started out as some 450.** version and was later upgraded to the current version, 510.54.

At first I downloaded the driver from the official website and installed it from a tty, but an error occurred during installation and I ended up reinstalling the system. After that I installed the driver straight from Ubuntu's own packages; that process is much simpler and, as it turns out, problem-free.

3. The second pit:

Installing CUDA and cuDNN

In the nvidia-smi output above you can see "CUDA Version: 11.6".
The CUDA version shown here is not the CUDA actually installed on the system; my understanding is that it is the highest CUDA version this driver can support.

nvcc -V

(screenshot of the nvcc -V output)
The nvcc -V command shows that the CUDA version I installed is 11.0; anything at 11.0 or above should be fine.

For 3090-series cards you must make sure the CUDA version is at least 11.0. Pay attention to this!!!

If nvcc -V does not output a CUDA version and you were about to install CUDA here, my advice is: don't install CUDA yet, and go straight to configuring TensorFlow and PyTorch later. This is the second pit: during the later configuration you will find that cudatoolkit gets installed again inside the conda virtual environment, so a system-wide CUDA install is not necessary and everything can live in the virtual environment. So if you have read this far, you can pause here; there is no need to worry about configuring CUDA first, unless you need it for something else.
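
As a quick confirmation of this point once the virtual environment from sections 4 to 6 below exists, you can ask the frameworks themselves which CUDA they use, without any system-wide nvcc. A minimal sketch, assuming TensorFlow and PyTorch are already installed in the active conda environment (the build-info call is available in recent TF 2.x releases):

# Ask the frameworks, from inside the conda env, which CUDA they are built with;
# no system-wide CUDA/nvcc installation is needed for this.
import tensorflow as tf
import torch

print(tf.__version__, tf.sysconfig.get_build_info().get("cuda_version"))
print(torch.__version__, torch.version.cuda)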

4. tensorflow-gpu configuration

Here I use Anaconda3 for the environment, so all of the following commands are run inside a conda virtual environment.

During the tensorflow-gpu installation, a cudatoolkit package is installed along with it. Before confirming the installation at the Yes/No prompt, be sure to check both the tensorflow-gpu version being installed and the cudatoolkit version it pulls in, and make sure the latter is at least 11.0.
(The screenshot here listed all the cudatoolkit versions available in conda; conda search cudatoolkit produces the same list.)

5. The third pit:

The tensorflow-gpu available through conda is too old, so use pip to install a newer version.
If you install with conda, i.e.

conda install tensorflow-gpu

then under python=3.6 the highest available version is only 2.4.1. But tensorflow-gpu==2.4.1 is installed together with cudatoolkit==10.1, which cannot be used for accelerated computation on a 3090 Ti. When you later run a program, you will find that the data gets placed in GPU memory, but no computation actually happens.
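
A quick way to catch this situation is to force a small computation onto the GPU and look at the result, rather than only checking whether the GPU is listed. A minimal sketch (exact log messages will differ between setups):

# Minimal GPU smoke test: the GPU merely being listed is not enough,
# an actual kernel has to run and return a result.
import tensorflow as tf

print(tf.config.list_physical_devices("GPU"))      # should list the 3090 Ti

with tf.device("/GPU:0"):
    a = tf.random.normal((1024, 1024))
    b = tf.random.normal((1024, 1024))
    c = tf.matmul(a, b)
print(c.device, float(tf.reduce_sum(c)))           # may hang or fail if the CUDA build is too old for Ampere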

# list the versions available in the conda repositories
conda search tensorflow-gpu

(screenshot of the conda search tensorflow-gpu output)
So here we install with pip instead; the PyPI index carries newer versions of tensorflow-gpu.

# first list the tensorflow-gpu versions available through pip
pip install tensorflow-gpu==

Install tensorflow-gpu==2.6.2 in the python=3.6 environment:

pip install tensorflow-gpu==2.6.2

Install tensorflow-gpu==2.8.0 in the python=3.7 environment:

pip install tensorflow-gpu==2.8.0

Then use conda to install cudatoolkit==11.3.1

conda install cudatoolkit==11.3.1
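
After the pip install of tensorflow-gpu and the conda install of cudatoolkit, it is worth verifying that this build actually sees the card before moving on. A minimal sketch:

# Confirm the freshly installed tensorflow-gpu is a CUDA build and sees the card
import tensorflow as tf

print(tf.__version__)                          # expect 2.6.2 here (or 2.8.0 in the python=3.7 env)
print(tf.test.is_built_with_cuda())            # expect True
print(tf.config.list_physical_devices("GPU"))  # expect one entry for the 3090 Ti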

6. pytorch configuration

The PyTorch configuration works the same way as TensorFlow. Note that before confirming the installation you should check that the cudatoolkit version being used is at least 11.0; if it is not, install a newer version of PyTorch.
In the python==3.6 environment here, I installed PyTorch 1.10.0 or above.
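
To confirm that the conda-installed PyTorch and cudatoolkit can actually drive the 3090 Ti, a check from inside the environment looks something like the sketch below (take the exact conda install command for your versions from pytorch.org):

# Verify that PyTorch can use the 3090 Ti (Ampere needs a CUDA >= 11.0 build)
import torch

print(torch.__version__, torch.version.cuda)   # e.g. 1.10.2 and 11.3
print(torch.cuda.is_available())               # expect True
print(torch.cuda.get_device_name(0))           # expect the 3090 Ti
print(torch.cuda.get_device_capability(0))     # expect (8, 6) for Ampere GA102

x = torch.randn(1024, 1024, device="cuda")
print((x @ x).sum().item())                    # fails or warns if the build lacks sm_86 kernels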

I also ran into a problem caused by the python=3.6 patch version being too old; upgrading to python==3.6.13 solved it, so try to use the newest patch release of whichever Python version you choose.

I hit other small problems along the way as well, such as packages that work under python=3.7 but not under 3.6.

Summary:

1. When installing a deep learning framework, the CUDA toolkit can be installed inside the virtual environment; no system-wide CUDA configuration is needed.

2. When installing tensorflow-gpu and pytorch for a 3090 Ti, make sure the cudatoolkit version is at least 11.0. If installing other packages would downgrade cudatoolkit, be careful, and check whether a newer version can be installed instead.

3. Some packages available through conda are outdated and can be installed with pip instead.


Finally, here are the yml configuration files for my two environments. With the same 3090 Ti graphics card you can look at them, or install my configuration directly:

py36
python=3.6.13
tensorflow-gpu=2.6.2
tensorboard=2.6.0
keras=2.6.0
pytorch=1.10.2
scikit-learn=0.24.2
cudatoolkit=11.3.1
configuration file download

py37
python=3.7.13
tensorflow-gpu=2.8.0
tensorboard=2.8.0
keras=2.8.0
pytorch=1.11.0
scikit-learn=1.0.2
cudatoolkit=11.3.1
configuration file download

# recreate my virtual environment
conda env create -f environment.yml
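
Once the environment has been recreated, a short combined sanity check (a minimal sketch, assuming the py36 environment above) confirms that both frameworks see the card:

# Combined sanity check after "conda env create -f environment.yml"
import tensorflow as tf
import torch

print("tensorflow:", tf.__version__, tf.config.list_physical_devices("GPU"))
print("pytorch   :", torch.__version__, torch.cuda.is_available())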

I hope this helps a few people and spares them some pitfalls. If anything is wrong or unclear, please point it out and let's discuss~
