Migrating an Anaconda deep learning virtual environment and configuring it on a cloud server

1 Anaconda virtual environment operations

1. View the virtual environment

conda info -e

2. Create a new virtual environment

conda create -n deeplearning_all pip python=3.6

3. Activate the newly created virtual environment

conda activate deeplearning_all
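
The check in step 1 can also be scripted. The following is a minimal sketch, assuming conda is on PATH and that `conda info --envs --json` returns a JSON object with an "envs" list of environment paths; the helper names parse_env_names and env_exists are made up for illustration.

```python
import json
import subprocess


def parse_env_names(conda_json):
    """Return the folder name of every environment listed in conda's JSON output."""
    data = json.loads(conda_json)
    names = []
    for path in data.get("envs", []):
        # Normalize separators so Windows paths parse on any platform.
        # Note: the base environment shows up under its install folder name.
        names.append(path.replace("\\", "/").rstrip("/").split("/")[-1])
    return names


def env_exists(name):
    """Ask conda for its environments and check whether `name` is among them."""
    result = subprocess.run(
        ["conda", "info", "--envs", "--json"],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # Python 3.6 compatible spelling of text=True
        check=True,
    )
    return name in parse_env_names(result.stdout)
```

Calling env_exists("deeplearning_all") after step 2 should return True if the environment was created.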

2 Library versions and installation commands (these versions are compatible with one another)

pip install numpy==1.16.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install scipy==1.4.1  # optional: scikit-learn installs scipy as a dependency
pip install pandas==0.21.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install patsy==0.5.1
pip install scikit-learn==0.23.1 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install imbalanced_learn==0.5.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install statsmodels==0.11.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
# CUDA 10.1
pip install torch==1.8.1+cu101 torchvision==0.9.1+cu101 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install --no-cache-dir tensorflow-gpu==2.3.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
conda install absl-py==1.3.0

pip install keras==2.4.3 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install matplotlib==3.3.4 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install xgboost==0.90 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install lightgbm==3.1.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install bayesian-optimization==0.6.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
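
To confirm that the pinned versions were actually installed, a small checker can compare the pins with what the environment reports. This is a sketch rather than part of the original procedure: the PINS table is copied from the commands above, and the helper names installed_version and check_pins are hypothetical.

```python
def installed_version(package):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        # Fallback for Python 3.6/3.7; pkg_resources ships with setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Pins copied from the pip commands in this section.
PINS = {
    "numpy": "1.16.0", "scipy": "1.4.1", "pandas": "0.21.0", "patsy": "0.5.1",
    "scikit-learn": "0.23.1", "imbalanced-learn": "0.5.0", "statsmodels": "0.11.0",
    "torch": "1.8.1", "torchvision": "0.9.1", "torchaudio": "0.8.1",
    "tensorflow-gpu": "2.3.0", "keras": "2.4.3", "matplotlib": "3.3.4",
    "xgboost": "0.90", "lightgbm": "3.1.0", "bayesian-optimization": "0.6.0",
}


def check_pins(pins, lookup=installed_version):
    """Return {package: (wanted, found)} for every pin that does not match."""
    mismatches = {}
    for package, wanted in pins.items():
        found = lookup(package)
        # Ignore local tags such as "+cu101" so the torch pins still match.
        if found is None or found.split("+")[0] != wanted.split("+")[0]:
            mismatches[package] = (wanted, found)
    return mismatches
```

Running check_pins(PINS) inside the activated environment returns an empty dict when every pin matches, and otherwise lists the packages to reinstall.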

After that, if any package is still missing, install it directly with pip.
P.S. Check whether TensorFlow and PyTorch can use the GPU:

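import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))  # an empty list means TensorFlow sees no GPU
import torch  # succeeds only if PyTorch is installed correctly
print(torch.cuda.is_available())  # whether CUDA is available
print(torch.cuda.device_count())  # number of visible CUDA devices
print(torch.version.cuda)  # CUDA version this PyTorch build targets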
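
If one of the two frameworks fails to import, the check above stops at the first error. A more forgiving variant, sketched below with the hypothetical helpers probe and gpu_report, reports each framework separately and turns import or driver errors into messages instead of tracebacks.

```python
import importlib


def probe(module_name, check):
    """Import `module_name` and return check(module); report errors as text."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        return "not importable: {}".format(exc)
    try:
        return check(module)
    except Exception as exc:  # e.g. a driver or DLL problem surfacing at call time
        return "check failed: {}".format(exc)


def gpu_report():
    """Collect GPU visibility as seen by TensorFlow and PyTorch."""
    return {
        "tensorflow": probe(
            "tensorflow", lambda tf: tf.config.list_physical_devices("GPU")
        ),
        "torch": probe(
            "torch",
            lambda torch: {
                "available": torch.cuda.is_available(),
                "device_count": torch.cuda.device_count(),
                "cuda": torch.version.cuda,
            },
        ),
    }
```

Printing gpu_report() in the new environment shows both results side by side, which helps when only one framework has a version mismatch.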

3 Anaconda environment cloning and migration

Target host: a Windows system with the same Anaconda version as the source host.

Install Anaconda on the target host: download the installation package and run the installer.

3.1 Check the conda environment:

conda info --envs


3.2 Cloning the base environment

If you want to migrate the base environment, you need to clone it first (the base environment cannot be directly packaged)

conda create -n <new_env_name> --clone <old_env_name>

3.3 Install the conda-pack tool from the conda-forge channel

conda install -c conda-forge conda-pack

3.4 Packaging the environment

By default, conda pack writes the archive to the current working directory; in a newly opened prompt on Windows this is usually C:\Users\<username>.

conda pack -n <new_env_name> -o <new_env_name>.tar.gz
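
Steps 3.2 to 3.4 can be collected into one script. The sketch below only builds and optionally runs the same three conda commands; the function names and the dry_run switch are illustrative, and -y is conda's flag for skipping the confirmation prompt.

```python
import subprocess


def migration_commands(old_env, new_env):
    """Build the conda commands for steps 3.2 to 3.4 as argument lists."""
    return [
        ["conda", "create", "-y", "-n", new_env, "--clone", old_env],
        ["conda", "install", "-y", "-c", "conda-forge", "conda-pack"],
        ["conda", "pack", "-n", new_env, "-o", new_env + ".tar.gz"],
    ]


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

For example, run_all(migration_commands("base", "base_clone")) prints the commands for review; passing dry_run=False executes them in order and stops at the first failure.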


3.5 Copy the archive to the envs folder under the Anaconda path of the target host (same Anaconda version)

Create a folder for the new environment under envs and extract the archive into it:

tar -zxvf <archive_name>.tar.gz -C <env_folder>

3.6 Activate the environment

conda activate <new_env_name>

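The environment has now been migrated to the target host and is ready to use. The conda-pack documentation also recommends running conda-unpack once inside the activated environment to fix hard-coded path prefixes.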
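
If tar is not available on the target host, step 3.5 can be done with Python's standard library instead. This is a minimal sketch using the tarfile module; unpack_env is a hypothetical helper, and it should only be used on archives you created yourself, since extractall trusts the archive contents.

```python
import tarfile
from pathlib import Path


def unpack_env(archive, envs_dir, env_name):
    """Extract a conda-pack archive into <envs_dir>/<env_name> and return that path."""
    target = Path(envs_dir) / env_name
    # Like tar -C, extraction needs an existing folder, so create it first.
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive), "r:gz") as tar:
        tar.extractall(str(target))
    return target
```

The returned folder is the one that conda activate expects to find under envs.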

4 Building a deep learning environment on a Windows GPU cloud server

4.1 Select the driver, library, and software versions

Before installing the driver, you need a general understanding of how the CUDA, cuDNN, PyTorch, TensorFlow, and Python versions correspond, so that you can choose suitable versions for your actual configuration and avoid version-mismatch problems later.
Select the CUDA driver version
CUDA (Compute Unified Device Architecture) is a general-purpose parallel computing platform from the graphics card manufacturer NVIDIA that enables GPUs to solve complex computing problems. It includes the CUDA instruction set architecture (ISA) and the parallel computing engine inside the GPU.
1. Check the compute capability of the graphics card
When choosing the CUDA driver version, you need to know the compute capability of the graphics card. The reference setup in this article uses a Tesla P40, whose compute capability is listed as 6.1 on the NVIDIA official website, as shown below:

(The target host in this migration uses a T4.)

[Figure: compute capability of the Tesla P40 on the NVIDIA website]

2. Select the CUDA version
The relationship between the CUDA version and the compute capability of the graphics card is shown in the figure below. For the Tesla P40, CUDA 8.0 or later should be selected.
[Figure: CUDA versions and supported compute capabilities]

Select the graphics card driver version
After confirming the CUDA version, select the graphics card driver version. Refer to the correspondence between CUDA and driver versions shown in the figure below.

[Figure: CUDA versions and corresponding driver versions]
Select the cuDNN version
NVIDIA cuDNN is a CUDA-based, GPU-accelerated library for deep neural networks. It emphasizes performance, ease of use, and low memory overhead, and it can be integrated into higher-level machine learning frameworks such as Google's TensorFlow and UC Berkeley's Caffe. Its plug-in design lets developers focus on designing and implementing neural network models rather than on performance tuning.
Frameworks rely on cuDNN to run deep neural networks on the GPU, which is much faster than running them on the CPU, so install cuDNN if you want to run deep neural networks on CUDA. For the correspondence between cuDNN and CUDA versions, refer to the cuDNN Archive.
Select the PyTorch version
Select the PyTorch version that corresponds to the CUDA version. For matching version information, refer to the previous-versions page.
Select the TensorFlow version
TensorFlow is a little more complicated than PyTorch: each release also requires particular Python and compiler versions. The correspondence between the CPU and GPU builds and the Python, CUDA, and cuDNN versions is listed in:
TensorFlow version based on CPU version
TensorFlow version based on GPU version

The versions chosen here are CUDA 10.1, Python 3.6, PyTorch 1.8.1, and tensorflow-gpu 2.3.0.
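
PyTorch wheels for a specific CUDA release carry a local tag such as +cu101, which makes it easy to check that a wheel matches the chosen CUDA version. The sketch below is illustrative only: cuda_from_wheel_tag is a hypothetical helper, and it assumes the last digit of the tag is the minor version (cu92 is 9.2, cu101 is 10.1).

```python
import re


def cuda_from_wheel_tag(version):
    """Map a version such as '1.8.1+cu101' to its CUDA release, e.g. '10.1'.

    Returns None for CPU-only or untagged builds.
    """
    match = re.search(r"\+cu(\d+)$", version)
    if match is None:
        return None
    digits = match.group(1)
    # Assumption: the last digit is the minor version, the rest is the major one.
    return "{}.{}".format(digits[:-1], digits[-1])


CHOSEN_CUDA = "10.1"  # the CUDA version selected in this article

# The torch and torchvision wheels installed in section 2 both target CUDA 10.1.
for wheel in ("1.8.1+cu101", "0.9.1+cu101"):
    assert cuda_from_wheel_tag(wheel) == CHOSEN_CUDA
```

The same function can be applied to torch.__version__ after installation to confirm which CUDA build ended up in the environment.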

4.2 Operation steps

Install the graphics card driver
Use a browser to visit the NVIDIA official website and select the driver version for the graphics card. The configuration selected in this article is shown in the figure below:
[Figure: driver selection on the NVIDIA website]

After the download completes, double-click the installation package and follow the on-screen prompts to complete the installation.
Install CUDA
1. Open the CUDA Toolkit Archive and select the corresponding version. The reference screenshots use version 10.2 as an example, as shown in the figure below:
[Figure: CUDA Toolkit Archive]
2. On the "CUDA Toolkit 10.2 Download" page, select the corresponding system configuration. The configuration selected in this article is shown in the figure below:
[Figure: system configuration on the download page]
3. Click Download to start the download. For the versions chosen in section 4.1, download the latest CUDA 10.1 release instead of 10.2.
4. After the download completes, double-click the installation package and follow the on-screen prompts. Pay attention to the following step:
In the "CUDA Setup Package" window, the extraction path is only a temporary storage location and does not need to be changed; keep the default and click OK, as shown below:
[Figure: CUDA Setup Package window]

Configure environment variables
1. Right-click the Start button and select Run in the pop-up menu.
2. Enter sysdm.cpl in the Run window and click OK.
3. In the "System Properties" window, select the Advanced tab and click Environment Variables, as shown below:
[Figure: System Properties window, Advanced tab]

4. Select "Path" in "System Variables" and click Edit.
5. In the "Edit Environment Variable" window, create a new entry for each of the following paths.

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin 
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\libnvvp
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\lib\x64
C:\Program Files\NVIDIA Corporation\NVSMI

After editing, the list looks like the figure below:
[Figure: Path entries after editing]

6. Click OK 3 times in a row to save the settings.
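
After saving, the Path entries can be verified from a newly opened prompt. The following sketch compares the five directories listed above with a PATH string; missing_entries is a hypothetical helper, and the comparison ignores case, stray spaces, and trailing backslashes because Windows does as well.

```python
import ntpath

CUDA_ROOT = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1"
REQUIRED = [
    CUDA_ROOT,
    CUDA_ROOT + r"\bin",
    CUDA_ROOT + r"\libnvvp",
    CUDA_ROOT + r"\lib\x64",
    r"C:\Program Files\NVIDIA Corporation\NVSMI",
]


def _norm(entry):
    # Strip spaces and trailing separators, then lower-case the Windows way.
    return ntpath.normcase(entry.strip().rstrip("\\/"))


def missing_entries(required, path_value, sep=";"):
    """Return the required directories that do not appear in a PATH string."""
    present = {_norm(p) for p in path_value.split(sep) if p.strip()}
    return [p for p in required if _norm(p) not in present]
```

On the server, missing_entries(REQUIRED, os.environ.get("PATH", "")) (after import os) should return an empty list once all five entries are in place.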
Check the graphics driver and CUDA
1. Right-click the Start button and select Run in the pop-up menu.
2. Enter cmd in the Run window and click OK.
3. In the cmd window, execute the following command to check whether the graphics card driver is installed successfully.

nvidia-smi

Output like that in the figure below indicates that the graphics card driver has been installed successfully. The figure shows a GPU under load; while the GPU is running, this command also reports its usage.
[Figure: nvidia-smi output]

Execute the following command to check whether CUDA is installed successfully.

nvcc -V

Output like that in the figure below indicates that CUDA is installed successfully.
[Figure: nvcc -V output]
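
The nvcc check can also be automated. The sketch below assumes the usual nvcc -V output, which contains a line of the form "Cuda compilation tools, release 10.1, V10.1.243"; the helper names are illustrative.

```python
import re
import subprocess


def parse_nvcc_release(text):
    """Extract the toolkit release (e.g. '10.1') from nvcc -V output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run nvcc -V and return its release string; None if nvcc is not on PATH."""
    try:
        result = subprocess.run(
            ["nvcc", "-V"], stdout=subprocess.PIPE, universal_newlines=True
        )
    except OSError:  # covers FileNotFoundError when nvcc is missing
        return None
    return parse_nvcc_release(result.stdout)
```

With the versions chosen in this article, installed_cuda_release() is expected to return "10.1"; a None result suggests that the Path entries are not yet visible in the current prompt.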
Install cuDNN (see local documentation)
1. Go to the cuDNN Download page and click Archived cuDNN Releases to view more releases.
2. Find the required cuDNN version and download it.
3. Unzip the cuDNN package and copy the bin, include, and lib folders into the directory of the installed CUDA version. The reference steps use C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2; with CUDA 10.1 installed as in this article, the directory is C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1.
4. The installation of cuDNN is now complete.
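
To confirm that the cuDNN files landed where the frameworks look for them, you can search the CUDA directory for the cuDNN DLL. This is a small sketch with a hypothetical find_file helper; the file name cudnn64_7.dll is taken from the error message discussed in the next section, and the v10.1 path assumes the CUDA version chosen in this article.

```python
from pathlib import Path


def find_file(filename, roots):
    """Return every path under the given roots whose file name equals `filename`."""
    hits = []
    for root in roots:
        root = Path(root)
        if root.is_dir():
            hits.extend(sorted(root.rglob(filename)))
    return hits


if __name__ == "__main__":
    cuda_dir = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1"
    # An empty list means the DLL was not copied into this CUDA installation.
    print(find_file("cudnn64_7.dll", [cuda_dir]))
```

A hit inside the bin folder, which is on Path after the environment variable step, is what TensorFlow needs in order to load the library.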

Possible follow-up problems (continuously updated)

1. Could not load dynamic library 'cudnn64_7.dll'; dlerror: cudnn64_7.dll not found
Solution: Download the file and place it in the C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA directory, in a folder that is listed in Path (for example the bin folder of the installed version).
2. [Python and TensorFlow error] ModuleNotFoundError: No module named 'termcolor', even though pip3 show termcolor reports that the package is installed.
Solution: Uninstall and reinstall termcolor
3. Importing TensorFlow on an NVIDIA Jetson Xavier NX raises AttributeError: module 'wrapt' has no attribute 'ObjectProxy'
Solution: pip3 install wrapt==1.11.1
Reference: an expert's article

Origin blog.csdn.net/weixin_46043195/article/details/128167232