Deep Learning Remote Server Configuration

  In deep learning research, many models need a GPU and a large amount of video memory to run. When a personal computer is not powerful enough, we usually run large-scale model code on the higher-performance servers provided by our institution. Before using such a server, you need to apply for a personal account from the server administrator and then configure the deep learning environment in your own working directory.

1. Install Anaconda on the server

  When configuring a Python environment for deep learning, we generally use the Anaconda package management tool (which bundles Python and many common modules). The corresponding version of Anaconda can be downloaded from the Anaconda archive at https://repo.anaconda.com/archive/index.html (a domestic mirror such as Tsinghua's can also be used).

Since most servers run a Linux operating system, Anaconda3-2020.11-Linux-x86_64.sh (released in November 2020) is chosen here, and the installer is then transferred to your personal directory on the server with a file transfer tool (Xftp5 is recommended).

Enter the installation command in the terminal: bash Anaconda3-2020.11-Linux-x86_64.sh

Of course, you can also download and install directly from the command line:

  • Download: wget https://repo.continuum.io/archive/ followed by the name of the Anaconda version to download;
  • Install: bash followed by the downloaded installer file name
wget https://repo.anaconda.com/archive/Anaconda3-2020.11-Linux-x86_64.sh
bash Anaconda3-2020.11-Linux-x86_64.sh
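
If you prefer a non-interactive installation, the installer also accepts batch-mode flags; a minimal sketch (the install prefix below matches the author's path mentioned later and should be adjusted to your own home directory):

bash Anaconda3-2020.11-Linux-x86_64.sh -b -p /home/trainingl/anaconda3
# -b accepts the license in batch mode (no prompts), -p sets the installation prefix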

Configure the PATH environment variable: enter vi .bashrc in the terminal to edit the shell configuration file.

Add export PATH=/home/trainingl/anaconda3/bin:$PATH at the end of the file. Note that /home/trainingl/anaconda3/bin is my actual installation path; replace it with your own.

Test the environment and check the version: after Anaconda is installed on the server, reload the configuration and enter python in the terminal to check that the Anaconda-bundled version is picked up.
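
Putting these steps together, a minimal sketch of the PATH configuration and the check (again, the anaconda3 prefix is the author's path and should be replaced with your own):

echo 'export PATH=/home/trainingl/anaconda3/bin:$PATH' >> ~/.bashrc
source ~/.bashrc     # reload the shell configuration so the new PATH takes effect
python --version     # should report the Python bundled with Anaconda
conda --version      # confirms that conda itself is on the PATH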

2. Configure Tsinghua mirror source

  When using a Linux server, you inevitably need to download many packages that are not available locally. However, the default download sources are located abroad, so downloads are much slower than from domestic mirrors. Domestic cloud providers such as Alibaba Cloud and Tencent Cloud configure their own mirror sources by default, which is why package downloads on those machines are fast. Here I chose the Tsinghua mirror source.

Add the Tsinghua mirror channels for package management (execute in order):

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --set show_channel_urls yes
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/

View the added sources: conda config --show-sources
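
The commands above simply write to the ~/.condarc file; a rough sketch of what it should contain afterwards (channels added later appear higher in the list):

cat ~/.condarc
# channels:
#   - https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
#   - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
#   - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
#   - defaults
# show_channel_urls: true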

3. Create and manage virtual environments

First of all, we should figure out why we need to create a virtual environment at all. What exactly is a virtual environment?

  In practice we may face many different research tasks whose requirements on the Python environment are inconsistent, and some packages or libraries may have conflicting versions. For example, some deep learning tasks use PyTorch while others use TensorFlow, and we would like these two libraries to run in two separate Python environments. For this reason, Anaconda provides virtual environment management: with the conda command we can create different virtual environments under the envs directory, install the Python packages or modules each task needs, and switch between these environments at will without conflicts.

1. View all conda environments on the current system: conda env list

From the results we can see that there are two virtual environments: base is the basic environment automatically generated when Anaconda is installed, and the other one, named train, is a virtual environment I created.

2. Create a virtual environment: conda create -n envName python=3.7

Note: some basic packages are installed automatically when the virtual environment is created.

3. Activate the virtual environment: source activate envName

4. Exit the virtual environment: source deactivate (newer conda versions use conda deactivate)

Note: do not add an environment name when exiting; run the command on its own.

5. Delete a virtual environment: conda remove -n your_env_name --all (your_env_name is the name of the virtual environment)

6. Delete unnecessary packages from a virtual environment: conda remove --name your_env_name package_name

7. Enter an existing virtual environment and check which packages are installed: conda list
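
Putting the commands above together, a typical session might look like this (envName and the package names are placeholders):

conda create -n envName python=3.7    # create a new environment with Python 3.7
source activate envName               # activate it (newer conda: conda activate envName)
conda list                            # check which packages the environment contains
conda install numpy pandas            # install whatever the current task needs
source deactivate                     # leave the environment when finished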

4. Install the PyTorch deep learning framework

  Before installing the PyTorch deep learning framework, you first need to know the CUDA version on the current server. This cannot be queried from the server's CPU environment, so you need to switch to a GPU environment. A school server generally has multiple graphics cards, and you must enter the GPU environment corresponding to the card you have permission to use, then run nvcc -V to view the current CUDA version.
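
For reference, nvcc -V ends with a line like the following (the release number is what matters; the exact build string will differ from server to server):

nvcc -V
# Cuda compilation tools, release 9.0, V9.0.176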

You can see that the current CUDA version is 9.0. Considering that PyG (PyTorch Geometric) will be installed later, I installed PyTorch 1.7.1 built for CUDA 10.1 here. Find the corresponding installation command on the PyTorch official website according to these versions:
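
For reference, the command listed on the PyTorch "previous versions" page for this combination is roughly the following (run it inside the activated virtual environment, and double-check it against the official page for your own versions):

conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.1 -c pytorch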

Note: when installing the corresponding version of PyTorch, be sure to switch to the corresponding virtual environment first. In addition, some school servers only have Internet access in the CPU environment, while the GPU environment has no network and is used for computing only; in that case, after checking the version with nvcc -V, switch back to the CPU environment in time to run the installation.

The installation takes quite a long time. To test whether it was successful, switch from the CPU environment back to the GPU environment, and after entering the GPU environment be sure to enter the virtual environment where torch is installed before testing.

import torch
print(torch.__version__)          # should print 1.7.1
print(torch.cuda.is_available())  # True means PyTorch can see the GPU and use CUDA

It can be seen that PyTorch 1.7.1 is installed successfully and CUDA can be used normally.
