Run deep learning models on GPU servers

1 Problem background

Recently, while training a deep learning crowd-counting model on my local Windows machine, I found that the dedicated GPU memory of my laptop's NVIDIA GeForce GTX 1650 is only 4 GB. That makes it impossible to train with a larger batch size, so training takes a long time and is prone to out-of-memory errors. I therefore considered renting a GPU server for model training. Because GPU servers have high hardware requirements and are expensive, it is worth keeping the cost as low as possible: if the training task can be finished in a short time, you can use Tencent Cloud's 1-yuan seven-day trial for new users, or rent a server that is billed by the hour.

        Since I had previously used Anaconda on Windows for virtual environment management and model training, I was initially unsure how to operate the GPU server (a Linux system) after purchasing it, so I am recording the process here.

2 Solution

2.1 Log in to the server

        Since the GPU server must be accessed remotely before it can be used, I use Xshell for command-line operations and Xftp for file transfer.
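If you prefer a plain terminal over Xshell, the same login and transfer can be done with the standard OpenSSH tools. The IP address, user name, and file paths below are placeholders for illustration:

```shell
# Log in to the server over SSH (replace <server-ip> with the public IP
# shown in your cloud console; many cloud images default to the root user):
ssh root@<server-ip>

# Copy a file to the server, the same role Xftp plays:
scp ./train.py root@<server-ip>:/root/project/
```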

        Both Xshell and Xftp can be downloaded from the official NetSarang website. The enterprise edition is paid, while the home/school edition is free to use.

        Scroll to the bottom of the page and click "Free for Home/School" in the download section, as shown in the picture below.

        After you fill in the required information, a download link is sent to your email address free of charge, as shown in the figure below.

2.2 Manage the virtual environment

        Use the Xshell downloaded in the previous step to log in to the GPU server, and use Xftp for file transfer.

        After logging in to the GPU server, install Miniconda3 from the command line. The steps are as follows:

        ① Download the miniconda3 installation package:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

        ② Run the downloaded installer (follow the prompts, and agree to add the environment variables during installation):

sh Miniconda3-latest-Linux-x86_64.sh 
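If you want an unattended install, the installer also supports a non-interactive batch mode. This is a sketch, assuming the /root/miniconda3 install prefix used in the rest of this article:

```shell
# -b: batch (non-interactive) mode, accepts the license automatically
# -p: install prefix; /root/miniconda3 matches the path used below
bash Miniconda3-latest-Linux-x86_64.sh -b -p /root/miniconda3
```

Note that batch mode does not modify ~/.bashrc, so the PATH configuration in step ③ is still required.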

        ③ Run the activation test from the bin folder of the Miniconda installation directory (usually /root/miniconda3):

source activate

        PS: The error "bash: activate: No such file or directory" may appear here; in that case you only need to configure the environment variable.

sudo vim ~/.bashrc

        Then append the path to miniconda3/bin as the last line:

export PATH="/root/miniconda3/bin:$PATH"
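The edit can also be done entirely from the command line. The snippet below appends the line, reloads the configuration, and prints a confirmation once the directory is on PATH:

```shell
# Append the export line to ~/.bashrc and reload it:
echo 'export PATH="/root/miniconda3/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

# Verify that the directory is now on PATH:
echo "$PATH" | grep -q "/root/miniconda3/bin" && echo "PATH configured"
```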

        ④ Test whether the installation is successful.

conda env list

        If conda's built-in base environment is listed and no error is reported, the Miniconda installation is complete and you can use it to manage virtual environments.

2.3 Create a virtual environment

        Command to create a virtual environment:

conda create -n [env_name] python=[version] [package_name ...]

        Activate the virtual environment:

source activate [env_name]

        Exit the virtual environment:

source deactivate

        Delete the virtual environment:

conda remove -n [env_name] --all
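Putting the four commands together, a typical session looks like this (the environment name crowdcount and Python 3.8 are illustrative choices, not requirements):

```shell
conda create -n crowdcount python=3.8 -y   # create the environment
source activate crowdcount                 # activate it
conda env list                             # the active env is marked with *
source deactivate                          # leave the environment
conda remove -n crowdcount --all -y        # delete it when no longer needed
```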

2.4 Configure mirror source

        Take Tsinghua mirror source as an example:

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/msys2/

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/

conda config --set show_channel_urls yes
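Each `--add` command prepends to the channel list, which conda stores in ~/.condarc. Assuming the three channels above were added in that order, the file should look roughly like this:

```shell
cat ~/.condarc
# channels:
#   - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
#   - https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
#   - https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/msys2/
#   - defaults
# show_channel_urls: true
```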

        Check the successfully installed packages:

conda list

        List all virtual environments:

conda env list

2.5 Installation of cuda and cudnn (taking pytorch as an example)

        Here you can refer to my earlier article on resolving a mismatch between the installed CUDA runtime version and the driver version: https://blog.csdn.net/m0_59705760/article/details/125757532
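As a concrete example of the end state, a CUDA-enabled PyTorch build can be installed into the active environment and checked against the GPU. The cudatoolkit version below is an assumption; match it to the CUDA version that `nvidia-smi` reports for your driver:

```shell
# Install PyTorch with GPU support from the pytorch channel
# (cudatoolkit=11.3 is an example version, not a requirement):
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch -y

# Verify that PyTorch can see the GPU; this should print True:
python -c "import torch; print(torch.cuda.is_available())"
```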

3 Summary

        After completing all of the steps above, you can run deep learning models on the GPU server (Linux operating system). If the server's hardware configuration is reasonably powerful, training speed improves significantly.


Origin blog.csdn.net/m0_59705760/article/details/128768125