After getting a new GPU cloud server: how to deploy the environment to train a model

If you run the model on Colab, you don't need to deploy a separate environment: put the model and dataset on Google Drive, mount the drive in Colab, and start training.

But on a brand-new server with no environment configured, we have to deploy the environment ourselves before we can train the model.
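
Before installing anything, it is worth confirming that the GPU driver is already present (most GPU cloud images ship with one). This is an optional sanity check, not part of the original steps:

nvidia-smi    # should list the GPU(s), driver version and supported CUDA version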

1. Install Anaconda

1.1 Download the installation package

wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2022.05-Linux-x86_64.sh

The file name after the last "/" is the version of the Anaconda installer; it can be replaced with any other installer. The available versions are listed at: https://repo.anaconda.com/archive/
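
For example, a different release can be fetched from the same mirror in the same way (Anaconda3-2023.09-0-Linux-x86_64.sh is only an example file name; check the archive page for the names that actually exist):

wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-2023.09-0-Linux-x86_64.sh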

1.2 Run the installer

bash Anaconda3-2022.05-Linux-x86_64.sh

Pay attention to the location and the version (file name) of the installation package: run this command from the directory that actually contains the installer.
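
The command above walks through an interactive prompt. A minimal non-interactive sketch of the same step, assuming the installer sits in the current directory (-b accepts the license silently, -p sets the install prefix):

ls Anaconda3-2022.05-Linux-x86_64.sh            # confirm the installer is in the current directory
bash Anaconda3-2022.05-Linux-x86_64.sh -b -p $HOME/anaconda3
source $HOME/anaconda3/bin/activate              # make conda available in this shell
conda init bash                                  # make conda available in future shells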

1.3 Check whether the installation is successful

conda -V

Note: V is uppercase.

This command can be executed from any path. If it prints the conda version, Anaconda has been installed successfully.
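
Optionally, it is common to create an isolated environment for the project rather than installing everything into base. A sketch, where the environment name torch-env and the Python version are just example choices:

conda create -y -n torch-env python=3.9    # -y skips the confirmation prompt
conda activate torch-env
python -V                                  # confirm the interpreter inside the new environment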

2. Install the additional Python third-party libraries the model needs

Since Anaconda does not ship with every Python third-party library, some libraries the model requires are not in the default environment and have to be installed separately. When experimenting with my model, I found that the following libraries needed extra installation.

Some libraries cannot be installed with the conda command; in that case, try installing them with pip instead.

conda install scipy
pip install scikit-learn     # installing with conda fails; the PyPI package is scikit-learn (the old "sklearn" alias is deprecated)
pip install torchsummary     # installing with conda fails
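
A quick way to confirm that the extra libraries are usable, and (assuming the model is a PyTorch model, since torchsummary is used, and that torch is already installed) that the GPU is visible:

python -c "import scipy, sklearn, torchsummary; print('imports ok')"
python -c "import torch; print(torch.cuda.is_available())"    # should print True if the GPU setup works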

3. Start training the model

Then you can start training the model: go to the directory containing the model's .py file and run it (dataset configuration and other model-specific issues are dealt with separately).

python XXX.py

I generally use the tee command so that each line of the training output is shown in the terminal and written to a txt file at the same time.

python XXX.py | tee record.txt

Each line of output is displayed on screen and simultaneously recorded in record.txt.

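On a remote GPU server it also helps to keep a long training run alive after the SSH session closes. A minimal sketch using the same placeholder script and log names; nohup is only one option (tmux or screen work just as well):

nohup python XXX.py > record.txt 2>&1 &    # keep running after logout, capturing stdout and stderr
tail -f record.txt                         # follow the live output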
