Running a program in the background on Linux to train a model (screen command)

Using screen to train a model in the background

Introduction

When we run programs or train models on a remote or cloud server, one problem keeps coming up: the network connection drops unexpectedly, the terminal session closes, and the training process stops with it, which is extremely frustrating. Running the job inside screen solves this problem very well: the session lives on the server, so training continues after you are disconnected, and you can reattach to it later.

Instructions

1. Installation:
First determine whether screen is installed on the system.
Enter the following command in the terminal. If it prints the path of the screen executable, screen is already installed.

which screen 

If nothing is printed or an error is reported, install screen with the command for your system (root privileges are usually required, for example via sudo).
Ubuntu system:

apt-get install screen

CentOS system

yum install screen

After installing, run which screen again; if a path is displayed, the installation succeeded.
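
The check-then-install steps above can be sketched as a small script. This is only a minimal sketch, assuming a POSIX shell; install_cmd_for is a hypothetical helper name, and the sudo and -y options are assumptions that are not part of the commands above. The script only prints a suggestion and does not install anything.

```shell
#!/bin/sh
# install_cmd_for PM: print an install command for the given package manager.
# Hypothetical helper; "sudo" and "-y" are assumptions, adjust for your system.
install_cmd_for() {
    case "$1" in
        apt-get) echo "sudo apt-get install -y screen" ;;
        yum)     echo "sudo yum install -y screen" ;;
        *)       echo "unsupported package manager: $1" ;;
    esac
}

if command -v screen >/dev/null 2>&1; then
    # Same information as `which screen`: the path of the executable.
    echo "screen is installed at: $(command -v screen)"
else
    # Suggest (but do not run) the install command for the first manager found.
    for pm in apt-get yum; do
        if command -v "$pm" >/dev/null 2>&1; then
            echo "screen not found; try: $(install_cmd_for "$pm")"
            break
        fi
    done
fi
```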

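2. Create a screen session.
Enter the following command to create a screen session, where test1 is the name of the session and can be customized. Note that the option is an uppercase -S; a lowercase -s sets the shell to use instead of naming the session.

screen -S test1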

After pressing Enter, you are placed inside the new session.
If one window is not enough and you want to run several jobs at once, you can open another window inside the session with one key sequence.

Press Ctrl+A on the keyboard, then press C

A new window then opens; repeat this to create as many windows as you need.
If you want to switch between these windows,

Press Ctrl+A on the keyboard, then press Shift+quote (")

The terminal lists the windows you created. Use the up and down arrow keys to select one, then press Enter to switch to it. This way several training jobs can run side by side without leaving resources idle. If you want to leave the screen session and return to the main terminal while training keeps running, detach by pressing:

Ctrl+A, then D

Note: If this key combination is intercepted by another program (for example, a screenshot hotkey), you can ignore it and simply close the terminal. By default, screen detaches the session when the connection hangs up, so the job keeps running either way.

If you want to terminate the session, enter the following command, where test1 is the session name you chose earlier.

screen -S test1 -X quit
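
Creating a session, starting the job, and detaching can also be combined into one command with screen's -d -m options, which start a session that is already detached. The following is a minimal sketch, assuming the training entry point is python main.py (the script name used at the end of this article); start_session and DRY_RUN are hypothetical names introduced only for illustration.

```shell
#!/bin/sh
# start_session NAME CMD...: run CMD in a new screen session that starts detached.
# With DRY_RUN=1 the screen command is only printed, not executed.
start_session() {
    name=$1
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "screen -dmS $name $*"
    else
        screen -dmS "$name" "$@"
    fi
}

# Preview only; set DRY_RUN=0 to actually start the session.
DRY_RUN=1
start_session test1 python main.py
```

A session started this way ends on its own when the command finishes, so it is worth writing logs to a file if you need the output afterwards.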

3. Recover after disconnection
As long as the cloud server itself does not shut down, your model keeps training after the network is disconnected or you close the connection yourself. When you connect again, run the following command to reattach and check on the run. If there is only one detached session, it is reattached directly; if there are several, screen lists them, and you can reattach to a specific one with screen -r test1.

screen -r

If the session cannot be restored and the error "There is no screen to be resumed" is reported, the session is still in the Attached state.
Use the following command instead, where 46512 is the ID of the session, which is displayed together with the error message.

screen -d -r 46512
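
The choice between screen -r and screen -d -r can be made from the session list that screen -ls prints. The following is a minimal sketch, assuming each session line starts with an ID and a name joined by a dot and ends with (Attached) or (Detached); resume_cmd is a hypothetical helper, and the listing piped into it is made-up sample text, not real server output.

```shell
#!/bin/sh
# resume_cmd NAME: read `screen -ls` output on stdin and print the command that
# reattaches the session called NAME. Prints nothing if NAME is not listed.
resume_cmd() {
    awk -v name="$1" '
        # Session lines start with "<id>.<name>" and end with the state.
        $1 ~ /^[0-9]+\./ {
            sname = substr($1, index($1, ".") + 1)
            if (sname != name) next
            if ($NF ~ /Attached/) print "screen -d -r " $1
            else                  print "screen -r " $1
        }
    '
}

# Illustrative listing (made-up IDs, not real output from a server).
printf '\t46512.test1\t(Attached)\n\t16388.test2\t(Detached)\n' | resume_cmd test1
```

On a real server the input would come from screen itself, as in screen -ls | resume_cmd test1.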

4. Others
You can use the following command to list your current sessions and their state

screen -ls

The result is as shown in the figure. The number at the start of each entry, such as 16388, is the session ID, and the session name follows it after the dot.
You can view all shortcut keys in the following way.

Ctrl+A, then Shift+question mark (?)

The result is as shown in the figure.
Note: The following command removes dead sessions from the session list. It does not terminate running sessions, and it is rarely needed under normal circumstances.

screen -wipe

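When connecting to a remote system (for example from Windows), you can also use the following command to run training in the background without screen. The trailing & puts the job in the background, and output goes to nohup.out by default.

nohup python main.py &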
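
With plain nohup it is easy to lose track of the log and the process ID. The following is a minimal sketch that redirects output to a log file of your choosing and remembers the PID. The names run_in_background, BG_PID, and the log path are hypothetical, and the echo command is only a stand-in for python main.py so that the example runs anywhere.

```shell
#!/bin/sh
# run_in_background LOG CMD...: start CMD under nohup, send stdout and stderr
# to LOG, and store the process ID in BG_PID.
run_in_background() {
    log=$1
    shift
    nohup "$@" >"$log" 2>&1 &
    BG_PID=$!
}

# Stand-in command for the demo; in practice something like:
#   run_in_background train.log python main.py
demo_log="${TMPDIR:-/tmp}/train_demo.log"
run_in_background "$demo_log" sh -c 'echo "training started"'
wait "$BG_PID"    # a real training job would be left running instead
cat "$demo_log"
```

Unlike a screen session, a job started this way cannot be reattached to; you follow it through the log file, for example with tail -f on the log.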

Origin blog.csdn.net/qq_43605229/article/details/124807674