Using screen to run model training in the background
Introduction
When we run programs or train models on a remote or cloud server, we keep hitting the same problem: the network connection drops unexpectedly and the training stops with it, which is infuriating. Running the job inside screen solves this: the job keeps running in the background on the server, so training continues even after you are disconnected.
Instructions
1. Installation:
First check whether screen is already installed on the system.
Enter the following command in the terminal. If it prints a path, screen is installed.
which screen
If nothing is printed or an error is reported, install screen with the command for your system.
Ubuntu:
apt-get install screen
CentOS:
yum install screen
Run which screen again afterwards; if a path is displayed, the installation succeeded.
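The check-then-install steps above can be sketched as a small shell helper. This is only an illustration: the function names install_cmd_for and ensure_screen are made up for this example, the -y flag (answer yes to prompts) is an addition not in the original commands, and the script assumes it is run as root, as the commands above are.

```shell
#!/bin/sh
# Print the install command for a given package manager.
# install_cmd_for is a hypothetical helper name, not part of screen.
install_cmd_for() {
    case "$1" in
        apt-get) echo "apt-get install -y screen" ;;   # Ubuntu
        yum)     echo "yum install -y screen" ;;       # CentOS
        *)       echo "unsupported package manager: $1" >&2; return 1 ;;
    esac
}

# Install screen only if it is missing, using whichever package manager exists.
ensure_screen() {
    if command -v screen >/dev/null 2>&1; then
        echo "screen already installed at $(command -v screen)"
        return 0
    fi
    for pm in apt-get yum; do
        if command -v "$pm" >/dev/null 2>&1; then
            cmd=$(install_cmd_for "$pm") || return 1
            echo "running: $cmd"
            $cmd
            return $?
        fi
    done
    echo "no supported package manager found" >&2
    return 1
}

# Usage (as root):
#   ensure_screen
```

command -v is used here instead of which because it is built into the shell and returns a reliable exit status.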
2. Create a screen session
Enter the following command to create a screen session, where test1 is the session name and can be anything you like. Note that the option is an uppercase -S; lowercase -s sets the shell instead.
screen -S test1
After pressing Enter, you will enter this window.
If one window is not enough and you want to run several jobs at the same time, you can open another window inside the same session:
Press Ctrl+A, then press C
This opens a new window; repeat it to create as many windows as you need.
To switch between these windows:
Press Ctrl+A, then press Shift+quote (the " key)
The terminal will list the windows you created. Use the up and down arrow keys to select one, then press Enter to switch to it. This way you can run several training jobs at once without leaving resources idle. To detach from screen and return to your main terminal while the jobs keep running, press:
Ctrl+A, then D
Note: If this shortcut is captured by another program on your machine (a screenshot tool, for example), you can simply close the terminal connection instead; the jobs keep running inside screen either way.
To delete a session, enter the following command, where test1 is the session name you chose earlier.
screen -S test1 -X quit
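If you do not need to type inside the session, the create, run, and detach steps can be combined: screen's -d -m options start a session already detached, and any arguments after the session name are run as the command. The sketch below wraps this in a helper; start_training and the DRY_RUN variable are hypothetical names for this example, and python main.py stands in for your own training command.

```shell
#!/bin/sh
# Start a command inside a new, detached screen session.
#   -d -m : create the session without attaching to it
#   -S    : give the session a name
# start_training and DRY_RUN are hypothetical, for illustration only.
start_training() {
    name=$1; shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command that would be run.
        echo "screen -dmS $name $*"
    else
        screen -dmS "$name" "$@"
    fi
}

# Usage:
#   start_training test1 python main.py   # launch in the background
#   screen -r test1                       # attach to watch progress
#   screen -S test1 -X quit               # stop and delete the session
```

A session started this way ends when the command exits, so it disappears from the session list once training finishes.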
3. Recover after disconnection
As long as the cloud server itself stays up, your model keeps training after the network drops or you close the connection yourself. When you connect again, run the following command to reattach and check progress. If there is only one detached session, it is reattached directly; if there are several, screen lists them and you choose one by name, for example screen -r test1.
screen -r
If reattaching fails with the error "There is no screen to be resumed matching ...", the session is still in the Attached state. Use the following command instead, where 45612 is the ID of the session, which is displayed where the error was reported.
screen -d -r 45612
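The choice between screen -r and screen -d -r can be automated by reading the state that screen -ls prints next to each session. The following is a rough sketch rather than a robust tool: session_state and reattach are hypothetical helper names, and it assumes simple session names (letters and digits) and the usual "ID.name ... (Attached)" or "(Detached)" line format.

```shell
#!/bin/sh
# Read `screen -ls` output on stdin and print the state of the named
# session: "attached", "detached", or "missing".
session_state() {
    name=$1
    # Session lines look like: <tab>16388.test1<tab>(Detached)
    line=$(grep -E "^[[:space:]]*[0-9]+\.${name}[[:space:]]" || true)
    case "$line" in
        *"(Attached)"*) echo attached ;;
        *"(Detached)"*) echo detached ;;
        *)              echo missing ;;
    esac
}

# Reattach to a session, detaching it from its old connection first if needed.
reattach() {
    case "$(screen -ls | session_state "$1")" in
        detached) screen -r "$1" ;;
        attached) screen -d -r "$1" ;;
        *)        echo "no session named $1" >&2; return 1 ;;
    esac
}

# Usage:
#   reattach test1
```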
4. Others
You can use the following command to list the current sessions:
screen -ls
The result is as shown in the figure. The leading number, such as 16388, is the session ID; the session name follows it after the dot.
You can view all shortcut keys in the following way.
Ctrl+A, then Shift+question mark (the ? key)
The result is as shown in the figure.
Note: The following command removes sessions that screen reports as dead, so that they no longer appear in the list. It does not affect running sessions, and it is rarely needed under normal circumstances.
screen -wipe
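When connecting to the remote server from a Windows machine, you can also use the following command to run training in the background. nohup keeps the job running after you log out, and the trailing & puts it in the background.
nohup python main.py &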
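By default nohup writes the program's output to a file called nohup.out. A slightly fuller sketch, assuming you would rather choose the log file yourself and keep the process ID so the job can be stopped later, could look like this; run_in_background and train.log are hypothetical names for this example.

```shell
#!/bin/sh
# Run a command under nohup in the background, send stdout and stderr to a
# log file, and print the PID of the background job.
# run_in_background is a hypothetical helper name.
run_in_background() {
    log=$1; shift
    nohup "$@" >"$log" 2>&1 &
    echo $!
}

# Usage:
#   pid=$(run_in_background train.log python main.py)
#   tail -f train.log    # follow the training output
#   kill "$pid"          # stop training
```

Unlike screen, nohup gives you no terminal to return to, so the log file is the only way to see what the job is doing.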