Using nohup to run training programs on a server in the background without interruption, and viewing training output and visdom visualizations at any time

Using a server for remote training

I often train on a remote server, and some models, such as the Mask model, take a long time to train, so I regularly need to run long-lived programs. Because I log in over ssh, the program is killed when the network drops or I log out. The commands below are the ones I commonly use for remote training; they keep training running without interruption and let you check the model's visualization results at any time.

The nohup command and viewing its output

nohup is short for "no hang up", which simply means do not hang up.

The nohup command: if you are running a process and do not want it to end when you log out of your account, use nohup. It lets the process keep running after you log out or close the terminal.

By default, all of the program's output is redirected to a file named nohup.out in the current directory.
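
In essence, nohup makes the command it launches ignore the SIGHUP signal that is sent when the terminal closes. As a rough illustration only (this is an alternative done inside the script, not what nohup itself runs, and the function name ignore_hangup is hypothetical), a Python program on a Unix-like system can ignore the same signal:

```python
import signal


def ignore_hangup():
    """Ignore SIGHUP so this process survives the terminal closing.

    Assumption: a Unix-like system; SIGHUP is not defined on Windows.
    Returns the handler that was installed before the change.
    """
    return signal.signal(signal.SIGHUP, signal.SIG_IGN)


if __name__ == "__main__":
    ignore_hangup()
    # ... a long-running training loop would go here ...
```

Unlike nohup, this sketch does not redirect anything, so the script's output would still need to be sent to a file.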

Writing to the default output file

nohup python3 -W ignore xxx.py &

With this form, all output is saved to nohup.out in the current directory and can be read while the program runs; running ls in the folder shows the file.

Writing to a specified output file

nohup python -u xxx.py  > my_out_file.txt 2>&1 &

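All output of xxx.py is saved to my_out_file.txt. The -u flag turns off Python's output buffering so lines reach the file promptly, and 2>&1 sends error output to the same file.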
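
When output is redirected to a file, Python buffers it, which is why the command above passes -u. If -u cannot be used, a similar effect for individual messages can be had by flushing explicitly inside the script. A minimal sketch; log_progress and its parameters are hypothetical names, not part of any training framework:

```python
import sys


def log_progress(epoch, loss, stream=None):
    """Write one progress line and flush it immediately."""
    stream = sys.stdout if stream is None else stream
    # flush=True pushes the line out of Python's buffer right away
    print(f"epoch {epoch} loss {loss:.4f}", file=stream, flush=True)


if __name__ == "__main__":
    # Made-up loss values, only to show the output format
    for epoch, loss in enumerate([0.9, 0.5, 0.3], start=1):
        log_progress(epoch, loss)
```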

Combining with CUDA

A server may have several GPUs available. Suppose we only want to use the second and fourth GPUs, but still want the code to see two GPUs numbered 0 and 1. The environment variable CUDA_VISIBLE_DEVICES solves this problem.
For example:

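CUDA_VISIBLE_DEVICES=1  Only GPU 1 is visible to the program; in code, gpu[0] refers to this GPU.
CUDA_VISIBLE_DEVICES=0,2,3  Only GPUs 0, 2 and 3 are visible to the program; in code, gpu[0] refers to GPU 0, gpu[1] to GPU 2, and gpu[2] to GPU 3.
CUDA_VISIBLE_DEVICES=2,0,3  Only GPUs 0, 2 and 3 are visible to the program, but in code gpu[0] refers to GPU 2, gpu[1] to GPU 0, and gpu[2] to GPU 3.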
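
The renumbering in the three examples above can be sketched in plain Python. The helper logical_to_physical is a hypothetical name, not part of CUDA or any framework; it only mimics how each position in the list becomes the index the code sees, and it assumes the value is a comma-separated list of integer IDs:

```python
def logical_to_physical(visible_devices):
    """Map the GPU index seen in code to the physical GPU ID.

    Assumption: visible_devices is a comma-separated list of integer IDs,
    e.g. "2,0,3"; other forms of the variable are not handled here.
    """
    ids = [int(part) for part in visible_devices.split(",") if part.strip()]
    # The position in the list becomes the index the program sees.
    return dict(enumerate(ids))


if __name__ == "__main__":
    for value in ("1", "0,2,3", "2,0,3"):
        print(value, "->", logical_to_physical(value))
```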

Placing CUDA_VISIBLE_DEVICES before the command

CUDA_VISIBLE_DEVICES=1 nohup python3 -W ignore xxx.py &

Viewing the output in real time

Use vim nohup.out to view the file.
To see the latest output, press Shift+G inside the file to jump quickly to the end.
To jump to a specific line, type : followed by the line number.
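
vim shows the contents as they were loaded, so the file has to be reloaded to see newer output. As an alternative for quick checks, the last few lines of the log can be read from Python. A minimal sketch; tail_lines is a hypothetical helper, and it assumes the log is a text file such as nohup.out in the current directory:

```python
import os
from collections import deque


def tail_lines(path, n=10):
    """Return the last n lines of a text file, without trailing newlines."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        # A deque with maxlen keeps only the most recent n lines
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


if __name__ == "__main__":
    if os.path.exists("nohup.out"):
        print("\n".join(tail_lines("nohup.out", n=20)))
```

Running the function again picks up whatever has been written since the previous call.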

Using nohup with visdom

We sometimes use visdom to visualize training results, but when the session is logged out visdom is closed as well, which interrupts viewing. You can start visdom with nohup directly, so that even after you log out of the server the visualization results are still available on the specified port the next time you log in.

nohup python -m visdom.server
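
After logging back in, one way to confirm that the visdom server is still listening is to try a TCP connection to its port. A minimal sketch using only the standard library; is_port_open is a hypothetical helper, and the port 8097 is an assumption based on visdom's default, so adjust it if the server was started on a different port:

```python
import socket


def is_port_open(host="localhost", port=8097, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    Assumption: 8097 is used here because it is visdom's default port;
    change it if the server was started on another port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("visdom reachable:", is_port_open())
```

This only checks that the port accepts connections; it does not verify that the process behind it is visdom.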

Origin blog.csdn.net/weixin_46233323/article/details/104399179