Apparent freezes when training a model with TensorFlow on a GPU

While training an SSD object-detection network with TensorFlow, I found that training would sometimes appear to freeze. I am recording what I saw here for your reference.

System:
Hardware: i5-8500, DDR4-2666 with 8 GB of memory, GTX 1070 (8 GB of video memory).
Software: Windows 10 64-bit, CUDA 10.0 (do not use 10.1), cuDNN 7.x, TensorFlow 1.15.0.
I will not keep you in suspense: system memory is the key.
Of course, with this many components having to run together with the system, there may well be other factors involved. I am only summarizing my own experience, in the hope of saving others some detours.
The figure shows a run with the software versions listed above: the marked steps take about 0.3 seconds each. On one occasion a step stalled for 110 seconds. The stall clears once the system load eases, and training then continues normally.
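To spot such stalls without watching the console, the time of each step can be recorded and compared with the typical step time. The following is only a minimal sketch: `time_steps`, `find_stalls`, and the `run_step` callable are hypothetical names (for example, `run_step` could wrap one `sess.run(train_op)` call), and the factor of 10 is an arbitrary threshold, not something measured in this post.

```python
import time
from statistics import median


def time_steps(run_step, num_steps):
    """Call run_step() num_steps times and return each call's duration in seconds."""
    durations = []
    for _ in range(num_steps):
        start = time.perf_counter()
        run_step()  # hypothetical: one training step
        durations.append(time.perf_counter() - start)
    return durations


def find_stalls(step_seconds, factor=10.0):
    """Return (index, duration) for steps slower than factor times the median."""
    if not step_seconds:
        return []
    typical = median(step_seconds)
    return [(i, t) for i, t in enumerate(step_seconds) if t > factor * typical]


if __name__ == "__main__":
    # Illustrative durations only, modelled on the 0.3 s and 110 s figures above.
    durations = [0.31, 0.29, 0.30, 110.0, 0.30, 0.32]
    print(find_stalls(durations))
```

Using the median rather than the mean keeps a single 110-second step from distorting what counts as a typical step.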
In general, the CPU load is moderate and the GPU compute load is moderate (I estimate the task is not heavy enough), but GPU memory is almost full.
Because I had PyCharm open while working, system memory once filled up, and I was prompted to close PyCharm.
My guess is that the system has to swap memory, and that is when training stalls. (This is not visible in Task Manager.)

I recommend at least 16 GB of system memory. If you only have 8 GB, you can switch to
CUDA 9, cuDNN 7.x, and TensorFlow < 1.13.0. (I tried version 1.11.0 and it works. Higher versions reportedly fail because the CUDA 10.0 library they call cannot be found.)
With this combination, CPU usage is relatively high, GPU usage is moderate, and system memory is fully loaded (as seen in Task Manager).
Running the same training task as a test, one step takes approximately 1.2 seconds.
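The "library cannot be found" problem usually comes down to which CUDA runtime DLL is on the PATH. The sketch below checks this using only the standard library. It rests on two assumptions that are not stated in this post: that TensorFlow 1.5 to 1.12 GPU builds expect CUDA 9.0 while 1.13 to 1.15 expect CUDA 10.0, and that the Windows runtime DLLs are named `cudart64_90`, `cudart64_100`, and `cudart64_101`. `expected_cuda` and `find_cuda_runtimes` are hypothetical helper names.

```python
import ctypes.util

# Assumed Windows CUDA runtime library names, keyed by CUDA version.
CUDART_DLLS = {"9.0": "cudart64_90", "10.0": "cudart64_100", "10.1": "cudart64_101"}


def expected_cuda(tf_version):
    """Return the CUDA version a TensorFlow 1.x GPU build is assumed to need."""
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    if major != 1 or not 5 <= minor <= 15:
        raise ValueError("this sketch only covers TensorFlow 1.5 to 1.15")
    return "10.0" if minor >= 13 else "9.0"


def find_cuda_runtimes():
    """Map each CUDA version to the runtime library found on the system, or None."""
    return {ver: ctypes.util.find_library(name) for ver, name in CUDART_DLLS.items()}


if __name__ == "__main__":
    need = expected_cuda("1.15.0")
    found = find_cuda_runtimes()
    print("TensorFlow 1.15.0 expects CUDA", need, "->", found[need] or "not found")
```

If only the CUDA 10.1 runtime is found, that would be consistent with the advice above not to use 10.1 with TensorFlow 1.15.0.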

To improve efficiency, I now use CUDA 10.0 with TensorFlow 1.15.0 and have upgraded the system memory to 16 GB.
The apparent freezes still occur, but they are significantly reduced.
I have not tried this on Ubuntu. If you have experience with it, you are welcome to leave a comment.

Origin blog.51cto.com/cfy10/2450184