Problems encountered while running the princeton-vl/pose-hg-train demo

1. Running th main.lua -expID test-run produced the following error:

cudnnFindConvolutionForwardAlgorithm failed:    2        convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA6,3,256,256 -filtA64,3,7,7 6,64,128,128 -padA3,3 -convStrideA2,2 CUDNN_DATA_FLOAT
/root/torch/install/bin/luajit: /root/torch/install/share/lua/5.1/cudnn/find.lua:483: cudnnFindConvolutionForwardAlgorithm failed, sizes:  convDesc=[mode : CUDNN_CROSS_CORRELATION datatype : CUDNN_DATA_FLOAT] hash=-dimA6,3,256,256 -filtA64,3,7,7 6,64,128,128 -padA3,3 -convStrideA2,2 CUDNN_DATA_FLOAT
stack traceback:
        [C]: in function 'error'
        /root/torch/install/share/lua/5.1/cudnn/find.lua:483: in function 'forwardAlgorithm'
        ...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:190: in function 'func'
        /root/torch/install/share/lua/5.1/nngraph/gmodule.lua:345: in function 'neteval'
        /root/torch/install/share/lua/5.1/nngraph/gmodule.lua:380: in function 'forward'
        /root/pose-hg-train/src/train.lua:45: in function 'step'
        /root/pose-hg-train/src/train.lua:103: in function 'train'
        main.lua:19: in main chunk
        [C]: in function 'dofile'
        /root/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
        [C]: at 0x00406670
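After some searching, the cause turned out to be insufficient GPU memory (the 2 printed after "failed:" appears to be the cuDNN status code, which would correspond to CUDNN_STATUS_ALLOC_FAILED). I therefore checked GPU usage with nvidia-smi

and picked three GPUs that were lightly used to run on: CUDA_VISIBLE_DEVICES=2,5,6 th main.lua -expID test-run

That solved the problem.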
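The manual step of reading nvidia-smi and choosing GPUs could be scripted. The following is only a minimal sketch, not what was actually run here: the helper name pick_gpus is hypothetical, and it assumes that "lightly used" means "most free memory", since memory was the limiting factor. It reads rows of "index, memory.used, memory.total" (the shape produced by nvidia-smi's CSV query output) on stdin and prints the N indices with the most free memory.

```shell
# pick_gpus N  (hypothetical helper, not part of pose-hg-train)
# stdin:  CSV rows "index, memory.used, memory.total" (MiB, no header, no units)
# stdout: the N GPU indices with the most free memory, comma-separated
pick_gpus() {
  awk -F', *' '{ print $3 - $2, $1 }' |  # free memory, then GPU index
    sort -k1,1nr |                       # most free memory first
    head -n "$1" |                       # keep the top N rows
    awk '{ print $2 }' |                 # keep only the index column
    paste -sd, -                         # join indices with commas
}
```

Assuming nvidia-smi is on the PATH, it might be used like this:

    CUDA_VISIBLE_DEVICES=$(nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits | pick_gpus 3) th main.lua -expID test-run

One caveat: CUDA's default device ordering is not guaranteed to match the indices nvidia-smi reports, so on a machine with mixed GPU models it may be necessary to also set CUDA_DEVICE_ORDER=PCI_BUS_ID.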

Reposted from blog.csdn.net/chenyu19880302/article/details/84619930