Error:
Traceback (most recent call last):
  File "train_0723.py", line 434, in <module>
    main()
  File "train_0723.py", line 430, in main
    train_net(args)
  File "train_0723.py", line 424, in train_net
    epoch_end_callback=epoch_cb)
  File "/home/user1/recognition/parall_module_local_v1_gluon_group.py", line 569, in fit
    self.forward_backward(data_batch, eval_metric)
  File "/home/user1/recognition/parall_module_local_v1_gluon_group.py", line 441, in forward_backward
    eval_metric.update(data_batch.label[0], preds, )
  File "/home/user1/miniconda3/lib/python3.7/site-packages/mxnet/metric.py", line 363, in update
    metric.update(labels, preds)
  File "/home/user1/miniconda3/lib/python3.7/site-packages/mxnet/metric.py", line 494, in update
    pred_label = pred_label.asnumpy().astype('int32')
  File "/home/user1/miniconda3/lib/python3.7/site-packages/mxnet/ndarray/ndarray.py", line 2535, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/home/user1/miniconda3/lib/python3.7/site-packages/mxnet/base.py", line 255, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [17:41:29] src/resource.cc:443: Check failed: e == CUDNN_STATUS_SUCCESS (8 vs. 0) : cuDNN: CUDNN_STATUS_EXECUTION_FAILED
Stack trace:
[bt] (0) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x6b41eb) [0x7f060fcc41eb]
[bt] (1) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x40c62fa) [0x7f06136d62fa]
[bt] (2) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x4634f7d) [0x7f0613c44f7d]
[bt] (3) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x463fe26) [0x7f0613c4fe26]
[bt] (4) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x464340c) [0x7f0613c5340c]
[bt] (5) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x37d6909) [0x7f0612de6909]
[bt] (6) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x37e33d5) [0x7f0612df33d5]
[bt] (7) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x37bf6d1) [0x7f0612dcf6d1]
[bt] (8) /home/user1/miniconda3/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x37c2c10) [0x7f0612dd2c10]
Fix: reduce batch_size.
https://discuss.gluon.ai/t/topic/6309/8
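The simplest fix is to lower batch_size by hand and relaunch. If you retune often, a small helper can automate the search. Below is a minimal sketch that is not part of the original training script. `find_working_batch_size` and the `try_step` callback are hypothetical names. MXNet executes GPU work asynchronously. That is why this traceback reports the failure at `asnumpy()` inside the metric update rather than at the cuDNN call itself. For the same reason, `try_step` should force synchronization (for example with `mx.nd.waitall()`) so that failures are raised inside the call. One caveat: after a CUDA/cuDNN failure the GPU context may be left unusable, so running each trial in a fresh process can be more reliable than retrying in-process.

```python
from typing import Callable, Tuple, Type


def find_working_batch_size(
    try_step: Callable[[int], None],
    start: int,
    min_size: int = 1,
    errors: Tuple[Type[BaseException], ...] = (Exception,),
) -> int:
    """Return the first batch size (start, start//2, ...) whose trial step succeeds.

    try_step (hypothetical) should build the iterator/module for the given
    batch size, run one forward/backward pass, and synchronize (e.g. call
    mx.nd.waitall()) so asynchronous GPU errors are raised inside it.
    """
    batch_size = start
    while batch_size >= min_size:
        try:
            try_step(batch_size)
            return batch_size  # this size ran cleanly
        except errors as exc:
            print(f"batch_size={batch_size} failed ({exc}); halving")
            batch_size //= 2
    raise RuntimeError(f"no batch size >= {min_size} succeeded")


if __name__ == "__main__":
    # Simulated step: pretend anything above 64 exhausts GPU memory.
    def fake_step(batch_size: int) -> None:
        if batch_size > 64:
            raise MemoryError("simulated out of memory")

    print(find_working_batch_size(fake_step, 512))  # prints 64 after three failures
```

In a real run you would probably pass `errors=(mxnet.base.MXNetError,)`. That way unrelated bugs in the training script are not mistaken for memory pressure and silently "fixed" by shrinking the batch.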