Debugging PyTorch: RuntimeError: CUDA error: device-side assert triggered

Error message

RuntimeError: CUDA error: device-side assert triggered
/pytorch/aten/src/THC/THCTensorScatterGather.cu:188: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [31,0,0], thread: [100,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:188: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [30,0,0], thread: [162,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:188: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [32,0,0], thread: [290,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
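The assertion appears to come from the CUDA kernel behind `scatter_` with a scalar fill value. A one-hot encoding such as `torch.zeros(n, num_classes).scatter_(1, labels.unsqueeze(1), 1)` launches that kind of kernel, but this calling code is an assumption, since the log does not show it. Every index must satisfy `0 <= index < size` along the scatter dimension. The pure-Python sketch below mimics that bound check for a one-hot encoding so it can be run without a GPU. The function name `one_hot_scatter` is hypothetical.

```python
from typing import List, Sequence


def one_hot_scatter(labels: Sequence[int], num_classes: int) -> List[List[float]]:
    """Build a one-hot matrix the way scatter_(1, index, 1.0) fills a zero tensor.

    Raises IndexError for any label outside [0, num_classes), mirroring the
    device-side assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]`.
    """
    out = [[0.0] * num_classes for _ in labels]
    for row, index_value in enumerate(labels):
        # Same bound check the CUDA kernel asserts on. On the GPU a failure
        # aborts the kernel instead of raising a catchable Python exception.
        # The explicit check also matters here because a Python list would
        # silently accept -1 as "last element".
        if not (0 <= index_value < num_classes):
            raise IndexError(
                f"row {row}: index {index_value} out of range [0, {num_classes})"
            )
        out[row][index_value] = 1.0
    return out


if __name__ == "__main__":
    print(one_hot_scatter([0, 2, 1], num_classes=3))
    # With 3 classes, both -1 and 3 are out of range.
    for bad in ([0, -1], [3]):
        try:
            one_hot_scatter(bad, num_classes=3)
        except IndexError as err:
            print("rejected:", err)
```

Note that `num_classes` itself is already out of range: with 10 classes the valid labels are 0 to 9.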

Solution

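  1. Check whether the labels contain -1, or any value greater than or equal to num_classes (valid class indices run from 0 to num_classes - 1). Once the labels are corrected, the error goes away.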

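  2. Otherwise, try running with the following setting, which makes CUDA kernel launches synchronous so that the traceback points at the operation that actually failed: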

CUDA_LAUNCH_BLOCKING=1 python train.py 
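The label check from step 1 can be done on the CPU, for example in the dataset or before the batch is moved to the GPU; after a device-side assert the CUDA context is generally unusable for the rest of the process. Below is a minimal sketch, assuming labels are integer class indices that should lie in `[0, num_classes)`. The helper name `find_invalid_labels` and the sample batch are hypothetical. For a PyTorch tensor, pass `labels.cpu().tolist()`.

```python
from typing import Iterable, List, Tuple


def find_invalid_labels(
    labels: Iterable[int], num_classes: int
) -> List[Tuple[int, int]]:
    """Return (position, value) for every label outside [0, num_classes).

    Catches both cases from step 1: negative labels such as -1
    (e.g. an "unknown" marker) and labels >= num_classes
    (e.g. 1-based class ids fed to code that expects 0-based ids).
    """
    return [
        (pos, value)
        for pos, value in enumerate(labels)
        if not (0 <= value < num_classes)
    ]


if __name__ == "__main__":
    # Hypothetical batch of labels for a 3-class problem.
    batch = [0, 2, -1, 1, 3]
    for pos, value in find_invalid_labels(batch, num_classes=3):
        print(f"label[{pos}] = {value} is outside [0, 3)")
```

Run on the sample batch, this reports `label[2] = -1` and `label[4] = 3`. If anything is reported, fix how the labels are produced (for example, subtract 1 from 1-based ids) rather than the model.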

Contact

WeChat Official Account: search for YueTan


Reposted from blog.csdn.net/weixin_38812492/article/details/112848539