Faster-ILOD、maskrcnn_benchmark训练coco数据集及问题汇总


loading annotations into memory...
Done (t=18.25s)
creating index...
index created!
number of images used for training: 31235
2022-06-05 03:17:10,191 maskrcnn_benchmark.trainer INFO: Start training
/data3/cf/papercpde/maskrcnn-benchmark/maskrcnn_benchmark/structures/segmentation_mask.py:422: UserWarning: This overload of nonzero is deprecated:
        nonzero()
Consider using one of the following signatures instead:
        nonzero(*, bool as_tuple) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
  item = item.nonzero()
/home/earhian/anaconda3/envs/maskrcnn/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:123: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [1,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [15,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [17,0,0] Assertion `t >= 0 && t < n_classes` failed.
/pytorch/aten/src/THCUNN/ClassNLLCriterion.cu:108: cunn_ClassNLLCriterion_updateOutput_kernel: block: [0,0,0], thread: [23,0,0] Assertion `t >= 0 && t < n_classes` failed.
Traceback (most recent call last):
  File "tools/train_first_step.py", line 232, in <module>
    main()
  File "tools/train_first_step.py", line 224, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_first_step.py", line 103, in train
    arguments,
  File "/data3/cf/papercpde/maskrcnn-benchmark/maskrcnn_benchmark/engine/trainer.py", line 70, in do_train
    loss_dict,_,_,_,_ = model(images, targets)
  File "/home/earhian/anaconda3/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/earhian/anaconda3/envs/maskrcnn/lib/python3.7/site-packages/apex-0.1-py3.7-linux-x86_64.egg/apex/amp/_initialize.py", line 197, in new_fwd
    **applier(kwargs, input_caster))
  File "/data3/cf/papercpde/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 67, in forward
    x, result, detector_losses = self.roi_heads(features, proposals, targets)
  File "/home/earhian/anaconda3/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data3/cf/papercpde/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/roi_heads.py", line 27, in forward
    x, detections, loss_box = self.box(features, proposals, targets)
  File "/home/earhian/anaconda3/envs/maskrcnn/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data3/cf/papercpde/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/box_head/box_head.py", line 55, in forward
    loss_classifier, loss_box_reg = self.loss_evaluator([class_logits], [box_regression])
  File "/data3/cf/papercpde/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/box_head/loss.py", line 151, in __call__
    sampled_pos_inds_subset = torch.nonzero(labels > 0).squeeze(1)
RuntimeError: copy_if failed to synchronize: cudaErrorAssert: device-side assert triggered

在这里插入图片描述

原因是我训练的数据集是70，coco一共80类
coco.py中写的很清楚，之前没看哈哈，当训练基础类别为70时，first 70 categories对应1 ~ 79。将num_classes改成81即可成功运行

# first 40 categories: 1 ~ 44; first 70 categories: 1 ~ 79; first 75 categories: 1 ~ 85
# second 40 categories: 45 ~ 91; second 10 categories: 80 ~ 91; second 5 categories: 86 ~ 91
# totally 80 categories

Faster-ILOD、maskrcnn_benchmark训练coco数据集及问题汇总

猜你喜欢