[mmseg] A debugging journey: CUDA kernel errors might be asynchronously reported at some other API call ...

Table of contents

Error:

Solution steps:

1. Locating detailed error information

2. Possible error 1: num_class setting is incorrect

3. Possible error 2: the model output size is wrong


Error:

CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

Solution steps:

1. Locating detailed error information

This error message is too vague to show which line of code actually failed, so the first step is to get a more detailed error report. Add the following two lines at the very beginning of your code:

import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

With this set, CUDA kernels are launched synchronously, so the traceback points at the line that actually failed and you can fix the real cause instead of guessing.
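As a slightly more defensive variant of the two lines above, the sketch below also checks whether torch has already been imported. It assumes the variable is read when CUDA is initialized, so setting it before importing torch is the safe choice. The helper name enable_blocking_launches is made up for this illustration.

```python
import os
import sys


def enable_blocking_launches() -> bool:
    """Set CUDA_LAUNCH_BLOCKING=1 (hypothetical helper, not part of mmseg).

    Returns True if torch had not been imported yet, i.e. the variable
    was set early enough to be picked up for certain.
    """
    set_early = "torch" not in sys.modules
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return set_early


if not enable_blocking_launches():
    # Assumption: a late setting may be ignored if CUDA is already initialized.
    print("warning: torch was imported first; move this call to the top of the script")
```

Synchronous launches slow training down noticeably, so it is worth removing this once the bug is found.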

2. Possible error 1: num_class setting is incorrect

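Check whether any label value in your dataset is greater than or equal to num_class. For example, for a dataset with num_class=3, the labels should be [0,1,2], and no other values should appear (apart from an ignore index, if your loss is configured with one). A label outside this range makes the loss index out of bounds on the GPU, which is what triggers the error.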
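A minimal sketch of such a check, written in plain Python so it runs anywhere. The function name find_invalid_labels and the ignore_index default of 255 are assumptions for this example; use whatever ignore value your own config uses, or None if it has none.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def find_invalid_labels(
    labels: Iterable[int],
    num_classes: int,
    ignore_index: Optional[int] = 255,  # assumed default, adjust to your config
) -> Dict[int, int]:
    """Return {label_value: pixel_count} for labels outside [0, num_classes)."""
    counts = Counter(labels)
    return {
        value: n
        for value, n in counts.items()
        if value != ignore_index and not (0 <= value < num_classes)
    }


# Flattened toy mask: the value 3 is invalid when num_classes=3.
mask = [0, 1, 2, 2, 3, 255]
print(find_invalid_labels(mask, num_classes=3))  # {3: 1}
```

For a real annotation image loaded as a NumPy array, the pixels could be passed as mask.ravel().tolist(). Running this over every annotation file before training shows which files need remapping.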

3. Possible error 2: the model output size is wrong

The model's input usually has 3 channels and does not change, but the number of output channels must equal num_class. This is the easiest setting to forget to change, which is exactly what I did...

Remember to change the out_channel of the model to num_class!
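One way to catch this before training is to scan the model config for heads whose class count differs from the dataset. The sketch below assumes an mmseg-style config dict in which each head stores its class count under num_classes inside decode_head and auxiliary_head. The helper name heads_with_wrong_classes and the toy config are made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def heads_with_wrong_classes(
    model_cfg: Dict[str, Any], num_classes: int
) -> List[Tuple[str, Any]]:
    """List (head_name, found_value) for heads whose num_classes differs."""
    mismatched = []
    for key in ("decode_head", "auxiliary_head"):
        heads = model_cfg.get(key)
        if heads is None:
            continue
        if isinstance(heads, dict):
            heads = [heads]  # a single head; some configs use a list of heads
        for i, head in enumerate(heads):
            found = head.get("num_classes")
            if found != num_classes:
                mismatched.append((f"{key}[{i}]", found))
    return mismatched


# Toy config: the decode head still has the class count of another dataset.
model = dict(
    type="EncoderDecoder",
    decode_head=dict(type="FCNHead", num_classes=19),
    auxiliary_head=dict(type="FCNHead", num_classes=3),
)
print(heads_with_wrong_classes(model, num_classes=3))  # [('decode_head[0]', 19)]
```

An empty list means every head found in the config matches the dataset's class count.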

Putting this together took some effort, so likes, favorites, and follows are welcome!

Origin blog.csdn.net/qq_38308388/article/details/131046609