How the NaN problem was found: the training error (loss) did not shrink but kept growing until it became NaN.
Guess: the initial weights were not loaded, so the error kept growing instead of the network actually training. Training should restart from the original base weights; otherwise the error changes so much that it ends in NaN. Basis for this guess: the problem was resolved after the initial weights were loaded.
Extension: the cause may be that the network's initial weights were never set, or that training was run detectron2's own way rather than the way centermask2 trains.
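
The symptom described in these notes (an error that grows until it turns into NaN) can be caught early with a small check on the logged loss values. The following is only a minimal sketch in plain Python: the function name `check_loss`, the `window` parameter, and the sample numbers are made up for illustration and come from neither detectron2 nor centermask2.

```python
import math


def check_loss(history, window=3):
    """Classify a list of per-iteration loss values (hypothetical helper).

    Returns "nan" if the latest value is NaN or infinite, "diverging" if the
    loss rose strictly over the last `window` steps, and "ok" otherwise.
    """
    if not history:
        return "ok"
    if not math.isfinite(history[-1]):
        return "nan"
    # Take the last window + 1 values so that there are `window` steps to compare.
    recent = history[-(window + 1):]
    if len(recent) == window + 1 and all(b > a for a, b in zip(recent, recent[1:])):
        return "diverging"
    return "ok"


# Illustrative values only, not measurements from a real run.
healthy = [2.3, 1.9, 1.6, 1.4, 1.3]
growing = [2.3, 5.1, 40.2, 3.1e3, 8.8e7]
broken = [2.3, 5.1, 40.2, float("inf"), float("nan")]

for name, losses in [("healthy", healthy), ("growing", growing), ("broken", broken)]:
    print(name, check_loss(losses))
```

If such a check reports "diverging" in the first iterations, it is worth confirming that the pretrained weights were really loaded before looking anywhere else. In detectron2 the checkpoint path is normally given through `cfg.MODEL.WEIGHTS`; whether that setting was missing in this case is an assumption that fits the guess above, not something the notes confirm.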