RetinaNet network model structure

PS: I'm quite busy right now and don't have time to write a full blog post, so here is a brief, intuitive explanation of the RetinaNet network model structure.

Many blogs on the Internet present the structure of RetinaNet as follows:

The path from (a) to (b) is relatively intuitive and easy to understand (at most, the number of FPN output layers in an actual implementation may differ from the figure), but (b), (c), and (d) only seem clear. As a beginner, I had this question: if each FPN output layer is followed by two outputs of shape W*H*KA and W*H*4A, and the figure shows three layers, shouldn't there be three W*H*KA outputs and three W*H*4A outputs, six in total, and therefore six loss terms to write?

Later, when I stepped through the code in a debugger, I found that there are actually only two outputs. Even though the FPN shown in the figure has three output layers, the two final outputs have the shapes:

(W0*H0 + W1*H1 + W2*H2) * KA and (W0*H0 + W1*H1 + W2*H2) * 4A

That is, each level's multidimensional output is reshaped down to two dimensions, and the levels are then concatenated together.
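The reshape-and-concatenate step can be sketched as follows. This is a minimal illustration using NumPy with made-up level sizes; the values K = 80 classes and A = 9 anchors per location are the common RetinaNet defaults, and the feature-map sizes are illustrative assumptions, not values taken from the figure:

```python
import numpy as np

K, A = 80, 9                                 # classes and anchors per location (typical defaults)
B = 2                                        # batch size
fpn_shapes = [(64, 64), (32, 32), (16, 16)]  # illustrative (H_i, W_i) for three FPN levels

# Per-level head outputs: class subnet (B, K*A, H, W), box subnet (B, 4*A, H, W)
cls_outputs = [np.random.randn(B, K * A, H, W) for (H, W) in fpn_shapes]
box_outputs = [np.random.randn(B, 4 * A, H, W) for (H, W) in fpn_shapes]

def flatten_level(t, channels_per_anchor):
    # (B, C*A, H, W) -> (B, H*W*A, C): spatial dims first, then split per-anchor channels
    B, _, H, W = t.shape
    t = t.transpose(0, 2, 3, 1)              # (B, H, W, C*A)
    return t.reshape(B, H * W * A, channels_per_anchor)

# Concatenate all levels into the two tensors fed to the loss
cls_flat = np.concatenate([flatten_level(t, K) for t in cls_outputs], axis=1)
box_flat = np.concatenate([flatten_level(t, 4) for t in box_outputs], axis=1)

total_anchors = sum(H * W for (H, W) in fpn_shapes) * A
print(cls_flat.shape)   # (B, total_anchors, K)
print(box_flat.shape)   # (B, total_anchors, 4)
```

So regardless of how many FPN levels there are, the loss only ever sees two tensors: one classification tensor of (W0*H0 + W1*H1 + W2*H2) * A anchor rows with K scores each, and one regression tensor with 4 box offsets each.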


Origin blog.csdn.net/qq_36401512/article/details/102729172