Understanding anchors in Faster R-CNN

RPN

  First, how does the RPN select candidate regions? See the diagram above. After the backbone network (VGG) extracts a feature map from the whole image, a 3x3 window slides over the feature map. At each position, the network predicts k different region proposals, so the classification layer (upper left in the diagram) has 2k outputs (the probability that each proposal is or is not an object), and the bounding-box regression layer (upper right) has 4k outputs (four coordinate corrections per proposal, moving it toward the ground-truth box). The k proposals are parameterized relative to k reference boxes, called anchors, centered at that position in the image.
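As a concrete illustration of the k reference boxes, here is a minimal numpy sketch of anchor generation at one sliding-window position. The function name and the `base_size=16` value (the VGG feature stride) are my own choices for illustration; the paper's default of 3 scales x 3 aspect ratios gives k = 9 anchors per position.

```python
import numpy as np

def generate_anchors(base_size=16, ratios=(0.5, 1.0, 2.0), scales=(8, 16, 32)):
    """Generate k = len(ratios) * len(scales) anchors (x1, y1, x2, y2)
    centered on a single sliding-window position (illustrative sketch)."""
    cx, cy = base_size / 2.0, base_size / 2.0   # center of the base cell
    area = float(base_size * base_size)
    anchors = []
    for ratio in ratios:
        # keep the base area constant while setting aspect ratio h/w = ratio
        w = np.sqrt(area / ratio)
        h = w * ratio
        for scale in scales:
            ws, hs = w * scale, h * scale
            anchors.append([cx - ws / 2, cy - hs / 2, cx + ws / 2, cy + hs / 2])
    return np.array(anchors)

anchors = generate_anchors()
print(anchors.shape)  # (9, 4): k = 3 ratios x 3 scales anchors per position
```

Shifting this same set of 9 boxes to every feature-map position yields the full anchor grid; the classification and regression layers then emit 2k and 4k numbers per position, one pair of (score, delta) heads per anchor.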
  Second, why can a single 256-d vector produced by the convolution correspond to k anchors? My personal understanding: the sizes of the anchor boxes are fixed by hand (each anchor box corresponds to a region of the original image, whose center is the point on the original image that maps back to the current sliding-window position; in the diagram, the blue dot). The k anchor coordinates themselves are not parameters of the neural network. Instead, during training, the network learns the 4*k coordinate corrections for the k anchors at each position (the correction targets are derived from the deviation between each anchor's coordinates and the ground-truth object's coordinates, and this deviation drives the regression loss). (If I have misunderstood, please point it out in the comments, thank you!)
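The "coordinate correction" the network learns can be made concrete with the (tx, ty, tw, th) parameterization from the Faster R-CNN paper: center offsets normalized by the anchor size, plus log-scale width/height ratios. The helper names below are mine; the formulas follow the paper. Decoding the targets computed from a ground-truth box should recover that box exactly:

```python
import numpy as np

def bbox_to_deltas(anchor, gt):
    """Regression targets (tx, ty, tw, th) the RPN learns to predict,
    following the Faster R-CNN parameterization."""
    ax, ay = (anchor[0] + anchor[2]) / 2, (anchor[1] + anchor[3]) / 2
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    gx, gy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    return np.array([(gx - ax) / aw, (gy - ay) / ah,
                     np.log(gw / aw), np.log(gh / ah)])

def apply_deltas(anchor, d):
    """Decode predicted deltas back into a box (x1, y1, x2, y2)."""
    ax, ay = (anchor[0] + anchor[2]) / 2, (anchor[1] + anchor[3]) / 2
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    cx, cy = ax + d[0] * aw, ay + d[1] * ah
    w, h = aw * np.exp(d[2]), ah * np.exp(d[3])
    return np.array([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])

anchor = np.array([0.0, 0.0, 128.0, 128.0])
gt = np.array([10.0, 20.0, 150.0, 140.0])
deltas = bbox_to_deltas(anchor, gt)
recovered = apply_deltas(anchor, deltas)
print(np.allclose(recovered, gt))  # True: decoding the targets recovers the gt box
```

This is why the anchor coordinates never need to be network parameters: they are fixed references, and only the 4 deltas per anchor (4k per position) are predicted and penalized against these targets.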


Origin blog.csdn.net/Site1997/article/details/79327265