Understanding bottom-up and top-down in object detection networks

While reading papers on object detection networks, I kept running into a pair of contrasting terms: bottom-up and top-down. After looking up some material and combining it with my own understanding, this is how I see them:
Top-down: as the name implies, the process runs from the top to the bottom, from the whole to its parts. The term comes from multi-person pose estimation, where each person is first detected to obtain a bounding box, then the body keypoints are detected inside that box and connected to form that person's pose. Applied to object detection, the network first obtains an approximate boundary for the object and then refines the object's position. RepPoints is an example: it determines the object boundary through deformable convolution.
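To make the two-step order concrete, here is a minimal sketch of the top-down pose pipeline described above. It is only an illustration under my own assumptions: `detect_people` and `estimate_keypoints` are hypothetical stand-ins for a person detector and a single-person keypoint model, the image is a plain nested list, and boxes are integer `(x1, y1, x2, y2)` tuples.

```python
def top_down_pose(image, detect_people, estimate_keypoints):
    """Top-down: find each person box first, then find keypoints inside it.

    detect_people and estimate_keypoints are hypothetical callables,
    not functions from any real library.
    """
    poses = []
    for (x1, y1, x2, y2) in detect_people(image):
        # Crop the image to the detected person box.
        crop = [row[x1:x2] for row in image[y1:y2]]
        # Keypoints come back in crop coordinates.
        local_points = estimate_keypoints(crop)
        # Shift them back into full-image coordinates.
        poses.append([(x + x1, y + y1) for (x, y) in local_points])
    return poses


# Toy usage with stub models in place of real networks.
image = [[0] * 4 for _ in range(4)]
stub_detector = lambda img: [(1, 1, 3, 3)]
stub_keypoints = lambda crop: [(0, 0), (1, 1)]
poses = top_down_pose(image, stub_detector, stub_keypoints)
```

Note that the keypoint model only ever sees one person at a time, so the grouping problem never arises; the cost is that every detected box needs its own keypoint pass.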

Bottom-up: the process runs from the bottom to the top, from the parts to the whole. After the feature map is extracted from the image, the network first locates the extreme points or corner points of objects. It then decides which of these points belong to the same object and assembles them into that object's boundary, which gives the detection. Examples are CornerNet (top-left and bottom-right corner points) and ExtremeNet (top, bottom, left, and right extreme points plus a center point).
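The grouping step can be sketched as follows. This is a simplified illustration in the spirit of CornerNet, which pairs corners by comparing a learned embedding value per corner; it is not the paper's actual code. The function name, the `(x, y, embedding)` tuple layout, and the `max_embed_dist` threshold are my own assumptions.

```python
def group_corners(top_lefts, bottom_rights, max_embed_dist=0.5):
    """Bottom-up: pair already-detected corner points into boxes.

    Each corner is a hypothetical (x, y, embedding) tuple. Two corners
    are treated as the same object when their embeddings are close.
    """
    boxes = []
    for (x1, y1, e1) in top_lefts:
        for (x2, y2, e2) in bottom_rights:
            # A bottom-right corner must lie below and to the right.
            if x2 <= x1 or y2 <= y1:
                continue
            # Close embeddings mean "same object" in this sketch.
            if abs(e1 - e2) <= max_embed_dist:
                boxes.append((x1, y1, x2, y2))
    return boxes


# Toy usage: two objects whose corners carry similar embeddings.
top_lefts = [(10, 10, 0.1), (50, 40, 0.9)]
bottom_rights = [(30, 30, 0.15), (80, 90, 1.0)]
boxes = group_corners(top_lefts, bottom_rights)
```

Here the points are found first without knowing which object they belong to, and the boxes only appear after grouping, which is the reverse order of the top-down approach.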


Origin blog.csdn.net/qq_44442727/article/details/114692401