Object detection reading notes

Disclaimer: this is an original article by the blogger and may not be reproduced without permission. https://blog.csdn.net/qq_30362711/article/details/88420103

1. In the beginning, neural networks detected targets by sliding a window over the image, as in OverFeat. The principle: slide windows of different sizes across the image, and for each window have the network output the probability that it contains each category of target to be detected.
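The sliding-window idea can be sketched in a few lines. This is a toy version: the window size, stride, threshold, and the stand-in `classifier` are illustrative choices of mine, not OverFeat's actual settings.

```python
import numpy as np

def sliding_windows(img_h, img_w, win, stride):
    """Enumerate the (top, left) corners of a square window slid over the image."""
    positions = []
    for top in range(0, img_h - win + 1, stride):
        for left in range(0, img_w - win + 1, stride):
            positions.append((top, left))
    return positions

def detect(image, classifier, win=64, stride=32):
    """Score every window crop; keep the ones the classifier is confident about."""
    hits = []
    for top, left in sliding_windows(image.shape[0], image.shape[1], win, stride):
        crop = image[top:top + win, left:left + win]
        score = classifier(crop)  # probability that the crop contains the target
        if score > 0.5:
            hits.append((top, left, win, score))
    return hits
```

In practice OverFeat evaluates several image scales and shares convolutional computation between overlapping windows; the naive loop above only shows the idea.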

 

2. Next came the R-CNN method: first use a conventional segmentation-based algorithm (selective search) to propose candidate regions of the image, then compute a class probability for each region with a neural network. Most later papers criticize this method for being too slow.
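In outline, that two-stage pipeline looks roughly like this. Both `propose_regions` and `score_box` are stand-ins I made up for the real selective-search and CNN components:

```python
import numpy as np

def rcnn_detect(image, propose_regions, score_box, threshold=0.5):
    """R-CNN in outline: propose class-agnostic boxes (selective search in the
    paper), then score each cropped region with a CNN classifier."""
    detections = []
    for (x1, y1, x2, y2) in propose_regions(image):
        score = score_box(image[y1:y2, x1:x2])  # classifier on the cropped region
        if score > threshold:
            detections.append((x1, y1, x2, y2, score))
    return detections
```

The slowness complaint comes precisely from this structure: every proposal (about 2000 per image in the paper) is run through the CNN separately.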

Bounding-box regression: this technique from R-CNN is borrowed by many later papers. It works as follows:

For example, suppose a picture contains a target object framed by a rectangle; that rectangle is called the bounding box. The rectangle produced by R-CNN's region-proposal step may not have the optimal size and position for framing the target, so we need to adjust it to frame the target better. R-CNN therefore takes the CNN's output feature map as input, together with the rectangle's original position, computes an offset, and adjusts the rectangle accordingly.
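The standard R-CNN parameterization, normalized center shifts plus log scale ratios, can be written as a small encode/decode pair. A sketch, assuming boxes are stored as `(cx, cy, w, h)`:

```python
import numpy as np

def encode(proposal, gt):
    """R-CNN regression targets; boxes are (cx, cy, w, h)."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return np.array([(gx - px) / pw,   # centre shift, normalized by box size
                     (gy - py) / ph,
                     np.log(gw / pw),  # log scale change
                     np.log(gh / ph)])

def decode(proposal, t):
    """Apply predicted offsets t to the proposal to get the refined box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = t
    return np.array([px + tx * pw, py + ty * ph,
                     pw * np.exp(tw), ph * np.exp(th)])
```

The regressor is trained to predict `encode(proposal, gt)` from the feature map; at test time `decode` turns the prediction back into an adjusted rectangle.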

 

3. The YOLO method:

If the network can directly predict a box's positional offset, then why can't I first assume that a rectangle exists at some position in the picture, then compute the likelihood that this rectangle contains a target, as well as the offset that would move the box to a better position?

For example, take a 100x100 picture. What I care about is: does the square of side 10 centered at (30, 30) contain a target, and how should I change this square to frame the target better? So I train a deep network that, for this square, predicts the probability that a target of each category exists, and, assuming a target does exist, computes the offset by which to modify the square so that it best frames the target.

Assume I have finished training this network; then, given a new picture, I can judge whether the square of side 10 centered at (30, 30) contains a target.

Improving on the above one step further: instead of judging a single square, I want to judge many squares at once. YOLO does exactly this. It divides the picture in advance into a 7x7 grid of 49 cells, and for each of the 49 cells computes: is there possibly a target inside its boxes, what kind of target is being framed, and how should the square be adjusted to frame the target better?
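A minimal sketch of that grid layout, with `S`, `B`, `C` set to the YOLOv1 paper's values of 7, 2, and 20 (the helper name is mine):

```python
import numpy as np

S, B, C = 7, 2, 20  # grid size, boxes per cell, classes (YOLOv1 on VOC)

def owning_cell(cx, cy, img_size=448):
    """The grid cell whose region contains the object's centre is the one
    responsible for predicting that object."""
    cell = img_size / S                      # 448 / 7 = 64 pixels per cell
    return int(cy // cell), int(cx // cell)  # (row, col)

# The whole network output is one S x S x (B*5 + C) tensor per image:
output = np.zeros((S, S, B * 5 + C))         # 7 x 7 x 30
```

Each cell's slice of the tensor holds its B boxes (x, y, w, h, confidence each) plus one shared set of C class probabilities.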

 

4. Faster R-CNN

I think Faster R-CNN and YOLO are essentially the same in principle; the main differences are in the details:

(1) YOLO uses the entire feature map as input for each prediction, while Faster R-CNN uses only a 3x3 window of the feature map as the input for computing each position's boxes.

(2) YOLO predicts two boxes per grid cell by default, while Faster R-CNN predicts, for each 3x3 position, boxes of three different sizes and three different aspect ratios, nine boxes in total.

(3) In Faster R-CNN, the nine boxes correspond to what are effectively nine small fully connected networks implemented with convolution kernels, and these nine networks are shared across every position.
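Generating the nine anchor shapes (3 scales x 3 aspect ratios) might look like the sketch below. The `base` and `scales` values mirror the common Faster R-CNN defaults for a VGG backbone, but treat them as assumptions:

```python
import numpy as np

def make_anchors(base=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """3 scales x 3 aspect ratios = 9 anchor shapes per position, as (w, h)
    pairs; each ratio keeps the area of its scale roughly constant."""
    anchors = []
    for s in scales:
        area = (base * s) ** 2
        for r in ratios:          # r = h / w
            w = np.sqrt(area / r)
            anchors.append((w, w * r))
    return anchors
```

At test time these nine shapes are stamped onto every position of the feature map, and the shared prediction head scores and refines each one.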

Some details of Faster R-CNN:

(1) Some boxes will cross the image boundary. A 1000x600 picture generally produces about 60x40x9 ≈ 20000 boxes; after removing the cross-boundary ones, roughly 6000 remain for training.
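Removing the cross-boundary anchors is a simple filter; a sketch, assuming anchors are `(x1, y1, x2, y2)` corner boxes:

```python
def remove_cross_boundary(anchors, img_h, img_w):
    """Keep only (x1, y1, x2, y2) anchors that lie fully inside the image."""
    return [(x1, y1, x2, y2) for (x1, y1, x2, y2) in anchors
            if x1 >= 0 and y1 >= 0 and x2 <= img_w and y2 <= img_h]
```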

(2) Each training batch uses only one picture: take half of the sampled anchors as positive examples and half as negative examples selected from that picture, e.g. 128 positives and 128 negatives; if there are not enough positives, fill the batch with negatives.
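That sampling rule can be sketched as follows (the 256 batch size matches the Faster R-CNN paper; the label encoding is my own convention):

```python
import numpy as np

def sample_minibatch(labels, batch=256, rng=None):
    """labels: 1 = positive anchor, 0 = negative, -1 = ignored.
    Take up to batch/2 positives; top up with negatives if positives run short."""
    if rng is None:
        rng = np.random.default_rng(0)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    n_pos = min(len(pos), batch // 2)
    n_neg = min(len(neg), batch - n_pos)  # fill the remainder with negatives
    return (rng.choice(pos, n_pos, replace=False),
            rng.choice(neg, n_neg, replace=False))
```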

 

5. YOLOv2

YOLOv2's improvements over YOLOv1 are roughly the following:

(1) To increase recall (recall here means offering a few more candidate boxes for prediction), YOLOv2 learns from Faster R-CNN. YOLOv1 attaches fully connected layers to the final feature map and outputs a 7x7x(number of classes + 2x5) tensor. The improved structure instead connects an RPN-style prediction head, as in Faster R-CNN, directly to the feature map.

(2) Similarly, YOLOv1 resolves only one class per cell, but the improvement is that every box gets its own class prediction.

(3) Faster R-CNN predicts 9 boxes of different sizes and shapes per position. To decide how many boxes to predict and which shapes work best, the YOLOv2 paper clusters the ground-truth boxes; it finally settles on 5 box shapes, biased in favor of tall, thin rectangles.
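The YOLOv2 paper calls this dimension clustering: k-means over ground-truth (w, h) pairs with distance d = 1 - IoU rather than Euclidean distance. A simplified toy version of my own:

```python
import numpy as np

def iou_wh(box, clusters):
    """IoU between one (w, h) shape and k cluster shapes, all centres aligned."""
    inter = np.minimum(box[0], clusters[:, 0]) * np.minimum(box[1], clusters[:, 1])
    union = box[0] * box[1] + clusters[:, 0] * clusters[:, 1] - inter
    return inter / union

def kmeans_anchors(boxes, k=5, iters=100, rng=None):
    """Cluster ground-truth (w, h) pairs with distance d = 1 - IoU."""
    if rng is None:
        rng = np.random.default_rng(0)
    clusters = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        # assign each box to the cluster it overlaps most (max IoU = min distance)
        assign = np.array([np.argmax(iou_wh(b, clusters)) for b in boxes])
        for j in range(k):
            if np.any(assign == j):
                clusters[j] = boxes[assign == j].mean(axis=0)
    return clusters
```

Using IoU instead of Euclidean distance keeps large boxes from dominating the clustering, which is why the recovered shapes track the dataset's typical (tall, thin) objects.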

(4) Computing a box's position in YOLO: if the image is 448x448 and is divided into 7x7 cells, then each cell is 64x64 pixels, and the box center is obtained by shifting from the cell's position by a number between 0 and 1 multiplied by 64. In any case, YOLOv2 uses this kind of calculation; although the paper says this approach differs from Faster R-CNN's, I think the two are essentially the same.
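In YOLOv2 the 0-1 offset comes from a sigmoid on the raw prediction, which keeps the center inside the responsible cell. A sketch using the 64-pixel cell from the example above (YOLOv2 itself uses a 32-pixel stride at 416x416):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_center(tx, ty, row, col, cell=64):
    """Box centre = the cell's top-left corner plus a 0-1 offset times cell size."""
    return (col + sigmoid(tx)) * cell, (row + sigmoid(ty)) * cell
```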

(5) YOLOv2 uses Darknet-19, whose basic structure follows the recent practice of using 1x1 convolutions to reduce the channel dimension and then expand it again; see the original paper for the specifics.

(6) Multi-scale training: because the network is fully convolutional, it can accept pictures of any size, so during training the input picture size can be changed freely.

(7) Fine-Grained Features: I did not fully understand this, so I copy someone else's explanation:

YOLOv2's input image size is 416x416; after five max-pooling layers it obtains a 13x13 feature map, and it makes predictions by convolution on this feature map. A 13x13 feature map is large enough for detecting big objects, but small objects need a finer-grained feature map (Fine-Grained Features). SSD, for instance, uses multi-scale feature maps to detect objects of different sizes, with the finer front-layer maps used to predict small objects. YOLOv2 instead proposes a passthrough layer to exploit a finer feature map.

The fine-grained feature map YOLOv2 uses is the 26x26 map (the input of the last max-pooling layer); for the Darknet-19 model its size is 26x26x512. Like the shortcut in ResNet, the passthrough layer takes the earlier higher-resolution feature map as input and connects it to the later low-resolution feature map. The front map's spatial dimensions are twice those of the back map; the passthrough layer extracts each 2x2 local region of the front layer and converts it into channel dimensions, so the 26x26x512 feature map becomes a new 13x13x2048 feature map (spatial size reduced 4-fold, channels increased 4-fold). This can then be concatenated with the later 13x13x1024 map to form a 13x13x3072 feature map, on which convolution makes the predictions. In YOLO's C source, the passthrough layer is called the reorg layer.
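The passthrough/reorg operation is just a space-to-depth rearrangement; a NumPy sketch of the shape bookkeeping described above (zeros stand in for real feature maps):

```python
import numpy as np

def passthrough(x, stride=2):
    """Space-to-depth: (H, W, C) -> (H/s, W/s, C*s*s); every s x s spatial
    block of the fine map becomes s*s extra channels."""
    h, w, c = x.shape
    x = x.reshape(h // stride, stride, w // stride, stride, c)
    x = x.transpose(0, 2, 1, 3, 4)  # group each s x s block's pixels together
    return x.reshape(h // stride, w // stride, c * stride * stride)

fine = np.zeros((26, 26, 512))      # input of the last maxpool (Darknet-19)
coarse = np.zeros((13, 13, 1024))   # final feature map
merged = np.concatenate([passthrough(fine), coarse], axis=-1)  # 13 x 13 x 3072
```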

