Object detection: R-CNN principles

Paper links:

1 selective search: https://arxiv.org/pdf/1502.05082.pdf

2 R-CNN: https://arxiv.org/pdf/1311.2524.pdf

1 Overview:
To avoid the time cost of a sliding window, R-CNN uses selective search to locate candidate regions that may contain an object, about 2k per image. These roughly 2k regions are then cropped out of the original image and fed to the CNN one after another for training.

[Selective search] Explanation: it finds in advance the positions in the image where an object may appear, i.e., the candidate regions (region proposals). By using the texture, edge, and color information in the image, it keeps a high recall while selecting far fewer windows (a few thousand or even a few hundred).
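To give a feel for the saving, the sketch below counts how many windows an exhaustive sliding-window search would visit and compares that with the roughly 2k proposals. The image size, window shapes, and stride are hypothetical values chosen only for illustration, not settings from the paper.

```python
def count_sliding_windows(img_w, img_h, window_sizes, stride):
    """Count the windows an exhaustive sliding-window search would visit."""
    total = 0
    for win_w, win_h in window_sizes:
        if win_w > img_w or win_h > img_h:
            continue  # this window shape does not fit inside the image
        steps_x = (img_w - win_w) // stride + 1
        steps_y = (img_h - win_h) // stride + 1
        total += steps_x * steps_y
    return total


# Hypothetical setup: a 500x375 image, 3 widths x 3 heights, stride of 4 pixels.
window_sizes = [(w, h) for w in (64, 128, 256) for h in (64, 128, 256)]
n_windows = count_sliding_windows(500, 375, window_sizes, stride=4)
n_proposals = 2000  # "about 2k" candidate regions from selective search

print("sliding windows:", n_windows)
print("region proposals:", n_proposals)
```

Even this coarse grid yields far more windows than 2k, and a realistic sliding-window detector would use more scales and aspect ratios.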

 

2 Steps:
2.1 Pre-train a classification CNN on ImageNet
2.2 Use selective search to crop candidate regions out of each image
2.3 Resize the cropped regions to a uniform size
2.4 Fine-tune the pre-trained CNN on these regions with N + 1 output classes, where the extra 1 is the background class; this requires a small learning rate. Save the CNN once it is trained
2.5 Remove the final classification layer from the CNN trained in step 2.4 so that the intermediate one-dimensional feature vector becomes the output; pass every candidate region through this CNN and store the output feature vectors on disk
2.6 Using the feature vectors as input samples, train one SVM per class in turn (in the R-CNN paper the positive examples are the ground-truth boxes, the negative examples are candidate regions whose IoU with the ground-truth box is below 0.3, and the candidate regions in between are ignored)
2.7 Train a regressor for the box position parameters using the box regression loss
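Both the SVM training step and the box regression step compare a candidate region with the ground-truth box through IoU (intersection over union). A minimal sketch of that computation follows; the corner format (x1, y1, x2, y2) and the sample boxes are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so that disjoint boxes get an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical ground-truth box and three candidate regions.
ground_truth = (50, 50, 150, 150)
proposals = [(60, 60, 160, 160), (120, 120, 220, 220), (300, 300, 400, 400)]

overlaps = [iou(p, ground_truth) for p in proposals]
for box, overlap in zip(proposals, overlaps):
    # 0.3 and 0.6 are the two IoU thresholds mentioned in this post.
    print(box, round(overlap, 3), overlap >= 0.3, overlap >= 0.6)
```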

 

Box regression loss explained:

Suppose the model's predicted output is d_i(P), where P = (P_x, P_y, P_w, P_h) is the center coordinates, width, and height of the candidate region, and G = (G_x, G_y, G_w, G_h) is the center coordinates, width, and height of the ground-truth box.
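As a reference for the loss itself, the R-CNN paper defines the regression targets and a ridge-regression objective as below. The notation is taken from the paper rather than from this post: the subscript \star stands for one of x, y, w, h, \phi_5(P) is the CNN feature of candidate region P, \hat{w}_\star are the weights being learned, N is the number of training pairs, and \lambda is the L2 regularization coefficient.

```latex
t_x = \frac{G_x - P_x}{P_w}, \qquad
t_y = \frac{G_y - P_y}{P_h}, \qquad
t_w = \log\frac{G_w}{P_w}, \qquad
t_h = \log\frac{G_h}{P_h}

d_\star(P) = w_\star^{\top} \phi_5(P)

w_\star = \arg\min_{\hat{w}_\star} \sum_{i=1}^{N}
    \left( t_\star^{i} - \hat{w}_\star^{\top} \phi_5(P^{i}) \right)^{2}
    + \lambda \left\lVert \hat{w}_\star \right\rVert^{2}
```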

 

1 The L2 regularization coefficient is a hyperparameter whose value is determined by cross-validation.

2 Only candidate regions with IoU >= 0.6 take part in computing the regression loss
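The regressor does not predict G directly. In the R-CNN paper it predicts offsets of the center normalized by the candidate region's size, plus log-scale ratios for width and height. The sketch below encodes a (P, G) pair into those targets and decodes them back; the function names and sample boxes are hypothetical, and the target definition is assumed from the paper rather than stated in this post.

```python
import math


def encode(P, G):
    """Regression targets (t_x, t_y, t_w, t_h) for proposal P and ground truth G.

    Both boxes are (center_x, center_y, width, height); the target
    definition is assumed from the R-CNN paper.
    """
    px, py, pw, ph = P
    gx, gy, gw, gh = G
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))


def decode(P, t):
    """Apply predicted offsets t to proposal P to obtain the refined box."""
    px, py, pw, ph = P
    tx, ty, tw, th = t
    return (px + pw * tx, py + ph * ty,
            pw * math.exp(tw), ph * math.exp(th))


# Hypothetical candidate region and ground-truth box.
P = (100.0, 100.0, 80.0, 60.0)
G = (110.0, 95.0, 100.0, 60.0)

t = encode(P, G)
G_rec = decode(P, t)  # decoding the exact targets recovers G
print(t)
print(G_rec)
```

Normalizing by the size of the candidate region keeps the targets on a similar scale for small and large boxes, which is why a single regressor per coordinate can serve all region sizes.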

 

3 Disadvantages
3.1 The selective search process is slow
3.2 The roughly 2k candidate regions overlap heavily, so much of the information is redundant
3.3 The 4 modules (selective search, CNN, SVM, regression) are separate from one another

Origin www.cnblogs.com/dxscode/p/11443374.html