Object Detection, Part 2: Fast RCNN

Fast RCNN

Fast RCNN trains the very deep VGG16 network 9 times faster than RCNN and is 213 times faster at test time; compared with SPP-Net it trains 3 times faster, tests 10 times faster, and is more accurate.

Introduction

At the time, object detection models were trained in several separate stages, which was slow and inelegant. Fast RCNN therefore proposes a single-stage training scheme that jointly learns object classification and bounding-box localization. At test time it processes an image in only 0.3 seconds, and it is more accurate: on the PASCAL VOC 2012 dataset it reaches 66% mAP, versus 62% for RCNN.

RCNN deficiencies

  1. Training is multi-stage. RCNN first fine-tunes a ConvNet on candidate regions using a log loss, then trains SVMs on the features extracted by the ConvNet, and finally trains bounding-box regressors to refine box positions.
  2. Training is expensive in time and disk space. The ConvNet features used to train the SVMs and bounding-box regressors are written to disk, consuming hundreds of gigabytes of storage.
  3. Detection is slow. Processing a single image takes 47 seconds on a GPU.

SPP deficiencies

  1. Multi-stage training
  2. Features written to disk
  3. Cannot update the convolutional layers that come before the spatial pyramid pooling layer.

Fast RCNN advantage

Fast RCNN addresses the shortcomings of RCNN and SPP-Net and improves both speed and accuracy. It has the following advantages:

  1. Compared with RCNN and SPP-Net, Fast RCNN achieves higher detection quality (mAP).
  2. Training is single-stage, using a multi-task loss (see the sketch after this list).
  3. Training can update all network layers.
  4. No disk storage is needed to cache features.
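
As a rough illustration of the multi-task loss in point 2, here is a minimal PyTorch sketch (the function name and tensor shapes are my own assumptions, not code from the paper): the classification branch is trained with a softmax log loss over K + 1 classes, and the box-regression branch with a smooth L1 loss that is applied only to RoIs labeled as a real object.

```python
import torch
import torch.nn.functional as F

def fast_rcnn_multitask_loss(cls_scores, bbox_preds, labels, bbox_targets, lam=1.0):
    """Sketch of a Fast RCNN-style multi-task loss.

    cls_scores:   (N, K+1) class scores per RoI (class 0 = background)
    bbox_preds:   (N, K+1, 4) predicted box offsets, one set per class
    labels:       (N,) ground-truth class index per RoI
    bbox_targets: (N, 4) regression targets for the ground-truth class
    """
    # Classification term: softmax log loss over K + 1 classes.
    loss_cls = F.cross_entropy(cls_scores, labels)

    # Localization term: only for RoIs whose label is a real object (u >= 1).
    fg = torch.nonzero(labels > 0, as_tuple=True)[0]
    if fg.numel() > 0:
        fg_preds = bbox_preds[fg, labels[fg]]            # offsets of the true class
        loss_loc = F.smooth_l1_loss(fg_preds, bbox_targets[fg])
    else:
        loss_loc = bbox_preds.sum() * 0.0                # no foreground RoIs in batch

    return loss_cls + lam * loss_loc
```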

Fast RCNN structure

The network takes an entire image and a set of candidate object regions as input. Several convolutional and max pooling layers first process the image to produce a convolutional feature map. Then, for each candidate region, an RoI (region of interest) pooling layer extracts a fixed-length feature vector from the feature map. Each feature vector is fed into a sequence of fully connected layers that finally branch into two output layers: one performs softmax classification, and the other regresses bounding-box coordinates for the candidate region.
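
A minimal PyTorch sketch of that forward pass (illustrative only: the VGG16 backbone, layer sizes, and 21-class output are my assumptions, and `torchvision.ops.roi_pool` stands in for the RoI pooling layer):

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16
from torchvision.ops import roi_pool

class FastRCNNSketch(nn.Module):
    """Conv backbone -> RoI pooling -> fully connected layers -> two output heads."""

    def __init__(self, num_classes=21, roi_size=7, feat_channels=512):
        super().__init__()
        self.backbone = vgg16().features                  # conv + max-pool layers
        self.roi_size = roi_size
        in_dim = feat_channels * roi_size * roi_size
        self.fc = nn.Sequential(
            nn.Linear(in_dim, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
        )
        self.cls_score = nn.Linear(4096, num_classes)      # softmax classification head
        self.bbox_pred = nn.Linear(4096, num_classes * 4)  # bounding-box regression head

    def forward(self, image, rois):
        # image: (1, 3, H, W); rois: (R, 5) rows of (batch_idx, x1, y1, x2, y2) in pixels.
        feat = self.backbone(image)                        # shared conv feature map
        scale = feat.shape[-1] / image.shape[-1]           # image coords -> feature coords
        pooled = roi_pool(feat, rois, output_size=self.roi_size, spatial_scale=scale)
        x = self.fc(pooled.flatten(start_dim=1))           # fixed-length vector per RoI
        return self.cls_score(x), self.bbox_pred(x)

# Example: one 600x800 image and two candidate regions.
model = FastRCNNSketch()
img = torch.randn(1, 3, 600, 800)
rois = torch.tensor([[0, 50.0, 60.0, 300.0, 400.0],
                     [0, 10.0, 10.0, 200.0, 150.0]])
scores, boxes = model(img, rois)
print(scores.shape, boxes.shape)   # torch.Size([2, 21]) torch.Size([2, 84])
```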


RoI pooling layer

The RoI pooling layer is in fact a special case of the SPP layer: SPP-Net pools each region at several pyramid scales, whereas the RoI pooling layer uses only a single scale, pooling each candidate region down to a fixed 7 × 7 feature map.
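
To make the single-scale pooling concrete, here is a from-scratch NumPy sketch (illustrative only, ignoring details of the official implementation) that divides one RoI on the feature map into a 7 × 7 grid and max-pools each bin:

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=7):
    """Pool one RoI on a conv feature map to a fixed output_size x output_size grid.

    feature_map: (C, H, W) array of conv features
    roi:         (x1, y1, x2, y2) in feature-map coordinates
    """
    C, H, W = feature_map.shape
    x1, y1, x2, y2 = roi
    pooled = np.zeros((C, output_size, output_size), dtype=feature_map.dtype)

    # Split the RoI into an output_size x output_size grid of roughly equal bins,
    # then take the max of the features falling inside each bin.
    for i in range(output_size):          # rows of the output grid
        for j in range(output_size):      # columns of the output grid
            y_lo = int(np.floor(y1 + (y2 - y1) * i / output_size))
            y_hi = int(np.ceil(y1 + (y2 - y1) * (i + 1) / output_size))
            x_lo = int(np.floor(x1 + (x2 - x1) * j / output_size))
            x_hi = int(np.ceil(x1 + (x2 - x1) * (j + 1) / output_size))
            y_hi = max(y_hi, y_lo + 1)    # make sure each bin is non-empty
            x_hi = max(x_hi, x_lo + 1)
            pooled[:, i, j] = feature_map[:, y_lo:y_hi, x_lo:x_hi].max(axis=(1, 2))
    return pooled

# Example: a 512-channel 14x14 feature map and one RoI covering part of it.
fmap = np.random.rand(512, 14, 14).astype(np.float32)
print(roi_max_pool(fmap, roi=(2, 3, 11, 12)).shape)   # (512, 7, 7)
```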

Why SPP-Net cannot update the convolutional layer weights below the SPP layer

During SPP-Net training, each training sample (RoI) in a mini-batch may come from a different image, which makes back-propagation through the SPP layer very inefficient. The reason is that each RoI can have a very large receptive field, often covering the entire input image. Since the forward pass has to process the whole receptive field (usually the whole image) for every sample, the training inputs are huge, and back-propagation through the convolutional layers becomes prohibitively expensive.

Truncated Singular Value Decomposition

For whole-image classification, the fully connected layers take less time than the convolutional layers, but for a detection task the number of RoIs to process is large, and computing the fully connected layers accounts for nearly half of the forward-pass time. Truncated singular value decomposition is therefore used to reduce the computation of the fully connected layers. A u × v weight matrix W is approximated as:
$$W \approx U \Sigma_t V^T$$
where U is a u × t matrix made up of the first t left-singular vectors of W, Σ_t is a t × t diagonal matrix containing the top t singular values, and V is a v × t matrix made up of the first t right-singular vectors. This reduces the parameter count from uv to t(u + v), which is a large saving when t is much smaller than min(u, v).
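
A small NumPy sketch of this compression (the sizes and the choice of t are illustrative): take the truncated SVD of W and replace the single fully connected layer with two smaller ones, the first with weight Σ_t Vᵀ and no bias, the second with weight U and the original bias.

```python
import numpy as np

# Illustrative sizes for a fully connected layer with a u x v weight matrix.
u, v, t = 1024, 2048, 64
W = np.random.randn(u, v).astype(np.float32)
b = np.random.randn(u).astype(np.float32)
x = np.random.randn(v).astype(np.float32)           # input to the layer

# Truncated SVD: keep only the top-t singular values / vectors.
U, S, Vt = np.linalg.svd(W, full_matrices=False)     # W = U @ diag(S) @ Vt
U_t, S_t, Vt_t = U[:, :t], S[:t], Vt[:t, :]

# Original layer: y = W x + b, costing roughly u*v multiplications.
y_full = W @ x + b

# Compressed version: two layers costing roughly t*(u + v) multiplications.
h = S_t * (Vt_t @ x)                                 # first layer: Sigma_t V^T, no bias
y_svd = U_t @ h + b                                  # second layer: U, original bias

# The gap shrinks as t grows; for a real FC layer whose singular values decay
# quickly, a small t already gives a close approximation.
print(float(np.abs(y_full - y_svd).max()))
```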

If you are interested in artificial intelligence, you are welcome to follow the public account "Machine Craftsman", which publishes articles on AI techniques and news, as well as beginner series, from time to time.
