Practical Techniques: An Introduction to Basic Object Detection Algorithms (CNN, RCNN, Fast RCNN and Faster RCNN)


Everyone has lost something and wished for a way to quickly locate the missing item. Today, object detection algorithms may be able to do exactly that. Object detection is used across many industries, from security and real-time traffic monitoring to smart cities. Behind these applications, put simply, are powerful deep learning algorithms.

In this article, we will take a closer look at the algorithms used for object detection, starting with the RCNN family: RCNN, Fast RCNN and Faster RCNN.

1. The simplest way to solve object detection (using deep learning)

The figure below is a typical illustration of what an object detection algorithm does: every object in the image (whether a person or a kite) is localized with a certain accuracy.

 

 

First, let's talk about the most versatile and simplest deep learning approach to detecting objects in images: the convolutional neural network (CNN). Before discussing the inner workings of a CNN, let's look at the picture below.

 

 

An image is fed into the network and passed through several convolutional and pooling layers. Finally, the network outputs the category the object belongs to. It sounds very straightforward.

For each input picture we get one corresponding output category. Can this technique be used to detect multiple objects in an image? The answer is yes! Let's see how a convolutional neural network can be used to solve a general object detection problem.

1. First, we take the following picture as input:

 

 

2. Next, we divide the picture into multiple regions:

 

 

3. Each region is treated as a separate picture.

4. These region pictures are passed to the CNN, which assigns them to different categories.

5. Once every region has been assigned its category, we combine them all to obtain the detection result on the original image (a minimal code sketch of this approach follows):
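Below is a minimal sketch of this naive "classify every region" idea, not the article's exact implementation: it assumes PyTorch and torchvision are available, uses a pretrained ResNet-18 as the per-region classifier, and the grid size is an arbitrary illustrative choice.

```python
import torch
import torchvision.transforms as T
from torchvision.models import resnet18

# A pretrained ImageNet classifier stands in for "the CNN" in the steps above.
model = resnet18(weights="IMAGENET1K_V1").eval()
preprocess = T.Compose([
    T.Resize((224, 224)),   # every region gets resized to the same input size
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def classify_regions(image, grid=4):
    """Split an image tensor (3, H, W) in [0, 1] into a grid and classify each cell."""
    _, h, w = image.shape
    results = []
    with torch.no_grad():
        for row in range(grid):
            for col in range(grid):
                # Steps 2-4: cut out one region, resize it, and run it through the CNN.
                crop = image[:, row * h // grid:(row + 1) * h // grid,
                                col * w // grid:(col + 1) * w // grid]
                logits = model(preprocess(crop).unsqueeze(0))
                results.append(((row, col), logits.argmax(dim=1).item()))
    return results   # step 5: the per-region labels can now be combined into a detection map

# Usage with a random stand-in image (replace with a real photo tensor):
print(classify_regions(torch.rand(3, 480, 640)))
```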

 

 

The problem with this method is that objects in a picture can have different aspect ratios and spatial locations. For example, in some cases the target object may occupy most of the picture, in others only a very small part, and its shape may also vary.

With these factors in mind, we would need to split the image into a very large number of regions, which requires enormous computing power. To solve this problem and reduce the number of regions, we can use a region-based CNN, which selects the regions for us.

2. Introduction to region-based convolutional neural networks

2.1 Introduction to RCNN

Instead of working on a huge number of regions, the RCNN algorithm proposes creating a set of bounding boxes in the image and checking whether any of them contains a target object. RCNN uses selective search to extract these boxes from the picture.

First, let's define what selective search is and how it identifies different regions. An object is usually characterized by four elements: varying scale, color, texture (material), and the area it encloses. Selective search looks for these patterns in the picture and proposes candidate regions based on them. Here is a simple illustration of how selective search works:

  • First, a picture is taken as input:

 

 

 

  • Next, it generates an initial sub-segmentation, dividing the picture into many small regions:

 

 

 

  • Then, based on color, texture, size and shape compatibility, similar regions are merged into larger ones:

 

 

 

  • Finally, these merged regions yield the final candidate object locations (Regions of Interest). A short code sketch of generating such proposals follows.
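As a concrete illustration, here is a short sketch of generating region proposals with selective search, under the assumption that the opencv-contrib-python package is installed (the ximgproc module is not part of the base OpenCV package); the input file name is a placeholder.

```python
import cv2

image = cv2.imread("input.jpg")                      # hypothetical input picture
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(image)
ss.switchToSelectiveSearchFast()                     # trades some quality for speed
rects = ss.process()                                 # array of (x, y, w, h) proposals
print(f"{len(rects)} region proposals generated")

# Draw the first ~2000 proposals, roughly the number RCNN keeps per image.
for (x, y, w, h) in rects[:2000]:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 1)
cv2.imwrite("proposals.jpg", image)
```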

 

The steps RCNN follows to detect objects are:

  1. We first take a pre-trained convolutional neural network.
  2. The last layer of the network is then retrained according to the number of target categories to be detected.
  3. We obtain each image's Regions of Interest and reshape these regions so that they match the input size the CNN expects.
  4. Having obtained these regions, we train support vector machines (SVMs) to separate target objects from the background. For each category, we train one binary SVM.
  5. Finally, we train a linear regression model to generate a tighter bounding box for each detected object.

Let's explain this with a concrete example.

  • First, the following image is taken as input:

 

 

 

  • Next, we obtain the Regions of Interest using the selective search described above:

 

 

 

  • These regions are reshaped to match the CNN's input and passed through the convolutional network:

 

 

 

  • The CNN extracts features for each region, and SVMs use these features to classify the regions into different categories:

 

 

 

  • Finally, a bounding box regression predicts the position of the bounding box for each region:

 

 

 

This is how RCNN detects objects.
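To tie the steps above together, here is a rough inference-only sketch of an RCNN-style pipeline, not the original implementation: a pretrained ResNet-18 with its classifier removed plays the role of the feature extractor, while the proposals and the per-class SVMs are assumed to come from the selective search and training steps described earlier.

```python
import torch
import torch.nn as nn
import torchvision.transforms.functional as TF
from torchvision.models import resnet18

backbone = resnet18(weights="IMAGENET1K_V1")
backbone.fc = nn.Identity()        # keep the 512-d feature vector instead of class logits
backbone.eval()

def rcnn_inference(image, proposals, svms):
    """image: (3, H, W) tensor in [0, 1]; proposals: list of (x, y, w, h) boxes from
    selective search; svms: dict {class_name: scoring function on a 512-d feature}."""
    detections = []
    with torch.no_grad():
        for (x, y, w, h) in proposals:
            # Warp every proposal to the fixed input size the CNN expects (step 3).
            crop = TF.resized_crop(image, top=y, left=x, height=h, width=w, size=[224, 224])
            feature = backbone(crop.unsqueeze(0)).squeeze(0)
            # Score the feature with one binary SVM per class and keep the best (step 4).
            scores = {cls: svm(feature) for cls, svm in svms.items()}
            best = max(scores, key=scores.get)
            detections.append(((x, y, w, h), best, scores[best]))
    return detections   # step 5 (bounding box regression) would further refine these boxes
```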

2.2 Problems with RCNN

Now we know how RCNN helps with object detection, but this technique has its limitations. Training an RCNN model is expensive and involves many steps:

  • Selective search extracts about 2000 regions per picture, each processed separately;
  • Features for each region are extracted by the CNN. If we have N images, the number of CNN feature extractions is N * 2000;
  • The entire RCNN detection process uses three models:
  1. a CNN for feature extraction,
  2. a linear SVM classifier to identify the target object,
  3. a regression model to refine the bounding box.

Combined, these steps make RCNN very slow: it typically needs 40-50 seconds to make predictions for each new picture, which makes it essentially unusable on large datasets.

So let's introduce another object detection technique that overcomes these limits.

3. Fast RCNN

3.1 Introduction to Fast RCNN

What can be done to reduce RCNN's computation time? Instead of running the CNN 2000 times per picture, could we run it only once and still obtain all the regions of interest?

Ross Girshick, the author of RCNN, proposed exactly that: run the CNN only once per picture and then find a way to share that computation across the roughly 2000 regions. In Fast RCNN, the input picture is fed to the CNN, which generates a convolutional feature map. Using this feature map, the regions of interest are extracted. An RoI pooling layer then reshapes all the proposed regions to a fixed size so they can be fed into a fully connected network.

Simply put, the process includes the following steps:

  1. The picture is taken as input.
  2. It is passed to the convolutional network, which generates the regions of interest.
  3. An RoI pooling layer reshapes all these regions, which are then fed into a fully connected network (a short RoI pooling sketch follows this list).
  4. At the top of the network, a softmax layer outputs the category, while a parallel linear regression layer outputs the corresponding bounding box.
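Here is a minimal sketch of the RoI pooling step, using torchvision's built-in roi_pool operator; the feature map shape, the example boxes and the spatial_scale value are illustrative assumptions, not values taken from the article.

```python
import torch
from torchvision.ops import roi_pool

feature_map = torch.rand(1, 256, 50, 50)      # (batch, channels, H, W) from the backbone CNN
# Regions of interest in the original image's coordinates: (batch_index, x1, y1, x2, y2)
rois = torch.tensor([[0.,  40.,  60., 300., 280.],
                     [0., 120.,  10., 380., 200.]])
# spatial_scale maps image coordinates onto the feature map (e.g. a stride-16 backbone)
pooled = roi_pool(feature_map, rois, output_size=(7, 7), spatial_scale=1.0 / 16)
print(pooled.shape)   # torch.Size([2, 256, 7, 7]): every region now has the same size
```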

So, instead of the three different models RCNN requires, Fast RCNN uses a single model that simultaneously performs region feature extraction, classification, and bounding box generation.

As before, we use the same image as an example to make this more intuitive.

  • First, the image is taken as input:

 

 

 

The image is passed to a convolutional network, which returns the regions of interest:

 

 

Then, an RoI pooling layer is applied to these regions to make every region the same size:

 

 

Finally, these regions are passed to a fully connected network that classifies them with a softmax layer and, at the same time, returns the bounding boxes with a linear regression layer:

 

 

3.2 Problems with Fast RCNN

Even so, Fast RCNN has its own limitations. It still uses selective search to find regions of interest, and that process is slow. The difference from RCNN is that Fast RCNN takes only about 2 seconds to process one picture. On truly large datasets, however, this speed is still far from ideal.

4. Faster RCNN

4.1 Introduction to Faster RCNN

Faster RCNN is an optimized version of Fast RCNN. The main difference between the two lies in how the regions of interest are generated: Fast RCNN uses selective search, while Faster RCNN uses a Region Proposal Network (RPN). The RPN takes the image feature map as input and generates a set of object proposals, each with a corresponding objectness score.

Here is the general workflow of Faster RCNN (a runnable sketch using a pretrained model follows the list):

  1. The input image is passed to a convolutional network, which generates a feature map of the image.
  2. A Region Proposal Network is applied to the feature map and returns the object proposals together with their scores.
  3. An RoI pooling layer is applied to bring all the proposals to the same size.
  4. Finally, the proposals are passed to a fully connected layer, which classifies them and generates the bounding boxes of the target objects.
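As an end-to-end illustration of this workflow (not the original paper's code), torchvision ships a Faster RCNN pretrained on COCO, so the whole backbone, RPN, RoI pooling and head pipeline can be run in a few lines; the image path and the score threshold below are placeholders.

```python
import torch
from torchvision.io import read_image
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
image = read_image("street.jpg").float() / 255.0     # hypothetical input image
with torch.no_grad():
    outputs = model([image])                          # the model takes a list of image tensors
# Each output dict holds the predicted 'boxes', 'labels' and 'scores' for one input image
for box, label, score in zip(outputs[0]["boxes"], outputs[0]["labels"], outputs[0]["scores"]):
    if score > 0.8:                                   # arbitrary confidence threshold
        print(label.item(), round(score.item(), 3), [round(v, 1) for v in box.tolist()])
```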

 

 

 

So how exactly does the Region Proposal Network work? First, the feature map produced by the CNN in Faster RCNN is passed to the Region Proposal Network. The RPN slides a window over this feature map, and at each window position it generates k anchor boxes of different shapes and sizes:

 

 

Anchor boxes are fixed-size bounding boxes of different shapes and sizes placed across the image. For each anchor, the RPN predicts two things (a small anchor-generation sketch follows these two points):

  • first, the probability that the anchor contains a target object (regardless of its category);
  • second, the bounding box regression offsets that adjust the anchor to better fit the target object.
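Here is a small sketch of how the k anchor boxes per sliding-window position could be enumerated; the stride, scales and aspect ratios are illustrative choices rather than the exact values from the Faster RCNN paper.

```python
import itertools
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16, scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * k, 4) anchors as (x1, y1, x2, y2) in image coordinates,
    where k = len(scales) * len(ratios) anchors are centered on each feature map cell."""
    anchors = []
    for row, col in itertools.product(range(feat_h), range(feat_w)):
        cx, cy = col * stride + stride / 2, row * stride + stride / 2    # sliding-window center
        for scale, ratio in itertools.product(scales, ratios):
            w, h = scale * np.sqrt(ratio), scale / np.sqrt(ratio)        # same area, varying shape
            anchors.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(anchors)

print(generate_anchors(50, 50).shape)    # (22500, 4): 50 * 50 positions * 9 anchors each
```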

We now have bounding boxes of different shapes and sizes, which are passed on to the RoI pooling layer. After the RPN stage the proposals may not yet have categories assigned. Each proposal is cropped so that it contains an object; this is the job of the RoI pooling layer, which extracts a fixed-size feature map for every anchor:

 

 

After that, these feature maps are passed to a fully connected layer, which predicts the class and the bounding box of each object.

4.2 Problems with Faster RCNN

So far, all the object detection algorithms we have discussed use regions to identify objects. The network does not look at the whole image in one pass, but focuses on parts of the image in turn. This creates two problems:

  • the algorithm requires several passes through the image to extract all the objects;
  • since several stages are chained one after another, the performance of each stage depends on the performance of the previous one.

5. Summary of the algorithms

The algorithms covered in this article are summarized below:

  • CNN: the image is split into multiple regions and each region is classified separately; it needs a very large number of regions, which makes it computationally very expensive.
  • RCNN: selective search generates about 2000 region proposals per image; each region is passed through the CNN individually and three separate models are used, so prediction takes roughly 40-50 seconds per image.
  • Fast RCNN: the CNN runs only once per image, regions are extracted from the feature map with RoI pooling, and a single model handles feature extraction, classification and bounding box regression; about 2 seconds per image, but region proposals still rely on slow selective search.
  • Faster RCNN: selective search is replaced by a Region Proposal Network, which makes proposal generation much faster; the multi-stage pipeline remains, so each stage still depends on the previous one.

 



Origin blog.csdn.net/weixin_41663412/article/details/104854794