Deep Learning YOLO Series (1)

Paper studied: You Only Look Once: Unified, Real-Time Object Detection

Core idea
The core idea of YOLO is to treat object detection as a regression problem. The entire image is used as the input of the network, and a single pass through one neural network yields the positions of the bounding boxes and the classes they belong to.
Implementation process

  1. The image is divided into an S × S grid of cells. If the center of an object falls in a grid cell, that cell is responsible for predicting the object.
  2. Each grid cell predicts B bounding boxes. Each bounding box prediction consists of five values: (x, y, w, h) and a confidence score.
  3. Each grid cell also predicts class information for C classes.
  4. In total, there are S × S grid cells, each predicting B bounding boxes and C class probabilities, so the output is an S × S × (5B + C) tensor.
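
The steps above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the defaults S = 7, B = 2, C = 20 are the values the paper uses for PASCAL VOC, and the 448 × 448 image size is just an example value.

```python
def output_shape(S=7, B=2, C=20):
    """Shape of the prediction tensor: S x S cells, each holding
    5 values per box (x, y, w, h, confidence) plus C class scores."""
    return (S, S, 5 * B + C)


def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the grid cell containing the object
    center (cx, cy), where the center is given in pixels."""
    # Clamp so that a center lying exactly on the right or bottom
    # edge still maps to the last cell of the grid.
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return row, col


print(output_shape())                        # (7, 7, 30)
print(responsible_cell(224, 224, 448, 448))  # (3, 3)
```

With these settings each cell carries 5 × 2 + 20 = 30 numbers, which is where the 7 × 7 × 30 output comes from.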

YOLO network structure

  1. The network has 24 convolutional layers followed by 2 fully connected layers.
  2. It is modeled on the GoogLeNet classification network, but is not identical to it (for example, the layers differ).
  3. The convolutional layers extract image features, and the fully connected layers predict the box positions and class probabilities.
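
To make the output of the fully connected layers more concrete, here is a minimal sketch that splits the prediction vector of one grid cell into its boxes and class scores. The function name is hypothetical, and the memory layout (all boxes first, then the class scores) is an assumption made for illustration; a real implementation may order the values differently.

```python
def split_cell_prediction(vec, B=2, C=20):
    """Split one grid cell's output vector into B boxes and C class scores.

    Assumed layout (hypothetical): [x, y, w, h, confidence] repeated B times,
    followed by C class scores.
    """
    if len(vec) != 5 * B + C:
        raise ValueError("expected a vector of length 5*B + C")
    boxes = [tuple(vec[5 * b:5 * b + 5]) for b in range(B)]
    class_scores = list(vec[5 * B:])
    return boxes, class_scores


# Dummy values standing in for the network output of a single cell.
cell = [float(i) for i in range(30)]
boxes, class_scores = split_cell_prediction(cell)
print(len(boxes), len(class_scores))  # 2 20
```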

Objective loss function
The paper defines the loss function as a weighted sum of squared-error terms.
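
Written out in LaTeX, the loss from the paper reads as follows. Here the indicator 1_{ij}^{obj} is 1 when box j of cell i is responsible for an object, 1_{i}^{obj} is 1 when an object appears in cell i, hatted symbols are predictions, and the weights are λ_coord = 5 and λ_noobj = 0.5.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
 &+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
           + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
 &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
 &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
      \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The first two lines are the coordinate terms, the next two are the confidence terms for boxes with and without an object, and the last line is the class term.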
From the loss function we can see:
  1. It consists of three parts: the coordinate prediction, the confidence prediction, and the class prediction.
  2. It uses sum-squared error because it is easy to optimize, but this does not align perfectly with the goal of maximizing average precision. Weighting localization error equally with classification error may not be ideal. To increase the penalty for localization error, the coordinate weight λ_coord is set to 5.
  3. In every image, many grid cells contain no object. This pushes the confidence scores of those cells toward zero, often overpowering the gradient from the cells that do contain objects. This can make the model unstable and cause training to diverge early. To reduce the penalty on the confidence of boxes that contain no object, the no-object weight λ_noobj is set to 0.5.
  4. Note that the errors for w and h are computed on their square roots. The reason is that, for the same offset, a deviation in a small box is less tolerable than the same deviation in a large box, yet sum-squared error gives the same loss for the same offset. To partially address this, the authors predict the square root of the box width and height instead of the width and height directly.
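
The effect of the square root in the last point can be checked numerically. The sketch below is illustrative only: the function names and the box widths (expressed as fractions of the image width) are made-up values, not taken from the paper.

```python
import math


def plain_size_error(w, w_hat):
    """Squared error on the raw width."""
    return (w - w_hat) ** 2


def sqrt_size_error(w, w_hat):
    """Squared error on the square root of the width, as in the YOLO loss."""
    return (math.sqrt(w) - math.sqrt(w_hat)) ** 2


offset = 0.05                   # the same absolute error for both boxes
small_w, large_w = 0.10, 0.80   # hypothetical small and large box widths

plain_small = plain_size_error(small_w, small_w + offset)
plain_large = plain_size_error(large_w, large_w + offset)
sqrt_small = sqrt_size_error(small_w, small_w + offset)
sqrt_large = sqrt_size_error(large_w, large_w + offset)

print(f"plain: small={plain_small:.6f} large={plain_large:.6f}")
print(f"sqrt:  small={sqrt_small:.6f} large={sqrt_large:.6f}")
```

On the raw widths the two errors are the same, while on the square roots the small box is penalized several times more than the large one, which is the behavior the authors were after.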

Performance Analysis
Advantages:

  1. YOLO is very fast. For real-time detection, the standard version of YOLO processes 45 images per second, and the fast version processes 150 images per second. This means YOLO can process video in real time with less than 25 milliseconds of latency. For systems with looser real-time requirements, YOLO is faster than other methods at comparable accuracy.
  2. The mean average precision (mAP) of YOLO is more than twice that of other real-time detection systems.
  3. It transfers well and can be applied to new domains (such as object detection in artwork).
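
As a quick sanity check on the speed figures above, the frame rates can be converted into time per frame. This sketch assumes frames are processed one at a time, so that the per-frame time is roughly 1 / fps; the function name is hypothetical.

```python
def frame_time_ms(fps):
    """Approximate time spent per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


for name, fps in [("standard version", 45), ("fast version", 150)]:
    print(f"{name}: {frame_time_ms(fps):.1f} ms per frame")
# standard version: 22.2 ms per frame
# fast version: 6.7 ms per frame
```

Both values are below the 25 millisecond figure quoted for real-time video.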

Limitations:

  1. YOLO performs poorly on objects that are close to each other and on small objects that appear in groups, because each grid cell predicts only two boxes and only one class.
  2. Because YOLO learns to predict bounding boxes from data, it generalizes poorly to objects with unusual aspect ratios or configurations.
  3. Because of the loss function, localization error is the main source of detection error, and the handling of objects of different sizes in particular needs improvement. (A small error has a larger impact on a small bounding box.)
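
The remark that a small error hurts a small box more can be illustrated with intersection over union (IoU), used here as an assumed measure of localization quality. The box sizes and the 5-pixel shift are made-up values, and the helper names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def shifted_iou(size, shift):
    """IoU between a square box of the given size and a copy shifted along x."""
    truth = (0.0, 0.0, float(size), float(size))
    pred = (float(shift), 0.0, float(size + shift), float(size))
    return iou(truth, pred)


small = shifted_iou(20, 5)    # 20 x 20 box shifted by 5 pixels
large = shifted_iou(200, 5)   # 200 x 200 box shifted by the same 5 pixels
print(f"small box IoU: {small:.3f}, large box IoU: {large:.3f}")
# small box IoU: 0.600, large box IoU: 0.951
```

The same 5-pixel error drops the IoU of the small box to 0.6 while the large box stays above 0.95.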

Conclusion
As a standalone object detection method based on a single neural network, YOLO offers high accuracy and fast detection. It is also reasonably robust and can be applied to real-time object detection.

Origin blog.csdn.net/wjinjie/article/details/104289368