Detailed notes on artificial intelligence: computer vision, object detection, and the R-CNN and YOLO model series

Computer Vision

Overview of Computer Vision : Computer vision is a discipline that uses computer algorithms and mathematical models to simulate and automate human vision.

The status of computer vision : Computer vision (CV), natural language processing (NLP), and speech recognition (SR) are regarded as the three most active areas of machine learning.

Common tasks of computer vision : Four common computer vision tasks are introduced below, ordered from coarse-grained to fine-grained.

  • Image classification : Assign one or more labels representing categories to an image.
  • Object detection : Determine the class of an object in an image and its location, where the location is outlined with a box.
  • Semantic segmentation : Determine the category of each object in the image and outline its location precisely, which requires classifying every pixel.
  • Instance segmentation : Building on semantic segmentation, further distinguish different instances of the same class, again with pixel-level classification.
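
The coarse-to-fine ordering of these four tasks can be made concrete with a small sketch: starting from instance masks (the finest output), the coarser outputs can be derived. The helper names `masks_to_semantic` and `masks_to_boxes` and the toy 4x6 image are assumptions made for illustration only.

```python
import numpy as np

def masks_to_semantic(masks, class_ids):
    """Collapse per-instance masks into a per-pixel class map (0 = background)."""
    semantic = np.zeros(masks.shape[1:], dtype=np.int64)
    for mask, cls in zip(masks, class_ids):
        semantic[mask] = cls
    return semantic

def masks_to_boxes(masks):
    """Tight (x1, y1, x2, y2) box around each mask, using inclusive pixel indices."""
    boxes = []
    for mask in masks:
        ys, xs = np.nonzero(mask)
        boxes.append((xs.min(), ys.min(), xs.max(), ys.max()))
    return np.array(boxes)

# Two toy instances of the same class (class id 1) in a 4x6 image.
masks = np.zeros((2, 4, 6), dtype=bool)
masks[0, 0:2, 0:2] = True
masks[1, 2:4, 3:6] = True
class_ids = np.array([1, 1])

semantic = masks_to_semantic(masks, class_ids)  # semantic segmentation output
boxes = masks_to_boxes(masks)                   # object detection output
labels = np.unique(class_ids)                   # image classification output
```

Note that the semantic map no longer tells the two instances apart, and the image-level label list no longer says where they are: each coarser task discards information that the finer one keeps.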

Object Detection

Object Detection Overview

Overview of the object detection task : Object detection is a computer vision task that aims to detect and locate target objects in an image or video and to assign each one a category label.

Difficulties of the object detection task :

  • Objects vary widely in size;
  • The angle and posture of the object are not fixed;
  • Objects can appear anywhere in the image;
  • Objects can belong to multiple categories at the same time;
  • Objects are often truncated by occlusions.

The development history of object detection models :

  1. R-CNN model (2014) : R-CNN is a convolutional neural network model based on region proposals. It completes the object detection task in two stages: candidate region extraction and convolutional feature extraction.
  2. SPP-net model (Kaiming He et al., 2014) : SPP-net is a convolutional neural network model that can process input images of any size. Its spatial pyramid pooling layer extracts and fuses features at different scales.
  3. Fast R-CNN model (2015) : Fast R-CNN is a convolutional neural network model based on region proposals. Its RoI pooling layer lets classification and bounding box regression be trained together in one network, which makes it faster and more accurate than R-CNN.
  4. Faster R-CNN model (2015) : Faster R-CNN is an object detection model based on a deep neural network. By introducing a region proposal network (RPN), it achieves end-to-end object detection and greatly improves detection speed and accuracy.
  5. YOLO model (2016) : YOLO (You Only Look Once) is a real-time object detection model based on a deep neural network. It achieves fast detection by converting the object detection problem into a regression problem, and it can process multiple objects at the same time.
  6. SSD model (2016) : SSD (Single Shot MultiBox Detector) is an object detection model based on a deep neural network. It achieves fast detection by predicting object categories and locations from feature maps at multiple levels.
  7. FPN model (2017) : FPN (Feature Pyramid Network) is a network structure built on a feature pyramid. It improves object detection and semantic segmentation performance by fusing features from different levels while retaining high-resolution information.
  8. Mask R-CNN model (2017) : Mask R-CNN is an object detection and instance segmentation model based on Faster R-CNN. It adds a segmentation branch alongside the detection branch to achieve pixel-level segmentation of object instances.
  9. IoU-Net model (2018) : IoU-Net is an object detection model that learns to predict the IoU (intersection over union) between each detected box and its ground-truth box, and uses this predicted localization confidence to rank and refine the detected boxes.
  10. GIoU loss (2019) : GIoU (Generalized IoU) is a bounding box regression loss rather than a standalone model. It extends IoU with a penalty based on the smallest box enclosing both boxes, so the loss still gives a useful training signal when the predicted and ground-truth boxes do not overlap.
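
The IoU and GIoU quantities named in the last two items can be computed directly from box coordinates. The sketch below assumes axis-aligned boxes given as (x1, y1, x2, y2) with positive width and height; the function name `iou_and_giou` is chosen for illustration.

```python
def iou_and_giou(box_a, box_b):
    """Return (IoU, GIoU) for two boxes in (x1, y1, x2, y2) form."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection rectangle; width or height is clamped to 0 when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    # GIoU subtracts the fraction of the enclosing box not covered by the union.
    giou = iou - (enclose - union) / enclose
    return iou, giou

overlapping = iou_and_giou((0, 0, 2, 2), (1, 1, 3, 3))
disjoint = iou_and_giou((0, 0, 1, 1), (2, 0, 3, 1))
```

For disjoint boxes IoU is always 0 regardless of how far apart they are, whereas GIoU becomes more negative as the gap grows. A loss of the form 1 - GIoU therefore still distinguishes a near miss from a distant one.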

The development history of the target detection task :

  • Early non-deep learning object detection focused on designing stronger features;
  • Deep learning object detection focuses on designing network structures, optimization methods, and loss functions.

Other content of object detection :

  • Important requirements for object detection systems : accuracy and real-time performance.
  • Moving object detection : Detecting objects in a sequence of changing images, divided into detection against a static background and detection against a dynamic background.

Traditional Object Detection Methods

R-CNN model

R-CNN model proposal : R-CNN (Region-based Convolutional Neural Network) is a classic target detection algorithm, proposed by Ross Girshick et al. in 2014.

The algorithm flow of R-CNN : The R-CNN model is mainly divided into three steps: region extraction, feature extraction and target classification.

  • R-CNN first uses algorithms such as Selective Search to extract a series of candidate regions from the image;
  • Then each candidate region is resized to a fixed size and passed through a convolutional neural network (CNN), which yields a fixed-dimensional feature vector for that region;
  • Finally, each candidate region is classified from its feature vector. The original R-CNN trains one linear support vector machine (SVM) per class for this step.
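
The three steps above can be sketched as a toy pipeline. Everything here is a stand-in chosen for illustration: `propose_regions` uses a plain sliding window instead of Selective Search, `extract_features` flattens pixels instead of running a CNN, and the classifier weights are random rather than trained. Only the control flow mirrors R-CNN.

```python
import numpy as np

def propose_regions(image, size=8, stride=4):
    """Stand-in for Selective Search (assumption): sliding-window boxes (x1, y1, x2, y2)."""
    h, w = image.shape[:2]
    return [(x, y, x + size, y + size)
            for y in range(0, h - size + 1, stride)
            for x in range(0, w - size + 1, stride)]

def warp(image, box, out=4):
    """Crop a region and resize it to out x out by nearest-neighbour sampling."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    ys = np.arange(out) * crop.shape[0] // out
    xs = np.arange(out) * crop.shape[1] // out
    return crop[np.ix_(ys, xs)]

def extract_features(patch):
    """Stand-in for the CNN (assumption): flatten the patch into a fixed-length vector."""
    return patch.astype(np.float64).ravel()

def classify(features, weights, bias):
    """Linear per-class scores, in the spirit of one linear classifier per class."""
    return features @ weights + bias

rng = np.random.default_rng(0)
image = rng.random((16, 16))
weights = rng.normal(size=(16, 3))  # 16 features, 3 hypothetical classes
bias = np.zeros(3)

detections = []
for box in propose_regions(image):
    # The feature extractor runs once per region; this is the costly part in R-CNN.
    scores = classify(extract_features(warp(image, box)), weights, bias)
    detections.append((box, int(scores.argmax()), float(scores.max())))
```

The loop makes the cost structure visible: feature extraction is repeated for every candidate region, even when regions overlap heavily.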

Advantages of R-CNN : By using CNN for feature extraction, R-CNN can overcome the problem of manually designing features in traditional target detection algorithms, thereby improving the accuracy of detection.

Disadvantages of the R-CNN model : R-CNN is slow because the CNN runs separately on every candidate region (about 2,000 per image), so it cannot process large-scale image data in real time.

SPP-Net model

Proposal of the SPP-Net model : SPP-Net addresses the fixed-size input requirement of convolutional neural networks (CNNs) with fully connected layers: it produces a fixed-size output even when the input images vary in size. SPP-Net was proposed by Kaiming He et al. in 2014 and improves on the R-CNN object detection algorithm.

Features of SPP-Net : By introducing spatial pyramid pooling, SPP-Net extracts features from input images of different sizes and converts them into fixed-length feature vectors. Specifically, SPP-Net adds a spatial pyramid pooling layer after the last convolutional layer. This layer pools the feature map over grids of several resolutions and concatenates the pooled results into a single fixed-length feature vector.

Advantages of SPP-Net : It can extract features from input images of different sizes and convert them into fixed-length feature vectors, thus avoiding the problem of inconsistent output sizes of CNN networks.
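
A minimal NumPy sketch of spatial pyramid pooling is given below to show why the output length does not depend on the input size. The pyramid levels (1, 2, 4) and the function name `spatial_pyramid_pool` are assumptions for illustration, not the configuration of the original network.

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map over n x n grids and concatenate the results.

    The output length is C * sum(n * n for n in levels), whatever H and W are.
    """
    _, h, w = feature_map.shape
    pooled = []
    for n in levels:
        y_edges = np.linspace(0, h, n + 1)
        x_edges = np.linspace(0, w, n + 1)
        for i in range(n):
            # floor/ceil keeps every bin non-empty even when H or W < n.
            y0, y1 = int(np.floor(y_edges[i])), int(np.ceil(y_edges[i + 1]))
            for j in range(n):
                x0, x1 = int(np.floor(x_edges[j])), int(np.ceil(x_edges[j + 1]))
                pooled.append(feature_map[:, y0:y1, x0:x1].max(axis=(1, 2)))
    return np.concatenate(pooled)

rng = np.random.default_rng(0)
small_map = rng.random((8, 13, 17))   # feature map from a small image
large_map = rng.random((8, 32, 24))   # feature map from a larger image

small_vec = spatial_pyramid_pool(small_map)
large_vec = spatial_pyramid_pool(large_map)
```

Both vectors have the same length, so the same fully connected layers can follow regardless of the input image size.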

Fast R-CNN model

Proposal of the Fast R-CNN model : Fast R-CNN is a target detection algorithm based on deep learning, proposed by Ross Girshick in 2015, and is an improved version of the R-CNN series of algorithms.

Features of Fast R-CNN :

  • RoI pooling : Fast R-CNN combines the feature extraction and classification steps of R-CNN into one network, and uses RoI pooling (Region of Interest pooling) to handle candidate boxes of different scales. RoI pooling divides each candidate box into a fixed grid of sub-regions and max-pools each sub-region, which yields a fixed-length feature vector for object classification and bounding box regression.
  • Multi-task loss function : Fast R-CNN introduces a multi-task loss function, including target classification loss and bounding box regression loss, while considering the mutual influence between the two tasks. By minimizing the multi-task loss function, the performance of both object classification and bounding box regression can be optimized simultaneously.
  • Feature sharing : Fast R-CNN uses a feature sharing strategy throughout the network, that is, different candidate boxes share the same convolutional feature map, thereby reducing the amount of calculation and storage space.
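
RoI pooling and feature sharing can be illustrated together: the feature map is computed once, and each candidate box is pooled from it into the same output shape. This is a sketch under stated assumptions: RoI coordinates are already expressed in feature-map cells with exclusive upper bounds, and the name `roi_max_pool` is hypothetical.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one RoI of a (C, H, W) feature map into a fixed (C, out_h, out_w) grid.

    roi = (x1, y1, x2, y2) in feature-map cells, with x2 and y2 exclusive.
    """
    x1, y1, x2, y2 = roi
    out_h, out_w = output_size
    region = feature_map[:, y1:y2, x1:x2]
    _, rh, rw = region.shape
    out = np.empty((feature_map.shape[0], out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        # Bin i covers rows floor(i*rh/out_h) .. ceil((i+1)*rh/out_h).
        ys, ye = (i * rh) // out_h, -(-((i + 1) * rh) // out_h)
        for j in range(out_w):
            xs, xe = (j * rw) // out_w, -(-((j + 1) * rw) // out_w)
            out[:, i, j] = region[:, ys:ye, xs:xe].max(axis=(1, 2))
    return out

# One shared feature map, two candidate boxes of different sizes.
shared_features = np.arange(36).reshape(1, 6, 6)
pooled_a = roi_max_pool(shared_features, (1, 1, 5, 4))
pooled_b = roi_max_pool(shared_features, (0, 0, 6, 6))
```

Both results have shape (1, 2, 2) although the boxes differ in size, which is what allows a single classifier head to serve every candidate box.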

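Advantages of Fast R-CNN : Training and testing are faster than in R-CNN, and the network is trained in a single stage with one loss, although the candidate boxes still come from an external algorithm such as Selective Search.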
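
The multi-task loss behind this single-stage training can be sketched for one candidate box as follows. The form used here (log loss for classification plus a smooth L1 term for box regression, applied only to non-background boxes) follows the Fast R-CNN paper as best understood; the function names, the convention that class 0 is background, and the weight `lam` are labeled assumptions.

```python
import numpy as np

def smooth_l1(x):
    """Quadratic near zero, linear elsewhere; less sensitive to outliers than L2."""
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)

def multi_task_loss(class_probs, true_class, box_pred, box_target, lam=1.0):
    """Classification loss plus box regression loss for one candidate box.

    Assumes class 0 is background, which has no box regression term.
    """
    cls_loss = -np.log(class_probs[true_class])
    if true_class == 0:
        return float(cls_loss)
    loc_loss = smooth_l1(np.asarray(box_pred) - np.asarray(box_target)).sum()
    return float(cls_loss + lam * loc_loss)

probs = np.array([0.1, 0.7, 0.2])  # softmax output over background + 2 classes
foreground_loss = multi_task_loss(probs, 1, [0.5, 0.0, 0.0, 2.0], [0.0, 0.0, 0.0, 0.0])
background_loss = multi_task_loss(probs, 0, [0.5, 0.0, 0.0, 2.0], [0.0, 0.0, 0.0, 0.0])
```

Because both terms are summed into one scalar, a single backward pass updates the shared convolutional layers for classification and localization at once.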

Faster R-CNN model

Proposal of the Faster R-CNN model : Faster R-CNN is a target detection algorithm based on deep learning. It was proposed by Shaoqing Ren et al. in 2015 and is another improvement of the R-CNN series of algorithms.

Improvements to the Faster R-CNN model :

  • Region proposal network : Faster R-CNN introduces a region proposal network (RPN) for generating candidate boxes. RPN is a special convolutional neural network that can extract features from an input image and output a series of coordinates and scores of candidate boxes. RPN can share the feature map of the convolutional layer and generate candidate boxes through anchor boxes of different scales and aspect ratios, thus achieving fast and accurate candidate box generation.
  • Network structure optimization : To further improve detection accuracy and speed, Faster R-CNN merges the RPN and the Fast R-CNN detection network into a single network that shares the convolutional feature maps, so that the entire network can be trained and optimized end-to-end. Faster R-CNN keeps the RoI pooling layer of Fast R-CNN; RoI Align, which aligns candidate boxes with the feature map more precisely, was introduced later by Mask R-CNN.
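
The anchor boxes that the RPN scores can be generated as in the sketch below. The stride of 16 and the three scales and three aspect ratios are the values commonly associated with the original Faster R-CNN setup, but they should be treated as illustrative assumptions here, as should the function name `make_anchors`.

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (x1, y1, x2, y2) rows in image pixels.

    One anchor per (feature-map cell, scale, ratio); ratio means height / width,
    and each anchor has area scale ** 2.
    """
    shapes = np.array([(s / np.sqrt(r), s * np.sqrt(r))  # (width, height)
                       for s in scales for r in ratios])

    # Anchor centres sit at the centre of each feature-map cell, in image pixels.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)
    centers = np.stack([cx.ravel(), cy.ravel()], axis=1)  # (cells, 2)

    half = shapes / 2.0
    top_left = centers[:, None, :] - half[None, :, :]
    bottom_right = centers[:, None, :] + half[None, :, :]
    return np.concatenate([top_left, bottom_right], axis=2).reshape(-1, 4)

anchors = make_anchors(feat_h=2, feat_w=3)
```

With a 2x3 feature map this yields 2 * 3 * 9 = 54 anchors. The RPN then predicts, for every anchor, an objectness score and four offsets that adjust the anchor into a candidate box.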

Advantages of Faster R-CNN :

  • End-to-end training and optimization can be achieved, with faster detection speed and higher accuracy.
  • The region proposal network can share convolutional feature maps, which reduces the amount of computation and storage space, making the entire network more lightweight and efficient.

YOLO V1

The proposal of YOLO V1 : YOLO (You Only Look Once) is a deep-learning-based object detection algorithm proposed by Joseph Redmon et al. in 2015 and published in 2016. It is an end-to-end, single-stage detection algorithm known for its fast detection speed.

Advantages of YOLO V1 : It can realize end-to-end training and optimization, has faster detection speed and better real-time performance, and is especially suitable for real-time application scenarios.

Disadvantages of YOLO V1 : Relatively low accuracy, especially in small target detection.
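
Treating detection as regression means the network outputs a fixed-size tensor that encodes boxes directly. The sketch below builds a simplified training target in the spirit of YOLO V1: the image is divided into an S x S grid and the cell containing an object's center is responsible for it. The single box slot per cell, the layout (x, y, w, h, confidence, class scores), and the name `encode_yolo_target` are simplifying assumptions, not the exact format of the original implementation.

```python
import numpy as np

def encode_yolo_target(boxes, class_ids, S=7, C=20):
    """Encode boxes into an (S, S, 5 + C) target tensor.

    boxes: (cx, cy, w, h) tuples, all normalised to [0, 1] by the image size.
    """
    target = np.zeros((S, S, 5 + C))
    for (cx, cy, w, h), cls in zip(boxes, class_ids):
        col = min(int(cx * S), S - 1)
        row = min(int(cy * S), S - 1)
        # Centre is stored as an offset inside the responsible cell.
        target[row, col, 0:4] = (cx * S - col, cy * S - row, w, h)
        target[row, col, 4] = 1.0        # an object is present in this cell
        target[row, col, 5 + cls] = 1.0  # one-hot class label
    return target

target = encode_yolo_target([(0.5, 0.5, 0.2, 0.4)], [3])
```

In this simplified encoding, two objects whose centers fall in the same cell overwrite each other, which hints at why small, densely packed objects are difficult for a grid-based detector.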

YOLO V2

The proposal of YOLO V2 : YOLO V2 is the second version of the YOLO series of target detection algorithms, which was proposed by Joseph Redmon et al. in 2016.

Advantages of YOLO V2 : While maintaining a fast detection speed, it has good detection accuracy, especially in multi-scale target detection.

Disadvantages of YOLO V2 : Despite its speed, YOLO V2 still faces challenges in detecting small targets and occluded targets.

YOLO V3

The proposal of YOLO V3 : YOLO V3 is the third version of the YOLO series of target detection algorithms, which was proposed by Joseph Redmon et al. in 2018.

Origin blog.csdn.net/hanmo22357/article/details/131031916