Object Detection Tips

Table of contents

Object Detection Concept

Two types of target detection algorithms

Two stage target detection algorithm

One stage target detection algorithm

Model Evaluation Index

Common datasets for object detection

Common annotation tools for target detection

Object Detection Tools and Frameworks

Object Detection Concept

The task of object detection is to find all objects of interest in an image and determine their categories and positions; it is one of the core problems in computer vision. Because objects vary widely in appearance, shape, and pose, and because imaging adds interference from factors such as illumination and occlusion, object detection has long been one of the most challenging problems in the field.

Beyond the classification problem it shares with image classification, target detection has to solve the following core problems:

1. The target may appear anywhere in the image.

2. Targets come in various sizes. 

3. Targets may have various shapes.

Two types of target detection algorithms

Two stage target detection algorithm

First generate candidate regions (region proposals, RP), then classify each candidate with a convolutional neural network.

Task: feature extraction --> generate RP --> classification and localization (bounding-box) regression

Common algorithms: R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN, R-FCN, etc.

One stage target detection algorithm

No region proposal step is used; the network extracts features and predicts object classes and locations directly.

Task: feature extraction --> classification and localization (bounding-box) regression

Common algorithms: OverFeat, YOLOv1, YOLOv2, YOLOv3, SSD, RetinaNet, etc.

Model Evaluation Index

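1. TP (true positive): a detection box whose overlap (IoU) with a ground-truth box is high enough to count as a match

2. FP (false positive): a detection box with low or no overlap with any ground-truth box, or a duplicate detection of an object that has already been matched

3. Precision and recall: precision = TP / (TP + FP); recall = TP / (TP + FN), where FN (false negative) is a ground-truth object that was not detected

4. mAP, IoU and NMS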

mAP (mean Average Precision) is a commonly used evaluation metric in target detection. For each category, the average precision (AP) is the area under the Precision-Recall curve (area under the curve, AUC); mAP is the mean of the AP values over all categories. It therefore evaluates a detector across multiple categories while taking both precision and recall into account. Generally speaking, the higher the mAP, the better the detection performance of the model.
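The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code of any benchmark: it assumes each detection has already been marked TP or FP, and it uses all-point interpolation of the Precision-Recall curve. Benchmarks differ in this detail (VOC 2007 samples 11 recall points, COCO samples 101), and the function names here are hypothetical.

```python
def average_precision(scores, is_tp, num_gt):
    """Area under the precision-recall curve for one category.

    scores : confidence score of each detection
    is_tp  : True if the detection matched a ground-truth box, False if FP
    num_gt : number of ground-truth boxes for this category
    """
    if num_gt == 0:
        return 0.0

    # Walk through detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)

    # Replace each precision by the maximum precision at any higher recall,
    # which makes the curve monotonically non-increasing.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])

    # Sum the rectangles under the curve wherever recall increases.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """mAP is the plain mean of the per-category AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Made-up example: four detections of one category, two ground-truth objects.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, False], num_gt=2)
print(round(ap, 4))
```

Sorting by confidence first matters: the curve is traced by lowering the score threshold one detection at a time, so a high-scoring false positive costs more than a low-scoring one.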

IoU (Intersection over Union) is an important metric in target detection that measures the degree of overlap between a predicted box and the ground-truth box. Simply put, IoU is the area of the intersection of the two boxes divided by the area of their union, so it ranges from 0 (no overlap) to 1 (identical boxes). The larger the IoU, the more closely the predicted box matches the ground-truth box.
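As a minimal sketch of this formula, assuming boxes are given as corner coordinates (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (the function name and box layout are choices made for this example):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero so that disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a 1x1 square: intersection 1, union 4 + 4 - 1 = 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
```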

NMS (Non-Maximum Suppression) is a de-duplication technique in target detection, used when the same object is covered by multiple detection boxes. Simply put, among the candidate boxes, keep the one with the highest confidence score, remove every other box whose IoU with it is higher than a certain threshold, and repeat the procedure on the boxes that remain. This reduces the errors caused by repeated detection of the same object.
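The greedy procedure can be sketched as follows. This is an illustrative pure-Python version, assuming boxes in (x1, y1, x2, y2) form and a hypothetical default threshold of 0.5; production frameworks use vectorized implementations.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression.

    Returns the indices of the boxes that are kept, highest score first.
    """
    # Candidate indices, most confident first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Made-up example: the first two boxes cover the same object, the third is separate.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

In multi-class detectors NMS is usually run separately for each category, so that overlapping boxes of different classes do not suppress each other.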

Common datasets for object detection

The COCO dataset [1][2] is a large-scale dataset for image recognition, object detection, and segmentation provided by Microsoft. It contains 80 object categories and more than 330,000 images, covering common animals, vehicles, everyday items, and other objects, with the position and size of each object annotated. Beyond labeling individual objects, it captures occlusion between objects and provides instance segmentation annotations, so a model trained on it is expected to output object category, bounding box, and mask at the same time. This makes it well suited to training more advanced deep learning models such as Mask R-CNN.

COCO's detection task covers 80 classes. The train/val/test splits released in 2014 contain roughly 80k/40k/40k images respectively. A common split in academic work trains on train plus a 35k subset of val (trainval35k), uses the remaining val images as a local test set (minival), and also submits results on test-dev to the official evaluation server. In addition, the COCO organizers hold back part of the test data as the evaluation set for the competition.
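To make the annotation layout concrete, here is a small sketch of reading COCO-style detection annotations. COCO stores each bounding box as [x, y, width, height] measured from the top-left corner; the sample record below is made up for illustration, and `boxes_by_image` is a hypothetical helper, not part of any official API.

```python
from collections import defaultdict

# A tiny, made-up annotation dictionary in the COCO layout.
coco = {
    "images": [{"id": 1, "file_name": "example.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 18, "name": "dog"}],
    "annotations": [
        {"id": 101, "image_id": 1, "category_id": 18,
         "bbox": [100.0, 120.0, 200.0, 150.0]},  # [x, y, width, height]
    ],
}


def boxes_by_image(coco_dict):
    """Group annotations by image and convert boxes to (x1, y1, x2, y2)."""
    names = {c["id"]: c["name"] for c in coco_dict["categories"]}
    result = defaultdict(list)
    for ann in coco_dict["annotations"]:
        x, y, w, h = ann["bbox"]
        corners = (x, y, x + w, y + h)
        result[ann["image_id"]].append((names[ann["category_id"]], corners))
    return dict(result)


print(boxes_by_image(coco))
```

Converting to corner form up front is a convenience for IoU-style computations, which are simpler on (x1, y1, x2, y2) boxes.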

The VOC dataset [3][4] is a dataset provided by the Computer Vision Group at Oxford University, including 20 object categories and more than 10,000 images. Compared with COCO it is relatively small, but its annotations are simple and clear, which makes it suitable for training some traditional machine learning algorithms and relatively simple deep learning models such as Faster R-CNN.

The VOC dataset is frequently used for target detection. The challenge was held annually starting in 2005; it began with only 4 categories and was expanded to 20 categories by 2007. Two versions are commonly used: 2007 and 2012. Academic work usually reports two settings separately: (1) train on the 5k train/val 2007 images plus the 16k train/val 2012 images and test on test 2007; (2) train on the 10k images of train/val 2007 + test 2007 plus the 16k train/val 2012 images and test on test 2012.
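VOC annotations are stored as one XML file per image, with each object's box given by its `xmin`, `ymin`, `xmax`, `ymax` corners. The following sketch parses such a file with the standard library; the XML content is a made-up sample, `parse_voc` is a hypothetical helper, and integer coordinates are assumed.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the VOC annotation layout.
SAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <difficult>0</difficult>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc(xml_text):
    """Return a list of (class_name, (xmin, ymin, xmax, ymax)) tuples."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        coords = tuple(
            int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        objects.append((obj.findtext("name"), coords))
    return objects


print(parse_voc(SAMPLE_XML))  # [('dog', (48, 240, 195, 371))]
```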

Common annotation tools for target detection

labelme:

labelme is an open source image/video annotation tool whose labels can be used for object detection, segmentation, and classification. It was inspired by LabelMe, an open source annotation tool from MIT.

The characteristics of labelme are:

  • Supports image annotation with rectangles, polygons, circles, lines, and points
  • Supports video annotation and GUI customization
  • Supports exporting VOC format for semantic/instance segmentation
  • Supports exporting COCO format for instance segmentation
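For detection work, the shapes drawn in labelme often need to be turned into bounding boxes. The sketch below assumes the JSON layout labelme typically writes (a `shapes` list whose entries carry `label`, `points`, and `shape_type`); the sample content and the helper name `shapes_to_boxes` are made up, and the layout should be checked against the labelme version in use.

```python
import json

# Made-up sample in the layout assumed above.
SAMPLE = json.loads("""
{
  "imagePath": "example.jpg",
  "imageHeight": 480,
  "imageWidth": 640,
  "shapes": [
    {"label": "dog", "shape_type": "rectangle",
     "points": [[100.0, 120.0], [300.0, 270.0]]},
    {"label": "cat", "shape_type": "polygon",
     "points": [[10.0, 20.0], [60.0, 15.0], [55.0, 80.0]]},
    {"label": "marker", "shape_type": "point",
     "points": [[5.0, 5.0]]}
  ]
}
""")


def shapes_to_boxes(labelme_dict):
    """Convert rectangle and polygon shapes to (label, (x1, y1, x2, y2))."""
    boxes = []
    for shape in labelme_dict["shapes"]:
        # Other shape types (circle, line, point) need their own handling.
        if shape.get("shape_type") not in ("rectangle", "polygon"):
            continue
        xs = [p[0] for p in shape["points"]]
        ys = [p[1] for p in shape["points"]]
        boxes.append((shape["label"], (min(xs), min(ys), max(xs), max(ys))))
    return boxes


print(shapes_to_boxes(SAMPLE))
```

Taking the minimum and maximum of the point coordinates gives the tightest axis-aligned box around a polygon, and it also normalizes rectangles whose two corners were drawn in any order.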

Object Detection Tools and Frameworks

mmdetection is an open source target detection framework based on PyTorch, developed by the OpenMMLab team of the Chinese University of Hong Kong. The framework supports a variety of target detection algorithms, including Faster R-CNN, Mask R-CNN, Cascade R-CNN, RetinaNet, SSD, and FCOS, and provides commonly used backbone networks, detection heads, loss functions, and preprocessing methods as reusable modules.

The design of mmdetection is very flexible, and different types of object detection models can be trained, tested and deployed through simple configuration files. At the same time, the framework provides various auxiliary tools and APIs, which can be used for data set preparation, model optimization, model pruning, and model compression.

The code of mmdetection is open source on GitHub and has a large number of community contributors, and models implemented in it have achieved state-of-the-art results on multiple public datasets. As a result, mmdetection is one of the most widely used object detection frameworks in academia and industry.

Reference link:

[1] COCO official website

[2] Detailed explanation of COCO data set and COCO-WholeBody

[3] VOC dataset download and usage details

[4] The PASCAL VOC project

[5] Target detection index - mAP

[6] Target detection indicators: IoU and mAP

[7] Understanding and application of IoU and NMS in target detection

[8] mmdetection official website: GitHub - open-mmlab/mmdetection: OpenMMLab Detection Toolbox and Benchmark

[9] An overview of the mmdetection framework

Origin blog.csdn.net/qq_31807039/article/details/130553810