A review of object detection models based on deep learning

1 Overview
- Main challenges in object detection
- Evaluation indicators
2 Outlook

1 Overview

Object detection is a natural extension of object classification, which aims only to recognize which objects appear in an image. The goal of object detection is to detect all instances of the predefined classes and to provide a coarse localization of each one in the image by means of axis-aligned boxes. The detector should be able to identify every instance of the target classes and draw a bounding box around it. This is generally treated as a supervised learning problem.
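
As an illustration of what "instances with axis-aligned boxes" can look like in practice, the following is a minimal sketch of a detector output record. The class name `Detection`, its field names, the corner-coordinate layout, and the sample values are all assumptions made for this example; real frameworks use their own layouts (for instance, center plus width and height).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detector output: class label, confidence score, axis-aligned box."""
    label: str
    score: float
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Clamp at zero so a degenerate box never reports a negative area.
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


# Hypothetical output for a single image that contains two objects.
detections = [
    Detection("dog", 0.92, 48.0, 60.0, 210.0, 300.0),
    Detection("bicycle", 0.81, 150.0, 90.0, 420.0, 330.0),
]

for det in detections:
    print(f"{det.label}: score={det.score:.2f}, area={det.area():.0f} px^2")
```

A detector returns one such record per detected instance, so an image with no objects of interest yields an empty list.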

Main challenges in object detection

Computer vision has made great progress in the past decade, but it still faces significant challenges. Some of the key challenges that networks face in practical applications are:

Intra-category variation: Variation between instances of the same object class is relatively common in nature. It can arise for many reasons, such as occlusion, lighting, pose, and viewpoint. These unconstrained external factors can dramatically change the appearance of an object. Objects may also undergo non-rigid deformation, or be rotated, scaled, or blurred. Some objects have inconspicuous surroundings, which makes them difficult to extract.
Number of categories: The sheer number of object classes to be distinguished makes the problem challenging. It also requires more high-quality annotated data, which is difficult to obtain. Training detectors with fewer examples is an open research problem.
Efficiency: Current models need substantial computing resources to produce accurate detection results. As mobile and edge devices become more common, efficient object detectors are crucial for further progress in computer vision.
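
Efficiency is usually reported as throughput, for example in frames per second. The following is a minimal sketch of how throughput could be measured. The helper `measure_fps` and the stand-in `dummy_detector` are hypothetical names invented for this example, and the 5 ms sleep only simulates inference cost; a real measurement would call an actual model on real frames.

```python
import time
from typing import Callable, Sequence


def measure_fps(detect: Callable[[object], object],
                frames: Sequence[object],
                warmup: int = 2) -> float:
    """Average frames per second of `detect` over `frames`, after a warm-up."""
    for frame in frames[:warmup]:
        detect(frame)  # warm-up calls are not timed
    start = time.perf_counter()
    for frame in frames:
        detect(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")


def dummy_detector(frame):
    # Hypothetical stand-in for a real model: sleeps about 5 ms per frame.
    time.sleep(0.005)
    return []


fps = measure_fps(dummy_detector, frames=[None] * 20)
print(f"{fps:.1f} FPS")
```

The warm-up pass is there because the first calls to a real model often include one-off costs (weight loading, kernel compilation) that would distort the average.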

Evaluation indicators

Object detectors are measured against several criteria, such as frames per second (FPS), precision, and recall. However, mean average precision (mAP) is the most common evaluation metric. Precision is derived from the intersection over union (IoU), which is the ratio of the overlap area to the union area between the ground-truth box and the predicted bounding box. A threshold is set to decide whether a detection is correct. If the IoU exceeds the threshold, the detection is classified as a true positive; if the IoU is below it, the detection is classified as a false positive. If the model fails to detect an object that is present in the ground truth, this is called a false negative. Precision measures the percentage of predictions that are correct, while recall measures the percentage of ground-truth objects that are correctly detected.
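
The definitions above can be sketched in a few lines of code. The box layout `(x_min, y_min, x_max, y_max)`, the function names `iou` and `count_matches`, the default threshold of 0.5, and the sample boxes are assumptions made for this example. The greedy one-to-one matching is a common simplification; benchmark toolkits differ in details such as how duplicate detections and "difficult" objects are handled.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_matches(preds: List[Box], truths: List[Box],
                  threshold: float = 0.5) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) for one image and one class.

    `preds` is assumed to be sorted by descending confidence. Each
    ground-truth box is matched at most once, so a second detection of
    the same object counts as a false positive.
    """
    matched = set()
    tp = fp = 0
    for pred in preds:
        best_iou, best_idx = 0.0, -1
        for idx, truth in enumerate(truths):
            if idx in matched:
                continue
            overlap = iou(pred, truth)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_iou > threshold:
            tp += 1
            matched.add(best_idx)
        else:
            fp += 1
    fn = len(truths) - len(matched)
    return tp, fp, fn


# Hypothetical example: two ground-truth objects, two predictions.
truths = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
preds = [(1.0, 1.0, 11.0, 11.0), (50.0, 50.0, 60.0, 60.0)]

tp, fp, fn = count_matches(preds, truths)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
print(tp, fp, fn, precision, recall)
```

In this example the first prediction overlaps the first object with an IoU of 81/119 (about 0.68) and is a true positive, the second prediction overlaps nothing and is a false positive, and the second object is missed, which gives a precision and a recall of 0.5 each.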
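
The text names mean average precision as the most common metric without spelling out how it is computed. The following is a minimal sketch of one common convention: average precision (AP) is the area under the precision-recall curve for a single class, and mAP is the mean of AP over all classes. The function names, the all-point interpolation, and the sample inputs are assumptions made for this example; benchmarks differ in the interpolation scheme and in which IoU thresholds they average over.

```python
from typing import List


def average_precision(ranked_hits: List[bool], num_truths: int) -> float:
    """Area under the precision-recall curve, all-point interpolation.

    ranked_hits[i] is True if the i-th most confident detection of the
    class is a true positive. num_truths is the number of ground-truth
    boxes of that class in the whole evaluation set.
    """
    if num_truths == 0:
        return 0.0
    precisions, recalls = [], []
    tp = 0
    for rank, hit in enumerate(ranked_hits, start=1):
        tp += int(hit)
        precisions.append(tp / rank)
        recalls.append(tp / num_truths)
    # Replace each precision by the maximum precision at any higher recall,
    # which removes the zigzag shape of the raw curve.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the rectangles under the curve wherever recall increases.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: List[float]) -> float:
    """Mean of the per-class AP values; 0.0 if there are no classes."""
    return sum(per_class_ap) / len(per_class_ap) if per_class_ap else 0.0


# Hypothetical class with 2 ground-truth objects and 3 ranked detections.
ap_example = average_precision([True, False, True], num_truths=2)
print(round(ap_example, 4))
```

For the sample input, the interpolated precision is 1 up to a recall of 0.5 and 2/3 from there to a recall of 1, so the AP is 0.5 + 0.5 × 2/3 = 5/6.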

2 Outlook

Object detection has made tremendous progress in the past decade. In some narrow domains, algorithms have almost reached human-level accuracy, yet many exciting challenges remain.

AutoML (automated machine learning): Using automated neural architecture search (NAS) to determine the characteristics of an object detector is an actively developing field. The search algorithms are complex and resource intensive.
Lightweight detectors: Although lightweight networks have shown great promise by matching the classification error of full-sized models, they still trail by more than 50% in detection accuracy. As more on-device machine learning applications come to market, the need for small, efficient, and equally accurate models will increase.
Weakly supervised/few-shot detection: Most state-of-the-art object detection models are trained on data annotated with millions of bounding boxes. Annotating data takes time and resources, so this training method does not scale. These costs can be reduced significantly by training on weakly supervised data, i.e., data that carries only image-level labels.
Domain transfer: Domain transfer refers to the application of a model trained on labeled images of a specific source task to a different but related target task. It encourages reuse of trained models and reduces dependence on the availability of large datasets to achieve high accuracy.
3D object detection: 3D object detection is a particularly important problem for autonomous driving. Although models have achieved high accuracy, deploying anything below human-level performance raises safety concerns.
Object detection in videos: Object detectors are designed to run on individual images that have no correlation with each other. Exploiting the spatiotemporal relationship between frames for object recognition is an open problem.