[Computer Vision | Object Detection] arXiv Computer Vision Academic Express on Object Detection (a collection of papers from August 30)

1. Detection-related (10 papers)

1.1 Pseudo-Boolean Polynomials Approach To Edge Detection And Image Segmentation

A pseudo-Boolean polynomial method for edge detection and image segmentation

https://arxiv.org/abs/2308.15453

We introduce a deterministic approach to edge detection and image segmentation that formulates pseudo-Boolean polynomials on image patches. The method performs a binary classification of blob and edge regions in the provided image based on the degree of the pseudo-Boolean polynomial computed on each patch. We test our method on simple images containing primitive shapes of constant and contrasting colors to establish its feasibility before applying it to complex cases such as aerial landscape images. The proposed method builds on the reduction, polynomial degree, and equivalence properties of penalty-based pseudo-Boolean polynomials.
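The degree-based patch classification can be illustrated with a toy sketch. This is not the authors' penalty-based construction: the way a patch is mapped to a pseudo-Boolean function below is a made-up convention for illustration. The underlying fact is standard, though: a pseudo-Boolean function on n binary variables has a unique multilinear polynomial, and its degree (the size of the largest monomial with a nonzero coefficient) is 0 for a constant patch and grows when pixel values interact.

```python
def pbp_degree(f, n):
    """Degree of the unique multilinear polynomial representing a
    pseudo-Boolean function f: {0,1}^n -> R, given as a list of 2**n
    values indexed by bitmask.  Coefficients come from the Moebius
    transform: a_S = sum over T subset of S of (-1)**(|S|-|T|) * f(T).
    """
    deg = 0
    for S in range(2 ** n):
        a = 0.0
        T = S
        while True:  # enumerate all submasks T of S
            a += (-1) ** (bin(S).count("1") - bin(T).count("1")) * f[T]
            if T == 0:
                break
            T = (T - 1) & S
        if abs(a) > 1e-9:
            deg = max(deg, bin(S).count("1"))
    return deg

# A "flat" patch: constant value everywhere -> no interaction terms.
flat = [5.0, 5.0, 5.0, 5.0]
# A "contrasting" patch: value depends jointly on both variables,
# i.e. f(x1, x2) = 9 * x1 * x2 -> a degree-2 monomial appears.
contrast = [0.0, 0.0, 0.0, 9.0]
```

A degree threshold on such patch polynomials is the kind of signal the paper's binary blob/edge classification relies on: the flat patch has degree 0, the contrasting one degree 2.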

1.2 On the Robustness of Object Detection Models in Aerial Images

On the robustness of object detection models in aerial images

https://arxiv.org/abs/2308.15378

The robustness of object detection models is a major concern when applied to real-world scenarios. However, the performance of most object detection models degrades when applied to images subject to corruption, since they are usually trained and evaluated on clean datasets. Enhancing the robustness of object detection models is crucial, especially for those designed for aerial images, which are characterized by complex backgrounds with substantial changes in scale and orientation of objects. This paper discusses the challenges of robustness evaluation of object detection models in aerial images, with special emphasis on cases where images are affected by clouds. In this study, we introduce two new benchmarks based on DOTA-v1.0. The first benchmark includes 19 common corruptions, while the second focuses on cloud-corrupted images - a phenomenon uncommon in natural images but common in aerial photography. We systematically evaluate the robustness of mainstream object detection models and conduct extensive ablation experiments. Through our investigation, we found that enhanced model architecture, larger networks, carefully crafted modules, and smart data augmentation strategies jointly improve the robustness of aerial object detection models. Our proposed benchmarks and our comprehensive experimental analysis can facilitate research on robust object detection in aerial images. The code and dataset are available at: (https://github.com/hehaodong530/DOTA-C)
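As a rough sketch of how such a corruption benchmark is exercised — the severity scales and the robustness ratio below are illustrative stand-ins, not the constants or metrics defined by DOTA-C:

```python
import numpy as np

def gaussian_noise(image, severity=1):
    """Apply Gaussian noise at one of five severity levels to a float
    image in [0, 1].  The scale values are illustrative, not the exact
    constants of any published corruption benchmark."""
    scales = [0.04, 0.08, 0.12, 0.18, 0.26]
    noisy = image + np.random.default_rng(0).normal(
        0.0, scales[severity - 1], image.shape)
    return np.clip(noisy, 0.0, 1.0)

def relative_robustness(clean_map, corrupted_maps):
    """Mean mAP under corruption divided by clean mAP:
    1.0 means no degradation, lower means less robust."""
    return float(np.mean(corrupted_maps) / clean_map)

img = np.full((8, 8), 0.5)          # dummy grey "image"
corrupted = gaussian_noise(img, severity=3)
```

A full evaluation would loop over all corruption types and severities, run the detector on each corrupted copy of the test set, and aggregate such ratios per model.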

1.3 AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models

AnomalyGPT: Detecting Industrial Anomalies Using Large-Scale Visual-Language Models

https://arxiv.org/abs/2308.15366

Large vision-language models (LVLMs) such as MiniGPT-4 and LLaVA have demonstrated the ability to understand images and achieve remarkable performance on various vision tasks. Although they have a strong ability to recognize common objects thanks to large training datasets, they lack specific domain knowledge and have a weaker understanding of local details within objects, which hinders their effectiveness in industrial anomaly detection (IAD) tasks. On the other hand, most existing IAD methods only provide anomaly scores and require manually set thresholds to distinguish normal from abnormal samples, which limits their practical deployment. In this paper, we explore the use of LVLMs to solve the IAD problem and propose AnomalyGPT, a novel IAD approach based on LVLMs. We generate training data by simulating anomalous images and producing corresponding textual descriptions for each image. We also employ an image decoder to provide fine-grained semantics and design a prompt learner to fine-tune the LVLM using prompt embeddings. Our AnomalyGPT eliminates the need for manual threshold adjustment, allowing direct assessment of the presence and location of anomalies. Furthermore, AnomalyGPT supports multi-turn dialogues and exhibits impressive few-shot in-context learning capabilities. With only one normal sample provided, AnomalyGPT achieves state-of-the-art performance on the MVTec-AD dataset, with an accuracy of 86.1%, an image-level AUC of 94.1%, and a pixel-level AUC of 95.3%. Code is available at https://github.com/CASIA-IVA-Lab/AnomalyGPT.

1.4 Ego-Motion Estimation and Dynamic Motion Separation from 3D Point Clouds for Accumulating Data and Improving 3D Object Detection

Ego-motion estimation and dynamic motion separation from 3D point clouds for data accumulation and improved 3D object detection

https://arxiv.org/abs/2308.15357

The new 3+1D high-resolution radar sensor is becoming increasingly important for 3D object detection in the automotive domain because it is relatively cost-effective and offers better detection performance than traditional low-resolution radar sensors. One limitation of high-resolution radar sensors compared to lidar sensors is the sparsity of the generated point clouds. This sparsity can be partially overcome by accumulating radar point clouds over subsequent time steps. This article analyzes the limitations of accumulating radar point clouds on the View-of-Delft dataset. By employing different ego-motion estimation methods, the inherent constraints of the dataset and possible solutions are analyzed. Furthermore, a learning-based instance motion estimation method is deployed to investigate the impact of dynamic motion in accumulated point clouds on object detection. Experiments demonstrate that object detection performance is improved by applying ego-motion estimation and dynamic motion correction methods.
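The accumulation step the paper builds on can be sketched as follows — a minimal numpy version with planar ego poses; the function names are mine, and a real pipeline would additionally correct dynamically moving objects, which is exactly the residual problem the paper studies:

```python
import numpy as np

def se3(yaw, tx, ty):
    """Planar SE(3) ego pose: rotation about z plus x/y translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = [tx, ty, 0.0]
    return T

def accumulate(frames, poses):
    """Transform each past scan (N_i x 3 points in sensor coordinates)
    into the coordinate frame of the last scan using per-frame ego
    poses (poses[i] maps frame i to a common world frame), then
    concatenate into one denser cloud."""
    ref_inv = np.linalg.inv(poses[-1])
    out = []
    for pts, pose in zip(frames, poses):
        h = np.hstack([pts, np.ones((len(pts), 1))])   # homogeneous
        out.append((h @ (ref_inv @ pose).T)[:, :3])
    return np.vstack(out)

# Two scans of the same static landmark while the ego vehicle moves
# 0.5 m forward between frames: after compensation they coincide.
frames = [np.array([[1.0, 0.0, 0.0]]), np.array([[0.5, 0.0, 0.0]])]
poses = [se3(0.0, 0.0, 0.0), se3(0.0, 0.5, 0.0)]
merged = accumulate(frames, poses)
```

Static structure lines up under this compensation; points on moving objects smear, which is why the paper adds instance-level motion estimation on top.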

1.5 Detect, Augment, Compose, and Adapt: Four Steps for Unsupervised Domain Adaptation in Object Detection

Detect, augment, compose, and adapt: four steps for unsupervised domain adaptation in object detection

https://arxiv.org/abs/2308.15353

Unsupervised domain adaptation (UDA) plays a crucial role in object detection when adapting a source-trained detector to a target domain without annotated data. In this paper, we propose a novel and effective four-step UDA approach that leverages self-supervision and trains source and target data concurrently. We leverage self-supervised learning to alleviate the lack of ground truth in the target domain. Our method consists of the following steps: (1) identify the region with the highest-confidence set of detections in each target image, which serves as our pseudo-label; (2) crop the identified region and generate a collection of its augmented versions; (3) combine these latter into a composite image; (4) use the composite image to adapt the network to the target domain. Through extensive experiments on cross-camera, cross-weather, and synthetic-to-real scenarios, our approach achieves state-of-the-art performance, improving mean average precision (mAP) by more than 2% on average over recent competitors. The code is available at https://github.com/MohamedTEV/DACA.
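The four steps can be sketched on a single image — a toy numpy version where `daca_step` and its box format are illustrative, not the released code's API:

```python
import numpy as np

def daca_step(image, detections):
    """Sketch of the four steps on one target image.  `detections` is
    a list of (score, x0, y0, x1, y1) boxes (an assumed format).
    1) keep the highest-confidence region as the pseudo-label,
    2) crop it and build augmented versions (here: flips),
    3) compose the augmented crops into one synthetic mosaic image,
    4) the mosaic plus pseudo-label would then feed the adaptation
       loss on the target domain (not shown)."""
    score, x0, y0, x1, y1 = max(detections, key=lambda d: d[0])
    crop = image[y0:y1, x0:x1]
    augmented = [crop, np.fliplr(crop),
                 np.flipud(crop), np.flipud(np.fliplr(crop))]
    mosaic = np.vstack([np.hstack(augmented[:2]),
                        np.hstack(augmented[2:])])
    return mosaic, score

img = np.arange(64, dtype=float).reshape(8, 8)
mosaic, score = daca_step(img, [(0.3, 0, 0, 4, 4), (0.9, 2, 2, 6, 6)])
```

Because every tile of the mosaic shows the same pseudo-labeled object under a known transform, the composite carries self-supervised training signal without any target annotations.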

1.6 MSFlow: Multi-Scale Flow-based Framework for Unsupervised Anomaly Detection

MSFlow: An Unsupervised Anomaly Detection Framework Based on Multiscale Flow

https://arxiv.org/abs/2308.15300

Unsupervised anomaly detection (UAD), where only anomaly-free samples are available for training, has attracted considerable research interest and driven a wide range of applications. Some UAD applications aim to further localize the anomalous regions without any anomaly information. Although the absence of anomalous samples and annotations deteriorates UAD performance, an unassuming but powerful statistical model, the normalizing flow, is well suited for anomaly detection and localization in an unsupervised fashion. Flow-based probabilistic models, trained only on anomaly-free data, can efficiently distinguish unpredictable anomalies by assigning them a much lower likelihood than normal data. Nevertheless, the size variation of unpredictable anomalies introduces another inconvenience for high-precision anomaly detection and localization with flow-based methods. To generalize across anomaly size variations, we propose a novel multi-scale flow-based framework dubbed MSFlow, composed of asymmetrical parallel flows followed by a fusion flow that exchanges multi-scale perceptions. Moreover, different multi-scale aggregation strategies are adopted for image-wise anomaly detection and pixel-wise anomaly localization according to their difference. The proposed MSFlow is evaluated on three anomaly detection datasets and significantly outperforms existing methods. Notably, on the challenging MVTec AD benchmark, our MSFlow reaches the state of the art with a detection AUROC score of 99.7%, a localization AUROC score of 98.8%, and a PRO score of 97.1%. Reproducible code is available at https://github.com/cool-xuan/msflow.
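The core flow-based scoring idea — anomalies receive low likelihood under a density model fit to normal data — can be shown with a one-dimensional affine flow. This is a stand-in: MSFlow itself uses multi-scale coupling flows over CNN features, not this toy transform.

```python
import numpy as np

def flow_log_likelihood(x, scale, shift):
    """Log-likelihood of x under a toy affine normalizing flow.
    z = (x - shift) / scale is assumed standard normal, and the
    change-of-variables formula contributes log|dz/dx| = -log(scale)
    per dimension."""
    z = (x - shift) / scale
    log_pz = -0.5 * (z ** 2 + np.log(2.0 * np.pi))
    return float(np.sum(log_pz - np.log(scale)))

def anomaly_score(x, scale, shift):
    """Higher score = less likely under the normal-data model."""
    return -flow_log_likelihood(x, scale, shift)
```

With `scale` and `shift` fitted to anomaly-free data, a sample near the normal mode scores low while an outlier scores high; thresholding or ranking these scores gives detection, and evaluating them per pixel gives localization.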

1.7 ADFA: Attention-augmented Differentiable top-k Feature Adaptation for Unsupervised Medical Anomaly Detection

ADFA: attention-augmented differentiable top-k feature adaptation for unsupervised medical anomaly detection

https://arxiv.org/abs/2308.15280

The scarcity of annotated data, especially for rare diseases, limits the variability of training data and the range of detectable lesions, posing significant challenges to supervised anomaly detection in medical imaging. To solve this problem, we propose a novel unsupervised method for medical image anomaly detection: attention-augmented differentiable top-k feature adaptation (ADFA). The method utilizes a Wide-ResNet50-2 (WR50) network pre-trained on ImageNet to extract initial feature representations. To reduce the channel dimensionality while preserving relevant channel information, we employ attention-augmented patch descriptors on the extracted features. We then apply differentiable top-k feature adaptation to train the patch descriptors, mapping the extracted feature representations into a new vector space that enables effective anomaly detection. Experiments show that ADFA outperforms state-of-the-art (SOTA) methods on multiple challenging medical image datasets, confirming its effectiveness in medical anomaly detection.
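A simplified, non-differentiable analogue of scoring with top-k feature distances — the paper instead learns the descriptor mapping end-to-end with a differentiable top-k operator, so this memory-bank version only conveys the scoring rule:

```python
import numpy as np

def topk_anomaly_score(descriptor, memory_bank, k=3):
    """Score a patch descriptor by the mean of its k smallest
    Euclidean distances to descriptors of normal training patches.
    A descriptor far from all normal patches scores high."""
    d = np.linalg.norm(memory_bank - descriptor, axis=1)
    return float(np.mean(np.sort(d)[:k]))

bank = np.zeros((10, 4))              # dummy normal-patch descriptors
normal_score = topk_anomaly_score(np.zeros(4), bank)
abnormal_score = topk_anomaly_score(np.ones(4), bank)
```

Making the top-k selection differentiable, as ADFA does, is what allows the distances to drive training of the descriptor mapping rather than only serving as a test-time score.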

1.8 A Comprehensive Augmentation Framework for Anomaly Detection

A comprehensive augmentation framework for anomaly detection

https://arxiv.org/abs/2308.15068

Data augmentation methods are commonly integrated into the training of anomaly detection models. Previous approaches have primarily focused on replicating real-world anomalies or enhancing diversity, without considering that the criteria for anomalies vary across classes, which can lead to a biased training distribution. This paper analyzes the crucial characteristics of simulated anomalies that aid the training of reconstruction networks, condenses them into several methods, and thereby creates a comprehensive framework by selectively leveraging appropriate combinations. Furthermore, we integrate this framework with a reconstruction-based approach and concurrently propose a split training strategy that alleviates the overfitting problem while avoiding the introduction of interference into the reconstruction process. Evaluations conducted on the MVTec anomaly detection dataset show that our method outperforms the previous state of the art, particularly on object classes. To evaluate generalizability, we generate a simulated dataset containing anomalies with diverse characteristics, since the original test samples only include specific types of anomalies and may lead to a biased evaluation. Experimental results demonstrate that our approach exhibits promising potential to generalize effectively to various unforeseen anomalies encountered in the real world.
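One generic way to simulate an anomaly together with its ground-truth mask, in the spirit of (but not identical to) the framework's simulation methods:

```python
import numpy as np

def paste_anomaly(image, rng, size=4):
    """Simulate an anomaly by pasting a random-noise patch at a random
    location in a normal image, returning the corrupted image and the
    pixel-level ground-truth mask.  A reconstruction network trained
    to undo such pastes learns what 'normal' looks like; the real
    framework varies shape, texture, and placement selectively."""
    h, w = image.shape
    y = int(rng.integers(0, h - size))
    x = int(rng.integers(0, w - size))
    out = image.copy()
    out[y:y + size, x:x + size] = rng.random((size, size))
    mask = np.zeros_like(image)
    mask[y:y + size, x:x + size] = 1.0
    return out, mask

out, mask = paste_anomaly(np.zeros((16, 16)), np.random.default_rng(1))
```

The mask doubles as the supervision signal for pixel-level localization during training, even though no real anomalies are ever seen.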

1.9 Few-Shot Object Detection via Synthetic Features with Optimal Transport

Few-shot object detection via synthetic features with optimal transport

https://arxiv.org/abs/2308.15005

Few-shot object detection aims to simultaneously localize and classify objects in an image using limited training samples. However, most existing few-shot object detection methods focus on extracting the features of a few novel-class samples, which lack diversity and may therefore be insufficient to capture the data distribution. To address this limitation, in this paper we propose a novel approach in which we train a generator to produce synthetic data for novel classes. Still, directly training the generator on novel classes is ineffective due to the lack of novel data. To overcome this issue, we leverage the large-scale dataset of the base classes. Our overarching goal is to train a generator that captures the data variations of the base dataset, and we then transfer the captured variations to novel classes by generating synthetic data with the trained generator. To encourage the generator to capture the data variations of the base classes, we propose training it with an optimal transport loss that minimizes the optimal transport distance between the distributions of real and synthetic data. Extensive experiments on two benchmark datasets demonstrate that the proposed method outperforms the state of the art. Source code will be made available.
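An optimal transport distance between feature sets is commonly approximated with entropy-regularized Sinkhorn iterations. The numpy sketch below is my own minimal stand-in for such a loss, not the paper's implementation:

```python
import numpy as np

def sinkhorn_distance(a_feats, b_feats, eps=0.1, iters=200):
    """Entropy-regularized optimal transport distance between two
    uniformly weighted feature sets via Sinkhorn iterations.
    Small values mean the synthetic features match the real ones."""
    # Pairwise Euclidean cost matrix between the two sets.
    C = np.linalg.norm(a_feats[:, None, :] - b_feats[None, :, :], axis=2)
    K = np.exp(-C / eps)
    n, m = len(a_feats), len(b_feats)
    u, v = np.ones(n) / n, np.ones(m) / m
    for _ in range(iters):     # alternating marginal projections
        u = (1.0 / n) / (K @ v)
        v = (1.0 / m) / (K.T @ u)
    P = u[:, None] * K * v[None, :]   # transport plan
    return float(np.sum(P * C))
```

Used as a training loss, the gradient of this distance with respect to the synthetic features pushes the generator's output distribution toward the real one; a differentiable framework would be used in practice, but the quantity computed is the same.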

1.10 Using deep learning for an automatic detection and classification of the vascular bifurcations along the Circle of Willis

Automatic detection and classification of vascular bifurcations along the circle of Willis using deep learning

https://arxiv.org/abs/2308.15088

Most intracranial aneurysms (ICA) occur in a specific part of the cerebral vascular tree called the circle of Willis (CoW). More specifically, they appear primarily on 15 of the major arterial bifurcations that make up this circular structure. Therefore, for efficient and timely diagnosis, it is crucial to develop methods that can accurately identify each bifurcation of interest (BoI). Automatically extracting the bifurcations that present a higher risk of developing an ICA would provide neuroradiologists with a quick glimpse into the areas of greatest concern. Thanks to recent efforts in artificial intelligence, deep learning has proven to be the best-performing technique in many pattern recognition tasks, and various methods have been specially designed for medical image analysis. This study aims to help neuroradiologists promptly locate any bifurcation at high risk of developing an ICA; it can be viewed as a computer-aided diagnosis scenario in which artificial intelligence facilitates access to the regions of interest within the MRI. In this work, we present a method for the fully automatic detection and identification of the bifurcations of interest forming the circle of Willis. Several neural network architectures are tested, and we thoroughly evaluate the bifurcation recognition rate.


Origin blog.csdn.net/wzk4869/article/details/132752268