[Computer Vision | Object Detection] arXiv Computer Vision Academic Express on Object Detection (June 19 Paper Collection)

1. Detection-related (7 papers)

1.1 Vehicle Occurrence-based Parking Space Detection

Paper address:

https://arxiv.org/abs/2306.09940

Smart parking solutions use sensors, cameras, and data analytics to improve parking efficiency and reduce traffic congestion. In recent years, computer vision-based methods have been widely used to solve parking lot management problems, but most works assume that parking spaces are marked manually, which affects the cost and feasibility of deployment. To fill this gap, this work proposes an automatic parking space detection method that receives a sequence of images of a parking lot and returns a list of coordinates identifying the detected parking spaces. The proposed method employs instance segmentation to identify cars and, using vehicle occurrences, generates a heatmap of parking spaces. Results using 12 different subsets from the PKLot and CNRPark-EXT parking lot datasets show that the method achieves AP25 scores as high as 95.60% and AP50 scores as high as 79.90%.
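
The vehicle-occurrence idea lends itself to a short sketch. The following is a minimal, hypothetical illustration (not the paper's code): per-frame car masks from an instance-segmentation model are accumulated into a heatmap, and connected components of frequently occupied pixels become candidate parking spaces. Function names and thresholds are assumptions.

```python
import numpy as np
from scipy import ndimage

def parking_spaces_from_occurrences(frames_masks, min_frac=0.5, min_bbox_area=200):
    """frames_masks: iterable over frames; each frame is a list of boolean
    HxW car masks (e.g., from an instance-segmentation model).
    Returns (row_min, col_min, row_max, col_max) boxes for candidate spaces."""
    heatmap, n_frames = None, 0
    for frame in frames_masks:
        n_frames += 1
        for m in frame:  # accumulate how often each pixel is covered by a car
            heatmap = m.astype(np.float32) if heatmap is None else heatmap + m
    if heatmap is None:
        return []
    occupancy = heatmap / n_frames
    # Pixels occupied in at least `min_frac` of frames likely lie in a space.
    labeled, _ = ndimage.label(occupancy >= min_frac)
    boxes = []
    for sl in ndimage.find_objects(labeled):
        h, w = sl[0].stop - sl[0].start, sl[1].stop - sl[1].start
        if h * w >= min_bbox_area:  # discard tiny spurious components
            boxes.append((sl[0].start, sl[1].start, sl[0].stop, sl[1].stop))
    return boxes
```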

1.2 Squeezing nnU-Nets with Knowledge Distillation for On-Board Cloud Detection

Paper address:

https://arxiv.org/abs/2306.09886

Cloud detection is a critical satellite imagery preprocessing step that can be performed on the ground or on board the satellite to label useful images. In the latter case, it could reduce downlink data volumes by pruning cloudy areas, or make satellites more autonomous through data-driven acquisition rescheduling. We accomplish this task using nnU-Nets, a self-configuring framework capable of performing meta-learning of segmentation networks on a variety of datasets. Unfortunately, such models are usually memory-inefficient due to their (very) large architectures. To benefit from on-board processing, we compress nnU-Nets with knowledge distillation into smaller, more compact U-Nets. Our experiments on Sentinel-2 and Landsat-8 images show that nnU-Nets provide state-of-the-art performance without any manual design. Our method achieved a Jaccard index of 0.882 on over 10,000 unseen Sentinel-2 images in the On Cloud N: Cloud Cover Detection Challenge (the winner achieved 0.897, a baseline U-Net with a ResNet-34 backbone: 0.817, and classic Sentinel-2 image thresholding: 0.652). Finally, we show that knowledge distillation is able to craft U-Nets almost 280 times smaller than nnU-Nets while still maintaining their segmentation capabilities.
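
As a rough illustration of the compression step, here is a minimal sketch of response-based knowledge distillation for two-class cloud segmentation, assuming a trained nnU-Net-style teacher and a small student U-Net that both emit Bx2xHxW logits; the temperature and loss weighting are placeholders, not the paper's exact recipe.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, target, T=4.0, alpha=0.5):
    """student_logits, teacher_logits: Bx2xHxW cloud/no-cloud logits;
    target: BxHxW integer ground-truth cloud mask."""
    # Soft targets: KL divergence between temperature-scaled distributions;
    # scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy against the ground-truth mask.
    hard = F.cross_entropy(student_logits, target)
    return alpha * soft + (1.0 - alpha) * hard
```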

1.3 MixedTeacher : Knowledge Distillation for fast inference textural anomaly detection

Paper address:

https://arxiv.org/abs/2306.09859

Unsupervised learning for anomaly detection has long been at the heart of image processing research and a stepping stone to high-performance industrial automation processes. With the advent of CNNs, several methods have been proposed, such as autoencoders, GANs, deep feature extraction, etc. In this paper, we propose a new approach based on the promising concept of knowledge distillation, in which a student network is trained on normal samples while matching the output of a larger pretrained teacher network. The main contributions of this paper are two-fold: first, a simplified student architecture with optimized layer selection is proposed; then, a new student-teacher architecture that combines two teachers to reduce network bias is proposed, jointly improving anomaly detection performance and localization accuracy. The proposed texture anomaly detector can detect anomalies in any texture with outstanding accuracy and fast inference time compared to SOTA methods.
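
A minimal sketch of the two-teacher idea, under the assumption that the student regresses each teacher's features on normal images and the per-pixel regression error serves as the anomaly map at test time; the feature extractors and layer choices here are placeholders rather than the paper's architecture.

```python
import torch
import torch.nn.functional as F

def anomaly_map(image, student, teachers, out_size=(256, 256)):
    """student(image) returns a list of feature maps, one aligned with the
    output of each teacher; teachers are frozen pretrained networks."""
    with torch.no_grad():
        t_feats = [t(image) for t in teachers]      # each: BxCxHxW
    s_feats = student(image)                        # list aligned with teachers
    maps = []
    for t, s in zip(t_feats, s_feats):
        # Regression error per pixel: 1 - cosine similarity across channels.
        err = 1.0 - F.cosine_similarity(s, t, dim=1)            # BxHxW
        maps.append(F.interpolate(err.unsqueeze(1), size=out_size,
                                  mode="bilinear", align_corners=False))
    # Averaging the two teachers' error maps reduces single-network bias.
    return torch.stack(maps).mean(dim=0).squeeze(1)             # BxHxW
```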

1.4 Efficient Search and Detection of Relevant Plant Parts using Semantics-Aware Active Vision

Paper address:

https://arxiv.org/abs/2306.09801

For automatic harvesting and defoliation of tomato plants using robots, it is important to search for and detect the relevant plant parts, namely tomatoes, stems, and petioles. This is challenging due to the high level of occlusion in tomato greenhouses. Active vision is a promising approach in which the robot deliberately plans camera viewpoints to overcome occlusions and improve perception accuracy. However, current active vision algorithms cannot distinguish between relevant and irrelevant plant parts, making them inefficient for targeted perception of specific plant parts. We propose a semantics-aware active vision strategy that uses semantic information to identify relevant plant parts and prioritize them during view planning using an attention mechanism. We evaluate our strategy using 3D models of tomato plants of varying structural complexity, which closely represent real-world occlusions. We use simulated environments to gain insight into our strategy while ensuring reproducibility and statistical significance. After ten views, our strategy was able to correctly detect 85.5% of the plant parts, on average about 4 more parts per plant than a volumetric active vision strategy. Furthermore, it detected 5 and 9 more parts compared to two predefined strategies, and 11 more parts compared to a random strategy. A median of 88.9% of objects were correctly detected per plant across 96 experiments. Our strategy is also robust to uncertainty in plant and plant part locations, plant complexity, and different viewpoint sampling strategies. We believe that our work can significantly improve the speed and robustness of automated harvesting and defoliation in tomato crop production.
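
To make the prioritization concrete, here is a hypothetical sketch of semantics-aware viewpoint scoring: candidate views are ranked by the semantic utility of the unobserved voxels they would reveal, with relevant part classes weighted higher via attention. All names, weights, and data structures are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Higher attention for parts relevant to harvesting/defoliation (assumed weights).
CLASS_ATTENTION = {"tomato": 1.0, "stem": 0.8, "petiole": 0.8, "leaf": 0.1}

def next_best_view(candidates, voxel_class_probs, visible_voxels):
    """candidates: viewpoint ids; voxel_class_probs: NxK per-voxel semantic
    probabilities (columns ordered as CLASS_ATTENTION); visible_voxels[v]:
    indices of still-unobserved voxels visible from viewpoint v."""
    weights = np.array(list(CLASS_ATTENTION.values()))
    utility = voxel_class_probs @ weights        # semantic weight per voxel
    # Score each viewpoint by the semantic utility it is expected to reveal.
    scores = {v: utility[visible_voxels[v]].sum() for v in candidates}
    return max(scores, key=scores.get)
```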

1.5 The Big Data Myth: Using Diffusion Models for Dataset Generation to Train Deep Detection Models

Paper address:

https://arxiv.org/abs/2306.09762

Despite the remarkable achievements of deep object detection models, a major remaining challenge is the need for large amounts of training data. Obtaining such real-world data is painstaking work, which has prompted researchers to explore new research avenues such as synthetic data generation. This study proposes a framework to generate synthetic datasets by fine-tuning a pretrained stable diffusion model. The synthetic dataset is then manually annotated and used to train various object detection models. These detectors are evaluated on a real-world test set of 331 images and compared to baseline models trained on real-world images. The results show that object detection models trained on synthetic data perform similarly to the baseline models: in the context of apple detection in an orchard, the deviation in average precision from the baseline ranges from 0.09 to 0.12. This study illustrates the potential of synthetic data generation techniques as a viable alternative to collecting extensive training data for training deep models.
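
The generation step might look like the following minimal sketch using the diffusers library, assuming a Stable Diffusion checkpoint already fine-tuned on orchard imagery (the checkpoint path, prompt, and set size are placeholders); the saved images would then be manually annotated and fed to a standard detector training pipeline.

```python
import os
import torch
from diffusers import StableDiffusionPipeline

os.makedirs("synthetic", exist_ok=True)

# Hypothetical path to a Stable Diffusion checkpoint fine-tuned on orchard images.
pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/finetuned-orchard-sd", torch_dtype=torch.float16
).to("cuda")

prompt = "a photo of apple trees in an orchard with ripe red apples"
for i in range(1000):  # synthetic-set size is a free parameter
    image = pipe(prompt, num_inference_steps=30).images[0]
    image.save(f"synthetic/apples_{i:04d}.png")  # then annotate manually
```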

1.6 Scaling Open-Vocabulary Object Detection

Paper address:

https://arxiv.org/abs/2306.09683

Open-vocabulary object detection has greatly benefited from pretrained vision-language models, but is still limited by the amount of available detection training data. While detection training data can be extended by using web image-text pairs as weak supervision, this has not been done at scales comparable to image-level pretraining. Here, we augment detection data with self-training, which uses an existing detector to generate pseudo-box annotations on image-text pairs. The main challenges in scaling self-training are the choice of label space, pseudo-annotation filtering, and training efficiency. We propose the OWLv2 model and the OWL-ST self-training recipe to address these challenges. OWLv2 surpasses the performance of previous state-of-the-art open-vocabulary detectors at a comparable amount of training data (about 10 million examples). However, with OWL-ST, we can scale to more than 1B examples, yielding a further large improvement: with an L/14 architecture, OWL-ST increases the AP on LVIS rare classes, for which the model sees no human box annotations, from 31.2% to 44.6% (a 43% relative improvement). OWL-ST unlocks web-scale training for open-world localization, similar to what has been achieved for image classification and language modeling.
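
The pseudo-annotation step can be approximated with an off-the-shelf open-vocabulary detector. The sketch below uses the earlier OWL-ViT model from Hugging Face transformers to pseudo-label a web image with queries derived from its caption; the label-space construction and threshold are simplified assumptions, not the OWL-ST recipe.

```python
import torch
from transformers import OwlViTForObjectDetection, OwlViTProcessor

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

def pseudo_label(image, caption, score_thresh=0.3):
    """image: PIL image; caption: its alt-text. Returns (box, query) pairs."""
    # Crude caption-derived label space: the full caption plus its unigrams.
    queries = [caption] + caption.split()
    inputs = processor(text=[queries], images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
    results = processor.post_process_object_detection(
        outputs, threshold=score_thresh, target_sizes=target_sizes
    )[0]
    # Confident (box, label) pairs become pseudo-annotations for training.
    return [
        (box.tolist(), queries[label])
        for box, label in zip(results["boxes"], results["labels"])
    ]
```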

1.7 Fusing Structural and Functional Connectivities using Disentangled VAE for Detecting MCI

Paper address:

https://arxiv.org/abs/2306.09629

Brain network analysis is a useful method for studying human brain disorders because it can distinguish sick from healthy people by detecting abnormal connectivity. Due to the complementary information in multimodal neuroimages, multimodal fusion techniques have great potential to improve predictive performance. However, efficiently fusing multimodal medical images to exploit their complementarity remains a challenging problem. This paper presents a novel hierarchical structural-functional connectivity fusion (HSCF) model for constructing brain structural-functional connectivity matrices and predicting abnormal brain connectivity from functional magnetic resonance imaging (fMRI) and diffusion tensor imaging (DTI). Specifically, prior knowledge is incorporated into the separators, which disentangle each modality's information via graph convolutional networks (GCNs). To ensure effective disentanglement, a disentangled cosine distance loss is designed. Furthermore, a hierarchical representation fusion module is designed to effectively maximize the correlation between modalities and combine effective features, which makes the generated structural-functional connectivity more robust and discriminative for cognitive disease analysis. Extensive experimental results on the public Alzheimer's Disease Neuroimaging Initiative (ADNI) database demonstrate that the proposed model outperforms competing methods in terms of classification evaluation. Overall, the proposed HSCF model is a promising model for generating the brain's structural-functional connectivity and identifying abnormal brain connectivity in the progression of cognitive disease.
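
As a rough sketch of the disentanglement objective, the following shows a cosine distance loss that pushes each modality's shared and specific embeddings toward orthogonality, combined with a reconstruction term; encoder details and the loss weight are assumptions, not the paper's exact design.

```python
import torch.nn.functional as F

def disentangled_cosine_loss(shared, specific):
    """shared, specific: BxD embeddings from one modality's GCN encoder.
    Driving |cos| to zero pushes the two representations toward orthogonality."""
    return F.cosine_similarity(shared, specific, dim=1).abs().mean()

def total_loss(fc_shared, fc_specific, sc_shared, sc_specific, recon_loss, lam=0.1):
    # One disentanglement term per modality (functional and structural),
    # added to the VAE reconstruction objective with an assumed weight.
    disent = (disentangled_cosine_loss(fc_shared, fc_specific)
              + disentangled_cosine_loss(sc_shared, sc_specific))
    return recon_loss + lam * disent
```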
