[Computer Vision | Transformer | Medical Related] arXiv Computer Vision Academic Express on Transformer and Medical-Related Topics (Collection of Papers, September 21)

1. Transformer (2 articles)

1.1 Automatic Bat Call Classification using Transformer Networks


https://arxiv.org/abs/2309.11218

Automatic identification of bat species from their echolocation calls is a difficult but important task for monitoring bats and the ecosystems they inhabit. The main challenges in automated bat call identification are high call variability, similarities between species, interfering calls, and a lack of annotated data. Many currently available models perform relatively poorly on real data because they were trained on single-call datasets and, furthermore, are often too slow for real-time classification. Here, we propose a Transformer architecture for multi-label classification with potential applications in real-time classification. We trained our model on synthetically generated multi-species records, built by merging multiple bat calls into a single record containing several simultaneous calls. Our method achieves a single-species accuracy of 88.92% (F1 score of 84.23%) and a multi-species macro F1 score of 74.40% on our test set. Compared with three other tools on the independent, public dataset ChiroVox, our model improves single-species classification accuracy by at least 25.82% and multi-species macro F1 score by at least 6.9%.
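The data-synthesis step the abstract describes, merging single-call recordings into one record with a multi-label target, can be sketched as follows. This is only an illustration of the idea under stated assumptions: the species names, waveform stand-ins, and mixing-by-summation scheme are illustrative, not the paper's actual pipeline.

```python
import numpy as np

# Hypothetical species list; the real dataset covers many more.
SPECIES = ["Myotis myotis", "Pipistrellus pipistrellus", "Nyctalus noctula"]

def make_single_call(rng, n_samples=1000):
    """Stand-in for a real single-species echolocation recording."""
    return rng.standard_normal(n_samples).astype(np.float32)

def merge_calls(calls, species_indices, n_species=len(SPECIES)):
    """Sum the waveforms and take the union of the species labels."""
    mixed = np.sum(calls, axis=0)
    labels = np.zeros(n_species, dtype=np.int8)
    labels[list(species_indices)] = 1          # multi-hot target
    return mixed, labels

rng = np.random.default_rng(0)
calls = [make_single_call(rng) for _ in range(2)]
record, target = merge_calls(calls, species_indices=[0, 2])
print(record.shape, target.tolist())   # (1000,) [1, 0, 1]
```

A multi-label classifier is then trained to predict the multi-hot vector `target` from the mixed record, rather than a single class per recording.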

1.2 Forgery-aware Adaptive Vision Transformer for Face Forgery Detection


https://arxiv.org/abs/2309.11092

With the development of face tampering technology, face forgery detection has become increasingly important for protecting authentication integrity. Previous Vision Transformer (ViT)-based detectors have shown subpar performance in cross-database evaluations, mainly because full fine-tuning on limited deepfake data often results in forgetting pre-trained knowledge and overfitting to specific data. To circumvent these problems, we propose a new forgery-aware adaptive Vision Transformer (FA-ViT). In FA-ViT, the parameters of the vanilla ViT are frozen to preserve its pre-trained knowledge, while two specially designed components, the local-aware forgery injector (LFI) and the global-aware forgery adapter (GFA), adapt the model to forgery-relevant knowledge. Our proposed FA-ViT effectively combines these two different types of knowledge to form a universal forgery feature for detecting deepfakes. Specifically, LFI captures local discriminative information and injects it into the ViT via Neighborhood Preserving Cross Attention (NPCA). At the same time, GFA learns adaptive knowledge in the self-attention layers, bridging the gap between the two domains. Furthermore, we design a new single-domain pairwise learning (SDPL) scheme to facilitate fine-grained information learning in FA-ViT. Extensive experiments show that our FA-ViT achieves state-of-the-art performance in cross-dataset and cross-manipulation evaluations, and improves robustness to unseen perturbations.
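The frozen-backbone-plus-adapter design described above can be illustrated with a minimal numpy sketch. All names, shapes, and the zero-initialization trick below are illustrative assumptions, not FA-ViT's actual LFI/GFA modules: a pretrained projection is kept fixed while a small residual bottleneck adapter holds the only trainable weights.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_bottleneck = 8, 2

W_frozen = rng.standard_normal((d_model, d_model))     # pretrained, never updated
W_down = rng.standard_normal((d_model, d_bottleneck))  # trainable down-projection
W_up = np.zeros((d_bottleneck, d_model))               # trainable, zero-initialized

def forward(x):
    h = x @ W_frozen                  # frozen pretrained path
    return h + (h @ W_down) @ W_up    # residual adapter path

x = rng.standard_normal((3, d_model))
# Zero-initializing the up-projection makes the adapter a no-op at the
# start, so the pretrained behavior is preserved before any adaptation:
print(np.allclose(forward(x), x @ W_frozen))  # True
```

Only `W_down` and `W_up` would receive gradient updates during fine-tuning, which is what prevents the catastrophic forgetting the abstract attributes to full fine-tuning.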

2. Medical related (2 articles)

2.1 Uncovering the effects of model initialization on deep model generalization: A study with adult and pediatric Chest X-ray images


https://arxiv.org/abs/2309.11318

Model initialization techniques are critical to improving the performance and reliability of deep learning models in medical computer vision applications. While much literature exists on non-medical images, less is known about the impact on medical images, especially chest X-rays (CXR). Addressing this gap, our study explores three deep model initialization techniques: cold start, warm start, and shrink-and-perturb, focusing on adult and pediatric populations. We pay particular attention to scenarios where data arrives regularly for training, thereby reflecting real-world settings with an ongoing data influx and the need for model updates. We evaluated the generalizability of these models to external adult and pediatric CXR datasets. We also propose new ensemble methods: F-score-weighted sequential least squares quadratic programming (F-SLSQP) and an attention-guided ensemble with learnable fuzzy softmax, which aggregate weight parameters from multiple models to exploit their collective knowledge and complementary representations. We performed statistical significance tests using 95% confidence intervals and p-values to analyze model performance. Our evaluation shows that models initialized with ImageNet pretrained weights outperform randomly initialized counterparts, contradicting some findings for non-medical images. Notably, the ImageNet-pretrained models showed consistent performance during internal and external testing across the different training scenarios. Weight-level ensembles of these models showed significantly higher recall during testing compared to individual models (p<0.05). Our study therefore highlights the benefits of ImageNet pretrained weight initialization, especially when used with weight-level ensembles, for creating robust and generalizable deep learning solutions.
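The distinguishing feature of weight-level ensembling, as opposed to prediction averaging, is that the parameter tensors of several trained models are combined directly. The sketch below only illustrates that idea with F-score-derived mixing coefficients; it is not the paper's F-SLSQP optimization, and the parameter vectors and scores are made-up toy values.

```python
import numpy as np

model_params = [
    np.array([0.2, 1.0, -0.5]),   # model A's (flattened) weights
    np.array([0.4, 0.8, -0.3]),   # model B's (flattened) weights
]
f_scores = np.array([0.70, 0.90])  # hypothetical validation F-scores

# Normalize F-scores into mixing coefficients, then blend the weights.
alpha = f_scores / f_scores.sum()
merged = sum(a * p for a, p in zip(alpha, model_params))
print(alpha.round(4).tolist())   # [0.4375, 0.5625]
print(merged.round(4).tolist())  # [0.3125, 0.8875, -0.3875]
```

In the paper's F-SLSQP variant, the coefficients are instead found by constrained optimization (sequential least squares quadratic programming) rather than by this simple normalization.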

2.2 CMRxRecon: An open cardiac MRI dataset for the competition of accelerated image reconstruction


https://arxiv.org/abs/2309.10836

Cardiac magnetic resonance imaging (CMR) has become a valuable diagnostic tool for heart disease. However, a limitation of CMR is its slow imaging speed, which causes patient discomfort and introduces artifacts into the images. There is growing interest in deep learning-based CMR imaging algorithms that can reconstruct high-quality images from highly undersampled k-space data. However, the development of deep learning methods requires large training datasets, which have not yet been publicly available for CMR. To address this gap, we release a dataset that includes multi-contrast, multi-view, multi-slice, and multi-coil CMR imaging data from 300 subjects. The imaging studies include cardiac cine and mapping sequences. Manual segmentations of the myocardium and chambers for all subjects are also provided in the dataset, along with scripts for state-of-the-art reconstruction algorithms as a reference. Our goal is to facilitate the advancement of state-of-the-art CMR image reconstruction by introducing standardized evaluation criteria and making the dataset freely accessible to the research community. Researchers can access the dataset at https://www.synapse.org/#!Synapse:syn51471091/wiki/.
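The reconstruction problem this dataset targets can be shown with a toy numpy example. The image, the random row mask, and the fully sampled center region are all illustrative assumptions; the point is that the naive zero-filled inverse FFT of undersampled k-space is a lossy baseline that learned methods aim to improve on.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))            # stand-in for a CMR slice

kspace = np.fft.fft2(img)                      # fully sampled k-space
mask = rng.random(64) < 0.25                   # keep roughly 1/4 of the
mask[28:36] = True                             # phase-encode lines + center
undersampled = kspace * mask[:, None]          # zero out unsampled rows

zero_filled = np.fft.ifft2(undersampled).real  # naive baseline reconstruction
err = np.linalg.norm(zero_filled - img) / np.linalg.norm(img)
print(zero_filled.shape, err > 0.0)            # (64, 64) True
```

A learned reconstructor would take `undersampled` (and the mask) as input and be trained to recover `img`, which is exactly the task the dataset's standardized evaluation measures.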


Origin blog.csdn.net/wzk4869/article/details/133137016