ECCV 2022 | MVDG: A Unified Multi-View Framework for Domain Generalization

Foreword: This paper proposes a new multi-view framework for domain generalization (DG) that effectively reduces overfitting in both the training and testing phases.
Specifically, during the training phase, a multi-view regularized meta-learning algorithm exploits multiple optimization trajectories to produce an optimization direction suitable for updating the model. In the test phase, multiple augmented images are used for multi-view prediction, which alleviates unstable predictions and significantly improves the reliability of the model.
Extensive experiments on three benchmark datasets verify that our method can find a flat minimum to enhance generalization and outperform several state-of-the-art methods.

This article comes from the public account CV Technical Guide, which focuses on computer vision technical summaries, tracking of the latest techniques, interpretations of classic papers, and CV recruitment information. The account is currently soliciting manuscripts, and contributors receive a manuscript fee.


Paper: https://arxiv.org/pdf/2112.12329.pdf

Code: https://github.com/koncle/MVDG

Innovative ideas

Traditional supervised learning assumes that the training data and test data come from the same distribution. However, this traditional assumption is not always satisfied in the real world when there is a domain shift between training and testing data. Recently, learning a robust and effective model against domain shift has attracted considerable attention.

Unsupervised domain adaptation (UDA) is one of the most representative learning paradigms under domain shift. It aims to adapt a model from a labeled source domain to an unlabeled target domain.

Although current UDA models have achieved great success, deploying a previously trained UDA model to other unseen domains requires retraining it with newly collected data from those domains. This retraining not only adds space and time cost, but in some cases (e.g., clinical data) also violates privacy policies, making these UDA methods unsuitable for some practical tasks.

Domain generalization (DG), in contrast, only needs to learn from the available source domains, and the trained model can be applied directly to a previously unseen domain without further training. To keep the model effective on the unseen target domain, previous DG methods learn domain-invariant representations to reduce the influence of domain-specific information from the source domains, but they inevitably suffer from overfitting.

As a result, meta-learning based methods, which simulate domain shift episodically as a form of regularization, have become one of the most popular approaches to resisting overfitting during training. However, these methods use only one task per iteration to train the model, which may lead to biased and noisy optimization directions.

In addition, examining the predictions of the trained model at test time shows that overfitting also leads to unstable predictions. The authors conduct experiments in which the test images are perturbed (e.g., by random cropping and flipping). As shown in Figure 1, the predictions often change after perturbation.

Figure 1: Example of changes in predicted probabilities when test images from the unseen domain of the PACS dataset are slightly perturbed.
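The kind of instability described above can be measured with a few lines of NumPy. The sketch below is only an illustration under stated assumptions: `flip_rate`, `toy_predict`, `always_flip`, and `identity` are hypothetical names introduced here, and the toy classifier stands in for a trained DG model. It counts how often the predicted label of a perturbed image differs from the label of the clean image.

```python
import numpy as np

def flip_rate(predict, images, perturb, rng, trials=10):
    """Fraction of (image, trial) pairs whose perturbed label differs from the clean label."""
    changed = 0
    for image in images:
        clean = predict(image)
        changed += sum(predict(perturb(image, rng)) != clean for _ in range(trials))
    return changed / (len(images) * trials)

# Hypothetical toy classifier: class 0 if the left half is at least as bright, else class 1.
def toy_predict(image):
    w = image.shape[1] // 2
    return int(np.argmax([image[:, :w].mean(), image[:, w:].mean()]))

# Two deterministic perturbations used to sanity-check the measure.
def always_flip(image, rng):
    return image[:, ::-1]   # horizontal flip

def identity(image, rng):
    return image            # no perturbation

rng = np.random.default_rng(0)
images = [np.array([[0.0, 0.0, 1.0, 1.0]])]   # right half brighter, so the clean label is 1
rate_flip = flip_rate(toy_predict, images, always_flip, rng)
rate_id = flip_rate(toy_predict, images, identity, rng)
```

With a real model, `perturb` would be a random crop or flip and `predict` the arg-max of the network output; a high rate indicates predictions that sit close to the decision boundary.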

As mentioned above, the overfitting problem not only occurs in the training phase, but also largely affects the testing process. To prevent overfitting, this paper proposes a multi-view framework to deal with poor generalization and unstable predictions.

Main contributions of this paper

1. During training, an effective multi-view regularization meta-learning scheme is designed to prevent overfitting and find a flat minimum for better generalization.

2. It is theoretically proved that increasing the number of tasks in the training phase can make the generalization gap smaller and the generalization better.

3. In the testing phase, a multi-view prediction strategy is introduced to improve the reliability of predictions by utilizing multi-view information.

4. The method is evaluated extensively on multiple DG benchmark datasets and outperforms other SOTA methods.

Method

Episodic training framework

Let the data space and label space be X and Y, let the N source domains be D1, ..., DN, and let f denote the model parameterized by θ.

Given an input x, the model outputs f(x|θ). In each iteration, one source domain is taken as the meta-test domain Dte and the rest as the meta-train domains Dtr. Mini-batches are then drawn from these domains to obtain the meta-train data Btr and the meta-test data Bte. The loss on a mini-batch B under the parameters θ is written L(B|θ).
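A form of this loss consistent with the definitions above can be written as follows. It assumes that ℓ is a per-sample classification loss such as cross-entropy; the symbol ℓ and the sample index i are notation introduced here for illustration.

```latex
\mathcal{L}(B \mid \theta) = \sum_{(x_i, y_i) \in B} \ell\big(f(x_i \mid \theta),\, y_i\big)
```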

Unlike earlier meta-learning algorithms that use second-order gradients to update the model parameters, Reptile is a first-order algorithm, which reduces the computational cost. The Reptile version of MLDG is therefore adopted. At the j-th iteration, given the model parameters θj, the model is first trained with L(Btr|θj) and then with L(Bte|θj). This yields a temporary parameter θtmp along the optimization trajectory.

Finally, the original parameters θj are updated using (θtmp − θj) as the optimization direction.
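Written out, an update of this kind takes the following form. The step size β is a symbol assumed here for illustration; the text above only fixes the direction (θtmp − θj).

```latex
\theta_{j+1} = \theta_j + \beta\,(\theta_{tmp} - \theta_j)
```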

In this way, θtmp explores part of the weight space around the current parameters to find a better optimization direction for the currently sampled task.
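The single-trajectory update described above can be sketched in NumPy. This is a minimal illustration rather than the authors' code: the function names, the toy quadratic loss, and the step sizes `inner_lr` and `beta` are assumptions, and each loss is applied as one plain SGD step.

```python
import numpy as np

def sgd_step(theta, grad_fn, batch, lr):
    """One plain SGD step on a single mini-batch."""
    return theta - lr * grad_fn(theta, batch)

def reptile_mldg_step(theta, grad_fn, b_tr, b_te, inner_lr=0.1, beta=0.5):
    """One outer iteration: train on B_tr, then on B_te, then move theta toward theta_tmp."""
    theta_tmp = sgd_step(theta, grad_fn, b_tr, inner_lr)      # step on the meta-train batch
    theta_tmp = sgd_step(theta_tmp, grad_fn, b_te, inner_lr)  # step on the meta-test batch
    return theta + beta * (theta_tmp - theta)                 # (theta_tmp - theta) is the direction

# Toy loss (assumption): L(B | theta) = 0.5 * ||theta - mean(B)||^2,
# whose gradient with respect to theta is theta - mean(B).
def toy_grad(theta, batch):
    return theta - batch.mean(axis=0)

theta0 = np.zeros(2)
b_tr = np.array([[1.0, 1.0], [3.0, 3.0]])   # batch mean (2, 2)
b_te = np.array([[4.0, 4.0]])               # batch mean (4, 4)
theta1 = reptile_mldg_step(theta0, toy_grad, b_tr, b_te)
print(theta1)
```

In a real setting `grad_fn` would be the gradient of the network loss on the mini-batch, and the inner steps would typically be carried out by a framework optimizer on a copy of the model.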

Multi-view regularized meta-learning

While Reptile reduces computational cost, the above training scheme suffers from several problems.

1. The model is trained along a single optimization trajectory, only generating a temporary parameter θtmp.

2. Since the model is trained with a single task in each trajectory, it cannot fully explore the weight space, and it is difficult to escape local minima, leading to overfitting problems.

To better explore the weight space for a more accurate optimization direction and to reduce the effects of overfitting, a simple yet effective multi-view regularized meta-learning (MVRML) algorithm is designed, which exploits multi-view information in each iteration.

Specifically, to find a robust optimization direction, T temporary parameters {θ1tmp,...,θTtmp} are obtained along different optimization trajectories. Unlike in MLDG, each temporary parameter is trained with s sampled tasks, which helps it escape local minima.

Furthermore, different tasks are sampled in different trajectories to encourage diversity among the temporary models. Learning from these tasks exploits complementary views that capture different information in the weight space. The complete procedure is given in Algorithm 1 of the paper and illustrated in Figure 2.

Figure 2: Illustration of Reptile and multi-view regularization meta-learning algorithms.
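The multi-trajectory scheme can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `sample_task` and `mvrml_step`, the toy gradient, the step sizes `inner_lr` and `beta`, and in particular the choice to combine the T temporary parameters by simple averaging are all assumptions of this sketch.

```python
import numpy as np

def sample_task(domains, rng, batch_size=4):
    """Pick one source domain as meta-test, pool the rest as meta-train, draw mini-batches."""
    te = rng.integers(len(domains))
    tr_pool = np.concatenate([d for i, d in enumerate(domains) if i != te])
    b_tr = tr_pool[rng.choice(len(tr_pool), size=batch_size, replace=False)]
    b_te = domains[te][rng.choice(len(domains[te]), size=batch_size, replace=False)]
    return b_tr, b_te

def mvrml_step(theta, grad_fn, domains, rng, T=3, s=2, inner_lr=0.1, beta=0.5):
    """One outer iteration with T trajectories of s sampled tasks each."""
    temps = []
    for _ in range(T):                       # each trajectory starts from the same theta
        theta_tmp = theta.copy()
        for _ in range(s):                   # a fresh task is sampled at every inner step
            b_tr, b_te = sample_task(domains, rng)
            theta_tmp = theta_tmp - inner_lr * grad_fn(theta_tmp, b_tr)
            theta_tmp = theta_tmp - inner_lr * grad_fn(theta_tmp, b_te)
        temps.append(theta_tmp)
    theta_avg = np.mean(temps, axis=0)       # assumption: average the temporary parameters
    return theta + beta * (theta_avg - theta)

# Toy loss (assumption): gradient of 0.5 * ||theta - mean(B)||^2.
def toy_grad(theta, batch):
    return theta - batch.mean(axis=0)

rng = np.random.default_rng(0)
domains = [np.full((8, 2), c) for c in (1.0, 2.0, 3.0)]   # three toy source domains
theta1 = mvrml_step(np.zeros(2), toy_grad, domains, rng)
```

Setting `T=1` and `s=1` reduces this sketch to a single-trajectory, single-task update, which makes the two extra knobs (number of trajectories and number of tasks per trajectory) easy to compare.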

Multi-view prediction

Since the model is trained on the source domains, the feature representations of the training data are well clustered. However, because of overfitting and domain discrepancy, unseen images are more likely to lie close to the decision boundary, which makes their feature representations unstable. When small perturbations are applied to the test images, their feature representations can be pushed across the boundary, as shown in Figure 1.

Therefore, the authors use multi-view prediction (MVP) instead of a single view at test time. Multi-view prediction integrates the complementary information of the views to obtain more robust and reliable predictions.

Suppose there is an image x to test. Different views of this image are generated by a weak random transformation T(·). The prediction p for the image is then obtained by combining the model outputs over these views.

Only weak transformations (such as random flipping) are used in MVP, because strong augmentations (such as color jittering) make the augmented image deviate from the manifold of the original image, resulting in unsatisfactory prediction accuracy.
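Multi-view prediction with a weak transformation can be sketched in NumPy as follows. The sketch is illustrative only: `multi_view_predict`, `random_flip`, `toy_model`, and the number of views `m` are names assumed here, and it assumes the views are combined by averaging the model's logits before applying softmax. Averaging the per-view probabilities instead would be another plausible reading.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def random_flip(image, rng):
    """Weak transformation: horizontal flip with probability 0.5."""
    return image[:, ::-1] if rng.random() < 0.5 else image

def multi_view_predict(model, image, rng, m=8):
    """Average the logits of m weakly transformed views, then apply softmax."""
    logits = np.mean([model(random_flip(image, rng)) for _ in range(m)], axis=0)
    return softmax(logits)

# Hypothetical toy model: two logits given by the mean brightness of each image half.
def toy_model(image):
    w = image.shape[1] // 2
    return np.array([image[:, :w].mean(), image[:, w:].mean()])

rng = np.random.default_rng(0)
image = np.array([[0.0, 0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0, 1.0]])
p = multi_view_predict(toy_model, image, rng)
```

The toy model is deliberately sensitive to flipping, so a single view gives a different answer depending on the transformation, while the averaged prediction varies less from one call to the next.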

Experiments

Table 1: Domain generalization accuracy (%) on the PACS dataset using ResNet-18 (left) and ResNet-50 (right).

Table 3: Accuracy (%) of DeepAll, Reptile, multi-view regularized meta-learning (MVRML), and multi-view prediction (MVP) in ablation experiments on the PACS dataset.

Figure 4: Comparison of the local sharpness of ERM, Reptile, and MVRML on the PACS target domains.

Table 4: Comparison of different task sampling strategies for MVRML on the PACS dataset.

Figure 5: Effect of number of tasks (a) and number of trajectories (b) in MVRML.

Table 5: The left table shows the accuracy and prediction change rate (PCR) of different methods with ResNet-18 on the PACS dataset. The right table shows the accuracy (%) obtained by applying MVP to other SOTA methods on the PACS dataset.

Conclusion

To resist overfitting, and based on the observation that DG models benefit from both task-based augmentation during training and sample-based augmentation during testing, this paper proposes a new multi-view framework that improves generalization and reduces the unstable predictions caused by overfitting.

During training, a multi-view regularized meta-learning algorithm is designed. During testing, multi-view prediction generates different views of a single image and combines their predictions to obtain a stable result. The effectiveness of the method is verified through extensive experiments on three DG benchmark datasets.

