Domain Adaptation and Graph Neural Networks


Student: Wenxuan Zeng

School: University of Electronic Science and Technology of China

Date: 2022.4.21 - 2022.4.23


1 Domain Adaptation

1.1 Background

First, consider the following scenario: we have trained a model on a training set and now test it on a testing set. If the testing data come from the same distribution as the training data, the predictions are good; but when the gap between the testing data and the training data is large, the predictions degrade. As shown in the figure below, this phenomenon is called "domain shift": the training data and the testing data follow different distributions.

[Figure]

This phenomenon is widespread in machine learning. While everyone is excited about the amazing results machine learning achieves, the quality of those results once a model is deployed in a real environment is often unknown. The way to address this problem is domain adaptation, which is a type of transfer learning (transferring knowledge trained on domain A to domain B).

1.2 Domain shift

Domain shift mainly takes three different forms: the distribution of the training data and the testing data differs; the distribution of the output labels differs; or the relationship between the data and the labels differs.

[Figure]

This study focuses on domain shift where the data distributions differ. We call the training data the source domain and the testing data the target domain.

1.3 Domain Adaptation

When doing domain adaptation, you need to take into account what is known about the target domain.

① When there is a large amount of labeled data in the target domain, there is no need for domain adaptation, because we can train directly on the target domain.

② When there is a small amount of labeled data in the target domain, you can train a model on the source data and then fine-tune it on the target data. The problem is that it is easy to overfit (mitigate by turning down the learning rate, reducing the number of epochs, ...).

③ When there is a large amount of unlabeled data in the target domain, which is the most common situation in practice, a relatively basic idea is to use a feature extractor to extract image features. Although the source data and the target data are distributed differently, the extracted features should be identically distributed. The advantage is that the model can then adapt smoothly to different domains.

[Figure]

So how do we design such a feature extractor? As shown in the figure below, a deep representation of the image is extracted by the feature extractor (which can be defined as, say, the first 3, 4, or 5 layers of the network); we require that the features obtained from the source data and the target data be indistinguishable (i.e., identically distributed).

[Figure]

Now consider how to obtain a feature extractor whose outputs follow the same distribution across domains; this can be realized with domain-adversarial training. After the feature layer, a domain classifier (a binary classifier) judges whether an image comes from the source data or the target data. If the features generated by the feature extractor can fool the domain classifier, the idea above has been realized.

Domain-Adversarial Training of Neural Networks

[Figure]

Simplify the above diagram:

[Figure]

This is easily compared to a GAN: the feature extractor acts as the generator and the domain classifier as the discriminator. In fact, the domain-adversarial training paper and GAN are works of the same period (around 2015) and rest on the same idea.

Just imagine: could the feature extractor become too powerful? If it output all-zero vectors, it could also fool the domain classifier! Does this happen? Actually no, because the label predictor still needs these features to make predictions.

But this alone may not be ideal. The domain classifier minimizes the binary classification loss, while the feature extractor maximizes that loss in order to deceive the domain classifier, so a pure min-max like this is not the best formulation.
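To make the min-max concrete, here is a minimal sketch of domain-adversarial training with a gradient reversal layer, in PyTorch. The module sizes and names are illustrative assumptions, not the architecture of the original paper.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient sign on the backward
    pass, so one backward step lets the domain classifier minimize the domain
    loss while the feature extractor maximizes it."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

feature_extractor = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
label_predictor   = nn.Linear(256, 10)  # task head: keeps features informative
domain_classifier = nn.Linear(256, 2)   # binary head: source vs. target

def dann_step_loss(x_src, y_src, x_tgt, lamb=1.0):
    ce = nn.CrossEntropyLoss()
    f_src, f_tgt = feature_extractor(x_src), feature_extractor(x_tgt)
    # Task loss on labeled source data only.
    task_loss = ce(label_predictor(f_src), y_src)
    # Domain loss on both domains, with gradients reversed into the extractor.
    f_all = GradReverse.apply(torch.cat([f_src, f_tgt]), lamb)
    d_lab = torch.cat([torch.zeros(len(x_src)), torch.ones(len(x_tgt))]).long()
    return task_loss + ce(domain_classifier(f_all), d_lab)
```

The gradient reversal trick is how the original paper resolves exactly this tension: both heads are trained by ordinary gradient descent on one combined loss.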

To make the target distribution as close as possible to the source distribution, the following two situations can arise, and what we want is the second one. A more intuitive idea is to keep the extracted features as far away from the decision boundary as possible.

[Figure]

One approach is to feed the unlabeled target data into the feature extractor and then into the label predictor to get predictions. If the predicted distribution is very concentrated, the entropy is small; otherwise the entropy is large. Minimizing this entropy pushes target features away from the decision boundary.

[Figure]
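A tiny numerical illustration of the entropy criterion (the probability vectors below are made up):

```python
import torch

def prediction_entropy(p, eps=1e-8):
    # Shannon entropy of each row of class probabilities.
    return -(p * (p + eps).log()).sum(dim=1)

concentrated = torch.tensor([[0.97, 0.02, 0.01]])  # confident prediction
spread_out   = torch.tensor([[0.34, 0.33, 0.33]])  # uncertain prediction

print(prediction_entropy(concentrated))  # ~0.15 -> small entropy
print(prediction_entropy(spread_out))    # ~1.10, near ln(3) -> large entropy
```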

Of course, there is more work along the decision-boundary line.

Think about another problem: if the categories in the source domain and those in the target domain are not exactly the same, the method above runs into trouble. For example, if the target domain contains a lion class that never appeared in the source domain, forcing the target features to match the source distribution amounts to declaring the lion to be some other animal, which is clearly unreasonable! An article addressing this appeared at CVPR 2019: Universal Domain Adaptation

[Figure]

④ When there is a small amount of unlabeled data in the target domain, there is also a solution; refer to: Test-Time Training with Self-Supervision for Generalization under Distribution Shifts

⑤ When the target domain is completely unknown, the task is no longer called domain adaptation but domain generalization. The goal is for the model to generalize well to unseen scenes; reference: Domain Generalization with Adversarial Feature Learning. Another situation is having only a small amount of data in the source domain while needing to apply the model to many different target domains; refer to: Learning to Learn Single Domain Generalization

2 Domain Adaptation in Computer Vision

In computer vision, domain adaptation is widely used, for example in autonomous driving. There may be a large gap between the pictures in a dataset and the real situation, so domain adaptation is a very important component. For example, in the picture below, the actual scene may involve heavy fog, rain, heavy snow, and other weather. In such environments, can object detection, semantic segmentation, and object classification still be carried out well?

[Figure]

Here are a few ways to implement domain adaptation:

  • Data Generation and Continuous Domains

  • Augmenting datasets can address such problems, but collecting (and labeling) large amounts of data under various conditions is expensive and usually infeasible. The usual method is therefore to simulate data for various conditions and generate large amounts of data to expand the dataset.

    Generate foggy scenes with different visibility through depth estimation: Curriculum Model Adaptation with Synthetic and Real Data for Semantic Foggy Scene Understanding

[Figure]

Use generative models (such as CycleGAN) to generate nighttime scenes: Map-Guided Curriculum Domain Adaptation and Uncertainty-Aware Evaluation for Semantic Nighttime Image Segmentation

[Figure]

This line of work also uses domain flow to implement a continuous transition across domains:

[Figure]

Establish a mapping between the generated image and the original image:

[Figure]

If the data cannot be collected, use a generative method: DLOW: Domain Flow and Applications

[Figure]

  • Test-time domain adaptation, allowing the model to learn and adapt itself in actual scenarios

  • Domain Adaptation for Multi-Task Learning

[Figure]

Aiding semantic segmentation with depth estimation from self-supervised learning: Three Ways To Improve Semantic Segmentation With Self-Supervised Depth Estimation. The labeling cost of semantic segmentation is very high, so a self-supervised depth estimation task is used to assist semantic segmentation.

[Figure]

Domain Adaptive Semantic Segmentation with Self-Supervised Depth Estimation: this article assists semantic segmentation through self-supervised depth estimation. The correlation between the two tasks in the source domain can be transferred to the target domain, and the relative difficulty of depth estimation between the source and target domains can be transferred to guide semantic segmentation.

[Figure]

For the situation where there are multiple mixed target domains, there are also corresponding articles: Cluster, Split, Fuse, and Update: Meta-Learning for Open Compound Domain Adaptive Semantic Segmentation

[Figure]

  • Multi-Sensor Fusion

    Not covered here; it focuses more on the sensor and device side.

3 Domain Adaptation in Graph Neural Networks (Understanding of Four Important Papers)

3.1 DANE: Domain Adaptive Network Embedding (IJCAI '19)

Link: DANE: Domain Adaptive Network Embedding

  • Features: embedding domain adaptation, graph neural networks, adversarial learning

  • Motivation: previous work only considers learning embeddings for a single network, which cannot be transferred to other networks. The author therefore designs an embedding algorithm that supports migrating downstream tasks to different networks for training; this is domain adaptation.

  • Contributions: ① the earliest proposal of domain adaptation across multiple graph networks; ② two alignment methods: feature-space alignment (structurally similar nodes get similar representation vectors even if they come from different networks, achieved by computing node representations with two parameter-sharing networks) and distribution alignment (the distributions of node representations are regularized by adversarial learning); ③ two datasets are constructed.

  • Method: Divided into the following two parts:

    • GCN network with shared weights

      Use two GCN networks with identical (shared) parameters to obtain the embeddings (see the sketch at the end of this subsection):
      [Equation]

      In order to learn parameters compatible with both the source domain and the target domain, a multi-task loss function is used to preserve the properties of the two networks:

      [Equation]

    • Adversarial Learning Regularization

      Similar to the GAN idea: train a discriminator to identify which network (source or target domain) an embedding comes from, and train the weight-sharing GCN as the generator. The discriminator has the following loss:

      [Equation]

      In order to achieve two-way domain adaptation, the following adversarial training loss is designed:

      [Equation]

      Finally combine the losses in the two steps:

      [Equation]

    [Figure]
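A minimal sketch of the shared-weight idea, assuming normalized adjacency matrices and node feature matrices are already given; the two-layer form and the sizes are illustrative assumptions, not DANE's exact architecture.

```python
import torch
import torch.nn as nn

class SharedGCN(nn.Module):
    """One GCN applied to both graphs: because W1 and W2 are shared,
    structurally similar nodes are mapped to similar embeddings even
    when they come from different networks."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.W1 = nn.Linear(in_dim, hid_dim)
        self.W2 = nn.Linear(hid_dim, out_dim)

    def forward(self, A, X):
        H = torch.relu(self.W1(A @ X))  # one propagation step
        return self.W2(A @ H)           # node embeddings

gcn = SharedGCN(in_dim=128, hid_dim=64, out_dim=32)
# The same parameters embed both graphs; a discriminator (as in the DANN
# sketch in Section 1) then regularizes the two distributions adversarially:
# emb_src, emb_tgt = gcn(A_src, X_src), gcn(A_tgt, X_tgt)
```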

3.2 GCAN: Graph Convolutional Adversarial Network for Unsupervised Domain Adaptation (CVPR '19)

Link: GCAN: Graph Convolutional Adversarial Network for Unsupervised Domain Adaptation

  • Features: graph neural networks, unsupervised learning, domain adaptation, adversarial networks

  • Motivation: the author argues that to bridge the source and target domains in domain adaptation, three kinds of information matter most: data structure (the inherent properties of the dataset, including marginal or conditional data distributions, data statistics, geometric structure, etc.), domain labels (used in adversarial domain adaptation methods; they help train a domain classifier to model the global distributions of the source and target domains), and category labels (especially target pseudo-labels, usually used to enforce semantic alignment, which ensures that samples with the same class label from different domains map close together in feature space).

    [Figure]

  • Contributions: ① the first joint modeling of three kinds of information (data structure, domain labels, category labels) in unsupervised domain adaptation; ② to match the source and target distributions more robustly, three alignment mechanisms are proposed (structure-aware alignment, domain alignment, class-center alignment) that efficiently learn domain-invariant and semantic representations and reduce domain variance.

  • Method:

    [Figure]

    • Optimization objective

      [Equation]

    • Domain alignment

      A domain classifier is added (a binary classifier on whether an extracted feature comes from the source or the target domain), with a domain-adversarial loss:

      [Equation]

      The feature extractor tries to deceive this classifier, which discriminates the origin of features; trained adversarially, the domain classifier models the global distributions of the source and target domains.

    • Structure Aware Alignment

      Domain alignment enforces alignment of the global domain information but ignores the structural information within mini-batches of samples. First, a data structure analyzer (DSA) network generates structure scores for a mini-batch of samples; then a densely connected instance graph is constructed from the obtained structure scores and the learned CNN features of the samples; afterwards, a GCN operates on the instance graph to learn features that encode the data structure information. Here is how the densely connected instance graph is built:

      The node information of the instance graph is the features extracted by the CNN:

      [Equation]

      To construct the adjacency matrix, the mini-batch is fed into the data structure analyzer (DSA) network to generate structure scores, from which the adjacency matrix is built:

      [Equation]

      A triplet loss constrains the generation of the structure scores:

      [Equation]

    • Class Center Alignment

      Domain invariance and structure consistency of features do not imply discriminability, so class-center alignment is proposed.

      First, a target classifier assigns pseudo-labels to obtain pseudo-labeled data in the target domain. Then the labeled source samples and the pseudo-labeled target samples are used together to compute the centroid of each class (see the sketch below):

      [Equation]
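A hedged sketch of class-center alignment: compute per-class centroids from labeled source features and pseudo-labeled target features, then pull matching centroids together. The squared-distance form and all names are illustrative assumptions; GCAN's exact formulation may differ.

```python
import torch

def class_centroids(features, labels, num_classes):
    # Mean feature vector per class (zero vector for empty classes, for simplicity).
    cents = []
    for c in range(num_classes):
        mask = labels == c
        cents.append(features[mask].mean(dim=0) if mask.any()
                     else torch.zeros(features.size(1)))
    return torch.stack(cents)

def center_alignment_loss(f_src, y_src, f_tgt, y_tgt_pseudo, num_classes):
    c_src = class_centroids(f_src, y_src, num_classes)
    c_tgt = class_centroids(f_tgt, y_tgt_pseudo, num_classes)
    # Pull same-class centroids from the two domains together.
    return ((c_src - c_tgt) ** 2).sum(dim=1).mean()
```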

3.3 Unsupervised Domain Adaptive Graph Convolutional Networks (WWW '20)

Link: Unsupervised Domain Adaptive Graph Convolutional Networks

  • Features: Unsupervised learning, graph neural network, domain adaptation

  • Motivation: most GCNs only work on a single domain (graph) and cannot transfer knowledge from other domains (graphs), owing to the challenges of graph representation learning and of domain adaptation across graph structures.

    [Figure]

  • Contributions: ① poses a new unsupervised graph domain adaptation problem and proposes a dual graph convolutional network algorithm; ② combines local and global consistency with an attention mechanism to learn effective node embeddings across networks; ③ using source and target information with different loss functions, domain-invariant and semantic representations can be learned effectively, reducing domain discrepancy in cross-domain node classification.

  • Method: to achieve effective graph representation learning, a dual graph convolutional network component is first developed, which jointly exploits local and global consistency for feature aggregation. An attention mechanism is further used to generate a unified representation for each node across the different graphs. To facilitate knowledge transfer between graphs, a domain-adaptive learning module optimizes three loss functions (source classifier loss, domain classifier loss, target classifier loss), so the model can separately distinguish class labels in the source domain, samples from different domains, and class labels in the target domain.

    [Figure]

    • Node representation learning

      A dual graph convolutional network captures the local and global consistency relations of each graph. For local consistency, the adjacency matrix A is fed directly into the GCN; for global consistency, another convolution based on random walks is used.

      Local consistency: a plain GCN that takes A and X as input

      [Equation]

      Global consistency: a convolution over the PPMI (positive pointwise mutual information) matrix is introduced to encode global information. First, the frequency matrix F is computed by random walks (random walks capture the semantic similarity between nodes; see the sketch at the end of this section):

      [Equation]

      It can be seen that the difference between global and local consistency is that the "adjacency matrix" of the input GCN is different.

    • Attention between graphs

      We need to aggregate embeddings from different graphs to generate a unified representation, automatically determining the weights between source and target graph representations from local and global GCN layers, respectively.

      [Equation]

    • Domain Adaptive Learning in Cross-Domain Node Classification

      Three classifiers are trained: ① source classifier; ② domain classifier; ③ target classifier

      Overall training objective:

      [Equation]

      The source classifier loss is the cross-entropy loss:

      [Equation]

      As in the previous works, the domain classifier follows the idea of adversarial learning: the network must distinguish the two sources, while the features generated from the two sources are made similar, so that the classifier finds it hard to judge.

      [Equation]

      The target classifier uses an entropy loss (cross-entropy is not possible because the target domain has no labels):

      [Equation]
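As referenced in the node-representation subsection, here is a rough sketch of building the PPMI matrix from random walks; it then replaces A as the input "adjacency" of the global-consistency branch. Walk length, walk count, and the counting scheme are assumptions; the paper's exact construction may differ.

```python
import numpy as np

def ppmi_from_walks(A, walk_len=3, n_walks=100, seed=0):
    """A: dense (n, n) 0/1 adjacency array."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    F = np.zeros((n, n))
    for start in range(n):              # frequency matrix F via random walks
        for _ in range(n_walks):
            node = start
            for _ in range(walk_len):
                neighbors = np.flatnonzero(A[node])
                if neighbors.size == 0:
                    break
                node = rng.choice(neighbors)
                F[start, node] += 1     # count (start, visited) co-occurrences
    p = F / F.sum()                     # joint probability estimate
    p_r = p.sum(axis=1, keepdims=True)  # row marginals
    p_c = p.sum(axis=0, keepdims=True)  # column marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(np.where(p > 0, p / (p_r * p_c), 1.0))  # log 1 = 0 elsewhere
    return np.maximum(pmi, 0)           # keep positive PMI only
```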

3.4 Adaptive Trajectory Prediction via Transferable GNN (CVPR '22)

Link: Adaptive Trajectory Prediction via Transferable GNN

  • Features: Graph Neural Network, Pedestrian Trajectory Prediction, Domain Adaptation

  • Motivation: most existing work assumes that the motion in the training set and the test set follows the same pattern, ignoring the underlying distribution differences. A domain-alignment framework is therefore proposed to achieve domain adaptation.

  • Contributions: ① an in-depth study of the domain shift problem across different trajectory domains, with a unified T-GNN method that jointly predicts future trajectories and adaptively learns domain-invariant knowledge; ② a specially designed graph neural network to extract comprehensive spatio-temporal features, together with an effective attention-based adaptive knowledge learning module that explores fine-grained, individual-level transferable feature representations; ③ a new problem setting for pedestrian trajectory prediction, whose cross-domain transfer setting establishes a strong baseline for pedestrian trajectory prediction.

  • Methods: A domain-invariant GNN is proposed to explore structural motion knowledge with domain-specific knowledge reduction, and an attention-based adaptive knowledge learning module is also proposed to explore fine-grained individual-level feature representations for knowledge transfer.

    [Figure]

    • Spatio-temporal feature representation

      In simple terms, it is a three-layer GCN network:

      [Equation]

    • Attention-Based Adaptive Learning

      The purpose of this part is to address the gap between the source and target domains. Although a feature vector preserves the spatio-temporal information of a pedestrian, we cannot be sure how representative that pedestrian's feature vector is within a domain. Therefore, an attention module is introduced to learn the relative correlation between feature vectors and trajectory domains.

      [Equation]

    • Temporal prediction module

      Simply put, a temporal convolutional network (TCN) makes predictions along the time dimension:

      [Equation]

    • Training objective

      For trajectory prediction, the negative log-likelihood is used (see the sketch at the end of this section):

      [Equation]

      The final training objective combines the previous alignment loss and prediction loss:

      [Equation]
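For reference, a sketch of the bivariate-Gaussian negative log-likelihood that is standard in pedestrian trajectory prediction; that T-GNN uses exactly this parameterization is an assumption, and the five-channel output layout is illustrative.

```python
import math
import torch

def bivariate_nll(pred, gt, eps=1e-6):
    """pred: (..., 5) = (mu_x, mu_y, log_sigma_x, log_sigma_y, rho_raw);
    gt: (..., 2) ground-truth coordinates."""
    mu_x, mu_y = pred[..., 0], pred[..., 1]
    sx, sy = pred[..., 2].exp(), pred[..., 3].exp()  # positive std-devs
    rho = torch.tanh(pred[..., 4])                   # correlation in (-1, 1)
    dx, dy = (gt[..., 0] - mu_x) / sx, (gt[..., 1] - mu_y) / sy
    one_m_rho2 = (1 - rho ** 2).clamp(min=eps)
    z = (dx ** 2 - 2 * rho * dx * dy + dy ** 2) / one_m_rho2
    # -log of the 2-D Gaussian density evaluated at the ground truth:
    nll = 0.5 * z + (sx * sy * one_m_rho2.sqrt()).log() + math.log(2 * math.pi)
    return nll.mean()
```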

3.5 Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation (CVPR '21)

  • Features: single source domain to multiple target domains, curriculum learning, pseudo-label generation

  • Motivation: Solve the problem of multi-domain transfer.

  • Contributions: ① proposes Curriculum Graph Co-Teaching (CGCT) for MTDA, which uses a co-teaching strategy and a curriculum learning approach with dual classifier heads to learn more robust representations across multiple target domains; ② to make better use of domain labels, a domain-aware curriculum learning (DCL) strategy is proposed to smooth the feature alignment process.

  • Method: two perspectives alleviate the multi-domain transfer problem: feature aggregation and curriculum learning. Curriculum graph co-teaching is proposed with dual classifier heads, one of which is a GCN that aggregates features from similar samples across domains. To prevent the classifiers from overfitting to their own noisy pseudo-labels, a co-teaching strategy with the dual classifier heads, supplemented by curriculum learning, obtains more reliable pseudo-labels. Additionally, when domain labels are available, domain-aware curriculum learning (DCL) is proposed: a sequential adaptation strategy that first adapts to easier target domains and then to harder ones.

    [Figure]

    • (a) Curriculum Graph Co-Teaching

      STEP 1: Domain Adaptation

      $f_{edge}$ generates the adjacency matrix; the supervision is given by the MLP head, which labels the edges between nodes (if the labels of two nodes agree, their edge similarity is 1, otherwise 0):

      [Equation]

      Then the loss for generating the adjacency matrix:

      [Equation]

      The loss of GCN and MLP on the source domain:

      [Equation]

      The final optimization goal is:

      [Equation]

      STEP 2: Pseudo-label annotation

      The GCN head labels the unlabeled data; samples whose confidence falls below a certain threshold do not participate in training (see the sketch at the end of this section). Why use the GCN's output for labeling? The author argues that, because the GCN aggregates features, it is more robust than the MLP. The data then becomes:

      [Equation]

    • (b) Domain-aware Curriculum Learning

      The authors consider the case where domain labels are available for the target data. The degree of distribution shift between each target domain and the source domain differs, so the difficulty of adaptation differs. An Easy-to-Hard Domain Selection (EHDS) strategy is adopted: first adapt to easy domains, then to hard ones, since it is obviously easier for the model to adapt to a domain with a smaller shift than to one with a larger shift.

      How do we measure which domain is easier? The author uses information entropy as the indicator (see the sketch at the end of this section):

      [Equation]

      STEP 1: Domain selection

      Choose a relatively easy domain based on the information entropy above:

      [Equation]

      STEP 2: Domain Adaptation

      Same as in Curriculum Graph Co-Teaching, except that the target-domain data comes from the selected domain.

      STEP 3: Pseudo-label annotation

      Same as in Curriculum Graph Co-Teaching, except that the target-domain data comes from the selected domain.
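A combined sketch of the two mechanisms referenced above: confidence-thresholded pseudo-labeling from the GCN head, and entropy-based easy-to-hard ranking of target domains. The threshold value and all names are hypothetical.

```python
import torch

def select_pseudo_labels(gcn_logits, threshold=0.9):
    """Keep only samples whose GCN prediction confidence clears the
    threshold; the rest do not participate in training."""
    probs = gcn_logits.softmax(dim=1)
    conf, labels = probs.max(dim=1)
    keep = conf >= threshold
    return labels[keep], keep

def rank_domains_by_entropy(logits_per_domain):
    """logits_per_domain: one (n_d, C) logits tensor per target domain.
    Lower mean prediction entropy = easier domain = adapt to it earlier."""
    def mean_entropy(logits):
        p = logits.softmax(dim=1)
        return -(p * p.clamp_min(1e-8).log()).sum(dim=1).mean()
    ents = torch.stack([mean_entropy(l) for l in logits_per_domain])
    return torch.argsort(ents)  # domain indices in easy-to-hard order
```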

4 Conclusion

  • Of the works above, the first learns a transferable embedding representation, while the second, the third, the fourth (pedestrian trajectory prediction), and the fifth (multi-target domain adaptation) all adaptively transfer a model to different domains. GNNs have further applications in domain adaptation. For example, Graph-Relational Domain Adaptation (ICLR '22) argues that the relationships between different domains are not equivalent but form a topology, and therefore views inter-domain relations from a graph perspective. Unsupervised Multi-Source Domain Adaptation for Person Re-Identification (CVPR '21) introduces multi-source domain adaptation to the person re-identification problem.
  • I think domain adaptation is a very promising direction, because in most practical scenarios the data differ from the training data, or are even completely unknown. How to maximize the value of a model in real scenarios is a problem that domain adaptation must keep solving.
  • AutoEval also tackles the problem of unknown test-scene data, but from a different entry point: it uses "estimation" to assess how well a model will perform in an unknown environment, a kind of "self-assessed evaluation" learning paradigm.
