Domain Adaptation and Graph Neural Networks
Student: Wenxuan Zeng
School: University of Electronic Science and Technology of China
Date: 2022.4.21 - 2022.4.23
Article directory
- Domain Adaptation and Graph Neural Networks
- 1 Domain Adaptation
- 2 Domain Adaptation in Computer Vision
- 3 Domain Adaptation in Graph Neural Networks (Understanding of Five Important Papers)
- 3.1 DANE: Domain Adaptive Network Embedding (IJCAI '19)
- 3.2 GCAN: Graph Convolutional Adversarial Network for Unsupervised Domain Adaptation (CVPR '19)
- 3.3 Unsupervised Domain Adaptive Graph Convolutional Networks (WWW '20)
- 3.4 Adaptive Trajectory Prediction via Transferable GNN
- 3.5 Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation (CVPR '21)
- 4 Conclusion
1 Domain Adaptation
1.1 Background
Consider the following scenario: we have trained a model on the training set and now evaluate it on the test set. If the test data and the training data come from the same distribution, the predictions are good; when the gap between them is large, the predictions degrade. As shown in the figure below, this phenomenon is called "domain shift": the training data and the test data have different distributions.
Domain shift is widespread in machine learning. Impressive benchmark results say little about how a model will behave once it is deployed in a real environment, where the quality of its predictions is unknown. Domain adaptation addresses this problem; it is a form of transfer learning, in which the knowledge learned on domain A is transferred to domain B.
1.2 Domain shift
Domain shift takes three main forms: the input distribution differs between training and test data, the distribution of output labels differs, or the relationship between inputs and labels differs.
This study focuses on the first form, where the input distributions differ. We call the training data the source domain and the test data the target domain.
1.3 Domain Adaptation
How to do domain adaptation depends on how much is known about the target domain.
① When a large amount of labeled data is available on the target domain, there is no need for domain adaptation, because we can train directly on the target domain.
② When a small amount of labeled data is available on the target domain, you can train a model on the source data and then fine-tune it on the target data. The risk is overfitting to the few target samples, which can be mitigated by lowering the learning rate, training for fewer epochs, and so on.
③ When a large amount of unlabeled data is available on the target domain, which is the most common situation today, a basic idea is to use a feature extractor to extract image features. Although the source data and the target data are distributed differently, the extracted features should be identically distributed. The advantage is that the model can then adapt smoothly to different domains.
So how do we design such a feature extractor? As shown in the figure below, the feature extractor produces a deep representation of the image (how many layers count as the feature extractor, 3, 4, 5, ..., is a manual choice). We require that the features obtained from source data and target data be indistinguishable, that is, identically distributed.
Domain adversarial training is one way to obtain such a feature extractor. A domain classifier (a binary classifier) is attached to the output of the feature layers and judges whether an image comes from the source data or the target data. If the features produced by the feature extractor fool the domain classifier, the goal above has been achieved.
Domain-Adversarial Training of Neural Networks
Simplify the above diagram:
This setup is easy to compare with a GAN: the feature extractor plays the role of the generator and the domain classifier plays the role of the discriminator. In fact, domain adversarial training and GAN appeared at around the same time (2014 to 2015) and rest on the same idea.
Could the feature extractor cheat? If it output all-zero vectors, it would also fool the domain classifier. In practice this does not happen, because the label predictor needs informative features to make its predictions.
This objective is not necessarily ideal, however. The domain classifier minimizes the binary classification loss, and the feature extractor maximizes that same loss in order to deceive it, but a large domain classification loss does not by itself guarantee that the two feature distributions coincide.
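The two opposing objectives can be written down in a few lines. The sketch below is a minimal illustration, not the formulation of any particular paper: the function names `domain_bce` and `adversarial_losses` and the trade-off weight `lam` are assumptions made here, and the domain classifier is reduced to its output probability P(source).

```python
import math

def domain_bce(p_source, is_source):
    """Binary cross-entropy for one sample, given the classifier's P(source)."""
    p = min(max(p_source, 1e-12), 1.0 - 1e-12)  # clamp to avoid log(0)
    return -math.log(p) if is_source else -math.log(1.0 - p)

def adversarial_losses(p_sources, is_source_flags, label_loss, lam=1.0):
    """Return (domain classifier loss, feature extractor loss).

    The domain classifier minimizes d_loss. The feature extractor minimizes
    the label loss while maximizing d_loss, hence the minus sign.
    `lam` is a hypothetical trade-off weight.
    """
    d_loss = sum(domain_bce(p, s) for p, s in zip(p_sources, is_source_flags))
    d_loss /= len(p_sources)
    f_loss = label_loss - lam * d_loss
    return d_loss, f_loss

# A fooled classifier outputs 0.5 for every sample, so d_loss equals log 2.
d_fooled, f_fooled = adversarial_losses([0.5, 0.5], [True, False], label_loss=0.3)
# A confident, correct classifier has a small d_loss, which is bad for the extractor.
d_sharp, f_sharp = adversarial_losses([0.99, 0.01], [True, False], label_loss=0.3)
```

Because the label loss appears in the feature extractor's objective, collapsing all features to zero would lower the domain term but raise the label term, which is why the degenerate solution is not attractive.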
So far the goal has been to bring the target feature distribution as close as possible to the source. Two situations can result, as in the figure below, and the second is the one we want. An intuitive idea is to push the target features as far away from the decision boundary as possible.
One approach is to feed the unlabeled target data through the feature extractor and the label predictor to obtain predictions. If the predicted distribution is concentrated on one class, its entropy is small; otherwise its entropy is large. Small entropy is preferred, since it suggests that the sample lies far from the decision boundary.
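As a small illustration of the entropy criterion, the sketch below computes the Shannon entropy of a predicted class distribution. The function name `prediction_entropy` is chosen here for illustration; the input is assumed to be a list of class probabilities that sum to 1.

```python
import math

def prediction_entropy(probs):
    """Shannon entropy (natural log) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

confident = prediction_entropy([1.0, 0.0, 0.0, 0.0])     # concentrated prediction
uncertain = prediction_entropy([0.25, 0.25, 0.25, 0.25])  # spread-out prediction
```

Averaging this quantity over the unlabeled target samples gives a loss that can be minimized without any target labels.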
There is more work on decision boundaries:
Consider another problem. If the categories in the source domain and the target domain are not exactly the same, the method above breaks down. For example, if the target domain contains a lion category that never appears in the source domain and we still force the target features to follow the source distribution, the lion will simply be treated as some other animal, which is clearly unreasonable. A CVPR 2019 paper addresses this setting: Universal Domain Adaptation
④ When only a small amount of unlabeled data is available on the target domain, there are also solutions. See: Test-Time Training with Self-Supervision for Generalization under Distribution Shifts
⑤ When the target domain is completely unknown, the task is no longer called domain adaptation but domain generalization. The goal is for the model to generalize well to unseen scenes. See: Domain Generalization with Adversarial Feature Learning. A related situation is that only a small amount of data is available on the source domain, yet the model must be applied to a variety of target domains. See: Learning to Learn Single Domain Generalization
2 Domain Adaptation in Computer Vision
Domain adaptation is widely used in computer vision, for example in autonomous driving, where there can be a large gap between the images in a dataset and real conditions. In the picture below, for example, the actual scene may be in heavy fog, rain, heavy snow, or other weather. Can object detection, semantic segmentation, and object classification still work well in such an environment?
Here are a few ways to implement domain adaptation:
- Data Generation and Continuous Domains
Augmenting datasets can solve such problems, but collecting (and labeling) a large amount of data in various environments is expensive and usually not feasible. Therefore, the usual method is to simulate the data of various conditions and generate a large amount of data to expand the data set.
Generate foggy scenes with different visibility through depth estimation: Curriculum Model Adaptation with Synthetic and Real Data for Semantic Foggy Scene Understanding
Use generative models (such as CycleGAN) to generate nighttime scenes: Map-Guided Curriculum Domain Adaptation and Uncertainty-Aware Evaluation for Semantic Nighttime Image Segmentation
Domain flow is also used to realize the transfer across continuous intermediate domains:
Establish a mapping between the generated image and the original image:
If the data cannot be collected, generate it instead: DLOW: Domain Flow and Applications
- Test-time domain adaptation, which lets the model learn and adapt on its own in the actual scenario
- Domain Adaptation for Multi-Task Learning
Using self-supervised depth estimation to aid semantic segmentation: Three Ways To Improve Semantic Segmentation With Self-Supervised Depth Estimation. Labeling for semantic segmentation is very expensive, so a self-supervised depth estimation task is used to assist it.
Domain Adaptive Semantic Segmentation with Self-Supervised Depth Estimation: this paper uses self-supervised depth estimation to assist semantic segmentation. The correlation between the tasks in the source domain can be transferred to the target domain, and the difference in depth estimation difficulty between the source and target domains can be transferred to semantic segmentation.
For the case of multiple mixed target domains, there is also corresponding work: Cluster, Split, Fuse, and Update: Meta-Learning for Open Compound Domain Adaptive Semantic Segmentation
- Multi-Sensor Fusion
This is not covered here; it mainly concerns the device side.
3 Domain Adaptation in Graph Neural Networks (Understanding of Five Important Papers)
3.1 DANE: Domain Adaptive Network Embedding (IJCAI '19)
Link: DANE: Domain Adaptive Network Embedding
- Features: embedding domain adaptation, graph neural networks, adversarial learning
- Motivation: Previous work only learns embeddings for a single network, so the embeddings cannot be transferred across networks. The authors therefore design an embedding algorithm that lets a downstream model trained on one network be transferred to other networks. This is domain adaptation.
- Contributions: ① the earliest work to propose domain adaptation across multiple graph networks; ② two alignment methods: feature space alignment, where structurally similar nodes have similar representation vectors even if they come from different networks, achieved by computing the node representations with two parameter-sharing networks; and distribution alignment, where the distributions of the node representations are regularized by adversarial learning; ③ two datasets are constructed.
- Method: The method has two parts:
- GCN with shared weights
Two identical GCNs with shared parameters produce the embeddings of the two networks:
To learn parameters that suit both the source domain and the target domain, a multi-task loss function is used to preserve the properties of both networks:
- Adversarial Learning Regularization
As in a GAN, a discriminator is trained to identify which network (source or target domain) an embedding comes from, and the weight-sharing GCN is trained as the generator. The discriminator has the following loss:
To achieve two-way domain adaptation, the following adversarial training loss is designed:
Finally, the losses from the two parts are combined:
3.2 GCAN: Graph Convolutional Adversarial Network for Unsupervised Domain Adaptation (CVPR '19)
Link: GCAN: Graph Convolutional Adversarial Network for Unsupervised Domain Adaptation
- Features: graph neural networks, unsupervised learning, domain adaptation, adversarial networks
- Motivation: The authors argue that three kinds of information matter most for bridging the source and target domains in domain adaptation: data structure (which reflects the inherent properties of the dataset, including marginal or conditional data distributions, data statistics, geometric data structure, etc.), domain labels (used in adversarial domain adaptation methods, where they help train a domain classifier to model the global distributions of the source and target domains), and class labels (especially target pseudo-labels, typically used to enforce semantic alignment, which ensures that samples from different domains with the same class label are mapped close together in the feature space).
- Contributions: ① the first joint modeling of the three kinds of information (data structure, domain labels, class labels) in unsupervised domain adaptation; ② to match the source and target distributions more robustly, three alignment mechanisms (Structure-Aware Alignment, Domain Alignment, Class Center Alignment) are proposed to learn domain-invariant and semantic representations effectively and reduce the domain discrepancy.
- Method:
- Optimization objective
- Domain Alignment
A domain classifier is added (a binary classifier that decides whether an extracted feature comes from the source or the target domain), and the domain adversarial similarity loss is used as the loss function:
The domain classifier is trained to discriminate the source of each feature and thereby models the global distributions of the source and target domains, while the feature extractor is trained adversarially to deceive it.
- Structure-Aware Alignment
Domain alignment enforces the alignment of global domain information but ignores the structural information of the samples in a mini-batch. First, a Data Structure Analyzer (DSA) network generates structure scores for a mini-batch of samples. Then a densely connected instance graph is constructed from these structure scores and the CNN features learned for the samples. Finally, a GCN operates on the instance graph to learn GCN features that encode the data structure information. The densely connected instance graph is built as follows:
The node features of the graph are the features extracted by the CNN:
To construct the adjacency matrix, the mini-batch of samples is fed into the DSA network to generate structure scores, from which the adjacency matrix is built:
A triplet loss constrains the generation of the structure scores:
- Class Center Alignment
Domain invariance and structure consistency of the features do not imply that the features are discriminative, so class centroid alignment is proposed.
First, a target classifier assigns pseudo-labels to obtain pseudo-labeled data on the target domain. Then the labeled source samples and the pseudo-labeled target samples are used to compute the centroid of each class.
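The centroid computation can be sketched as follows. This is a simplified illustration under stated assumptions, not GCAN's exact loss: the helper names `class_centroids` and `centroid_alignment_loss` are hypothetical, features are plain lists of floats, the distance is a squared Euclidean distance, and the centroids are recomputed from scratch for each batch.

```python
def class_centroids(features, labels):
    """Mean feature vector per class label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def centroid_alignment_loss(src_feats, src_labels, tgt_feats, tgt_pseudo_labels):
    """Sum of squared distances between same-class source and target centroids."""
    src_c = class_centroids(src_feats, src_labels)
    tgt_c = class_centroids(tgt_feats, tgt_pseudo_labels)
    shared = src_c.keys() & tgt_c.keys()  # classes present in both domains
    return sum(
        sum((a - b) ** 2 for a, b in zip(src_c[y], tgt_c[y])) for y in shared
    )

loss = centroid_alignment_loss(
    [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]], [0, 0, 1],  # labeled source samples
    [[1.0, 0.0], [0.0, 3.0]], [0, 1],                 # pseudo-labeled target samples
)
```

Minimizing such a loss pulls the centroids of the same class together across domains, which is what makes the aligned features discriminative.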
3.3 Unsupervised Domain Adaptive Graph Convolutional Networks (WWW '20)
Link: Unsupervised Domain Adaptive Graph Convolutional Networks
- Features: unsupervised learning, graph neural networks, domain adaptation
- Motivation: Most GCNs work only within a single domain (graph) and cannot transfer knowledge from other domains (graphs), because of the challenges in graph representation learning and in domain adaptation on graph structures.
- Contributions: ① a new unsupervised graph domain adaptation problem is formulated and a dual graph convolutional network algorithm is proposed for it; ② local and global consistency are combined with an attention mechanism to learn effective node embeddings across networks; ③ by using source and target information with different loss functions, domain-invariant and semantic representations are learned effectively, which reduces the domain discrepancy in cross-domain node classification.
- Methods: For effective graph representation learning, a dual graph convolutional network component is first developed, which jointly exploits local and global consistency for feature aggregation. An attention mechanism then produces a unified representation for each node from the different graphs. To facilitate knowledge transfer between graphs, a domain-adaptive learning module optimizes three loss functions (source classifier loss, domain classifier loss, target classifier loss), so that the model can distinguish, respectively, class labels in the source domain, samples from different domains, and class labels in the target domain.
- Node representation learning
A dual graph convolutional network captures the local and global consistency relations of each graph. For local consistency, the adjacency matrix A is fed directly to a GCN; for global consistency, another convolution based on random walks is used.
Local consistency: a plain GCN that takes A and X as input
Global consistency: a convolution based on the PPMI (positive pointwise mutual information) matrix is introduced to encode global information. A frequency matrix F is first computed by random walks (random walks can capture the semantic similarity between nodes), and the PPMI matrix is then derived from F.
The only difference between the global and local branches is therefore the "adjacency matrix" fed to the GCN.
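To make the PPMI step concrete, here is a minimal sketch that turns a frequency matrix F into a PPMI matrix using the standard definition max(log(P(i,j) / (P(i) P(j))), 0). The function name `ppmi` is chosen here, F is assumed to be a non-empty nested list of non-negative co-occurrence counts already collected by random walks, and the walk itself is not shown.

```python
import math

def ppmi(freq):
    """Positive pointwise mutual information matrix from co-occurrence counts."""
    total = sum(sum(row) for row in freq)
    row_sums = [sum(row) for row in freq]
    col_sums = [sum(col) for col in zip(*freq)]
    result = []
    for i, row in enumerate(freq):
        out_row = []
        for j, f in enumerate(row):
            if f == 0:
                out_row.append(0.0)  # never co-occurred: PPMI is defined as 0
            else:
                # log( P(i,j) / (P(i) * P(j)) ) with all terms scaled by `total`
                pmi = math.log(f * total / (row_sums[i] * col_sums[j]))
                out_row.append(max(pmi, 0.0))
        result.append(out_row)
    return result

# Two nodes that only ever co-occur with themselves.
P = ppmi([[4, 0], [0, 4]])
```

The resulting matrix takes the place of the adjacency matrix in the global-consistency GCN branch.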
- Inter-graph attention
The embeddings from the different graphs need to be aggregated into a unified representation. An attention mechanism automatically determines the weights of the representations produced by the local and global GCN layers, for the source graph and the target graph respectively.
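The aggregation can be pictured as a softmax-weighted sum of the two embeddings. The sketch below is only an illustration of that idea: how the attention scores are computed is specific to the paper and is not reproduced, so `score_local` and `score_global` are assumed to be given, and the names `softmax` and `fuse` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def fuse(z_local, z_global, score_local, score_global):
    """Attention-weighted sum of the local and global embeddings of one node."""
    w_local, w_global = softmax([score_local, score_global])
    return [w_local * a + w_global * b for a, b in zip(z_local, z_global)]

# Equal scores give equal weights, so the result is the plain average.
unified = fuse([1.0, 3.0], [3.0, 1.0], score_local=0.0, score_global=0.0)
```

With unequal scores the unified representation leans toward whichever branch receives the higher score.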
- Domain Adaptive Learning in Cross-Domain Node Classification
Three classifiers are trained: ① source classifier; ② domain classifier; ③ target classifier
Overall training objective:
The source classifier loss is the cross-entropy loss:
As in the previous work, the domain classifier relies on adversarial learning: the classifier tries to tell the two domains apart, while the features generated from the two domains are made similar so that the classifier finds them hard to distinguish.
The target classifier loss is an entropy loss (cross-entropy cannot be used because the target domain has no labels):
3.4 Adaptive Trajectory Prediction via Transferable GNN
Link: Adaptive Trajectory Prediction via Transferable GNN
- Features: graph neural networks, pedestrian trajectory prediction, domain adaptation
- Motivation: Most existing work assumes that motion in the training set and the test set follows the same pattern and ignores the underlying distribution differences. A domain alignment framework is therefore proposed to achieve domain adaptation.
- Contributions: ① an in-depth study of the domain shift problem across different trajectory domains, and a unified T-GNN method that jointly predicts future trajectories and adaptively learns domain-invariant knowledge; ② a specially designed graph neural network that extracts comprehensive spatio-temporal feature representations, together with an effective attention-based adaptive knowledge learning module that explores fine-grained, individual-level transferable feature representations; ③ a new problem setting for pedestrian trajectory prediction, in which the various cross-domain transfer settings establish a strong baseline for pedestrian trajectory prediction.
- Methods: A domain-invariant GNN is proposed to explore structural motion knowledge while reducing domain-specific knowledge, and an attention-based adaptive knowledge learning module is proposed to explore fine-grained, individual-level feature representations for knowledge transfer.
- Spatio-temporal feature representation
In simple terms, this is a three-layer GCN:
- Attention-Based Adaptive Learning
This part addresses the gap between the source and target domains. Although a feature vector preserves the spatio-temporal information of a pedestrian, we cannot be sure how representative that pedestrian's feature vector is of its domain. An attention module is therefore introduced to learn the relative relevance between feature vectors and trajectory domains.
- Temporal Prediction Module
Put simply, a temporal convolutional network (TCN) is used to predict along the time dimension:
- Training objective
For trajectory prediction, the negative log-likelihood is used:
The final training objective combines the alignment loss above with the prediction loss:
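The notes above do not say which distribution the likelihood is taken over. As an assumption, the sketch below uses a bivariate Gaussian over the predicted (x, y) position, a common choice in pedestrian trajectory prediction; the function name and parameter names are hypothetical.

```python
import math

def bivariate_gaussian_nll(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Negative log-likelihood of (x, y) under a bivariate Gaussian.

    Assumes sigma_x > 0, sigma_y > 0 and -1 < rho < 1.
    """
    dx = (x - mu_x) / sigma_x
    dy = (y - mu_y) / sigma_y
    one_minus_rho2 = 1.0 - rho * rho
    quad = (dx * dx + dy * dy - 2.0 * rho * dx * dy) / (2.0 * one_minus_rho2)
    log_norm = math.log(
        2.0 * math.pi * sigma_x * sigma_y * math.sqrt(one_minus_rho2)
    )
    return quad + log_norm

# A ground-truth point at the predicted mean versus one unit away from it.
nll_at_mean = bivariate_gaussian_nll(0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0)
nll_far = bivariate_gaussian_nll(1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0)
```

Summing this value over pedestrians and future time steps gives a prediction loss that can be combined with the alignment loss through a weighting factor.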
3.5 Curriculum Graph Co-Teaching for Multi-Target Domain Adaptation (CVPR '21)
- Features: single source domain with multiple target domains, curriculum learning, pseudo-label generation
- Motivation: Address multi-target domain adaptation (MTDA), where a model trained on one source domain must adapt to several target domains.
- Contributions: ① Curriculum Graph Co-Teaching (CGCT) is proposed for MTDA; it uses a co-teaching strategy with dual classifier heads together with curriculum learning to learn more robust representations across multiple target domains; ② to make better use of domain labels, a Domain-aware Curriculum Learning (DCL) strategy is proposed to smooth the feature alignment process.
- Methods: The multi-target problem is tackled from two perspectives, feature aggregation and curriculum learning. Curriculum Graph Co-Teaching uses dual classifier heads, one of which is a GCN that aggregates features from similar samples across domains. To prevent the classifiers from overfitting to their own noisy pseudo-labels, a co-teaching strategy between the two heads, supplemented by curriculum learning, is used to obtain more reliable pseudo-labels. Additionally, when domain labels are available, Domain-aware Curriculum Learning (DCL) is proposed: a sequential adaptation strategy that first adapts to the easier target domains and then to the harder ones.
- (a) Curriculum Graph Co-Teaching
STEP 1: Domain Adaptation
$f_{edge}$ is used to generate the adjacency matrix. Its supervision comes from the MLP, which labels the edges between nodes (if the labels of two nodes agree, the similarity between them is 1, otherwise 0):
The loss on the adjacency matrix is then:
The loss of GCN and MLP on the source domain:
The final optimization goal is:
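The edge supervision in STEP 1 can be sketched as follows. The target adjacency matrix follows the rule stated above (1 if two nodes share a label, 0 otherwise); using binary cross-entropy as the edge loss is an assumption made for this sketch, and the names `edge_targets` and `edge_bce` are hypothetical.

```python
import math

def edge_targets(labels):
    """Target adjacency matrix: 1 where two nodes share a label, else 0."""
    return [[1 if a == b else 0 for b in labels] for a in labels]

def edge_bce(pred_adj, target_adj, eps=1e-12):
    """Mean binary cross-entropy between predicted and target edge scores."""
    total, count = 0.0, 0
    for pred_row, tgt_row in zip(pred_adj, target_adj):
        for p, t in zip(pred_row, tgt_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += -math.log(p) if t == 1 else -math.log(1.0 - p)
            count += 1
    return total / count

targets = edge_targets([0, 1, 0])
uninformed = edge_bce([[0.5] * 3 for _ in range(3)], targets)
good = edge_bce([[0.9, 0.1, 0.9], [0.1, 0.9, 0.1], [0.9, 0.1, 0.9]], targets)
```

The labels passed to `edge_targets` would be ground-truth labels for source samples and predicted labels for target samples.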
STEP 2: Pseudo-label annotation
The GCN is used to label the unlabeled data; samples whose prediction confidence falls below a threshold do not take part in training. Why use the GCN output for labeling? According to the authors, because the GCN aggregates features, it is more robust than the MLP. The data then becomes:
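A minimal sketch of the confidence filter is shown below. The function name `select_pseudo_labels` and the threshold value 0.7 are assumptions made for illustration; `probs` stands for the per-sample class probabilities output by the GCN head.

```python
def select_pseudo_labels(probs, threshold=0.7):
    """Keep (sample index, predicted class) for confident predictions only."""
    selected = []
    for idx, p in enumerate(probs):
        confidence = max(p)
        if confidence >= threshold:
            selected.append((idx, p.index(confidence)))
    return selected

gcn_probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]
kept = select_pseudo_labels(gcn_probs)  # the second sample is too uncertain
```

Samples that are filtered out stay in the unlabeled pool and can be reconsidered in a later round.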
- (b) Domain-aware Curriculum Learning
Here the authors consider the case where the domain labels of the target data are available. The degree of distribution shift from the source domain differs across target domains, so the difficulty of adaptation differs as well. The Easy-to-Hard Domain Selection (EHDS) strategy is adopted: adapt to the easy domains first and to the hard ones later, since adapting to a domain with a smaller shift is easier for the model than adapting to one with a larger shift.
How is the easiness of a domain measured? The authors use information entropy:
STEP 1: Domain selection
Select the easiest remaining domain according to the entropy above:
STEP 2: Domain Adaptation
Same as in Curriculum Graph Co-Teaching, except that the target data is the data of the selected domain.
STEP 3: Pseudo-label annotation
Same as in Curriculum Graph Co-Teaching, except that the target data is the data of the selected domain.
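The domain selection step can be sketched as picking the target domain whose predictions have the lowest mean entropy. This is an illustration under assumptions: the function names are hypothetical, the domain names are made up, and each domain is represented by a list of predicted class distributions from the current model.

```python
import math

def mean_entropy(prob_rows):
    """Average Shannon entropy of a list of predicted class distributions."""
    entropies = [-sum(p * math.log(p) for p in row if p > 0) for row in prob_rows]
    return sum(entropies) / len(entropies)

def pick_easiest_domain(domain_probs):
    """Name of the domain whose predictions have the lowest mean entropy."""
    return min(domain_probs, key=lambda name: mean_entropy(domain_probs[name]))

remaining = {
    "sketch": [[0.5, 0.5], [0.6, 0.4]],      # uncertain predictions: harder
    "photo": [[0.95, 0.05], [0.1, 0.9]],     # confident predictions: easier
}
easiest = pick_easiest_domain(remaining)
```

After adapting to the selected domain, it is removed from `remaining` and the selection is repeated until every target domain has been visited.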
4 Conclusion
- Of the five works above, the first learns a transferable embedding representation, while the second, third, fourth (pedestrian trajectory prediction), and fifth (multi-target domain adaptation) all adapt a model so that it transfers to different domains. GNNs have further applications in domain adaptation. For example, Graph-Relational Domain Adaptation (ICLR '22) argues that the relationships between different domains are not all equivalent and have a topological structure, so it treats the relationships between domains as a graph. Unsupervised Multi-Source Domain Adaptation for Person Re-Identification (CVPR '21) introduces multi-source domain adaptation to person re-identification.
- I think domain adaptation is a very promising direction, because in most practical scenarios the data differs from the training data or is even completely unknown. How to get the most value out of a model in real scenarios is the problem that domain adaptation must keep working on.
- AutoEval also tackles the problem of unknown test data, but from a different angle: it uses "estimation" to assess how well a model performs in an unknown environment, which is a kind of "self-evaluation of learning ability" paradigm.