1. Domain Adversarial Training (Domain Adaptation)

In traditional supervised learning, we usually need a large amount of labeled data for training, and we must assume that the data distributions of the training set and test set are similar. If the training and test data come from different distributions, a classifier trained on the former will not perform well on the latter. What can be done in this situation?

Domain Adaptation, of which Domain-Adversarial training is one approach, is an important branch of transfer learning that aims to eliminate differences in feature distributions between domains. The goal is to map data from the differently distributed source domain (Source Domain) and target domain (Target Domain) into the same feature space, and to find some metric under which their "distance" in this space is as small as possible. A classifier trained on the (labeled) source domain can then be used directly to classify the target-domain data.
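As one concrete example of such a metric, here is a minimal sketch of the linear-kernel Maximum Mean Discrepancy (MMD), a distance between feature distributions used in many domain adaptation methods; DANN, introduced below, instead measures domain discrepancy implicitly with an adversarial domain classifier. The function name and tensor shapes here are illustrative assumptions:

import torch

def mmd_linear(f_src, f_tgt):
    # linear-kernel Maximum Mean Discrepancy: squared distance between
    # the mean feature vectors of the two domains
    delta = f_src.mean(dim=0) - f_tgt.mean(dim=0)
    return torch.dot(delta, delta)

f_src = torch.randn(128, 64)        # hypothetical source-domain features
f_tgt = torch.randn(128, 64) + 0.5  # hypothetical target-domain features (shifted mean)
print(mmd_linear(f_src, f_tgt))     # larger value = larger domain discrepancy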

As shown in the figure above, panel (a) shows the distribution of the (labeled) source-domain samples, and panel (b) the distribution of the target-domain samples. They share a common feature space and label space, but the source and target domains usually have different distributions, which means we cannot directly use a classifier trained on the source domain to classify target-domain samples. In the domain adaptation problem, we therefore try to map the data from the two domains so that samples belonging to the same class (label) cluster together. At that point, we can use the labeled source-domain data to train a classifier that also works for the target-domain samples.

2. Introduction to DANN (Domain-Adversarial Neural Networks)
The most critical point in domain adaptation is how to mix the source-domain and target-domain samples in feature space while still keeping the classes separable. This is precisely one of the main tasks of DANN.

As shown in the figure above, the DANN structure mainly consists of 3 parts (a minimal PyTorch sketch of them follows this list):

Feature extractor (the green part of the diagram): 1) maps and mixes the source-domain and target-domain samples so that the domain discriminator cannot distinguish which domain the data comes from; 2) extracts the features the subsequent networks need to complete their tasks, so that the label predictor can recognize the classes of the source-domain data.

Label predictor: predicts the class labels of samples, trained with supervision on the labeled source-domain data.

Domain classifier (the red part of the diagram): classifies the data in the feature space, trying to separate out which domain each sample comes from.
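Here is a minimal sketch of the three components in PyTorch; the layer sizes (784-dimensional inputs, a 256-dimensional feature space, 10 classes) are illustrative assumptions, not the architecture from the paper:

import torch.nn as nn

# G_f, the feature extractor (green part): maps inputs into a shared feature space
feature_extractor = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
)

# G_y, the label predictor: classifies features into the task classes
label_predictor = nn.Linear(256, 10)

# G_d, the domain classifier (red part): source domain vs. target domain
domain_classifier = nn.Linear(256, 2)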
2.1 The overall process of DANN
The features produced by the feature extractor are passed to the domain classifier, which determines whether they come from the source domain or the target domain and computes a loss. The training goal of the domain classifier is to assign the incoming features to the correct domain category (source or target), while the training goal of the feature extractor is exactly the opposite (due to the gradient reversal layer): to produce features (a mapping result) from which the domain discriminator cannot correctly determine the domain. This forms the adversarial relationship.

The features produced by the feature extractor are also passed to the label predictor (category predictor). Because the source-domain samples are labeled, feature extraction must take into account not only confusing the domain discriminator described above, but also supervised training on the labeled source-domain samples, so that classification accuracy is preserved. A minimal sketch of this supervised branch follows.
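A hedged sketch of the supervised branch, reusing the modules sketched above; the batch size and shapes are assumptions:

import torch
import torch.nn.functional as F

x_s = torch.randn(32, 784)         # a batch of labeled source-domain samples (assumed shape)
y_s = torch.randint(0, 10, (32,))  # their class labels

# the supervised label loss is computed on source-domain data only
features_s = feature_extractor(x_s)
label_loss = F.cross_entropy(label_predictor(features_s), y_s)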

2.2 Gradient reversal layer
During backpropagation, gradient descent minimizes the objective function, but the feature extractor's task is to maximize label-classification accuracy while minimizing domain-classification accuracy, i.e., to maximize the domain discriminator's objective. A gradient reversal layer (GRL) is therefore inserted between the domain classifier and the feature extractor: the parameters of the domain classifier (the pink part) are optimized in the direction that decreases the domain loss Ld, while the feature extractor (the green part) is optimized in the direction that increases Ld. A single optimizer over the whole network thus drives two parts with opposite optimization goals, forming the adversarial relationship.

Specifically, the GRL multiplies the gradient passed back through it by a negative constant (-λ), which makes the training goals of the networks before and after the GRL opposite, achieving the adversarial effect.

 PyTorch code implementation:

import torch
from torch.autograd import Function
 
class GRL(Function):
    """Gradient reversal layer: identity in the forward pass,
    multiplies the gradient by -lambda_ in the backward pass."""
    @staticmethod
    def forward(ctx, input, lambda_):
        ctx.lambda_ = lambda_
        return input.view_as(input)

    @staticmethod
    def backward(ctx, grad_output):
        # negate the incoming gradient; lambda_ itself needs no gradient
        return grad_output.neg() * ctx.lambda_, None
 
x = torch.tensor([1., 2., 3.], requires_grad=True)
y = torch.tensor([4., 5., 6.], requires_grad=True)
 
z = torch.pow(x, 2) + torch.pow(y, 2)
f = z + x + y
 
 
s = 6 * f.sum()
s = GRL.apply(s, 1.0)  # forward is the identity; backward will negate the gradient
 
print(s)
s.backward()
print(x.grad)
print(y.grad)


result:

tensor(672., grad_fn=<GRLBackward>)
tensor([-18., -30., -42.])
tensor([-54., -66., -78.])
Written out per dimension, the computation is:

$$s = 6\sum_{i=1}^{3}\left(x_i^2 + y_i^2 + x_i + y_i\right)$$

Then the derivative with respect to $x_i$ is:

$$\frac{\partial s}{\partial x_i} = 6\,(2x_i + 1)$$

So when x = [1, 2, 3] is input, the ordinary gradient would be [18, 30, 42]; due to the GRL, the actual gradient is [-18, -30, -42].

2.3 Loss calculation
During training, the network continuously minimizes the label predictor's loss on the labeled source-domain data, and the domain discriminator continuously minimizes its classification loss on all data from both the source and target domains (while, through the GRL, the feature extractor works to maximize it).

Taking a single hidden layer as an example, the feature extractor is a single layer of neurons (multiple layers are used in complex tasks):

$$G_f(x; W, b) = \operatorname{sigm}(W x + b)$$

For the class predictor:

$$G_y(G_f(x); V, c) = \operatorname{softmax}\!\left(V G_f(x) + c\right)$$

Loss (negative log-probability of the correct class $y_i$):

$$\mathcal{L}_y\!\left(G_y(G_f(x_i)), y_i\right) = \log \frac{1}{\left[G_y(G_f(x_i))\right]_{y_i}}$$

Therefore, on the source domain, the training optimization objective is:

$$\min_{W,b,V,c}\left[\frac{1}{n}\sum_{i=1}^{n} \mathcal{L}_y^{i}(W,b,V,c) + \lambda \cdot R(W,b)\right]$$

where $R(W,b)$ is the domain regularizer built from the domain classifier below.

For the domain classifier:

$$G_d(G_f(x); u, z) = \operatorname{sigm}\!\left(u^{\top} G_f(x) + z\right)$$

Loss (logistic loss on the binary domain label $d_i$, with $d_i = 0$ for source and $d_i = 1$ for target samples):

$$\mathcal{L}_d\!\left(G_d(G_f(x_i)), d_i\right) = d_i \log \frac{1}{G_d(G_f(x_i))} + (1 - d_i)\log \frac{1}{1 - G_d(G_f(x_i))}$$

The training optimization objective is:

$$R(W,b) = \max_{u,z}\left[-\frac{1}{n}\sum_{i=1}^{n}\mathcal{L}_d^{i}(W,b,u,z) - \frac{1}{n'}\sum_{i=n+1}^{N}\mathcal{L}_d^{i}(W,b,u,z)\right]$$

The overall loss function is:

$$E(W,b,V,c,u,z) = \frac{1}{n}\sum_{i=1}^{n}\mathcal{L}_y^{i}(W,b,V,c) - \lambda\left(\frac{1}{n}\sum_{i=1}^{n}\mathcal{L}_d^{i}(W,b,u,z) + \frac{1}{n'}\sum_{i=n+1}^{N}\mathcal{L}_d^{i}(W,b,u,z)\right)$$

Among them, in each iteration, the parameters $(W, b, V, c)$ of the feature extractor and label predictor are updated by minimizing the objective function, while the parameters $(u, z)$ of the domain discriminator are updated by maximizing it:

$$(\hat W, \hat b, \hat V, \hat c) = \arg\min_{W,b,V,c} E(W,b,V,c,\hat u,\hat z), \qquad (\hat u, \hat z) = \arg\max_{u,z} E(\hat W,\hat b,\hat V,\hat c,u,z)$$
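Putting the pieces together, here is a hedged sketch of one DANN training iteration that combines the two losses, reusing the GRL and the three modules sketched earlier; the batch shapes, optimizer settings, and fixed λ are assumptions (the original paper anneals λ during training):

import torch
import torch.nn.functional as F

optimizer = torch.optim.SGD(
    list(feature_extractor.parameters()) +
    list(label_predictor.parameters()) +
    list(domain_classifier.parameters()), lr=0.01)

x_s = torch.randn(32, 784)           # labeled source batch (assumed shape)
y_s = torch.randint(0, 10, (32,))
x_t = torch.randn(32, 784)           # unlabeled target batch
lambda_ = 1.0

f_s = feature_extractor(x_s)
f_t = feature_extractor(x_t)

# label loss L_y: labeled source samples only
label_loss = F.cross_entropy(label_predictor(f_s), y_s)

# domain loss L_d: all samples; the GRL negates its gradient before it
# reaches the feature extractor, so G_f learns to confuse G_d
feats = GRL.apply(torch.cat([f_s, f_t]), lambda_)
domain_labels = torch.cat([torch.zeros(32, dtype=torch.long),
                           torch.ones(32, dtype=torch.long)])
domain_loss = F.cross_entropy(domain_classifier(feats), domain_labels)

loss = label_loss + domain_loss      # single objective; the GRL creates the opposing goals
optimizer.zero_grad()
loss.backward()
optimizer.step()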

3. Comparison with GAN


A generative adversarial network consists of a generator (Generator) and a discriminator (Discriminator). The generator is used to generate fake images, and the discriminator is used to distinguish whether an input image is real or fake. The generator hopes the images it generates can fool the discriminator (passing fakes off as real), while the discriminator hopes to improve its discrimination ability to avoid being deceived. The two play against each other until the system reaches a stable state (Nash equilibrium).

In the domain adaptation problem, there is a source domain and a target domain. Compared with a generative adversarial network, domain adaptation drops the sample-generation step and directly treats the target-domain data as the "generated" samples. The purpose of the generator thus changes: it no longer generates samples but instead acts as a feature extractor.
