[Introduction to Artificial Intelligence] Python library - dalib (domain adaptation)


1. Domain Discriminator

dalib.modules.domain_discriminator.DomainDiscriminator(in_feature: int, hidden_size: int)

  • Function: Distinguish whether the input features come from the source domain or the target domain. The source domain label is 1 and the target domain label is 0.
  • Parameters:
  • in_feature (int): dimension of the input feature;
  • hidden_size (int): dimension of the hidden-layer features.
  • Shape:
  • inputs: (minibatch, in_feature);
  • outputs: (minibatch, 1).
  • Example:
  • See DomainAdversarialLoss below for a full example; a minimal standalone sketch follows.
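The snippet below is a minimal standalone sketch of DomainDiscriminator: it builds the module and runs a random mini-batch through it to show the (minibatch, 1) output. The batch size and hidden size are arbitrary illustrative choices.

import torch
from dalib.modules.domain_discriminator import DomainDiscriminator

# maps (minibatch, in_feature) features to a domain probability in (0, 1)
discriminator = DomainDiscriminator(in_feature=1024, hidden_size=1024)

features = torch.rand(16, 1024)        # a random mini-batch of 16 feature vectors
domain_pred = discriminator(features)  # shape: (16, 1)
print(domain_pred.shape)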

2. DomainAdversarialLoss

dalib.adaptation.dann.DomainAdversarialLoss(domain_discriminator: torch.nn.modules.module.Module, reduction: Optional[str] = 'mean')

  • Definition: $Loss(D_{s},D_{t}) = \mathbb{E}_{x_{i}^{s} \sim D_{s}} \log[D(f_{i}^{s})] + \mathbb{E}_{x_{j}^{t} \sim D_{t}} \log[1 - D(f_{j}^{t})]$, where $D$ is the domain discriminator and $f$ is the feature of the corresponding domain.
  • Parameters:
  • domain_discriminator (nn.Module): domain discriminator object used to predict which domain the features come from;
  • reduction (string, optional): how to reduce the loss: 'none' outputs the per-sample losses without any reduction, while 'sum' and 'mean' return their sum and average, respectively; the default is 'mean'.
  • Inputs:
  • f_s (tensor): features $f^{s}$ of the source domain;
  • f_t (tensor): features $f^{t}$ of the target domain.
  • Shape:
  • f_s, f_t: (N, F), where F is the dimension of the input feature;
  • outputs: a scalar by default; if reduction is 'none', the output shape is (N,).
  • Example:
import torch
from dalib.modules.domain_discriminator import DomainDiscriminator
from dalib.adaptation.dann import DomainAdversarialLoss

# build a domain discriminator and wrap it in the adversarial loss
discriminator = DomainDiscriminator(in_feature=1024, hidden_size=2048)
loss = DomainAdversarialLoss(discriminator, reduction='mean')

# random source-domain and target-domain features of shape (N, F)
f_s, f_t = torch.rand(20, 1024), torch.rand(20, 1024)
output = loss(f_s, f_t)

print(output)
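To make the definition above concrete, the following sketch evaluates the negated, averaged form of its two log terms by hand, as the mean binary cross-entropy of the discriminator's predictions against domain labels 1 (source) and 0 (target). This is only an illustration of the formula; the library's DomainAdversarialLoss additionally routes the features through a gradient reversal layer internally, so the two values are not meant to be compared directly.

import torch
import torch.nn.functional as F
from dalib.modules.domain_discriminator import DomainDiscriminator

discriminator = DomainDiscriminator(in_feature=1024, hidden_size=2048)
f_s, f_t = torch.rand(20, 1024), torch.rand(20, 1024)

d_s = discriminator(f_s)  # predicted probability of "source", shape (20, 1)
d_t = discriminator(f_t)  # predicted probability of "source", shape (20, 1)

# mean binary cross-entropy with source label 1 and target label 0
manual_loss = 0.5 * (
    F.binary_cross_entropy(d_s, torch.ones_like(d_s))
    + F.binary_cross_entropy(d_t, torch.zeros_like(d_t))
)
print(manual_loss)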


3. GaussianKernel

dalib.modules.kernels.GaussianKernel(sigma: Optional[float] = None, track_running_stats: Optional[bool] = True, alpha: Optional[float] = 1.0)

  • Definition:
  • The Gaussian kernel $k$ is defined as $k(x_{1},x_{2}) = \exp\left(-\frac{\left\|x_{1}-x_{2}\right\|^{2}}{2\sigma^{2}}\right)$, where $x_{1}, x_{2} \in R^{d}$ are one-dimensional tensors.
  • The Gaussian kernel matrix $K$ is defined on $X = (x_{1}, x_{2}, \ldots, x_{m})$ as $K(X)_{i,j} = k(x_{i}, x_{j})$.
  • In practice $\sigma^{2}$ can be determined in two ways:
    the first computes it dynamically from the data as $\sigma^{2} = \frac{\alpha}{n^{2}} \sum_{i,j} \left\|x_{i}-x_{j}\right\|^{2}$;
    the second sets it directly to a given value.
  • Parameters:
  • sigma (float, optional): the value of $\sigma$; default is None;
  • track_running_stats (bool, optional): if True, $\sigma^{2}$ is computed with the formula above; if False, the fixed $\sigma^{2}$ given by sigma is used; default is True;
  • alpha (float, optional): the $\alpha$ used to compute $\sigma^{2}$ when track_running_stats is True.
  • Input:
  • X (tensor): the input batch X.
  • Shape:
  • inputs: (minibatch, F), where F is the dimension of the input feature;
  • outputs: (minibatch, minibatch).
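The original post gives no example for this class, so here is a minimal sketch: it evaluates GaussianKernel with a fixed sigma (track_running_stats=False) and checks, up to numerical tolerance, that the result matches a by-hand computation of the documented formula. The batch size, feature dimension, and sigma are arbitrary illustrative choices.

import torch
from dalib.modules.kernels import GaussianKernel

X = torch.randn(8, 256)                                       # a mini-batch of 8 feature vectors
kernel = GaussianKernel(sigma=1.0, track_running_stats=False)
K = kernel(X)                                                 # kernel matrix, shape (8, 8)

# by-hand computation: k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 * sigma^2))
sq_dist = ((X.unsqueeze(0) - X.unsqueeze(1)) ** 2).sum(dim=-1)
K_manual = torch.exp(-sq_dist / (2 * 1.0 ** 2))
print(torch.allclose(K, K_manual, atol=1e-5))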

4. Multiple Kernel Maximum Mean Discrepancy (MK-MMD)

dalib.adaptation.dan.MultipleKernelMaximumMeanDiscrepancy(kernels: Sequence[torch.nn.modules.module.Module], linear: Optional[bool] = False, quadratic_program: Optional[bool] = False)

  • MK-MMD:
  • The source domain is $D_{s} = \left\{ (x_{i}^{s}, y_{i}^{s}) \right\}_{i=1}^{n_{s}}$;
  • The target domain is $D_{t} = \left\{ x_{j}^{t} \right\}_{j=1}^{n_{t}}$;
  • The samples within each domain are independent and identically distributed;
  • MK-MMD is then computed as $d_{MK\text{-}MMD}(D_{s},D_{t}) = \left\| E_{s}[g(D_{s})] - E_{t}[g(D_{t})] \right\|^{2}_{H_{k}}$;
  • $H_{k}$ denotes the RKHS induced by a specific kernel $k$, $g(\cdot)$ is the continuous mapping associated with the kernel function, and $E[\cdot]$ is the expectation under the given distribution;
  • Note that the kernel $k$ is defined as a convex combination of $r$ different positive semi-definite kernels: $k(x^{s},x^{t}) = \sum_{i=1}^{r} \beta_{i} k_{i}(x^{s},x^{t})$;
  • where $\sum_{i}^{r} \beta_{i} = 1$ and $\beta_{i} \ge 0$;
  • Positive semi-definiteness is a common property of kernel functions (the same concept appears in SVMs). A convex combination is a linear combination $\sum_{i}^{r} \lambda_{i} x_{i}$ whose coefficients satisfy $\lambda_{i} \ge 0$ and $\sum_{i}^{r} \lambda_{i} = 1$;
  • Using the kernel trick, MK-MMD can be computed in simplified form as: $\hat{D}_{k}(D_{s},D_{t}) = \frac{1}{n_{s}^{2}} \sum_{i=1}^{n_{s}} \sum_{j=1}^{n_{s}} k(D_{s}^{i},D_{s}^{j}) + \frac{1}{n_{t}^{2}} \sum_{i=1}^{n_{t}} \sum_{j=1}^{n_{t}} k(D_{t}^{i},D_{t}^{j}) - \frac{2}{n_{s} n_{t}} \sum_{i=1}^{n_{s}} \sum_{j=1}^{n_{t}} k(D_{s}^{i},D_{t}^{j})$
  • Parameters:
  • kernels (tuple(nn.Module)): kernel functions;
  • linear (bool): whether to use the linear version of DAN; not used by default;
  • quadratic_program (bool): whether to use quadratic programming to solve for $\beta$; not used by default.
  • Inputs:
  • z_s (tensor): features $D_{s}$ obtained by mapping the source-domain samples;
  • z_t (tensor): features $D_{t}$ obtained by mapping the target-domain samples;
  • Note that both must have the same shape.
  • Shape:
  • inputs: (minibatch, *), where * is the dimension of the incoming features;
  • outputs: scalar.
  • Example:
import torch
from dalib.modules.kernels import GaussianKernel
from dalib.adaptation.dan import MultipleKernelMaximumMeanDiscrepancy

feature_dim = 1024
batch_size = 10

# a family of Gaussian kernels with different bandwidth multipliers alpha
kernels = (GaussianKernel(alpha=0.5), GaussianKernel(alpha=1.), GaussianKernel(alpha=2.))
loss = MultipleKernelMaximumMeanDiscrepancy(kernels)

# features from source domain and target domain
z_s, z_t = torch.randn(batch_size, feature_dim), torch.randn(batch_size, feature_dim)
output = loss(z_s, z_t)

print(output)
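To connect this example with the simplified formula above, here is an illustrative by-hand evaluation of the (biased) MMD estimate using a single Gaussian kernel with a fixed bandwidth. The library class combines several kernels and uses its own estimator internally, so this sketch is not meant to reproduce its exact output.

import torch

def gaussian_kernel(a, b, sigma=1.0):
    # k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * sigma^2))
    return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))

z_s, z_t = torch.randn(10, 1024), torch.randn(10, 1024)
k_ss = gaussian_kernel(z_s, z_s).mean()  # (1 / n_s^2) * sum over source-source pairs
k_tt = gaussian_kernel(z_t, z_t).mean()  # (1 / n_t^2) * sum over target-target pairs
k_st = gaussian_kernel(z_s, z_t).mean()  # (1 / (n_s * n_t)) * sum over cross pairs
mmd = k_ss + k_tt - 2 * k_st
print(mmd)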


Origin: blog.csdn.net/qq_44928822/article/details/131315718