[Gait recognition] MvGGAN: study notes on the multi-view gait generative adversarial network from "Multi-View Gait Image Generation for Cross-View Gait Recognition"

1. Papers & Code Sources

"Multi-View Gait Image Generation for Cross-View Gait Recognition"
Paper address: https://ieeexplore.ieee.org/document/9349211/
Code download address: the author did not provide source code.

2. Highlights of the paper

In reality, it is difficult to capture gait data from every viewpoint. Cross-view gait recognition identifies gait data under unknown views based on gait data from known views.
There are two main approaches to cross-view gait recognition: the view transformation model (VTM) and view-invariant feature extraction.
The view transformation model is based on Singular Value Decomposition (SVD); for background, please refer to the Knowledge Supplement.

  1. Gait images from different viewpoints are generated automatically by a single generator. Different subjects and different datasets can be used jointly for feature extraction, so the single generator learns richer gait information from them and increases the diversity of the fake samples.
  2. Domain alignment based on projected Maximum Mean Discrepancy (MMD) is performed between real and fake gait samples to reduce the domain shift introduced by the fake-sample generation process.
  3. Fake samples generated from the same or different datasets are added to extend the dataset and improve the generalization ability of the gait classification model.
[Figure: MvGGAN]

3. Multi-view Generative Adversarial Network

3.1 Network generation process

When the generator takes an image as input, it can transfer that image into the domain defined by the target image, realizing a domain transfer task. Given paired samples, domain transfer can be implemented by supervised learning (e.g. the pix2pix method).
(Question 1) But gait images are discrete, and unbiased alignment across different views is hard to achieve, so unpaired, unsupervised GANs are more suitable.
For a given pair of views, the mapping $\mathbf U \to \mathbf V$ requires building $G_A, D_A$, and $\mathbf V \to \mathbf U$ requires building $G_B, D_B$; assuming $k$ views, $k(k-1)$ GANs need to be trained.
(This counting is not made entirely clear in the paper.)
(Question 2) When the amount of data is small, it is difficult to train multiple GANs without overfitting, so the author proposes training a single conditional generator $G$, conditioned on the target view label, to control the view of the generated gait images and realize the multi-view mapping. This leads to the method below; the working principle of MvGGAN is as follows:
$$\begin{cases} \text{Given an input image sequence } X=\{x_1, x_2, ..., x_N\} \text{ and a target view label } v_t \\ \text{Obtain the output image sequence } Y=\{y_1, y_2, ..., y_N\} \text{ from } G(X, v_t)\to Y \\ \text{The discriminator predicts whether the view and identity of the output images are real or fake} \end{cases}$$
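Since no source code was released, here is a minimal sketch of how such view conditioning is commonly implemented in StarGAN-style models (an assumption, not the authors' code): the one-hot target-view label is tiled over the spatial dimensions and concatenated to the input channels.

```python
import torch

def condition_on_view(x, v_t, num_views):
    """Tile a one-hot target-view label over H x W and concatenate it to the
    image channels, StarGAN-style (an assumed implementation detail)."""
    # x: (B, C, H, W) gait images; v_t: (B,) integer view labels
    onehot = torch.eye(num_views, device=x.device)[v_t]               # (B, k)
    maps = onehot[:, :, None, None].expand(-1, -1, x.size(2), x.size(3))
    return torch.cat([x, maps], dim=1)                                # (B, C+k, H, W)
```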
MvGGAN is not simply StarGAN; the difference is:

StarGAN: converts images from one domain to another without preserving identity information.
MvGGAN: generates diverse multi-domain fake samples to be used as training data, so identity information must be retained.

3.2 Network structure

[Figure: overall framework of MvGGAN]
The angles in the figure are for illustration only. The generator takes two initial inputs, the original gait image (at $0°$) and the target gait angle ($90°$), and from them outputs a fake image at the set angle of $90°$. It then takes two further inputs, the original gait angle ($0°$) and the fake image ($90°$), and from them outputs a reconstructed image at the set angle of $0°$ (a fake of a fake).
The real and fake images, all at $90°$, are fed into the discriminator and the identification discriminator, which compute the adversarial loss, the view classification loss, and the identification loss respectively.
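To make the two passes concrete, a hypothetical helper (reusing condition_on_view from the sketch above) might look like:

```python
def generate_and_reconstruct(G, x_real, v_src, v_tgt, num_views):
    """The two generator passes in the figure (a sketch, not released code)."""
    x_fake = G(condition_on_view(x_real, v_tgt, num_views))  # e.g. 0° -> fake 90°
    x_rec = G(condition_on_view(x_fake, v_src, num_views))   # 90° -> reconstructed 0°
    return x_fake, x_rec
```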

3.3 Loss function

3.3.1 Adversarial Loss (Discriminator Loss)

We need to judge the probability that the image output by the generator is real or fake: the discriminator tries to tell fake images apart, while the generator tries to "deceive" the discriminator. The adversarial loss on the discriminator is therefore:
$$L_{adv} = \mathbb E_X[\log D(X)] + \mathbb E_{X,v_t}[\log (1-D(G(X, v_t)))] \qquad (1)$$
where $G(X, v_t)$ denotes the generator's output for the input image sequence $X$ conditioned on view $v_t$.
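Eq. (1) transcribes directly into code; a minimal sketch of the vanilla log-loss form (note that Section 3.5.2 says the actual training swaps in a WGAN objective for stability):

```python
import torch

def adversarial_loss(D, x_real, x_fake, eps=1e-8):
    """Eq. (1): D ascends this quantity while G descends its second term
    (a sketch; D is assumed to output probabilities in (0, 1))."""
    return (torch.log(D(x_real) + eps).mean()
            + torch.log(1 - D(x_fake) + eps).mean())
```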

3.3.2 View Classification Loss

To generate gait images under a specific view $v_t$, the generated gait samples need to be classified into the corresponding view $v_t$ by a view classification network. Suppose there are $k$ views; a classifier with $k$ output nodes is added on top of the discriminator $D$ to classify views. Denote the view classification network by $D_{view}$. First $D_{view}$ itself is optimized, taking real gait images as input and the corresponding true view labels as targets; the loss function of this optimization is:
$$L_{view}^{real} = \mathbb E_{X,v_t} [-\log D_{view}(v_t|X)]$$
where $D_{view}(v_t|X)$ denotes the likelihood (probability value) that the input image $X$ belongs to view $v_t$. (Written as a negative log-likelihood, consistent with Eq. (2), so that minimizing it trains the classifier.)
The same network is also used to make the fake images generated by $G$ be classified correctly into the target view $v_t$, so $G$ needs to minimize the following loss:
$$L_{view}^{fake} = \mathbb E_{X,v_t} [-\log D_{view}(v_t|Y)] = \mathbb E_{X,v_t} [-\log D_{view}(v_t|G(X,v_t))] \qquad (2)$$
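Both terms are plain negative log-likelihoods, so they reduce to a cross-entropy; a sketch:

```python
import torch.nn.functional as F

def view_classification_losses(D_view, x_real, x_fake, v_t):
    """-log D_view(v_t|.) is exactly what cross_entropy computes over the
    k view logits (a sketch; D_view returns logits of shape (B, k))."""
    loss_real = F.cross_entropy(D_view(x_real), v_t)   # optimizes D_view
    loss_fake = F.cross_entropy(D_view(x_fake), v_t)   # Eq. (2), optimizes G
    return loss_real, loss_fake
```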

3.3.3 Cycle Consistency Loss

The first two losses aim to generate gait images that look realistic and belong to the target view, but besides the change of view there may be other changes (such as clothing or carried objects) that make the generated results unmanageable. To change only the single element of view while preserving the other gait information of the input samples, a cycle consistency loss is introduced:
$$L_{rec} = \mathbb E_{X,v,v_t}[\|X - G(G(X,v_t), v)\|_1] \qquad (3)$$
where the generator $G$ is used twice: once as $G(X, v_t)$ to generate the (fake) gait image under view $v_t$, and once as $G(G(X, v_t), v)$ to reconstruct the original input sequence under view $v$.
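Eq. (3) is a mean L1 distance between the input and its reconstruction, e.g.:

```python
import torch.nn.functional as F

def cycle_loss(x_real, x_rec):
    """Eq. (3): L1 distance between the input and its two-pass reconstruction."""
    return F.l1_loss(x_rec, x_real)
```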

3.3.4 Identification Loss

The reconstruction loss tries to preserve the structural and view information of each gait image in the input sequence, but it processes each gait image separately and does not consider the relationship between different frames. In reality gait is dynamic information composed of a series of frames, and computing the loss frame by frame may lead to subject recognition errors. Therefore an identification discriminator $D_{id}$ is designed. It takes the original gait sequence and the gait sequence output by the generator as a training data pair and outputs the probability that the pair comes from the same person: 1 if from the same subject, 0 otherwise. The loss function is:
$$L_{id} = \mathbb E_X[\log D_{id}(X)] + \mathbb E_{X,v_t}[\log (1-D_{id}(G(X, v_t)))] \qquad (4)$$
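Eq. (4) writes $D_{id}$ with single arguments, while the text describes scoring (original, generated) pairs; the sketch below follows the pair description and is purely an assumed implementation (the actual structure is the LB network cited next):

```python
import torch

def identification_loss(D_id, x_real, x_fake, eps=1e-8):
    """Eq. (4) as a log-loss over channel-concatenated sequence pairs
    (a sketch; D_id outputs the probability of 'same subject')."""
    real_pair = torch.cat([x_real, x_real], dim=1)    # same subject -> target 1
    fake_pair = torch.cat([x_real, x_fake], dim=1)    # generated pair -> target 0
    # (when updating only D_id, pass x_fake.detach())
    return (torch.log(D_id(real_pair) + eps).mean()
            + torch.log(1 - D_id(fake_pair) + eps).mean())
```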
To make full use of both dynamic and static gait information, the paper adopts the LB network structure for the identification discriminator.

3.3.5 Full Objective Loss Function (Full Objective)

Combining the above four loss functions gives the final objective:
$$L(G, D) = L_{adv} + \lambda_{view}L_{view}^{real} + \lambda_{view}L_{view}^{fake} + \lambda_{rec}L_{rec} + \lambda_{id}L_{id} \qquad (5)$$
where $G$ tries to minimize this objective while $D$ tries to maximize it, and each $\lambda$ is a hyperparameter controlling the relative weight of the corresponding loss term during optimization.
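Assembling Eq. (5) from the pieces sketched above, with the weights reported later in Section 3.5.2:

```python
def full_objective(loss_adv, loss_view_real, loss_view_fake, loss_rec, loss_id,
                   lambda_view=1.0, lambda_rec=10.0, lambda_id=5.0):
    """Eq. (5); G descends this objective, D ascends its adversarial terms."""
    return (loss_adv + lambda_view * (loss_view_real + loss_view_fake)
            + lambda_rec * loss_rec + lambda_id * loss_id)
```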

3.4 Analysis of other details

3.4.1 Multi-view generation across datasets

The target view of the generated gait image is controlled through the view label vector $v_t$. If $v_t$ is composed from dataset $A$, which contains $k_1$ views, then gait samples can be generated under $k_1$ different views. If dataset $B$, containing $k_2$ views, is additionally combined into $v_t$, then MvGGAN can generate gait samples under $k_1+k_2$ different views. In this way both dataset $A$ and dataset $B$ cover $k_1+k_2$ views, and both datasets are extended to include more samples.
To achieve this, the author follows the StarGAN approach and defines a unified label vector $\tilde v$:
$$\tilde v = [v_1, ..., v_n, m] \qquad (6)$$
where $v_i$ denotes the label vector of the $i$-th dataset and $m$ is an $n$-dimensional one-hot vector; if the $i$-th element of $m$ is 1, the target view of the generated gait image belongs to the $i$-th dataset.
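A sketch of building Eq. (6) for two datasets; the view counts (11 for CASIA-B, 14 for OU-MVLP) match the datasets discussed later, but the index ordering is a hypothetical choice:

```python
import numpy as np

def unified_label(view_idx, dataset_idx, views_per_dataset):
    """Eq. (6): concatenate per-dataset one-hot view labels with a one-hot
    dataset mask m, StarGAN-style (a sketch)."""
    parts = []
    for d, k in enumerate(views_per_dataset):
        v = np.zeros(k)
        if d == dataset_idx:              # only the active dataset's label is set
            v[view_idx] = 1.0
        parts.append(v)
    m = np.eye(len(views_per_dataset))[dataset_idx]   # dataset mask vector
    return np.concatenate(parts + [m])

# e.g. requesting the 6th view of dataset 0 (CASIA-B) out of [11, 14] views:
v_tilde = unified_label(view_idx=5, dataset_idx=0, views_per_dataset=[11, 14])
```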

3.4.2 Multi-dataset training

In gait recognition, besides the view difference, clothing and carried objects are two other very important factors, so the author treats gait samples with the same clothing condition or the same carried object as one "domain". As above, a dataset with $k$ views has a $k$-dimensional one-hot view label; the label representing the clothing or carrying attribute is now concatenated with it to form a unified label vector $u = [k, d]$, where $d$ is a one-hot vector over the clothing or carrying conditions.
In real life the variations of clothing and carried objects are diverse and should correspond to many different domains, but this is hard to realize experimentally, so the author only considers the two typical clothing types (summer clothes and coat) and the two typical carried objects (backpack and shoulder bag) covered by the CASIA-B dataset.
During the experiments, the "nm", "bg" and "cl" types are treated as three domains.
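Continuing the sketch above, appending the condition one-hot could look like (the ordering of the three conditions is an assumption):

```python
import numpy as np

conditions = ["nm", "bg", "cl"]                    # the three domains named above
d = np.eye(len(conditions))[conditions.index("bg")]
u = np.concatenate([v_tilde, d])   # u = [k, d], using the view label sketched earlier
```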

3.4.3 Domain adaptation between real and fake gait images

Since a single generator is used to generate fake gait images under different walking conditions (and even different datasets), in theory there is a distribution difference between real and fake gait images. The author therefore uses t-SNE to visualize the distributions of real and fake gait images, as shown in the figure below:
[Figure: t-SNE visualization of real and fake gait features]
Labels 0-4 are fake gait images and labels 5-9 are real gait images. In the left figure the fake gait images cover 22 different views from the CASIA-B and OU-MVLP datasets, while the real gait images cover only 11 views from CASIA-B; the right figure shows the $90°$ view only. It is not hard to see that the distributions of real and fake gait images differ noticeably.
To solve this problem, the author introduces a domain alignment method that learns a feature mapping function $F(\cdot)$ to reduce the distribution difference between real and fake gait images. Let the real gait feature set be $\mathcal D_r = \{\mathbf x_{r_i},y_{r_i}\}_{i=1}^n$ and the fake gait feature set be $\mathcal D_f = \{\mathbf x_{f_j},y_{f_j}\}_{j=n+1}^{n+m}$, with label spaces $\mathcal Y_r=\mathcal Y_f$. The distribution difference between different datasets is dominated by the marginal distribution $P$, while the sample distribution difference within the same dataset is dominated by the conditional distribution $Q$. Considering both factors, the author proposes:
$$D_F(\mathcal D_r, \mathcal D_f) = (1-\eta)D_F(P_r,P_f)+\eta \sum_{c=1}^C D_F^{(c)}(Q_r,Q_f) \qquad (7)$$
where $\eta$ is an adaptive factor balancing the marginal and conditional distributions, $c \in \{1,...,C\}$ denotes the subject ID of the gait images, $D_F(P_r,P_f)$ is the marginal distribution discrepancy, and $D_F^{(c)}(Q_r,Q_f)$ is the conditional distribution discrepancy for ID $c$.
The distribution difference between real and fake gait images can be computed by the Maximum Mean Discrepancy (MMD); the specific formula is:
$$D_F(\mathcal D_r, \mathcal D_f) = (1-\eta)\|\mathbb E[F(\mathbf x_r)]-\mathbb E[F(\mathbf x_f)]\|_{\mathcal H_K}^2 + \eta \sum_{c=1}^C\|\mathbb E[F(\mathbf x_r^{(c)})]-\mathbb E[F(\mathbf x_f^{(c)})]\|_{\mathcal H_K}^2 \qquad (8)$$
where $\mathbb E[\cdot]$ is the mean of the embedded features and $\mathcal H_K$ denotes the Reproducing Kernel Hilbert Space (RKHS). Based on the Representer Theorem, formula (8) can be expressed as
$$D_F(\mathcal D_r, \mathcal D_f) = \text{tr}(\mathbf F^T \mathbf M \mathbf F) \qquad (9)$$
where $\mathbf F \in \mathbb R^{(n+m)\times d}$ is the feature matrix formed by stacking the feature vectors of all samples; each row of the matrix is the feature vector $F(\mathbf x)$ of a real or fake sample $\mathbf x$, and $d$ is the dimension of $F(\mathbf x)$.
The MMD matrix
$$\mathbf M = (1- \eta)\mathbf M_0 +\eta \sum_{c=1}^C \mathbf M_c \qquad (10)$$
is used to compute the distribution divergence of the gait features in $\mathbf F$, where the components of formula (10) are defined as:
$$(\mathbf M_0)_{ij}= \begin{cases}\frac 1{n^2}, &\mathbf x_i,\mathbf x_j \in \mathcal D_r\\ \frac 1{m^2}, &\mathbf x_i,\mathbf x_j \in \mathcal D_f \\ -\frac 1{mn}, & \text{otherwise} \end{cases} \qquad (11)$$
$$(\mathbf M_c)_{ij}= \begin{cases} \frac 1{n_c^2}, &\mathbf x_i,\mathbf x_j \in \mathcal D_r^{(c)} \\ \frac 1{m_c^2}, &\mathbf x_i,\mathbf x_j \in \mathcal D_f^{(c)} \\ -\frac 1{m_c n_c}, &\mathbf x_i \in \mathcal D_r^{(c)},\mathbf x_j \in \mathcal D_f^{(c)} \text{ or } \mathbf x_i \in \mathcal D_f^{(c)},\mathbf x_j \in \mathcal D_r^{(c)} \\ 0, & \text{otherwise} \end{cases} \qquad (12)$$
where $n_c$ and $m_c$ denote the numbers of real and fake samples of subject $c$ respectively. In the experiments, $\eta = 0.8$ is set for fake gait samples generated from the same dataset, and $\eta = 0.2$ for fake gait samples generated across datasets.
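Eqs. (10)-(12) can be assembled directly; a minimal numpy sketch, using the fact that an outer product of suitably signed weights reproduces the case analysis:

```python
import numpy as np

def mmd_matrix(labels, is_fake, eta):
    """Assemble M of Eq. (10) from Eqs. (11)-(12) via an outer-product trick:
    with s_i = 1/n for real and -1/m for fake samples, np.outer(s, s)
    reproduces the three cases of Eq. (11) (a sketch)."""
    labels = np.asarray(labels)
    fake = np.asarray(is_fake, dtype=bool)
    real = ~fake
    s0 = np.where(real, 1.0 / real.sum(), -1.0 / fake.sum())
    M = (1.0 - eta) * np.outer(s0, s0)                    # M0 term, Eq. (11)
    for c in np.unique(labels):                           # Mc terms, Eq. (12)
        rc = real & (labels == c)
        fc = fake & (labels == c)
        sc = rc / max(rc.sum(), 1) - fc / max(fc.sum(), 1)
        M += eta * np.outer(sc, sc)
    return M

# Eq. (9): D_F = tr(F^T M F) with F the (n+m) x d feature matrix, e.g.
# d_f = np.trace(feats.T @ mmd_matrix(ids, fake_mask, eta=0.8) @ feats)
```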

To reduce the distribution difference between real and fake samples, the mapping function $F$ needs to minimize the objective in formula (7) during training. In this paper $F$ is built as a multi-layer fully connected network; the input gait features can be extracted from any layer of the gait classification network. The author sets the number of nodes in the middle layer to 1024: the purpose of the domain alignment network is only to make the overall distributions of real and fake images consistent, so the individual features are not otherwise constrained, and experiments found 1024 middle-layer nodes sufficient to achieve distribution alignment. More nodes would increase the computational cost and the risk of overfitting without improving performance. The aligned gait features are fed into the next layer of the gait classification network for training or testing.
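A plausible form of the mapping $F$, assuming a simple two-layer design (layer count and activation are guesses; only the 1024-node middle layer is stated in the text):

```python
import torch.nn as nn

class AlignmentNet(nn.Module):
    """Sketch of the fully connected mapping F with a 1024-node middle layer."""
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, feat_dim),  # aligned features go to the next layer
        )

    def forward(self, x):
        return self.net(x)
```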

3.5 Experiment implementation

3.5.1 Network structure

The generator follows the StarGAN model, the discriminator uses the PatchGAN model, and the identification discriminator uses the LB network from [A comprehensive study on cross-view gait based human identification with deep CNNs].

3.5.2 Training process

For the adversarial loss $L_{adv}$, the Wasserstein GAN (WGAN) formulation is used to stabilize the adversarial training process.
The hyperparameters are set to $\lambda_{view} = 1$, $\lambda_{rec} = 10$, $\lambda_{id} = 5$, and Adam is used to optimize the generator and discriminator. The batch size is 16, and the generator is updated once for every five discriminator updates. The initial learning rate is 0.0001; over the 50 epochs it decays linearly to 0.
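A training-schedule skeleton with these hyperparameters (G and D are stand-in modules; the real networks are described in 3.5.1):

```python
import torch

G, D = torch.nn.Linear(8, 8), torch.nn.Linear(8, 1)   # placeholders
g_opt = torch.optim.Adam(G.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(D.parameters(), lr=1e-4)

n_epochs, n_critic, base_lr = 50, 5, 1e-4
for epoch in range(n_epochs):
    for step in range(100):            # iterate batches of size 16 here
        pass                           # update D on every step ...
        if (step + 1) % n_critic == 0:
            pass                       # ... and G once per five D updates
    decay = 1.0 - (epoch + 1) / n_epochs
    for opt in (g_opt, d_opt):         # linear decay of the learning rate to 0
        for pg in opt.param_groups:
            pg["lr"] = base_lr * decay
```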

4. Experimental results

[Result figures 1-5: experimental result tables and plots from the paper]

5. Summary

In this paper the author proposes the multi-view gait generative adversarial network (MvGGAN) to generate fake gait samples of different views across different gait datasets. MvGGAN consists of a generator, which generates fake gait samples under several walking conditions, and discriminators, which implement the adversarial training and preserve the person's identity information. By adding the generated fake gait samples to the original gait dataset and performing domain alignment between real and fake samples, the performance of deep-learning-based gait classification networks can be significantly improved.

The paper demonstrates that adding fake samples to the original gait dataset is a feasible way to improve cross-view gait recognition performance, and also that generating fake samples for another dataset under different views or other walking conditions is possible. Gait image generation across datasets is important because it enables the gait classification network to learn gait information under as many walking conditions as possible, improving gait recognition performance in real-world scenarios (which contain a large number of unregistered gait samples).

0. Knowledge Supplement

0.1 Singular value decomposition (SVD)

Singular Value Decomposition (SVD) is an algorithm widely used in machine learning. It can be used not only for the eigendecomposition step in dimensionality reduction algorithms but also in recommendation systems and natural language processing, and it is the foundation of many machine learning algorithms.

Eigenvalues and eigenvectors
$$A w = \lambda w$$
where $A$ is an $n \times n$ matrix, $w$ is an $n$-dimensional vector (an eigenvector), and $\lambda$ is the eigenvalue corresponding to $w$.
Once the eigenvalues and eigenvectors are obtained, the matrix $A$ can be eigendecomposed:
$$A = W \Sigma W^{-1}$$
where $W$ is the $n \times n$ matrix composed of the $n$ eigenvectors $w_1, w_2, ..., w_n$, and $\Sigma$ is the diagonal matrix whose main diagonal holds the eigenvalues $\lambda_i$ corresponding to the eigenvectors $w_i$.
If $A$ is not an $n \times n$ square matrix but an $m \times n$ matrix, singular value decomposition is required instead.

SVD
$$A = U \Sigma V^T$$
where $U \in \mathbb R^{m \times m}$, $\Sigma \in \mathbb R^{m \times n}$, $V \in \mathbb R^{n \times n}$; all elements of $\Sigma$ are 0 except those on its main diagonal, which are the singular values, and $U$ and $V$ are unitary (orthogonal) matrices satisfying $U^TU=I$, $V^TV=I$.
Multiplying the transpose $A^T$ with the matrix $A$ yields an $n \times n$ square matrix; eigendecomposing this square matrix gives eigenvalues and eigenvectors satisfying:
$$(A^TA)v_i = \lambda_i v_i$$
The matrix $V$ composed of the eigenvectors $v_i$ is the $V$ in the SVD formula (the right singular vectors). Similarly,
$$(AA^T)u_i = \lambda_i u_i$$
and the matrix $U$ composed of the $u_i$ is called the left singular vectors.
Next, solve for the $\Sigma$ matrix:
$$A = U \Sigma V^T \Rightarrow AV = U \Sigma V^TV = U \Sigma \Rightarrow Av_i = \sigma_i u_i \Rightarrow \sigma_i = \frac{Av_i}{u_i}$$
From this derivation each singular value $\sigma_i$ can be obtained, and hence the $\Sigma$ matrix.
There is another way to solve for $\Sigma$, starting from the fact that the eigenvector matrix of $A^TA$ is $V$ and that of $AA^T$ is $U$. Taking $V$ as an example:
$$A = U \Sigma V^T \Rightarrow A^T = (U \Sigma V^T)^T = V \Sigma U^T \Rightarrow A^TA = (V \Sigma U^T)(U \Sigma V^T)=V \Sigma^2V^T$$
This derivation shows that the eigenvector matrix of $A^TA$ is $V$, and its eigenvalue matrix is the square of the singular value matrix, i.e.
$$\sigma_i = \sqrt{\lambda_i}$$
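The relations above are easy to verify numerically; a small numpy check:

```python
import numpy as np

# Numerically check the relations derived above on a random 4x3 matrix.
A = np.random.randn(4, 3)
U, s, Vt = np.linalg.svd(A)                        # A = U Sigma V^T
print(np.allclose(A, (U[:, :3] * s) @ Vt))         # reconstruction holds

evals, V = np.linalg.eigh(A.T @ A)                 # (A^T A) v_i = lambda_i v_i
print(np.allclose(np.sort(s**2), np.sort(evals)))  # sigma_i = sqrt(lambda_i)
```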

0.2 Generative Adversarial Network (GAN)

0.2.1 Basic concepts

A Generative Adversarial Network (GAN) is a generative model mainly composed of two parts: the discriminator and the generator.
Discriminator: mathematically expressed as $y = f(x)$ or the conditional probability distribution $p(y|x)$. When the input is a training-set image $x$, the discriminator outputs the classification label $y$, learning the mapping from input image $x$ to output label $y$.
Generator: mathematically expressed as a probability distribution $p(x)$. An unconstrained generator is an unsupervised model that maps a simple prior distribution $\pi(z)$ (usually a Gaussian) to the pixel probability distribution $p(x)$, i.e. it outputs an image obeying the distribution $p(x)$ and carrying the features of the training set.

0.2.2 Other generative networks

  1. AutoRegressive model (AR)
  2. Variational AutoEncoder (VAE)
  3. Flow-based generator model

0.2.3 Loss function

$$L = \min_G \max_D V(D,G)= \mathbb E_{x \sim p_{data}(x)}[\log D(x)]+ \mathbb E_{z \sim p_{z}(z)}[\log(1-D(G(z)))]$$
Breaking this down:
$V(D,G)$ measures the difference between generated (fake) samples and real samples;
$\max_D V(D,G)$: with the generator fixed, maximize the cross-entropy loss to update the discriminator;
$\min_G \max_D V(D,G)$: the generator minimizes the very loss that the discriminator is maximizing.

0.2.4 Training process

First train the discriminator:
assign the real label to the training-set data and the fake label to the fake images produced by the generator; the two together form a batch that is fed to the discriminator for training. When computing the loss, the discriminator's judgment of the training data is pushed toward real and of the fake images toward fake. This process updates only the discriminator's parameters.
Then train the generator:
feed Gaussian noise $z$ (random noise) into the generator, attach the real label to the fake image it generates, and send it to the discriminator. When computing the loss, the discriminator's judgment of the fake image is pushed toward real. This process updates only the generator's parameters.
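The two alternating steps in miniature (dummy one-layer networks and random "real" data, purely for illustration):

```python
import torch
import torch.nn.functional as F

z_dim, batch = 16, 8
G = torch.nn.Sequential(torch.nn.Linear(z_dim, 32), torch.nn.Tanh())
D = torch.nn.Sequential(torch.nn.Linear(32, 1), torch.nn.Sigmoid())
g_opt = torch.optim.Adam(G.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(D.parameters(), lr=2e-4)
real = torch.randn(batch, 32)                    # stand-in for training images
ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

# Discriminator step: real -> true label, generated -> fake label (D only).
z = torch.randn(batch, z_dim)
d_loss = (F.binary_cross_entropy(D(real), ones)
          + F.binary_cross_entropy(D(G(z).detach()), zeros))
d_opt.zero_grad(); d_loss.backward(); d_opt.step()

# Generator step: label its fakes as real to "deceive" D (G only).
g_loss = F.binary_cross_entropy(D(G(z)), ones)
g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```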

0.3 t-SNE

t-SNE is mainly used to visualize and explore high-dimensional data. It was developed by Laurens van der Maaten and Geoffrey Hinton and published in JMLR Volume 9 (2008). The main goal of t-SNE is to transform a multi-dimensional dataset into a low-dimensional one, and compared with other dimensionality reduction algorithms it works best for data visualization: applied to n-dimensional data, it intelligently maps the points to 3-D or even 2-D while preserving the relative similarity of the original data very well. Unlike PCA, t-SNE is not a linear dimensionality reduction technique; this nonlinearity is the main reason it can capture the complex manifold structure of high-dimensional data.
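For example, mapping features to 2-D with scikit-learn (random stand-in features):

```python
from sklearn.manifold import TSNE
import numpy as np

feats = np.random.randn(200, 64)    # stand-in for high-dimensional gait features
emb = TSNE(n_components=2, perplexity=30.0, init="pca").fit_transform(feats)
print(emb.shape)                    # (200, 2): ready for a 2-D scatter plot
```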

