Paper reading: "Learning Graph Embedding with Adversarial Training Methods"


Published in IEEE Transactions on Cybernetics, 2020.

abstract

Numerous graph embedding tasks focus on preserving graph structure or minimizing reconstruction loss on graph data. These methods ignore the embedding distribution of the latent codes, which may lead to poor representations in many cases. In this paper, an adversarially regularized framework for graph embedding is proposed. The adversarial training principle is used to make the latent codes match a prior Gaussian or uniform distribution. Based on this framework, two variants of the adversarial model are developed: ARGA (adversarially regularized graph autoencoder) and ARVGA (adversarially regularized variational graph autoencoder). Extensive experimental results demonstrate the effectiveness of the method.

Key point: Previous graph embedding methods ignore the embedding distribution of latent code, which may lead to poor representation in many cases. (Reasonable distribution needs to be considered in the process of learning graph embedding).

1. introduction

Our framework not only needs to minimize the reconstruction loss of the topology, but also needs to make the learned latent embedding match a prior distribution.

4. proposed algorithm

adversarially regularized graph autoencoder (ARGA):

It includes two parts: a graph convolutional autoencoder (whose input is the graph structure $A$ and node content $X$, used to learn a latent representation $Z$), and an adversarial regularization module (used to make the latent codes match a prior distribution through adversarial training, by discriminating whether the current $z_i \in Z$ comes from the encoder or from the prior distribution).

4.1 graph convolutional autoencoder

Two basic questions: 1) how to integrate structural information and content information in the encoder at the same time, and 2) what kind of information the decoder should reconstruct.

graph convolutional encoder model $G(X, A)$:
spectral (frequency-domain) convolution process:

$$Z^{(l+1)} = f(Z^{(l)}, A \mid W^{(l)}),$$

where $Z^{(l)}$ and $Z^{(l+1)}$ are the input and output of the convolution, respectively; $A$ is the adjacency matrix; and $Z^{(0)} = X \in \mathbb{R}^{n \times m}$, where $n$ and $m$ denote the number of nodes and the number of features, respectively.

$$f(Z^{(l)}, A \mid W^{(l)}) = \phi\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} Z^{(l)} W^{(l)}\right),$$

where $\tilde{A} = A + I$, $\tilde{D}$ is the degree matrix of $\tilde{A}$, and $\phi$ is an activation function.

$G(X, A)$ uses two layers of GCN convolution. The paper also develops two variants of the encoding model: the graph encoder and the variational graph encoder.

The graph encoder is defined as follows:

$$Z^{(1)} = f_{\mathrm{ReLU}}(X, A \mid W^{(0)}); \quad Z^{(2)} = f_{\mathrm{linear}}(Z^{(1)}, A \mid W^{(1)}).$$

That is, the graph convolutional encoder $G(Z, A) = q(Z \mid X, A)$ encodes both the graph structure and the node content into the representation $Z = q(Z \mid X, A) = Z^{(2)}$.
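As a concrete illustration, the two-layer graph encoder can be sketched in plain NumPy (a minimal sketch, not the authors' implementation; the toy graph, weight shapes, and random initialization are assumptions):

```python
import numpy as np

def normalize_adj(A):
    """Symmetrically normalize A~ = A + I as D~^{-1/2} A~ D~^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
    return d_inv_sqrt @ A_tilde @ d_inv_sqrt

def gcn_layer(Z, A_norm, W, activation=None):
    """One propagation step: phi(D~^{-1/2} A~ D~^{-1/2} Z W)."""
    H = A_norm @ Z @ W
    return np.maximum(H, 0.0) if activation == "relu" else H

def graph_encoder(X, A, W0, W1):
    """Z^(1) = f_ReLU(X, A | W0); Z^(2) = f_linear(Z^(1), A | W1)."""
    A_norm = normalize_adj(A)
    Z1 = gcn_layer(X, A_norm, W0, activation="relu")
    return gcn_layer(Z1, A_norm, W1)  # the embedding Z = Z^(2)

# Toy graph: 4 nodes, 3 features, 2-dim embedding (all shapes assumed).
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
X = rng.normal(size=(4, 3))
W0, W1 = rng.normal(size=(3, 8)), rng.normal(size=(8, 2))
Z = graph_encoder(X, A, W0, W1)
print(Z.shape)  # (4, 2): one latent vector per node
```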

The variational graph encoder is defined as an inference model:

$$q(Z \mid X, A) = \prod_{i=1}^{n} q(z_i \mid X, A), \quad q(z_i \mid X, A) = \mathcal{N}(z_i \mid \mu_i, \mathrm{diag}(\sigma^2)).$$

where $\mu = Z^{(2)}$ is the matrix of mean vectors $\mu_i$, and $\log \sigma = f_{\mathrm{linear}}(Z^{(1)}, A \mid W'^{(1)})$, in which $W'^{(1)}$ shares the first-layer weights $W^{(0)}$ with $\mu$.
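In practice the variational encoder is sampled with the reparameterization trick, $z = \mu + \sigma \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$; a minimal sketch (the standalone `mu`/`log_sigma` arrays stand in for the two encoder output heads and are assumptions):

```python
import numpy as np

def sample_latent(mu, log_sigma, rng):
    """Reparameterization: z = mu + sigma * eps, eps ~ N(0, I).

    Here mu = Z^(2) and log_sigma = f_linear(Z^(1), A | W'^(1)) would
    come from the two output heads of the variational encoder."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(log_sigma) * eps

rng = np.random.default_rng(0)
mu = np.zeros((4, 2))         # assumed mean head output
log_sigma = np.zeros((4, 2))  # sigma = 1 everywhere
Z = sample_latent(mu, log_sigma, rng)
print(Z.shape)  # (4, 2)
```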

decoder model:
The decoder is used to reconstruct the graph data. The ARGA decoder $p(\hat{A} \mid Z)$ predicts whether there is a link between two nodes. More specifically, we train a link prediction layer on top of the graph embedding:

$$p(\hat{A} \mid Z) = \prod_{i=1}^{n} \prod_{j=1}^{n} p(\hat{A}_{ij} \mid z_i, z_j); \quad p(\hat{A}_{ij} = 1 \mid z_i, z_j) = \mathrm{sigmoid}(z_i^\top z_j),$$

Here the prediction $\hat{A}$ should be close to the ground truth $A$.

graph autoencoder model:
The embedding $Z$ and the reconstructed graph $\hat{A}$ can be represented as follows:

$$\hat{A} = \mathrm{sigmoid}(Z Z^\top), \quad Z = q(Z \mid X, A).$$
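The inner-product decoder $\hat{A} = \mathrm{sigmoid}(ZZ^\top)$ is essentially a one-liner; a small sketch (the toy embedding values are assumptions):

```python
import numpy as np

def decode(Z):
    """Reconstruct edge probabilities: A_hat = sigmoid(Z Z^T)."""
    logits = Z @ Z.T
    return 1.0 / (1.0 + np.exp(-logits))

Z = np.array([[ 2.0, 0.0],
              [ 2.0, 0.1],
              [-2.0, 0.0]])
A_hat = decode(Z)
# Nodes 0 and 1 have similar embeddings -> high link probability;
# nodes 0 and 2 point in opposite directions -> low link probability.
print(A_hat[0, 1] > 0.9, A_hat[0, 2] < 0.1)  # True True
```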

Optimization:
For the graph encoder, we minimize the reconstruction loss of the graph data:

$$L_0 = \mathbb{E}_{q(Z \mid X, A)}[\log p(A \mid Z)].$$

For the variational graph encoder, we optimize the variational lower bound:

$$L_1 = \mathbb{E}_{q(Z \mid X, A)}[\log p(A \mid Z)] - \mathrm{KL}[q(Z \mid X, A) \,\|\, p(Z)],$$

where $p(Z)$ is the prior distribution, which can be a uniform distribution or a Gaussian distribution:

$$p(Z) = \prod_i p(z_i) = \prod_i \mathcal{N}(z_i \mid 0, I).$$
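For this standard Gaussian prior, the KL term in $L_1$ has the well-known closed form $-\frac{1}{2}\sum(1 + \log\sigma^2 - \mu^2 - \sigma^2)$; a minimal sketch under that assumption:

```python
import numpy as np

def kl_to_standard_normal(mu, log_sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over nodes and dims."""
    return -0.5 * np.sum(1.0 + 2.0 * log_sigma - mu**2 - np.exp(2.0 * log_sigma))

# When the posterior equals the prior (mu = 0, sigma = 1), the KL is zero.
mu = np.zeros((4, 2))
log_sigma = np.zeros((4, 2))
print(kl_to_standard_normal(mu, log_sigma))  # 0.0
```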

4.2 adversarial model $\mathcal{D}(Z)$

The underlying idea of the model is to make the latent codes $Z$ match a prior distribution, which is achieved by adversarial training. The adversarial model is a standard MLP whose output layer is one-dimensional with a sigmoid activation. It acts as a discriminator that decides whether a latent code comes from the prior $p_z$ (positive samples) or from the graph encoder $G(X, A)$ (negative samples). By minimizing the cross-entropy cost of this binary classifier, the embedding will eventually be regularized and improved during training; the cost is defined as follows:

$$-\frac{1}{2} \mathbb{E}_{z \sim p_z}[\log \mathcal{D}(Z)] - \frac{1}{2} \mathbb{E}_{x}[\log(1 - \mathcal{D}(G(X, A)))],$$

where $p_z$ is the prior distribution.
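This cost is the average binary cross-entropy of the discriminator over prior (real) and encoder (fake) codes; a hedged NumPy sketch with an assumed tiny two-layer MLP discriminator (all shapes and weights are assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator(Z, W1, W2):
    """Standard MLP with a one-dimensional sigmoid output layer."""
    h = np.maximum(Z @ W1, 0.0)    # hidden ReLU layer
    return sigmoid(h @ W2).ravel()  # probability "came from the prior"

def discriminator_cost(d_real, d_fake, eps=1e-12):
    """-1/2 E[log D(z)] - 1/2 E[log(1 - D(G(X, A)))]."""
    return (-0.5 * np.mean(np.log(d_real + eps))
            - 0.5 * np.mean(np.log(1.0 - d_fake + eps)))

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(2, 16)), rng.normal(size=(16, 1))
z_prior = rng.normal(size=(8, 2))       # samples from p_z (positive)
z_fake = rng.normal(size=(8, 2)) + 3.0  # stand-in encoder output (negative)
cost = discriminator_cost(discriminator(z_prior, W1, W2),
                          discriminator(z_fake, W1, W2))
print(cost > 0)
```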

adversarial graph autoencoder model:

The encoder model is trained together with the discriminator $\mathcal{D}(Z)$:

$$\min_{G} \max_{\mathcal{D}} \; \mathbb{E}_{z \sim p_z}[\log \mathcal{D}(Z)] + \mathbb{E}_{x \sim p(x)}[\log(1 - \mathcal{D}(G(X, A)))].$$

where $G(X, A)$ is the generator and $\mathcal{D}(Z)$ is the discriminator.
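The alternating updates implied by this min-max game can be illustrated on a 1-D toy problem. Everything below is an assumption for illustration only: the "generator" is reduced to a learnable shift of fixed noise, the discriminator to logistic regression with manual gradients, and the generator uses the common non-saturating variant (ascending $\log \mathcal{D}(G(\cdot))$ rather than descending $\log(1 - \mathcal{D}(G(\cdot)))$):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
noise = rng.normal(size=256)
shift = 5.0      # fake codes start far from the prior's mean
w, b = 1.0, 0.0  # discriminator parameters
lr = 0.1

for step in range(500):
    z_real = rng.normal(size=256)  # samples from the prior p_z
    z_fake = noise + shift         # stand-in for G(X, A)

    # D step: gradient ascent on E[log D(z)] + E[log(1 - D(G(X, A)))]
    d_real, d_fake = sigmoid(w * z_real + b), sigmoid(w * z_fake + b)
    w += lr * (np.mean((1 - d_real) * z_real) - np.mean(d_fake * z_fake))
    b += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # G step: non-saturating update, ascend E[log D(G(X, A))]
    d_fake = sigmoid(w * z_fake + b)
    shift += lr * np.mean((1 - d_fake) * w)

print(abs(shift))  # the fake codes have drifted toward the prior's mean (0)
```

The generator never sees the prior directly: the only signal pushing the fake codes toward $p_z$ comes through the discriminator, which is exactly how the adversarial regularizer shapes the embedding.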

4.3 algorithm explanation


4.4 decoder variations

In the ARGA and ARVGA models, the decoder is simply a link prediction layer computed as the dot product of the embeddings $z_i$. However, the decoder can also be a graph convolutional decoder, or a combination of a link prediction layer and graph convolutional decoder layers.

GCN decoder for graph structure reconstruction (ARGA_GD):
This variant, called ARGA_GD, reconstructs the graph structure with a decoder built from two graph convolutional layers. The input of the decoder is the embedding $Z$ produced by the encoder, and the graph convolutional decoder is constructed as follows:

$$Z_D = f_{\mathrm{linear}}(Z, A \mid W_D^{(1)}), \quad O = f_{\mathrm{linear}}(Z_D, A \mid W_D^{(2)}),$$

where $Z$ is the embedding learned by the graph encoder, and $Z_D$ and $O$ are the outputs of the first and second layers of the graph decoder, respectively. The reconstruction loss is computed as follows:

$$L_{\mathrm{ARGA\_GD}} = \mathbb{E}_{q(O \mid X, A)}[\log p(A \mid O)].$$

GCN decoder for both graph structure and content information reconstruction (ARGA_AX):

We modify the output dimension of the second graph convolutional layer to equal the number of features associated with each node, so the output of the second layer is $O \in \mathbb{R}^{n \times f}$, with the same shape as $X$. The reconstruction loss then consists of two kinds of errors. The first is the structure reconstruction loss:

$$L_A = \mathbb{E}_{q(O \mid X, A)}[\log p(A \mid O)],$$

The second is the reconstruction loss of the node content:

$$L_X = \mathbb{E}_{q(O \mid X, A)}[\log p(X \mid O)].$$

The final reconstruction loss is $L_O = L_A + L_X$.
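Both decoder variants above stack two linear graph convolutional layers on top of the embedding $Z$; a minimal NumPy sketch (all toy shapes are assumptions; the second layer's output width is set to the feature count $f$, as in ARGA_AX):

```python
import numpy as np

def normalize_adj(A):
    """Symmetrically normalize A~ = A + I as D~^{-1/2} A~ D~^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
    return d_inv_sqrt @ A_tilde @ d_inv_sqrt

def gcn_decoder(Z, A, W1, W2):
    """Two linear GCN layers: Z_D = f(Z, A | W_D^(1)), O = f(Z_D, A | W_D^(2))."""
    A_norm = normalize_adj(A)
    Z_D = A_norm @ Z @ W1
    return A_norm @ Z_D @ W2

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
Z = rng.normal(size=(3, 2))  # embedding from the encoder (n=3, d=2)
W1, W2 = rng.normal(size=(2, 4)), rng.normal(size=(4, 3))
O = gcn_decoder(Z, A, W1, W2)
print(O.shape)  # (3, 3): n x f, the same shape as the node content X
```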

5. experiments


Origin blog.csdn.net/ptxx_p/article/details/120863532