GNN recommendation algorithm (2) - Multi-GCCF: make the best use of everything


1 A gentle introduction

Multi-Graph Convolution Collaborative Filtering (Multi-GCCF) is a collaborative filtering algorithm based on graph neural networks.

Collaborative filtering algorithms are usually built on the assumption that similar users tend to like the same items, and that items liked by similar users tend to receive similar ratings. Most collaborative filtering algorithms therefore make recommendations based on the user-item bipartite network. However, since each user interacts with only a limited number of items in practice, the constructed user-item bipartite network is very sparse. With such limited feature information, it is difficult for deep learning methods such as graph convolution to produce embeddings with strong expressive power.

The authors of Multi-GCCF alleviate this problem by additionally considering user-user and item-item interaction information. In the bipartite network there are two node types, users and items: the first-order neighbors of a node belong to the other type, while its second-order neighbors belong to the same type. Although aggregating second-order neighbors already captures user-user and item-item relationships implicitly, the authors want to model them explicitly, so they improve the algorithm by constructing a user-user graph and an item-item graph.

In addition, user nodes and item nodes are inherently different, so using the same parameters to aggregate both kinds of nodes during the information-aggregation stage is not reasonable. When updating node embeddings, Multi-GCCF therefore uses two separate sets of parameters to aggregate and transform the two node types, taking their inherent differences into account.

PS: if you found this useful, a like would be much appreciated and would keep me motivated ^_^!

2 Multi-GCCF at a glance

Multi-GCCF is mainly divided into the following three parts:

  1. Generating node embeddings from the user-item bipartite network;
  2. Generating embeddings from the user-user graph and the item-item graph;
  3. Using a skip connection to combine the initial node features with the final node embedding, avoiding the information loss caused by passing information layer by layer.

The overall framework is as follows:
[Figure: overall framework of Multi-GCCF]

2.1 Bipartite graph convolutional network (Bipar-GCN)

The Bipar-GCN layer consists of forward sampling and backward aggregation. Forward sampling is used to counter the effect of the power-law degree distribution: after sampling a fixed number of neighbor nodes for each of layers $1$ to $K$, a GCN is used to aggregate the neighbor information.
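As a rough illustration of the sampling idea (not the authors' code; the function name and the use of NumPy are my own assumptions), fixed-size neighbor sampling might look like this:

```python
import numpy as np

def sample_neighbors(neighbor_ids, num_samples, rng):
    """Fixed-size neighbor sampling: caps the fan-out of high-degree nodes so the
    power-law degree distribution does not dominate the aggregation."""
    replace = len(neighbor_ids) < num_samples   # low-degree nodes sample with replacement
    return rng.choice(neighbor_ids, size=num_samples, replace=replace)

rng = np.random.default_rng(0)
user_neighbors = np.array([3, 17, 42, 58, 91])   # item ids this user has interacted with
sampled = sample_neighbors(user_neighbors, num_samples=10, rng=rng)
```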

Suppose the initial embeddings learned for the user and item nodes are $\mathbf{e}_u$ and $\mathbf{e}_v$ (if the nodes come with input features, these initial embeddings can be learned with an MLP).
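A minimal PyTorch sketch of the two ways the initial embeddings could be obtained (dimensions and variable names here are illustrative assumptions, not taken from the paper):

```python
import torch
import torch.nn as nn

embed_dim = 64
num_users, num_items = 1000, 2000

# No side features: learn free initial embeddings e_u, e_v from node ids.
user_embedding = nn.Embedding(num_users, embed_dim)
item_embedding = nn.Embedding(num_items, embed_dim)

# With side features: map the raw features into the embedding space with an MLP.
user_feat_dim = 32                                   # illustrative feature dimension
user_feature_mlp = nn.Sequential(nn.Linear(user_feat_dim, embed_dim), nn.Tanh())

e_u = user_embedding(torch.arange(8))                # (8, 64) initial embeddings for 8 users
```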

The embedding update rule for a user node at layer $k$ is:

$$\mathbf{h}_u^k=\sigma\left(\mathbf{W}_u^k\left[\mathbf{h}_u^{k-1};\mathbf{h}_{N(u)}^{k-1}\right]\right),\qquad \mathbf{h}_u^0=\mathbf{e}_u$$

where $\mathbf{W}_u^k$ is a learnable weight matrix, $\sigma$ is the tanh activation function, and $[\cdot;\cdot]$ denotes concatenation. The neighborhood representation $\mathbf{h}_{N(u)}^{k-1}$ is computed as:

$$\mathbf{h}_{N(u)}^{k-1}=\mathrm{AGGREGATOR}_u\left(\mathbf{h}_v^{k-1},\ v\in N(u)\right)$$

$$\mathrm{AGGREGATOR}_u=\sigma\left(\mathrm{MEAN}\left(\{\mathbf{h}_v^{k-1}\mathbf{Q}_u^k,\ v\in N(u)\}\right)\right)$$

where $\mathbf{Q}_u^k$ is the user-side aggregation weight matrix at layer $k$.

Similarly, the embedding update rule for a target item node at layer $k$ is:

$$\mathbf{h}_v^k=\sigma\left(\mathbf{W}_v^k\left[\mathbf{h}_v^{k-1};\mathbf{h}_{N(v)}^{k-1}\right]\right),\qquad \mathbf{h}_v^0=\mathbf{e}_v$$

$$\mathbf{h}_{N(v)}^{k-1}=\mathrm{AGGREGATOR}_v\left(\mathbf{h}_u^{k-1},\ u\in N(v)\right)$$

$$\mathrm{AGGREGATOR}_v=\sigma\left(\mathrm{MEAN}\left(\{\mathbf{h}_u^{k-1}\mathbf{Q}_v^k,\ u\in N(v)\}\right)\right)$$
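Putting the formulas above together, a single Bipar-GCN layer could be sketched in PyTorch roughly as follows. This is my own simplified reading of the equations (the class name, tensor layout, and the use of `nn.Linear` for $\mathbf{W}$ and $\mathbf{Q}$ are assumptions), not the authors' implementation:

```python
import torch
import torch.nn as nn

class BiparGCNLayer(nn.Module):
    """One Bipar-GCN layer with separate weights for the user side and the item side."""

    def __init__(self, dim):
        super().__init__()
        self.W_u = nn.Linear(2 * dim, dim)           # W_u^k, applied to [h_u ; h_N(u)]
        self.W_v = nn.Linear(2 * dim, dim)           # W_v^k, applied to [h_v ; h_N(v)]
        self.Q_u = nn.Linear(dim, dim, bias=False)   # Q_u^k, user-side aggregation weights
        self.Q_v = nn.Linear(dim, dim, bias=False)   # Q_v^k, item-side aggregation weights

    def forward(self, h_u, h_v, u_neigh_items, v_neigh_users):
        # u_neigh_items: (num_users, S) sampled item ids per user
        # v_neigh_users: (num_items, S) sampled user ids per item
        h_Nu = torch.tanh(self.Q_u(h_v[u_neigh_items]).mean(dim=1))  # mean over sampled item neighbors
        h_Nv = torch.tanh(self.Q_v(h_u[v_neigh_users]).mean(dim=1))  # mean over sampled user neighbors
        new_u = torch.tanh(self.W_u(torch.cat([h_u, h_Nu], dim=-1)))
        new_v = torch.tanh(self.W_v(torch.cat([h_v, h_Nv], dim=-1)))
        return new_u, new_v
```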

The overall schematic diagram is as follows:
[Figure: schematic of the Bipar-GCN layer]

As the formulas and diagram show, Multi-GCCF separates the generation of user and item embeddings by using two different sets of weights. As mentioned in the paper, this accounts for the inherent differences between user nodes and item nodes.

The generation of the initial embeddings is worth discussing further. For example, if no user or item features are available, is a one-hot encoding feasible? Following LightGCN's experiments, can the nonlinear activation and linear feature transformation be removed? Or could the first-layer embedding produced by LightGCN serve as the initial embedding of Multi-GCCF, and would that be effective? Interested readers are welcome to discuss~

Friends who don't know LightGCN can take a look: LightGCN doesn't believe in nonlinear activation and feature transformation .

2.2 Multi-Graph Encoding

The user-user and item-item graphs are constructed to compensate for the limited information in the user-item bipartite network: most users interact with only a few items, so the bipartite data is relatively sparse, and the two auxiliary graphs supply additional information.

The two graphs are built as follows: compute the cosine similarity between rows (for users) or columns (for items) of the rating/click matrix, and connect nodes whose similarity exceeds a threshold; the threshold is chosen so that the average degree of the resulting graph is 10.
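A minimal sketch of this construction (the helper name and the exact thresholding logic are my own; the paper only states that the threshold is tuned so the average degree is about 10):

```python
import numpy as np

def build_similarity_graph(ratings, target_avg_degree=10):
    """Build a user-user (or item-item) graph from the rating/click matrix.

    `ratings` has one row per user; pass `ratings.T` to build the item-item graph.
    The similarity threshold is chosen so the average node degree is ~target_avg_degree.
    """
    norm = np.linalg.norm(ratings, axis=1, keepdims=True) + 1e-8
    unit = ratings / norm
    sim = unit @ unit.T                                  # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)                       # no self-loops
    n = sim.shape[0]
    k = n * target_avg_degree                            # number of edges to keep
    threshold = np.sort(sim, axis=None)[-k]              # cutoff giving the desired avg degree
    rows, cols = np.where(sim >= threshold)
    return list(zip(rows.tolist(), cols.tolist()))       # edge list
```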

After the graphs are built, the node embeddings are generated as follows:

$$\mathbf{z}_u=\sigma\Big(\sum_{i\in N'(u)}\mathbf{e}_i\mathbf{M}_u\Big);\qquad \mathbf{z}_v=\sigma\Big(\sum_{j\in N'(v)}\mathbf{e}_j\mathbf{M}_v\Big)$$

where $\mathbf{M}_u$ and $\mathbf{M}_v$ are learnable parameters, and $N'(u)$ and $N'(v)$ are the first-order neighbors of node $u$ in the user-user graph and of node $v$ in the item-item graph, respectively.

After obtaining the output of the MGE layer, it is concatenated with the embedding from the Bipar-GCN layer.
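A rough PyTorch sketch of the MGE layer under these formulas (dense adjacency matrices are used here only for brevity; names and details are my assumptions, not the paper's code):

```python
import torch
import torch.nn as nn

class MGELayer(nn.Module):
    """Multi-Graph Encoding: aggregate first-order neighbors in the
    user-user and item-item graphs built from rating similarity."""

    def __init__(self, dim):
        super().__init__()
        self.M_u = nn.Linear(dim, dim, bias=False)   # user-side transform M_u
        self.M_v = nn.Linear(dim, dim, bias=False)   # item-side transform M_v

    def forward(self, e_u, e_v, uu_adj, ii_adj):
        # uu_adj: (num_users, num_users) binary adjacency of the user-user graph
        # ii_adj: (num_items, num_items) binary adjacency of the item-item graph
        z_u = torch.tanh(self.M_u(uu_adj @ e_u))     # sum neighbor embeddings, then transform
        z_v = torch.tanh(self.M_v(ii_adj @ e_v))
        return z_u, z_v

# The MGE output is then concatenated with the Bipar-GCN output, e.g.:
# final_u = torch.cat([h_u_bipar, z_u], dim=-1)
```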

2.3 Skip-connection

Because the initial node features are gradually lost after multiple layers of nonlinear activation and feature transformation, the author uses a fully connected layer to project the initial node features to an embedding of the desired dimension, which is then combined with the embeddings from the Bipar-GCN layer and the MGE layer, so that the initial features are preserved in the node representation.
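In code, the skip connection amounts to a single linear projection of the initial features; a tiny sketch (dimensions here are illustrative):

```python
import torch
import torch.nn as nn

skip_fc = nn.Linear(in_features=64, out_features=64)   # fully connected skip projection

e_u0 = torch.randn(8, 64)        # initial user features / embeddings
s_u = skip_fc(e_u0)              # skip-connection embedding
# s_u is later combined with the Bipar-GCN and MGE outputs (see Section 2.4).
```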

2.4 How to combine the three embeddings

As mentioned earlier, Multi-GCCF generates three types of embeddings:

  1. The embedding generated by the Bipar-GCN layer;
  2. The embedding generated by the MGE layer;
  3. The embedding generated by the fully connected layer in the skip-connection part.

So how are these three embeddings combined? The author tried element-wise sum, concatenation, and attention, as follows:

[Figure: the three fusion methods (element-wise sum, concatenation, attention)]
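A sketch of the three fusion options might look like the following; the attention variant here is a generic learned softmax weighting, which may differ from the paper's exact formulation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse(h_bipar, z_mge, s_skip, mode="sum", attn_mlp=None):
    """Combine the three embeddings produced for a node.

    mode="sum"    : element-wise sum (requires equal dimensions)
    mode="concat" : concatenation along the feature dimension
    mode="attn"   : weighted sum with learned attention scores (simplified)
    """
    if mode == "sum":
        return h_bipar + z_mge + s_skip
    if mode == "concat":
        return torch.cat([h_bipar, z_mge, s_skip], dim=-1)
    if mode == "attn":
        stacked = torch.stack([h_bipar, z_mge, s_skip], dim=1)   # (N, 3, d)
        scores = attn_mlp(stacked).squeeze(-1)                   # (N, 3)
        weights = F.softmax(scores, dim=1).unsqueeze(-1)         # (N, 3, 1)
        return (weights * stacked).sum(dim=1)
    raise ValueError(mode)

# Example usage of the attention variant:
attn_mlp = nn.Linear(64, 1)
fused = fuse(torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 64),
             mode="attn", attn_mlp=attn_mlp)
```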

2.5 Loss function

Multi-GCCF uses the BPR loss. The basic idea is to maximize the gap between positive and negative samples: the larger the score gap between items a user did interact with and items the user did not, the better. The specific formula is as follows:
$$L_{BPR}=\sum_{(u,i,j)\in O}-\log\sigma\left(\mathbf{e}_u^{*}\cdot\mathbf{e}_i^{*}-\mathbf{e}_u^{*}\cdot\mathbf{e}_j^{*}\right)+\lambda\|\Theta\|_2^2+\beta\left(\|\mathbf{e}_u^{*}\|_2^2+\|\mathbf{e}_i^{*}\|_2^2+\|\mathbf{e}_j^{*}\|_2^2\right)$$

where $O=\{(u,i,j)\mid(u,i)\in\mathbf{R}^+,(u,j)\in\mathbf{R}^-\}$, $\mathbf{R}^+$ and $\mathbf{R}^-$ are the sampled positive and negative interactions, $\Theta$ denotes the model parameters, and $\mathbf{e}^{*}$ denotes a node's final fused embedding.
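A compact sketch of this loss in PyTorch (the regularization coefficients and function name are illustrative):

```python
import torch
import torch.nn.functional as F

def bpr_loss(e_u, e_i, e_j, params, lam=1e-4, beta=1e-4):
    """BPR loss on final embeddings: push the score of the observed item i
    above the score of the sampled negative item j for user u."""
    pos_scores = (e_u * e_i).sum(dim=-1)          # <e_u*, e_i*>
    neg_scores = (e_u * e_j).sum(dim=-1)          # <e_u*, e_j*>
    rank_loss = -F.logsigmoid(pos_scores - neg_scores).sum()
    l2_params = lam * sum(p.pow(2).sum() for p in params)        # lambda * ||Theta||^2
    l2_embed = beta * (e_u.pow(2).sum() + e_i.pow(2).sum() + e_j.pow(2).sum())
    return rank_loss + l2_params + l2_embed
```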

3 How well does it work

First, Recall and NDCG are compared against the baselines; see the table below.
[Table: Recall and NDCG comparison against baselines]

The table shows that Multi-GCCF performs best on almost all datasets.

Next, the contribution of each Multi-GCCF component was tested, with the following results:

[Table: ablation of Multi-GCCF components]

As can be seen from the above table, all three components can bring performance gains.

Finally, the different embedding fusion methods mentioned above were compared:
[Table: comparison of embedding fusion methods]
As can be seen from the table, element-wise sum is the most suitable choice.

4 Summary

For me personally, the main takeaways from Multi-GCCF are as follows:

  1. When the bipartite network is sparse, constructing auxiliary graphs is a feasible approach, and the skip connection helps retain the original features. I personally think Multi-GCCF performs well because it produces three different embeddings, which gives the final embedding strong expressive power. However, there is still room to optimize how each part of the embedding is generated, for example by using different methods to account for the differences between user and item nodes.
  2. The paper does not discuss the efficiency of Multi-GCCF, which I think is an aspect worth considering. How to improve its efficiency should also be an interesting question.

References

  1. Multi-Graph Convolution Collaborative Filtering
