【Paper Notes】(DGCN-HN) Deep Graph Convolutional Networks with Hybrid Normalization for Accurate and …


This paper was published at KDD 2021.

Building on LR-GCCF and LightGCN, it combines left normalization (equal weight for every neighbor, as in PinSAGE) and symmetric normalization (smaller weights for high-degree neighbors, as in LightGCN).

1. Intro

1.1 Not enough layers

GCN-based models achieve their best performance with shallow layers, which means they fail to exploit high-order signals.

The paper adds residual connections and holistic connections.

The proposed model can deepen the graph convolution to 8 layers.

1.2 Fixed normalization rules

GCN-based models aggregate neighborhood information with a single fixed normalization rule, so neighbor importance is determined in the same way for every node.

  • Left normalization: assigns the same weight to every neighbor.
  • Symmetric normalization: assigns smaller weights to popular (high-degree) neighbors and larger weights to unpopular ones.
  1. Using a fixed normalization rule can lead to suboptimal results. The article explores the effects of left normalization and symmetric normalization from the perspectives of accuracy and diversity.
  2. Nodes are inherently different and require different normalizations to aggregate neighborhood information. A new model is proposed: the Deep Graph Convolutional Network with Hybrid Normalization (DGCN-HN).
    It uses a hybrid normalization layer and a simplified attention network to model neighbor importance more flexibly by adaptively fusing information from different normalization rules.

(Figure: motivating example with users $u_1$–$u_3$, a popular iPhone, paper stationery, and a drone)

  • Shallow GCNs cannot use high-order signals: recommending a drone to $u_2$, who likes electronic products, requires high-order signals.
  • If left normalization is used for all nodes, the popularity of the iPhone heavily distorts $u_1$'s interest profile; $u_1$ bought the iPhone out of necessity and is not actually interested in electronics, since $u_1$ frequently buys paper stationery.
  • If symmetric normalization is used for all nodes, the popular iPhone cannot contribute the "electronic products" interest, so the interests of $u_2$ and $u_3$ are hard to characterize.

2. METHOD

(Figure: overall architecture of DGCN-HN)

Consists of three main components:

  • A deep graph convolutional network for recommendation with residual and holistic connections
  • A hybrid normalization layer that combines left normalization and symmetric normalization to model neighbor importance flexibly
  • A simplified attention network that adaptively fuses the representations produced by the different normalization rules

2.1 Deep Graph Convolutional Network for Recommendation

(Figure: deep graph convolution with residual and holistic connections)

The linear aggregation at each layer can be expressed with a residual connection:

$$h_u^{(l+1)} = \sum_{v \in N_u} \tilde{A}_{uv}\, h_v^{(l)} + h_u^{(l)}, \quad h_u^{(0)} = e_u$$

$$h_v^{(l+1)} = \sum_{u \in N_v} \tilde{A}_{vu}\, h_u^{(l)} + h_v^{(l)}, \quad h_v^{(0)} = e_v$$

After L layers of aggregation, the representations of all layers are combined through the holistic connection; the model uses element-wise averaging as the fusion strategy of the final layer:

$$e_u^{*} = \frac{1}{L+1} \sum_{i=0}^{L} h_u^{(i)}$$

$$e_v^{*} = \frac{1}{L+1} \sum_{j=0}^{L} h_v^{(j)}$$

(The contribution of this component is discussed in the final experiments.)
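A minimal NumPy sketch of this propagation, assuming a dense, pre-normalized adjacency over the stacked user and item nodes; the names (`propagate`, `A_norm`, `emb`) are illustrative and not taken from the paper's code:

```python
import numpy as np

def propagate(A_norm, emb, n_layers):
    """A_norm: (N, N) normalized adjacency over stacked user/item nodes,
    emb: (N, d) initial embeddings e_u / e_v, n_layers: L."""
    h = emb
    all_layers = [h]
    for _ in range(n_layers):
        # residual connection: h^{(l+1)} = A_norm @ h^{(l)} + h^{(l)}
        h = A_norm @ h + h
        all_layers.append(h)
    # holistic connection: element-wise average over the L+1 layer outputs
    return np.mean(np.stack(all_layers, axis=0), axis=0)
```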

2.2 Hybrid Normalization for Flexible Modeling of Neighbor Importance

Taking the user node as an example, the formula for symmetric normalization is as follows:

$$h_u^{(l+1)} = \sum_{v \in N_u} \frac{1}{\sqrt{|N_u|}\,\sqrt{|N_v|}}\, h_v^{(l)} + h_u^{(l)}$$

and the formula for left normalization is:

$$h_u^{(l+1)} = \sum_{v \in N_u} \frac{1}{|N_u|}\, h_v^{(l)} + h_u^{(l)}$$
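As a sketch (not the authors' implementation), both normalization rules can be built from a binary user-item interaction matrix `R`; the function name and the zero-degree clamping are assumptions for illustration:

```python
import numpy as np

def normalized_adjacencies(R):
    """R: (n_users, n_items) binary interaction matrix."""
    deg_u = np.maximum(R.sum(axis=1, keepdims=True), 1)   # |N_u|, shape (n_users, 1)
    deg_v = np.maximum(R.sum(axis=0, keepdims=True), 1)   # |N_v|, shape (1, n_items)
    A_left = R / deg_u                                     # left: 1 / |N_u|
    A_sym = R / (np.sqrt(deg_u) * np.sqrt(deg_v))          # symmetric: 1 / sqrt(|N_u||N_v|)
    return A_left, A_sym
```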

2.3 Simplified Attention Network for Adaptive Combination

To combine the advantages of the two normalizations, the simplest option is a plain average; a more flexible option is a weighted average with attention.

$\mathbf{h}_{N_u,LN}^{(l)}$ and $\mathbf{h}_{N_u,SN}^{(l)}$ denote the neighborhood aggregations under left normalization and symmetric normalization, respectively. Taking the user node as an example, the node update formulas are:

$$h_{N_u,LN}^{(l+1)} = \sum_{v \in N_u} \frac{1}{|N_u|}\, h_v^{(l)}$$

$$h_{N_u,SN}^{(l+1)} = \sum_{v \in N_u} \frac{1}{\sqrt{|N_u|}\,\sqrt{|N_v|}}\, h_v^{(l)}$$

$$h_u^{(l+1)} = h_u^{(l)} + \alpha_{u,LN}^{(l+1)}\, h_{N_u,LN}^{(l+1)} + \alpha_{u,SN}^{(l+1)}\, h_{N_u,SN}^{(l+1)}$$

Here, $\alpha_{u,*}^{(l)}$ denotes the normalized attention scores at different layers.

When calculating the attention score, two aspects are considered:

  • the neighbor's own information
  • the similarity between the center node's representation and the neighbor's representation

This leads to the following attention design:

$$z_{u,*}^{(l+1)} = \mathbf{W}_1^{(l)}\, \sigma\!\left(\mathbf{W}_2^{(l)}\left(\mathbf{h}_{N_u,*}^{(l+1)} + \mathbf{h}_{N_u,*}^{(l+1)} \odot \mathbf{h}_u^{(l)}\right)\right)$$

Here $z_{u,*}^{(l+1)}$ is the attention score before normalization, $\mathbf{W}_1^{(l)} \in \mathbb{R}^{1 \times d^t}$ and $\mathbf{W}_2^{(l)} \in \mathbb{R}^{d^t \times d^l}$ are feature transformation matrices, $\sigma$ is the activation function, $d^t$ is the hidden dimension of the attention network, and $\odot$ denotes the Hadamard (element-wise) product.

In experiments this design failed to converge, so it was replaced with a simpler one: the feature transformation matrices and the activation function are removed and average pooling is used instead, giving the simplified attention network:

$$z_{u,*}^{(l+1)} = \operatorname{ave}\!\left(\mathbf{h}_{N_u,*}^{(l+1)} + \mathbf{h}_{N_u,*}^{(l+1)} \odot \mathbf{h}_u^{(l)}\right)$$

The attention weights $\alpha$ are then obtained with the softmax function:

$$\alpha_{u,*}^{(l)} = \frac{\exp\!\left(z_{u,*}^{(l)}\right)}{\sum_{k \in \{LN, SN\}} \exp\!\left(z_{u,k}^{(l)}\right)}$$
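A minimal NumPy sketch of one hybrid-normalization update following these formulas; `h_u`, `h_ln`, `h_sn` stand for $h_u^{(l)}$, $h_{N_u,LN}^{(l+1)}$ and $h_{N_u,SN}^{(l+1)}$, and the function name is illustrative:

```python
import numpy as np

def hybrid_update(h_u, h_ln, h_sn):
    """h_u, h_ln, h_sn: (n_users, d) embeddings at the current/next layer."""
    # unnormalized scores: average over embedding dims of h_N + h_N ⊙ h_u
    z_ln = np.mean(h_ln + h_ln * h_u, axis=1)        # (n_users,)
    z_sn = np.mean(h_sn + h_sn * h_u, axis=1)
    # softmax over the two normalization branches (numerically stable)
    z = np.stack([z_ln, z_sn], axis=1)               # (n_users, 2)
    z = np.exp(z - z.max(axis=1, keepdims=True))
    alpha = z / z.sum(axis=1, keepdims=True)
    # residual update with the attention-weighted branches
    return h_u + alpha[:, :1] * h_ln + alpha[:, 1:] * h_sn
```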
The loss function is the BPR loss.
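The paper only states that BPR is used; the sketch below is a generic NumPy version of the pairwise loss over (user, positive item, negative item) triples, and the L2 regularization term is a common companion of BPR rather than something specified here:

```python
import numpy as np

def bpr_loss(e_u, e_pos, e_neg, reg=1e-4):
    """e_u, e_pos, e_neg: (B, d) final embeddings for sampled triples."""
    pos = np.sum(e_u * e_pos, axis=1)
    neg = np.sum(e_u * e_neg, axis=1)
    # -log(sigmoid(pos - neg)) == log(1 + exp(neg - pos)) == logaddexp(0, neg - pos)
    rank_loss = np.mean(np.logaddexp(0.0, neg - pos))
    # L2 regularization on the sampled embeddings (assumed, not from the paper)
    l2 = reg * (np.sum(e_u**2) + np.sum(e_pos**2) + np.sum(e_neg**2)) / len(e_u)
    return rank_loss + l2
```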

3. EXPERIMENTS

(Table: overall accuracy comparison results)

  • The simplest definition of coverage is the ratio of items recommended by the system to the total number of items. Higher coverage means the model can generate recommendations for more distinct items, which helps surface long-tail items.
    The upper bound of Coverage@K is 1; larger is better, meaning more items are recommended to users.
  • Entropy@K focuses on how recommendations are distributed across items. It computes the entropy of the recommendation counts of different items; a larger entropy means better diversity and more even exposure of recommended items. (A small metric sketch follows this list.)
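An illustrative way to compute these two metrics from top-K recommendation lists (one list of item ids per user); the exact definitions used in the paper may differ slightly, so treat this as a sketch:

```python
import numpy as np
from collections import Counter

def coverage_at_k(topk_lists, n_items):
    # fraction of the catalog that appears in at least one top-K list
    recommended = {i for lst in topk_lists for i in lst}
    return len(recommended) / n_items

def entropy_at_k(topk_lists):
    # entropy of how often each item is recommended; higher = more even exposure
    counts = Counter(i for lst in topk_lists for i in lst)
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return float(-np.sum(p * np.log(p)))
```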

The article also evaluates the diversity of the recommendation results:

(Table: diversity comparison results)

The effect of different normalization schemes on GCN:

(Table: results for different normalization schemes)

As can be seen:

  • When using a single normalization, left normalization achieves the best diversity, symmetric normalization achieves the best accuracy, and right normalization performs worst in all respects
  • In hybrid normalization, the combination of left normalization and symmetric normalization achieves the best accuracy and diversity

Ablation experiment:

(Table: ablation study results)

  • The simplified attention network improves both accuracy and diversity
  • Hybrid normalization brings a significant performance improvement on both datasets, especially in Coverage

Summary:

  1. The article explores the effects of left and symmetric normalization from the perspectives of accuracy and diversity. The authors arrived at this idea by stepping outside the GCN model itself, considering how recommendation differs from other tasks, and connecting it to real-world behavior.

  2. According to the paper's experiments, adding only the residual connection already brings a large improvement, which contradicts our own earlier experiments.


Origin blog.csdn.net/weixin_45884316/article/details/123744078