Self-supervised Graph Learning for Recommendation (SGL)

Code: GitHub - wujcan/SGL-Torch: SGL PyTorch version (the authors provide both PyTorch and TensorFlow implementations)

This paper proposes a graph self-supervised learning framework for user-item bipartite graph recommender systems. The core idea is to perform data augmentation on the user-item bipartite graph (three operators are proposed: node dropout (nd), edge dropout (ed), and random walk (rw)). Each augmented graph can be regarded as a sub-view of the original graph. Any graph convolution operation, such as LightGCN, can then be applied to a sub-view to extract node representations, so the same node obtains multiple representations from multiple views. Following the idea of contrastive learning [5], a self-supervised task is constructed: maximize the agreement between different view representations of the same node, and minimize the agreement between representations of different nodes. Finally, the contrastive self-supervised task and the supervised recommendation task are trained together, forming a multi-task learning paradigm, as shown in the figure below.

As can be seen from the figure, the method can be roughly divided into two parts: the supervised learning part (the main recommendation task) and the self-supervised learning part (the contrastive learning part).
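Concretely, the contrastive part adopts an InfoNCE-style loss (written here for the user side; z_u^{'} and z_u^{''} are the two view representations of user u, s(\cdot,\cdot) is the cosine similarity, and \tau is a temperature), and it is jointly optimized with the main recommendation loss, roughly as:

\mathcal{L}_{ssl}^{user} = \sum_{u \in \mathcal{U}} -\log \frac{\exp\left(s(z_u^{'}, z_u^{''})/\tau\right)}{\sum_{v \in \mathcal{U}} \exp\left(s(z_u^{'}, z_v^{''})/\tau\right)}

\mathcal{L} = \mathcal{L}_{main} + \lambda_1 \mathcal{L}_{ssl} + \lambda_2 \lVert \Theta \rVert_2^2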

1. Supervised learning (main recommendation task)

Since this part is not the focus of the paper, it takes a very simple form: LightGCN + BPR.
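For the main task, the BPR pairwise loss has the standard form, where \mathcal{O} is the set of (user, observed item, sampled negative item) triples and \hat{y}_{ui} is the predicted score:

\mathcal{L}_{main} = \sum_{(u,i,j) \in \mathcal{O}} -\ln \sigma\left(\hat{y}_{ui} - \hat{y}_{uj}\right)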

 

The formula in matrix form given in the LightGCN paper:
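For reference, the layer-wise propagation in matrix form is

E^{(l+1)} = \left(D^{-\frac{1}{2}} A D^{-\frac{1}{2}}\right) E^{(l)}

where A is the adjacency matrix of the user-item bipartite graph and D is its degree matrix. The code below constructs exactly the normalized term D^{-1/2} A D^{-1/2}: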

import numpy as np
import scipy.sparse as sp

# excerpt from the adjacency-building method: tmp_adj is the (n_nodes, n_nodes)
# sparse matrix holding the user-item interactions in its upper-right block,
# so adding its transpose yields the symmetric bipartite adjacency matrix A
adj_mat = tmp_adj + tmp_adj.T

# symmetrically normalize the adjacency matrix: D^{-1/2} A D^{-1/2}
rowsum = np.array(adj_mat.sum(1))
d_inv = np.power(rowsum, -0.5).flatten()
d_inv[np.isinf(d_inv)] = 0.  # nodes with degree 0 would give inf; zero them out
d_mat_inv = sp.diags(d_inv)
norm_adj_tmp = d_mat_inv.dot(adj_mat)
adj_matrix = norm_adj_tmp.dot(d_mat_inv)

return adj_matrix

2. Self-supervised learning (contrastive learning)

1. Node/Edge Dropping

The paper describes this part very concisely, but also somewhat abstractly. In the formulas below, M^{'}, M^{''} and M_1, M_2 are masking vectors (vectors whose entries are 0/1), and \mathcal{V} and \mathcal{E} denote the node set and the edge set, respectively. The literal meaning is: taking the element-wise product of a masking vector with the node set / edge set makes the nodes / edges at the positions of the 0 entries disappear. The code, however, implements this by sampling; for details it is best to read the original code:

M^{'}, M^{''} \in \{0,1\}^{|\mathcal{V}|} \quad \text{(node dropout masks)}

M_1, M_2 \in \{0,1\}^{|\mathcal{E}|} \quad \text{(edge dropout masks)}
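Written out, the two augmentation operators take roughly the following form, where s_1 and s_2 generate the two views and \odot is the element-wise product:

s_1(\mathcal{G}) = (M^{'} \odot \mathcal{V},\ \mathcal{E}), \quad s_2(\mathcal{G}) = (M^{''} \odot \mathcal{V},\ \mathcal{E}) \qquad \text{(node dropout)}

s_1(\mathcal{G}) = (\mathcal{V},\ M_1 \odot \mathcal{E}), \quad s_2(\mathcal{G}) = (\mathcal{V},\ M_2 \odot \mathcal{E}) \qquad \text{(edge dropout)}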

if is_subgraph and self.ssl_ratio > 0:
    if aug_type == 'nd':
        # node dropout: randomly pick an ssl_ratio fraction of users/items to drop
        drop_user_idx = randint_choice(self.num_users, size=int(self.num_users * self.ssl_ratio), replace=False)
        drop_item_idx = randint_choice(self.num_items, size=int(self.num_items * self.ssl_ratio), replace=False)
        # indicator vectors: 1 = keep the node, 0 = drop it
        indicator_user = np.ones(self.num_users, dtype=np.float32)
        indicator_item = np.ones(self.num_items, dtype=np.float32)
        indicator_user[drop_user_idx] = 0.
        indicator_item[drop_item_idx] = 0.
        diag_indicator_user = sp.diags(indicator_user)
        diag_indicator_item = sp.diags(indicator_item)
        # R is the user-item interaction matrix; multiplying by the diagonal
        # indicator matrices zeroes out every row/column of a dropped node
        R = sp.csr_matrix(
            (np.ones_like(users_np, dtype=np.float32), (users_np, items_np)),
            shape=(self.num_users, self.num_items))
        R_prime = diag_indicator_user.dot(R).dot(diag_indicator_item)
        # rebuild the bipartite adjacency from the surviving interactions
        # (items are offset by num_users so users and items share one index space)
        (user_np_keep, item_np_keep) = R_prime.nonzero()
        ratings_keep = R_prime.data
        tmp_adj = sp.csr_matrix((ratings_keep, (user_np_keep, item_np_keep + self.num_users)), shape=(n_nodes, n_nodes))
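The snippet above only shows the 'nd' branch. For comparison, here is a simplified sketch of the edge-dropout branch (shared by 'ed' and 'rw'), using the same variable names but not the verbatim repository code:

elif aug_type in ['ed', 'rw']:
    # edge dropout: keep a random (1 - ssl_ratio) fraction of the interactions
    keep_idx = randint_choice(len(users_np), size=int(len(users_np) * (1 - self.ssl_ratio)), replace=False)
    user_np = np.array(users_np)[keep_idx]
    item_np = np.array(items_np)[keep_idx]
    ratings = np.ones_like(user_np, dtype=np.float32)
    # rebuild the bipartite adjacency from the kept edges only (items offset by num_users)
    tmp_adj = sp.csr_matrix((ratings, (user_np, item_np + self.num_users)), shape=(n_nodes, n_nodes))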

2. Random walk

The above two operators generate a subgraph shared across all the graph convolution layers. To explore higher capability, we consider assigning different layers with different subgraphs. This can be seen as constructing an individual subgraph for each node with random walk.

This is how the paper describes it. Confused after reading it? Then the only option is to look at the code.

As can be seen from the subgraph-generation code, 'nd' and 'ed' each generate two subgraphs (sub_graph1, sub_graph2), while the else branch, i.e. 'rw', generates two lists of subgraphs whose length equals n_layers (the number of graph convolution layers); see the sketch below.
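A minimal sketch of that dispatch, assuming the adjacency-building method excerpted earlier is exposed as create_adj_mat (the names follow the description above, not necessarily the exact repository code):

# for 'nd' and 'ed' each view is a single shared subgraph; for 'rw' each view is a
# list of n_layers subgraphs, so every GCN layer gets its own augmented graph
if aug_type in ['nd', 'ed']:
    sub_graph1 = self.create_adj_mat(is_subgraph=True, aug_type=aug_type)
    sub_graph2 = self.create_adj_mat(is_subgraph=True, aug_type=aug_type)
else:  # 'rw'
    sub_graph1 = [self.create_adj_mat(is_subgraph=True, aug_type=aug_type)
                  for _ in range(self.n_layers)]
    sub_graph2 = [self.create_adj_mat(is_subgraph=True, aug_type=aug_type)
                  for _ in range(self.n_layers)]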

The message passing then works as follows: norm_adj is the adjacency matrix (or list of adjacency matrices) obtained from the subgraph step above. The if branch corresponds to 'rw' (which, as shown above, produces two lists of subgraphs), and the else branch corresponds to 'nd' and 'ed'. With the 'rw' method, each GCN layer performs message passing on a different subgraph (norm_adj[k]), whereas with the 'nd'/'ed' methods every GCN layer runs on the same subgraph (norm_adj).
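A minimal sketch of this propagation step, assuming the normalized adjacencies have been converted to PyTorch sparse tensors (propagate and the mean readout here are illustrative, not the repository's exact function names):

import torch

def propagate(ego_embeddings, norm_adj, n_layers):
    # ego_embeddings: dense (n_nodes, dim) tensor of stacked user and item embeddings
    all_embeddings = [ego_embeddings]
    for k in range(n_layers):
        if isinstance(norm_adj, list):
            # 'rw': layer k propagates on its own subgraph norm_adj[k]
            ego_embeddings = torch.sparse.mm(norm_adj[k], ego_embeddings)
        else:
            # 'nd' / 'ed': every layer uses the same subgraph norm_adj
            ego_embeddings = torch.sparse.mm(norm_adj, ego_embeddings)
        all_embeddings.append(ego_embeddings)
    # LightGCN-style readout: average the representations from all layers
    return torch.stack(all_embeddings, dim=1).mean(dim=1)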


Origin blog.csdn.net/qq_42018521/article/details/131615731