Recommendation System Notes (13): Code Implementation of the SGL Algorithm

Foreword

        This implementation builds on the LightGCN code. For the LightGCN code and the principle behind it, please refer to my earlier post: Recommendation System Notes (6): LightGCN Code Implementation (Gan Linna's blog, CSDN).

       Traditional graph-neural-network recommenders such as LightGCN have the following limitations:

        (1) High-degree nodes have a greater impact on representation learning, so low-degree (long-tail) nodes are recommended less accurately.

        (2) Representations are susceptible to noisy interactions, because the neighborhood aggregation scheme further amplifies the influence of observed edges.

        (3) Most recommendation models are trained under the supervised learning paradigm, where the supervision signal is the observed user-item interaction data. These interactions are usually extremely sparse, which is not enough to learn high-quality representations.

       SGL therefore applies self-supervised learning (SSL) on the user-item bipartite graph as an auxiliary task for training the recommendation model, and uses self-discrimination to learn more robust node representations.

Code explanation

Loss function

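        The loss function in SGL consists of three parts, BPR + InfoNCE + Reg, namely:

        $$\mathcal{L} = \mathcal{L}_{BPR} + \lambda_1 \mathcal{L}_{InfoNCE} + \lambda_2 \lVert \Theta \rVert_2^2$$

        In the code below, \lambda_1 is hard-coded as 0.005 and \lambda_2 is self.lamda. The loss function is therefore implemented as follows: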

    def BPR_InfoNCE_Reg_loss(self, S, emb, emb1, emb2, init_emb):
        S = np.array(S).astype('int') # [64,3]
        # print("S:{}".format(S.shape))
        all_user_emb, all_item_emb = torch.split(emb, [self.n_users, self.n_items])
        all_user_emb0, all_item_emb0 = torch.split(init_emb, [self.n_users, self.n_items])
        sub_all_user_emb1, sub_all_item_emb1 = torch.split(emb1, [self.n_users, self.n_items])
        sub_all_user_emb2, sub_all_item_emb2 = torch.split(emb2, [self.n_users, self.n_items])
        # print("all_user_emd:{}".format(all_user_emb.shape))   # [610,64]
        # print("all_item_emd:{}".format(all_item_emb.shape))   # [9724,64]

        pos_emb = all_item_emb[S[:, 1]]     # [64,64]
        neg_emb = all_item_emb[S[:, 2]]     # [64,64]
        user_emb = all_user_emb[S[:, 0]]    # [64,64]
        # print("pos norm",torch.norm(pos_emb).item())
        # print("neg norm",torch.norm(neg_emb).item())
        # print("user norm",torch.norm(user_emb).item())
        # print(pos_emb.shape,neg_emb.shape,user_emb.shape)
        pos_emb0 = all_item_emb0[S[:, 1]]
        neg_emb0 = all_item_emb0[S[:, 2]]
        user_emb0 = all_user_emb0[S[:, 0]]

        user_embeddings1 = F.normalize(sub_all_user_emb1, dim=1)     # [610,64]
        item_embeddings1 = F.normalize(sub_all_item_emb1, dim=1)     # [9724,64]
        user_embeddings2 = F.normalize(sub_all_user_emb2, dim=1)
        item_embeddings2 = F.normalize(sub_all_item_emb2, dim=1)
        # print("user_embedding:{}".format(user_embeddings1.shape))
        # print("item_embedding:{}".format(item_embeddings1.shape))
        user_embs1 = user_embeddings1[S[:, 0]]      # [64,64]
        item_embs1 = item_embeddings1[S[:, 1]]      # [64,64]
        user_embs2 = user_embeddings2[S[:, 0]]      # [64,64]
        item_embs2 = item_embeddings2[S[:, 1]]      # [64,64]
        # print(user_embs1.shape,item_embs1.shape)

        similar_users = torch.sum(user_embs1*user_embs2,dim=-1)
        similar_items = torch.sum(item_embs1*item_embs2,dim=-1)
        total_user = torch.matmul(user_embs1,
                                        torch.transpose(user_embeddings2, 0, 1))
        total_item = torch.matmul(item_embs1,
                                        torch.transpose(item_embeddings2, 0, 1))


        # BPR Loss: softplus(neg - pos) is equivalent to -log(sigmoid(pos - neg))
        loss = (F.softplus(torch.sum(user_emb * neg_emb, dim=1) - torch.sum(user_emb * pos_emb, dim=1))).sum()
        # print("begin:{}".format(loss))
        # InfoNCE Loss: logsumexp((all - pos) / t) equals -log(exp(pos / t) / sum(exp(all / t)))
        SGL_logits_user = total_user - similar_users[:, None]
        SGL_logits_item = total_item - similar_items[:, None]
        InfoLoss_user = torch.logsumexp(SGL_logits_user / self.temperature, dim=1)
        InfoLoss_item = torch.logsumexp(SGL_logits_item / self.temperature, dim=1)
        # 0.005 is the weight of the self-supervised term
        InfoNCE_loss = 0.005 * torch.sum(InfoLoss_user + InfoLoss_item)
        loss += InfoNCE_loss
        # print(InfoNCE_loss)
        # Reg Loss
        loss += self.lamda * (
                    torch.norm(pos_emb0) ** 2 + torch.norm(neg_emb0) ** 2 + torch.norm(user_emb0) ** 2) / float(
            len(pos_emb))
        # print(loss)
        return loss

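        Here emb is the encoding produced by the forward pass on the original graph, with no nodes or edges dropped, while emb1 and emb2 are the encodings produced by the forward pass on two randomly dropped subgraphs. init_emb holds the initial (layer-0) embeddings, which are used only in the regularization term. S contains the (user, positive item, negative item) indices of one batch and is used to pick out the corresponding rows for training.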
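        To make the two non-obvious terms easier to check, here is a minimal sketch, separate from the author's class. The helper name info_nce and the toy tensors are my own assumptions. It shows that the logsumexp form used above matches a softmax cross-entropy whose target class is the node's own index, and that the softplus form of BPR matches -log(sigmoid(pos - neg)).

```python
import torch
import torch.nn.functional as F


def info_nce(view1, view2, batch_idx, temperature):
    """Per-sample InfoNCE for one node type (hypothetical helper).

    The positive pair is the same node in both views; the negatives are
    all other nodes of view2.
    """
    z1 = F.normalize(view1, dim=1)
    z2 = F.normalize(view2, dim=1)
    anchors = z1[batch_idx]                           # [B, d]
    pos = torch.sum(anchors * z2[batch_idx], dim=-1)  # [B]
    total = anchors @ z2.t()                          # [B, N]
    # logsumexp(total / t) - pos / t, written as in the loss function above
    return torch.logsumexp((total - pos[:, None]) / temperature, dim=1)


torch.manual_seed(0)
emb1 = torch.randn(10, 4)   # toy encoding from the first subgraph
emb2 = torch.randn(10, 4)   # toy encoding from the second subgraph
batch_idx = torch.tensor([0, 3, 7])
temperature = 0.2

loss_a = info_nce(emb1, emb2, batch_idx, temperature)

# The same quantity through cross-entropy, using each anchor's own index as the label
logits = F.normalize(emb1, dim=1)[batch_idx] @ F.normalize(emb2, dim=1).t() / temperature
loss_b = F.cross_entropy(logits, batch_idx, reduction="none")

# BPR term: softplus(neg - pos) versus -log(sigmoid(pos - neg))
pos_score = torch.tensor([1.5, -0.2])
neg_score = torch.tensor([0.3, 0.4])
bpr_a = F.softplus(neg_score - pos_score)
bpr_b = -F.logsigmoid(pos_score - neg_score)
```

        In this sketch loss_a and loss_b agree up to floating-point error, and so do bpr_a and bpr_b.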

 The paper proposes three data augmentation operators:

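        Node Dropout (ND): each node in the graph is discarded with probability ρ, together with its connected edges. Specifically, s1 and s2 are modeled as follows:

        $$s_1(\mathcal{G}) = (\mathbf{M}' \odot \mathcal{V}, \mathcal{E}), \quad s_2(\mathcal{G}) = (\mathbf{M}'' \odot \mathcal{V}, \mathcal{E})$$

        Here \mathcal{V} and \mathcal{E} are the node and edge sets of the graph \mathcal{G}. Because contrastive learning compares two views, two different subgraphs have to be constructed; M' and M'' are the two independently sampled masks that drop the data, one per subgraph.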

        Edge Dropout (ED): each edge in the graph is dropped with probability ρ. The two subgraphs are again produced by two independent masks, applied to the edge set instead of the node set.

         Random Walk (RW): the subgraphs generated by the two operators above are shared across all graph convolution layers. RW instead samples different masks M' and M'' for each layer, that is, the dropout is not shared between layers.

        All three operators involve only dropout and masking, and add no model parameters.

ND: drop nodes together with their incident edges

    def create_ND(self,ratio,mat1):
        mat = mat1
        drop_user_idx = self.random_choice(self.n_users, int(self.n_users * ratio))
        drop_item_idx = self.random_choice(self.n_items, int(self.n_items * ratio))
        indicator_user = np.ones(self.n_users)
        indicator_item = np.ones(self.n_items)
        indicator_user[drop_user_idx] = 0
        indicator_item[drop_item_idx] = 0
        mask = diags(np.hstack((indicator_user, indicator_item)))
        mat=(mask @ mat @ mask)
        # print(mat)

        d_mat = mat.sum(axis=1)

        d_mat = np.sqrt(d_mat)
        d_mat = np.array(d_mat)
        d_mat = 1 / (d_mat.reshape(-1))
        d_mat[np.isinf(d_mat)]=0.
        d_mat = diags(d_mat)
        d_mat = d_mat.tocoo()
        final = (d_mat @ mat @ d_mat).tocoo()
        # print(final.shape)

        rows = torch.tensor(final.row)
        cols = torch.tensor(final.col)
        index = torch.cat([rows.reshape(1, -1), cols.reshape(1, -1)], dim=0)

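        # Pass the size explicitly: if the highest-indexed nodes are dropped, the inferred shape would be too small
        return torch.sparse_coo_tensor(index, torch.tensor(final.data), size=final.shape).to(self.device)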

        The function takes the drop ratio and the adjacency matrix, whose shape is (M+N, M+N), where M is the number of users and N is the number of items. A diagonal mask matrix of the same shape is constructed directly: the diagonal entries corresponding to the users and items to be dropped are set to 0, and all other diagonal entries are 1. Multiplying the adjacency matrix by this mask on both the left and the right zeroes the rows and columns of the dropped nodes, which removes those nodes together with their edges. The masked matrix is then symmetrically normalized by the node degrees.
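        The masking step can be checked by hand on a toy graph. The sketch below is only an illustration under my own assumptions (2 users, 3 items, four made-up interactions, and a fixed choice of dropped nodes instead of a random one); it is not part of the SGL class.

```python
import numpy as np
from scipy.sparse import coo_matrix, diags

n_users, n_items = 2, 3
# Toy interactions (user, item): u0-i0, u0-i1, u1-i1, u1-i2
users = np.array([0, 0, 1, 1])
items = np.array([0, 1, 1, 2])

# Symmetric (M+N, M+N) adjacency: users come first, items are offset by n_users
n = n_users + n_items
rows = np.concatenate([users, n_users + items])
cols = np.concatenate([n_users + items, users])
adj = coo_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n)).tocsr()

# Drop user 1 and item 0: their diagonal entries are 0, all others are 1
indicator = np.ones(n)
indicator[[1, n_users + 0]] = 0
mask = diags(indicator)

# Left multiplication zeroes the rows, right multiplication zeroes the columns
masked = mask @ adj @ mask

# Symmetric normalization D^{-1/2} A D^{-1/2}; isolated nodes keep a zero scale
deg = np.asarray(masked.sum(axis=1)).ravel()
d_inv_sqrt = np.zeros_like(deg)
d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
normalized = diags(d_inv_sqrt) @ masked @ diags(d_inv_sqrt)
```

        Only the edge u0-i1 survives here, because u0-i0 loses its item and both edges of u1 lose their user.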

ED and RW: randomly remove edges

    def create_ED_RW(self, count, ratio):
        row_arr = np.zeros(2 * count, dtype=np.int32)
        col_arr = np.zeros(2 * count, dtype=np.int32)
        data = np.ones(2 * count, dtype=np.int32)
        # Randomly choose the indices of the edges to drop
        drop_data_index = self.random_choice(count, int(count * ratio))
        count = 0
        for key in self.train.keys():
            for value in self.train[key]:
                if count in drop_data_index:
                    data[count]=0
                row_arr[count] = int(key)
                col_arr[count] = self.n_users + int(value)
                count += 1
        count1=count
        for key in self.train.keys():
            for value in self.train[key]:
                if count-count1 in drop_data_index:
                    data[count]=0
                row_arr[count] = self.n_users + int(value)
                col_arr[count] = int(key)
                count += 1
        # print("row:{} colum:{}".format(len(row_arr),len(col_arr)))
        mat = coo_matrix((data, (row_arr, col_arr)),shape=(self.n_users+self.n_items,self.n_users+self.n_items))
        d_mat = mat.sum(axis=1)
        d_mat = np.sqrt(d_mat)
        d_mat = np.array(d_mat)
        d_mat = 1 / (d_mat.reshape(-1))
        d_mat[np.isinf(d_mat)] = 0.
        d_mat = diags(d_mat)
        d_mat = d_mat.tocoo()
        final = (d_mat @ mat @ d_mat).tocoo()
        # print(final.shape)

        rows = torch.tensor(final.row)
        cols = torch.tensor(final.col)
        # print(rows.reshape(1, -1).shape,cols.reshape(1, -1).shape)
        index = torch.cat([rows.reshape(1, -1), cols.reshape(1, -1)], dim=0)
        # Pass the size explicitly so the shape does not depend on which entries remain
        return torch.sparse_coo_tensor(index, torch.tensor(final.data), size=final.shape).to(self.device)

        Edges are dropped by setting the corresponding entries of the coo_matrix data to zero. Note that RW uses a different subgraph for each layer, so create_ED_RW has to be called once per layer and the resulting matrices collected in a list.
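        As a rough sketch of the difference between ED and RW, the standalone function below samples an edge mask, builds the normalized adjacency matrix of the remaining graph, and is called either once (ED) or once per layer (RW). The function name edge_dropout_adj, its arguments, and the toy interactions are my own assumptions and do not come from the SGL class; it also builds the matrix from arrays instead of looping over self.train.

```python
import numpy as np
from scipy.sparse import coo_matrix, diags


def edge_dropout_adj(users, items, n_users, n_items, ratio, rng):
    """Drop a `ratio` fraction of interactions and return the normalized
    symmetric adjacency matrix of the remaining bipartite graph."""
    n_edges = len(users)
    drop = rng.choice(n_edges, size=int(n_edges * ratio), replace=False)
    keep = np.ones(n_edges, dtype=bool)
    keep[drop] = False
    u = users[keep]
    i = items[keep] + n_users
    n = n_users + n_items
    # Every kept interaction contributes both (u, i) and (i, u)
    adj = coo_matrix(
        (np.ones(2 * len(u)), (np.concatenate([u, i]), np.concatenate([i, u]))),
        shape=(n, n),
    ).tocsr()
    deg = np.asarray(adj.sum(axis=1)).ravel()
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    return (diags(d_inv_sqrt) @ adj @ diags(d_inv_sqrt)).tocoo()


rng = np.random.default_rng(0)
users = np.array([0, 0, 1, 1, 2, 2, 2, 3])   # toy interactions, 8 unique edges
items = np.array([0, 1, 1, 2, 0, 2, 3, 3])
stages, ratio = 3, 0.25

# ED: one subgraph shared by every layer
ed_mat = edge_dropout_adj(users, items, 4, 4, ratio, rng)
# RW: an independently sampled subgraph for each layer
rw_mats = [edge_dropout_adj(users, items, 4, 4, ratio, rng) for _ in range(stages)]
```

        Sampling the mask once on the interaction list and mirroring it keeps (u, i) and (i, u) in sync, which is what the two loops in create_ED_RW achieve with the count - count1 offset.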

        Correspondingly, the forward step propagates through the layers in a loop, indexing the list of matrices by layer when one is given, and the per-layer results are then averaged:

        # print(emb.shape)
        for i in range(stages):
            if isinstance(mat, list):
                # RW: a separate adjacency matrix for each layer
                emb = torch.sparse.mm(mat[i], emb)
                emb_list.append(emb)
            else:
                # ND / ED: the same adjacency matrix for every layer
                emb = torch.sparse.mm(mat, emb)
                emb_list.append(emb)
            # print("emb norm", torch.norm(emb).item())
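        The fragment above does not show how emb_list is initialized or averaged, so here is a self-contained sketch of the whole step. It assumes, as in LightGCN, that the layer-0 embedding is included in the average; the names propagate and scaled_identity are hypothetical, and scaled identity matrices stand in for real adjacency matrices so the result is easy to verify.

```python
import torch


def propagate(mat, emb0, stages):
    """LightGCN-style propagation (sketch). `mat` is either one sparse
    adjacency matrix (ND / ED) or a list with one matrix per layer (RW)."""
    emb = emb0
    emb_list = [emb0]  # assumption: layer 0 takes part in the average
    for i in range(stages):
        layer_mat = mat[i] if isinstance(mat, list) else mat
        emb = torch.sparse.mm(layer_mat, emb)
        emb_list.append(emb)
    # Average over the layer-0 embedding and every propagated layer
    return torch.stack(emb_list, dim=0).mean(dim=0)


def scaled_identity(n, scale):
    """Sparse n x n matrix equal to scale * I, used only as a stand-in."""
    idx = torch.arange(n).repeat(2, 1)
    return torch.sparse_coo_tensor(idx, torch.full((n,), float(scale)), size=(n, n))


torch.manual_seed(0)
emb0 = torch.randn(5, 4)

# Shared matrix 2I: the layers are emb0, 2*emb0, 4*emb0, 8*emb0, so the mean is 3.75*emb0
out_shared = propagate(scaled_identity(5, 2), emb0, stages=3)

# Per-layer matrices I, 2I, 3I: the layers are emb0, emb0, 2*emb0, 6*emb0, so the mean is 2.5*emb0
out_per_layer = propagate([scaled_identity(5, s) for s in (1, 2, 3)], emb0, stages=3)
```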

         The other functions, such as evaluation, remain the same or change very little, so I won’t explain them here.

Run the file: train.py

from SGL import SGL
import torch

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print('device:',device)
model = SGL('Movielens/ml-latest-small/train.txt', lr = 1e-3,device = device,stages=3,augmentation='ED',ratio=0.1)
model.to(device)
model.load_test_data('Movielens/ml-latest-small/test.txt')
# model.load_test_data('yelp2018/test.txt')
# model.load_test_data('gowalla/test.txt')
model.train_model(stages=3,logger_path='ED_result.pkl')
model.evaluate(20)

model1 = SGL('Movielens/ml-latest-small/train.txt', lr = 1e-3,device = device,stages=3,augmentation='ND',ratio=0.1)
model1.to(device)
model1.load_test_data('Movielens/ml-latest-small/test.txt')
model1.train_model(stages=3,logger_path='ND_result.pkl')
model1.evaluate(20)

model2 = SGL('Movielens/ml-latest-small/train.txt', lr = 1e-3,device = device,stages=3,augmentation='RW',ratio=0.1)
model2.to(device)
model2.load_test_data('Movielens/ml-latest-small/test.txt')
model2.train_model(stages=3,logger_path='RW_result.pkl')
model2.evaluate(20)

        The three random data augmentation methods are tested in turn, and the results of each run are saved to the corresponding pkl file for later visualization.

Running result display:

 

Origin blog.csdn.net/qq_46006468/article/details/126147815