Personalized recall algorithms in practice (3): the PersonalRank algorithm

User behavior can be represented as a bipartite graph of users and items. To make personalized recommendations for a user \(u\), we need to compute how relevant every node is to \(u\). PersonalRank does this with a random walk that starts from the node corresponding to \(u\). On reaching any node, the walk stops with probability \(1-d\) and restarts from \(u\), or continues with probability \(d\), in which case it picks one of the nodes the current node points to, uniformly at random, and moves there. After many rounds of walking, the probability of visiting each node converges to a stable value, and these probabilities can be used to rank the nodes.
Before running the algorithm, we need to initialize the visit probability of every node. If we are recommending for user \(u\), set the initial visit probability of the node corresponding to \(u\) to 1 and that of every other node to 0, then apply the following iterative formula.
\[ PR(i) = (1-d) r_i + d \sum_{j \in in(i)} \frac{PR(j)}{|out(j)|}, \qquad r_i = \begin{cases} 1 & i = u \\ 0 & i \neq u \end{cases} \]
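
The update rule can be tried by hand on a very small graph. The following is a minimal sketch, not the article's implementation: the graph, the node names and the helper `personal_rank_step` are made up for illustration, and every node is assumed to have at least one out-edge.

```python
# Hypothetical toy bipartite graph: each node maps to its out-neighbours.
G = {
    "user_A": {"item_a": 1, "item_c": 1},
    "user_B": {"item_a": 1, "item_b": 1},
    "item_a": {"user_A": 1, "user_B": 1},
    "item_b": {"user_B": 1},
    "item_c": {"user_A": 1},
}


def personal_rank_step(G, rank, root, d):
    """Apply the update formula once and return the new scores."""
    new_rank = {node: 0.0 for node in G}
    for j, out_j in G.items():
        # Node j passes d * PR(j) / |out(j)| to each of its out-neighbours.
        share = d * rank[j] / len(out_j)
        for i in out_j:
            new_rank[i] += share
    # Restart term (1 - d) * r_i, which is non-zero only for the root node.
    new_rank[root] += 1 - d
    return new_rank


d = 0.8
root = "user_A"
rank = {node: 0.0 for node in G}
rank[root] = 1.0  # initial visit probability: 1 for the root, 0 elsewhere
for _ in range(100):
    rank = personal_rank_step(G, rank, root, d)

for node, score in sorted(rank.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{node}: {score:.4f}")
```

Because every node here has out-edges, the scores keep summing to 1 from one iteration to the next, which is a handy sanity check.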

There are generally two ways to implement the algorithm: a non-matrix (iterative) implementation and a matrix implementation.

Non-matrix implementation

First, build a bipartite graph from the userID and itemID columns. In the code, self.G represents the whole directed graph, and different prefixes ('user_' and 'item_') are added to userID and itemID to tell the two kinds of node apart. Each user-item interaction is stored in both directions, from user to item and from item to user. The visit probability is then propagated along the edges of this graph.

Here G = dict(item_user, **user_item) merges the two dicts into a single dict.
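
As a small illustration of that merge idiom (the two dicts below are made-up sample data): `dict(a, **b)` only works when the keys of `b` are strings, which holds here because every node name carries a string prefix, and on a key collision the value from `b` wins.

```python
# Made-up sample data with the same shape as item_user / user_item.
item_user = {"item_a": {"user_A": 1}}
user_item = {"user_A": {"item_a": 1}}

G = dict(item_user, **user_item)     # the form used in the article's code
G_alt = {**item_user, **user_item}   # equivalent spelling in Python 3.5+

print(G)
print(G == G_alt)
```

The 'user_' and 'item_' prefixes also guarantee that a user and an item with the same numeric id do not overwrite each other in the merged dict.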

import pandas as pd
import time

class PersonalRank:
    def __init__(self,X,Y):
        X,Y = ['user_'+str(x) for x in X],['item_'+str(y) for y in Y]
        self.G = self.get_graph(X,Y)

    def get_graph(self,X,Y):
        """
        Args:
            X: user id
            Y: item id
        Returns:
            graph:dic['user_id1':{'item_id1':1},  ... ]
        """
        item_user = dict()
        for i in range(len(X)):
            user = X[i]
            item = Y[i]
            if item not in item_user:
                item_user[item] = {}
            item_user[item][user]=1

        user_item = dict()
        for i in range(len(Y)):
            user = X[i]
            item = Y[i]
            if user not in user_item:
                user_item[user] = {}
            user_item[user][item]=1
        G = dict(item_user,**user_item)
        return G


    def recommend(self, alpha, userID, max_depth,K=10):
        # rank = dict()
        userID = 'user_' + str(userID)
        rank = {x: 0 for x in self.G.keys()}
        rank[userID] = 1
        # start iterating
        begin = time.time()
        for k in range(max_depth):
            tmp = {x: 0 for x in self.G.keys()}
            # take node i and ri, the set of tail nodes of its out-edges
            for i, ri in self.G.items():
                # for each out-edge E(i, j) with weight wij: all weights are 1, so the normalized weight is 1/len(ri)
                for j, wij in ri.items():
                    tmp[j] += alpha * rank[i] / (1.0 * len(ri))
            tmp[userID] += (1 - alpha)
            rank = tmp
        end = time.time()
        print('use_time', end - begin)
        lst = sorted(rank.items(), key=lambda x: x[1], reverse=True)[:K]
        for ele in lst:
            print("%s:%.3f, \t" % (ele[0], ele[1]))

if __name__ == '__main__':
    moviesPath = '../data/ml-1m/movies.dat'
    ratingsPath = '../data/ml-1m/ratings.dat'
    usersPath = '../data/ml-1m/users.dat'

    # usersDF = pd.read_csv(usersPath,index_col=None,sep='::',header=None,names=['user_id', 'gender', 'age', 'occupation', 'zip'])
    # moviesDF = pd.read_csv(moviesPath,index_col=None,sep='::',header=None,names=['movie_id', 'title', 'genres'])
    # the '::' separator is longer than one character, so use the Python parsing engine
    ratingsDF = pd.read_csv(ratingsPath, index_col=None, sep='::', header=None,names=['user_id', 'movie_id', 'rating', 'timestamp'], engine='python')
    X=ratingsDF['user_id'][:1000]
    Y=ratingsDF['movie_id'][:1000]
    PersonalRank(X,Y).recommend(alpha=0.8,userID=1,max_depth=50,K=30)  # print the 30 highest-ranked nodes for user 1
    # print('PersonalRank result',rank)
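
Note that recommend ranks every node in the graph, so its output can also contain user nodes and items the user has already interacted with. The matrix version in the next section skips items the user has already seen. A minimal sketch of such a filter for a rank dict follows; the helper name `top_k_unseen_items` and the sample scores are hypothetical and not part of the original code.

```python
def top_k_unseen_items(rank, G, user, K=10):
    """Return the K best-scoring item nodes the user has not interacted with."""
    seen = G.get(user, {})
    candidates = [
        (node, score)
        for node, score in rank.items()
        if node.startswith("item_") and node not in seen
    ]
    return sorted(candidates, key=lambda kv: kv[1], reverse=True)[:K]


# Made-up graph fragment and scores, only to show the call.
G = {"user_1": {"item_10": 1}}
rank = {"user_1": 0.5, "item_10": 0.3, "item_20": 0.15, "item_30": 0.05}
print(top_k_unseen_items(rank, G, "user_1", K=2))
```

It relies on the 'user_' / 'item_' prefixes that the class adds to the node names.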

Matrix implementation

\[ r = (1-\alpha) r_0 + \alpha M^T r \]
Here \(\alpha\) plays the role of \(d\) in the iterative formula. \(r\) is an \((m+n) \times 1\) matrix in which each row holds the PR value of one vertex relative to the fixed start vertex. \(r_0\) is also an \((m+n) \times 1\) matrix; it selects the fixed start vertex, so the row of that vertex is 1 and all other rows are 0. \(M\) is the \((m+n) \times (m+n)\) transition matrix with \(M_{ij} = \frac{1}{|out(i)|}\) if \(j \in out(i)\) and \(0\) otherwise, that is, the reciprocal of the out-degree of vertex \(i\), or 0 if there is no edge from \(i\) to \(j\). The formula can be rearranged to:
\[ r = (E - \alpha M^T)^{-1} (1-\alpha) r_0 \]
where \(E\) is the identity matrix. \((E - \alpha M^T)^{-1}\) can be seen as the recommendation result for all vertices at once: each column holds the PR values of all vertices relative to one start vertex.
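
To see that the closed form and the iteration agree, both can be evaluated on a small dense example. This is only a sketch under assumptions: the graph and node names are hypothetical, and it uses a dense NumPy solve rather than the sparse gmres solver used in the code that follows.

```python
import numpy as np

# Hypothetical toy bipartite graph: each node maps to its out-neighbours.
G = {
    "user_A": {"item_a": 1, "item_c": 1},
    "user_B": {"item_a": 1, "item_b": 1},
    "item_a": {"user_A": 1, "user_B": 1},
    "item_b": {"user_B": 1},
    "item_c": {"user_A": 1},
}
nodes = list(G)
idx = {node: k for k, node in enumerate(nodes)}
n = len(nodes)

# Transition matrix: M[i, j] = 1 / |out(i)| if j is an out-neighbour of i, else 0.
M = np.zeros((n, n))
for i, outs in G.items():
    for j in outs:
        M[idx[i], idx[j]] = 1.0 / len(outs)

alpha = 0.8
r0 = np.zeros(n)
r0[idx["user_A"]] = 1.0  # start vertex

# Closed form: solve (E - alpha * M^T) r = (1 - alpha) * r0 without forming the inverse.
r_closed = np.linalg.solve(np.eye(n) - alpha * M.T, (1 - alpha) * r0)

# Iterative form r <- (1 - alpha) * r0 + alpha * M^T r, for comparison.
r_iter = r0.copy()
for _ in range(200):
    r_iter = (1 - alpha) * r0 + alpha * (M.T @ r_iter)

print(dict(zip(nodes, np.round(r_closed, 4))))
print(np.allclose(r_closed, r_iter))
```

Solving the linear system for one \(r_0\) gives one column of the result; computing the full inverse is only worthwhile when recommendations for many start vertices are needed at once.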

#-*-coding:utf-8-*-
"""
author:jamest
date:20190310
PersonalRank function with Matrix
"""
import pandas as pd
import numpy as np
import time
import operator
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import gmres


class PersonalRank:
    def __init__(self,X,Y):
        X,Y = ['user_'+str(x) for x in X],['item_'+str(y) for y in Y]
        self.G = self.get_graph(X,Y)

    def get_graph(self,X,Y):
        """
        Args:
            X: user id
            Y: item id
        Returns:
            graph:dic['user_id1':{'item_id1':1},  ... ]
        """
        item_user = dict()
        for i in range(len(X)):
            user = X[i]
            item = Y[i]
            if item not in item_user:
                item_user[item] = {}
            item_user[item][user]=1

        user_item = dict()
        for i in range(len(Y)):
            user = X[i]
            item = Y[i]
            if user not in user_item:
                user_item[user] = {}
            user_item[user][item]=1
        G = dict(item_user,**user_item)
        return G


    def graph_to_m(self):
        """
        Returns:
            a coo_matrix sparse mat M
            a list,total user item points
            a dict,map all the point to row index
        """

        graph = self.G
        vertex = list(graph.keys())
        address_dict = {}
        total_len = len(vertex)
        for index in range(len(vertex)):
            address_dict[vertex[index]] = index
        row = []
        col = []
        data = []
        for element_i in graph:
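            # uniform transition probability 1/|out(i)|; kept unrounded so that
            # nodes with many neighbours do not end up with a weight of 0
            weight = 1.0/len(graph[element_i])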
            row_index=  address_dict[element_i]
            for element_j in graph[element_i]:
                col_index = address_dict[element_j]
                row.append(row_index)
                col.append(col_index)
                data.append(weight)
        row = np.array(row)
        col = np.array(col)
        data = np.array(data)
        m = coo_matrix((data,(row,col)),shape=(total_len,total_len))
        return m,vertex,address_dict


    def mat_all_point(self,m_mat,vertex,alpha):
        """
        get E-alpha*m_mat.T
        Args:
            m_mat
            vertex:total item and user points
            alpha:the prob for random walking
        Returns:
            a sparse
        """
        total_len = len(vertex)
        row = []
        col = []
        data = []
        for index in range(total_len):
            row.append(index)
            col.append(index)
            data.append(1)
        row = np.array(row)
        col = np.array(col)
        data = np.array(data)
        eye_t = coo_matrix((data,(row,col)),shape=(total_len,total_len))
        return eye_t.tocsr()-alpha*m_mat.tocsr().transpose()

    def recommend_use_matrix(self, alpha, userID, K=10,use_matrix=True):
        """
        Args:
            alpha:the prob for random walking
            userID:the user to recom
            K:recom item num
        Returns:
            a dic,key:itemid ,value:pr score
        """
        m, vertex, address_dict = self.graph_to_m()
        userID = 'user_' + str(userID)
        print('add',address_dict)
        if userID not in address_dict:
            return []
        score_dict = {}
        recom_dict = {}
        mat_all = self.mat_all_point(m,vertex,alpha)
        index = address_dict[userID]
        initial_list = [[0] for row in range(len(vertex))]
        initial_list[index] = [1]
        r_zero = np.array(initial_list)
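        # Solve (E - alpha*M^T) r = r_0. The constant factor (1 - alpha) is dropped
        # because it does not change the ranking. Note: recent SciPy releases
        # renamed the `tol` argument of gmres to `rtol`.
        res = gmres(mat_all,r_zero,tol=1e-8)[0]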
        for index in range(len(res)):
            point = vertex[index]
            # keep item nodes only (user nodes carry the 'user_' prefix)
            if not point.startswith('item_'):
                continue
            if point in self.G[userID]:
                continue
            score_dict[point] = round(res[index],3)
        for zuhe in sorted(score_dict.items(),key=operator.itemgetter(1),reverse=True)[:K]:
            point,score = zuhe[0],zuhe[1]
            recom_dict[point] = score
        return recom_dict




if __name__ == '__main__':
    moviesPath = '../data/ml-1m/movies.dat'
    ratingsPath = '../data/ml-1m/ratings.dat'
    usersPath = '../data/ml-1m/users.dat'

    # usersDF = pd.read_csv(usersPath,index_col=None,sep='::',header=None,names=['user_id', 'gender', 'age', 'occupation', 'zip'])
    # moviesDF = pd.read_csv(moviesPath,index_col=None,sep='::',header=None,names=['movie_id', 'title', 'genres'])
    ratingsDF = pd.read_csv(ratingsPath, index_col=None, sep='::', header=None,names=['user_id', 'movie_id', 'rating', 'timestamp'], engine='python')  # multi-character separator needs the Python engine
    X=ratingsDF['user_id'][:1000]
    Y=ratingsDF['movie_id'][:1000]
    rank = PersonalRank(X,Y).recommend_use_matrix(alpha=0.8,userID=1,K=30)
    print('PersonalRank result',rank)



Origin www.cnblogs.com/hellojamest/p/11763033.html