Recommendation with the latent factor model (LFM)

LFM (latent factor model), also called the latent semantic model, is a fairly common model in recommender systems. It differs from ItemCF and UserCF as follows:

  1. UserCF first finds users whose interests are similar to the target user's, and then recommends items those similar users like.
  2. ItemCF finds items similar to the items the target user already likes, and recommends those.
  3. A third way is to classify all items and, based on the user's interest in each class, recommend items from the matching classes. LFM is a way to implement this approach.

Implementing this last approach requires answering the following questions:

  1. How to classify the items.
  2. How to determine which classes a user is interested in, and how strongly.
  3. Given the classes a user is interested in, which items from those classes to recommend.

For classification, manually labeling the items comes to mind first, but manual classification is highly subjective. A user might watch a movie because it is a comedy, because it stars Stephen Chow, or because it is a Journey to the West–themed film; different people will classify the same item differently.

The granularity of manual classification is also hard to control: how finely should an item be subdivided? A linear algebra textbook can be classified under mathematics, or under algebra, or even split further by the application domains of linear algebra; for non-specialists, producing classes this fine-grained is a thankless task.

Moreover, even if an item belongs to a class, how well does it fit that class compared with the other items in it? This weight is very hard to determine manually. To solve these problems we need the latent semantic model: it clusters items automatically from user behavior, and the number of classes, i.e. the granularity, is fully controllable.

Whether an item belongs to a class is determined entirely by user behavior: we assume that if two items are liked by many of the same users, there is a good chance they belong to the same class. The weight of an item within a class can likewise be computed from the data.
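As a rough illustration of that assumption (a toy sketch with made-up data, not from the original post), item co-occurrence across users can be counted with pandas; items frequently liked by the same users are candidates for the same latent class:

```python
import pandas as pd

# Toy ratings: each row is one (user, item) "like". Data is made up for illustration.
likes = pd.DataFrame({
    'UserID':  [1, 1, 2, 2, 3, 3, 3],
    'MovieID': ['A', 'B', 'A', 'B', 'A', 'B', 'C'],
})

# Self-join on UserID to count how often two items are liked by the same user.
pairs = likes.merge(likes, on='UserID')
pairs = pairs[pairs['MovieID_x'] < pairs['MovieID_y']]   # keep each unordered pair once
cooccur = pairs.groupby(['MovieID_x', 'MovieID_y']).size()

print(cooccur)  # (A, B) co-occurs for all three users, so A and B likely share a class
```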

The latent factor model computes user $u$'s interest in item $i$ with the following formula:

$$\text{Preference}(u, i) = r_{ui} = p_u^{T} q_i = \sum_{k=1}^{K} p_{u,k}\, q_{i,k}$$

where $p_{u,k}$ measures the relationship between user $u$'s interest and the $k$-th latent class, and $q_{i,k}$ measures the relationship between the $k$-th latent class and item $i$.
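The prediction is just an inner product; a minimal numpy sketch (with made-up factor values) is:

```python
import numpy as np

# Hypothetical factors for one user and one item over K = 3 latent classes.
p_u = np.array([0.5, 0.1, 0.4])   # user u's affinity for each latent class
q_i = np.array([0.8, 0.0, 0.2])   # item i's weight in each latent class

# Preference(u, i) = sum_k p_{u,k} * q_{i,k}
preference = float(np.dot(p_u, q_i))
print(preference)  # 0.5*0.8 + 0.1*0.0 + 0.4*0.2 = 0.48
```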

The remaining question is how to compute the two parameter sets $p$ and $q$; the method used here is gradient descent. The rough idea is to take a data set containing items each user likes and dislikes, and fit $p$ and $q$ to that data.

The ratings data contains no explicit negative samples, so for each user we sample items the user has shown no behavior toward as negatives, keeping the number of negative samples roughly equal to the number of positive samples for every user.
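A minimal sketch of that sampling rule with pandas (toy data; it mirrors the idea behind the `getUserNegativeItem` helper further below): prefer popular unrated items, since a popular item the user skipped is stronger evidence of dislike than an obscure one.

```python
import pandas as pd

# Toy ratings table (made-up data): user 1 has rated items A and B.
ratings = pd.DataFrame({
    'UserID':  [1, 1, 2, 2, 3, 3, 3],
    'MovieID': ['A', 'B', 'C', 'D', 'C', 'D', 'E'],
})

userID = 1
rated = set(ratings[ratings['UserID'] == userID]['MovieID'])      # positive samples
unrated = ratings[~ratings['MovieID'].isin(rated)]
popularity = unrated.groupby('MovieID').size().sort_values(ascending=False)

# Take as many negatives as positives, most popular first.
negatives = list(popularity.index[:len(rated)])
print(negatives)  # C and D (each rated twice) beat E (rated once)
```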

Setting $r = 1$ for positive samples and $r = 0$ for negative samples, we find the optimal parameters $p$ and $q$ by minimizing the following loss function:

$$C = \sum_{(u,i)} \left( r_{ui} - \sum_{k=1}^{K} p_{u,k}\, q_{i,k} \right)^{2} + \lambda \lVert p_u \rVert^{2} + \lambda \lVert q_i \rVert^{2}$$

The loss function contains two groups of parameters, $p_{u,k}$ and $q_{i,k}$. Taking partial derivatives with respect to each gives:

$$\frac{\partial C}{\partial p_{u,k}} = -2\,(r_{ui} - \hat{r}_{ui})\, q_{i,k} + 2\lambda\, p_{u,k}, \qquad \frac{\partial C}{\partial q_{i,k}} = -2\,(r_{ui} - \hat{r}_{ui})\, p_{u,k} + 2\lambda\, q_{i,k}$$

Then, following stochastic gradient descent, each parameter is stepped along the direction of steepest descent, which yields the recursion formulas (the constant factor 2 is absorbed into the learning rate $\alpha$):

$$p_{u,k} \leftarrow p_{u,k} + \alpha \left( (r_{ui} - \hat{r}_{ui})\, q_{i,k} - \lambda\, p_{u,k} \right), \qquad q_{i,k} \leftarrow q_{i,k} + \alpha \left( (r_{ui} - \hat{r}_{ui})\, p_{u,k} - \lambda\, q_{i,k} \right)$$

where α is the learning rate, which has to be chosen by trial and error.
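A single update step can be sanity-checked by hand; this sketch (plain numpy, made-up random factors) applies the two recursion formulas once and verifies that the prediction error shrinks:

```python
import numpy as np

def sigmoid(x):
    # Squash the raw score into [0, 1], as the full implementation below does.
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
K, alpha, lamda = 3, 0.02, 0.01
p_u = rng.random(K)            # user factors
q_i = rng.random(K)            # item factors
r_ui = 1.0                     # label of a positive sample

def err():
    return r_ui - sigmoid(np.dot(p_u, q_i))

e_before = err()
e = e_before
# One gradient-descent step over every latent class k.
for k in range(K):
    p_uk, q_ik = p_u[k], q_i[k]
    p_u[k] += alpha * (e * q_ik - lamda * p_uk)
    q_i[k] += alpha * (e * p_uk - lamda * q_ik)

assert abs(err()) < abs(e_before)   # the error moved toward zero
print(e_before, err())
```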

The λ terms are regularization terms that prevent overfitting. The Python code is given below.

from multiprocessing import Pool, Manager
from math import exp
import pandas as pd
import numpy as np
import pickle
import time


def getResource(csvPath):
    '''
    Load the raw ratings data.
    :param csvPath: path to the raw csv data
    :return: DataFrame
    '''
    frame = pd.read_csv(csvPath)
    return frame


def getUserNegativeItem(frame, userID):
    '''
    Sample the user's negative-feedback items: popular items the user has not
    rated, taking as many as there are positive-feedback items.
    :param frame: ratings data
    :param userID: user ID
    :return: list of negative-feedback items
    '''
    userItemlist = list(set(frame[frame['UserID'] == userID]['MovieID']))                        # items the user has rated
    otherItemList = [item for item in set(frame['MovieID'].values) if item not in userItemlist]  # items the user has not rated
    itemCount = [len(frame[frame['MovieID'] == item]['UserID']) for item in otherItemList]       # item popularity
    series = pd.Series(itemCount, index=otherItemList)
    series = series.sort_values(ascending=False)[:len(userItemlist)]                             # most popular unrated items
    negativeItemList = list(series.index)
    return negativeItemList


def getUserPositiveItem(frame, userID):
    '''
    Get the user's positive-feedback items: the items the user has rated.
    :param frame: ratings data
    :param userID: user ID
    :return: list of positive-feedback items
    '''
    series = frame[frame['UserID'] == userID]['MovieID']
    positiveItemList = list(series.values)
    return positiveItemList


def initUserItem(frame, userID=1):
    '''
    Build the user's positive/negative feedback labels: 1 for positive
    feedback, 0 for negative feedback.
    :param frame: ratings data
    :param userID: user ID
    :return: dict mapping item ID to its feedback label
    '''
    positiveItem = getUserPositiveItem(frame, userID)
    negativeItem = getUserNegativeItem(frame, userID)
    itemDict = {}
    for item in positiveItem: itemDict[item] = 1
    for item in negativeItem: itemDict[item] = 0
    return itemDict


def initPara(userID, itemID, classCount):
    '''
    Randomly initialize the parameter matrices p and q.
    :param userID: list of user IDs
    :param itemID: list of item IDs
    :param classCount: number of latent classes
    :return: parameters p, q
    '''
    arrayp = np.random.rand(len(userID), classCount)
    arrayq = np.random.rand(classCount, len(itemID))
    p = pd.DataFrame(arrayp, columns=range(0, classCount), index=userID)
    q = pd.DataFrame(arrayq, columns=itemID, index=range(0, classCount))
    return p, q


def work(id, queue):
    '''
    Worker for the multiprocessing pool; relies on the module-level frame.
    :param id: user ID
    :param queue: result queue
    '''
    print(id)
    itemDict = initUserItem(frame, userID=id)
    queue.put({id: itemDict})


def initUserItemPool(userID):
    '''
    Build the training samples for all target users in parallel.
    :param userID: list of target user IDs
    :return: list of per-user sample dicts
    '''
    pool = Pool()
    userItem = []
    queue = Manager().Queue()
    for id in userID: pool.apply_async(work, args=(id, queue))
    pool.close()
    pool.join()
    while not queue.empty(): userItem.append(queue.get())
    return userItem


def initModel(frame, classCount):
    '''
    Initialize the model: parameters p, q and the sample data.
    :param frame: source data
    :param classCount: number of latent classes
    :return: p, q, samples
    '''
    userID = list(set(frame['UserID'].values))
    itemID = list(set(frame['MovieID'].values))
    p, q = initPara(userID, itemID, classCount)
    userItem = initUserItemPool(userID)
    return p, q, userItem


def sigmod(x):
    '''
    Sigmoid function, mapping the interest score into the range [0, 1].
    :param x: raw interest score
    :return: squashed interest score
    '''
    y = 1.0 / (1 + exp(-x))
    return y


def lfmPredict(p, q, userID, itemID):
    '''
    Predict the target user's interest in the target item using parameters p and q.
    :param p: user-to-latent-class relationship
    :param q: latent-class-to-item relationship
    :param userID: target user
    :param itemID: target item
    :return: predicted interest
    '''
    p = np.mat(p.loc[userID].values)
    q = np.mat(q[itemID].values).T
    r = (p * q).sum()
    r = sigmod(r)
    return r


def latenFactorModel(frame, classCount, iterCount, alpha, lamda):
    '''
    Compute the parameters p and q of the latent factor model.
    :param frame: source data
    :param classCount: number of latent classes
    :param iterCount: number of iterations
    :param alpha: learning rate
    :param lamda: regularization parameter
    :return: parameters p, q
    '''
    p, q, userItem = initModel(frame, classCount)
    for step in range(0, iterCount):
        for user in userItem:
            for userID, samples in user.items():
                for itemID, rui in samples.items():
                    eui = rui - lfmPredict(p, q, userID, itemID)
                    for f in range(0, classCount):
                        print('step %d user %d class %d' % (step, userID, f))
                        p[f][userID] += alpha * (eui * q[itemID][f] - lamda * p[f][userID])
                        q[itemID][f] += alpha * (eui * p[f][userID] - lamda * q[itemID][f])
        alpha *= 0.9
    return p, q


def recommend(frame, userID, p, q, TopN=10):
    '''
    Recommend TopN items to the target user.
    :param frame: source data
    :param userID: target user
    :param p: user-to-latent-class relationship
    :param q: latent-class-to-item relationship
    :param TopN: number of recommendations
    :return: recommended items
    '''
    userItemlist = list(set(frame[frame['UserID'] == userID]['MovieID']))                        # items already rated
    otherItemList = [item for item in set(frame['MovieID'].values) if item not in userItemlist]  # candidate items
    predictList = [lfmPredict(p, q, userID, itemID) for itemID in otherItemList]
    series = pd.Series(predictList, index=otherItemList)
    series = series.sort_values(ascending=False)[:TopN]
    return series


if __name__ == '__main__':
    frame = getResource('ratings.csv')
    p, q = latenFactorModel(frame, 5, 10, 0.02, 0.01)
    l = recommend(frame, 1, p, q)
    print(l)

Origin www.cnblogs.com/cmybky/p/11776379.html