[Recommendation system] Basic algorithms of recommendation systems: recommendation methods based on matrix factorization and the latent semantic model

Table of contents

Matrix factorization

Singular Value Decomposition (SVD)

Application of SVD in recommendation systems

Load the user rating matrix and perform matrix decomposition

Matrix dimensionality reduction

Calculate similarity

Make predictions for items that have not been rated by users 

 Process summary:

latent semantic model


Matrix factorization

First we need to know the meaning of eigenvalues and eigenvectors. The basic definition is as follows:

Ax=λx

Here A is an n×n matrix and x is an n-dimensional vector; λ is an eigenvalue of matrix A, and x is the eigenvector corresponding to that eigenvalue.

The geometric meaning of the eigenvector is: the eigenvector x is only scaled through the square matrix A transformation, but the direction does not change.

If we can find the n eigenvalues of matrix A, we can form the diagonal matrix Σ, which expands into the following form:

\Sigma = \begin{bmatrix}\lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_n\end{bmatrix}

Then the matrix A can be expressed by the eigendecomposition of the following formula:

A = U\Sigma U^{-1}

Here U is the n×n matrix whose columns are these n eigenvectors, and Σ is the n×n diagonal matrix with the n eigenvalues on its main diagonal.

Generally, we normalize the n eigenvectors of U so that they are orthonormal, i.e. U^{-1} = U^{T}. The eigendecomposition of matrix A can then be written as:

A = U\Sigma U^{T}
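As a quick sanity check, the following snippet (not from the original post) verifies Ax = λx and A = UΣU^T for a small hypothetical symmetric matrix using NumPy:

import numpy as np

# A small hypothetical symmetric matrix, used only for illustration
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# eigh returns the eigenvalues and orthonormal eigenvectors of a symmetric matrix
eigvals, U = np.linalg.eigh(A)
Sigma = np.diag(eigvals)

# Verify Ax = λx for the first eigenpair
x, lam = U[:, 0], eigvals[0]
print(np.allclose(A @ x, lam * x))      # True

# Verify the eigendecomposition A = U Σ U^T (U is orthonormal, so U^{-1} = U^T)
print(np.allclose(A, U @ Sigma @ U.T))  # True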

So if A is not a square matrix, that is, its numbers of rows and columns differ, can we still decompose it? The answer is yes; the most commonly used decomposition method is Singular Value Decomposition (SVD).

Singular Value Decomposition (SVD)

SVD also decomposes matrices, but unlike eigendecomposition, SVD does not require the matrix being decomposed to be square. Assuming our matrix A is an m×n matrix, we define the SVD of matrix A as

A = U\Sigma V^{T}

Here U is an m×m matrix, Σ is an m×n matrix that is all zeros except for the elements on its main diagonal (each of which is called a singular value), and V is an n×n matrix.
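A minimal sketch (not part of the original example) showing the SVD of a non-square matrix and that A = UΣV^T, using a hypothetical 3×2 matrix:

import numpy as np

# A hypothetical 3x2 (m x n) matrix, purely for illustration
A = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [3.0, 0.0]])

U, s, VT = np.linalg.svd(A)   # U: 3x3, s: the singular values, VT: 2x2

# Rebuild the m x n Σ matrix with the singular values on its main diagonal
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(A, U @ Sigma @ VT))  # True: A = U Σ V^T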

Application of SVD in recommendation systems

Assume the rating matrix A is 10×11: rows represent users, columns represent items, and the values are users' ratings of items (0 means no rating). The matrix is loaded into the variable myMat, and calling linalg.svd in Python yields the three decomposed matrices U, Σ, and V:

Load the user rating matrix and perform matrix decomposition

from numpy import *

# Define the user rating matrix A
def loadExData():
    return [[0,0,1,0,0,2,0,0,0,0,5],
            [0,0,0,5,0,3,0,0,0,0,3],
            [0,0,0,0,4,1,0,1,0,4,0],
            [3,3,4,0,0,0,0,2,2,0,0],
            [5,4,2,0,0,0,0,5,5,0,0],
            [0,0,0,0,5,0,1,0,0,0,0],
            [4,1,4,0,0,0,0,4,5,0,1],
            [0,0,0,4,0,4,0,0,0,0,4],
            [0,0,0,2,0,2,5,0,0,1,2],
            [1,0,0,4,0,0,0,1,2,0,0]]

# Load the user-item rating matrix
myMat = mat(loadExData())
# Matrix decomposition
U, Sigma, VT = linalg.svd(myMat)
print(Sigma)
'''
[14.2701248  11.19808631  7.13480024  5.13612006  4.68588496  3.09682859
  2.72917436  2.55571761  1.05782196  0.185364  ]'''

Sigma is the vector of singular values, already sorted from largest to smallest, and the singular values decay particularly quickly. In many high-dimensional cases, the sum of the first 10% of the singular values accounts for more than 80% of the sum of all singular values. In this example, k is chosen mainly according to the energy proportion of the singular values: the total energy of the original matrix is sum(Sigma**2) ≈ 452, and after reducing to k = 3 dimensions the retained energy is sum(Sigma[0:3]**2) ≈ 380, an energy proportion of about 84%. In other words, we can describe matrix A using the k largest singular values and the corresponding vectors in U and V.

A = A_{m\times n} = U_{m\times m}\Sigma_{m\times n}V_{n\times n}^{T}\approx U_{m\times k}\Sigma_{k\times k}V_{k\times n}^{T}
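The choice of k described above can be wrapped in a small helper. This is an illustrative sketch (chooseK is not from the original post) that works with the Sigma computed earlier:

# Choose the smallest k whose cumulative singular-value energy exceeds a threshold
def chooseK(Sigma, threshold=0.8):
    energy = Sigma ** 2
    ratio = energy.cumsum() / energy.sum()
    return int((ratio < threshold).sum()) + 1

# For the Sigma printed above, sum(Sigma**2) ≈ 452 and sum(Sigma[:3]**2) ≈ 380 (about 84%),
# so chooseK(Sigma, 0.8) returns 3
print(chooseK(Sigma))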

Matrix dimensionality reduction

# Take k = 4 here and use the factorization to reduce the dimensionality of the original rating matrix
NewData = U[:,:4] * mat(eye(4) * Sigma[:4]) * VT[:4,:]
print(NewData)

Comparing the two matrices, the high (nonzero) ratings are very close, which shows that the original matrix A can be well approximated by the product of the three matrices U_{m\times k}\Sigma_{k\times k}V_{k\times n}^{T}.

Map the item rating matrix A^{T} into a low-dimensional space via A^{T}U_{m\times k}\Sigma_{k\times k}^{-1}, reducing its dimension from n×m to n×k, and then calculate the similarity between items. The dimension of each item vector drops from m to k, which improves computational efficiency.

Generally speaking, m is the number of users in the sample, which can be very large, while k << m. This representation greatly reduces the pressure of online storage and computation.
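Assuming the myMat, U, and Sigma computed earlier, a short sketch of this mapping (the same transform used later inside recommend) looks like this:

k = 4
Sig_k = mat(eye(k) * Sigma[:k])          # k x k diagonal matrix of singular values
# Map the n x m item rating matrix A^T into the n x k space: A^T * U_k * Σ_k^{-1}
itemsLowDim = myMat.T * U[:, :k] * Sig_k.I
print(shape(itemsLowDim))                # (11, 4): each item is now a k-dimensional vector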

Calculate similarity

# SVD-based rating estimation
# userData is the given user's row of the original rating matrix
# xformedItems is the item rating matrix after dimensionality reduction
# simMeas is the similarity function
# user and item are the user-item pair to be scored

def svdEst(userData, xformedItems, user, simMeas, item):
    n = shape(xformedItems)[0]  # number of items
    simTotal = 0.0              # running sum of similarities
    ratSimTotal = 0.0           # running sum of similarity * rating (for the predicted score)
    # For the given user, loop over all items and compute their similarity to item
    for j in range(n):
        # the user's rating of item j
        userRating = userData[0, j]
        if userRating == 0 or j == item:
            continue
        # compute the similarity between the two items
        similarity = simMeas(xformedItems[item, :].T, xformedItems[j, :].T)
        print('the %d and %d similarity is :%f' % (item, j, similarity))
        # accumulate the similarities
        simTotal += similarity
        # accumulate similarity * rating
        ratSimTotal += similarity * userRating

    if simTotal == 0:
        return 0
    else:
        return ratSimTotal / simTotal
  

Make predictions for items that have not been rated by users 


# Cosine similarity
def cosSim(U_k, W_t):
    num = float(U_k.T * W_t)
    denom = linalg.norm(U_k) * linalg.norm(W_t)
    return 0.5 + 0.5 * (num / denom)

# Find unrated items: build a list of items the given user has not rated
def recommend(dataMat, user, N=3, simMeas=cosSim, estMethod=svdEst):

    U, Sigma, VT = linalg.svd(dataMat)
    # build a diagonal matrix from the singular values
    Sig4 = mat(eye(4) * Sigma[:4])
    # use the U matrix to transform the items into the low-dimensional space
    xformedItem = dataMat.T * U[:, :4] * Sig4.I
    print('xformedItem =', xformedItem)
    print('xformedItem dimensions:', shape(xformedItem))

    unratedItems = nonzero(dataMat[user, :].A == 0)[1]  # items not yet rated by this user
    print('dataMat[user,:].A = ', dataMat[user, :].A)
    print('nonzero(dataMat[user,:].A==0) = ', nonzero(dataMat[user, :].A == 0))
    # if there are no unrated items, exit; otherwise loop over all unrated items
    if len(unratedItems) == 0:
        return 'you rated everything'
    itemScores = []
    for item in unratedItems:
        print('item = ', item)
        # for each unrated item, call estMethod() (svdEst here) to produce a similarity-based predicted score
        estimatedScore = estMethod(dataMat[user, :], xformedItem, user, simMeas, item)
        # the item number and its estimated score are stored as a tuple in itemScores
        itemScores.append((item, estimatedScore))
    # return the top N unrated items
    return sorted(itemScores, key=lambda jj: jj[1], reverse=True)[:N]

myMat = mat(loadExData())
result = recommend(myMat, 1, estMethod=svdEst)
print(result)

Process summary:

  1. Load the user-item rating matrix.
  2. Perform the matrix decomposition, obtain the singular values, and determine the dimensionality-reduction value k from the energy proportion of the singular values.
  3. Use the matrix factorization to reduce the dimensionality of the item rating matrix.
  4. Use the dimensionality-reduced item rating matrix to compute item similarities and predict scores for items the user has not rated.
  5. Return the top N items with the highest predicted ratings, together with their item numbers and predicted scores.

latent semantic model

Before the SVD is computed, the missing values of the rating matrix A are filled in; after this completion the sparse matrix A becomes a dense matrix, which is then decomposed as A' = U\Sigma V^{T}. This method has disadvantages:

  • It consumes a huge amount of storage space. In reality there are tens of millions of user behavior records on items, and storing such a dense matrix is unrealistic;
  • The computational complexity of SVD itself is very high, let alone on such a large-scale dense matrix.

Therefore, much of the research on SVD is conducted on small data sets. The latent semantic model is also based on matrix factorization, but unlike SVD it decomposes the original matrix into the product of two matrices instead of three: A = PQ^{T}.

The problem now becomes determining P and Q. We call P the user factor matrix and Q the item factor matrix. Usually the above formula cannot hold exactly; what we have to do is minimize the gap between the two sides, which turns this into an optimization problem. The appropriate parameters in P and Q are found by minimizing the loss function below, where r_{ij} denotes the rating of item j by user i.

min(\left \| r_{ij}-\sum_{k = 1}^{K} p_{ik}q_{kj}\right \|_{2}^{2}+ \lambda \left \| p_{i} \right \|^{2} +\gamma \left \| q_{j} \right \|^{2} )        Formula 1
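As an illustration of minimizing Formula 1, here is a minimal stochastic-gradient-descent sketch (factorize and its parameter values are illustrative, not the post's code), reusing the loadExData matrix from earlier:

import numpy as np

def factorize(R, K=3, steps=200, lr=0.01, lam=0.02, gamma=0.02):
    # R: m x n rating matrix where 0 means "not rated"; returns P (m x K) and Q (n x K)
    m, n = R.shape
    P = np.random.rand(m, K) * 0.1   # user factor matrix
    Q = np.random.rand(n, K) * 0.1   # item factor matrix
    users, items = np.nonzero(R)     # only observed ratings enter the loss
    for _ in range(steps):
        for i, j in zip(users, items):
            err = R[i, j] - P[i] @ Q[j]            # r_ij minus the current prediction p_i . q_j
            Pi = P[i].copy()
            P[i] += lr * (err * Q[j] - lam * P[i])   # gradient step on p_i with regularization λ
            Q[j] += lr * (err * Pi - gamma * Q[j])   # gradient step on q_j with regularization γ
    return P, Q

P, Q = factorize(np.array(loadExData(), dtype=float))
# The predicted score of user i for item j is P[i] @ Q[j]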

The interaction data between users and items in a recommendation system is divided into explicit feedback data and implicit feedback data. The implicit model has an additional confidence parameter; this involves the way ALS handles implicit feedback, which some papers call "weighted regularized matrix factorization". Its loss function is as follows:

min(c_{ij}\left \| r_{ij}-\sum_{k = 1}^{K} p_{ik}q_{kj}\right \|_{2}^{2}+ \lambda \left \| p_{i} \right \|^{2} +\gamma \left \| q_{j} \right \|^{2} )        Formula 2

There are no explicit ratings in the implicit feedback model, so r_{ij} in the formula is not a specific score but simply 1: it only indicates that the user interacted with the item, not how highly the item is rated or preferred. The loss function also contains a c_{ij} term, which expresses the confidence that the user favors the item; for example, the weight of an item with more interactions increases. If d_{ij} denotes the number of interactions, the confidence can be expressed by the following formula:

c_{ij} = 1+\alpha d_{ij}

In this way, collaborative filtering is transformed into an optimization problem. To find the optimal solution of the above loss function, the most commonly used algorithm is ALS, i.e. Alternating Least Squares. The basic procedure of the algorithm is:

(1) Randomly initialize Q, take the partial derivative of the loss function with respect to p_i, set the derivative to 0, and obtain the current optimal solution:

p_{i} = (Q^{T}C^{i}Q +\lambda I)^{-1}Q^{T}C^{i}d_{i}

(2) Fix P, take the partial derivative of the loss function with respect to q_j, set the derivative to 0, and obtain the current optimal solution q_j;

(3) Fix Q, take the partial derivative of the loss function with respect to p_i, set the derivative to 0, and obtain the current optimal solution p_i;

(4) Repeat steps (2) and (3) until the specified number of iterations is reached or the algorithm converges. For large data sets, Spark can be used to compute ALS; a rough sketch of these alternating updates follows below.
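A rough NumPy sketch of these alternating updates for the implicit-feedback loss (Formula 2); als_implicit and all parameter values are illustrative, and a real system would use an implementation such as Spark's ALS:

import numpy as np

def als_implicit(D, K=3, alpha=40.0, lam=0.1, iters=10):
    # D: m x n matrix of interaction counts d_ij; returns P (m x K) and Q (n x K)
    m, n = D.shape
    R = (D > 0).astype(float)   # r_ij = 1 if the user interacted with the item
    C = 1.0 + alpha * D         # confidence c_ij = 1 + alpha * d_ij
    P = np.random.rand(m, K) * 0.1
    Q = np.random.rand(n, K) * 0.1
    I = np.eye(K)
    for _ in range(iters):
        # Fix Q and solve each p_i from (Q^T C^i Q + λI) p_i = Q^T C^i r_i
        for i in range(m):
            Ci = np.diag(C[i])
            P[i] = np.linalg.solve(Q.T @ Ci @ Q + lam * I, Q.T @ Ci @ R[i])
        # Fix P and solve each q_j symmetrically
        for j in range(n):
            Cj = np.diag(C[:, j])
            Q[j] = np.linalg.solve(P.T @ Cj @ P + lam * I, P.T @ Cj @ R[:, j])
    return P, Q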

In practical problems, the matrix to be decomposed is usually very sparse. Compared with SVD, ALS can effectively alleviate over-fitting, and the scalability of ALS-based matrix-factorization collaborative filtering is also better than that of SVD.
