Machine Learning Note 7: Matrix Factorization (Recommender.Matrix.Factorization)

1 Matrix Factorization Overview

1.1 Where It Is Used

Recommender systems: the best-known example is the oft-told "beer and diapers" story, and today news-feed products likewise recommend content streams to their users. We will not dwell on this.

1.2 How the Recommendation Works

Suppose matrix R is a 3×4 matrix holding users' ratings of films, U and P are the two factor matrices obtained by decomposing R, and \(R^* = U P^T\) is the predicted matrix.
The entries of \(R^*\) are very close to the known entries of R, and the entries that were missing in R are exactly the values we want to fill in.
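
As a minimal sketch (the 3×4 ratings and the rank k = 2 below are made-up numbers, not taken from the original example), the decomposition can be written in a few lines of numpy:

# Illustrative only: a 3x4 rating matrix R and a rank-2 factorization R* = U P^T
import numpy as np

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5]], dtype=float)    # 0 marks an unknown rating

k = 2                                         # rank of the factorization
U = np.random.normal(0, .1, (R.shape[0], k))  # one latent vector per user
P = np.random.normal(0, .1, (R.shape[1], k))  # one latent vector per item

R_star = U @ P.T                              # predicted matrix R*, same shape as R

After training (section 2.3), the entries of R_star at the known positions approximate R, and the remaining entries are the predictions to fill in.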

2 The Principle of Matrix Factorization

2.1 The Objective Function

As described in section 1.2, we want the predicted matrix \(R^*\) to differ from R as little as possible on the known entries.
Therefore, we obtain the objective function:
\[ \arg\min_{U,P} \sum_{(i,j) \in Z} (R_{ij} - U_{i} P_{j}^{T})^2 \\ Z = \{(i,j): R_{ij} \text{ is known}\} \]

\(U_i\) and \(P_j\) are row vectors taken from the i-th row of matrix U and the j-th row of matrix P, respectively; \(U_i\) represents the latent vector of the i-th user and \(P_j\) the latent vector of the j-th item.
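
As a sketch of this objective in code, assuming `R`, `U`, `P` are numpy arrays as in the earlier snippet and `Z` is a list of the observed (i, j) pairs (both names are only for illustration):

# Sum of squared errors over the known entries in Z (illustrative helper)
def objective(R, U, P, Z):
    return sum((R[i, j] - U[i] @ P[j]) ** 2 for (i, j) in Z)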

2.2 The Loss Function

To simplify the derivation, we multiply by 1/2, which gives:
\[ \arg\min_{U,P} \sum_{(i,j) \in Z} \frac{1}{2}(R_{ij}-U_{i}\cdot P_{j})^2 \\ Z = \{(i,j): R_{ij} \text{ is known}\} \]
For a single known rating this yields the per-entry loss:
\[ L_{ij} = \frac{1}{2}(R_{ij}-U_{i}\cdot P_{j})^2 \]

Taking the gradients of this loss gives:

\[ \frac{\partial L_{ij}}{\partial U_{i}}= \frac{\partial }{\partial U_{i}} [\frac{1}{2}(R_{ij}-U_{i}\cdot P_{j})^2] = -P_j(R_{ij}-U_{i}\cdot P_{j}) \\ \\ \frac{\partial L_{ij}}{\partial P_{j}}= \frac{\partial }{\partial P_{j}} [\frac{1}{2}(R_{ij}-U_{i}\cdot P_{j})^2] = -U_i(R_{ij}-U_{i}\cdot P_{j}) \]

To prevent over-fitting during training, we add a regularization term:

\[ \arg\min_{U,P} \sum_{(i,j) \in Z}\frac{1}{2}(R_{ij}-U_{i}\cdot P_{j})^2 + \lambda \left[ \sum_{i=1}^{m}\left \| U_i \right \|^2 + \sum_{j=1}^{n}\left \| P_j \right \|^2 \right] \]

Taking the partial derivatives again gives:
\[ \frac{\partial L_{ij}}{\partial U_{i}}=-P_{j}(R_{ij}-U_{i}\cdot P_{j}) + \lambda U_{i} \\ \\ \frac{\partial L_{ij}}{\partial P_{j}}=-U_{i}(R_{ij}-U_{i}\cdot P_{j}) + \lambda P_{j} \\ \]
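
These regularized gradients translate directly into code; the small helpers below are only a sketch (the names grad_u and grad_p are not from the original) and match the update rule used in the implementation of section 3:

# Gradients of the regularized per-rating loss L_ij (illustrative)
def grad_u(r_ij, u_i, p_j, lmd):
    return -p_j * (r_ij - u_i @ p_j) + lmd * u_i

def grad_p(r_ij, u_i, p_j, lmd):
    return -u_i * (r_ij - u_i @ p_j) + lmd * p_j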

2.3 Solving with Gradient Descent

Choose the rank k and the learning rate \(\gamma\), initialize U and P, and repeat the following step until the mean squared error is satisfactory:
traverse every (i, j) in Z, where Z = {(i, j): \(R_{ij}\) is known}, and update
\[ U_{i} \leftarrow U_{i} - \gamma \frac{\partial L_{ij}}{\partial U_{i}} \\ P_{j} \leftarrow P_{j} - \gamma \frac{\partial L_{ij}}{\partial P_{j}} \]
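
For intuition, here is one illustrative update step with made-up numbers, taking k = 2, \(\gamma = 0.1\), \(\lambda = 0\), \(U_i = (0.5, 0.5)\), \(P_j = (1, 1)\), and \(R_{ij} = 4\):
\[ R_{ij} - U_i \cdot P_j = 4 - 1 = 3 \\ U_i \leftarrow (0.5, 0.5) - 0.1 \cdot (-3, -3) = (0.8, 0.8) \\ P_j \leftarrow (1, 1) - 0.1 \cdot (-1.5, -1.5) = (1.15, 1.15) \]
The new prediction \(U_i \cdot P_j = 1.84\) is already much closer to the true rating 4 than the initial prediction of 1.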

3 Code Implementation

Reading only the formulas above can feel abstract, but once you see the matrix factorization solved with gradient descent in code, it should all click:

# Import numpy and the surprise helper library
import numpy as np
import surprise  

# Model definition
class MatrixFactorization(surprise.AlgoBase):
    '''A recommender based on matrix factorization.'''
    
    def __init__(self, learning_rate, n_epochs, n_factors, lmd):
        
        self.lr = learning_rate  # learning rate for gradient descent
        self.n_epochs = n_epochs  # number of gradient descent iterations (epochs)
        self.n_factors = n_factors  # rank of the factor matrices
        self.lmd = lmd  # regularization strength to prevent over-fitting
        
    def fit(self, trainset):
        '''Learn all u_i and p_j by gradient descent.'''
        
        print('Fitting data with SGD...')
        
        # Randomly initialize the user and item matrices.
        u = np.random.normal(0, .1, (trainset.n_users, self.n_factors))
        p = np.random.normal(0, .1, (trainset.n_items, self.n_factors))
        
        # Gradient descent
        for _ in range(self.n_epochs):
            for i, j, r_ij in trainset.all_ratings():
                err = r_ij - np.dot(u[i], p[j])
                # Move u_i and p_j along the negative gradient
                u[i] -= -self.lr * err * p[j] + self.lr * self.lmd * u[i]
                p[j] -= -self.lr * err * u[i] + self.lr * self.lmd * p[j]
                # Note: strictly speaking, p_j should be updated with the value of u_i
                # from before its own update, but in practice the difference is negligible.
        
        self.u, self.p = u, p
        self.trainset = trainset

    def estimate(self, i, j):
        '''Predict the rating of user i for item j.'''
        
        # If user i and item j are both known, return the dot product of u_i and p_j;
        # otherwise fall back to the global mean rating (the cold-start problem).
        if self.trainset.knows_user(i) and self.trainset.knows_item(j):
            return np.dot(self.u[i], self.p[j])
        else:
            return self.trainset.global_mean
            
# Application
from surprise import BaselineOnly
from surprise import Dataset
from surprise import Reader
from surprise import accuracy
from surprise.model_selection import cross_validate
from surprise.model_selection import train_test_split
import os

# Data file
file_path = os.path.expanduser('./ml-100k/u.data')

# The data file has the following format:
# 'user item rating timestamp', tab-separated, with ratings in the range 1-5.
reader = Reader(line_format='user item rating timestamp', sep='\t', rating_scale=(1, 5))
data = Dataset.load_from_file(file_path, reader=reader)

# Randomly split the data into training and test sets
trainset, testset = train_test_split(data, test_size=.25)

# Instantiate the matrix factorization class defined above.
algo = MatrixFactorization(learning_rate=.005, n_epochs=60, n_factors=2, lmd=0.2)

# Train the model
algo.fit(trainset)

# Predict on the test set
predictions = algo.test(testset)

# Compute the mean absolute error (MAE)
accuracy.mae(predictions)

# Result: 0.7871327139440717

# For comparison: surprise's built-in nearest-neighbor method
algo = surprise.KNNBasic()
algo.fit(trainset)
predictions = algo.test(testset)
accuracy.mae(predictions)

# Result: 0.7827160139309475

# For comparison: surprise's built-in SVD-based method
algo = surprise.SVD()
algo.fit(trainset)
predictions = algo.test(testset)
accuracy.mae(predictions)

# Result: 0.7450633876817936
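
The cross_validate helper imported above is not actually used in the snippet; as a sketch (the choice of measures and cv=5 is arbitrary), the same custom model could also be evaluated with k-fold cross-validation:

# Optional: evaluate the custom model with 5-fold cross-validation (illustrative)
algo = MatrixFactorization(learning_rate=.005, n_epochs=60, n_factors=2, lmd=0.2)
cross_validate(algo, data, measures=['MAE'], cv=5, verbose=True)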

Source: www.cnblogs.com/bugutian/p/11288673.html