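Reposted from: https://blog.csdn.net/taozhaojie/article/details/52790032
A Python Implementation of the SVD Algorithm
I came across a blog post that implements the SVD algorithm, but its implementation does not use matrices. To make the implementation more intuitive, concise, and efficient, I rewrote it here on top of numpy.
The original post has been reposted so many times that I could not track down its author; see the following address for reference:
http://blog.csdn.net/recsysml/article/details/12287513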
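The algorithm used here minimizes a regularized squared-error objective over the observed (non-missing) entries of the matrix, using stochastic gradient descent.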
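As a sketch, the objective and the per-entry update step can be written as below. This is inferred from the update rules in the code, not taken from the original post, and the symbols are assumptions introduced here: $r_{ui}$ is `mat[u,i]`, $p_u$ and $q_i$ are the rows of `user_feature` and `item_feature`, $K$ is the set of observed entries, $\gamma$ is `gama`, and $\lambda$ is `lamda`. Constant factors from the gradient are taken to be absorbed into $\gamma$.

```latex
\min_{P,\,Q} \sum_{(u,i) \in K}
  \left[ \left( r_{ui} - p_u^{\top} q_i \right)^2
  + \lambda \left( \lVert p_u \rVert^2 + \lVert q_i \rVert^2 \right) \right]

e_{ui} = r_{ui} - p_u^{\top} q_i

p_u \leftarrow p_u + \gamma \left( e_{ui}\, q_i - \lambda\, p_u \right)
\qquad
q_i \leftarrow q_i + \gamma \left( e_{ui}\, p_u - \lambda\, q_i \right)
```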
The code is as follows:
Parameters: mat - the input matrix (missing entries stored as NaN), feature - the number of latent factors

    import numpy

    def svd(mat, feature, steps=500, gama=0.02, lamda=0.3):
        slowRate = 0.99  # learning-rate decay applied after every pass
        preRmse = 1000000000.0
        nowRmse = 0.0
        user_feature = numpy.matrix(numpy.random.rand(mat.shape[0], feature))
        item_feature = numpy.matrix(numpy.random.rand(mat.shape[1], feature))
        for step in range(steps):
            rmse = 0.0
            n = 0
            for u in range(mat.shape[0]):
                for i in range(mat.shape[1]):
                    # only observed entries contribute to the error and the updates
                    if not numpy.isnan(mat[u, i]):
                        pui = numpy.dot(user_feature[u, :], item_feature[i, :].T)[0, 0]
                        eui = mat[u, i] - pui
                        rmse += pow(eui, 2)
                        n += 1
                        for k in range(feature):
                            puk = user_feature[u, k]  # value before the update, reused for the item step
                            user_feature[u, k] += gama * (eui * item_feature[i, k] - lamda * puk)
                            item_feature[i, k] += gama * (eui * puk - lamda * item_feature[i, k])  # the original blog had an error here
            nowRmse = numpy.sqrt(rmse * 1.0 / n)
            print('step: %d Rmse: %s' % ((step + 1), nowRmse))
            if nowRmse < preRmse:
                preRmse = nowRmse
            else:
                break  # this exit condition is still somewhat problematic
            gama *= slowRate
        return user_feature, item_feature
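To show how such a factorization might be called, here is a minimal self-contained sketch. The helper name `svd_sgd`, the `seed` parameter, the returned RMSE history, and the small rating matrix are all assumptions made up for illustration; the updates follow the same SGD rule, written with row vectors instead of a loop over `k`.

```python
import numpy as np

def svd_sgd(mat, feature, steps=500, gama=0.02, lamda=0.3, seed=0):
    """Hypothetical vectorized variant of the SGD factorization (a sketch)."""
    rng = np.random.default_rng(seed)
    user_feature = rng.random((mat.shape[0], feature))
    item_feature = rng.random((mat.shape[1], feature))
    # Coordinates of the observed (non-NaN) entries.
    observed = np.argwhere(~np.isnan(mat))
    history = []  # RMSE of each accepted pass
    for _ in range(steps):
        sq_err = 0.0
        for u, i in observed:
            eui = mat[u, i] - user_feature[u] @ item_feature[i]
            sq_err += eui ** 2
            pu = user_feature[u].copy()  # old user vector, reused for the item step
            user_feature[u] += gama * (eui * item_feature[i] - lamda * pu)
            item_feature[i] += gama * (eui * pu - lamda * item_feature[i])
        rmse = np.sqrt(sq_err / len(observed))
        if history and rmse >= history[-1]:
            break  # stop as soon as the RMSE no longer improves
        history.append(rmse)
        gama *= 0.99  # decay the learning rate
    return user_feature, item_feature, history

# Made-up ratings; NaN marks a missing value.
ratings = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [1.0, np.nan, np.nan, 4.0],
    [np.nan, 1.0, 5.0, 4.0],
])
user_feature, item_feature, history = svd_sgd(ratings, feature=2)
predicted = user_feature @ item_feature.T  # dense matrix, missing cells filled in
print(predicted.round(2))
```

The product of the two factor matrices gives a prediction for every cell, including the ones that were NaN in the input.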
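Why not use the SVD implementations in SciPy and scikit-learn
Python has several SVD implementations in SciPy and scikit-learn, but they all share one problem: they do not handle missing values, that is, the input matrix must not contain NaN. The missing entries therefore have to be filled in first, and whether you fill with 0 or with the mean, the filled values affect the accuracy of the result.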
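The effect of filling can be illustrated with a small sketch. It assumes a made-up matrix that is exactly rank 1, hides one entry, fills it with 0 or with the mean of the observed entries, and then takes the rank-1 truncation from `numpy.linalg.svd`. The helper name `rank1_after_fill` is hypothetical.

```python
import numpy as np

# A matrix that is exactly rank 1, so a rank-1 model could recover it perfectly.
truth = np.outer([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
observed = truth.copy()
observed[2, 2] = np.nan  # pretend the true value 9 is unknown

def rank1_after_fill(mat, fill_value):
    """Fill NaN with fill_value, then keep only the largest singular triplet."""
    filled = np.where(np.isnan(mat), fill_value, mat)
    u, s, vt = np.linalg.svd(filled)
    return s[0] * np.outer(u[:, 0], vt[0, :])

zero_fill = rank1_after_fill(observed, 0.0)
mean_fill = rank1_after_fill(observed, np.nanmean(observed))

print('true value      :', truth[2, 2])
print('after zero fill :', zero_fill[2, 2])
print('after mean fill :', mean_fill[2, 2])
```

In both cases the reconstructed value at the hidden position is pulled toward the fill value and ends up far from 9, because the decomposition treats the fill as if it were real data.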