96 R Examples of Recommendation Algorithms

R language: summary and application case of recommenderlab package

1 Introduction to the framework package

  1. Recommendation system: the general idea of the recommenderlab package
    The recommenderlab package provides a framework for developing and testing recommendation algorithms on rating data and 0-1 data.
    It ships several basic algorithms and offers a registration mechanism that lets users plug in their own algorithms.
    The data types of the recommenderlab package are built as S4 classes.

(1) Rating matrix data interface: the abstract class ratingMatrix provides an interface for rating data.
ratingMatrix supports many operations similar to those of matrix objects, such as dim(), dimnames(), rowCounts(), colCounts(), rowMeans(), colMeans(), rowSums(), and colSums();
it also adds some special methods, such as sample(), which samples users (i.e. rows), and image(), which plots the rating matrix as an image.
The two concrete implementations of ratingMatrix are realRatingMatrix and binaryRatingMatrix, which correspond to the two kinds of rating matrix.

realRatingMatrix holds real-valued ratings, stored in the sparse matrix format defined by the Matrix package;
binaryRatingMatrix holds 0-1 ratings, stored as an itemMatrix defined by the arules package.
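
To make the two classes concrete, here is a minimal sketch on a made-up 3 x 4 matrix (the values and the names m, r and b are illustrative assumptions, not data from this article). It coerces a plain matrix to a realRatingMatrix, calls a few of the matrix-like methods, and derives a 0-1 matrix with binarize().

```r
library(recommenderlab)

# Hypothetical toy data: 3 users x 4 items, NA means "not rated"
m <- matrix(c( 5, NA,  3,  1,
               4,  2, NA, NA,
              NA,  5,  4,  2),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("u", 1:3), paste0("i", 1:4)))

r <- as(m, "realRatingMatrix")   # sparse, real-valued ratings
dim(r)                           # users x items
rowCounts(r)                     # number of ratings per user
colMeans(r)                      # mean rating per item (missing values ignored)

# 0-1 version: ratings >= 4 become 1, everything else counts as 0
b <- binarize(r, minRating = 4)
rowCounts(b)                     # number of "good" items per user
```

binarize() is one way to get from rating data to 0-1 data; whether a threshold of 4 is sensible depends on the rating scale.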

(2) Storing the recommendation model and making recommendations from it: the class Recommender uses a data structure to store a recommendation model.

 It is created with Recommender(data = ratingMatrix, method, parameter = NULL), which returns a Recommender object that can be used for top-N recommendation or rating prediction:
 predict(object, newdata, n, type = c("topNList", "ratings"), ...)
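
The two calls can be tried on a small synthetic matrix before touching real data. The sketch below is only an illustration: the 20 x 10 matrix is generated by an arbitrary formula, and POPULAR is chosen because it trains quickly, not because the article uses it here.

```r
library(recommenderlab)

# Hypothetical ratings: 20 users x 10 items, roughly one third missing
m <- outer(1:20, 1:10, function(i, j)
  ifelse((i + j) %% 3 == 0, NA_real_, (i * j) %% 5 + 1))
dimnames(m) <- list(paste0("u", 1:20), paste0("i", 1:10))
r <- as(m, "realRatingMatrix")

# Train on the first 15 users
rec <- Recommender(r[1:15], method = "POPULAR")

# Top-N lists for the remaining users (type defaults to "topNList")
top <- predict(rec, r[16:20], n = 3)
top_list <- as(top, "list")
top_list

# Predicted ratings for the same users
rat <- predict(rec, r[16:20], type = "ratings")
rat_matrix <- as(rat, "matrix")
rat_matrix
```

In the ratings output, items a user has already rated come back as NA, since only unseen items are predicted.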

(3) Users can register their own recommendation algorithms through the mechanism provided by the registry package.
The registry object is called recommenderRegistry and stores the name and a short description of each recommendation algorithm.
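
The registry can also be queried to see which methods are available for a given data type. A short sketch (the variable names entries and ibcf are illustrative; the set of methods returned depends on the installed recommenderlab version):

```r
library(recommenderlab)

# All methods registered for real-valued rating data
entries <- recommenderRegistry$get_entries(dataType = "realRatingMatrix")
names(entries)

# Look at a single entry, e.g. IBCF, including its default parameters
ibcf <- recommenderRegistry$get_entry("IBCF", dataType = "realRatingMatrix")
ibcf$description
ibcf$parameters
```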

(4) Evaluating the performance of recommendation algorithms: the recommenderlab package provides objects of class evaluationScheme for creating and storing evaluation plans.
They are created with evaluationScheme(data, method, train, k, given). The method can be a simple split, bootstrap sampling, k-fold cross-validation, etc.
The function evaluate() then uses the evaluation scheme to assess the performance of one or more recommendation algorithms.

Case Analysis

2 Data preview

library(recommenderlab)
library(ggplot2)
library(reshape2)
#install.packages("reshape")
library(reshape)
Sys.setlocale(category = "LC_ALL", locale = "Chinese")
# After setting the project path, read in the data. Note the format of the data:
# column 1 is the user id, column 2 the item id, column 3 the rating, and column 4 the timestamp.
# The timestamp is not used here and can be removed.


ml100k <- read.table("u.data", header = FALSE, stringsAsFactors = TRUE)
head(ml100k)
ml100k <- ml100k[, -4]

You can simply look at the distribution of ratings

prop.table(table(ml100k[, 3]))
summary(ml100k[, 3])
ml100k$V3 <- as.numeric(ml100k$V3)
summary(ml100k[, 3])


3 Data processing transformation
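
The code in the rest of this article uses an object ml.ratingMatrix whose construction is not shown. One possible way to build it, assuming ml100k holds the three columns user id, item id and rating in that order, is the data frame coercion that recommenderlab provides. The sketch below runs on a made-up stand-in (toy and toy.ratingMatrix are hypothetical names) so that it works without the data file.

```r
library(recommenderlab)

# Toy stand-in for ml100k (made-up values): user id, item id, rating
toy <- data.frame(V1 = c(1, 1, 2, 3, 3),
                  V2 = c(1, 2, 2, 1, 3),
                  V3 = c(5, 3, 4, 2, 1))

# A data frame of (user, item, rating) triplets coerces directly
toy.ratingMatrix <- as(toy, "realRatingMatrix")
toy.ratingMatrix
toy.matrix <- as(toy.ratingMatrix, "matrix")
toy.matrix

# With the real data the same call would be (an assumption, not the author's code):
# ml.ratingMatrix <- as(ml100k, "realRatingMatrix")
```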

as(ml.ratingMatrix , "matrix")[1:3, 1:10]
as(ml.ratingMatrix , "list")[[1]][1:10]

In addition, the recommenderlab package provides the function normalize() for normalization. The default is mean centering, x - mean.
The function that builds the recommendation model already normalizes internally, so there is no need to normalize separately here.
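
The arithmetic behind mean centering is easy to reproduce in base R. The sketch below uses a made-up 2 x 3 matrix and subtracts each user's mean rating from that user's ratings, which is what x - mean amounts to when it is applied per user (row).

```r
# Hypothetical ratings; NA means "not rated"
m <- matrix(c(5,  3, NA,
              4, NA,  2),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("u1", "u2"), c("i1", "i2", "i3")))

# Mean rating per user, ignoring missing entries
user_means <- rowMeans(m, na.rm = TRUE)

# Subtract each user's mean from that user's row (sweep defaults to "-")
centered <- sweep(m, 1, user_means)
centered
```

After centering, a positive value means the user liked the item more than their own average, which makes users with different rating habits comparable.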

Before modeling, you can check which recommendation techniques recommenderlab provides for realRatingMatrix. There are 6 in total;
we will use three of them: RANDOM (random recommendation), UBCF (user-based collaborative filtering), and IBCF (item-based collaborative filtering).

Take IBCF as an example to briefly introduce the meaning of its parameters:
  normalize_sim_matrix: whether to normalize the similarity matrix; the default is no.
  alpha: the alpha parameter value; the default is 0.5.
  na_as_zero: whether to treat NA as 0; the default is no.
The default parameters are used in this article.






Build a recommendation model

# Recommender() is the function in the recommenderlab package that builds a model. Its usage is quite simple.
# Note that all columns of the matrix must be named (itemLabels) before calling Recommender().

colnames(ml.ratingMatrix) <- paste0("M", 1:1682)
as(ml.ratingMatrix[1, 1:10], "list")

4 Building models and predictions

## [Warning] Before building the recommendation model, the items must be named via itemLabels; otherwise the following error occurs
##Error in validObject(.Object) :
##  invalid class “topNList” object: invalid object for slot "itemLabels" in class "topNList": got class "NULL", should be or extend class "character"
ml.recommModel <- Recommender(ml.ratingMatrix[1:30], method = "IBCF")
ml.recommModel
# Recommender of type 'IBCF' for 'realRatingMatrix'
# learned using 30 users.
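# Once the model is built, it can be used for prediction and recommendation, again through predict(). Here we make recommendations for the three users 801-803.
# predict() has a type parameter that selects top-N recommendation or rating prediction; the default is top-N recommendation.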


## Top-N recommendation; n = 5 means top-5
ml.predict1 <- predict(ml.recommModel, ml.ratingMatrix[801:803], n = 5)
ml.predict1

as(ml.predict1, "list")  ## show the top-5 recommendation lists of the three users


5 Model Evaluation

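## Predict the users' ratings for items
ml.predict2 <- predict(ml.recommModel, ml.ratingMatrix[801:803], type = "ratings")
ml.predict2
as(ml.predict2, "matrix")[1:3, 1:6]   ## predicted ratings of the three users for M1-M6

# Model evaluation
# This article only covers the evaluation of rating prediction; for top-N recommendation models, see the references.
# The classic metric for rating prediction is the RMSE (root mean squared error).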


rmse <- function(actuals, predicts)
{
  sqrt(mean((actuals - predicts)^2, na.rm = T))
}
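
As a quick check of the formula, here is a small worked example in base R with made-up vectors (actual and predicted are hypothetical). It computes MAE, MSE and RMSE by hand; pairs where either value is missing are dropped, just as na.rm does in the function above.

```r
# Hypothetical ratings; NA marks a missing actual or predicted value
actual    <- c(4.0, 3, 5, NA,  2)
predicted <- c(3.5, 3, 4,  4, NA)

err <- actual - predicted          # 0.5, 0, 1, NA, NA

mae      <- mean(abs(err), na.rm = TRUE)   # mean absolute error
mse      <- mean(err^2,    na.rm = TRUE)   # mean squared error
rmse_val <- sqrt(mse)                      # root mean squared error

c(MAE = mae, MSE = mse, RMSE = rmse_val)
```

Only the three complete pairs contribute, so MAE is (0.5 + 0 + 1) / 3 = 0.5 and MSE is (0.25 + 0 + 1) / 3.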
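# Fortunately, the recommenderlab package provides a dedicated evaluation facility, the function evaluationScheme,
# which can be set to n-fold cross-validation or to a simple training/test split. This article uses the latter:
# the data set is split into training and test sets, the model is trained on the training set and evaluated on the test set.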


model.eval <- evaluationScheme(ml.ratingMatrix[1:943], method = "split", train = 0.9, given = 15, goodRating = 5)
model.eval
## Build prediction models with RANDOM, UBCF and IBCF
model.random <- Recommender(getData(model.eval, "train"), method = "RANDOM")
model.ubcf <- Recommender(getData(model.eval, "train"), method = "UBCF")
model.ibcf <- Recommender(getData(model.eval, "train"), method = "IBCF")
## Predict ratings with each model
predict.random <- predict(model.random, getData(model.eval, "known"), type = "ratings")
predict.ubcf <- predict(model.ubcf, getData(model.eval, "known"), type = "ratings")
predict.ibcf <- predict(model.ibcf, getData(model.eval, "known"), type = "ratings")
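# A brief note on how the data set is partitioned: items a user has not rated cannot be used for evaluation, because the prediction has nothing to be compared against.
# The given parameter of evaluationScheme sets how many of each test user's rated items are handed to the model ("known"); the rest are held out ("unknown") and retrieved with getData.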
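
The partition can be inspected directly. The sketch below uses Jester5k, an example data set shipped with recommenderlab, instead of the MovieLens data so that it runs without any external file; the subset size of 200 users and the seed are arbitrary choices for illustration.

```r
library(recommenderlab)
data(Jester5k)   # example data set included in recommenderlab

set.seed(2024)
scheme <- evaluationScheme(Jester5k[1:200], method = "split",
                           train = 0.9, given = 15, goodRating = 5)

train   <- getData(scheme, "train")    # users used to fit the model
known   <- getData(scheme, "known")    # test users: the ratings shown to the model
unknown <- getData(scheme, "unknown")  # test users: held-out ratings for scoring

nrow(train)
nrow(known)
table(rowCounts(known))   # each test user keeps exactly `given` known ratings
```

The "known" and "unknown" parts cover the same test users; only the "unknown" ratings are compared with the predictions.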

# Next, compute the RMSE and compare the three models. The function calcPredictionAccuracy returns the MAE (mean absolute error), MSE and RMSE.


error <- rbind(
  calcPredictionAccuracy(predict.random, getData(model.eval, "unknown")),
  calcPredictionAccuracy(predict.ubcf, getData(model.eval, "unknown")),
  calcPredictionAccuracy(predict.ibcf, getData(model.eval, "unknown")))
rownames(error) <- c("RANDOM", "UBCF", "IBCF")
error


6 SVD Comparison


model.svd <- Recommender(getData(model.eval, "train"), method = "SVD")
predict.svd <- predict(model.svd, getData(model.eval, "known"), type = "ratings")
calcPredictionAccuracy(predict.svd, getData(model.eval, "unknown"))

Comparison of recommended results

## Build each model on users 1-100

ml.recommModel_IBCF <- Recommender(ml.ratingMatrix[1:100], method = "IBCF")
ml.recommModel_UBCF <- Recommender(ml.ratingMatrix[1:100], method = "UBCF")
ml.recommModel_svd <- Recommender(ml.ratingMatrix[1:100], method = "SVD")

Make top-5 recommendations for users 801-803

predict_IBCF <- predict(ml.recommModel_IBCF, ml.ratingMatrix[801:803], n = 5)
predict_UBCF <- predict(ml.recommModel_UBCF, ml.ratingMatrix[801:803], n = 5)
predict_svd <- predict(ml.recommModel_svd, ml.ratingMatrix[801:803], n = 5)

as(predict_IBCF, "list")
as(predict_UBCF, "list")
as(predict_svd, "list")

Evaluating the results

evl.IBCF <- evaluate(model.eval, method = "IBCF", n = c(1, 3, 5, 10, 15, 20))
evl.IBCF

getConfusionMatrix(evl.IBCF)

model.eval1 <- evaluationScheme(ml.ratingMatrix, method = "cross", k = 4, given = 15, goodRating = 5)
evl.IBCF <- evaluate(model.eval1, method = "IBCF", n = c(1, 3, 5, 10, 15, 20))

getConfusionMatrix(evl.IBCF)
avg(evl.IBCF)

plot(evl.IBCF, annotate = TRUE)
plot(evl.IBCF, "prec/rec", annotate = TRUE)
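
The confusion matrix reports, for each list length n, counts such as TP, FP, FN and TN together with rates derived from them. The relationships can be checked by hand; the counts below are made up for illustration and are not results from the MovieLens data.

```r
# Hypothetical averaged counts for one top-N list length
TP <- 2    # recommended and actually liked
FP <- 3    # recommended but not liked
FN <- 6    # liked but not recommended
TN <- 89   # neither recommended nor liked

precision <- TP / (TP + FP)   # share of recommendations that were good
recall    <- TP / (TP + FN)   # share of good items that were recommended (= TPR)
FPR       <- FP / (FP + TN)   # share of bad items that were recommended

round(c(precision = precision, recall = recall, FPR = FPR), 4)
```

The default plot draws TPR against FPR (a ROC curve), while "prec/rec" draws precision against recall, each with one point per value of n.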


Comparing all algorithms together

together <- list(
  "Random" = list(name = "RANDOM", param = NULL),
  "IBCF" = list(name = "IBCF"),
  "UBCF" = list(name = "UBCF"),
  "SVD" = list(name = "SVD")
)

result <- evaluate(model.eval, together, n = c(1, 3, 5, 10, 15, 20))

plot(result, annotate = TRUE, legend = "topleft")
plot(result, "prec/rec", annotate = TRUE, legend = "topleft")



Origin blog.csdn.net/weixin_44498127/article/details/124444452