Case | Evaluation Metrics for Recommender Systems

A recommender system provides users with a personalized experience, and today nearly every major e-commerce and content platform uses one to serve different recommendations to different users. Mean Average Precision (MAP) is one of the metrics used to evaluate the performance of a recommender system.

However, using additional diagnostic metrics and visualizations gives a deeper assessment of a model, and can even spark new ideas. This article discusses recall, coverage, personalization, and intra-list similarity, and uses these metrics to compare three simple recommender systems.

The MovieLens dataset

The examples in this article use the MovieLens 20M dataset. The data include movie genre labels and users' ratings of movies. (To reduce training time, the data are downsampled to include only movies with more than 1,000 ratings, and only ratings of 3 stars and above.)
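As a rough illustration of this sampling step, here is a minimal pandas sketch. It assumes the ratings sit in a ratings.csv file with the standard MovieLens columns userId, movieId, and rating; the original post does not show its preprocessing code.

```python
import pandas as pd

# Load the MovieLens 20M ratings (standard columns: userId, movieId, rating, timestamp).
ratings = pd.read_csv("ratings.csv")

# Keep only positive ratings of 3 stars and above.
ratings = ratings[ratings["rating"] >= 3.0]

# Keep only movies that received more than 1,000 ratings.
counts = ratings["movieId"].value_counts()
ratings = ratings[ratings["movieId"].isin(counts[counts > 1000].index)]
```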

Examples of user movie ratings

Models

This article tests and compares three different recommender systems:

1. Random recommender (recommends 10 random movies to each user)

2. Popularity recommender (recommends the 10 most popular movies to each user)

3. Collaborative filtering (matrix factorization using SVD)
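These three recommenders can be sketched roughly as follows. This is a minimal illustration rather than the original author's code: it assumes the sampled `ratings` DataFrame from above and uses the SVD implementation from the `surprise` package for the matrix factorization step, which the post does not specify.

```python
import numpy as np
from surprise import SVD, Dataset, Reader

def random_recommender(ratings, user_ids, k=10, seed=0):
    """Recommend k movies chosen uniformly at random to each user."""
    rng = np.random.default_rng(seed)
    all_movies = ratings["movieId"].unique()
    return {u: list(rng.choice(all_movies, size=k, replace=False)) for u in user_ids}

def popularity_recommender(ratings, user_ids, k=10):
    """Recommend the k most-rated movies to every user."""
    top_k = ratings["movieId"].value_counts().head(k).index.tolist()
    return {u: top_k for u in user_ids}

def svd_recommender(ratings, user_ids, k=10):
    """Fit an SVD matrix factorization, then recommend the k unseen movies
    with the highest predicted rating for each user."""
    reader = Reader(rating_scale=(0.5, 5.0))
    data = Dataset.load_from_df(ratings[["userId", "movieId", "rating"]], reader)
    algo = SVD()
    algo.fit(data.build_full_trainset())

    all_movies = ratings["movieId"].unique()
    recs = {}
    for u in user_ids:
        seen = set(ratings.loc[ratings["userId"] == u, "movieId"])
        scored = [(m, algo.predict(u, m).est) for m in all_movies if m not in seen]
        recs[u] = [m for m, _ in sorted(scored, key=lambda s: s[1], reverse=True)[:k]]
    return recs
```

Each function returns a dictionary mapping a user id to an ordered list of recommended movie ids; the metric sketches later in this article assume that same format.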

Next, let's walk through these metrics and diagnostic plots and use them to compare the models.

The long tail plot

The long tail plot is useful for exploring item popularity in user interaction data such as clicks, ratings, or purchases. Typically, only a small fraction of items receive the bulk of the interactions; these form the "head". Most items fall in the "long tail" and each receives only a small share of the interactions.
Long tail plot (sample of MovieLens 20M ratings data)
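A plot like this takes only a few lines of matplotlib. The sketch below is an assumption about how such a figure could be produced, reusing the sampled `ratings` DataFrame from above.

```python
import matplotlib.pyplot as plt

# Number of ratings per movie, sorted from most to least popular.
volume = ratings["movieId"].value_counts().sort_values(ascending=False)

plt.plot(range(len(volume)), volume.values)
plt.xlabel("Movies ranked by popularity")
plt.ylabel("Number of ratings")
plt.title("Long tail plot (MovieLens 20M ratings sample)")
plt.show()
```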

Popular items appear many times in the training data, so it is not hard for a recommender system to predict them accurately. In the movie dataset, the most popular movies are blockbusters and classics. Most users already know these movies, so recommending them is not very personalized and does little to help users discover new films.

A recommendation is considered relevant if the user rated the item positively in the test data. The metrics below are used to assess the relevance and usefulness of a recommender system.

MAP and MAR

The recommender generates an ordered list of recommendations for each user in the test set. Mean Average Precision at K (MAP@K) gives insight into how relevant the list of recommended items is, while Mean Average Recall at K (MAR@K) gives insight into how well the recommender recalls all of the items the user rated positively in the test set. MAP and MAR are described in detail in:

Mean Average Precision (MAP) For Recommender Systems
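For concreteness, here is one common way to compute MAP@K. The exact normalization convention varies between references, so treat this as a sketch rather than the definition the original author necessarily used; `recs` maps each user to an ordered recommendation list and `test_positives` maps each user to the set of items they rated positively in the test set.

```python
import numpy as np

def average_precision_at_k(recommended, relevant, k=10):
    """Average precision@k for one user: precision is evaluated at each rank
    where a relevant item appears, then averaged."""
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / min(len(relevant), k)

def mean_average_precision_at_k(recs, test_positives, k=10):
    """MAP@k: average precision@k averaged over all test users."""
    return float(np.mean([average_precision_at_k(recs[u], test_positives[u], k)
                          for u in test_positives]))
```

MAR@K follows the same pattern, with the precision term at each relevant rank replaced by the recall achieved up to that rank.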

Coverage

Coverage is the percentage of items in the training data that the model is able to recommend on a test set. In this example, the popularity recommender has a coverage of only 0.05%, since it only ever recommends 10 items. The random recommender has close to 100% coverage. Surprisingly, the collaborative filter can only recommend 8.42% of the items it was trained on.
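A sketch of the coverage calculation, assuming the same recommendation-dictionary format as above:

```python
def coverage(recs, training_item_ids):
    """Percentage of training items that appear in at least one user's
    recommendation list."""
    catalogue = set(training_item_ids)
    recommended = set(item for items in recs.values() for item in items)
    return 100.0 * len(recommended & catalogue) / len(catalogue)
```

For example, `coverage(recs, ratings["movieId"].unique())` would score one of the recommenders above against the sampled training catalogue.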

Coverage comparison of the three recommenders


Personalization

Personalization assesses whether the model recommends the same items to different users. It is the dissimilarity (1 minus the cosine similarity) between users' recommendation lists. The example below illustrates how personalization is calculated.

Example recommendation lists for 3 different users

First, each user's recommendations are encoded as binary indicator variables (1: the item was recommended to the user; 0: it was not).

Next, a cosine similarity matrix is computed across all users' recommendation vectors.

Finally, the mean of the upper triangle of the cosine similarity matrix is computed. Personalization is 1 minus this average cosine similarity.

A high personalization score indicates that users receive different recommendations, which means the model delivers a personalized experience to each user.
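The steps above translate almost directly into code. This is a minimal sketch using pandas and scikit-learn, again assuming the recommendation-dictionary format from earlier:

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

def personalization(recs):
    """1 minus the mean pairwise cosine similarity between users' binary
    recommendation-indicator vectors."""
    # Users x items indicator matrix (1 = item was recommended to that user).
    indicator = pd.DataFrame([{item: 1 for item in items} for items in recs.values()]).fillna(0)
    sim = cosine_similarity(indicator.values)
    # Average the upper triangle, excluding the diagonal of self-similarities.
    upper = sim[np.triu_indices_from(sim, k=1)]
    return 1.0 - float(upper.mean())
```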

Intra-list similarity

Intra-list similarity is the average cosine similarity of all items in a recommendation list. The calculation uses features of the recommended items (such as movie genres) to measure similarity. The example below illustrates the calculation.

Example movie ID recommendations for 3 different users

These movie genre features are used to calculate the cosine similarity between all of the items recommended to a user, for example between all of the movies recommended to user 1.

Intra-list similarity can be calculated for each user and averaged over all users in the test set to obtain an estimate of the model's overall intra-list similarity.

If a recommender system serves every user a list of very similar items (for example, a user only ever receives recommendations for romance movies), intra-list similarity will be high.
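A sketch of the calculation; `item_features` is assumed here to be a DataFrame indexed by movieId with one-hot genre columns, an assumption about the feature table since the post only says genres are used:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def intra_list_similarity(recs, item_features):
    """Mean pairwise cosine similarity of item feature vectors (e.g. one-hot
    genres) within each user's list, averaged over all users."""
    per_user = []
    for items in recs.values():
        feats = item_features.loc[items].values   # feature rows for this user's list
        sim = cosine_similarity(feats)
        per_user.append(sim[np.triu_indices_from(sim, k=1)].mean())
    return float(np.mean(per_user))
```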

Using the right training data

A recommender system can often be improved quickly by adjusting the training data as follows:

1. Remove popular items from the training data. (This is appropriate when users can discover these items on their own and recommending them adds little value.)

2. Scale item ratings by some measure of the user's value, such as average transaction value. This helps the model recommend items that attract loyal or high-value customers.
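A rough sketch of these adjustments; the top-50 cutoff and the avg_transaction_value column are hypothetical choices for illustration, not from the original post:

```python
# 1. Drop the most popular movies from the training data ("top 50" is an
#    arbitrary cutoff chosen here for illustration).
most_popular = ratings["movieId"].value_counts().head(50).index
training = ratings[~ratings["movieId"].isin(most_popular)].copy()

# 2. Scale ratings by a per-user value signal, e.g. a hypothetical
#    avg_transaction_value column joined from transaction data:
# training["rating"] = training["rating"] * training["avg_transaction_value"]
```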

Conclusion

A good recommender system produces recommendations that are both useful and relevant.

Evaluating a model with multiple metrics gives a more comprehensive measure of a recommender system's performance.

Original article: Evaluation Metrics for Recommender Systems

The above was first compiled by the 4Paradigm recommendation team for learning and exchange only; the copyright belongs to the original author.


Source: blog.51cto.com/13945147/2435372