Recommendation System Practice

Recommendation system evaluation

Experimental methods
  • Offline experiment: prepare a training set and a test set, train on the former, and evaluate on the latter.
  • User survey: questionnaires and user satisfaction surveys.
  • Online experiment (A/B test): A/B testing is a very common method for evaluating algorithms online. Users are randomly divided into several groups according to some rule, each group is served by a different algorithm, and the algorithms are compared through evaluation metrics computed for each group. For example, the click-through rate of each group can be measured and used to compare the algorithms.
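A minimal sketch of how users might be split for an A/B test. The "certain rules" are not specified above, so this assumes a hash-based bucketing rule (SHA-256 over an experiment name and user ID). `assign_group`, `ctr_by_group`, and the `(user_id, impressions, clicks)` event format are hypothetical.

```python
import hashlib
from collections import defaultdict


def assign_group(user_id: str, experiment: str, n_groups: int) -> int:
    """Deterministically map a user to one of n_groups buckets (hypothetical rule).

    Hashing "experiment:user_id" gives a stable, roughly uniform split; a different
    experiment name reshuffles users independently of other experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % n_groups


def ctr_by_group(events, experiment: str, n_groups: int) -> dict:
    """events: iterable of (user_id, impressions, clicks). Returns {group: CTR}."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for user_id, impressions, clicks in events:
        group = assign_group(user_id, experiment, n_groups)
        shown[group] += impressions
        clicked[group] += clicks
    return {g: clicked[g] / shown[g] for g in shown if shown[g] > 0}
```

Because the assignment is a pure function of the user ID, a user sees the same algorithm on every visit. Group 0 could keep the current algorithm while group 1 tries a new one, and the per-group CTRs are then compared.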
Evaluation metrics:
  • User satisfaction
  • Prediction accuracy: measures the ability of a recommender system or algorithm to predict user behavior. It is the most important offline evaluation metric for recommender systems.
  • Rating prediction: accuracy is usually measured by root mean square error (RMSE) and mean absolute error (MAE).
  • TopN recommendation: when a website provides a recommendation service, it usually gives each user a personalized recommendation list; this is called TopN recommendation. The prediction accuracy of TopN recommendation is generally measured by precision and recall.
  • Coverage: describes a recommender system's ability to discover long-tail items. Coverage can be defined in several ways; the simplest is the proportion of all items that the system is able to recommend.
  • Information entropy: $H = -\sum_{i=1}^{n} p(i)\log p(i)$, where p(i) is the popularity of item i divided by the sum of the popularity of all items.
  • Gini coefficient: $G = \frac{1}{n-1}\sum_{j=1}^{n} (2j - n - 1)\,p(i_j)$, where i_j is the j-th item in the item list sorted by popularity p() in ascending order.
  • Diversity: the recommendation list should cover the user's different areas of interest, i.e., the recommendations need to be diverse.
  • Novelty: novel recommendations are items the user has not heard of before.
  • Serendipity: if a recommendation is dissimilar to the user's historical interests yet still satisfies the user, the recommendation can be said to be highly serendipitous.
  • Trust: users' trust in the recommendations is very important. The same recommendation is more likely to make users want to buy if it is presented in a way they trust, and less likely if it is presented like an advertisement.
  • Real-time performance
  • Robustness: measures a recommender system's ability to resist cheating (attacks).
  • Business goals: when a website evaluates its recommender system, it cares most about whether the site's business goals are achieved, and these goals are closely tied to the site's profit model.
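The accuracy and coverage metrics above can be sketched in Python as follows. The input formats (a list of true/predicted rating pairs, dicts of per-user recommendation lists and test sets, item popularity counts) and all function names are illustrative assumptions; precision and recall are pooled over all users, and `gini` expects at least two items.

```python
import math


def rmse(records):
    """records: list of (true_rating, predicted_rating)."""
    return math.sqrt(sum((r - p) ** 2 for r, p in records) / len(records))


def mae(records):
    return sum(abs(r - p) for r, p in records) / len(records)


def precision_recall(recommended, test):
    """recommended: {user: list of items}; test: {user: set of items liked in the test set}."""
    hits = sum(len(set(recommended[u]) & test.get(u, set())) for u in recommended)
    n_rec = sum(len(recommended[u]) for u in recommended)
    n_test = sum(len(test.get(u, set())) for u in recommended)
    return hits / n_rec, hits / n_test


def coverage(recommended, all_items):
    """Fraction of the item catalog that appears in at least one recommendation list."""
    rec_items = {i for items in recommended.values() for i in items}
    return len(rec_items) / len(all_items)


def entropy(popularity):
    """popularity: {item: count}. H = -sum p(i) log p(i), with p(i) = count / total."""
    total = sum(popularity.values())
    return -sum((c / total) * math.log(c / total) for c in popularity.values() if c > 0)


def gini(popularity):
    """G = 1/(n-1) * sum_j (2j - n - 1) p(i_j); items sorted by p ascending, j from 1."""
    total = sum(popularity.values())
    p = sorted(c / total for c in popularity.values())
    n = len(p)
    return sum((2 * j - n - 1) * pj for j, pj in enumerate(p, start=1)) / (n - 1)
```

A Gini coefficient of 0 means every item is equally popular; values near 1 mean a few items take almost all the popularity.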
Evaluation dimensions:
  • User dimension: mainly the user's demographic information, activity level, and whether the user is new.
  • Item dimension: item attributes, popularity, average rating, and whether the item was newly added.
  • Time dimension: season, weekday or weekend, day or night, etc.

User behavior data

  • Explicit feedback data: behaviors in which users clearly express their preference for items, mainly ratings and like/dislike.
  • Implicit feedback data: behaviors that do not clearly reflect user preferences. The most representative implicit feedback behavior is page browsing.
 
User Activity and Item Popularity: Long-tailed Distributions
        Generally speaking, inactive users are either new users or old users who have visited the site only once or twice. Do users with different activity levels like items of different popularity? It is generally believed that new users tend to browse popular items, because they are unfamiliar with the website and can only click the popular items on the homepage, while old users gradually begin to browse unpopular items.
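One way to check this question on a dataset, sketched under the assumption that behavior data is a list of `(user, item)` pairs: group users by activity (number of items they acted on) and compute the average popularity (number of users) of the items they liked. `popularity_by_activity` is a hypothetical helper.

```python
from collections import defaultdict


def popularity_by_activity(actions):
    """actions: list of (user, item) pairs.

    Returns {activity: average popularity of the items liked by users with that
    activity}, where activity = number of distinct items the user acted on and
    popularity = number of distinct users who acted on the item.
    """
    user_items = defaultdict(set)
    item_users = defaultdict(set)
    for user, item in actions:
        user_items[user].add(item)
        item_users[item].add(user)

    sums = defaultdict(float)
    counts = defaultdict(int)
    for user, items in user_items.items():
        activity = len(items)
        for item in items:
            sums[activity] += len(item_users[item])
            counts[activity] += 1
    return {a: sums[a] / counts[a] for a in sorted(sums)}
```

If the belief above holds for a dataset, the averages should tend to fall as activity rises.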
 
Recommendation algorithms designed for user behavior data
  • Neighborhood-based methods
  • Latent factor model
  • Graph-based random walk algorithms (random walk on graphs)
The most important family is the neighborhood-based algorithms:
  • User-based collaborative filtering (UserCF): recommends to a user the items liked by other users with similar interests.
  • Item-based collaborative filtering (ItemCF): recommends items similar to the items the user liked before.
 
 
User-based collaborative filtering algorithm:
  • (1) Find the set of users whose interests are similar to the target user's (similarity computation, e.g. distance-based).
  • (2) Recommend to the target user the items that users in this set like and that the target user has not heard of.
Improvement 1:
        Computing the similarity between every pair of users costs O(n²), which is too expensive. Instead, first build an item-to-user inverted list and then compute user similarities from it, so that user pairs with no items in common are never considered.
 
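Improvement 2:
        Take the popularity of items into account. Two users both acting on a very popular item says little about whether their interests are similar, so shared popular items should count less. The User-IIF similarity weights each shared item i by 1/log(1+|N(i)|): $w_{uv} = \frac{\sum_{i \in N(u)\cap N(v)} 1/\log(1+|N(i)|)}{\sqrt{|N(u)||N(v)|}}$, where N(u) is the set of items user u has acted on and N(i) is the set of users who acted on item i.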
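A compact UserCF sketch covering both steps and both improvements. It assumes training data as `{user: set of items}` and cosine similarity over item sets. Similarities are accumulated from an item-to-user inverted list, and with `iif=True` each shared item i is down-weighted by 1/log(1+|N(i)|). `user_similarity`, `recommend_user_cf`, and the defaults `k=10`, `n=10` are illustrative assumptions.

```python
import math
from collections import defaultdict


def user_similarity(train, iif=True):
    """train: {user: set of items}. Returns {u: {v: w_uv}}."""
    # Item -> users inverted list: only users sharing an item are ever paired.
    item_users = defaultdict(set)
    for u, items in train.items():
        for i in items:
            item_users[i].add(u)

    co = defaultdict(lambda: defaultdict(float))
    for users in item_users.values():
        # Popular items (many users) contribute less when iif is on.
        weight = 1 / math.log(1 + len(users)) if iif else 1.0
        for u in users:
            for v in users:
                if u != v:
                    co[u][v] += weight

    # Cosine normalization by the sizes of both users' item sets.
    return {u: {v: c / math.sqrt(len(train[u]) * len(train[v])) for v, c in related.items()}
            for u, related in co.items()}


def recommend_user_cf(user, train, sim, k=10, n=10):
    """Score unseen items by summing w_uv over the k most similar users v who liked them."""
    seen = train[user]
    scores = defaultdict(float)
    neighbors = sorted(sim.get(user, {}).items(), key=lambda x: x[1], reverse=True)[:k]
    for v, w in neighbors:
        for i in train[v]:
            if i not in seen:
                scores[i] += w
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)[:n]
```

For example, with `train = {"A": {"a", "b", "d"}, "B": {"a", "c"}, "C": {"b", "e"}, "D": {"c", "d", "e"}}`, user A's neighbors B, C, and D together point to items c and e.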
 

 

 
 
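Item-based collaborative filtering algorithm:
  • (1) Compute the similarity between items.
  • (2) Generate a recommendation list for the user from the item similarities and the user's historical behavior.
  • Item similarity is based on co-occurrence: $w_{ij} = \frac{|N(i)\cap N(j)|}{\sqrt{|N(i)||N(j)|}}$, where N(i) is the set of users who like item i. Two items are similar when many users like both of them.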
 
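Improvement 1: user-item inverted list. Build, for each user, the list of items they have acted on, and accumulate co-occurrence counts only for item pairs that appear together in some user's list.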
 
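Improvement 2: impact of user activity. A very active user who acts on many items provides weaker evidence that any two of those items are similar, so ItemCF-IUF weights each co-occurring user u by 1/log(1+|N(u)|): $w_{ij} = \frac{\sum_{u \in N(i)\cap N(j)} 1/\log(1+|N(u)|)}{\sqrt{|N(i)||N(j)|}}$.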
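A matching ItemCF sketch, assuming the same `{user: set of items}` input. Looping over each user's item list plays the role of the user-item inverted list, and `iuf=True` applies a 1/log(1+|N(u)|) penalty for active users. `item_similarity`, `recommend_item_cf`, and the defaults are hypothetical.

```python
import math
from collections import defaultdict


def item_similarity(train, iuf=True):
    """train: {user: set of items}. Returns {i: {j: w_ij}}."""
    count = defaultdict(int)  # |N(i)|: number of users who liked item i
    co = defaultdict(lambda: defaultdict(float))
    for items in train.values():  # user -> items inverted list
        if not items:
            continue  # an empty history contributes nothing (and log(1) would be 0)
        # Very active users (long item lists) contribute less when iuf is on.
        weight = 1 / math.log(1 + len(items)) if iuf else 1.0
        for i in items:
            count[i] += 1
            for j in items:
                if i != j:
                    co[i][j] += weight
    return {i: {j: c / math.sqrt(count[i] * count[j]) for j, c in related.items()}
            for i, related in co.items()}


def recommend_item_cf(user, train, sim, k=10, n=10):
    """p(u, j) = sum of w_ij over items i the user liked, using i's k most similar items."""
    seen = train[user]
    scores = defaultdict(float)
    for i in seen:
        neighbors = sorted(sim.get(i, {}).items(), key=lambda x: x[1], reverse=True)[:k]
        for j, w in neighbors:
            if j not in seen:
                scores[j] += w
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)[:n]
```

Because the score is a sum over the user's own items, each recommendation can be explained by the past items that contributed to it.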
 
 

 

 

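Latent factor model (LFM)

        The latent factor model has been one of the hottest research topics in recommender systems in recent years. Its core idea is to connect user interests and items through latent factors.
        Since its inception, latent semantic analysis has produced many well-known models and methods; familiar related terms include pLSA, LDA, the latent class model, the latent topic model, and matrix factorization. These techniques and methods are essentially related, and many of them can be applied to personalized recommender systems.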
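A minimal LFM sketch, assuming training samples of the form `(user, item, label)` with label 1 (positive) or 0 (negative). Each user u and item i get an F-dimensional vector, the predicted preference is the dot product p_u · q_i, and the vectors are fitted by stochastic gradient descent on squared error with L2 regularization. `train_lfm`, `predict`, and the hyperparameter defaults are assumptions for illustration, not tuned values.

```python
import random


def train_lfm(samples, n_factors=10, n_epochs=20, lr=0.02, reg=0.01, seed=0):
    """Fit user vectors P[u] and item vectors Q[i] so that P[u] . Q[i] approximates the label."""
    rng = random.Random(seed)
    P, Q = {}, {}

    def init():
        return [rng.gauss(0, 0.1) for _ in range(n_factors)]

    for u, i, _ in samples:
        P.setdefault(u, init())
        Q.setdefault(i, init())

    data = list(samples)
    for _ in range(n_epochs):
        rng.shuffle(data)
        for u, i, label in data:
            pu, qi = P[u], Q[i]
            err = label - sum(a * b for a, b in zip(pu, qi))
            for f in range(n_factors):
                pf, qf = pu[f], qi[f]
                # Gradient step on (label - p.q)^2 + reg * (|p|^2 + |q|^2).
                pu[f] += lr * (err * qf - reg * pf)
                qi[f] += lr * (err * pf - reg * qf)
    return P, Q


def predict(P, Q, user, item):
    return sum(a * b for a, b in zip(P[user], Q[item]))
```

For TopN recommendation, `predict` would be evaluated for every item the user has not interacted with and the highest-scoring N returned.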
 
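        Implicit feedback data contains only positive samples and no negative samples, so the first key problem in applying LFM to TopN recommendation on an implicit feedback dataset is how to generate negative samples for each user. Possible approaches:
  • For each user, use all items the user has not interacted with as negative samples.
  • For each user, sample items uniformly at random from those the user has not interacted with.
  • For each user, sample negatives from the items the user has not interacted with, while keeping the numbers of positive and negative samples roughly equal.
  • For each user, sample negatives from the items the user has not interacted with, with sampling biased toward unpopular items.
        The obvious drawback of the first method is that it produces far too many negative samples, so positive and negative counts are badly imbalanced; computation is expensive and the final accuracy is poor. For the other three methods, Rong Pan reports in his paper that the third works better than the second, and the second better than the fourth. These results suggest two principles for negative sampling:
  • For each user, keep positive and negative samples balanced (similar in number).
  • When sampling negatives for a user, choose items that are very popular but that the user has not interacted with.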
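A sketch of negative sampling that follows both principles: each user gets as many negatives as positives (capped by the number of available items), drawn without replacement with probability proportional to item popularity among items the user has not interacted with. Popularity is assumed to be the number of users who interacted with the item; `sample_negatives` and `n_ratio` are hypothetical names.

```python
import random
from collections import Counter


def sample_negatives(user_items, n_ratio=1, seed=0):
    """user_items: {user: set of positive items}. Returns (user, item, label) samples."""
    rng = random.Random(seed)
    # Item popularity = number of users who interacted with the item.
    popularity = Counter(i for items in user_items.values() for i in items)
    samples = []
    for user, positives in user_items.items():
        samples.extend((user, i, 1) for i in positives)
        # Candidate negatives: items this user has not interacted with.
        pool = {i: c for i, c in popularity.items() if i not in positives}
        target = min(len(pool), n_ratio * len(positives))  # keep classes balanced
        for _ in range(target):
            items = list(pool)
            # Popular-but-ignored items are more likely to be drawn.
            pick = rng.choices(items, weights=[pool[i] for i in items], k=1)[0]
            samples.append((user, pick, 0))
            del pool[pick]  # sample without replacement
    return samples
```

The resulting list has the `(user, item, label)` shape that an LFM trainer on implicit feedback needs.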
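Comparison of LFM and neighborhood-based methods
  • Theoretical basis: LFM has a relatively solid theoretical basis. It is a learning method that builds an optimal model by optimizing a chosen objective. Neighborhood-based methods are closer to statistical methods and have no learning process.
  • Space complexity of offline computation: LFM greatly reduces memory use during training, since it stores only a low-dimensional latent vector per user and per item, whereas UserCF and ItemCF must store a user-user or item-item relation table.
  • Time complexity of offline computation: in general, LFM's time complexity is somewhat higher than that of UserCF and ItemCF, mainly because the algorithm needs many iterations. Overall, however, the two kinds of algorithm show no qualitative difference in time complexity.
  • Online real-time recommendation: UserCF and ItemCF cache their related tables in memory and can then make real-time predictions online.
  • Explanation of recommendations: ItemCF supports good explanations, since it can justify a recommendation using the user's historical behavior. LFM cannot: although the latent classes it computes do semantically represent a class of interests and items, they are hard to describe in natural language and turn into explanations shown to users.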
 

The cold-start problem in recommender systems

  • User cold start
  • Item cold start
  • System cold start
 
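Using user registration information
  • Demographic information: the user's age, gender, occupation, ethnicity, education, and place of residence.
  • Descriptions of user interests: some websites ask users to describe their interests in text.
  • Off-site behavior data imported from other websites: for example, when a user logs in with a Douban or Sina Weibo account, the site can, with the user's consent, obtain some of that user's behavior and social-network data on Douban or Sina Weibo.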
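As one illustration of using demographic registration data, a new user can be shown items that are popular among existing users who share a feature. This sketch scores item i for feature f as p(f, i) = |N(i) ∩ U(f)| / (|N(i)| + α), where N(i) is the set of users who liked i, U(f) the set of users with feature f, and α damps items with few users; a new user's score sums over their features. The feature encoding (`"gender=F"`), function names, and the `alpha` default are assumptions.

```python
from collections import defaultdict


def build_feature_item_scores(user_features, user_items, alpha=10.0):
    """user_features: {user: set of features}; user_items: {user: set of items}.

    Returns {feature: {item: p(f, i)}} with p(f, i) = |N(i) & U(f)| / (|N(i)| + alpha).
    """
    item_count = defaultdict(int)  # |N(i)|
    feature_item = defaultdict(lambda: defaultdict(int))  # |N(i) & U(f)|
    for user, items in user_items.items():
        feats = user_features.get(user, set())
        for i in items:
            item_count[i] += 1
            for f in feats:
                feature_item[f][i] += 1
    return {f: {i: c / (item_count[i] + alpha) for i, c in items.items()}
            for f, items in feature_item.items()}


def recommend_new_user(features, scores, n=10):
    """Score items for a cold-start user by summing p(f, i) over the user's features."""
    total = defaultdict(float)
    for f in features:
        for i, s in scores.get(f, {}).items():
            total[i] += s
    return sorted(total.items(), key=lambda x: x[1], reverse=True)[:n]
```

Without α, an item liked by a single user with feature f would score 1.0 and outrank broadly popular items; the damping term tempers that.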
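Choosing suitable items to bootstrap user interests
        In general, items used to bootstrap a user's interests should have the following characteristics:
  • Relatively popular
  • Representative and discriminative
  • The set of bootstrap items should be diverse
 
Using item content information
Leveraging domain experts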
 

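Using user tag data

        Users describe their opinions of items with tags, so tags link users to items and are an important data source reflecting user interests. How to use users' tag data to improve the quality of personalized recommendations is an important research topic in recommender systems.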
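A simple way to use tags, sketched as an assumption rather than a prescribed method: score item i for user u by p(u, i) = Σ_b n(u, b) · n(b, i), where n(u, b) is how often u used tag b and n(b, i) how often item i received tag b. The input is a list of hypothetical `(user, item, tag)` records, and `tag_based_recommend` is a hypothetical name.

```python
from collections import defaultdict


def tag_based_recommend(records, user, n=10):
    """records: list of (user, item, tag). Returns the top-n (item, score) pairs."""
    user_tags = defaultdict(lambda: defaultdict(int))  # n(u, b)
    tag_items = defaultdict(lambda: defaultdict(int))  # n(b, i)
    user_items = defaultdict(set)
    for u, i, b in records:
        user_tags[u][b] += 1
        tag_items[b][i] += 1
        user_items[u].add(i)

    scores = defaultdict(float)
    for b, n_ub in user_tags[user].items():
        for i, n_bi in tag_items[b].items():
            if i not in user_items[user]:  # skip items the user already tagged
                scores[i] += n_ub * n_bi
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)[:n]
```

The tags a user applies most often dominate the score, which is how tags act as the link between users and items.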
 

 

Data sparsity
 
        New items and new users have very few tags, so their tag sets need to be expanded, based on tag similarity.
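A sketch of tag expansion by tag similarity. It assumes that tag similarity is the cosine similarity of tags over the items they label, sim(b, b') = Σ_i n(i, b) n(i, b') / sqrt(Σ_i n(i, b)² · Σ_i n(i, b')²), and that expansion simply adds the k most similar tags. `tag_similarity`, `expand_tags`, and `k=3` are hypothetical.

```python
import math
from collections import defaultdict


def tag_similarity(item_tags):
    """item_tags: {item: {tag: count}}. Returns {tag: {other_tag: cosine similarity}}."""
    dot = defaultdict(lambda: defaultdict(float))
    norm = defaultdict(float)
    for tags in item_tags.values():
        for b, nb in tags.items():
            norm[b] += nb * nb
            for c, nc in tags.items():
                if b != c:
                    dot[b][c] += nb * nc
    return {b: {c: v / math.sqrt(norm[b] * norm[c]) for c, v in related.items()}
            for b, related in dot.items()}


def expand_tags(tags, sim, k=3):
    """Return the given tags plus the k tags most similar to them overall."""
    candidates = defaultdict(float)
    for b in tags:
        for c, w in sim.get(b, {}).items():
            if c not in tags:
                candidates[c] += w
    top = sorted(candidates.items(), key=lambda x: x[1], reverse=True)[:k]
    return set(tags) | {c for c, _ in top}
```

A new item tagged only "rock" could then also be matched to users interested in "guitar" if the two tags often label the same items.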
 
Tag cleaning: remove very high-frequency stop words, merge synonyms, etc., analogous to natural language processing.
 
 

## Using context information

Temporal context
  • User interests change over time
  • Items also have life cycles
  • Seasonal effects
 
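Analyzing a system's temporal characteristics:
  • Growth of the number of daily unique users in the dataset
  • Changes in the system's items: growth of news articles on the site, growth of products, etc.
  • User visits: the average number of days each user is active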
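The first and third analyses can be computed directly from a visit log. This sketch assumes records of `(user, unix_timestamp)` and buckets days in UTC; `daily_unique_users` and `average_active_days` are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_unique_users(log):
    """log: list of (user, unix_timestamp). Returns {date: number of distinct users}, by date."""
    users_by_day = defaultdict(set)
    for user, ts in log:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        users_by_day[day].add(user)
    return {day: len(users) for day, users in sorted(users_by_day.items())}


def average_active_days(log):
    """Average number of distinct (UTC) days on which each user was active."""
    days = defaultdict(set)
    for user, ts in log:
        days[user].add(datetime.fromtimestamp(ts, tz=timezone.utc).date())
    return sum(len(d) for d in days.values()) / len(days)
```

Plotting the first result over time shows whether the user base is still growing.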
 
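Time-context recommendation algorithms:
 
  • Recently popular recommendations
  • Time-context ItemCF: the most commonly used approach is item-based personalized recommendation
    • Item similarity
    • Online recommendation: the user's recent behavior matters more
  • Time-context UserCF
    • User interest similarity
    • Recent behavior of users with similar interests
  • Location context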
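A sketch of two of the time-context ideas above. It assumes a hyperbolic time decay 1/(1 + α·Δt) as the decay function and records of `(user, item, t)`. Recent popularity sums decayed action counts before time T. The time-aware ItemCF weights each co-occurrence by how close in time the two actions were and, at recommendation time, weights each past action by how recent it is relative to the current time t0. Function names, `alpha`, and `beta` are hypothetical.

```python
import math
from collections import defaultdict


def recent_popularity(records, T, alpha=1.0):
    """n_i(T) = sum over actions on i before T of 1 / (1 + alpha * (T - t))."""
    pop = defaultdict(float)
    for _, item, t in records:
        if t < T:
            pop[item] += 1 / (1 + alpha * (T - t))
    return dict(pop)


def time_item_similarity(records, alpha=1.0):
    """Co-occurrences close together in time count more: f(dt) = 1 / (1 + alpha * |dt|)."""
    user_hist = defaultdict(dict)  # user -> {item: time}; a repeated item keeps its last time
    for u, i, t in records:
        user_hist[u][i] = t
    count = defaultdict(int)
    co = defaultdict(lambda: defaultdict(float))
    for hist in user_hist.values():
        for i, ti in hist.items():
            count[i] += 1
            for j, tj in hist.items():
                if i != j:
                    co[i][j] += 1 / (1 + alpha * abs(ti - tj))
    return {i: {j: c / math.sqrt(count[i] * count[j]) for j, c in rel.items()}
            for i, rel in co.items()}


def time_recommend(user_hist, sim, t0, beta=1.0, k=10, n=10):
    """user_hist: {item: time}. Actions close to t0 weigh more in the final score."""
    scores = defaultdict(float)
    for j, tj in user_hist.items():
        neighbors = sorted(sim.get(j, {}).items(), key=lambda x: x[1], reverse=True)[:k]
        for i, w in neighbors:
            if i not in user_hist:
                scores[i] += w / (1 + beta * abs(t0 - tj))
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)[:n]
```

Larger `alpha` or `beta` make the model forget older behavior faster, which matches the observation that user interests and items change over time.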
   
 
 


Origin http://43.154.161.224:23101/article/api/json?id=325378867&siteId=291194637