Recommender Systems Notes 1: Overview

Motivated by Dr. Wu, I briefly summarize the paper here for future reference.

This overview is based on my understanding of the paper: Zeynep Batmaz, Ali Yurekli, Alper Bilge, Cihan Kaleli, A review on deep learning for recommender systems: challenges and remedies, 2018.

Why do we need recommender systems (RS)?
To solve the information overload problem.

Classification:

  • Collaborative filtering RS

    • Memory-based
      Utilize the entire user-item matrix to identify similar entities. After locating the nearest neighbors, past ratings of these entities are employed for recommendation purposes.
      User-based: employ the past preferences of the nearest neighbors of user a.
      Item-based: employ the ratings of items similar to item q.

    • Model-based
      Aim to build an offline model by applying machine learning and data mining techniques. Building and training such a model makes it possible to compute predictions for online CF tasks.

  • Content-based RS
    The main purpose is to recommend items that are similar to those a user liked in the past. For instance, if a user likes a website that contains keywords such as “stack”, “queue”, and “sorting”, a content-based recommender system would suggest pages related to data structures and algorithms.

  • Hybrid RS
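
As a rough illustration of the user-based, memory-based approach described above, here is a minimal sketch in plain Python. The toy ratings, the choice of cosine similarity, and the function names (cosine, predict_user_based) are my own assumptions for illustration; the paper does not prescribe them.

```python
import math

# Hypothetical toy ratings: user -> {item: rating}. Missing items are unrated.
ratings = {
    "a": {"i1": 5, "i2": 3},
    "b": {"i1": 5, "i2": 3, "i3": 4},
    "c": {"i1": 1, "i2": 5, "i3": 2},
}

def cosine(u, v):
    """Cosine similarity computed over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict_user_based(ratings, user, item, k=2):
    """Similarity-weighted average of the k nearest neighbors' ratings for item."""
    neighbors = [
        (cosine(ratings[user], r), r[item])
        for other, r in ratings.items()
        if other != user and item in r
    ]
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    top = neighbors[:k]
    total = sum(sim for sim, _ in top)
    if total == 0:
        return None  # no usable neighbors rated this item
    return sum(sim * rating for sim, rating in top) / total

# Predict how user a would rate item i3, which a has not rated yet.
prediction = predict_user_based(ratings, "a", "i3")
```

User b rates i1 and i2 exactly like user a, so b's rating of i3 pulls the prediction more strongly than c's does. An item-based variant would compare item columns instead of user rows.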

The main difference between collaborative filtering and content-based filtering is that CF relies on the history of user behavior, i.e. the ratings users gave to items, while content-based filtering relies on item or user attributes, i.e. content features.
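
To make the content-based side concrete, the keyword example from the list above can be sketched as follows. The page names, their keyword sets, and the use of Jaccard similarity are assumptions for illustration only; note that no ratings from other users are involved.

```python
# Hypothetical keyword sets for candidate pages (illustrative only).
pages = {
    "binary-trees": {"tree", "stack", "queue", "traversal"},
    "quicksort": {"sorting", "partition", "recursion"},
    "pasta-recipes": {"pasta", "tomato", "basil"},
}

# Keywords collected from pages the user liked in the past.
user_profile = {"stack", "queue", "sorting"}

def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(profile, pages, top_n=2):
    """Rank pages by keyword overlap with the user's profile."""
    ranked = sorted(pages, key=lambda p: jaccard(profile, pages[p]), reverse=True)
    return ranked[:top_n]

recommended = recommend(user_profile, pages)
```

Here the two data-structure and algorithm pages rank above the unrelated one purely because of their attributes.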

Challenges and solutions

  • Accuracy: usually judged in three ways: the accuracy of rating predictions, of usage predictions, and of item rankings.
    Solution: use ML models to extract hidden features and to jointly combine information from varying sources.
  • Sparsity or cold-start: lack of data, e.g. too few user ratings, or no information about a new user.
    Solution: use ML models to extract denser feature representations from high-dimensional sparse data / use ML models to extract features from heterogeneous data sources / combine with a content-based RS to handle the cold-start problem.
  • Scalability: the balance between model complexity and response time.
    Solution: use ML models to reduce high-dimensional data to fewer dimensions / modify the ML model to accelerate training / use parallel computing.
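
The three ways of judging accuracy listed above can each be tied to a common metric. The sketch below uses RMSE for rating prediction, precision@k for usage prediction, and NDCG@k for ranking; these particular metrics are my own choice of representative examples, not something these notes take from the paper.

```python
import math

def rmse(predicted, actual):
    """Rating prediction: root mean squared error between two rating lists."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

def precision_at_k(recommended, relevant, k):
    """Usage prediction: share of the top-k recommendations the user actually used."""
    top = recommended[:k]
    return sum(1 for item in top if item in relevant) / k

def ndcg_at_k(recommended, relevance, k):
    """Ranking: DCG of the recommended order divided by DCG of the ideal order."""
    def dcg(gains):
        return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains))
    gains = [relevance.get(item, 0) for item in recommended[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0
```

A model can score well on one of these and poorly on another, which is why the three views are usually reported separately.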

Accuracy in a CF system is not simply the prediction accuracy of an ordinary machine learning task. A good model should return both closely related items and novel, surprising items that might attract users, i.e. it should balance exploration and exploitation.
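
One simple way to picture the exploration/exploitation balance is an epsilon-greedy rule: most of the time recommend the top-scored item, and occasionally recommend a random one. This is only a sketch under my own assumptions (the function name, the scores, and the strategy itself are illustrative and are not taken from the paper).

```python
import random

def epsilon_greedy_pick(scores, epsilon, rng=random):
    """With probability epsilon pick a random item (explore), else the top-scored one (exploit)."""
    items = list(scores)
    if rng.random() < epsilon:
        return rng.choice(items)
    return max(items, key=lambda item: scores[item])

# Hypothetical predicted scores for one user.
scores = {"familiar-item": 0.9, "new-item": 0.4, "niche-item": 0.2}

# epsilon = 0.1: mostly the familiar item, with an occasional surprise.
pick = epsilon_greedy_pick(scores, epsilon=0.1)
```

Setting epsilon to 0 gives pure exploitation, while larger values surface more unexpected items at the cost of short-term accuracy.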

Reposted from blog.csdn.net/thormas1996/article/details/88819457