[Movie Recommendation System] Preliminary system construction and offline personalized recommendation

In the previous post we finished the statistics-based recommendation part. In this post we use Vue + Element-ui + SpringBoot to quickly build the system and display movies, and then introduce the personalized recommendation part.

1 System page design

Initially, I wanted to design a movie recommendation system similar to Douban. After logging in, users can:

View high-score movies
View recommended movies
Rate movies

1.1 Front-end template download

Because time was limited, I chose a template that imitates the Douban movie site. The goal is not to practice Vue, but to get the system working as simply as possible.
We then modify the template and use Element-ui for rapid development.

1.2 Back-end system construction

Use SpringBoot for rapid development.
Add the MongoDB dependencies and write an interface to test whether data can be fetched successfully.
Once the test passes, write the axios code on the Vue side.

Note: Pay close attention to dependency versions; a mismatch leads to errors that are painful to debug.


spring:
  data:
    mongodb:
      host: <server-ip>   # replace with the IP of your MongoDB server
      port: 27017
      database: recommender
      username: "root"
      password: "123456"

2 Collaborative filtering algorithm based on the latent factor model

Recommendation algorithms based on user behavior analysis are generally called collaborative filtering algorithms. "Collaborative filtering" means that users work together: through their continuous interaction with the website, each user's recommendation list keeps filtering out items the user is not interested in, so that it matches the user's needs more and more closely. Common implementation methods include:

Neighborhood-based methods
Latent factor models
Graph-based random walk algorithms

We use the Latent Factor Model (LFM), whose core idea is to complete the recommendation task by mining latent factors that link users and items. We will improve on this in the future.
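
To make the idea of latent factors concrete, here is a minimal sketch in plain Scala (no Spark). It assumes each user and each movie has already been given a short vector of learned factors; the names `LatentFactorSketch` and `predict` and the sample numbers are illustrative and not part of the project. In a matrix-factorization model of this kind, the predicted rating is the dot product of the two vectors.

```scala
object LatentFactorSketch {
  // Predicted rating = dot product of the user's and the movie's latent factor vectors.
  // Both vectors must have the same length (the model's rank).
  def predict(userFactors: Array[Double], movieFactors: Array[Double]): Double = {
    require(userFactors.length == movieFactors.length, "factor vectors must have the same rank")
    userFactors.zip(movieFactors).map { case (p, q) => p * q }.sum
  }
}
```

For example, `LatentFactorSketch.predict(Array(1.0, 2.0), Array(3.0, 0.5))` evaluates 1.0 * 3.0 + 2.0 * 0.5. Training (ALS in the code later in this post) is what learns the factor vectors; this sketch only shows how they are used once learned.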

The main steps:

Take the Cartesian product of the user IDs and movie IDs to generate (uid, mid) tuples.
Use the model to predict a rating for each (uid, mid) tuple.
Sort the predictions by predicted rating.
Return the K movies with the highest predicted ratings as recommendations for the current user.
Compute movie-to-movie similarity from the movie features learned by ALS and store it in MongoDB, in preparation for later real-time recommendation.
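
The filter, group, sort, and take-K steps can be sketched on an ordinary collection before looking at the Spark version. This is only an illustration in plain Scala: `TopKSketch`, `topK`, and the sample triples are hypothetical, and the (uid, mid, score) triples stand in for the model's predictions.

```scala
object TopKSketch {
  // predictions: (uid, mid, predicted score) triples standing in for model output.
  // Returns, for each uid, the k highest-scoring (mid, score) pairs in descending order.
  def topK(predictions: Seq[(Int, Int, Double)], k: Int): Map[Int, Seq[(Int, Double)]] =
    predictions
      .filter(_._3 > 0)          // keep only positive predicted scores
      .groupBy(_._1)             // group the predictions by uid
      .map { case (uid, recs) =>
        uid -> recs.map(r => (r._2, r._3)).sortBy(-_._2).take(k)
      }
}
```

With `Seq((1, 10, 3.5), (1, 11, 4.5), (1, 12, -0.2), (2, 10, 2.0))` and `k = 1`, user 1 keeps only movie 11 and user 2 keeps movie 10. The Spark code follows the same shape, with `groupByKey` on an RDD in place of `groupBy`.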

// Core program
import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.jblas.DoubleMatrix

// Extract all distinct uids and mids from the rating data
val userRDD = ratingRDD.map(_._1).distinct()
val movieRDD = ratingRDD.map(_._2).distinct()

// Train the latent factor model
val trainData = ratingRDD.map( x => Rating(x._1, x._2, x._3) )

val (rank, iterations, lambda) = (200, 5, 0.1)
val model = ALS.train(trainData, rank, iterations, lambda)

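// Predict ratings from the user and movie latent features to build each user's recommendation list
// Take the Cartesian product of users and movies to get an empty rating matrix
val userMovies = userRDD.cartesian(movieRDD)

// Call the model's predict method to predict the ratings
val preRatings = model.predict(userMovies)

val userRecs = preRatings
  .filter(_.rating > 0)    // keep only predictions greater than 0
  .map(rating => ( rating.user, (rating.product, rating.rating) ) )
  .groupByKey()
  .map{
    // Sort each user's predictions in descending order and keep the top USER_MAX_RECOMMENDATION
    case (uid, recs) => UserRecs( uid, recs.toList.sortWith(_._2>_._2).take(USER_MAX_RECOMMENDATION).map(x=>Recommendation(x._1, x._2)) )
  }
  .toDF()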

userRecs.write
  .option("uri", mongoConfig.uri)
  .option("collection", USER_RECS)
  .mode("overwrite")
  .format("com.mongodb.spark.sql")
  .save()

// Compute the similarity matrix from the movie latent features to get each movie's similarity list
val movieFeatures = model.productFeatures.map{
  case (mid, features) => (mid, new DoubleMatrix(features))
}

// Compute the similarity of every pair of movies, starting with a Cartesian product
val movieRecs = movieFeatures.cartesian(movieFeatures)
  .filter{
    // Filter out pairs of a movie with itself
    case (a, b) => a._1 != b._1
  }
  .map{
    case (a, b) => {
      val simScore = this.consinSim(a._2, b._2)
      ( a._1, ( b._1, simScore ) )
    }
  }
  .filter(_._2._2 > 0.8)    // keep only pairs with similarity greater than 0.8
  .groupByKey()
  .map{
    // Sort each movie's similar movies by similarity in descending order
    case (mid, items) => MovieRecs( mid, items.toList.sortWith(_._2 > _._2).map(x => Recommendation(x._1, x._2)) )
  }
  .toDF()
movieRecs.write
  .option("uri", mongoConfig.uri)
  .option("collection", MOVIE_RECS)
  .mode("overwrite")
  .format("com.mongodb.spark.sql")
  .save()
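
The `consinSim` helper called above is not shown in this post. Judging by its name it computes cosine similarity, that is, the dot product of the two feature vectors divided by the product of their lengths. The following is a minimal sketch of that formula on plain arrays; `CosineSketch` and `cosineSim` are illustrative names, and returning 0.0 for a zero vector is an assumption of this sketch rather than something the original code specifies.

```scala
object CosineSketch {
  // Cosine similarity: dot(a, b) / (|a| * |b|).
  // Assumption of this sketch: returns 0.0 when either vector has zero length.
  def cosineSim(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }
}
```

Vectors pointing in the same direction score close to 1 and orthogonal vectors score 0, which is why the code above keeps only pairs above 0.8. With jblas `DoubleMatrix` arguments, the same formula would presumably be written as `a.dot(b) / (a.norm2() * b.norm2())`.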

However, this method has the following disadvantages:

It is difficult to achieve real-time recommendation.
Updating the recommendation model requires repeated iterations over the user behavior records, and each training run is time-consuming.
The cold-start problem is pronounced.