[Movie Recommendation System] Statistical Recommendation

The statistical recommendation module mainly covers top-rated recommendations, all-time popular recommendations, and recently popular recommendations. The code for each statistic is shown below.

1. Top-rated movie statistics

  • Use the ratingDF data to compute the average rating of each movie and store it in the AverageMovies table
    // Load the data from MongoDB
    val ratingDF = spark.read
      .option("uri", mongoConfig.uri)
      .option("collection", MONGODB_RATING_COLLECTION)
      .format("com.mongodb.spark.sql")
      .load()
      .as[Rating]
      .toDF()

    val movieDF = spark.read
      .option("uri", mongoConfig.uri)
      .option("collection", MONGODB_MOVIE_COLLECTION)
      .format("com.mongodb.spark.sql")
      .load()
      .as[Movie]
      .toDF()

    // Create a temporary view named ratings
    ratingDF.createOrReplaceTempView("ratings")

    // Top-rated movie statistics: average score per movie (columns: mid, avg)
    val averageMoviesDF = spark.sql("select mid, avg(score) as avg from ratings group by mid")
    storeDFInMongoDB(averageMoviesDF, AVERAGE_MOVIES)
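The snippets in this article call storeDFInMongoDB and read mongoConfig.uri without showing their definitions. The following is a minimal sketch of what such a helper might look like. It assumes that the com.mongodb.spark.sql format used for reading above also handles writes, and that each run should replace the previous contents of the collection. The MongoConfig case class, the MongoStoreSketch object, and the overwrite mode are assumptions made for illustration, not the author's code.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical config holder; the article only shows that `mongoConfig.uri` exists.
case class MongoConfig(uri: String)

object MongoStoreSketch {
  // Assumed helper: write a DataFrame to the named MongoDB collection,
  // replacing whatever a previous run stored there.
  def storeDFInMongoDB(df: DataFrame, collectionName: String)
                      (implicit mongoConfig: MongoConfig): Unit =
    df.write
      .option("uri", mongoConfig.uri)
      .option("collection", collectionName)
      .mode(SaveMode.Overwrite)
      .format("com.mongodb.spark.sql")
      .save()
}
```

Taking the configuration as an implicit parameter keeps the call sites as short as the ones in this article, provided an implicit MongoConfig value is in scope.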

2. All-time popular recommendations

// All-time popularity statistics: number of ratings per movie (columns: mid, count)
val rateMoreMoviesDF = spark.sql("select mid, count(mid) as count from ratings group by mid")
// Write the result to the corresponding MongoDB collection
storeDFInMongoDB(rateMoreMoviesDF, RATE_MORE_MOVIES)
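The query above has no ORDER BY clause, so the stored rows are not ranked, and whatever reads the collection has to sort by count to get the most popular movies. The plain-Scala sketch below reproduces the same aggregate on an in-memory sample and then takes the top N. RatingRow, PopularitySketch, and the tie-breaking rule are hypothetical names and choices used only for illustration.

```scala
// Hypothetical in-memory stand-in for one row of the ratings data.
case class RatingRow(uid: Int, mid: Int, score: Double, timestamp: Int)

object PopularitySketch {
  // Same aggregate as the SQL query: number of ratings per movie id.
  def ratingCounts(ratings: Seq[RatingRow]): Map[Int, Int] =
    ratings.groupBy(_.mid).map { case (mid, rows) => mid -> rows.size }

  // Most-rated movies first; ties go to the smaller mid (an arbitrary
  // choice that keeps the ordering deterministic).
  def topN(ratings: Seq[RatingRow], n: Int): Seq[(Int, Int)] =
    ratingCounts(ratings).toSeq
      .sortBy { case (mid, count) => (-count, mid) }
      .take(n)
}
```

For example, given two ratings for movie 10 and one for movie 20, topN(sample, 1) returns only the pair (10, 2).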

3. Recently popular recommendations

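// Imports needed for the date conversion
import java.text.SimpleDateFormat
import java.util.Date

// Recent popularity statistics: bucket the ratings by month ("yyyyMM") and count the ratings in each bucket
// Create a date formatter
val simpleDateFormat = new SimpleDateFormat("yyyyMM")
// Register a UDF that converts a timestamp in seconds to a yyyyMM integer
spark.udf.register("changeDate", (x: Int) => simpleDateFormat.format(new Date(x * 1000L)).toInt)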

// Preprocess the raw data: drop uid and convert the timestamp to a year-month value
val ratingOfYearMonth = spark.sql("select mid, score, changeDate(timestamp) as yearmonth from ratings")
ratingOfYearMonth.createOrReplaceTempView("ratingOfMonth")

// Count each movie's ratings in each month from ratingOfMonth (columns: mid, count, yearmonth)
val rateMoreRecentlyMoviesDF = spark.sql("select mid, count(mid) as count, yearmonth from ratingOfMonth group by yearmonth, mid order by yearmonth desc, count desc")

// Save to MongoDB
storeDFInMongoDB(rateMoreRecentlyMoviesDF, RATE_MORE_RECENTLY_MOVIES)
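The changeDate UDF above relies on SimpleDateFormat, which is not thread-safe and formats in the JVM's default time zone. As a sketch of the same conversion outside Spark, the version below uses the thread-safe java.time API and pins the zone to UTC so that the result does not depend on the machine. The UTC choice and the names YearMonthSketch and toYearMonth are assumptions for illustration; the original code uses the default zone.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.format.DateTimeFormatter

object YearMonthSketch {
  // Thread-safe formatter; UTC is an assumption made to keep results reproducible.
  private val yearMonthFormat: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyyMM").withZone(ZoneOffset.UTC)

  // Convert a Unix timestamp in seconds to an integer such as 200912.
  def toYearMonth(epochSeconds: Int): Int =
    yearMonthFormat.format(Instant.ofEpochSecond(epochSeconds.toLong)).toInt
}
```

Under these assumptions, toYearMonth(1260759144) yields 200912, because that timestamp falls on 14 December 2009 in UTC. The same function body could be registered as the UDF in place of the SimpleDateFormat version.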

Summary

  • Statistical recommendations do not need to be computed in real time, so the job can be scheduled to run periodically, for example once a day.
  • To extend this module, you can add per-category recommendations, such as the popular and unpopular movies in each category (ranked by popularity or by another algorithm).

Origin blog.csdn.net/weixin_40433003/article/details/132049401