The statistical recommendation module mainly covers high-score recommendations, historically popular recommendations, and recently popular recommendations. The code for each statistic follows.
1. High-score movie statistics
- Use the ratingDF data to compute the average rating of each movie and store the result in the AverageMovies table.
// Load the data from MongoDB
val ratingDF = spark.read
.option("uri", mongoConfig.uri)
.option("collection", MONGODB_RATING_COLLECTION)
.format("com.mongodb.spark.sql")
.load()
.as[Rating]
.toDF()
val movieDF = spark.read
.option("uri", mongoConfig.uri)
.option("collection", MONGODB_MOVIE_COLLECTION)
.format("com.mongodb.spark.sql")
.load()
.as[Movie]
.toDF()
// Create a temporary view named ratings
ratingDF.createOrReplaceTempView("ratings")
// High-quality movie statistics: average score per movie (mid, avg)
val averageMoviesDF = spark.sql("select mid, avg(score) as avg from ratings group by mid")
storeDFInMongoDB(averageMoviesDF, AVERAGE_MOVIES)
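The storeDFInMongoDB helper is called here but not defined in this section. A minimal sketch of what it might look like follows, assuming the same MongoDB Spark connector format string used for reading above. The MongoConfig case class (only its uri field appears in the text), the implicit parameter, and the overwrite mode are all assumptions for illustration, not the author's confirmed implementation.

```scala
import org.apache.spark.sql.DataFrame

// Assumed configuration shape; only `uri` is visible in the code above.
final case class MongoConfig(uri: String, db: String)

object MongoStore {
  // Hypothetical helper: writes a DataFrame to the named MongoDB collection.
  // "overwrite" is an assumed choice so that each batch run replaces the
  // previous statistics instead of appending to them.
  def storeDFInMongoDB(df: DataFrame, collectionName: String)
                      (implicit mongoConfig: MongoConfig): Unit = {
    df.write
      .option("uri", mongoConfig.uri)
      .option("collection", collectionName)
      .mode("overwrite")
      .format("com.mongodb.spark.sql")
      .save()
  }
}
```

With an implicit MongoConfig in scope, the calls in this section would compile unchanged against such a helper.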
2. Historically popular recommendations
// Historically popular statistics: number of ratings each movie has received overall (mid, count)
val rateMoreMoviesDF = spark.sql("select mid, count(mid) as count from ratings group by mid")
// Write the result to the corresponding MongoDB collection
storeDFInMongoDB( rateMoreMoviesDF, RATE_MORE_MOVIES )
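One way to sanity-check the two aggregations above is to reproduce them on a few in-memory ratings with plain Scala collections. The sketch below is only an illustration: the Rating case class is an assumed shape inferred from the column names in the queries (uid, mid, score, timestamp), and StatsCheck and topRated are hypothetical names. Note that the historical-popularity query does not sort its output, so a consumer has to order by count itself, as topRated does here.

```scala
// Assumed shape of a rating record, inferred from the columns used in the queries.
final case class Rating(uid: Int, mid: Int, score: Double, timestamp: Int)

object StatsCheck {
  // Mirrors: select mid, avg(score) as avg from ratings group by mid
  def averageByMovie(ratings: Seq[Rating]): Map[Int, Double] =
    ratings.groupBy(_.mid).map { case (mid, rs) =>
      mid -> rs.map(_.score).sum / rs.size
    }

  // Mirrors: select mid, count(mid) as count from ratings group by mid
  def countByMovie(ratings: Seq[Rating]): Map[Int, Int] =
    ratings.groupBy(_.mid).map { case (mid, rs) => mid -> rs.size }

  // The n most-rated movies, highest count first; ties are broken by mid.
  def topRated(ratings: Seq[Rating], n: Int): Seq[(Int, Int)] =
    countByMovie(ratings).toSeq.sortBy { case (mid, c) => (-c, mid) }.take(n)
}
```

Comparing these results with the Spark output on a small sample is a cheap check before running the job against the full collection.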
3. Recently popular recommendations
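// Recently popular statistics: convert each timestamp to "yyyyMM" and count the ratings per month
import java.text.SimpleDateFormat
import java.util.Date
// Create a date formatter
val simpleDateFormat = new SimpleDateFormat("yyyyMM")
// Register a UDF that converts a timestamp in seconds into a year-month value
spark.udf.register("changeDate", (x: Int) => simpleDateFormat.format(new Date(x * 1000L)).toInt)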
// Preprocess the raw data, dropping uid
val ratingOfYearMonth = spark.sql("select mid, score, changeDate(timestamp) as yearmonth from ratings")
ratingOfYearMonth.createOrReplaceTempView("ratingOfMonth")
// Count each movie's ratings in each month from ratingOfMonth (mid, count, yearmonth)
val rateMoreRecentlyMoviesDF = spark.sql("select mid, count(mid) as count, yearmonth from ratingOfMonth group by yearmonth, mid order by yearmonth desc, count desc")
// Store the result in MongoDB
storeDFInMongoDB(rateMoreRecentlyMoviesDF, RATE_MORE_RECENTLY_MOVIES)
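The changeDate UDF above relies on SimpleDateFormat, which is documented as not thread-safe and which formats in the JVM's default time zone, so the month boundary can shift between machines. A possible alternative, sketched here with java.time, computes the same yyyyMM integer without shared mutable state. The object and function names are hypothetical, and fixing the zone to UTC is an assumption; the original code uses whatever zone the JVM is configured with.

```scala
import java.time.{Instant, ZoneOffset}

object YearMonthConversion {
  // Converts a timestamp in epoch seconds to an Int of the form yyyyMM.
  // The zone is pinned to UTC (an assumption) so results do not depend
  // on the machine the job runs on.
  def toYearMonth(epochSeconds: Long): Int = {
    val date = Instant.ofEpochSecond(epochSeconds).atZone(ZoneOffset.UTC)
    date.getYear * 100 + date.getMonthValue
  }
}
```

It could be registered in place of the original lambda with spark.udf.register("changeDate", (x: Int) => YearMonthConversion.toYearMonth(x.toLong)), leaving the SQL unchanged.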
Summary
- Statistical recommendations do not need to be computed in real time, so the job can be scheduled to run once a day.
- To extend this module, you can add popular and unpopular recommendations for each category (based on popularity or other algorithms).
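As a rough illustration of the per-category extension, the sketch below ranks movies inside each genre by rating count using plain Scala collections. Everything here is hypothetical: the input maps (genres per movie, rating count per movie) and the names GenreTopN and topByGenre are assumptions, since this section does not show the fields of Movie.

```scala
object GenreTopN {
  // For each genre, returns the mids of the n most-rated movies.
  // genresOfMovie: assumed mapping from mid to its genre labels.
  // countOfMovie:  rating count per mid, e.g. derived from the historical counts.
  def topByGenre(genresOfMovie: Map[Int, Seq[String]],
                 countOfMovie: Map[Int, Int],
                 n: Int): Map[String, Seq[Int]] = {
    // Flatten to one (genre, mid, count) row per movie-genre pair.
    val rows = for {
      (mid, genres) <- genresOfMovie.toSeq
      genre <- genres
    } yield (genre, mid, countOfMovie.getOrElse(mid, 0))

    // Within each genre, sort by count descending (ties broken by mid) and keep n.
    rows.groupBy(_._1).map { case (genre, rs) =>
      genre -> rs.sortBy { case (_, mid, c) => (-c, mid) }.take(n).map(_._2)
    }
  }
}
```

Sorting by ascending count instead would give the unpopular list for each category.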