Recommendation Systems: Building a Personalized Recommender for Movies, Products, or Music

Personalized recommendation is a central application of machine learning and deep learning. This article uses TensorFlow to build a movie recommendation system based on collaborative filtering. We cover the basic concepts of recommender systems, data preparation, model building and training, evaluation, and finally recommendation generation.

Part 1: Overview of Recommender Systems

A recommendation system is a system used to recommend items to users based on their historical behavior and interests. There are two main types of recommendation systems: content-based recommendations and collaborative filtering recommendations. This article will focus on collaborative filtering recommendations.

Collaborative filtering recommendations

Collaborative filtering predicts a user's interests from similarities between users or between items. It rests on two main ideas:

User-Item Matrix: represents the interactions between users and items as a matrix in which rows represent users, columns represent items, and each element records the degree of interaction between a user and an item (such as a rating or a number of clicks).
Similarity Metrics: By calculating the similarity between users or between items, we can infer items that users may like. One of the most commonly used similarity measures is cosine similarity.
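
To make the similarity idea concrete, here is a minimal sketch of cosine similarity between users. The rating matrix is made-up toy data, and the helper name cosine_similarity is our own, not part of any library used in this article.

```python
import numpy as np

# Toy user-item rating matrix (hypothetical data): rows are users, columns are items.
# A value of 0 means the user has not rated that item.
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],   # user 0
    [4.0, 5.0, 0.0, 2.0],   # user 1
    [1.0, 0.0, 5.0, 4.0],   # user 2
])

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

sim_01 = cosine_similarity(ratings[0], ratings[1])
sim_02 = cosine_similarity(ratings[0], ratings[2])
print(f"user 0 vs user 1: {sim_01:.3f}")
print(f"user 0 vs user 2: {sim_02:.3f}")
```

Users 0 and 1 rate the same items similarly, so their similarity is much higher than that of users 0 and 2. A neighborhood-based recommender would therefore lean on user 1's ratings when predicting for user 0.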

In this article, we will use a collaborative filtering method to build a movie recommendation system based on the user-item matrix.

Part 2: Data Preparation

Dataset introduction

To build a movie recommendation system, we will use the MovieLens dataset, a classic dataset that contains user rating data for movies. You can download datasets of different sizes from the MovieLens official website, and this article will use one of the smaller versions.

First, we need to load the data and preprocess it:

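import pandas as pd

# Load the ratings and the movie metadata
ratings_data = pd.read_csv('ratings.csv')
movies_data = pd.read_csv('movies.csv')

# Join ratings with movie titles on the shared movieId column
data = pd.merge(ratings_data, movies_data, on='movieId')

# Build the user-item matrix: one row per user, one column per movie title.
# Unrated movies are filled with 0, which is not a valid MovieLens rating,
# so 0 means "no rating" here rather than "worst rating".
user_movie_ratings = data.pivot_table(index='userId', columns='title', values='rating')
user_movie_ratings = user_movie_ratings.fillna(0)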
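
One preprocessing detail matters for embedding-based models: an embedding table is indexed by integer position, and MovieLens movieId values have gaps. A common option is to remap raw IDs to contiguous indices so the table stays compact. The sketch below uses made-up IDs and NumPy only; the variable names are illustrative and are not used elsewhere in this article.

```python
import numpy as np

# Hypothetical raw IDs with gaps, similar in spirit to MovieLens movieId values.
raw_movie_ids = np.array([1, 50, 50, 1200, 1, 98765])

# unique_ids is sorted; movie_idx[i] is the position of raw_movie_ids[i] in unique_ids.
unique_ids, movie_idx = np.unique(raw_movie_ids, return_inverse=True)

num_distinct_movies = len(unique_ids)   # rows needed in a compact embedding table
decoded = unique_ids[movie_idx]         # map contiguous indices back to raw IDs

print("contiguous indices:", movie_idx)
print("distinct movies:", num_distinct_movies)
```

The trade-off is that every ID passed to the model later must go through the same mapping, and the reverse lookup (unique_ids) is needed to turn predictions back into real movie IDs.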

Part 3: Model Building

The basic idea

We will use a collaborative filtering method based on the user-item matrix to build the recommendation system. Specifically, we will use matrix factorization, which approximates the user-item matrix as the product of two low-dimensional matrices that capture the latent features of users and items.
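
As an illustration of the factorization idea, independent of TensorFlow, the following sketch fits two small factor matrices to a toy rating matrix with plain gradient descent. The data, the number of factors k, and the learning rate and regularization values are all assumptions chosen for the example, not settings taken from this article's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy rating matrix (hypothetical data); np.nan marks unobserved entries.
R = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [np.nan, 1.0, 5.0, 4.0],
])
observed = ~np.isnan(R)
R_filled = np.where(observed, R, 0.0)

k = 2                                              # number of latent factors
P = rng.normal(scale=0.1, size=(R.shape[0], k))    # user factors, one row per user
Q = rng.normal(scale=0.1, size=(R.shape[1], k))    # item factors, one row per item

lr, reg = 0.01, 0.01                               # step size and L2 penalty (assumed values)
for _ in range(5000):
    # Reconstruction error, counted on observed entries only
    err = np.where(observed, R_filled - P @ Q.T, 0.0)
    P_step = err @ Q - reg * P
    Q_step = err.T @ P - reg * Q
    P += lr * P_step
    Q += lr * Q_step

R_hat = P @ Q.T                                    # dense matrix of predicted ratings
rmse_observed = np.sqrt(np.mean((R - R_hat)[observed] ** 2))
print("RMSE on observed entries:", rmse_observed)
```

Each row of P plays the role of a user embedding and each row of Q the role of a movie embedding, and the predicted rating is their dot product. The entries of R_hat at the unobserved positions are the model's guesses for ratings the user never gave.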

Model architecture

We will use TensorFlow to build the model. The following is the architecture of the model:

import tensorflow as tf

# Model hyperparameters.
# The embedding tables are indexed by the raw userId / movieId values, so each
# table needs (max ID + 1) rows: MovieLens IDs start at 1 and movieId has gaps.
num_users = int(ratings_data['userId'].max()) + 1
num_movies = int(ratings_data['movieId'].max()) + 1
embedding_dim = 32

# User embedding: maps each user ID to a dense vector
user_input = tf.keras.layers.Input(shape=(1,), name='user_input')
user_embedding = tf.keras.layers.Embedding(input_dim=num_users, output_dim=embedding_dim)(user_input)
user_vec = tf.keras.layers.Flatten()(user_embedding)

# Movie embedding: maps each movie ID to a dense vector
movie_input = tf.keras.layers.Input(shape=(1,), name='movie_input')
movie_embedding = tf.keras.layers.Embedding(input_dim=num_movies, output_dim=embedding_dim)(movie_input)
movie_vec = tf.keras.layers.Flatten()(movie_embedding)

# The predicted rating is the dot product of the user and movie vectors
dot_product = tf.keras.layers.Dot(axes=1)([user_vec, movie_vec])

# Build and compile the model
model = tf.keras.Model(inputs=[user_input, movie_input], outputs=dot_product)
model.compile(loss='mean_squared_error', optimizer='adam')

Part 4: Model Training

Now we can use the prepared dataset and model to train:

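# Shuffle first: validation_split holds out the last 20% of the rows,
# and ratings.csv is typically ordered by user
shuffled = ratings_data.sample(frac=1, random_state=42)

# Training data: raw user and movie IDs as inputs, ratings as targets
X = [shuffled['userId'].to_numpy(), shuffled['movieId'].to_numpy()]
y = shuffled['rating'].to_numpy()

# Fit the model
model.fit(X, y, batch_size=64, epochs=5, verbose=1, validation_split=0.2)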

Part 5: Model Evaluation

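After training is complete, we need to evaluate the performance of the model. We use the root mean square error (RMSE) as the measure of prediction accuracy. To measure generalization rather than fit to the training data, the code scores only the last 20% of the rows, which are the rows that validation_split=0.2 held out during training:

from sklearn.metrics import mean_squared_error
import numpy as np

# Select the held-out rows (the last 20%, matching validation_split=0.2)
split_at = int(len(y) * 0.8)
X_val = [np.asarray(ids)[split_at:] for ids in X]
y_val = np.asarray(y)[split_at:]

# Predict ratings for the held-out rows
predictions = model.predict(X_val).flatten()

# Compute the root mean square error
mse = mean_squared_error(y_val, predictions)
rmse = np.sqrt(mse)
print("RMSE:", rmse)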
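
For readers who want to see what the metric computes, here is a small worked example of RMSE with NumPy alone. The true and predicted ratings are made-up numbers, not results from the model in this article.

```python
import numpy as np

# Hypothetical true and predicted ratings for five user-movie pairs.
y_true = np.array([4.0, 3.5, 5.0, 2.0, 1.0])
y_pred = np.array([3.5, 3.5, 4.0, 2.5, 2.0])

errors = y_true - y_pred                 # per-pair prediction error
toy_mse = np.mean(errors ** 2)           # mean of the squared errors
toy_rmse = np.sqrt(toy_mse)              # back on the same scale as the ratings
print(f"MSE: {toy_mse:.3f}, RMSE: {toy_rmse:.3f}")
```

Because the square root brings the value back to the rating scale, an RMSE can be read as a typical size of the error in rating points, with large misses weighted more heavily than small ones.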

Part 6: Recommendation Generation

Now that we have a trained model, we can use it to generate personalized movie recommendations. Given a user, we predict ratings for the movies that user has not yet rated and recommend those with the highest predicted ratings.

# Choose a user
user_id = 1

# Find the movies this user has not rated, by movieId.
# Only movies that appear in the ratings have trained embeddings.
all_movie_ids = ratings_data['movieId'].unique()
rated_movie_ids = ratings_data.loc[ratings_data['userId'] == user_id, 'movieId']
unrated_movie_ids = np.setdiff1d(all_movie_ids, rated_movie_ids)

# Predict ratings for the unrated movies
user_ids = np.full(len(unrated_movie_ids), user_id)
predicted_ratings = model.predict([user_ids, unrated_movie_ids]).flatten()

# Take the top N movies, highest predicted rating first
top_n = 10
top_movie_indices = predicted_ratings.argsort()[::-1][:top_n]
top_movie_ids = unrated_movie_ids[top_movie_indices]

# Print the recommended titles
titles_by_id = movies_data.set_index('movieId')['title']
print("Recommended movies:")
for movie_id in top_movie_ids:
    print(titles_by_id.loc[movie_id])
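
The top-N selection step is easy to get subtly wrong, because argsort returns indices in ascending order. The sketch below isolates that step in a small helper; the function name top_n_indices and the scores are illustrative assumptions.

```python
import numpy as np

def top_n_indices(scores, n):
    """Return the indices of the n largest scores, best first."""
    scores = np.asarray(scores).ravel()
    n = min(n, scores.size)              # tolerate n larger than the number of scores
    return np.argsort(scores)[::-1][:n]  # reverse the ascending order, then truncate

# Hypothetical predicted ratings for five candidate movies.
scores = np.array([3.2, 4.8, 1.5, 4.1, 4.9])
best = top_n_indices(scores, 3)
print("best candidates:", best)
print("their scores:", scores[best])
```

Reversing before truncating keeps the best candidate in first position, so the printed list reads from strongest to weakest recommendation.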