Recommender Systems: Collaborative Filtering

In the previous article, we introduced content-based filtering, a recommendation method that works on product metadata and recommends the products most similar to those a user has purchased in the past. Today we will talk about collaborative filtering, a method that provides recommendations by exploiting the similarities between users and products.

Collaborative filtering analyzes similar users, or products that are rated similarly, and makes recommendations to users based on this analysis.

Collaborative filtering is divided into three approaches:

  1. Item-Based Collaborative Filtering
  2. User-Based Collaborative Filtering
  3. Model-Based Collaborative Filtering

Item-Based Collaborative Filtering

This method analyzes product similarity, or users' ratings of products, and makes suggestions based on the results of that analysis.

The following table shows the ratings of n movies by m users. We would like to recommend a movie with ratings similar to "Split" to users who watched and liked "Split".

[Figure: ratings table of n movies by m users]

In this case, Film 4, a film with a rating orientation similar to "Split" (liked and disliked by the same users), can be recommended to this user.

Example:

An online movie platform wants to develop a collaborative-filtering-based recommendation system to meet the needs and tastes of its user community. When a user likes a movie, the system recommends other movies enjoyed by users with similar liking patterns. The goal is to provide recommendations that better match each user's taste.

About the dataset:

Movie:

  • movieId: the id of the movie
  • title: the title of the movie

Rating:

  • userId: the id of the user
  • movieId: the id of the movie
  • rating: the user's rating for the movie
  • timestamp: the timestamp of the rating

You can find the dataset here. For this project, we will merge the "Movies" and "Ratings" datasets on "movieId".

At the start of the project, datasets are read and merged. You can view the full code of the project.
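
A minimal sketch of this step, assuming MovieLens-style CSV files (the file paths are placeholders):

```python
import pandas as pd

# Hypothetical paths; point these at the actual MovieLens-style files.
movies = pd.read_csv("datasets/movie.csv")    # movieId, title
ratings = pd.read_csv("datasets/rating.csv")  # userId, movieId, rating, timestamp

# Merge the ratings with the movie titles on the shared movieId key.
df = movies.merge(ratings, how="left", on="movieId")
```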

Create the user-movie dataframe:

In this step we'll create the user-movie dataframe, but we want to avoid sparsity in it. For example, suppose a user has rated only one movie; that user still gets a cell for every other movie in the user-movie dataframe. This slows down the calculations and causes performance issues. To avoid these computational problems, and to keep movies watched by very few users out of the recommendations, a reduction step is needed, as sketched below.
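
A sketch of the reduction, assuming the merged dataframe df from above; the rating-count cutoff is an arbitrary assumption:

```python
# Count how many ratings each title has received.
comment_counts = df["title"].value_counts()

# Treat titles with few ratings as rare (the 1000 cutoff is an assumption).
rare_movies = comment_counts[comment_counts <= 1000].index

# Keep only the commonly rated movies.
common_movies = df[~df["title"].isin(rare_movies)]
```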


After the reduction is complete, a pivot table is created with "userId" in the rows, "title" in the columns, and "rating" at the intersections.
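
For example, assuming common_movies from the previous step:

```python
# Rows: users, columns: movie titles, values: ratings (NaN where unrated).
user_movie_df = common_movies.pivot_table(index="userId",
                                          columns="title",
                                          values="rating")
```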

[Figure: the resulting user-movie pivot table]

In the User-Movie data frame, if a user has not rated a movie, the cells at their intersection are represented by NaN.

Item-based movie recommendation:

Now that the user-movie matrix has been created, similarities between movies can be found by looking at the correlations between their rating columns.


After randomly selecting a movie, we calculate its correlation with all other movies. Movies with high correlation, that is, movies whose ratings behave similarly to the selected movie's, can then be recommended.
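
A sketch using pandas' corrwith, assuming user_movie_df from before; the exact title string for "Split" is an assumption:

```python
# Rating vector of the selected movie across all users
# (the exact title string in the dataset is an assumption).
movie_name = user_movie_df["Split (2017)"]

# Correlate it with every other movie's column and rank the results.
similar = user_movie_df.corrwith(movie_name).sort_values(ascending=False)
print(similar.head(10))  # the most similarly rated movies
```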

Movies can also be selected and checked manually. You can use the code snippet below to search by keyword to get the full names of movies in the dataset.
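
The original snippet is not reproduced here; below is a sketch of such a search (the helper name check_film is hypothetical):

```python
def check_film(keyword, df=user_movie_df):
    """Return the full titles in the dataset that contain the keyword."""
    return [title for title in df.columns if keyword in title]

check_film("Sherlock")  # e.g. a list of full "Sherlock ..." titles
```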


User-Based Collaborative Filtering

User-based collaborative filtering analyzes users' behavior (likes, as in Toutiao's early recommender system) and provides recommendations based on the preferences of users who exhibited similar behavior.

[Figure: ratings table of n movies by m users, with blanks for unwatched movies]

This table shows the ratings of n movies by m users. Ratings for movies a user has not watched are left blank. We want to recommend movies that user 3 is likely to enjoy. To do this, we first look for users who exhibit similar behavior. Among the movies rated by user 3, the user with the most similar behavior is user 2, so we pick movie 3, which user 2 likes but user 3 has not watched, as the recommendation. User 3 is expected to like this movie.

Example:

In this project, the same dataset used in item-based collaborative filtering will be used, and the operations will be performed on the user-movie dataframe created in the item-based project. Here is the full code for the project; the steps of inspecting the dataset and creating the dataframe are the same.

Determine the movies watched by the user who will receive recommendations:

A user is selected at random. After this selection, the user-movie dataframe is reduced to that user's row, so the movies the user has watched can be determined.

len(movies_watched) shows that random_user has watched 33 movies.
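
A sketch of these two steps (the random_state is an arbitrary assumption):

```python
# Pick a user at random.
random_user = int(pd.Series(user_movie_df.index).sample(1, random_state=45).values[0])

# Keep only that user's row and collect the titles they actually rated.
random_user_df = user_movie_df[user_movie_df.index == random_user]
movies_watched = random_user_df.columns[random_user_df.notna().any()].tolist()
len(movies_watched)  # 33 in the article's run
```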

Find other users who watched the same movies:

In this step, we first need to find out how many of the random user's movies each other user has also watched.

At this point we need a constraint, because a user who has only one or two movies in common with random_user is not a meaningful basis for recommendations.
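
A sketch of the count and the constraint; the 60% overlap threshold is an assumption:

```python
# Restrict the matrix to the movies our user has watched.
movies_watched_df = user_movie_df[movies_watched]

# For each user, count how many of those movies they also rated.
user_movie_count = movies_watched_df.T.notnull().sum().reset_index()
user_movie_count.columns = ["userId", "movie_count"]

# Keep users who share at least 60% of the watched movies.
perc = len(movies_watched) * 0.6
users_same_movies = user_movie_count[user_movie_count["movie_count"] > perc]["userId"]
```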

Identify the users whose behavior is most similar to the target user's:

This will be done in 3 steps:

Step 1: Aggregate random_user and other users' data

Step 2: Create the correlation dataframe

Step 3: Find the most similar users (top users)

The values found in the previous step are the correlations of each user with random_user. The aim is to keep the users highly correlated with random_user, so we select users with a correlation above 0.65.
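
A sketch of the three steps and the 0.65 filter, under the naming assumptions from the snippets above:

```python
# Step 1: stack the similar users' ratings together with random_user's.
final_df = pd.concat([movies_watched_df[movies_watched_df.index.isin(users_same_movies)],
                      random_user_df[movies_watched]])

# Step 2: user-to-user correlations over the co-watched movies.
corr_df = final_df.T.corr().unstack().sort_values().drop_duplicates()
corr_df = pd.DataFrame(corr_df, columns=["corr"])
corr_df.index.names = ["user_id_1", "user_id_2"]
corr_df = corr_df.reset_index()

# Step 3: users whose correlation with random_user exceeds 0.65.
top_users = corr_df[(corr_df["user_id_1"] == random_user) &
                    (corr_df["corr"] >= 0.65)][["user_id_2", "corr"]]
```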

Calculate the weighted average recommender score:

If we sort by rating alone, the influence of correlation is ignored; if we sort by correlation alone, the influence of rating is ignored. To account for both, we multiply the two values together to obtain a new variable, "weighted_rating", and sort by it.
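
For example, assuming top_users and the raw ratings dataframe from before:

```python
# Attach each similar user's raw ratings.
top_users_ratings = top_users.merge(ratings[["userId", "movieId", "rating"]],
                                    left_on="user_id_2", right_on="userId")

# Weight every rating by its author's similarity to random_user.
top_users_ratings["weighted_rating"] = top_users_ratings["corr"] * top_users_ratings["rating"]

# Average per movie and sort to get the recommendation list.
recommendation_df = (top_users_ratings.groupby("movieId")
                     .agg({"weighted_rating": "mean"})
                     .sort_values("weighted_rating", ascending=False))
```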

In this way, recommendations that take both correlation and ratings into account can be made to users.

Model-Based Collaborative Filtering (Matrix Factorization)

Model-based collaborative filtering addresses the problem more comprehensively, treating the filling of the matrix as an optimization problem.

The goal is to predict and fill in the blank cells of the matrix of m users and n movies.

To fill in the gaps, latent features are assumed to exist for users and movies; their weights are learned from the existing ratings, and those weights are then used to make predictions for the missing observations.

The user-movie matrix is decomposed into two lower-dimensional matrices, under the assumption that the user-movie matrix arises from these two matrices through latent factors. The weights of the latent factors are learned on the filled cells, and the empty cells are then filled using the learned weights.

To understand this process more clearly, it is useful to go through an example.

Suppose we have a user-movie matrix like the one above and want to estimate its empty values. First, we split it into two lower-dimensional matrices: user factors and movie factors.

At this point, the weights of the latent factors are learned only on the existing (filled) observations.
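
In the standard matrix factorization notation (the symbols below are the usual ones, assumed here rather than copied from the project's figures), user $u$ gets a latent vector $p_u$, movie $i$ gets a latent vector $q_i$, and a rating is approximated by their dot product:

$$
\hat{r}_{ui} = p_u^\top q_i,
\qquad
\min_{P,\,Q} \sum_{(u,i)\,\in\,\text{observed}} \left( r_{ui} - p_u^\top q_i \right)^2
$$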

All p and q values are found iteratively from the existing ratings. Initially, p and q are given random values and used to predict the entries of the rating matrix. In each iteration, the erroneous estimates are corrected, moving the predictions closer to the values in the rating matrix. After enough iterations, the p and q matrices are filled in.

To judge whether a prediction is good or bad, a common measure is needed. For this, we take the squared differences between all predicted and actual values and average them (MSE), then take the square root (RMSE). This tells us the average error of the predictions made with the values currently assigned to the user and movie matrices. The p and q values are then updated to minimize this error.
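
Written out, with $N$ the number of observed ratings:

$$
\text{MSE} = \frac{1}{N} \sum_{(u,i)} \left( r_{ui} - \hat{r}_{ui} \right)^2,
\qquad
\text{RMSE} = \sqrt{\text{MSE}}
$$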

Gradient descent for matrix factorization:

One way to find the p and q weights is gradient descent. Gradient descent is an optimization method for minimizing a function.

Parameter values are updated iteratively in the direction of steepest descent (defined as the negative of the gradient) until the parameters that minimize the function are found.

Here, the weights in the p and q matrices are updated according to the gradient. The derivative of a function at a point gives the direction in which the function increases fastest; by iteratively moving in the negative gradient direction, i.e., opposite to the direction of increase, the parameter values are updated so as to minimize the error function.
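
A minimal, self-contained sketch of this loop on a toy matrix; the learning rate, factor count, and epoch count are arbitrary assumptions, and regularization is omitted for brevity:

```python
import numpy as np

def matrix_factorization(R, k=2, lr=0.01, epochs=200):
    """Gradient descent on the observed entries of R (NaN marks missing cells)."""
    m, n = R.shape
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(m, k))  # user factors p_u
    Q = rng.normal(scale=0.1, size=(n, k))  # movie factors q_i
    observed = [(u, i) for u in range(m) for i in range(n) if not np.isnan(R[u, i])]
    for _ in range(epochs):
        for u, i in observed:
            err = R[u, i] - P[u] @ Q[i]  # error on a known rating
            P[u] += lr * err * Q[i]      # step p_u along the negative gradient
            Q[i] += lr * err * P[u]      # step q_i along the negative gradient
    return P, Q

# Toy 3-user x 3-movie matrix with two missing ratings.
R = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 1.0],
              [1.0, 1.0, 5.0]])
P, Q = matrix_factorization(R)
print(P @ Q.T)  # the filled-in approximation of R
```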

Example:

In this final collaborative filtering project, the same dataset used in both the item-based and user-based projects will be used. Here is the complete code for the project, including the dataset inspection steps.

Data preparation:

For traceability, the dataset is reduced to four selected movies (by their ids), and a user-movie dataframe is created from the reduced dataset.

Since this project uses the surprise library, which has its own special data structure, the ratings data needs to be converted into that structure.
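
A sketch of the conversion; surprise expects the long format (one row per rating), so the input here is a long-format dataframe, called sample_df for illustration, and the 1-to-5 rating scale is an assumption:

```python
from surprise import Dataset, Reader

# Declare the rating scale used in the data (an assumption here).
reader = Reader(rating_scale=(1, 5))

# surprise wants exactly the user, item, and rating columns, in that order.
data = Dataset.load_from_df(sample_df[["userId", "movieId", "rating"]], reader)
```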

Modeling:

For the modeling step, the data is first split into training and test sets at a 75%/25% ratio. An SVD model object is then created and fitted to the training set, and the model is evaluated on the test set.
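
For example:

```python
from surprise import SVD, accuracy
from surprise.model_selection import train_test_split

# 75% / 25% split of the surprise dataset.
trainset, testset = train_test_split(data, test_size=0.25)

svd_model = SVD()        # matrix factorization model
svd_model.fit(trainset)  # learn the user and movie factors

predictions = svd_model.test(testset)
accuracy.rmse(predictions)  # average error on the held-out ratings
```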

The RMSE metric is well suited to evaluating the average error of the predictions.

The actual rating of the observation with userId 1.0 and movieId 541 is 4.00.

When we predict the same observation with the model we built, we get a score of 4.33:
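
The prediction call looks like this (est is the model's estimated rating):

```python
# Predict the rating of userId 1.0 for movieId 541.
svd_model.predict(uid=1.0, iid=541, verbose=True)
# e.g. Prediction(uid=1.0, iid=541, r_ui=None, est=4.33, ...)
```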

Model tuning:

In this step the model is tuned, that is, we try to increase its predictive performance. This is done through hyperparameter optimization.

Considering the formulas discussed in the theory section, the hyperparameters are the number of epochs, the number of latent factors, the learning rate, and λ. For hyperparameter optimization, we define a set of candidate values for the number of epochs and the learning rate. GridSearchCV then tries every combination of these candidates and reports the epoch/learning-rate pair that gives the lowest average error.
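
A sketch of the grid search; the candidate values below are assumptions, not the article's exact grid:

```python
from surprise.model_selection import GridSearchCV

# Candidate epoch counts and learning rates to try (assumed values).
param_grid = {"n_epochs": [5, 10, 20],
              "lr_all": [0.002, 0.005, 0.007]}

gs = GridSearchCV(SVD, param_grid, measures=["rmse", "mae"], cv=3, n_jobs=-1)
gs.fit(data)

print(gs.best_score["rmse"])   # lowest average RMSE across the folds
print(gs.best_params["rmse"])  # the combination that achieved it
```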

Final model and predictions:

The model's default values and the best parameters found with GridSearchCV differ from each other, so the SVD model object must be recreated with these optimal parameters.

So far, we have split the data into training and test sets, measured the error, and optimized the hyperparameters. A model built on more data can learn better, so the final model is trained on all of the data.
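
A sketch of the final model, reusing the tuned parameters and the full data:

```python
# Recreate SVD with the parameters found by the grid search.
svd_model = SVD(**gs.best_params["rmse"])

# Train on all of the data instead of the 75% split.
full_trainset = data.build_full_trainset()
svd_model.fit(full_trainset)

svd_model.predict(uid=1.0, iid=541, verbose=True)  # e.g. est near 4.23
```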

Repeating the prediction for the same userId-movieId pair as in the modeling step:

In the modeling step, the predicted value for userId 1.0 and movieId 541 was 4.33. After hyperparameter optimization, the new prediction is 4.23, which is closer to the actual value (4.00); in other words, the error has decreased.

The resulting optimized model can produce predictions for any desired user-movie pair. Once the predictions have been made, the movies can be filtered: movies with high predicted ratings for a given user are recommended to that user.

Origin: blog.csdn.net/stone1290/article/details/130283232