Recommendation System Algorithm 02: Easy-to-understand Collaborative Filtering Algorithm Implementation

Collaborative filtering is a classic recommendation algorithm. Its basic idea is to use users' historical behavior and interests to measure the similarity between users or between items, and then to recommend items that a user is likely to be interested in.

Collaborative filtering algorithms fall into two types: user-based collaborative filtering and item-based collaborative filtering.

The user-based collaborative filtering algorithm first uses users' historical behavior data to construct a user-item rating matrix. It then computes a similarity score between the current user and every other user from their rows in this matrix. Finally, it selects the K users most similar to the current user and recommends the items those users rated highly.
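
The description above does not name a similarity measure. One common choice, and the one the Python example later in this article computes, is cosine similarity restricted to the items both users have rated. As a sketch, write r_{u,i} for user u's rating of item i, I_{uv} for the set of items rated by both u and v, and N_K(u) for the K users selected as most similar to u (these symbols are introduced here for illustration and do not appear in the original text):

```latex
\[
\mathrm{sim}(u, v) =
  \frac{\sum_{i \in I_{uv}} r_{u,i}\, r_{v,i}}
       {\sqrt{\sum_{i \in I_{uv}} r_{u,i}^{2}}\;\sqrt{\sum_{i \in I_{uv}} r_{v,i}^{2}}},
\qquad
\mathrm{score}(u, i) = \sum_{v \in N_K(u)} r_{v,i}
\]
```

In the score, an item that a neighbor has not rated contributes 0. Other schemes, such as weighting each neighbor's rating by its similarity, are also widely used.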

The item-based collaborative filtering algorithm works like the user-based one, except that it recommends based on the similarity between items. It first uses users' historical behavior data to construct a user-item rating matrix. It then computes a similarity score between each item the current user has rated and the other items, using the item columns of the matrix. Finally, it selects the K items most similar to the items the current user has rated and recommends them to the current user.
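
The item-based steps above can be sketched in a few lines of NumPy. This is a minimal illustration and not code from the original article: the function name `item_based_scores`, the matrix `sample`, and the choice of plain cosine similarity between item columns with a similarity-weighted average are all assumptions made for the example.

```python
import numpy as np

def item_based_scores(ratings: np.ndarray, user_id: int, k: int = 2) -> dict:
    """Score each item the user has not rated from its k most similar rated items."""
    # Cosine similarity between item columns; a 0 (not rated) adds nothing to the dot product
    norms = np.linalg.norm(ratings, axis=0)
    norms[norms == 0] = 1.0              # guard against items nobody has rated
    item_sim = (ratings.T @ ratings) / np.outer(norms, norms)
    np.fill_diagonal(item_sim, 0.0)      # an item should not count as its own neighbor

    rated = np.flatnonzero(ratings[user_id] > 0)
    unrated = np.flatnonzero(ratings[user_id] == 0)
    scores = {}
    for item in unrated:
        sims = item_sim[item, rated]             # similarity to each item the user rated
        top = np.argsort(sims)[::-1][:k]         # positions of the k most similar rated items
        weights = sims[top]
        if weights.sum() > 0:
            # Similarity-weighted average of the user's own ratings on those items
            scores[int(item)] = float(weights @ ratings[user_id, rated[top]] / weights.sum())
    return scores

# Hypothetical 4-user, 4-item matrix (0 = not rated)
sample = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
])
scores = item_based_scores(sample, user_id=0)
print(scores)
```

Here user 0 rated item 3 low, and item 3 is the closest match to the unrated item 2, so the predicted score for item 2 comes out low as well.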

The advantage of the collaborative filtering algorithm is that it makes recommendations from user behavior and interests, so the results are well personalized. The algorithm is also relatively simple and easy to implement and understand. It has disadvantages as well. When the data set is sparse, the accuracy and coverage of the recommendations may be low, and when the number of users or items is large, the algorithm may be inefficient.

The following is a simple Python-based example that demonstrates how to use a user-based collaborative filtering algorithm for movie recommendation.

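import numpy as np

# Build the user-movie rating matrix (rows: users, columns: movies, 0 = not rated)
ratings = np.array([
    [4, 5, 0, 4, 0],
    [0, 4, 3, 0, 4],
    [3, 0, 0, 4, 3],
    [0, 3, 4, 0, 5],
    [5, 0, 3, 5, 0]
])

# Compute the user-user similarity matrix: cosine similarity over co-rated movies.
# The diagonal is never filled in, so a user's similarity to itself stays 0.
user_similarities = np.zeros((ratings.shape[0], ratings.shape[0]))
for i in range(ratings.shape[0]):
    for j in range(i+1, ratings.shape[0]):
        mask = (ratings[i] > 0) & (ratings[j] > 0)  # movies rated by both users
        if np.sum(mask) > 0:
            similarity = np.sum(ratings[i][mask] * ratings[j][mask]) / np.sqrt(np.sum(ratings[i][mask]**2) * np.sum(ratings[j][mask]**2))
            user_similarities[i][j] = similarity
            user_similarities[j][i] = similarity

# Recommend movies for each user
for user_id in range(ratings.shape[0]):
    # Rank all users by similarity (highest first) and keep k of them.
    # Note: [1:k+1] skips the top-ranked entry; because the diagonal is 0,
    # that entry is normally the most similar neighbor, not user_id itself.
    k = 2
    similarities = user_similarities[user_id]
    nearest_neighbors = similarities.argsort()[::-1][1:k+1]
    
    # Get the movie ratings of these similar users
    neighbor_ratings = ratings[nearest_neighbors]
    
    # Recommendation score: sum of the neighbors' ratings per movie (not rated counts as 0)
    scores = np.sum(neighbor_ratings, axis=0)
    
    # Take as many top-scoring movies as the user has left unrated.
    # Note: the ranking covers all movies, so an already-rated movie can appear.
    unrated_movies = np.where(ratings[user_id] == 0)[0]
    recommended_movies = scores.argsort()[::-1][:len(unrated_movies)]
    
    # Print the recommendations (0-based movie indices)
    print("User", user_id, "recommended movies:", recommended_movies)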
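
The double loop that fills `user_similarities` in the listing above can also be written with matrix products, which tends to matter once the number of users grows. The following is a sketch under the same convention (0 means not rated); the function name `corated_cosine_matrix` is hypothetical and not part of the original code.

```python
import numpy as np

def corated_cosine_matrix(ratings: np.ndarray) -> np.ndarray:
    """All-pairs cosine similarity over co-rated movies, without Python loops."""
    r = ratings.astype(float)
    rated = (r > 0).astype(float)      # 1.0 where a rating exists
    dot = r @ r.T                      # unrated entries are 0, so only co-rated movies contribute
    sq = (r ** 2) @ rated.T            # sq[i, j]: user i's squared ratings summed over movies j also rated
    denom = np.sqrt(sq * sq.T)
    sim = np.divide(dot, denom, out=np.zeros_like(dot), where=denom > 0)
    np.fill_diagonal(sim, 0.0)         # the loop version leaves the diagonal at 0
    return sim

ratings = np.array([
    [4, 5, 0, 4, 0],
    [0, 4, 3, 0, 4],
    [3, 0, 0, 4, 3],
    [0, 3, 4, 0, 5],
    [5, 0, 3, 5, 0]
])
sim = corated_cosine_matrix(ratings)
print(np.round(sim, 3))
```

Note how many pairs reach a similarity of exactly 1: when two users share only one rated movie, the cosine over that single movie is always 1, whatever the two ratings are.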

Matrix description:

ratings = np.array([
    [4, 5, 0, 4, 0],
    [0, 4, 3, 0, 4],
    [3, 0, 0, 4, 3],
    [0, 3, 4, 0, 5],
    [5, 0, 3, 5, 0]
])

The user-movie rating matrix is the basis of the collaborative filtering algorithm and records the rating of each user for each movie. In this matrix, each row represents a user, each column represents a movie, and each element represents the corresponding user's rating for the corresponding movie.

For example, the rating matrix ratings constructed in the above code is a 5×5 matrix holding the ratings of 5 users for 5 movies. The first row holds the first user's ratings: 4 for the first movie, 5 for the second, 0 for the third (not rated), 4 for the fourth, and 0 for the fifth (not rated). The other rows read the same way.
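
Because 0 doubles as the "not rated" marker, statistics on this matrix need a mask. The short sketch below reads individual entries and compares a naive row mean with a mean over rated movies only; the variable names are chosen here for illustration.

```python
import numpy as np

ratings = np.array([
    [4, 5, 0, 4, 0],
    [0, 4, 3, 0, 4],
    [3, 0, 0, 4, 3],
    [0, 3, 4, 0, 5],
    [5, 0, 3, 5, 0]
])

print(ratings[0, 1])                 # first user's rating of the second movie (indices are 0-based)

rated = ratings > 0                  # True where a rating exists
ratings_per_user = rated.sum(axis=1) # number of movies each user rated

# The naive mean treats "not rated" as a rating of 0 and drags the average down
naive_mean = ratings.mean(axis=1)
# Mean over rated movies only (assumes every user has rated at least one movie)
mean_rating = ratings.sum(axis=1) / ratings_per_user

print(ratings_per_user, naive_mean, mean_rating)
```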

The user-movie rating matrix is an important data source for the collaborative filtering algorithm, and it must be built by extracting user-movie rating information from real data. In practice, this step faces challenges such as data sparsity and the need for data preprocessing and cleaning. The matrix therefore has to be processed and optimized appropriately to improve recommendation quality and algorithm performance.
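
As a rough illustration of that construction step, the sketch below fills a matrix from raw (user, movie, rating) records and measures how sparse it is. The `events` list, the helper names `build_rating_matrix` and `sparsity`, and the rule that a later duplicate overwrites an earlier one are all assumptions for the example; real data usually needs ID mapping and cleaning first.

```python
import numpy as np

# Hypothetical raw log: (user_id, movie_id, rating) records
events = [(0, 0, 4), (0, 1, 5), (0, 3, 4), (1, 1, 4), (1, 2, 3), (1, 4, 4)]

def build_rating_matrix(events, n_users, n_movies):
    """Fill a dense user-movie matrix; 0 marks a missing rating."""
    matrix = np.zeros((n_users, n_movies), dtype=int)
    for user_id, movie_id, rating in events:
        matrix[user_id, movie_id] = rating   # a later duplicate overwrites an earlier one
    return matrix

def sparsity(matrix):
    """Fraction of user-movie pairs that have no rating."""
    return 1.0 - np.count_nonzero(matrix) / matrix.size

matrix = build_rating_matrix(events, n_users=2, n_movies=5)
print(matrix)
print("sparsity:", sparsity(matrix))
```

A dense array is fine at this scale. For large catalogs a sparse representation such as those in `scipy.sparse` is the usual choice.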

In the above example, a rating matrix of 5 users and 5 movies is first constructed. Then the similarity score between every pair of users is calculated, and movie recommendations are made for each user based on these scores. Specifically, for each user we select K similar users (K = 2 in the code) and compute a recommendation score for each movie by summing those users' ratings. We then count the movies the user has not rated and recommend that many of the highest-scoring movies.

The result of running the above code:

User 0 recommended movies: [4 2]
User 1 recommended movies: [3 0]
User 2 recommended movies: [2 3]
User 3 recommended movies: [3 0]
User 4 recommended movies: [1 4]
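
Two details of this output are worth checking against the matrix. The movie indices are 0-based, and user 2 is recommended movie 3 even though row 2 of the matrix already holds a rating of 4 for it: the listing ranks all movies by score and never filters out the rated ones. In addition, the slice `[1:k+1]` drops the first-ranked user on the assumption that it is the user itself, but the similarity matrix leaves the diagonal at 0, so the entry dropped is in fact a neighbor. The following is one possible variant that addresses both points, offered as a sketch and not as the original author's code. The names `corated_cosine` and `recommend` are hypothetical, and stable sorting is used so that ties are broken by ascending index.

```python
import numpy as np

ratings = np.array([
    [4, 5, 0, 4, 0],
    [0, 4, 3, 0, 4],
    [3, 0, 0, 4, 3],
    [0, 3, 4, 0, 5],
    [5, 0, 3, 5, 0]
])

def corated_cosine(a, b):
    """Cosine similarity over the movies both users rated (0.0 if there are none)."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    return float(np.sum(a[mask] * b[mask]) / np.sqrt(np.sum(a[mask] ** 2) * np.sum(b[mask] ** 2)))

def recommend(ratings, user_id, k=2):
    """Rank the user's unrated movies by the summed ratings of the k nearest neighbors."""
    # Exclude the user explicitly instead of relying on sort position
    others = np.array([u for u in range(ratings.shape[0]) if u != user_id])
    sims = np.array([corated_cosine(ratings[user_id], ratings[u]) for u in others])
    neighbors = others[np.argsort(-sims, kind="stable")[:k]]

    # Same scoring as the listing: sum of neighbor ratings, with "not rated" counting as 0
    scores = ratings[neighbors].sum(axis=0)

    # Only movies the user has not rated are candidates
    unrated = np.flatnonzero(ratings[user_id] == 0)
    return unrated[np.argsort(-scores[unrated], kind="stable")].tolist()

for user_id in range(ratings.shape[0]):
    print("User", user_id, "recommended movies:", recommend(ratings, user_id))
```

With only five users, many similarities tie at exactly 1, so the choice of neighbors depends heavily on tie-breaking. On real data, requiring a minimum number of co-rated movies before trusting a similarity score is a common safeguard.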

Origin blog.csdn.net/weixin_41194129/article/details/130283202