[Recommender System]: Overview of Collaborative Filtering and Content-Based Filtering

1. Introduction

"We are leaving the information age and entering the recommendation age."

Like many machine learning techniques, recommender systems make predictions based on users' historical behavior. More specifically, a recommender system is an information filtering system that uses a user's historical behavior, social relationships, and points of interest to predict that user's preferences for a set of items.

Over the past few decades, with the rise of YouTube, Amazon, Netflix, and many other such web services, recommender systems have taken an ever larger place in our lives. From e-commerce (suggesting items that buyers might be interested in) to online advertising (showing users the right content, matching their preferences), recommender systems are unavoidable in our daily lives today. In general, recommender systems are algorithms designed to suggest relevant items to users (items being movies to watch, text to read, products to buy, or anything else, depending on the industry).

Recommender systems are critical in some industries because they can generate huge revenue when they work well, and they are also a way for a company to stand out from its competitors. As a testament to their importance, a few years ago Netflix organized a challenge (the "Netflix Prize") whose goal was to produce a recommender system that performed better than its own algorithm, with a prize of 1 million US dollars for the winner.

To build a recommender system, the two most typical approaches are

  • Content-based filtering
  • Collaborative filtering

Collaborative filtering is itself divided into two families of methods: memory-based and model-based. Below, we discuss the principles, advantages, and disadvantages of each approach.

2. Content-based Filtering

Rather than relying only on user interactions and feedback, content-based approaches require a lot of information about the characteristics of the items themselves. For movies, this can be attributes such as genre, year, director, or actors; for articles, it can be the text content extracted by applying natural language processing.

  • Tag items based on their attributes
  • Calculate the similarity between items from their tags
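
The two steps above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the catalogue `ITEM_TAGS`, the item names, and the choice of Jaccard similarity are all assumptions made for the example (cosine similarity on tag vectors would work just as well).

```python
def jaccard(tags_a, tags_b):
    """Jaccard similarity between two tag sets: |A & B| / |A | B|."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(item, catalogue, top_k=2):
    """Rank the other items of `catalogue` by tag similarity to `item`."""
    scores = [
        (other, jaccard(catalogue[item], tags))
        for other, tags in catalogue.items()
        if other != item
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical catalogue: item -> tags derived from its attributes.
ITEM_TAGS = {
    "movie_a": {"sci-fi", "space", "1970s"},
    "movie_b": {"sci-fi", "space", "2010s"},
    "movie_c": {"romance", "drama", "1990s"},
}

print(most_similar("movie_a", ITEM_TAGS))
```

Here `movie_b` shares two of its tags with `movie_a`, so it is ranked ahead of `movie_c`, which shares none.
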

Then, the idea of a content-based approach is to build a model, based on the available "features", that explains the observed user-item interactions. Still considering users and movies, we will for example try to model the fact that young women tend to rate some movies higher, young men tend to rate other movies higher, and so on. If we manage to get such a model, making new predictions for a user is easy: we just need to look at that user's profile (age, gender, ...) and, based on that information, determine relevant movies to suggest. In this sense, the content-based recommendation method is a static method.

Content-based methods are much less affected by the cold-start problem than collaborative approaches: new users or items can be described by their characteristics (attributes), so relevant recommendations can be made for these new entities. Logically, only new users or items with previously unseen features still suffer from this drawback, and this rarely happens once the system is old enough.

In content-based methods, the recommendation problem is often transformed into a classification problem (predicting whether a user "likes" an item) or a regression problem (predicting the user's rating for an item). In both cases, we set up a model based on the user and/or item features at our disposal (the "content" of our "content-based" method).

If our classification (or regression) is based on user features, we say that the approach is item-centric: modeling, optimization, and computation can be done "per item". In this case, we build and learn one model per item based on user features, trying to answer the question "What is the probability that each user likes this item?" (or "What rating does each user give this item?"). There are usually many users interacting with an item, so the resulting model is fairly robust. However, the interactions used to learn the model come from all users, and even users with similar characteristics (features) may have different preferences. This means that although this method is more robust, it is less personalized than the user-centric method described next.

If we are working with item features, the approach is user-centric: modeling, optimization, and computation can be done "per user". We then train one model per user based on item features, trying to answer the question "What is the probability that this user likes each item?" (or "What rating does this user give each item?"). The model obtained is more personalized than its item-centric counterpart because it only considers interactions from the user in question. However, most of the time a user interacts with relatively few items, so the model we obtain is far less robust than an item-centric one.

From a practical point of view, in most cases it is much more difficult to get some information about a new user (users don't want to answer too many questions) than to get a lot of information about a new item (the people who add items have an interest in filling in this information so that their items are recommended to the right users). We can also note that, depending on the complexity of the relationships we want to express, the models we build can range from simple to complex: from linear regression to deep neural networks. Finally, let's mention that content-based approaches can also combine both views: information about both users and items can be used in our model, for example by stacking the two feature vectors and feeding them to a neural network architecture.

2.1 Item-Centric Bayesian Classifier

Let's first consider the case of item-centric classification: for each item, we want to train a Bayesian classifier that takes user features as input and outputs "like" or "dislike". So, to complete the classification task, we need to compute

$$\frac{p_{item}(like \mid user_{features})}{p_{item}(dislike \mid user_{features})}$$

the ratio between the probability that a user with the given features likes the item under consideration and the probability that they dislike it. This ratio of conditional probabilities, which defines our classification rule (with a simple threshold), can be expressed using Bayes' formula:

$$\frac{p_{item}(like \mid user_{features})}{p_{item}(dislike \mid user_{features})} = \frac{p_{item}(user_{features} \mid like) \times p_{item}(like)}{p_{item}(user_{features} \mid dislike) \times p_{item}(dislike)}$$

where $p_{item}(like)$ is the prior probability, which can be computed from past data.

$p_{item}(\cdot \mid like)$ is the conditional probability (likelihood), which is the most important part of the Bayesian model. Naive Bayes first assumes that the features are conditionally independent given the class. This lets us factor the conditional probability into a product of per-feature terms: $p_{item}(user_{features} \mid like) = \prod_i P(user_{feature_i} \mid like)$.

When the feature is discrete, we simply count how often each feature value occurs within each class in the training sample, and use these frequencies to estimate $P(user_{feature_i} \mid like)$.

The following discussion focuses on the case where the feature attribute is a continuous value.

When the feature is continuous, its values are usually assumed to follow a Gaussian distribution (also known as a normal distribution). Therefore, once the mean and standard deviation of the feature within each class have been computed from the training sample, substituting them into the normal density formula gives the required estimate. The calculation of the mean and standard deviation is not repeated here.
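
As a small illustration of the continuous case, the sketch below estimates a Gaussian per class and evaluates the two densities for a new user. The feature (age) and the sample values are made up for the example, and the population standard deviation is used for simplicity.

```python
import math


def fit_gaussian(values):
    """Return (mean, std) of a list of numbers (population standard deviation)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x. Assumes std > 0."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


# Hypothetical ages of the users who liked / disliked one given item.
ages_like = [20, 22, 24, 26, 28]
ages_dislike = [40, 45, 50, 55, 60]

mu_l, sd_l = fit_gaussian(ages_like)
mu_d, sd_d = fit_gaussian(ages_dislike)

# Estimates of P(age = 25 | like) and P(age = 25 | dislike) for a new user.
print(gaussian_pdf(25, mu_l, sd_l), gaussian_pdf(25, mu_d, sd_d))
```

A feature that is constant within a class gives a standard deviation of 0, which this sketch does not handle; in practice a small variance floor is usually added.
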

Another issue that needs to be discussed is what to do when $P(user_{feature_i} \mid like) = 0$. This happens when a certain feature value never appears under a certain class in the training sample, and it greatly reduces the quality of the classifier. To solve this problem, we introduce Laplace smoothing. The idea is very simple: add 1 to the count of every feature value under every class. If the training set is sufficiently large, this does not affect the results, and it removes the awkward situation where a frequency is 0.
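
To make the discrete case concrete, here is a minimal sketch of an item-centric Naive Bayes classifier with Laplace smoothing. The function names, the feature names, and the toy samples are assumptions made for this example; the sketch also assumes that both classes appear at least once in the training data.

```python
from collections import Counter, defaultdict


def train_item_classifier(samples):
    """samples: list of (user_features: dict, label) for ONE item,
    with label in {"like", "dislike"}. Returns the counts Naive Bayes needs."""
    class_counts = Counter()
    value_counts = defaultdict(Counter)  # (label, feature) -> counts per value
    feature_values = defaultdict(set)    # feature -> all values observed
    for features, label in samples:
        class_counts[label] += 1
        for name, value in features.items():
            value_counts[(label, name)][value] += 1
            feature_values[name].add(value)
    return class_counts, value_counts, feature_values


def likelihood(model, label, name, value):
    """Laplace-smoothed estimate of P(feature = value | label)."""
    class_counts, value_counts, feature_values = model
    count = value_counts[(label, name)][value]
    return (count + 1) / (class_counts[label] + len(feature_values[name]))


def like_ratio(model, features):
    """p(like | features) / p(dislike | features) under the naive assumption."""
    class_counts = model[0]
    ratio = class_counts["like"] / class_counts["dislike"]  # ratio of priors
    for name, value in features.items():
        ratio *= likelihood(model, "like", name, value)
        ratio /= likelihood(model, "dislike", name, value)
    return ratio


# Hypothetical training data for one item.
SAMPLES = [
    ({"age": "young", "gender": "f"}, "like"),
    ({"age": "young", "gender": "m"}, "like"),
    ({"age": "young", "gender": "f"}, "like"),
    ({"age": "old", "gender": "m"}, "dislike"),
    ({"age": "old", "gender": "f"}, "dislike"),
]
MODEL = train_item_classifier(SAMPLES)
print(like_ratio(MODEL, {"age": "young", "gender": "f"}))
```

No "old" user liked the item in this sample, so the raw frequency for `age = old` under `like` would be 0; with smoothing it becomes (0 + 1) / (3 + 2) and the ratio stays well defined. A ratio above 1 is classified as "like".
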

2.2 User-Centric Linear Regression

Now let's consider the case of user-centric regression: for each user, we want to train a simple linear regression that takes item features as input and outputs a rating for that item. We denote by M the user-item interaction matrix, by X the matrix of user coefficients to be learned, and by Y the matrix of given item features. Then, for a given user i, we learn the coefficients in $X_i$ by solving the following optimization problem:

$$X_i = \underset{X_i}{\operatorname{argmin}} \; \frac{1}{2}\sum_{(i,j)}\left[(X_i)(Y_j)^T - M_{ij}\right]^2 + \frac{\lambda}{2}\sum_{k}(X_{ik})^2$$

Note: i is fixed, so the first sum only runs over the (user, item) pairs that involve user i. We can observe that if we solve this problem for all users at the same time, the optimization problem is exactly the one we solve later in "alternating matrix factorization" when the item matrix is kept fixed. This shows that both model-based collaborative filtering methods (such as matrix factorization) and content-based methods assume the existence of a latent model, but model-based collaborative filtering must learn both the user and the item matrices, while content-based methods only need to learn one of them.
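
The per-user problem above is a ridge regression, and a minimal way to solve it is plain batch gradient descent. The sketch below is one possible implementation under that assumption (a closed-form solve would also work); the item features, ratings, and hyperparameters are illustrative values only.

```python
def fit_user(ratings, item_features, lam=0.1, lr=0.05, epochs=2000):
    """Learn the coefficient vector X_i of one user.

    ratings:       dict item -> rating M_ij given by this user
    item_features: dict item -> feature vector Y_j (list of floats)
    Minimises 1/2 * sum_j (X_i . Y_j - M_ij)^2 + lam/2 * ||X_i||^2.
    """
    dim = len(next(iter(item_features.values())))
    x = [0.0] * dim
    for _ in range(epochs):
        grad = [lam * xk for xk in x]              # gradient of the L2 penalty
        for item, rating in ratings.items():
            y = item_features[item]
            err = sum(xk * yk for xk, yk in zip(x, y)) - rating
            for k in range(dim):
                grad[k] += err * y[k]              # gradient of the squared error
        x = [xk - lr * gk for xk, gk in zip(x, grad)]
    return x


def predict(x, y):
    """Predicted rating: dot product of user coefficients and item features."""
    return sum(xk * yk for xk, yk in zip(x, y))


# Hypothetical item features: [amount of action, amount of romance].
ITEM_FEATURES = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 0.2]}
USER_RATINGS = {"a": 5.0, "b": 1.0}   # item "c" has not been rated yet

x_user = fit_user(USER_RATINGS, ITEM_FEATURES)
print(predict(x_user, ITEM_FEATURES["c"]))
```

Because the two rated items have orthogonal features, each learned coefficient is simply the rating shrunk by the regularisation, i.e. rating / (1 + lam).
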

3. Collaborative Filtering

  • Collaborative filtering is all about using data to find users similar to you, based on their behavior and what they like, and then recommending items or content that may be of interest to you
  • In daily life, we also ask friends with the same interests to recommend movies or music to us

Collaborative filtering is a dynamic approach. It does not require anything other than the users' historical preferences for a set of items. Because it is based on historical data, the core assumption is that what users liked in the past predicts what they will like in the future. User preferences are usually expressed in two categories. An explicit rating is a score that a user gives to an item on a sliding scale, such as giving Titanic 5 stars. This is the most direct feedback from users about how much they like an item. An implicit rating reflects user preferences indirectly, through signals such as page views, clicks, purchase records, or whether a music track was played. In this section, we take a closer look at collaborative filtering, a traditional and powerful tool for recommender systems.

Collaborative filtering for recommender systems is a method that generates new recommendations based only on the recorded past interactions between users and items. These interactions are stored in the user-item interaction matrix.
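
As a rough sketch of what this matrix looks like in code, the function below builds it from a log of (user, item, value) triples. The log format and the use of `None` for missing interactions are assumptions made for this example; real systems typically use a sparse matrix format instead of nested lists.

```python
def build_interaction_matrix(interactions):
    """interactions: list of (user, item, value) triples.

    Returns (users, items, matrix) where matrix[r][c] is the recorded value
    for users[r] and items[c], or None when there is no interaction."""
    users = sorted({u for u, _, _ in interactions})
    items = sorted({i for _, i, _ in interactions})
    row = {u: r for r, u in enumerate(users)}
    col = {i: c for c, i in enumerate(items)}
    matrix = [[None] * len(items) for _ in users]
    for user, item, value in interactions:
        matrix[row[user]][col[item]] = value
    return users, items, matrix


# Hypothetical explicit ratings.
LOG = [("ann", "titanic", 5), ("bob", "titanic", 3), ("bob", "alien", 4)]
USERS, ITEMS, M = build_interaction_matrix(LOG)
for user, ratings in zip(USERS, M):
    print(user, ratings)
```
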

Then, the main idea of collaborative filtering methods is that these past user-item interactions are sufficient to detect similar users and/or similar items, and to make predictions based on these estimated similarities.

Collaborative filtering algorithms are divided into two subcategories, commonly referred to as memory-based methods and model-based methods. Memory-based methods work directly with the recorded interaction values, assume no model, and are essentially based on nearest-neighbor search (for example, find the users closest to the user of interest and suggest the most popular items among those neighbors). Model-based approaches assume an underlying "generative" model that explains the user-item interactions and try to discover it in order to make new predictions.

The main advantage of collaborative filtering methods is that they do not require information about users or items, so they can be used in a broader context. Furthermore, the more users interact with items, the more accurate new recommendations will be: for a fixed set of users and items, new interactions recorded over time bring new information and make the system more and more effective.

However, since it only considers past interactions to make recommendations, collaborative filtering suffers from the "cold-start problem": it is impossible to recommend anything to a new user or to recommend a new item to any user, and many users or items have too few interactions to be handled effectively. This shortcoming can be addressed in different ways: recommending random items to new users or new items to random users (random strategy), recommending popular items to new users or new items to the most active users (maximum expectation strategy), recommending a set of varied items to new users or a new item to a set of varied users (exploratory strategy), or, finally, using a non-collaborative method for the early life of the user or item.

Below, we mainly introduce three classical collaborative filtering methods: two memory-based methods (user-user and item-item) and one model-based method (matrix factorization).
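
As an illustration of the "popular items for new users" strategy mentioned above, the sketch below falls back to popularity when a user has too little history. The threshold `min_interactions` and the `personalised` callback are hypothetical names introduced for this example; any of the recommenders described in this section could play that role.

```python
from collections import Counter


def popular_items(ratings, top_n=3):
    """Items ordered by how many users interacted with them."""
    counts = Counter(item for row in ratings.values() for item in row)
    return [item for item, _ in counts.most_common(top_n)]


def recommend_with_fallback(user, ratings, personalised, min_interactions=2, top_n=3):
    """Use `personalised(user, ratings)` when the user has enough history,
    otherwise fall back to the most popular items not seen yet."""
    history = ratings.get(user, {})
    if len(history) < min_interactions:
        candidates = popular_items(ratings, top_n + len(history))
        return [item for item in candidates if item not in history][:top_n]
    return personalised(user, ratings)[:top_n]


# Hypothetical data: user -> {item: rating}.
RATINGS = {
    "a": {"x": 5, "y": 3},
    "b": {"x": 4},
    "c": {"x": 2, "y": 5, "z": 1},
}
print(recommend_with_fallback("new_user", RATINGS, personalised=lambda u, r: []))
```
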

3.1 Memory-based collaborative filtering

User-user and item-item approaches are mainly characterized in that they only use information from the user-item interaction matrix and they assume no model to generate new recommendations.

3.1.1 User-user

To make new recommendations to a user, the user-user method roughly tries to identify the users with the most similar "interaction profiles" (nearest neighbors) in order to suggest the items that are most popular among those neighbors (and that are new to our user).

In layman's terms, we first group users according to their historical data, and then recommend items that are popular among users in the same group. For example, suppose users A, B, and C are very similar. Users A and B often buy product D, but user C has never bought it. We can then recommend product D to user C.

This approach is called "user-centric" because it represents users and assesses distance between users based on their interactions with items.

Suppose we want to make recommendations for a given user. First, each user can be represented by their vector of interactions with the different items (their row in the interaction matrix). We can then compute some kind of "similarity" between the user we are interested in and all other users. This similarity measure is such that two users with similar interactions on the same items should be considered close. Once the similarity to each user has been computed, we can keep the k nearest neighbors of our user and then suggest the most popular items among them (only looking at items that our reference user has not yet interacted with).

Note that the number of "common interactions" (how many items two users have both interacted with) should be carefully considered when calculating the similarity between users! In most cases, we want to avoid a situation where someone who shares only one interaction with our reference user, with a 100% match on that single interaction, is considered more similar than someone who shares 100 interactions but agrees on "only" 98% of them. Therefore, we consider two users similar if they interacted with many common items in the same way (similar ratings, similar hover times, ...).
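
The user-user procedure, including the caution about small overlaps, can be sketched as follows. Two choices here are assumptions of this example rather than part of the method as described: the damping factor n / (n + shrink), which is one common heuristic for discounting similarities built on few shared items, and the similarity-weighted scoring of neighbors' items (a variant of "most popular among the neighbors").

```python
import math


def shrunk_cosine(ra, rb, shrink=10.0):
    """Cosine similarity on co-rated items, damped when few items are shared.

    ra, rb: dict item -> rating."""
    common = set(ra) & set(rb)
    if not common:
        return 0.0
    dot = sum(ra[i] * rb[i] for i in common)
    na = math.sqrt(sum(ra[i] ** 2 for i in common))
    nb = math.sqrt(sum(rb[i] ** 2 for i in common))
    if na == 0 or nb == 0:
        return 0.0
    n = len(common)
    return (dot / (na * nb)) * n / (n + shrink)


def recommend_user_user(target, ratings, k=2, top_n=3):
    """Score unseen items by the similarity-weighted ratings of the k nearest users."""
    neighbours = sorted(
        ((shrunk_cosine(ratings[target], r), u) for u, r in ratings.items() if u != target),
        reverse=True,
    )[:k]
    scores = {}
    for sim, user in neighbours:
        for item, rating in ratings[user].items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical data mirroring the A, B, C, D example: user -> {item: rating}.
RATINGS = {
    "A": {"x": 5, "y": 4, "D": 5},
    "B": {"x": 5, "y": 5, "D": 4},
    "C": {"x": 5, "y": 4},
    "Z": {"x": 1, "w": 5},
}
print(recommend_user_user("C", RATINGS))
```

User Z matches C perfectly on their single shared item, but the damping keeps Z out of the two nearest neighbors, so only product D is suggested to C.
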

3.1.2 Item-item

To make new recommendations to a user, the idea of the item-item approach is to find items similar to the ones the user has already "positively" interacted with. Two items are considered similar if most of the users who interacted with both of them did so in a similar way. This approach is called "item-centric" because it represents items based on the interactions users had with them, and evaluates the distance between those items.

Suppose we want to make recommendations for a given user. First, we consider this user's favorite item and represent it (like all other items) by its vector of interactions with every user (its column in the interaction matrix). Then, we compute the similarity between this "best item" and all other items. Once the similarities are computed, we keep the k nearest neighbors of the selected item that are new to our user and recommend them.

Note that to get more relevant recommendations, we can do this not just for the user's favorite item but for the n top items. In this case, we can recommend items that are close to several of these preferred items.
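
A minimal sketch of the item-item procedure with the n top items is shown below. Summing the cosine similarities to each favourite item is one simple way to combine them and is an assumption of this example, as are the toy ratings and item names.

```python
import math


def item_vectors(ratings):
    """Turn user -> {item: rating} into item -> {user: rating} (matrix columns)."""
    columns = {}
    for user, row in ratings.items():
        for item, value in row.items():
            columns.setdefault(item, {})[user] = value
    return columns


def cosine(va, vb):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(va[u] * vb[u] for u in set(va) & set(vb))
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend_item_item(user, ratings, n_top=2, k=2):
    """Recommend the k unseen items closest to the user's n_top favourite items."""
    columns = item_vectors(ratings)
    seen = ratings[user]
    favourites = sorted(seen, key=seen.get, reverse=True)[:n_top]
    scores = {}
    for candidate, vec in columns.items():
        if candidate in seen:
            continue
        # Combine the similarities to each favourite item by summing them.
        scores[candidate] = sum(cosine(vec, columns[f]) for f in favourites)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical data: user -> {item: rating}.
RATINGS = {
    "u1": {"sw1": 5, "sw2": 5, "trek": 4},
    "u2": {"sw1": 5, "sw2": 4, "trek": 5, "romcom": 1},
    "u3": {"romcom": 5, "drama": 5},
    "me": {"sw1": 5, "sw2": 4},
}
print(recommend_item_item("me", RATINGS))
```
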

3.1.3 Comparing user-user and item-item

User-user methods are based on searching for similar users in terms of interactions with items. Generally, each user interacts with only a few items, which makes the method very sensitive (high variance) to any recorded interaction. On the other hand, we obtain more personalized results (low bias) since the final recommendation is only based on the recorded interactions of users similar to the user we are interested in.

In contrast, the item-item approach is based on searching for similar items in terms of user-item interactions. Because many users generally interact with an item, the neighborhood search is much less sensitive (lower variance) to individual interactions. In return, interactions from all kinds of users (even users very different from our reference user) are taken into account in the recommendation, making the method less personalized (more biased). Therefore, this method is not as personalized as the user-user method, but it is more robust.

Complexity and side effects

One of the biggest drawbacks of memory-based collaborative filtering is that it does not scale easily: generating new recommendations can be extremely time-consuming for large systems. For systems with millions of users and millions of items, the nearest-neighbor search step can become intractable if not carefully designed (the complexity of the KNN algorithm is O(ndk), where n is the number of users, d the number of items, and k the number of neighbors considered). To make computation more tractable for large systems, we can either exploit the sparsity of the interaction matrix when designing the algorithm or use approximate nearest neighbor (ANN) methods.

In most recommendation algorithms, great care must be taken to avoid a "rich-get-richer" effect for popular items, and to avoid trapping users in a confined area of information. In other words, we don't want our system to recommend only popular items more and more, nor do we want our users to only receive recommendations for items very close to what they already like, without any chance of discovering new items they might also like (because these are not "close enough" to be suggested). As mentioned, these problems can arise in most recommendation algorithms, but they are especially acute for memory-based collaborative filtering, where the lack of a model to regularize makes this phenomenon more pronounced and more frequently observed.

3.2 Model-based collaborative filtering methods

Model-based collaborative filtering methods rely only on user-item interaction information and assume that a latent model explains these interactions. For example, matrix factorization decomposes the huge and sparse user-item interaction matrix into the product of two smaller and denser matrices: a user-factor matrix (containing the user representations) multiplied by a factor-item matrix (containing the item representations).

For example, suppose we have a matrix of user-movie ratings. To model the interactions between users and movies, we can assume that

  • There exist some features that describe movies well.
  • These features can also be used to describe user preferences (a high value for a feature the user likes, a low value otherwise)

However, we do not want to give these features to our model explicitly (as is done in the content-based methods described earlier). Instead, we prefer to let the system discover useful features on its own and build its own representations of users and items. Since they are learned rather than given, the extracted features, taken individually, have a mathematical meaning but no intuitive interpretation. However, it is not unusual for this kind of algorithm to end up producing structures that are very close to an intuitive decomposition a human could think of. Indeed, the consequence of such a factorization is that users who are close in terms of preferences, and items that are close in terms of characteristics, end up with close representations in the latent space.

Mathematical Explanation of Matrix Decomposition

Next we briefly give a mathematical overview of matrix factorization. More specifically, we describe a classical iterative method based on gradient descent that can factorize very large matrices without loading all the data into memory at the same time.

Let us consider an interaction matrix M (n×m) containing ratings, where each user has rated only some of the items.

Most entries of M are None, meaning that the user has not rated the movie yet. In a recommender system this rating matrix is typically very sparse. What matrix factorization does is predict the missing ratings in the matrix, so that the predicted ratings reflect the user's tastes. We want to factorize this matrix such that:

$$M \approx XY^T$$

where X is the user matrix (n×l), in which each row represents a user, and Y is the item matrix (m×l), in which each row represents an item.

Here l is the dimension of the latent space in which users and items are represented. We therefore search for the matrices X and Y whose product best approximates the existing interactions. Denoting by E the set of pairs (i, j) such that $M_{ij}$ is not null, we want to find X and Y that minimize the "rating error":

$$(X, Y) = \underset{X, Y}{\operatorname{argmin}} \sum_{(i,j) \in E}\left[M_{ij} - (X_i)(Y_j)^T\right]^2$$

Adding an L2 regularization term, we get:

$$(X, Y) = \underset{X, Y}{\operatorname{argmin}} \; \frac{1}{2}\sum_{(i,j) \in E}\left[M_{ij} - (X_i)(Y_j)^T\right]^2 + \frac{\lambda}{2}\left(\sum_{i,k}(X_{ik})^2 + \sum_{j,k}(Y_{jk})^2\right)$$

All we have to do is minimize the loss function above, which is an optimization problem. Only the rating matrix M on the left-hand side is known; the user matrix and the item matrix are unknown. To learn them, we minimize the difference between the entries of the product of the user and item matrices and the known ratings in M.

The matrices X and Y can then be obtained by gradient descent, and we can notice two things. First, the gradient does not have to be computed over all the pairs in E at each step: we can consider only a subset of these pairs, so that we optimize our objective function "by batch". Second, the values in X and Y do not have to be updated at the same time: gradient descent can be done alternately on X and Y at each step (we consider one matrix fixed and optimize the other, then do the opposite at the next iteration).

Once the matrix has been factorized, we have less information to manipulate in order to make a new recommendation: we can simply multiply a user vector by any item vector to estimate the corresponding rating. Note that we can also apply the user-user and item-item methods to these new representations of users and items: the (approximate) nearest-neighbor search is then done not on huge sparse vectors but on small dense ones, which makes some approximation techniques more tractable. Take movie recommendations as an example:

Predicted rating of movie i by user u = inner product of the user vector of u and the item vector of i

Multiplying these two matrices gives each user's predicted score for each movie. The higher the score, the more likely the user is to like the movie, and the more the movie is worth recommending to that user.
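
The whole procedure (regularised loss, gradient descent over observed entries only, then prediction by inner product) can be sketched as below. This is a minimal stochastic gradient descent variant that updates one observed rating at a time, not a production implementation; the toy ratings, the latent dimension, the learning rate, and the other hyperparameters are all illustrative assumptions. Indices are zero-based.

```python
import random


def factorize(ratings, n_users, n_items, dim=2, lam=0.05, lr=0.02, epochs=500, seed=0):
    """Approximate M by X Y^T using SGD over the observed entries only.

    ratings: list of (i, j, M_ij) for the observed pairs (the set E).
    dim:     dimension l of the latent space.
    Returns X (n_users x dim) and Y (n_items x dim) as lists of lists."""
    rng = random.Random(seed)
    X = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_users)]
    Y = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_items)]
    pairs = list(ratings)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for i, j, m in pairs:
            err = sum(X[i][k] * Y[j][k] for k in range(dim)) - m
            for k in range(dim):
                xik, yjk = X[i][k], Y[j][k]
                # Gradient of 1/2 * err^2 + lam/2 * (x^2 + y^2) for this entry.
                X[i][k] -= lr * (err * yjk + lam * xik)
                Y[j][k] -= lr * (err * xik + lam * yjk)
    return X, Y


def predict(X, Y, i, j):
    """Predicted rating of item j by user i: inner product of the two vectors."""
    return sum(xk * yk for xk, yk in zip(X[i], Y[j]))


# Hypothetical observed ratings (user index, item index, rating) on a 4 x 4 matrix.
OBSERVED = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 1),
    (2, 2, 5), (2, 3, 4), (2, 0, 1),
    (3, 2, 4), (3, 3, 5), (3, 1, 1),
]
X, Y = factorize(OBSERVED, n_users=4, n_items=4)
print(predict(X, Y, 0, 3))  # estimate for a pair that was never observed
```

Only the twelve observed entries contribute to the updates; the four missing ones are never touched during training and are filled in afterwards by the inner product.
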

Finally, we can note that this basic factorization concept can be extended to more complex models, for example more general neural networks. The first direct adaptation we can think of concerns boolean interaction matrices. If we want to reconstruct boolean interactions, a simple dot product is not well suited. However, if we add a logistic function on top of this dot product, we get a model that takes its values in [0, 1] and thus fits the problem better. In this case, the model to be optimized is

$$(X, Y) = \underset{X, Y}{\operatorname{argmin}} \; \frac{1}{2}\sum_{(i,j) \in E}\left[f\left((X_i)(Y_j)^T\right) - M_{ij}\right]^2 + \frac{\lambda}{2}\left(\sum_{i,k}(X_{ik})^2 + \sum_{j,k}(Y_{jk})^2\right)$$

where f(.) is the logistic function. Deeper neural network models often achieve near state-of-the-art (SOTA) performance in complex recommender systems.
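
For the logistic variant, the only change to a gradient step is the extra factor coming from the derivative of the logistic function. The sketch below shows a single stochastic update for one boolean interaction; the function names and hyperparameter values are assumptions made for this example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_step(x, y, m, lam=0.01, lr=0.1):
    """One in-place update of user vector x and item vector y for a boolean
    interaction m in {0, 1}, minimising
    1/2 * (sigmoid(x . y) - m)^2 + lam/2 * (|x|^2 + |y|^2).
    Returns the prediction made before the update."""
    p = sigmoid(sum(a * b for a, b in zip(x, y)))
    g = (p - m) * p * (1.0 - p)   # chain rule through the logistic function
    for k in range(len(x)):
        xk, yk = x[k], y[k]
        x[k] -= lr * (g * yk + lam * xk)
        y[k] -= lr * (g * xk + lam * yk)
    return p


# Hypothetical user and item vectors, and one positive interaction (m = 1).
user_vec, item_vec = [0.1, 0.1], [0.1, 0.1]
first = sgd_step(user_vec, item_vec, 1)
for _ in range(200):
    last = sgd_step(user_vec, item_vec, 1)
print(first, last)
```

Repeating the step on a positive interaction pushes the predicted probability upward, while the regularisation term keeps the vectors from growing without bound.
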

4. Model, Bias, and Variance

Let's focus more on the main differences between the methods mentioned earlier. In particular, let's look at their effect on bias and variance.

In memory-based collaborative filtering methods, no latent model is assumed. The algorithm works directly with the user-item interactions: for example, users are represented by their interactions with items, and a nearest-neighbor search on these representations is used to generate suggestions. Since no underlying model is assumed, these methods theoretically have low bias but high variance.

In model-based collaborative filtering methods, an underlying interaction model is assumed. The model is trained to reconstruct the user-item interaction values from its own representations of users and items. New recommendations can then be made based on this model. The latent representations of users and items extracted by the model have a mathematical meaning that is difficult for humans to interpret. Since a model of user-item interactions is assumed, this method theoretically has higher bias but lower variance than methods that assume no underlying model.

Finally, in content-based methods, an underlying interaction model is also assumed. Here, however, the model is provided with the content that defines the representations of users and/or items: for example, users are represented by given features, and we try to model, for each item, the kind of user profile that likes or dislikes that item. As in model-based collaborative filtering, a user-item interaction model is assumed. However, the model is more constrained (since the representations of users and/or items are given), so this method tends to have the highest bias but the lowest variance.

5. Evaluation of recommender systems

As with any machine learning algorithm, we need to be able to evaluate the performance of a recommender system to determine which algorithm is best for our situation. Evaluation methods for recommender systems can be mainly divided into two groups: evaluations based on well-defined metrics and evaluations based mainly on human judgment and satisfaction estimates.

5.1 Metric-based evaluation

If our recommender system is based on a model that outputs numerical values, such as rating predictions or matching probabilities, we can assess the quality of these outputs in a very classical way using an error metric such as mean squared error (MSE). In this case, the model is trained on only part of the available interaction data and tested on the rest.

If our recommender system is based on a model that predicts numerical values, we can also binarize these values with a classical thresholding approach (values above the threshold are positive, values below are negative) and evaluate the model in a more "classification-like" way. Indeed, since the dataset of past user-item interactions is also binary (or can be binarized by thresholding), we can evaluate the accuracy (as well as the precision and recall) of the model's binarized outputs on the test dataset.

Finally, if we now consider a recommender system that is not based on numerical values and just returns a list of recommendations (such as user-user or item-item methods based on a kNN approach), we can still define a precision-like metric by estimating the proportion of recommended items that really suit our user. To estimate this precision, we cannot take into account recommended items that our user has not interacted with; we should only consider items from the test dataset for which we have user feedback.
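
The three kinds of metrics described in this section can be sketched as follows. The threshold of 3.5, the function names, and the sample values are assumptions made for the example.

```python
def mse(predicted, actual):
    """Mean squared error between predicted and held-out ratings."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def precision_recall(predicted, actual, threshold=3.5):
    """Binarise both lists at `threshold`, then compute precision and recall."""
    pred_pos = [p >= threshold for p in predicted]
    true_pos = [a >= threshold for a in actual]
    tp = sum(p and t for p, t in zip(pred_pos, true_pos))
    precision = tp / sum(pred_pos) if any(pred_pos) else 0.0
    recall = tp / sum(true_pos) if any(true_pos) else 0.0
    return precision, recall


def list_precision(recommended, relevant, rated):
    """Share of recommended items the user actually liked, counting only
    the items for which the test set contains feedback (`rated`)."""
    judged = [item for item in recommended if item in rated]
    if not judged:
        return 0.0
    return sum(item in relevant for item in judged) / len(judged)


# Hypothetical held-out test ratings and the model's predictions for them.
PREDICTED = [4.5, 2.0, 3.8, 1.0]
ACTUAL = [5, 1, 2, 4]
print(mse(PREDICTED, ACTUAL), precision_recall(PREDICTED, ACTUAL))
print(list_precision(["a", "b", "c"], relevant={"a"}, rated={"a", "b"}))
```

In the last call, item "c" is ignored because the test set holds no feedback for it, so the precision is computed over "a" and "b" only.
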

5.2 Human-based evaluation

When designing a recommender system, we are interested not only in obtaining a model that produces recommendations we are very sure about, but we can also expect some other good properties, such as diversity and interpretability of the recommendations.
As mentioned in the collaborative filtering section, we want to avoid trapping users in a confined area of information, as discussed earlier. The notion of "serendipity" is often used to express the tendency of a model to create (or not) such a confinement area, that is, the diversity of its recommendations. Serendipity, which can be estimated by computing the distance between recommended items, should not be too low, as that creates confinement areas, but should also not be too high, as that would mean we do not take our users' interests enough into account when making recommendations. This is the exploration versus exploitation trade-off. Exploitation: show content similar to what the user was interested in before. Exploration: show diverse content. Thus, to bring diversity into the suggested choices, we want to recommend items that both suit our user very well and are not too similar to each other. For example, instead of recommending "Star Wars" 1, 2, and 3 to a user, it seems better to recommend "Star Wars 1", "Star Trek Into Darkness", and "Indiana Jones and the Raiders of the Lost Ark": our system may see the last two as less likely to interest the user, but recommending 3 items that look too similar is not a good option.
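
One simple way to estimate "the distance between recommended items" is the average pairwise distance within a recommendation list. The sketch below uses Jaccard distance on tags; both the distance and the tags attached to each movie are assumptions made for this example.

```python
from itertools import combinations


def intra_list_diversity(items, tags):
    """Average pairwise Jaccard distance (1 - similarity) between recommended items."""
    def distance(a, b):
        union = tags[a] | tags[b]
        return 1.0 - len(tags[a] & tags[b]) / len(union) if union else 0.0

    pairs = list(combinations(items, 2))
    return sum(distance(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


# Hypothetical tags for the movies of the example above.
TAGS = {
    "Star Wars 1": {"sci-fi", "space", "saga"},
    "Star Wars 2": {"sci-fi", "space", "saga"},
    "Star Wars 3": {"sci-fi", "space", "saga"},
    "Star Trek Into Darkness": {"sci-fi", "space", "trek"},
    "Raiders of the Lost Ark": {"adventure", "archaeology"},
}
narrow = intra_list_diversity(["Star Wars 1", "Star Wars 2", "Star Wars 3"], TAGS)
mixed = intra_list_diversity(
    ["Star Wars 1", "Star Trek Into Darkness", "Raiders of the Lost Ark"], TAGS
)
print(narrow, mixed)
```

The three "Star Wars" movies share all their tags, so that list has no diversity at all, while the mixed list scores much higher.
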
Interpretability is another key point for the success of recommendation algorithms. Indeed, it has been shown that users tend to lose confidence in a recommender system if they do not understand why a particular item was recommended to them. So, if we design a model that can be clearly explained, we can add a short sentence to each recommendation stating why the item was recommended ("People who liked this item also liked that one", "You liked this item, so you may be interested in that one", ...).
Finally, on top of the fact that diversity and interpretability are intrinsically difficult to evaluate, we can note that assessing the quality of a recommendation that does not belong to the test dataset is also very difficult: how do we know whether a new recommendation is relevant before actually recommending it to our user? For all these reasons, it is sometimes tempting to test the model in "real-world conditions". Since the goal of a recommender system is to generate an action (watching a movie, buying a product, reading an article, etc.), we can indeed assess its ability to generate the expected action. For example, the system can be put into production following an A/B testing approach, or it can be tested on only a sample of users. In such tests, a significance level needs to be set in advance (for example $\alpha = 0.05$).
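
As an illustration of how such an A/B test might be read, the sketch below compares the conversion rates of two variants with a two-sided two-proportion z-test. The choice of this particular test and the traffic numbers are assumptions made for this example; the sketch also assumes the pooled rate is strictly between 0 and 1.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    conv_*: number of users who performed the expected action
    n_*:    number of users exposed to each variant
    Returns (z, p_value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


ALPHA = 0.05
# Hypothetical traffic: variant A is the current system, B the new recommender.
z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(z, p, "significant" if p < ALPHA else "not significant")
```
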

6. Summary

First, recommendation algorithms can be divided into two classic families: collaborative filtering methods (such as user-user, item-item, and matrix factorization) and content-based methods.

Second, memory-based collaborative filtering methods do not assume any latent model and therefore have low bias but high variance; model-based collaborative filtering assumes a latent interaction model that needs to learn user and item representations from scratch, and therefore has higher bias but lower variance; content-based approaches assume a latent model built around explicitly given user and/or item features, and thus have the highest bias and lowest variance.

Third, recommender systems are difficult to evaluate: although some classic metrics such as MSE, accuracy, precision, or recall can be used, we should keep in mind that some properties, such as diversity (serendipity) and interpretability, cannot be evaluated this way; testing in real conditions (offline testing, then small-batch online A/B testing, then full-traffic rollout) is ultimately the only real way to evaluate a new recommender system, but it requires a significance level to be set in advance.

Fourth, we should note that we have not discussed hybrid methods in this introductory article. These methods combine collaborative filtering and content-based approaches, achieve state-of-the-art results in many cases, and are therefore used in many large-scale recommender systems today. The combination made in hybrid methods can mainly take two forms: we can train two models independently (a collaborative filtering model and a content-based model) and combine their suggestions, or directly build a single model (usually a neural network) that unifies the two approaches by using prior information (about users and/or items) as well as interaction information as input.

Thanks for reading! The author's knowledge is limited; if there are any mistakes, you are welcome to point them out in the comment area.

References:

Deep Learning Recommendation System. Zhe Wang
https://medium.com/towards-data-science/introduction-to-recommender-systems-6c66cf15ada

Origin blog.csdn.net/weixin_45052363/article/details/126235962