92 Recommendation Algorithms - Similarity Recommendation and Collaborative Filtering

1 Similarity-based recommendation process

How user preferences are collected

How user preferences are integrated

In most cases we extract more than one kind of user behavior. There are basically two ways to combine these different behaviors:

Grouping by behavior
Behaviors are generally divided into groups such as "view" and "purchase", and a separate user/item similarity is calculated for each group.

Weighting

We weight each behavior according to how strongly it reflects the user's preference, and combine the weighted behaviors into the user's overall preference for an item. Generally speaking, explicit feedback is given a larger weight than implicit feedback, but it is relatively sparse, since only a few users give explicit feedback. Likewise, "purchase" reflects preference more strongly than "view", and behaviors such as "follow", "add to shopping cart", "cancel after placing an order", and "payment" each reflect a different degree of preference. The appropriate weights vary by application.
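The weighting approach can be sketched in a few lines of Python. This is only an illustration: the behavior names, the weight values, and the `overall_preference` helper are assumptions, not values taken from any real system.

```python
# Hypothetical behavior weights; the numbers are illustrative assumptions only.
BEHAVIOR_WEIGHTS = {
    "view": 1.0,
    "follow": 2.0,
    "add_to_cart": 3.0,
    "payment": 5.0,
    "cancel_order": -1.0,
}

def overall_preference(events):
    """Sum weighted behaviors into one score per (user, item) pair.

    events: iterable of (user, item, behavior) tuples.
    """
    scores = {}
    for user, item, behavior in events:
        weight = BEHAVIOR_WEIGHTS.get(behavior, 0.0)  # unknown behaviors count as 0
        scores[(user, item)] = scores.get((user, item), 0.0) + weight
    return scores

events = [
    ("u1", "book_a", "view"),
    ("u1", "book_a", "add_to_cart"),
    ("u1", "book_a", "payment"),
    ("u1", "book_b", "view"),
]
prefs = overall_preference(events)
```

In practice the summed scores would still need to be normalized before they are used as preference values.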

2 How to calculate similarity

Once the user behavior data has been collected, preprocessed, and normalized, we obtain a two-dimensional matrix of user preferences. One dimension is the list of users, the other is the list of items, and each value is a user's preference for an item, generally a floating-point value in [0, 1] or [-1, 1].

In the two-dimensional user-item preference matrix, we can treat one user's preferences for all items as a vector to calculate the similarity between users, or treat all users' preferences for one item as a vector to calculate the similarity between items.
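As a rough sketch of the two views of the matrix, the snippet below takes rows as user vectors and columns as item vectors and compares them with cosine similarity, which is one common choice of measure. The matrix values and helper names are made up for illustration.

```python
import math

# Toy preference matrix: rows are users, columns are items (values are made up).
prefs = [
    [1.0, 0.5, 0.0],  # user 0
    [0.8, 0.4, 0.0],  # user 1
    [0.0, 0.0, 1.0],  # user 2
]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def user_vector(u):
    """One user's preferences for all items (a row)."""
    return prefs[u]

def item_vector(i):
    """All users' preferences for one item (a column)."""
    return [row[i] for row in prefs]

sim_u0_u1 = cosine(user_vector(0), user_vector(1))  # users with proportional tastes
sim_u0_u2 = cosine(user_vector(0), user_vector(2))  # users with nothing in common
sim_i0_i1 = cosine(item_vector(0), item_vector(1))  # items liked by the same users
```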

Similarity and distance are opposite concepts by definition: the smaller the distance between two vectors, the greater their similarity.
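One common way to turn a distance into a similarity is 1 / (1 + d). The sketch below applies it to Euclidean distance; this mapping is an assumption chosen for illustration, not necessarily the formula used in the original figures.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def distance_to_similarity(d):
    """Map a distance in [0, inf) to a similarity in (0, 1]."""
    return 1.0 / (1.0 + d)

near = distance_to_similarity(euclidean_distance([1.0, 0.5], [1.0, 0.5]))
far = distance_to_similarity(euclidean_distance([1.0, 0.5], [-1.0, -0.5]))
```

Identical vectors get similarity 1, and the value falls toward 0 as the distance grows.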

Commonly used similarity calculation

are both 0 and S are irrelevant numbers

3 Collaborative Filtering

Collaborative filtering is a typical way to exploit collective intelligence.

Collaboration is a group behavior; filtering is an individual behavior.

Collaborative filtering rests on the following basic assumption: if person A holds the same opinion as person B on one issue, then A is more likely to share B's opinion on another issue than a randomly chosen person is.

In everyday life, if you want to go to the movies, you usually ask the friends around you which good movies they would recommend. We generally prefer recommendations from friends with similar tastes. This is the core idea of collaborative filtering.

Collaborative filtering is one of the earliest and best-known recommendation algorithms. It can be divided into two categories:

Item-based CF:
• Evaluate the similarity between items from users' ratings of different items, and make recommendations based on the similarity between items
• Principle: similar things cluster together

User-based CF:
• Evaluate the similarity between users from different users' ratings of items, and make recommendations based on the similarity between users
• Principle: similar people form groups

Association Rules vs Collaborative Filtering

Association Rules:
• Answer the question: If a consumer buys commodity A, what other commodities might he buy?
• Features: Direct recommendations, mining potential associations from overall data, independent of individual preferences

Collaborative Filtering:
• Answer the questions: Which customers are similar to customer A? Recommend to A the items those customers have that A does not (user-based). Which items are similar to item A? To a user who has purchased or bookmarked A, recommend the items similar to A (item-based).

• Features: indirect recommendation, that is, first find similar people (user-based) or similar items (item-based), and then recommend based on them
• Collaborative filtering = collaboration (drawing on collective intelligence) + filtering (personalizing for a specific individual)

4 User-based collaborative filtering algorithm

User-based collaborative filtering evaluates the similarity between users from different users' ratings of items, and makes recommendations based on the similarity between users.

In a nutshell: recommend to the user the items liked by other users with similar interests.

User-based Collaborative Filtering Algorithm—Application

As shown in Figure 1, the matrix lists users' ratings of products. The horizontal axis is the product (book), the vertical axis is the user, and each value is a user's rating of a product, from 1 point (lowest) to 5 points (highest). The higher the rating, the more the user likes the product; an empty cell means the user has not rated the product.

The basic idea of user-based collaborative filtering is very simple. Based on users' preferences for items, it finds the neighboring users whose preferences are closest to the current user's, and then recommends the items those neighbors like to the current user.

Computationally, each user's preferences for all items are treated as a vector, and the similarity between users is calculated from these vectors. After similar neighbors are found, the current user's preference for items they have not yet interacted with is predicted from the neighbors' similarity weights and their preferences for those items. Sorting by the predicted preference gives a ranked list of items as recommendations.

The figure shows an example. For user A, the user's historical preferences yield only one neighbor, user C, and item D, which user C likes, is then recommended to user A.
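The example can be reproduced with a small sketch. The ratings table below is an assumed reconstruction (the figure's actual values are not available), cosine similarity is one possible similarity measure, and the neighbor count `k` is a hypothetical parameter.

```python
import math

# Assumed toy data: 1.0 means "likes". Reconstructed for illustration only.
ratings = {
    "user_a": {"item_a": 1.0, "item_c": 1.0},
    "user_b": {"item_b": 1.0},
    "user_c": {"item_a": 1.0, "item_c": 1.0, "item_d": 1.0},
}

def cosine(p, q):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * q[k] for k, v in p.items() if k in q)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def recommend_user_based(target, ratings, k=1):
    """Rank items the target has not rated, using the k most similar users."""
    sims = [(cosine(ratings[target], r), u) for u, r in ratings.items() if u != target]
    neighbors = sorted(sims, reverse=True)[:k]
    scores, weights = {}, {}
    for sim, user in neighbors:
        if sim <= 0:
            continue  # ignore users with no overlap
        for item, rating in ratings[user].items():
            if item in ratings[target]:
                continue  # only predict items the target has not rated
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    # Similarity-weighted average of the neighbors' ratings.
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)

recs = recommend_user_based("user_a", ratings, k=1)
```

With this data, user C is the only neighbor with overlapping preferences, so item D is the single recommendation for user A.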

5 Item-based collaborative filtering algorithm

Item-based collaborative filtering evaluates the similarity between items through user ratings of different items, and makes recommendations based on the similarity between items.

In a nutshell: Recommend items to the user that are similar to items he liked before.

The principle of item-based CF is similar to that of user-based CF, except that neighbors are calculated among items rather than among users. That is, similar items are found from users' preferences for items and then recommended.

Computationally, all users' preferences for an item are treated as a vector, and the similarity between items is calculated from these vectors. Once the items similar to a given item are known, the current user's preference for items they have not yet rated is predicted from the user's historical preferences. Sorting by the predicted preference gives a ranked list of items as recommendations.

The figure above shows an example. For item A, the historical preferences of all users show that users who like item A also like item C, so item A and item C are considered relatively similar. Since user C likes item A, it can be inferred that user C may also like item C.
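The same example can be written as a short sketch. The ratings table is an assumed reconstruction of the figure, and cosine similarity over item vectors is only one possible choice of measure.

```python
import math

# Assumed toy data: 1.0 means "likes". Reconstructed for illustration only.
ratings = {
    "user_a": {"item_a": 1.0, "item_c": 1.0},
    "user_b": {"item_a": 1.0, "item_b": 1.0, "item_c": 1.0},
    "user_c": {"item_a": 1.0},
}

def item_vectors(ratings):
    """Transpose user -> item ratings into item -> user vectors."""
    vectors = {}
    for user, row in ratings.items():
        for item, rating in row.items():
            vectors.setdefault(item, {})[user] = rating
    return vectors

def cosine(p, q):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * q[k] for k, v in p.items() if k in q)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def recommend_item_based(target, ratings):
    """Score each unrated item by its similarity to the items the user rated."""
    vectors = item_vectors(ratings)
    liked = ratings[target]
    scores = {}
    for candidate, cand_vec in vectors.items():
        if candidate in liked:
            continue
        # Weight each item-item similarity by the user's rating of the known item.
        scores[candidate] = sum(
            cosine(cand_vec, vectors[item]) * rating for item, rating in liked.items()
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

recs = recommend_item_based("user_c", ratings)
```

Here item C ranks ahead of item B for user C, because more of the users who like item A also like item C.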

User-based CF vs Item-based CF

For e-commerce sites and movie or music recommendation sites, the number of users generally far exceeds the number of products, so item-based collaborative filtering has relatively low computational complexity and is a widely used recommendation algorithm.

User-based collaborative filtering is better suited to scenarios such as news recommendation, where the number of items grows rapidly and recommendations must be timely.

For social networking sites such as Weibo, given the volume and update speed of the content, user-based collaborative filtering is the better choice. Combined with social network information, it can also make the explanation for a recommendation more convincing to the user.

6 Example: Item-based collaborative filtering

Case background
The website provides personalized movie recommendation services for Internet users
Data:
The website provides information on all movies
The website collects user behavior data, including browsing, rating, and commenting
Goal
The website helps users find a list of movies that they have not watched and that match their interests

Data sample

Test data set:
• 3 fields per row, in order: user ID, movie ID, and the user's rating of the movie (0 to 5 points, in steps of 0.5 points)

Implementation steps

  1. Build a co-occurrence matrix for items
  2. Build a matrix of user ratings for items
  3. Calculate the recommendation results by matrix multiplication

Step 1: Build the co-occurrence matrix of items

Step 2: Build a matrix of user ratings for items

Step 3: Calculate the recommendation results by matrix multiplication

Co-occurrence matrix * scoring matrix = recommendation result
Collaboration (collective action) + filtering (for personal preference) = personalized recommendation
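The three steps can be sketched in plain Python as follows. The sample rows are made up (they follow the user ID, movie ID, rating layout described earlier), and the helper names are hypothetical. The co-occurrence count used here includes the diagonal, that is, the number of users who rated each movie.

```python
# Assumed sample rows in (user ID, movie ID, rating) order; values are made up.
rows = [
    (1, 101, 5.0), (1, 102, 3.0),
    (2, 101, 2.0), (2, 102, 2.5), (2, 103, 5.0),
    (3, 101, 2.5), (3, 103, 4.0),
]

def build_user_ratings(rows):
    """Step 2: matrix of user ratings, stored as user -> {movie: rating}."""
    ratings = {}
    for user, movie, score in rows:
        ratings.setdefault(user, {})[movie] = score
    return ratings

def build_cooccurrence(ratings):
    """Step 1: for each pair of movies, count the users who rated both."""
    cooc = {}
    for row in ratings.values():
        for a in row:
            for b in row:
                cooc[(a, b)] = cooc.get((a, b), 0) + 1
    return cooc

def recommend(user, ratings, cooc):
    """Step 3: multiply the co-occurrence matrix by the user's rating vector."""
    seen = ratings[user]
    movies = {m for row in ratings.values() for m in row}
    scores = {}
    for candidate in movies:
        if candidate in seen:
            continue  # only recommend unwatched movies
        scores[candidate] = sum(
            cooc.get((candidate, m), 0) * score for m, score in seen.items()
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ratings = build_user_ratings(rows)
cooc = build_cooccurrence(ratings)
recs = recommend(1, ratings, cooc)
```

For user 1, movie 103 co-occurs twice with movie 101 and once with movie 102, so its score is 2 * 5.0 + 1 * 3.0 = 13.0.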

7 Advantages and disadvantages of collaborative filtering

• Advantages
• Can filter information whose content is hard for machines to analyze automatically, such as artwork and music.
• Shares the experience of others, avoiding incomplete and imprecise content analysis
• Can recommend new information. It can find information that is completely dissimilar in content, and uncover potential interests that users have not discovered themselves.

• Disadvantages
• Cold start problem
• Data sparsity problem
• Interference from popular items: the most popular items in two different fields can show high similarity

8 Case: Collaborative Filtering

Comparison of user/item-based collaborative filtering schemes for movie rating and recommendation

Content-Based Recommendations

Recommendations based on product attributes or user-defined tags

Content-based recommendation - based on tags
• The basic idea of tag-based recommendation is to find the tags a user uses most often, then find the popular items that carry these tags, and recommend them to the user

• Issues to pay attention to:
• One is to ensure novelty and diversity; a TF-IDF-style weighting can be used to reduce the weight of popular items

• Another is to remove synonymous duplicate tags and meaningless tags
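A minimal sketch of tag-based scoring with a popularity penalty is shown below. The tagging records are made up, and dividing by log(1 + number of users who tagged the item) is just one TF-IDF-style penalty chosen for illustration; the text does not specify the exact formula.

```python
import math

# Hypothetical (user, item, tag) tagging records.
records = [
    ("u1", "m1", "scifi"),
    ("u2", "m2", "scifi"),
    ("u3", "m2", "scifi"),
    ("u4", "m2", "scifi"),
    ("u2", "m3", "scifi"),
]

def tag_based_scores(user, records, penalize_popular=True):
    """Score unseen items by the tags the user has applied."""
    user_tags = {}   # tag -> times this user applied it
    tag_items = {}   # tag -> {item: times the tag was applied to it}
    item_users = {}  # item -> set of users who tagged it
    seen = set()     # items the user has already tagged
    for u, item, tag in records:
        tag_items.setdefault(tag, {})
        tag_items[tag][item] = tag_items[tag].get(item, 0) + 1
        item_users.setdefault(item, set()).add(u)
        if u == user:
            user_tags[tag] = user_tags.get(tag, 0) + 1
            seen.add(item)
    scores = {}
    for tag, n_user_tag in user_tags.items():
        for item, n_tag_item in tag_items[tag].items():
            if item in seen:
                continue
            weight = n_user_tag * n_tag_item
            if penalize_popular:
                # Assumed penalty: damp items tagged by many users.
                weight /= math.log(1 + len(item_users[item]))
            scores[item] = scores.get(item, 0.0) + weight
    return scores

raw = tag_based_scores("u1", records, penalize_popular=False)
penalized = tag_based_scores("u1", records, penalize_popular=True)
```

With the penalty, the popular item `m2` still scores higher than `m3`, but its lead shrinks from 3x to 1.5x, which leaves more room for less popular items.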

Advantages and disadvantages of content-based recommendation methods

• Advantages
• No user behavior data required, so there are no cold-start or data-sparsity problems.
• Can make recommendations for users with special interests.
• Can recommend new or unpopular items, so there is no new-item problem.
• Through the content characteristics of the recommended items, the reason for the recommendation can be given.
• There are relatively mature classification learning models that can be applied.

• Disadvantages
• Requires meaningful features to be extracted from content
• Requires content to be structured
• Requires users' interests and preferences to be expressed through features

9 Hybrid Recommendation Methods

• Weighted hybrid recommendation
• The candidate results produced by different recommendation algorithms, together with their scores, are combined by weighting (ensemble) to produce the final recommendation ranking

• Hierarchical hybrid recommendation
• For each recommendation scenario, the different recommendation algorithms are arranged in tiers according to how well they perform. In a given scenario, the results of the most reliable algorithm are used first

• Cross-hybrid recommendation
• The results of different recommendation algorithms are combined in a certain ratio and presented to users together, to ensure the diversity of the final recommendation results
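The weighted hybrid described above might look like the following sketch. The algorithm names, scores, and weights are hypothetical, and the scores of the different algorithms are assumed to be normalized to a common scale already.

```python
def weighted_hybrid(results, weights):
    """Blend per-algorithm scores into one ranking.

    results: {algorithm_name: {item: score}}
    weights: {algorithm_name: weight}
    """
    combined = {}
    for name, scores in results.items():
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weights[name] * score
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical outputs of two recommenders for the same user.
results = {
    "item_cf": {"m1": 0.9, "m2": 0.4},
    "content": {"m2": 0.8, "m3": 0.5},
}
weights = {"item_cf": 0.5, "content": 0.5}
ranking = weighted_hybrid(results, weights)
```

Item `m2` comes first here because both algorithms support it, even though neither ranks it highest on its own.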

10 Case: Design of Weibo Recommendation System

Key Issue - Cold Start

• How to make personalized recommendations for new users

• Coarse-grained recommendations based on their registration information, such as age, gender, hobbies, etc.

• Provide new users with some content after they sign up, let them report their interest in the content, and then make recommendations based on this data

• How to recommend new items to users
• Randomly show new items to users in the recommendation list, so that new items accumulate exposure and behavior data

• Use semantic analysis to extract keywords from each item and assign them weights. These content features form a vector, and the similarity between items can be obtained from the cosine similarity between their vectors and used for recommendation
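The keyword-vector idea can be sketched as follows. The keywords and weights are invented for illustration; a real system would derive them from the semantic analysis step.

```python
import math

def cosine(p, q):
    """Cosine similarity of two keyword-weight vectors stored as dicts."""
    dot = sum(w * q[k] for k, w in p.items() if k in q)
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

# Hypothetical keyword weights for a new item and two existing items.
new_item = {"space": 0.8, "robot": 0.6}
catalog = {
    "movie_a": {"space": 0.9, "alien": 0.4},
    "movie_b": {"romance": 1.0},
}

# Rank existing items by content similarity to the new item.
similar = sorted(
    ((cosine(new_item, keywords), name) for name, keywords in catalog.items()),
    reverse=True,
)
```

The new item can then be shown to users who liked the most similar existing items, before it has any behavior data of its own.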

• How to make personalized recommendations for a new website when data is scarce
• Rely on human effort, such as manually edited popular lists and manual classification and annotation

• Rely on third-party data, such as linking social network accounts, importing contact information, and scraping social content

Key Issue - Time Factor

For news recommendation, the time weight is very important. You can add a time decay factor to each recommended item, so that older items are assigned a smaller weight.

• For Item CF, when looking for similar items, focus on the items that the user has liked recently.
• For User CF, if two users like the same item at the same time, they can be given a higher similarity. When recommending items, you can also focus on the items that users with similar tastes have liked recently.

• During similarity calculation, weight each user behavior by time: the longer the time interval, the lower the weight. With this improvement, users often get more satisfactory results.
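A time decay factor can take many forms. The sketch below uses exponential decay with a half-life, which is one common choice and is not prescribed by the text; the 7-day default is an arbitrary assumption.

```python
def time_decay(age_days, half_life_days=7.0):
    """Exponential decay: the weight halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)

def decayed_score(base_score, age_days, half_life_days=7.0):
    """Scale a recommendation score or behavior weight by its age."""
    return base_score * time_decay(age_days, half_life_days)

fresh = decayed_score(1.0, age_days=0)
week_old = decayed_score(1.0, age_days=7)
two_weeks_old = decayed_score(1.0, age_days=14)
```

The same factor can be applied either to an item's final score or to each behavior before the similarity is calculated.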

Key Issue - Data Sparsity

Given the scale of Internet users and products, the data is very sparse, and applying collaborative filtering directly does not work very well.

• In essence, this problem cannot be completely overcome; it can only be alleviated to some extent.
• Whether the data is related matters more than the user's preference.
• Practice has shown that, with sparse data, two users who rated the same product, one with a negative score (low evaluation) and one with a positive score, should be regarded as positively correlated rather than negatively correlated

• Based on the assumption that similarity can be propagated, a diffusion algorithm extends the original first-order association (how similar two users are, or which products they both bought) to second-order or even higher-order associations. This method is computationally expensive

• Item-level data can be very sparse. If items are coarsened to the category level, the data becomes dense. If the similarity between categories can be calculated, it can support category-based recommendations

11 Recommendation algorithm based on matrix factorization (SVD)

• Definition of Singular Value Decomposition (SVD):

• Any MxN matrix A can be written as the product of three matrices: an MxR matrix U with orthonormal columns, an RxR diagonal matrix S (the singular value matrix), and the transpose of an NxR matrix V with orthonormal columns, where R is the rank of A

• For an MxN rating matrix consisting of M products and N users, we can decompose the matrix into an MxR matrix U, an RxR singular value matrix S, and an RxN matrix (the transpose of V)
• Keeping only the first few (largest) singular values of S is enough to approximate the original MxN matrix, without keeping all of S
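In symbols, the decomposition and its truncated approximation can be written as below. The notation is an assumption for illustration: u_i and v_i denote the i-th columns of U and V, the singular values are sorted in decreasing order, and k is the number of singular values kept.

```latex
\[
A = U S V^{\mathsf{T}}, \qquad
U \in \mathbb{R}^{M \times R},\quad
S = \operatorname{diag}(\sigma_1, \dots, \sigma_R),\quad
V \in \mathbb{R}^{N \times R},\quad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_R > 0
\]
\[
A \approx A_k = U_k S_k V_k^{\mathsf{T}}
  = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\mathsf{T}}, \qquad k \le R
\]
```

The entry of A_k in row m and column n can then serve as the predicted rating of product m by user n.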


Origin blog.csdn.net/weixin_44498127/article/details/124421592