"Recommendation System Practice" - Xiang Liang

Chapter 1 Good Recommendation Systems

1.1 What is a recommendation system

The basic task of a recommendation system is to connect users and items in order to solve the problem of information overload.

Ways of recommending:

(1) Social recommendation: Let friends recommend items to you.

(2) Content-based filtering: Recommendations are made by analyzing the user's own past data.

(3) Collaborative filtering: Recommendations are made using the data of other users with similar interests.

1.2 Application of personalized recommendation system

1.2.1 E-commerce:

Amazon: About 20-30% of Amazon's sales come from recommendation systems

Personalized recommendation list: item-based recommendation [based on what you have bought and on friend relationships]

Related recommendation list [items purchased by other users who also bought this item]

1.2.2 Movie and Video Websites

Netflix [recommendation based on movies you have watched], YouTube

1.2.3 Personalized Music Internet Radio

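A successful application of personalized recommendation requires two conditions: first, information overload exists; second, most of the time users do not have a particularly clear need.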

Overseas: Pandora [annotates the characteristics of each song, computes song similarity from those annotations, and recommends music with high similarity],

Douban Radio in China

1.2.4 Social Networks

Application of personalized recommendation technology:

Use the user's social network information to make personalized item recommendations for users

Session recommendation for information flow [each share and all its comments are called a session]

Recommend friends to users

Facebook's most valuable data: the social network relationships between users, and users' preference information

1.2.5 Personalized reading

Google Reader: Users can follow people they are interested in and read the articles those people share

Zite: Collect user preference information on articles, analyze user feedback data and constantly update user's personalized article list

Digg: Calculate the interest similarity between users from their history, and recommend articles liked by users with similar interests

1.2.6 Location-Based Services

Obtain the user's location to recommend services that are close and interesting

1.2.7 Personalized mail

Improve user productivity by analyzing the user's historical behavior and habits of reading emails to reorder new emails

1.2.8 Personalized Ads

Personalized advertising technology:

Contextual advertising: analyze the content of the webpage that the user is browsing, and deliver relevant advertisements

Search advertising: analyze the user's search history in the current session, and deliver advertisements related to the search purpose

Personalized display ads: different display ads are delivered to different users according to their interests [Yahoo]

1.3 Recommendation system evaluation

A complete recommendation system has 3 participants: users, item providers, and websites that provide recommendation systems

1.3.1 Recommendation system experimental method

1. Offline experiment: Obtain user behavior data from logs, generate a standard dataset, and make and evaluate predictions on it

2. User survey: ask users directly

3. Online experiment: AB test


1.3.2 Evaluation indicators

1. User Satisfaction

2. Prediction accuracy

3. Coverage

4. Diversity

5. Novelty

6. Serendipity

7. Trust

8. Real-time performance

9. Robustness

10. Business goals

11. Summary
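As a rough illustration of two of the indicators above, the sketch below computes top-N precision and recall (one common way to measure prediction accuracy) and coverage in an offline experiment. The function names and the toy data are hypothetical and are not taken from the book.

```python
def precision_recall(recommended, relevant):
    """recommended: user -> list of recommended items.
    relevant: user -> set of items the user actually interacted with in the test set.
    Only users that received recommendations are counted."""
    hits = n_recommended = n_relevant = 0
    for user, items in recommended.items():
        truth = relevant.get(user, set())
        hits += len(set(items) & truth)
        n_recommended += len(items)
        n_relevant += len(truth)
    precision = hits / n_recommended if n_recommended else 0.0
    recall = hits / n_relevant if n_relevant else 0.0
    return precision, recall


def coverage(recommended, all_items):
    """Share of the catalogue that appears in at least one recommendation list."""
    covered = set()
    for items in recommended.values():
        covered.update(items)
    return len(covered) / len(all_items)


# Toy data (assumed for illustration only).
recs = {"u1": ["a", "b"], "u2": ["b", "c"]}
test_set = {"u1": {"a"}, "u2": {"c", "d"}}
precision, recall = precision_recall(recs, test_set)
cov = coverage(recs, {"a", "b", "c", "d"})
```

Precision and recall pull in opposite directions as the list length grows, which is one reason several indicators are usually reported together.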

1.3.3 Evaluation dimensions

User Dimension, Item Dimension, Time Dimension

Chapter 2 Leveraging User Behavior Data

2.1 Introduction to User Behavior Data

The various raw logs are aggregated by user behavior into session logs, which record and describe each of a user's behaviors.

User behavior in personalized recommendation is divided into explicit feedback behavior and implicit feedback behavior

Datasets fall into four types: implicit feedback without context information, explicit feedback without context information, implicit feedback with context, and explicit feedback with context

2.2 User Behavior Analysis

2.2.1 Distribution of user activity and item popularity

The data follow a power-law distribution, also known as a long-tail distribution. The classic example is word frequency in English: a word's frequency of occurrence is inversely proportional to a constant power of its rank in the frequency ranking, so most words occur very rarely and only a few words are used frequently.
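One simple way to check for a long tail is to count how many items have each popularity value. The sketch below does that for a list of (user, item) records; the function name and the toy records are assumptions made for illustration.

```python
from collections import Counter


def popularity_distribution(records):
    """records: iterable of (user, item) pairs.
    Returns a Counter mapping popularity k -> number of items with exactly k interactions."""
    item_popularity = Counter(item for _, item in records)
    return Counter(item_popularity.values())


# Toy log: one popular item and three items seen once.
records = [("u1", "a"), ("u2", "a"), ("u3", "a"),
           ("u1", "b"), ("u2", "c"), ("u3", "d")]
dist = popularity_distribution(records)
```

Plotted on log-log axes, a power-law distribution of this kind appears roughly as a straight line. The same counting can be applied to user activity by swapping the roles of user and item.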


2.2.2 The relationship between user activity and item popularity

It is generally believed that new users tend to browse popular items, and old users will gradually start browsing unpopular items

2.3 Experimental Design and Algorithm Evaluation

2.3.1 Dataset: Score Dataset

2.3.2 Experimental design:

2.3.3 Evaluation indicators

2.4 Neighborhood-based algorithms

**2.4.1 User-based collaborative filtering algorithm:** Recommend to the user items liked by other users with similar interests

(1) Find a set of users with similar interests to the target user

(2) Find items that the users in this collection like, and that the target user has not heard of, and recommend them to the target user
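The two steps above can be sketched as follows, assuming implicit feedback stored as a set of items per user and cosine similarity between those sets. The similarity measure, the function names, and the toy data are assumptions for illustration, not necessarily the exact formulation used in the book.

```python
import math
from collections import defaultdict


def user_similarity(train):
    """train: user -> set of items. Returns cosine similarity between users' item sets."""
    item_users = defaultdict(set)          # inverted index: item -> users
    for user, items in train.items():
        for item in items:
            item_users[item].add(user)
    overlap = defaultdict(lambda: defaultdict(int))
    for users in item_users.values():
        for u in users:
            for v in users:
                if u != v:
                    overlap[u][v] += 1
    return {u: {v: c / math.sqrt(len(train[u]) * len(train[v]))
                for v, c in related.items()}
            for u, related in overlap.items()}


def recommend_user_cf(user, train, sim, k=3, n=5):
    """Step 1: take the k most similar users. Step 2: score their unseen items."""
    neighbors = sorted(sim.get(user, {}).items(), key=lambda x: x[1], reverse=True)[:k]
    scores = defaultdict(float)
    for neighbor, weight in neighbors:
        for item in train[neighbor]:
            if item not in train[user]:
                scores[item] += weight
    return sorted(scores, key=scores.get, reverse=True)[:n]


train = {"A": {"a", "b", "d"}, "B": {"a", "c"}, "C": {"b", "e"}, "D": {"c", "d", "e"}}
user_sim = user_similarity(train)
user_cf_recs = recommend_user_cf("A", train, user_sim)
```

Building the inverted index first avoids comparing pairs of users that share no items, which matters when the user-user matrix is sparse.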

**2.4.2 Item-based collaborative filtering algorithm:** Recommend items similar to the items the user liked before

(1) Calculate the similarity between items

(2) Generate a recommendation list for the user based on the similarity of the item and the user's historical behavior
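A minimal sketch of these two steps, assuming implicit feedback and a co-occurrence-based cosine similarity between items. The names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def item_similarity(train):
    """train: user -> set of items. Similarity is co-occurrence count
    normalised by the popularity of both items."""
    count = defaultdict(int)
    co_count = defaultdict(lambda: defaultdict(int))
    for items in train.values():
        for i in items:
            count[i] += 1
            for j in items:
                if i != j:
                    co_count[i][j] += 1
    return {i: {j: c / math.sqrt(count[i] * count[j]) for j, c in related.items()}
            for i, related in co_count.items()}


def recommend_item_cf(user, train, sim, k=3, n=5):
    """For each item in the user's history, add up the similarity of its k nearest items."""
    scores = defaultdict(float)
    for i in train[user]:
        nearest = sorted(sim.get(i, {}).items(), key=lambda x: x[1], reverse=True)[:k]
        for j, weight in nearest:
            if j not in train[user]:
                scores[j] += weight
    return sorted(scores, key=scores.get, reverse=True)[:n]


train = {"A": {"a", "b"}, "B": {"a", "b", "c"}, "C": {"b", "c"}, "D": {"a", "d"}}
item_sim = item_similarity(train)
item_cf_recs = recommend_item_cf("A", train, item_sim)
```

Because each score is a sum over items in the user's own history, the contributing items can be shown to the user as an explanation of the recommendation.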

2.4.3 Comprehensive comparison of UserCF and ItemCF

UserCF: recommends items liked by users who share the target user's interests, so its results are more social

ItemCF: focuses on maintaining the user's historical interests, so its results are more personalized


2.5 Latent Semantic Model

2.5.1 Basic algorithm

The latent semantic analysis technique is based on statistics of user behavior.

2.5.2 Examples of LFM-based Practical Systems

2.5.3 Comparison of LFM and neighborhood-based methods

LFM is a method based on machine learning and has a relatively good theoretical basis. When generating a recommendation list, it is necessary to calculate the user's interest weight for all items. It is not suitable for a system with a very large number of items and does not support online real-time recommendation.

Neighborhood-based methods (such as UserCF and ItemCF): need to maintain an offline correlation table, and can provide good recommendation explanations

2.6 Graph-based models

2.6.1 Bipartite graph representation of user behavior data

Represent user behavior data as a graph. The data consist of a series of (user, item) pairs, each of which becomes an edge between a user vertex and an item vertex

2.6.2 Graph-based recommendation algorithm

The task of recommending items to a user can be transformed into measuring, on the graph, the correlation between the user vertex and the item vertices that are not directly connected to it. Items with higher correlation get higher weights in the recommendation list.
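One common way to measure this kind of vertex correlation is a random walk with restart from the user vertex. The sketch below assumes that approach; the restart probability `alpha`, the iteration count, the function names, and the toy data are illustrative assumptions.

```python
def build_graph(pairs):
    """Build an undirected bipartite graph from (user, item) pairs.
    Vertices are tagged tuples so user and item names cannot collide."""
    graph = {}
    for user, item in pairs:
        graph.setdefault(("user", user), []).append(("item", item))
        graph.setdefault(("item", item), []).append(("user", user))
    return graph


def personal_rank(graph, root, alpha=0.8, iterations=50):
    """Random walk with restart: at each step continue with probability alpha,
    otherwise jump back to the root vertex."""
    rank = {v: 0.0 for v in graph}
    rank[root] = 1.0
    for _ in range(iterations):
        nxt = {v: 0.0 for v in graph}
        for v, neighbors in graph.items():
            for w in neighbors:
                nxt[w] += alpha * rank[v] / len(neighbors)
        nxt[root] += 1 - alpha
        rank = nxt
    return rank


def recommend_by_graph(pairs, user, n=5):
    graph = build_graph(pairs)
    root = ("user", user)
    rank = personal_rank(graph, root)
    seen = set(graph[root])
    candidates = [v for v in graph if v[0] == "item" and v not in seen]
    candidates.sort(key=lambda v: rank[v], reverse=True)
    return [item for _, item in candidates[:n]]


pairs = [("A", "a"), ("A", "b"), ("B", "a"), ("B", "b"),
         ("B", "c"), ("C", "c"), ("C", "d")]
graph_recs = recommend_by_graph(pairs, "A")
rank_total = sum(personal_rank(build_graph(pairs), ("user", "A")).values())
```

Item `c` is reachable from user `A` through a shorter path than item `d`, so it receives more of the walk's probability mass. Running the walk per user is expensive, which is a known practical drawback of this style of algorithm.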

Chapter 3 Recommendation System Cold Start Problem

3.1 Introduction to cold start problem

Cold start problem type:

User cold start: a new user has no behavior data, so their interests cannot be inferred and personalized recommendations cannot be made

Item cold start: how to recommend newly added items to users

System cold start: how to design a personalized recommendation system for a newly launched website

Solution:

Provide non-personalized recommendations

Log in with the user's social network account (requires user authorization)

Ask users to give feedback on some items when they log in

For newly added items, use content information to recommend them to users who have liked similar items

At system cold start, expert knowledge can be introduced to establish the relevance between items in an efficient way

3.2 Using user registration information

(1) Get the user's registration information

(2) Classify users according to user registration information

(3) Recommend to the user the items liked by users in the category he belongs to
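The three steps above can be sketched as a per-group popularity table. The group key (gender and age band here), the function names, and the toy data are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def popular_items_by_group(profiles, likes):
    """profiles: user -> group key derived from registration information.
    likes: iterable of (user, item) pairs from existing users."""
    group_items = defaultdict(Counter)
    for user, item in likes:
        group_items[profiles[user]][item] += 1
    return group_items


def recommend_for_new_user(group, group_items, n=3):
    """Return the n most liked items within the new user's group."""
    return [item for item, _ in group_items.get(group, Counter()).most_common(n)]


profiles = {"u1": ("F", "18-25"), "u2": ("F", "18-25"), "u3": ("M", "26-35")}
likes = [("u1", "x"), ("u2", "x"), ("u2", "y"), ("u3", "z")]
group_items = popular_items_by_group(profiles, likes)
new_user_recs = recommend_for_new_user(("F", "18-25"), group_items, n=1)
```

Finer groups give more specific recommendations but leave fewer users per group, so the choice of group key is a trade-off.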

3.3 Select the right item to activate the user's interest

On a user's first visit to the recommendation system, the system presents some items, asks the user for feedback on how interesting they are, and then makes personalized recommendations based on that feedback.

Requirements for the items presented:

They should be representative and discriminative

The startup item set needs diversity

3.4 Using content information of items

The problem to be solved in item cold start is how to recommend newly added items to users who are interested in them, which is very important on time-sensitive websites such as news websites.

The UserCF algorithm is not very sensitive to item cold start, but it does need to solve the problem of the first push: where does the first user discover a new item?

The principle of the ItemCF algorithm is to recommend items similar to the items the user liked before. When a new item is added, it has no entry in the in-memory item similarity table, so it cannot be recommended

The content of an item can be represented with a vector space model, which represents the item as a keyword vector: the important words are extracted to form a keyword set, the keywords are ranked by weight, and the weighted keywords form the keyword vector.
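A minimal sketch of comparing items through keyword vectors, assuming keywords have already been extracted and using a smoothed TF-IDF weight with cosine similarity. The weighting formula, the function names, and the toy keywords are assumptions, not the book's exact definitions.

```python
import math
from collections import Counter


def keyword_vectors(docs):
    """docs: item -> list of extracted keywords.
    Returns item -> {keyword: weight}, weight = term count * log(1 + N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(w for words in docs.values() for w in set(words))
    return {item: {w: c * math.log(1 + n_docs / doc_freq[w])
                   for w, c in Counter(words).items()}
            for item, words in docs.items()}


def cosine(a, b):
    """Cosine similarity between two sparse keyword vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = {
    "new_item": ["python", "tutorial"],
    "old_item_1": ["python", "guide"],
    "old_item_2": ["cooking", "recipe"],
}
vectors = keyword_vectors(docs)
sim_related = cosine(vectors["new_item"], vectors["old_item_1"])
sim_unrelated = cosine(vectors["new_item"], vectors["old_item_2"])
```

Because the similarity depends only on content, a new item can be linked to existing items before it has received any user behavior.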

3.5 Playing the role of experts

When a recommendation system is first established, many systems have experts annotate items so that users get a better experience; combining expert annotation with machine learning solves the system cold start problem.

Chapter 4 Leveraging User Tag Data

Connect users and items through features, and recommend to users the items that have the features they like.

4.1 Representative Applications of UGC Tagging Systems

4.1.1 Delicious

Allows users to tag every web page on the Internet, thereby reorganizing the entire Internet through tags.

4.1.2 CiteULike

A well-known paper tagging website that allows researchers to submit or bookmark papers they are interested in and tag them.

4.1.3 Last.fm

A music website that analyzes the user's listening behavior to predict the user's interest in music, so as to recommend personalized music to the user.

4.1.4 Douban

Allows users to tag books and movies to obtain content information and semantics of books or movies, and uses this information to improve recommendations.

4.1.5 Hulu

A well-known video website in the United States, which introduces user tags to mark TV series and movies.

4.2 Recommendation Problems in Tagging Systems

4.2.1 Why users tag

(1) Social dimension: some tagging helps content uploaders organize their own information, and some helps other users find information;

(2) Functional dimension: some tags organize content better and make it easier for the user to search later; others convey specific information, such as the place and time a photo was taken.

4.2.2 How users tag

The popularity distribution of tags also presents a very classic long-tail distribution.

4.2.3 What kind of tags do users place

Tags that indicate what the item is, the type of the item, who owns it, the user's opinion, user-related tags, and the user's tasks

4.3 Tag-based recommendation system

4.3.1 Experimental setup

Randomly split the dataset into 10 parts, using users and items as the split key.

4.3.2 A simplest algorithm

Count the tags each user uses most often and the items each tag is applied to most often. For a given user, find their most frequently used tags, find the most popular items carrying those tags, and recommend those items to the user
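A minimal sketch of this simple algorithm, assuming the data are (user, item, tag) records and that an item's score is the sum, over the user's tags, of how often the user used the tag times how often the tag was applied to the item. The names and the toy records are hypothetical.

```python
from collections import defaultdict


def build_counts(records):
    """records: iterable of (user, item, tag) triples."""
    user_tags = defaultdict(lambda: defaultdict(int))   # user -> tag -> times used
    tag_items = defaultdict(lambda: defaultdict(int))   # tag -> item -> times applied
    user_items = defaultdict(set)                       # user -> items already tagged
    for user, item, tag in records:
        user_tags[user][tag] += 1
        tag_items[tag][item] += 1
        user_items[user].add(item)
    return user_tags, tag_items, user_items


def recommend_by_tags(user, user_tags, tag_items, user_items, n=5):
    scores = defaultdict(float)
    for tag, n_user_tag in user_tags[user].items():
        for item, n_tag_item in tag_items[tag].items():
            if item not in user_items[user]:
                scores[item] += n_user_tag * n_tag_item
    return sorted(scores, key=scores.get, reverse=True)[:n]


records = [("u1", "i1", "rock"), ("u1", "i2", "rock"), ("u1", "i3", "jazz"),
           ("u2", "i4", "rock"), ("u3", "i4", "rock"), ("u2", "i5", "jazz")]
user_tags, tag_items, user_items = build_counts(records)
tag_recs = recommend_by_tags("u1", user_tags, tag_items, user_items)
```

Here `i4` scores 2 × 2 = 4 through the tag `rock` and `i5` scores 1 × 1 = 1 through `jazz`, so popular tags and popular items dominate the list.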

4.3.3 Algorithm improvement

1. TF-IDF: the simple algorithm tends to recommend popular items, which reduces the novelty of the results; the algorithm is improved by borrowing the idea of TF-IDF

2. Data sparsity:

3. Tag cleaning
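The TF-IDF idea in item 1 can be sketched by dividing each tag's contribution by a penalty that grows with the number of distinct users who have used the tag. The exact penalty term `log(1 + number of users)`, the function name, and the toy records are assumptions for illustration.

```python
import math
from collections import defaultdict


def recommend_by_tags_tfidf(user, records, n=5):
    """records: iterable of (user, item, tag) triples.
    Tags used by many different users contribute less to an item's score."""
    user_tags = defaultdict(lambda: defaultdict(int))
    tag_items = defaultdict(lambda: defaultdict(int))
    tag_users = defaultdict(set)
    seen = set()
    for u, item, tag in records:
        user_tags[u][tag] += 1
        tag_items[tag][item] += 1
        tag_users[tag].add(u)
        if u == user:
            seen.add(item)
    scores = defaultdict(float)
    for tag, n_user_tag in user_tags[user].items():
        penalty = math.log(1 + len(tag_users[tag]))
        for item, n_tag_item in tag_items[tag].items():
            if item not in seen:
                scores[item] += n_user_tag / penalty * n_tag_item
    return sorted(scores, key=scores.get, reverse=True)[:n]


# "music" is a very common tag, "bebop" is a rare one.
records = [("u1", "i1", "music"), ("u1", "i3", "bebop"),
           ("u2", "i4", "music"), ("u3", "i4", "music"), ("u4", "i4", "music"),
           ("u5", "i5", "bebop"), ("u6", "i5", "bebop"),
           ("u7", "i6", "music"), ("u8", "i7", "music"),
           ("u9", "i8", "music"), ("u10", "i9", "music")]
tfidf_recs = recommend_by_tags_tfidf("u1", records, n=2)
```

Without the penalty, `i4` (score 3) would rank ahead of `i5` (score 2); with it, the rarer tag `bebop` counts for more and `i5` moves to the top.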

4.3.4 Graph-based recommendation algorithm

Represent users' tagging behavior as a graph with three kinds of vertices: users, items, and tags

4.3.5 Tag-based recommendation explanation

The biggest advantage of tag-based recommendation is that tags can be used to explain the recommendations: the user should find the tag cloud reasonable, and should also find it reasonable that a given item is recommended from a given tag.

4.4 Recommend tags to users

4.4.1 Why recommend tags to users

(1) It is convenient for users to input tags

(2) Improve tag quality

4.4.2 How to recommend tags to users

(1) Recommend the most popular tags in the entire system

(2) Recommend the most popular tag on item i to the user

(3) Recommend to the user his own frequently used tags

(4) Combination of the previous two, (2) and (3): their recommendation results are linearly weighted by a coefficient
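A sketch of the linearly weighted combination, assuming it blends the tags most popular on the item with the tags the user uses most often, each normalised by its maximum count. The weight `alpha`, the normalisation, the names, and the toy records are assumptions.

```python
from collections import Counter


def normalize(counter):
    """Scale counts into [0, 1] by dividing by the largest count."""
    top = max(counter.values(), default=0)
    return {tag: count / top for tag, count in counter.items()} if top else {}


def recommend_tags(user, item, records, alpha=0.5, n=3):
    """records: iterable of (user, item, tag) triples.
    alpha weights the item's popular tags; (1 - alpha) weights the user's own tags."""
    user_tags = normalize(Counter(t for u, _, t in records if u == user))
    item_tags = normalize(Counter(t for _, i, t in records if i == item))
    scores = {tag: (1 - alpha) * user_tags.get(tag, 0.0) + alpha * item_tags.get(tag, 0.0)
              for tag in set(user_tags) | set(item_tags)}
    return sorted(scores, key=scores.get, reverse=True)[:n]


records = [("u1", "i1", "rock"), ("u1", "i2", "rock"), ("u1", "i3", "live"),
           ("u2", "i9", "live"), ("u3", "i9", "live"), ("u2", "i9", "1990s")]
suggested_tags = recommend_tags("u1", "i9", records)
```

With `alpha=0.5` the tag `live` scores 0.75 because both sources support it, ahead of `rock` (0.5, user only) and `1990s` (0.25, item only).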

4.4.3 Experimental setup

4.4.4 Graph-based tag recommendation algorithm

4.5 Further reading

Chapter 5 Using Contextual Information

5.1 Temporal context information

5.1.1 Introduction to time effects

(1) User interests are changing, and will change with changes in age, environment, etc.

(2) Items also have a life cycle, such as the life cycle of news is short, and the life cycle of movies is long

(3) Seasonal effect: reflects how the time of year affects user interests

5.1.2 Examples of time effects

5.1.3 Analysis of system time characteristics

After the time information is given, the recommendation system changes from a static system to a time-varying system, and user data also becomes a time series

(1) The growth of the number of independent users per day in the data set

(2) Item changes in the system

(3) User access

5.1.4 Real-time performance of recommendation system

It is necessary to respond to new behaviors of users in real time, so that the recommendation list is constantly changing to meet the interests of users

5.1.5 Time Diversity of Recommendation Algorithms

The day-to-day change in a recommendation system's results is defined as its temporal diversity. First, the system must be able to adjust its recommendations promptly after the user has new behavior. Second, the results must still change regularly even when the user has no new behavior, so that they have temporal diversity.

5.1.6 Time context recommendation algorithm

1. The most popular recently

2. ItemCF algorithm related to time context: item similarity, online recommendation

3. UserCF algorithm related to time context: user interest similarity, recent behavior of users with similar interests
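The first algorithm in this list, recent popularity, can be sketched by letting each behavior count for less the older it is. The decay form `1 / (1 + alpha * age)`, the parameter `alpha`, the names, and the toy records are assumptions for illustration.

```python
from collections import defaultdict


def recent_popularity(records, now, alpha=1.0):
    """records: iterable of (user, item, timestamp) triples.
    Each behavior before `now` contributes 1 / (1 + alpha * age) to its item."""
    scores = defaultdict(float)
    for _, item, timestamp in records:
        if timestamp < now:
            scores[item] += 1.0 / (1.0 + alpha * (now - timestamp))
    return scores


records = [("u1", "old", 1), ("u2", "old", 1), ("u3", "old", 2),
           ("u1", "new", 9), ("u2", "new", 9)]
pop_scores = recent_popularity(records, now=10)
most_popular_recently = max(pop_scores, key=pop_scores.get)
```

The item `old` has more interactions in total, but `new` wins because its interactions are recent. The same decay weight can be applied inside ItemCF or UserCF so that recent behavior counts for more in the similarity and prediction steps.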

5.1.7 Time-segment graph model

5.1.8 Offline experiment

5.2 Location context information

As an important spatial feature, location is also an important contextual information. Users in different regions have different interests, and users have different interests when they go to different places. When you enter a certain place and want to find delicious food, the first factor to consider may be the distance.

Location-based recommendation algorithm:

The system first divides items into two categories: those with spatial attributes, such as restaurants and shops, and those without, such as books or movies. Users are likewise divided into those with and those without spatial attributes.

By studying its datasets, LARS (Location Aware Recommender System) found two characteristics relating user interests to location:

(1) Interest localization: the interests of users in different places differ greatly

(2) Activity localization: a user tends to be active in nearby areas

5.3 Further reading

Chapter 6 Leveraging Social Network Data

6.1 Ways to obtain social network data

6.1.1 Email

The domain suffix of an e-mail address can reveal information such as the user's company.

6.1.2 User registration information

Some users fill in their company, school, and other information during registration.

6.1.3 User's location data

Location information is data that reflects the social relationship of users. For example, users in the same dormitory building or the same company may have friendly relationships.

6.1.4 Forums and Discussion Groups

Being in the same forum or discussion group shows that two people are familiar with each other or have similar interests.

6.1.5 Instant Messaging Tools

The contact list and group information of a chat tool reveal the user's social network relationships, and how often two users chat indicates how familiar they are with each other.

6.1.6 Social networking sites

Users can create a public page on a social networking site to introduce themselves. Social networking sites naturally alleviate information overload: users can filter information through their friends, find friends with similar interests more easily, and find content they are interested in faster.

1) Social graph and interest graph

6.2 Introduction to social network data

A social network defines the connection between users, so a graph can be used to define a social network, and a social network is divided into:

Social network data with two-way confirmation

Social network data with one-way following

Community-based social network data

The distribution of in-degree and out-degree of users in social networks also satisfies the long-tail distribution.

6.3 Recommendation based on social network

Advantages of social recommendation:

Friend recommendation can increase the trust of recommendation

Social networks can solve the cold start problem

6.3.1 Neighborhood-based social recommendation algorithm

Recommend to the user a collection of items that friends like.
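A minimal sketch of this idea, assuming each friendship carries a weight (for example familiarity or interest similarity) and that an item's score is the sum of the weights of the friends who like it. The weights, the function name, and the toy data are hypothetical.

```python
from collections import defaultdict


def recommend_from_friends(user, friends, likes, n=5):
    """friends: user -> {friend: weight}. likes: user -> set of liked items.
    Items the user already likes are excluded."""
    own = likes.get(user, set())
    scores = defaultdict(float)
    for friend, weight in friends.get(user, {}).items():
        for item in likes.get(friend, set()):
            if item not in own:
                scores[item] += weight
    return sorted(scores, key=scores.get, reverse=True)[:n]


friends = {"A": {"B": 1.0, "C": 0.5}}
likes = {"A": {"x"}, "B": {"x", "y"}, "C": {"y", "z"}}
social_recs = recommend_from_friends("A", friends, likes)
```

Structurally this is the same as user-based collaborative filtering with the neighbor set replaced by the user's friends, which is why it can work for a new user who has friends but no behavior history.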

6.3.2 Graph-based social recommendation algorithm

A user's social network can be represented as a social network graph.

6.3.3 Social recommendation algorithm in actual system

(1) Truncation (in two places): for example, fetch only the friends with the highest similarity to the user instead of all of the user's friends

(2) Redesign the database: for example, on Weibo, when a user posts, the message queues of everyone who follows that user are updated

6.3.4 Social Recommendation System and Collaborative Filtering Recommendation System

Researchers evaluate social recommender systems using user surveys and online experiments

6.3.5 Information flow recommendation

This mainly concerns two social networking sites, Twitter and Facebook. The problem to solve in personalized information-flow recommendation is how to help users pick out useful information from the information wall

6.4 Recommend friends to users

The purpose is to recommend new friends to users based on existing friends and user behavior records, thereby increasing the density of the entire social network and the activity of users of social networking sites.

6.4.1 Content-based matching

Recommend users with similar content attributes as friends

6.4.2 Friend recommendation based on common interests

Recommend other users with common interests as friends

6.4.3 Friend Recommendation Based on Social Network Graph

Recommend new friends to the user based on the existing social network, such as recommending friends of friends to the user.
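The friends-of-friends idea can be sketched by ranking candidates by the number of friends they have in common with the user. The function name and the toy graph are assumptions for illustration.

```python
from collections import Counter


def recommend_friends(user, graph, n=5):
    """graph: user -> set of friends (undirected).
    Candidates are friends of friends, ranked by number of common friends."""
    current = graph.get(user, set())
    common = Counter()
    for friend in current:
        for candidate in graph.get(friend, set()):
            if candidate != user and candidate not in current:
                common[candidate] += 1
    return [candidate for candidate, _ in common.most_common(n)]


graph = {
    "A": {"B", "C"},
    "B": {"A", "D", "E"},
    "C": {"A", "D"},
    "D": {"B", "C"},
    "E": {"B"},
}
friend_recs = recommend_friends("A", graph)
```

User `D` shares two friends with `A` and `E` shares one, so `D` is suggested first. A raw common-friend count favors users with very many friends, so in practice it is often normalised by the sizes of both friend sets.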

6.4.4 Comparison of Friend Recommendation Algorithms Based on User Survey

6.5 Further reading

Six degrees of separation: any two people in society can be connected through a chain of no more than six people.

Chapter 7 Recommendation System Examples

7.1 Peripheral Architecture

The recommendation system relies on the user interface and on user behavior data, so the first question is how to collect and store user data:

Data collection and storage: Depending on the scale of the data and whether real-time access is required, different behavioral data will be stored in different media.

7.2 Recommender System Architecture

Feature-Based Recommender System Architecture

(1) Demographic characteristics

(2) Behavioral characteristics of users

(3) User's topic features

7.3 Architecture of recommendation engine

7.3.1 Generating User Feature Vectors

(1) Information that can be extracted from the user's registration information, mainly including the user's demographic characteristics

(2) Calculated from user behavior

The following factors need to be considered in the calculation of user behavior: the type of user behavior, the time when user behavior occurs, the number of user behavior, and the popularity of items.

7.3.2 Feature-item related recommendation

7.3.3 Filter module

The filter module removes:

(1) Items on which the user has already generated behavior

(2) Items outside the candidate set

(3) Certain items of poor quality

7.3.4 Ranking Module

(1) Novelty ranking

(2) Diversity

(3) Temporal diversity

(4) User feedback

7.4 Further reading

Chapter 8 Rating Prediction Problems

8.1 Offline experimental method

8.2 Score Prediction Algorithm

8.2.1 Average

8.2.2 Neighborhood-based methods

8.2.3 Latent Semantic Model and Matrix Factorization Model

8.2.4 Add time information

8.2.5 Model Fusion

8.2.6 Related experimental results of Netflix Prize

10 lessons learned

1. Make sure you really need a recommendation system. Don’t make a recommendation system just for the sake of making a recommendation system. You should start from the user’s point of view;

2. Determine the relationship between business goals and user satisfaction. A recommendation system that is good for users is not necessarily commercially useful, although user satisfaction is generally in the long-term interest of the enterprise.

3. Choose the right developer.

4. Forget about the cold start problem. Innovate constantly.

5. Balance the relationship between data and algorithms. Using correct user data is crucial for recommender systems.

6. Finding relevant items is easy, but when and how to present them to users is difficult. Don't recommend for the sake of recommending.

7. Don't waste time computing users with similar interests when you can use social network data directly.

8. Keep improving the scalability of the algorithm.

9. Choose the appropriate user feedback method.

10. Design a reasonable evaluation system, and always pay attention to the performance of all aspects of the recommendation system.


Origin blog.csdn.net/weixin_44934104/article/details/130094249