Recommendation system study notes 1

What is a recommender system

When you face information overload, you need a person or a tool to filter information for you and offer suggestions to choose from. A personalized recommender system is such a tool.

 

The task of a recommender system is to connect users with information. On the one hand, it helps users find valuable information; on the other hand, it puts information in front of the users who are interested in it, achieving a win-win situation for information consumers and information producers.

 

Representative solutions to information overload are catalogs and search engines. Like search engines, recommender systems are tools that help users discover useful information quickly. Unlike search engines, recommender systems do not require users to state their needs explicitly. Instead, they model users' interests by analyzing users' historical behavior and proactively recommend information that meets those interests and needs.

Therefore, in a sense, recommender systems and search engines are two complementary tools. Search engines serve users who are actively searching with a clear purpose, while recommender systems help users discover interesting new content when they have no clear purpose.

 

From the item perspective, recommender systems can better exploit the long tail of items. Chris Anderson, editor-in-chief of Wired magazine, published the article "The Long Tail" in 2004 and the book "The Long Tail" in 2006. The book argues that the traditional 80/20 principle (80% of sales come from 20% of popular brands) is challenged by the Internet. Because shelf space costs almost nothing online, e-commerce sites can usually offer far more products than traditional retail stores. Although most of these items are not popular, there are so many of them compared with traditional retail that the total sales of these long-tail items add up to a significant amount, possibly exceeding the sales of popular (mainstream) items. Mainstream products usually represent the needs of the majority of users, while long-tail products usually represent the personalized needs of small groups of users.

Therefore, to increase sales by exploiting the long tail, you must study users' interests thoroughly, and this is the main problem that personalized recommender systems solve. A recommender system discovers users' personalized needs by mining their behavior, accurately recommends long-tail products to the users who need them, and helps users find products that interest them but are hard to find.

How recommender systems work

Asking friends: in recommender systems this approach is called social recommendation, that is, letting friends recommend items to you.
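As a minimal sketch of social recommendation, assuming a hypothetical data model where `friends` maps each user to their friends and `likes` maps each user to a set of liked items, one simple scoring rule is to count how many friends liked each item the user has not liked yet:

```python
from collections import Counter


def social_recommend(user, friends, likes, n=3):
    """Rank items by how many of the user's friends liked them.

    friends: dict user -> iterable of friend ids (hypothetical format)
    likes:   dict user -> set of liked item ids (hypothetical format)
    """
    already_liked = likes.get(user, set())
    counts = Counter()
    for friend in friends.get(user, ()):
        for item in likes.get(friend, set()):
            if item not in already_liked:
                counts[item] += 1
    # Highest friend count first; ties broken by item id so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


friends = {"alice": ["bob", "carol"]}
likes = {"alice": {"m1"}, "bob": {"m1", "m2", "m3"}, "carol": {"m2", "m4"}}
print(social_recommend("alice", friends, likes))  # ['m2', 'm3', 'm4']
```

Item `m2` ranks first because two of Alice's friends liked it; `m1` is skipped because Alice already liked it.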

A recommender system can also automate the way users look for movies by their favorite actors and directors: it analyzes the movies a user has watched to find the actors and directors the user likes, and then recommends other movies by those actors or directors. In recommender systems this approach is called content-based filtering.
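A minimal content-based sketch, assuming each movie is described only by a hypothetical set of actor/director names: build a profile counting how often each person appears in the user's watched movies, then score unseen movies by the profile weights of their people.

```python
from collections import Counter


def content_based_recommend(watched, movie_people, n=3):
    """watched: set of movie ids; movie_people: movie id -> set of actors/directors (hypothetical)."""
    # User profile: how often each actor/director appears in the watched movies.
    profile = Counter()
    for movie in watched:
        profile.update(movie_people[movie])
    scores = {}
    for movie, people in movie_people.items():
        if movie in watched:
            continue
        # Counter returns 0 for people the user has never watched.
        score = sum(profile[p] for p in people)
        if score > 0:
            scores[movie] = score
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [movie for movie, _ in ranked[:n]]


movie_people = {
    "m1": {"actor_x", "director_y"},
    "m2": {"actor_x"},
    "m3": {"actor_x", "director_y"},
    "m4": {"actor_z"},
}
print(content_based_recommend({"m1"}, movie_people))  # ['m3', 'm2']
```

Real systems usually weight features (for example with TF-IDF) rather than using raw counts; plain counts keep the idea visible.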

If you can find a group of users with similar historical interests and see which movies they have watched recently, the results may match your interests better than a broad top chart. This approach is called collaborative filtering.
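A minimal sketch of user-based collaborative filtering on implicit feedback. It assumes each user's history is a set of watched items and uses Jaccard similarity between histories; Jaccard is one common choice picked here for simplicity, and cosine similarity is another.

```python
def user_cf_recommend(user, histories, k=2, n=3):
    """histories: user -> set of watched items (hypothetical format); k: number of neighbours."""
    mine = histories[user]
    neighbours = []
    for other, items in histories.items():
        if other == user:
            continue
        union = mine | items
        sim = len(mine & items) / len(union) if union else 0.0  # Jaccard similarity
        if sim > 0:
            neighbours.append((sim, other))
    # Most similar users first; ties broken by user id.
    neighbours.sort(key=lambda t: (-t[0], t[1]))
    scores = {}
    for sim, other in neighbours[:k]:
        # Credit items the neighbour watched that the target user has not seen yet.
        for item in histories[other] - mine:
            scores[item] = scores.get(item, 0.0) + sim
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]


histories = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b", "d"},
    "u3": {"a", "c", "e"},
    "u4": {"x", "y"},
}
print(user_cf_recommend("u1", histories))  # ['d', 'e']
```

Users `u2` and `u3` each share half of their combined history with `u1`, so their unseen items `d` and `e` are recommended; `u4` shares nothing and contributes nothing.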

 

The essence of recommendation algorithms is to connect users and items in some way, and different recommender systems use different ways. A modern recommender system is a tool that connects users and items automatically. It helps users discover information that interests them amid information overload, and it pushes information to the users who are interested in it.

 

On websites, the main function of a personalized recommender system is to raise click-through and conversion rates by analyzing large volumes of user behavior logs and showing each user a personalized page. Recommender systems are widely used in e-commerce, movies and video, music, social networking, reading, location-based services, personalized email, advertising, and more. Almost every recommender system application consists of three parts: the front-end display page, the back-end log system, and the recommendation algorithm system.

Personalized recommendation technology in social networks is mainly used in three aspects:

1. Using the user's social network information to make personalized item recommendations;

2. Session recommendation for information streams;

3. Recommending friends to users.
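For the third application, friend recommendation, a minimal sketch is the common-neighbours heuristic: suggest people who are not yet friends, ranked by how many mutual friends they share with the user. The adjacency-set `graph` format is an assumption for illustration.

```python
def recommend_friends(user, graph, n=3):
    """graph: user -> set of friends (undirected, hypothetical format)."""
    friends = graph.get(user, set())
    mutual = {}
    for friend in friends:
        for candidate in graph.get(friend, set()):
            # Skip the user and people who are already friends.
            if candidate != user and candidate not in friends:
                mutual[candidate] = mutual.get(candidate, 0) + 1
    ranked = sorted(mutual.items(), key=lambda kv: (-kv[1], kv[0]))
    return [candidate for candidate, _ in ranked[:n]]


graph = {
    "a": {"b", "c"},
    "b": {"a", "d", "e"},
    "c": {"a", "d"},
    "d": {"b", "c"},
    "e": {"b"},
}
print(recommend_friends("a", graph))  # ['d', 'e']
```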

 

Reading articles is something many Internet users do every day. Personalized reading also meets the two conditions that call for personalized recommendation: first, there are a huge number of articles on the Internet, so users face information overload; second, users often have no specific article in mind and simply want to learn what is happening in certain areas by reading articles about them.

At present there are many personalized reading tools on the Internet, such as the internationally well-known Google Reader and Xianguo.com in China. With the popularity of mobile devices, many personalized reading applications have also appeared on mobile platforms, of which Zite and Flipboard are representative.

Google Reader is a popular social reading tool: it lets users follow people they are interested in and then see the articles those people share. Unlike Google Reader, Zite, a personalized reading tool, collects users' preferences for articles. Beside each article, Zite lets users give like or dislike feedback, and it continuously updates each user's personalized article list by analyzing that feedback. Zite was a great success after its launch and was later acquired by CNN.

 

The difference between personalized advertising and personalized recommendation in the narrow sense is that personalized recommendation focuses on helping users find items that may interest them, while advertising recommendation focuses on helping ads find users who may be interested in them. In other words, the former is user-centric and the latter is ad-centric. Current personalized advertising technology falls mainly into three types.

Contextual advertising: by analyzing the content of the web page a user is browsing, place ads related to that content. A representative system is Google's AdSense.

Search ads: by analyzing the user's searches in the current session, infer the user's intent and place ads related to that intent.

Personalized display ads: many websites show large numbers of display ads (the big banner images), and these ads are chosen according to users' interests, so different users see different ads. Yahoo is representative of research in this area.

 

Recommendation system evaluation

What makes a good recommender system? This is the primary question that recommender system evaluation must answer. A complete recommender system generally has three participants: users, item providers, and the website that hosts the recommender system.

A good recommender system not only predicts user behavior accurately but also broadens users' horizons, helping them discover things they may be interested in but would not easily find. At the same time, it should help merchants bring the good products buried in the long tail to users who may be interested in them.

Viewed from different perspectives, evaluation metrics include accuracy, coverage, novelty, serendipity (surprise), trust, transparency, and others.

 

In recommender systems, there are three main experimental methods for evaluating recommendation quality: offline experiments, user studies, and online experiments.

1. Offline experiment

The offline experiment method generally consists of the following steps:

(1) Collect user behavior data through the log system and turn it into a standard dataset in a fixed format;

(2) Split the dataset into a training set and a test set according to some rule;

(3) Train a user interest model on the training set and make predictions on the test set;

(4) Evaluate the algorithm's predictions on the test set using predefined offline metrics.
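The four steps above can be sketched end to end as follows. This is a toy illustration under stated assumptions: step (1) is taken as already done (records are `(user, item)` tuples), the "model" is a deliberately simple popularity ranker standing in for a real interest model, and the metric is a hit rate (share of test records whose item appears in that user's top-n list). All function names are hypothetical.

```python
import random
from collections import Counter


def split_data(records, test_ratio=0.25, seed=42):
    """Step 2: randomly assign each (user, item) record to the train or test set."""
    rng = random.Random(seed)
    train, test = [], []
    for rec in records:
        (test if rng.random() < test_ratio else train).append(rec)
    return train, test


def train_popularity(train):
    """Step 3: a stand-in model that simply ranks items by popularity."""
    counts = Counter(item for _, item in train)
    return [item for item, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def recommend(model, train, user, n):
    """Top-n popular items the user has not already interacted with in training."""
    seen = {item for u, item in train if u == user}
    return [item for item in model if item not in seen][:n]


def hit_rate(model, train, test, n=2):
    """Step 4: share of test records whose item appears in that user's top-n list."""
    if not test:
        return 0.0
    hits = sum(1 for user, item in test if item in recommend(model, train, user, n))
    return hits / len(test)


train = [("u1", "a"), ("u2", "a"), ("u2", "b"), ("u3", "c")]
test = [("u1", "b"), ("u3", "a")]
model = train_popularity(train)
print(hit_rate(model, train, test))  # 1.0
```

Fixing the random seed makes the split reproducible, which matters when many algorithms are compared on the same data.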

The advantage of this method is that it needs no real users and can be computed directly and quickly, so many different algorithms can be tested easily. Its main disadvantage is that it cannot measure many business metrics, such as click-through rate and conversion rate, and it is hard to find offline metrics that correlate closely with business metrics.

 

2. User study

A user study requires recruiting some real users and having them complete tasks on the recommender system under test. While they work, we observe and record their behavior and ask them to answer some questions. Finally, we assess the system's performance by analyzing their behavior and answers. User studies also have drawbacks. First, they are expensive: users must spend a lot of time completing tasks and answering questions, and in some cases test users must be paid. As a result, large-scale user studies are difficult in most cases, and with only a few participants many conclusions are not statistically significant. So when conducting a user study, we must control costs on the one hand and ensure the statistical significance of the results on the other.

In addition, test users are recruited rather than chosen at random, so they may not represent the real user population. The advantage of user studies is that they capture many metrics reflecting users' subjective experience, and compared with online experiments the risk is low and mistakes are easy to correct. The disadvantage is that recruiting test users is expensive and large groups of test users are hard to organize, so the results may lack statistical significance.

 

3. Online experiment

After offline experiments and any necessary user studies, the recommender system can go online for A/B testing against the old algorithm.

A/B testing is a very common method for evaluating algorithms online. It randomly splits users into several groups according to some rule, applies a different algorithm to each group, and compares the algorithms by computing evaluation metrics for each group, such as click-through rate. Readers interested in A/B testing can browse http://www.abtests.com/ , which gives many examples of improving user satisfaction through real A/B tests and shows how to design reasonable ones. The advantage of A/B testing is that it fairly measures the performance of different algorithms when they are actually online, including metrics of business interest. Its main disadvantage is the long cycle: experiments must run for a long time to obtain reliable results.
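The text says users are split "randomly according to some rule". One common rule, assumed here rather than taken from the source, is to hash the user id together with an experiment name: the assignment looks random across users, yet each user stays in the same group on every visit. The sketch below also computes click-through rate per group; `assign_group`, `ctr_by_group`, and the impression format are hypothetical.

```python
import hashlib


def assign_group(user_id, experiment, groups=("control", "treatment")):
    """Deterministically map a user to a group by hashing experiment name + user id."""
    digest = hashlib.md5(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return groups[int(digest, 16) % len(groups)]


def ctr_by_group(impressions):
    """impressions: iterable of (group, clicked) pairs. Returns click-through rate per group."""
    shown, clicks = {}, {}
    for group, clicked in impressions:
        shown[group] = shown.get(group, 0) + 1
        clicks[group] = clicks.get(group, 0) + int(clicked)
    return {group: clicks[group] / shown[group] for group in shown}


print(assign_group("user-42", "new_ranker"))  # same answer on every call
print(ctr_by_group([("A", True), ("A", False), ("B", True)]))  # {'A': 0.5, 'B': 1.0}
```

Including the experiment name in the hash keeps group assignments independent across different experiments.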

 

Evaluation indicators

1. User satisfaction

Users are an important participant in a recommender system, so user satisfaction is the most important metric for evaluating it. However, user satisfaction cannot be computed offline; it can only be obtained through user studies or online experiments.

User studies measure satisfaction mainly through questionnaires.

2. Prediction accuracy

Prediction accuracy measures how well a recommender system or recommendation algorithm predicts user behavior. It is the most important offline metric for recommender systems and can be computed in offline experiments. Computing it requires an offline dataset containing users' historical behavior records. The dataset is then split into a training set and a test set by time. Finally, a model of user behavior and interests is built on the training set to predict user behavior on the test set, and the degree of agreement between predicted and actual behavior on the test set is taken as the prediction accuracy.
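Splitting by time means the model only ever sees the past and is judged on the future. A minimal sketch, assuming records are hypothetical `(user, item, timestamp)` tuples and that the most recent share of records becomes the test set:

```python
def split_by_time(records, test_fraction=0.2):
    """Sort (user, item, timestamp) records by time; the most recent share becomes the test set."""
    ordered = sorted(records, key=lambda r: r[2])
    n_test = round(len(ordered) * test_fraction)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


records = [("u1", "a", 5), ("u1", "b", 1), ("u2", "a", 3), ("u2", "c", 2), ("u3", "b", 4)]
train, test = split_by_time(records, test_fraction=0.4)
print([r[2] for r in train], [r[2] for r in test])  # [1, 2, 3] [4, 5]
```

Unlike a random split, every training record here is no later than every test record, so the evaluation cannot benefit from information "leaked" from the future.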

Many sites that offer recommendation services let users rate items. If we know users' historical ratings of items, we can learn each user's interest model from those ratings and predict how the user would rate an item he or she has not rated yet. Predicting a user's rating for an item is called rating prediction. The accuracy of rating prediction is generally measured by root mean square error (RMSE) and mean absolute error (MAE).
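For a test set T of (user u, item i) pairs with actual rating r_ui and predicted rating r̂_ui, the standard definitions are RMSE = sqrt( Σ_{(u,i)∈T} (r_ui − r̂_ui)² / |T| ) and MAE = Σ_{(u,i)∈T} |r_ui − r̂_ui| / |T|. A direct sketch, assuming the test set is given as a list of `(actual, predicted)` pairs:

```python
import math


def rmse(pairs):
    """pairs: list of (actual_rating, predicted_rating) over the test set."""
    return math.sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute difference between actual and predicted ratings."""
    return sum(abs(r - p) for r, p in pairs) / len(pairs)


pairs = [(4, 3), (2, 2), (5, 3)]
print(rmse(pairs))  # sqrt(5/3), roughly 1.29
print(mae(pairs))   # 1.0
```

Because errors are squared before averaging, RMSE penalizes the single 2-point miss more heavily than MAE does.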

 

When websites provide recommendation services, they usually give each user a personalized recommendation list; this is called TopN recommendation. The prediction accuracy of TopN recommendation is generally measured by precision and recall.

TopN recommendation matches the needs of real applications more closely than rating prediction.
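With R(u) the list recommended to user u and T(u) the items u actually interacted with in the test set, precision is Σ|R(u) ∩ T(u)| / Σ|R(u)| and recall is Σ|R(u) ∩ T(u)| / Σ|T(u)|. A sketch that sums over the users present in the test set, assuming both inputs are plain dictionaries:

```python
def precision_recall(recommendations, test_items):
    """recommendations: user -> top-N list R(u); test_items: user -> set T(u) from the test set."""
    hits = n_recommended = n_relevant = 0
    for user, relevant in test_items.items():
        recs = recommendations.get(user, [])
        hits += len(set(recs) & relevant)
        n_recommended += len(recs)
        n_relevant += len(relevant)
    precision = hits / n_recommended if n_recommended else 0.0
    recall = hits / n_relevant if n_relevant else 0.0
    return precision, recall


recs = {"u1": ["a", "b", "c"], "u2": ["d", "e"]}
truth = {"u1": {"a", "x"}, "u2": {"e"}}
print(precision_recall(recs, truth))  # 2 hits: precision 2/5, recall 2/3
```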

 

3. Coverage

Coverage describes a recommender system's ability to explore the long tail of items. Coverage can be defined in different ways; the simplest definition is the proportion of the whole item set that the recommender system is able to recommend. Coverage is a metric that content providers care about.
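The simplest definition above translates directly into code: collect every item that appears in any user's recommendation list and divide by the size of the full catalog. The input formats are assumptions for illustration.

```python
def catalog_coverage(recommendations, all_items):
    """recommendations: user -> list of items; all_items: the full item catalog."""
    catalog = set(all_items)
    if not catalog:
        return 0.0
    recommended = set()
    for items in recommendations.values():
        recommended.update(items)
    return len(recommended & catalog) / len(catalog)


recs = {"u1": ["a", "b"], "u2": ["b", "c"]}
print(catalog_coverage(recs, ["a", "b", "c", "d", "e"]))  # 3 of 5 items: 0.6
```

A system that shows everyone the same bestseller list scores very low on this metric even if its precision is reasonable.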

4. Diversity

Users' interests are broad. On a video website, a user may enjoy cartoons such as "Tom and Jerry" and also Jackie Chan's action movies. To satisfy such broad interests, a recommendation list needs to cover the user's different areas of interest; that is, the recommendation results need to be diverse. As the saying goes, a diverse list avoids putting all your eggs in one basket. Diversity describes the dissimilarity between pairs of items in the recommendation list, so diversity is the counterpart of similarity.
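Since diversity is the counterpart of pairwise similarity, one way to measure it for a single list is 1 minus the average similarity over all item pairs in the list. The similarity function is an assumption here: Jaccard overlap between hypothetical item tag sets.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap of two tag sets; 0.0 when both are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0


def intra_list_diversity(rec_list, item_tags):
    """1 minus the average pairwise similarity of the items in one recommendation list."""
    pairs = list(combinations(rec_list, 2))
    if not pairs:
        return 0.0  # diversity is undefined for lists with fewer than two items
    avg_sim = sum(jaccard(item_tags[i], item_tags[j]) for i, j in pairs) / len(pairs)
    return 1.0 - avg_sim


item_tags = {"t1": {"cartoon", "comedy"}, "t2": {"cartoon"}, "a1": {"action"}}
print(intra_list_diversity(["t1", "t2"], item_tags))        # two cartoons: 0.5
print(intra_list_diversity(["t1", "t2", "a1"], item_tags))  # adding an action movie raises it
```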

5. Novelty

Novel recommendations are items the user has not heard of before. The simplest way to achieve novelty on a website is to filter out of the recommendation list the items the user has already interacted with on that site. The simplest way to measure novelty is the average popularity of the recommended items, since less popular items are more likely to feel new to users. So if the recommended items have low average popularity, the results are likely to be relatively novel.
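A sketch of the average-popularity measure, where an item's popularity is its number of interactions in the training data. Averaging log(1 + popularity) instead of the raw count is an assumption made here to keep a few blockbuster items from dominating the average; lower values suggest more novel results.

```python
import math
from collections import Counter


def item_popularity(train):
    """train: list of (user, item) records. Popularity = number of interactions per item."""
    return Counter(item for _, item in train)


def average_popularity(recommendations, popularity):
    """Mean log(1 + popularity) over all recommended items; lower suggests more novelty."""
    total, count = 0.0, 0
    for items in recommendations.values():
        for item in items:
            total += math.log1p(popularity.get(item, 0))
            count += 1
    return total / count if count else 0.0


pop = item_popularity([("u1", "a"), ("u2", "a"), ("u3", "b")])
print(average_popularity({"u": ["a"]}, pop) > average_popularity({"u": ["c"]}, pop))  # True
```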

6. Surprise

Serendipity (degree of surprise) has been one of the hottest topics in recommender systems research in recent years. If a recommendation is dissimilar to the user's historical interests but still satisfies the user, it can be said to have high serendipity. Novelty, by contrast, depends only on whether the user has heard of the recommended item.

7. Trust

The trust users place in a recommender system can only be measured by asking them in a questionnaire whether they trust its recommendations. There are two main ways to increase trust. The first is to increase the system's transparency, mainly by providing explanations for recommendations: only when users understand how the recommender system works, and agree with it, will their trust in it improve. The second is to use the user's social network information: recommend items based on the user's friends and use those friends to explain the recommendations. Users generally trust their friends, so if a recommended product has been bought by a friend, they tend to trust the recommendation more.

8. Real-time

On many websites, items (news, Weibo posts, etc.) are highly time-sensitive, so they must be recommended while they are still fresh. Real-time performance has two aspects. First, the recommender system must update recommendation lists in real time to reflect users' new behavior. Second, it must be able to recommend items newly added to the system, which mainly tests its ability to handle item cold start.

 

9. Robustness

Robustness (resistance to attacks) is mainly evaluated with simulated attacks. First, given a dataset and an algorithm, use the algorithm to generate recommendation lists for the users in the dataset. Next, inject noise into the dataset using common attack methods, and use the algorithm to generate recommendation lists again on the noise-injected dataset. Finally, evaluate robustness by comparing the similarity of the lists before and after the attack. If the lists after the attack do not change significantly, the algorithm is relatively robust.
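The three-step procedure above can be sketched with toy pieces, all of them assumptions for illustration: a popularity-based top-N recommender as the algorithm under test, a simple "shilling" attack that adds fake users who all interact with one target item, and Jaccard overlap as the list similarity.

```python
from collections import Counter


def top_n_popular(records, n):
    """Algorithm under test (stand-in): the n most frequent items in (user, item) records."""
    counts = Counter(item for _, item in records)
    return [item for item, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]


def inject_shilling_attack(records, target_item, n_fake_users):
    """Hypothetical attack: fake users who all interact with target_item."""
    return records + [(f"fake_{k}", target_item) for k in range(n_fake_users)]


def list_similarity(before, after):
    """Jaccard overlap of the recommendation lists before and after the attack."""
    a, b = set(before), set(after)
    return len(a & b) / len(a | b) if a | b else 1.0


records = (
    [(f"u{k}", "a") for k in range(5)]
    + [(f"u{k}", "b") for k in range(4)]
    + [(f"u{k}", "c") for k in range(3)]
    + [("u0", "d")]
)
before = top_n_popular(records, 3)
after = top_n_popular(inject_shilling_attack(records, "d", 10), 3)
print(before, after, list_similarity(before, after))  # ['a', 'b', 'c'] ['d', 'a', 'b'] 0.5
```

A pure popularity ranker is easy to push around with cheap fake interactions, which is why the similarity drops sharply here.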

Two common ways to make a recommender system more robust are:

1. Prefer behaviors that are costly for users when designing the system. For example, if both purchase and browsing data are available, rely mainly on purchases: a purchase requires payment, so attacking purchase behavior costs far more than attacking browsing behavior.

2. Run attack detection to clean the data before using it.

Generally speaking, evaluation dimensions fall into the following three types.

The user dimension mainly includes the user's demographic information, activity level, and whether the user is new.

The item dimension includes item attributes, popularity, average rating, and whether the item is newly added.

The time dimension includes the season, weekday versus weekend, day versus night, etc.

If the evaluation report includes metrics broken down along these dimensions, it helps us fully understand the system's performance, find the strengths of a seemingly weak algorithm, and uncover the weaknesses of a seemingly strong one.
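A minimal sketch of breaking a metric down by one dimension, here precision split by a user segment such as new versus existing users. The per-user `(hits, n_recommended)` input and the segment labels are hypothetical; the same grouping works for item or time dimensions.

```python
from collections import defaultdict


def precision_by_segment(per_user_hits, user_segment):
    """per_user_hits: user -> (hits, n_recommended); user_segment: user -> label such as 'new'."""
    hits = defaultdict(int)
    recommended = defaultdict(int)
    for user, (h, n) in per_user_hits.items():
        segment = user_segment.get(user, "unknown")
        hits[segment] += h
        recommended[segment] += n
    return {seg: hits[seg] / recommended[seg] for seg in recommended if recommended[seg]}


per_user = {"u1": (2, 10), "u2": (4, 10), "u3": (1, 10)}
segments = {"u1": "new", "u2": "old", "u3": "new"}
print(precision_by_segment(per_user, segments))  # new users 3/20, existing users 4/10
```

An overall precision of 7/30 would hide the fact that this hypothetical algorithm does much worse for new users.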
