18 - Recommendation algorithms

1, The purpose of a recommendation system

(1) help users find the items (news, music, ...) they want, and explore the long tail

(2) reduce information overload

(3) improve the site's CTR / conversion rate

(4) understand users better and provide them with customized services

2, Overview of recommendation algorithms

  Recommendation algorithms are very old: the need for them, and real applications of them, existed before machine learning took off. Broadly, they fall into the following five categories:

  1) Content-based recommendation: this type generally relies on some natural language processing (NLP) knowledge. It mines TF-IDF feature vectors from text to obtain the user's preferences, then makes recommendations. Such algorithms can find a user's unique niche preferences and are also fairly easy to explain. Because this type requires an NLP foundation, it is not covered in depth in this overview and is left for a later discussion devoted to NLP.

  2) Collaborative filtering recommendation: a topic that deserves its own dedicated discussion. Collaborative filtering is currently the most mainstream family of recommendation algorithms. It comes in many variants and is widely used across industry. Its advantage is that it does not require much domain-specific knowledge: statistics-based machine learning algorithms are enough to get good recommendation results. Its biggest advantage is that it is easy to engineer and can readily be applied to products. The vast majority of recommendation algorithms in practical use today are collaborative filtering algorithms.

  3) Hybrid recommendation: this is similar to ensemble learning in machine learning. It draws on the strengths of many methods, combining multiple recommendation algorithms to get a better one, in the spirit of the proverb "three cobblers with their wits combined equal Zhuge Liang". For example, we can build several recommendation models and use voting to decide the final recommendation result. In theory, a hybrid recommender is no worse than any single algorithm it combines, but it increases algorithmic complexity. It is used in practice, but less widely than single algorithms such as collaborative filtering or binary-classification recommenders like logistic regression.

  In real-world applications, a recommendation system is rarely built on a single algorithm. Some large sites, such as Netflix, blend dozens of recommendation algorithms. We can combine the results of different algorithms by weighting them, or use different algorithms at different stages of the computation, so that the mix fits the business better.
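
  The weighted combination mentioned above can be sketched as follows. This is only a minimal illustration, not any particular site's method: the function name `blend_scores`, the two score tables, and the weights are hypothetical, and the sketch assumes each algorithm's scores are already on a comparable scale.

```python
from collections import defaultdict

def blend_scores(score_tables, weights):
    """Combine per-item scores from several recommenders by weighted sum."""
    if len(score_tables) != len(weights):
        raise ValueError("need exactly one weight per score table")
    blended = defaultdict(float)
    for scores, weight in zip(score_tables, weights):
        for item, score in scores.items():
            blended[item] += weight * score
    # Highest blended score first; ties broken by item name for determinism.
    return sorted(blended.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical outputs of a content-based and a collaborative filtering model.
content_scores = {"news_a": 0.9, "news_b": 0.2}
cf_scores = {"news_b": 0.8, "news_c": 0.6}

ranking = blend_scores([content_scores, cf_scores], weights=[0.4, 0.6])
print(ranking)
```

  An item that only one algorithm scored simply receives no contribution from the other; whether that is the right treatment depends on the business.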

  4) Rule-based recommendation: common examples are recommending the items with the most user clicks or the most user views. These are mass-market methods that show everyone the same thing, and they are not mainstream in the current era of big data.
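
  A "most clicked" rule of this kind can be sketched in a few lines. The click log format (a list of user/item pairs) and the name `most_clicked` are assumptions made for the example.

```python
from collections import Counter

def most_clicked(click_log, top_n=3):
    """Return the top_n item ids ordered by how often they were clicked."""
    counts = Counter(item for _user, item in click_log)
    return [item for item, _count in counts.most_common(top_n)]

# Hypothetical click log: (user id, item id)
click_log = [
    ("u1", "a"), ("u2", "a"), ("u3", "b"),
    ("u1", "c"), ("u2", "b"), ("u3", "a"),
]
top_items = most_clicked(click_log, top_n=2)
print(top_items)
```

  Every user gets the same list, which is exactly why this approach is not personalized.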

  5) Recommendation based on demographic information: this is the simplest category of recommendation algorithm. It simply measures how related users are based on their basic profile information and then makes recommendations accordingly. It is now rarely used in large systems.
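
  One possible reading of this idea, as a rough sketch: treat users in the same age band and city as related, and recommend what those users liked. The profile fields (`age`, `city`), the ten-year age bands, and all names below are assumptions made for illustration.

```python
from collections import Counter

def age_band(age):
    """Bucket an age into a ten-year band, e.g. 26 -> '20s'."""
    return f"{(age // 10) * 10}s"

def recommend_by_demographics(new_user, users, likes, top_n=2):
    """Recommend the items most liked by users in the same age band and city."""
    band = age_band(new_user["age"])
    votes = Counter()
    for user_id, profile in users.items():
        if age_band(profile["age"]) == band and profile["city"] == new_user["city"]:
            votes.update(likes.get(user_id, ()))
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _count in ranked[:top_n]]

# Hypothetical profiles and likes
users = {
    "u1": {"age": 24, "city": "Beijing"},
    "u2": {"age": 28, "city": "Beijing"},
    "u3": {"age": 45, "city": "Beijing"},
}
likes = {"u1": ["item_a", "item_b"], "u2": ["item_b"], "u3": ["item_c"]}

recs = recommend_by_demographics({"age": 26, "city": "Beijing"}, users, likes)
print(recs)
```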

3, Content-based recommendation algorithms

  For a given user, recommend other items whose content is similar to items the user liked before. This kind of recommendation needs only two types of information: a description of each item's features, and the user's past preferences.

  For example, suppose the system has one user and one news item. By analyzing the user's behavior and the text of the news item, we extract several keywords. Using these keywords as attributes, we turn the user (past preferences) and the news item (new content) into vectors.

  We then compute the distance between the vectors to obtain the similarity between the user and the news item, and recommend the content with the highest similarity (computed with, for example, cosine similarity).
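
  The vector comparison described above can be sketched like this. It is a minimal example that assumes keyword vectors are stored as dictionaries mapping keyword to count; the user profile and news items are invented for illustration.

```python
import math

def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse keyword vectors (dict: keyword -> weight)."""
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[k] * vec_b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical user profile and candidate news items
user_profile = {"football": 3, "premier league": 2}
news_items = {
    "match_report": {"football": 1, "premier league": 1},
    "stock_update": {"stocks": 2, "economy": 1},
}

# Recommend the item whose vector is closest to the user's profile.
best = max(news_items, key=lambda name: cosine_similarity(user_profile, news_items[name]))
print(best)
```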

  This method is very simple, but it has a gap. Suppose we are recommending news to a football fan who loves watching the Premier League, and a news item contains the keywords sports, football, and Premier League. Matching on the first two words is clearly less precise than matching directly on Premier League, so how should the system reflect this difference in keyword "importance"? Here we can introduce the concept of word weight. By computing statistics over a large corpus (the TF-IDF algorithm is the typical choice), we can obtain a weight for each keyword in the news item, and taking these weights into account when computing similarity gives more accurate results: sim(user, item) = text similarity(user, item) * word weight, so that important words receive a larger weight.
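
  A small sketch of how such word weights might be computed. It uses one common TF-IDF variant (term frequency times log(N / document frequency)); other variants add smoothing. The four-document corpus is invented for the example.

```python
import math

def idf_weights(corpus):
    """Inverse document frequency per keyword: log(N / number of docs containing it)."""
    n_docs = len(corpus)
    doc_freq = {}
    for doc in corpus:
        for word in set(doc):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}

def tfidf_vector(doc, idf):
    """Term frequency times IDF for each keyword in one document."""
    vector = {}
    for word in doc:
        # Each occurrence contributes 1/len(doc) of term frequency.
        vector[word] = vector.get(word, 0.0) + idf.get(word, 0.0) / len(doc)
    return vector

# Hypothetical corpus: each document is a list of extracted keywords.
corpus = [
    ["sports", "football", "premier league"],
    ["sports", "football", "bundesliga"],
    ["sports", "basketball"],
    ["sports", "tennis"],
]
idf = idf_weights(corpus)
weights = tfidf_vector(corpus[0], idf)
print(weights)
```

  In this toy corpus "sports" appears in every document and gets weight zero, while "premier league" appears once and gets the largest weight. Using these weights instead of raw counts in the keyword vectors is one way to let important words dominate the similarity.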

  However, readers who regularly work with sports news data will raise a question: if the user's interest is football and the news keywords are Bundesliga and Premier League, the text matching method above clearly cannot link them together. Here we can turn to topic clustering: tools such as word2vec can be used to cluster the keywords of a text, after which the text is vectorized by topic and the similarity between the text content and the user is computed at the topic level.
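
  The effect of matching on topics rather than raw keywords can be sketched as follows. Note the substitution: the keyword-to-topic table is written by hand here and merely stands in for clusters that a tool such as word2vec plus a clustering step might produce. All names are hypothetical.

```python
from collections import Counter

# Hand-written stand-in for clusters learned from word embeddings.
KEYWORD_TO_TOPIC = {
    "football": "football",
    "premier league": "football",
    "bundesliga": "football",
    "basketball": "basketball",
    "nba": "basketball",
}

def topic_vector(keywords):
    """Count keywords per topic, ignoring keywords with no known topic."""
    return Counter(KEYWORD_TO_TOPIC[k] for k in keywords if k in KEYWORD_TO_TOPIC)

def topic_overlap(user_keywords, item_keywords):
    """Size of the multiset intersection of the two topic vectors."""
    shared = topic_vector(user_keywords) & topic_vector(item_keywords)
    return sum(shared.values())

user_keywords = ["football"]
football_news = ["bundesliga", "premier league"]
basketball_news = ["nba"]

# Raw keyword matching finds nothing in common with the football news...
keyword_overlap = len(set(user_keywords) & set(football_news))
# ...but at the topic level the user and the football news do match.
print(keyword_overlap, topic_overlap(user_keywords, football_news))
```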

  In summary, content-based recommendation handles the cold start problem well, and it is not constrained by item popularity, because it matches directly on content, independent of how often an item has been viewed. However, it also has drawbacks, such as over-specialisation: this approach keeps recommending items closely related to what the user already consumes, and loses diversity in the recommended content.

4, The cold start problem

  Cold start problems fall into three main categories:

  • User cold start: how to make personalized recommendations for new users
  • Item cold start: how to recommend a new item to the users who may be interested in it
  • System cold start: how to design personalized recommendations for a newly developed website (no users, no user behavior, only some item information), so that users experience personalized recommendations as soon as the site launches

  Solutions:

  • Recommend popular lists first, and switch to personalized recommendations once enough user data has been collected;
  • Use the user's registration information;
  • When the user logs in, ask for feedback on a few items, collect the user's interest in those items, and then recommend similar items;
  • Use expert annotation;
  • Use the content information of items to handle item cold start;
  • Use user data already accumulated elsewhere;
  • Use interest signals from the user's mobile phone, such as installed software.
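
  The first solution in the list above (popular list first, personalized later) can be sketched as a simple fallback rule. The threshold `min_events`, the function names, and the placeholder "personalized" model are assumptions made for the example.

```python
def recommend(user_id, interactions, popular_items, personalized, min_events=5):
    """Serve the popular list until the user has at least min_events interactions."""
    history = interactions.get(user_id, [])
    if len(history) < min_events:
        return list(popular_items)
    return personalized(user_id, history)

def reverse_history(user_id, history):
    # Placeholder "model" for demonstration only: most recent items first.
    return list(reversed(history))

# Hypothetical data: one user with enough history, one unknown user.
interactions = {"veteran": ["a", "b", "c", "d", "e"]}
popular = ["top1", "top2"]

new_user_recs = recommend("newcomer", interactions, popular, reverse_history)
veteran_recs = recommend("veteran", interactions, popular, reverse_history)
print(new_user_recs, veteran_recs)
```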

5, Contextual information and social networks

  Context information is the time, place, mood, and so on at the moment the user accesses the recommendation system. Introducing temporal context and location context lets the recommendation system predict the user's interest at a particular time and place more accurately. This is what gives a recommendation system its real-time behavior and diversity.

  Sources of social network data: e-mail, registration information, location data, forums, discussion groups, and social networking sites.

  Social recommendation has attracted a lot of attention from websites, mainly because of the following advantages:

  • Recommendations from friends increase trust. Friends tend to be the people a user trusts most; users do not necessarily trust a computer's intelligence, but they do trust a good friend's recommendation
  • Social networks can help with cold start. When a new user logs in to a site with a Sina Weibo account, the site can obtain the user's friend list from the social network and then recommend items on the site that those friends like. This gives the user higher-quality recommendations even without any behavior records, and partially solves the recommendation system's cold start problem
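
  The friend-based cold start idea above might look like the following sketch. The data structures (a friend list per user and a set of liked items per user) and the name `recommend_from_friends` are assumptions; a real system would obtain the friend list from the social network's own API.

```python
from collections import Counter

def recommend_from_friends(user, friends, likes, top_n=3):
    """Rank items by how many of the user's friends like them."""
    already_liked = likes.get(user, set())
    votes = Counter()
    for friend in friends.get(user, []):
        for item in likes.get(friend, set()):
            if item not in already_liked:
                votes[item] += 1
    # Most-liked first; ties broken by item name for determinism.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _count in ranked[:top_n]]

# Hypothetical data: the new user has no behavior records of their own.
friends = {"new_user": ["alice", "bob"]}
likes = {
    "alice": {"song_a", "song_b"},
    "bob": {"song_b", "song_c"},
}
recs = recommend_from_friends("new_user", friends, likes, top_n=2)
print(recs)
```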

  Cloud Music recommends music to people along three dimensions: friend recommendation, manual recommendation, and intelligent recommendation.

6, Evaluation

  Once a recommendation algorithm is finished, how do we evaluate its effect? CTR (click-through rate), CVR (conversion rate), and dwell time are all very intuitive metrics. After the algorithm is complete, we can compute its RMSE (root mean square error) offline, or run an A/B test online to compare results.
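
  A minimal sketch of the metrics named above. The sample numbers are invented, and CVR is taken here as conversions divided by clicks, which is one common definition; some teams divide by impressions instead.

```python
import math

def rmse(predicted, actual):
    """Root mean square error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def rate(numerator, denominator):
    """Safe ratio used for CTR and CVR; returns 0.0 when there is no traffic."""
    return numerator / denominator if denominator else 0.0

# Offline: compare predicted ratings with held-out true ratings.
offline_error = rmse([4.0, 3.0, 5.0], [5.0, 3.0, 3.0])

# Online: hypothetical counts from one arm of an A/B test.
impressions, clicks, conversions = 1000, 30, 6
ctr = rate(clicks, impressions)   # click-through rate
cvr = rate(conversions, clicks)   # conversion rate (per click)
print(offline_error, ctr, cvr)
```

  In an A/B test the same online metrics would be computed per arm and compared, ideally with a significance test before drawing conclusions.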

7, Improvements

  • 1, connect the company's major business platforms and acquire user data from the other platforms, to address the cold start problem thoroughly;
  • 2, synchronize user data across different devices, including QQID, device number, mobile phone number, and the like;
  • 3, enrich the user's demographic attributes, including age, occupation, region, and so on;
  • 4, model the state of user interests better, to make it easier to generate user tags and match content.

  In addition, the company's strength, its social platform, is also well worth using. Through the user's social network it is easy to find content the user may be interested in more quickly, via the user's friends, members of the same interest groups, and other similar users, which improves recommendation accuracy.

 

References: https://www.cnblogs.com/rongyux/articles/5396844.html

 


Origin blog.csdn.net/bylfsj/article/details/104831528