Cold Start in Recommender Systems

1. What is a cold start?

  A recommender system predicts a user's future behavior and interests from the user's historical behavior, so a large amount of user behavior data is an essential prerequisite. Collecting this data is rarely a concern for popular websites or apps, but newly launched sites and platforms in their early stages do not have it. The cold start problem is how to design a personalized recommender system without a large amount of user data, while still making users satisfied with the results and willing to keep using the system.

2. Common cold start types

New users and new items keep arriving in a recommender system. The common cold start types are:

  • Item cold start: how to recommend a new item to the users who are likely to be interested in it. It can be seen as the item-side counterpart of user cold start, since the system has no behavior data for the new item.
  • User cold start: how to make personalized recommendations for new users. The system has no behavior data for a new user, so it cannot infer the user's interests from historical behavior and personalize the recommendations.
  • System cold start: how to design a personalized recommender system for a newly launched website, which has neither users nor behavior data yet.

3. Cold start solutions

3.1 Provide non-personalized recommendations

  The simplest solution is to show users a popularity ranking and then switch to personalized recommendations once enough user data has been collected.

For an evaluation of how well popularity rankings perform as recommendations, see the paper "Performance of Recommender Algorithms on Top-N Recommendation Tasks".
Netflix's research also shows that new users do lean toward popular rankings during the cold start stage, while established users need more long-tail recommendations.
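
The switch from a popularity ranking to personalized results can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `min_events` threshold, the `(user, item)` interaction format, and the `personalized` callback are all assumptions introduced here.

```python
from collections import Counter

def popular_items(interactions, n=10):
    """Rank items by the number of distinct users who interacted with them."""
    seen = set(interactions)  # drop duplicate (user, item) pairs
    counts = Counter(item for _, item in seen)
    # Sort by count descending, then by item id for a deterministic order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]

def recommend(user, interactions, personalized, min_events=5, n=10):
    """Serve the popularity ranking until the user has enough history.

    `personalized` is a hypothetical callback (user, n) -> list of items
    that stands in for whatever personalized model the site uses.
    """
    history = {item for u, item in interactions if u == user}
    if len(history) < min_events:
        # Fetch a few extra popular items so that filtering out what the
        # user has already seen can still leave up to n results.
        candidates = popular_items(interactions, n + len(history))
        return [i for i in candidates if i not in history][:n]
    return personalized(user, n)
```

A fixed event count is only one possible switching rule; a real system might blend the two lists gradually instead.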

3.2 Using user registration information

There are three main types of user registration information:

1) Demographic information, including age, gender, occupation, ethnicity, education, and place of residence

2) Descriptions of user interests; some websites let users describe their interests in words

3) Off-site behavior imported from other websites; for example, when users log in with a social networking account, part of their behavior data and social network data on that site can be imported with their authorization

Personalization of this kind is very coarse. If gender is the only attribute used, all newly registered women will see the same results. Even so, compared with not distinguishing between men and women at all, the accuracy of the recommendations improves considerably.

The recommendation process is basically as follows:

  • Get the user's registration information
  • Classify users based on their registration information
  • Recommend to the user the items that users in the same category like
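
The three steps above can be sketched as follows. The data layout (a `profiles` dict of registration fields and a list of `(user, item)` likes) and the function names are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

def build_group_rankings(profiles, likes, key):
    """For each value of one registration field, count how often each item is liked."""
    counts = defaultdict(Counter)
    for user, item in likes:
        group = profiles.get(user, {}).get(key)
        if group is not None:
            counts[group][item] += 1
    return counts

def recommend_for_new_user(profile, group_counts, key, n=3):
    """Recommend the items most liked by existing users in the same group."""
    group = profile.get(key)
    ranked = group_counts.get(group, Counter())
    # Most-liked first; ties broken by item name for a stable order.
    ordered = sorted(ranked.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ordered[:n]]
```

Raw like counts tend to favor items that are popular in every group, so normalizing by overall popularity is one possible refinement.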

3.3 Select the right item to activate the user's interest

  When a new user first logs in, the system asks for feedback on a few items, collects the user's interest information from that feedback, and then recommends items similar to the ones the user liked.
In general, items used to elicit user interest should have the following characteristics:

  • Reasonably well known: users can only give feedback on an item if they know what it is;
  • Representative and discriminating: the items should not be ones that appeal to everyone, because such items do not separate one user's interests from another's;
  • Diverse as a set: at cold start we do not know the user's interests, and there are many possibilities. To match them, the start-up set needs high coverage, so that it spans almost all mainstream user interests.
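
One rough way to combine these three criteria is sketched below. The scoring rule (log of popularity times rating variance) and the one-item-per-category constraint are heuristics assumed here for illustration; the article does not prescribe a specific selection method.

```python
import math
from statistics import pvariance

def pick_startup_items(ratings, categories, k=3):
    """Pick k start-up items.

    ratings:    {item: [scores given by existing users]}
    categories: {item: category label}  (hypothetical metadata)
    """
    def score(item):
        scores = ratings[item]
        # Log-popularity rewards well-known items; variance rewards items
        # that split opinion and therefore separate user tastes.
        return math.log(1 + len(scores)) * pvariance(scores)

    ranked = sorted(ratings, key=lambda i: (-score(i), i))
    chosen, used = [], set()
    for item in ranked:
        category = categories.get(item)
        if category in used:
            continue  # keep the set diverse: at most one item per category
        chosen.append(item)
        used.add(category)
        if len(chosen) == k:
            break
    return chosen
```

An item that everyone rates the same way gets a score of zero under this rule, which matches the point that universally liked items reveal little about an individual user.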

3.4 Using content information of items

  Item content information is used to solve the item cold start problem, that is, how to recommend newly added items to the users who are interested in them. Item cold start matters most on time-sensitive websites such as news sites, where new items are added constantly and each one must be shown to users right away; after a while, the value of the item drops sharply.
The two collaborative filtering algorithms, userCF and itemCF, are affected by item cold start in different ways.

  • userCF algorithm
    On websites where the recommendation list is not the only list that shows content to users (most websites are like this), userCF is not very sensitive to item cold start. When a new item is added, some users will always see it through other channels. Once those users give feedback on it, the item may appear in the recommendation lists of users with similar historical interests, so more people give feedback on it, and it then appears in the recommendation lists of even more people. In this way the item keeps spreading and gradually reaches the recommendation lists of the users who are interested in it.

    On websites where the recommendation list is the main way users obtain information (such as Douban Internet Radio), userCF has to solve the problem of the first push, that is, where the first users discover a new item. The easiest way is to show new items to users at random, but that is not personalized at all. A better option is to use the item's content information and deliver the new item to users who have liked other items with similar content.

  • itemCF algorithm
    For the itemCF algorithm, item cold start is a serious problem. Because the basis of the algorithm is to calculate the similarity between items through the user's behavior on the item, when the new item has not been displayed to the user, the user cannot generate behavior. For this reason, only the content information of the item can be used to calculate the degree of relevance of the item. The basic idea is to convert items into keyword vectors, and obtain the degree of relevance of items by calculating the similarity between vectors (for example, calculating cosine similarity).
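
The keyword-vector idea can be sketched as follows. Items are assumed to be represented as sparse dicts of keyword weights (how the weights are produced, for example by TF-IDF, is outside this sketch), and `seed_users` with its `threshold` is a hypothetical helper illustrating how a new item could be delivered to users who liked similar content.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse keyword-weight dicts."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an item with no keywords is similar to nothing
    return dot / (norm_a * norm_b)

def similar_items(new_item, catalog, n=2):
    """Rank existing items by content similarity to a new item."""
    scored = [(cosine(new_item, vec), name) for name, vec in catalog.items()]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(name, round(sim, 3)) for sim, name in scored[:n]]

def seed_users(new_item, catalog, likes, threshold=0.5):
    """Users who liked at least one existing item similar to the new one."""
    close = {name for name, vec in catalog.items()
             if cosine(new_item, vec) >= threshold}
    return sorted({user for user, item in likes if item in close})
```

The same `cosine` function also applies to dense feature vectors stored as dicts, so the content-based similarity can stand in for the behavior-based item similarity until the new item has collected feedback.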

3.5 Using expert annotation

  When many systems are first launched, they have neither user behavior data nor enough item content information to compute item similarity. In this case, many systems rely on experts to annotate the items.
Representative systems: the personalized Internet radio station Pandora and the movie recommendation website Jinni.
Taking Pandora as an example, Pandora hired a group of musicians to annotate the songs of tens of thousands of singers along various dimensions, and finally settled on more than 400 features. Each song can then be represented as a roughly 400-dimensional vector, and the similarity between songs is computed with a common vector similarity measure.

3.6 Use data that users have accumulated elsewhere for cold start

Take QQ Music as an example:
QQ Music's Guess You Like radio station wants to guess the tastes of users who are opening QQ Music for the first time. One of its advantages is that it can use data from other Tencent platforms, such as who the user follows on QZone and who the user follows on Tencent Weibo. Going a step further, if a user has just watched an anime on Tencent Video and QQ Music recommends songs from that anime, the recommendation will feel very thoughtful. This is leveraging data that users already have on other platforms.

Another example is Jinri Toutiao:
after a user logs in through a social networking site such as Sina Weibo, it obtains the user's follow list, crawls the feed items the user has recently interacted with (reposts, comments, etc.), and performs semantic analysis on them to infer the user's preferences.

The premise of this method is therefore to guide users to log in with social network accounts. On the one hand, this reduces registration cost and improves the conversion rate; on the other hand, it gives access to the user's social network information, which helps solve the cold start problem.

3.7 Use interest signals from the user's mobile phone for cold start

  Android phones are relatively open, so once your app is installed it can check which other apps are on the phone. For example, if a user has installed apps such as Meilishuo, Mogujie, Hot Mombang, and Dayima, it can be inferred that the user is probably a woman, and further whether she is more likely an expectant mother or a young woman.
At present, besides app stores, some news and video apps are also starting to read the list of installed applications, which is very helpful for solving the cold start problem.
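
A minimal sketch of turning an installed-app list into a starting interest profile is shown below. The `APP_CATEGORIES` table and its category labels are illustrative assumptions, and the list of installed apps is assumed to be already available to the application; how it is obtained is not covered here.

```python
from collections import Counter

# Hypothetical lookup from app name to interest category; a real system
# would need a much larger, maintained mapping.
APP_CATEGORIES = {
    "Meilishuo": "fashion",
    "Mogujie": "fashion",
    "Hot Mombang": "parenting",
    "Dayima": "health",
}

def interest_profile(installed_apps, app_categories=APP_CATEGORIES):
    """Turn a list of installed apps into normalized category weights."""
    counts = Counter(app_categories[app] for app in installed_apps
                     if app in app_categories)
    total = sum(counts.values())
    if total == 0:
        return {}  # no known apps: fall back to another cold start method
    return {category: count / total for category, count in counts.items()}
```

The resulting weights could be used to choose which category's popular items to show first, until the user's own behavior data takes over.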

4. Summary

  • The cold start problem never goes away, because new users and new items keep arriving, so solving it is important.
  • Cold start comes in three types: user cold start, item cold start, and system cold start.
  • Solutions to the cold start problem: recommending popular items (non-personalized), using user registration information, selecting suitable items to elicit interest, using item content information, using expert annotation, and using preference data the user has accumulated elsewhere (other platforms, the user's phone).

This article is only used as a personal learning record, not for commercial purposes, thank you for your understanding.

Reference: https://zhuanlan.zhihu.com/p/345213021

Origin blog.csdn.net/weixin_44852067/article/details/130114196