Big data project: recommendation systems

The basic idea of a recommendation system

• Use feature information about users and items to recommend items that have the features the user likes.

• Use the items a user has liked to recommend other items similar to them.

• Use other users who are similar to the user to recommend the items that those like-minded users like.

• Know what you want, push precisely

　　- Use feature information about users and items to recommend items that have the features the user likes.

• Birds of a feather flock together (similar items)

　　- Use the items a user has liked to recommend other items similar to them.

• People form groups (similar users)

　　- Use other users who are similar to the user to recommend the items that those users with similar interests like.

Data analyzed by a recommendation system

• Metadata of the items or content to be recommended, such as keywords, category labels, and "gene"-style descriptors.

• Basic information about the system's users, such as gender, age, and interest tags.

• User behavior data, which can be converted into preferences for items or information. Depending on the application, this may include users' ratings of items, records of items viewed, and purchase records. This user preference information falls into two categories:

　　- Explicit user feedback: feedback that users explicitly provide in addition to naturally browsing or using the site, such as ratings of or comments on an item.

　　- Implicit user feedback: data generated while users use the site that implicitly reflects their preferences, such as buying an item or viewing information about an item.
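As a rough illustration of how behavior data might be converted into preferences, the sketch below merges explicit ratings with implicit signals. The event format, the action names, the weights in IMPLICIT_WEIGHTS, and the 5.0 cap are all assumptions made for this example; they do not come from any particular system.

```python
from collections import defaultdict

# Hypothetical weights for implicit signals; the values are assumptions.
IMPLICIT_WEIGHTS = {"view": 1.0, "purchase": 4.0}

def build_preferences(events):
    """Turn (user, item, action, value) events into {user: {item: score}}.

    Implicit events ("view", "purchase") accumulate their weights, capped at
    5.0. An explicit "rating" event overrides the implicit score for that item.
    """
    implicit = defaultdict(lambda: defaultdict(float))
    explicit = defaultdict(dict)
    for user, item, action, value in events:
        if action == "rating":
            explicit[user][item] = float(value)
        elif action in IMPLICIT_WEIGHTS:
            implicit[user][item] += IMPLICIT_WEIGHTS[action]

    prefs = {}
    for user in set(implicit) | set(explicit):
        scores = {item: min(s, 5.0) for item, s in implicit[user].items()}
        scores.update(explicit.get(user, {}))  # explicit feedback wins
        prefs[user] = scores
    return prefs

events = [
    ("alice", "book", "view", None),
    ("alice", "book", "purchase", None),
    ("alice", "film", "rating", 2),
    ("bob", "book", "view", None),
]
prefs = build_preferences(events)
print(prefs["alice"])
```

Letting explicit ratings override implicit scores is only one possible design; a real system might blend the two instead.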

Classification of recommendation systems

• By timeliness

- Offline recommendation

- Real-time recommendation

• By recommendation principle

- Similarity-based recommendation

- Knowledge-based recommendation

- Model-based recommendation

• By whether the recommendation is personalized

- Statistics-based recommendation

- Personalized recommendation

• By data source

- Demographic-based recommendation

- Content-based recommendation

- Collaborative-filtering-based recommendation

Introduction to recommendation algorithms

• Demographic-based recommendation

• Content-based recommendation

• Collaborative-filtering-based recommendation

• Hybrid recommendation

Recommendation algorithm based on demographics

Content-based recommendation algorithm

Recommendation algorithm based on collaborative filtering

• Collaborative filtering (CF)

• Neighborhood-based collaborative filtering

- User-based (User-CF)

- Item-based (Item-CF)

• Model-based collaborative filtering

- Singular value decomposition (SVD)

- Latent semantic analysis (LSA)

- Support vector machine (SVM)

Collaborative filtering (CF) recommendation method

• Content-based (CB) methods mainly use the content features of items the user has already rated, whereas CF methods can also use the ratings that other users have given to items.

• CF can overcome some limitations of CB:

　　- When item content is incomplete or hard to obtain, CF can still make recommendations from other users' feedback.

　　- CF recommends on the basis of users' evaluations of item quality, avoiding the interference that CB's reliance on content alone can introduce into judging item quality.

　　- CF recommendations are not restricted by content: as long as other users with similar tastes have shown interest in different items, CF can recommend items with very different content (but with some intrinsic link) to the user.

• CF methods fall into two categories: neighborhood-based and model-based.

User-based collaborative filtering
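The user-based idea (find users with similar tastes, then recommend what they liked) can be sketched as follows. This is a minimal illustration assuming ratings are held in a dict of dicts and that cosine similarity is the similarity measure; the function names, k, and n are hypothetical choices for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse rating dicts {item: rating}."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend_user_cf(ratings, user, k=2, n=3):
    """Recommend up to n unseen items using the k most similar users."""
    neighbors = sorted(
        ((cosine(ratings[user], r), other)
         for other, r in ratings.items() if other != user),
        reverse=True,
    )[:k]
    scores = {}
    for sim, other in neighbors:
        if sim <= 0:
            continue  # ignore users with no overlap in rated items
        for item, rating in ratings[other].items():
            if item not in ratings[user]:
                # Weight each neighbor's rating by how similar the neighbor is.
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:n]

ratings = {
    "alice": {"a": 5, "b": 4},
    "bob": {"a": 5, "b": 5, "c": 4},
    "carol": {"d": 5, "e": 3},
}
print(recommend_user_cf(ratings, "alice"))  # ['c']
```

Here only bob shares rated items with alice, so the single recommendation is the item bob rated that alice has not seen.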

Item-based collaborative filtering
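The item-based variant compares items rather than users: two items count as similar when many of the same users liked both. The sketch below assumes binary "liked" sets per user and a co-occurrence similarity normalized by item popularity; both are illustrative choices, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def item_similarity(user_items):
    """Similarity of items i and j: co-liking users / sqrt(likes(i) * likes(j))."""
    count = defaultdict(int)                    # item -> number of users who liked it
    co = defaultdict(lambda: defaultdict(int))  # item -> item -> users who liked both
    for items in user_items.values():
        for i in items:
            count[i] += 1
            for j in items:
                if i != j:
                    co[i][j] += 1
    return {
        i: {j: c / math.sqrt(count[i] * count[j]) for j, c in related.items()}
        for i, related in co.items()
    }

def recommend_item_cf(user_items, user, n=3):
    """Score unseen items by their summed similarity to the items the user liked."""
    sim = item_similarity(user_items)
    liked = user_items[user]
    scores = defaultdict(float)
    for i in liked:
        for j, s in sim.get(i, {}).items():
            if j not in liked:
                scores[j] += s
    return sorted(scores, key=scores.get, reverse=True)[:n]

user_items = {
    "alice": {"a", "b"},
    "bob": {"a", "b", "c"},
    "carol": {"b", "c", "d"},
}
print(recommend_item_cf(user_items, "alice"))  # ['c', 'd']
```

Item "c" ranks first because it co-occurs with both of the items alice liked, while "d" co-occurs only with "b".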

Hybrid recommendation

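• Real-world recommendation systems rarely rely on a single recommendation mechanism or strategy; they usually combine several methods to achieve better results. Popular ways of combining them are:

• Weighted hybrid

　　- Combine several different recommendations with a linear formula and a set of weights; the weight values have to be tuned through repeated experiments on a test dataset to achieve the best recommendation quality.

• Switching hybrid

　　- Let the system choose the most suitable recommendation mechanism for the current situation (data volume, system load, number of users and items, and so on).

• Partitioned hybrid

　　- Use several recommendation mechanisms and show their results to the user in separate sections.

• Layered hybrid

　　- Use several recommendation mechanisms and feed the output of one into another as input, balancing the strengths and weaknesses of each to obtain more accurate recommendations.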
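Of the combination methods above, the weighted hybrid is the easiest to show in code. The following is a minimal sketch assuming each recommender returns a dict of item scores on a comparable scale; the recommender names and the weights 0.4 and 0.6 are made up for the example and would in practice be tuned experimentally.

```python
def weighted_hybrid(score_lists, weights, n=3):
    """Linearly combine per-recommender item scores using the given weights."""
    combined = {}
    for name, scores in score_lists.items():
        w = weights[name]
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + w * score
    return sorted(combined, key=combined.get, reverse=True)[:n]

# Hypothetical outputs of a content-based and a CF recommender.
score_lists = {
    "content": {"a": 0.9, "b": 0.2},
    "cf": {"b": 0.8, "c": 0.5},
}
weights = {"content": 0.4, "cf": 0.6}
print(weighted_hybrid(score_lists, weights))  # ['b', 'a', 'c']
```

Item "b" comes out on top because both recommenders contribute to its score (0.4 × 0.2 + 0.6 × 0.8 = 0.56), ahead of "a" (0.36) and "c" (0.30).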

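Evaluating recommendation systems

• Help users get the content they need faster and better

• Deliver content faster and better to the users who like it

• Help the website (platform) retain its users more effectively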

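Experimental methods for recommendation systems

• Offline experiments

　　- Obtain user behavior data from the logging system and generate a standard dataset in a fixed format

　　- Split the dataset into a training set and a test set according to some rule

　　- Train the user interest model on the training set and make predictions on the test set

　　- Evaluate the algorithm's predictions on the test set with offline metrics defined in advance

• User surveys

　　- A user survey needs some real users to complete tasks on the recommendation system under test; we record their behavior, ask them some questions, and then analyze the results

• Online experiments

　　- A/B testing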
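The four offline-experiment steps (build a dataset, split it, train, evaluate) can be sketched end to end. Everything here is illustrative: the synthetic records, the 25% test ratio, and the item-mean predictor standing in for a real user interest model are assumptions, not part of any specific system.

```python
import random

def split_dataset(records, test_ratio=0.25, seed=42):
    """Shuffle (user, item, rating) records and split them into train and test."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

def train_item_mean(train):
    """Stand-in interest model: predict each item's mean training rating."""
    sums, counts = {}, {}
    for _, item, rating in train:
        sums[item] = sums.get(item, 0.0) + rating
        counts[item] = counts.get(item, 0) + 1
    global_mean = sum(sums.values()) / sum(counts.values())
    item_means = {i: sums[i] / counts[i] for i in sums}
    # Fall back to the global mean for items never seen in training.
    return lambda user, item: item_means.get(item, global_mean)

def mean_absolute_error(predict, test):
    """Offline metric defined in advance: average absolute rating error."""
    return sum(abs(predict(u, i) - r) for u, i, r in test) / len(test)

# Synthetic behavior data standing in for a dataset built from logs.
records = [("u%d" % (k % 4), "i%d" % (k % 3), 1 + k % 5) for k in range(20)]
train, test = split_dataset(records)
model = train_item_mean(train)
error = mean_absolute_error(model, test)
print(len(train), len(test), round(error, 3))
```

A fixed seed keeps the split reproducible, which matters when comparing several algorithms on the same test set.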

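Evaluation metrics for recommendation systems

• Prediction accuracy

• User satisfaction

• Coverage

• Diversity

• Serendipity

• Trust

• Timeliness

• Robustness

• Business goals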

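Evaluating recommendation accuracy

• Rating prediction

　　- Many websites let users rate items. Given users' historical ratings, we can learn an interest model and use it to predict how a user will rate new items.

　　- The accuracy of rating prediction is usually measured with root mean squared error (RMSE) or mean absolute error (MAE).

• Top-N recommendation

　　- When a website provides a recommendation service, it usually gives the user a personalized recommendation list; this is called Top-N recommendation.

　　- The prediction accuracy of Top-N recommendation is usually measured with precision and recall.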
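The metrics named above can be written down directly. The sketch below uses the standard definitions of RMSE and MAE over (actual, predicted) rating pairs, and computes Top-N precision and recall by pooling hits across users; the sample data and the dict-based input format are assumptions for illustration.

```python
import math

def rmse(pairs):
    """Root mean squared error over (actual, predicted) rating pairs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))

def mae(pairs):
    """Mean absolute error over (actual, predicted) rating pairs."""
    return sum(abs(a - p) for a, p in pairs) / len(pairs)

def precision_recall(recommended, relevant):
    """Top-N precision and recall pooled over all users.

    recommended: {user: list of recommended items}
    relevant:    {user: set of items the user actually liked in the test set}
    """
    hits = sum(len(set(recommended[u]) & relevant[u]) for u in relevant)
    n_recommended = sum(len(recommended[u]) for u in relevant)
    n_relevant = sum(len(relevant[u]) for u in relevant)
    return hits / n_recommended, hits / n_relevant

pairs = [(4, 3), (2, 2), (5, 3)]
print(rmse(pairs), mae(pairs))

recommended = {"alice": ["a", "b"], "bob": ["c", "d"]}
relevant = {"alice": {"a"}, "bob": {"c", "e"}}
precision, recall = precision_recall(recommended, relevant)
print(precision, recall)
```

In this sample, 2 of the 4 recommended items are hits (precision 0.5) and 2 of the 3 relevant items are recovered (recall 2/3). RMSE penalizes the one large error more heavily than MAE does.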

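Accuracy, precision, and recall

• Suppose a class has 80 boys and 20 girls, 100 students in total, and the goal is to find all the girls. Someone picks out 50 people: 20 of them are girls, and the other 30 are boys mistakenly taken for girls. How should this work be evaluated?

• Represent the selection as a confusion matrix with four outcomes: TP = 20 (girls picked), FP = 30 (boys picked by mistake), FN = 0 (girls missed), TN = 50 (boys correctly left out).

• Accuracy

　　- The ratio of correctly classified items to the total number of items

　　　　　　　　　　A = (20 + 50) / 100 = 70%

• Precision

　　- Of all retrieved items, the proportion that should have been retrieved

　　　　　　　　　　P = 20 / (20 + 30) = 40%

• Recall

　　- Of all items that should have been retrieved, the proportion that actually were retrieved

　　　　　　　　　　R = 20 / (20 + 0) = 100%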
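The class example can be checked with a few lines of code. The counts follow from the scenario: 20 girls picked (TP), 30 boys picked by mistake (FP), no girls missed (FN = 0), and the remaining 50 boys correctly left out (TN). The function name is a hypothetical helper for this illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

accuracy, precision, recall = classification_metrics(tp=20, fp=30, fn=0, tn=50)
print(accuracy, precision, recall)  # 0.7 0.4 1.0
```

The result shows why a single number is not enough: every girl was found (recall 100%), yet more than half of the people picked were wrong (precision 40%).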
