Recall algorithms in recommendation systems, roughly sorted out (incomplete)

definition

The two key stages in a recommendation strategy are "recall" and "ranking".
Recall (matching) means retrieving as many relevant candidates as possible from the full corpus and passing them on to the ranking stage.

source

Matching and ranking every item against every user over the full catalog is computationally infeasible. Recall is used to obtain a smaller candidate set of items, which is then ranked by a more complex model.

metrics

Recall and precision.

Types of recall

1. Content-based recall (content-based)
2. Collaborative-filtering recall (collaborative filtering)
	Collaborative filtering is divided into:
	a. Co-occurrence / neighborhood based (user-based, item-based)
	b. Model-based
	Model-based collaborative filtering is further divided into:
		i. Traditional: SVD, FM
		ii. Deep networks: DNN (DeepMatch), embedding (w2v, graph embedding)

basic algorithms

item-based CF (i2i)

Each item is represented as a vector over users according to browsing behavior. For example, if item A is browsed by users a and b but not by user c, it can be represented as
(1, 1, 0). The similarity of two items is then the cosine of their vectors: cos(A, B) = (A · B) / (||A|| * ||B||)
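As a quick worked example of the formula above (a minimal sketch; item B's vector is invented purely for illustration):

```python
import numpy as np

# Item vectors over users (a, b, c): 1 = browsed, 0 = not browsed.
A = np.array([1, 1, 0])  # item A, browsed by users a and b
B = np.array([1, 0, 1])  # hypothetical item B, browsed by users a and c

# cos(A, B) = (A . B) / (||A|| * ||B||)
cos_ab = A.dot(B) / (np.linalg.norm(A) * np.linalg.norm(B))
print(cos_ab)  # 0.5
```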

In practice, item pairs are counted from user behavior within a session (co-occurrence of two items in the same session).

The offline output is, for each left item i, a list of similar right items sorted by similarity from high to low. For online serving, the right-item list is truncated (e.g., to the top k).
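A minimal sketch of this offline i2i computation, assuming the input is a list of per-session item sequences; the function name, input format, and top-k cutoff are illustrative:

```python
from collections import defaultdict
from itertools import combinations
import math

def build_i2i(sessions, top_k=50):
    """Compute cosine i2i similarities from per-session item lists."""
    item_count = defaultdict(int)   # sessions containing each item
    pair_count = defaultdict(int)   # sessions containing each item pair
    for session in sessions:
        items = set(session)
        for i in items:
            item_count[i] += 1
        for i, j in combinations(sorted(items), 2):
            pair_count[(i, j)] += 1

    sims = defaultdict(list)
    for (i, j), cij in pair_count.items():
        # Cosine similarity over binary session-occurrence vectors.
        s = cij / math.sqrt(item_count[i] * item_count[j])
        sims[i].append((j, s))
        sims[j].append((i, s))

    # For each left item, keep only the top_k most similar right items.
    return {i: sorted(cands, key=lambda x: -x[1])[:top_k]
            for i, cands in sims.items()}

# Example usage with toy session data.
sessions = [["A", "B"], ["A", "B", "C"], ["B", "C"]]
print(build_i2i(sessions))
```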


Modified algorithms: plain cosine similarity is too coarse and is prone to the "Harry Potter effect". The improved versions reduce the weight that highly active users contribute to item similarity (see the sketch after this list):
1. wbcos i2i (weighted bin)
2. swing i2i
3. expectation i2i
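A minimal sketch of the user down-weighting idea, assuming each user's contribution is discounted by 1 / log(2 + number of items the user interacted with); this weighting scheme is an illustrative assumption, not necessarily the exact wbcos or swing formula:

```python
from collections import defaultdict
from itertools import combinations
import math

def weighted_i2i(user_items):
    """user_items: dict of user -> set of interacted items.
    Down-weights very active users so they contribute less to item similarity."""
    pair_score = defaultdict(float)
    item_norm = defaultdict(float)
    for user, items in user_items.items():
        w = 1.0 / math.log(2 + len(items))   # assumed activity discount
        for i in items:
            item_norm[i] += w * w
        for i, j in combinations(sorted(items), 2):
            pair_score[(i, j)] += w * w

    sims = defaultdict(list)
    for (i, j), s in pair_score.items():
        sim = s / math.sqrt(item_norm[i] * item_norm[j])
        sims[i].append((j, sim))
        sims[j].append((i, sim))
    return sims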

Ranki2i
Background: the methods above generate i2i lists purely from statistics. What is actually displayed should not simply be the items most similar to the trigger item; a final step also takes the candidate item's CTR and CVR into account: train a model offline, predict the click-through rate of each right item offline, and re-rank the list accordingly.

1. Predict CTR and CVR separately
Similar to online ranking: use the features of the left item and the right item, and reuse an existing ranking model (GBDT, LR, or DNN all work).
2. One-step approach
Use a pairwise or listwise loss function,
turn GMV or click-through rate into the prediction target, treat the label as multi-class, and evaluate with NDCG.
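A minimal sketch of option 1 (separate CTR prediction followed by re-ranking), using scikit-learn logistic regression as a stand-in for whatever ranking model already exists; the feature construction, function names, and random training data are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data: each row concatenates left-item and right-item features;
# label = 1 if the displayed right item was clicked.
X_train = np.random.rand(1000, 8)          # illustrative feature matrix
y_train = np.random.randint(0, 2, 1000)    # illustrative click labels

ctr_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

def rerank(left_features, candidates):
    """candidates: list of (right_item_id, right_features); re-rank by predicted CTR."""
    feats = np.array([np.concatenate([left_features, rf]) for _, rf in candidates])
    ctr = ctr_model.predict_proba(feats)[:, 1]
    order = np.argsort(-ctr)
    return [(candidates[k][0], float(ctr[k])) for k in order]
```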

embedding practice

word2vec, graph embedding
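A minimal sketch of the word2vec (item2vec) approach, treating each user session as a "sentence" of item IDs and using gensim's Word2Vec (gensim 4.x API); the session data and parameters are illustrative:

```python
from gensim.models import Word2Vec

# Each session is a sequence of item IDs, treated like a sentence of words.
sessions = [["item_1", "item_2", "item_3"],
            ["item_2", "item_4"],
            ["item_1", "item_3", "item_4"]]

model = Word2Vec(sentences=sessions,
                 vector_size=32,   # embedding dimension
                 window=5,
                 min_count=1,
                 sg=1,             # skip-gram
                 epochs=10)

# Nearest neighbours in embedding space can serve directly as i2i candidates.
print(model.wv.most_similar("item_1", topn=2))
```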
