Learning to Rank: Basic Algorithms

This post covers ranking methods used in search, including:

  1. Basic learning-to-rank methods
  2. Learning-to-rank evaluation metrics
  3. How the LambdaMART model works
  4. How the FTRL model works

Learning to rank

Learning to rank is a core technique in recommendation, search, and advertising. The quality of the ranking largely determines the user experience, advertising revenue, and so on.
Learning to rank can be understood as applying machine learning to sort results for the user. A good starting point is the book Learning to Rank for Information Retrieval by Tie-Yan Liu of Microsoft Research Asia, which gives an excellent exposition and summary of the various ranking methods. What follows is a highly condensed version.

Learning to rank is a supervised machine-learning process: for each given query-document pair, features are extracted, and ground-truth relevance labels are obtained from click logs or by manual annotation. A ranking model is then trained so that its output agrees with the labeled data as closely as possible.
Learning-to-rank methods are commonly divided into three types: PointWise, PairWise, and ListWise.

PointWise

The pointwise approach treats a single document as the unit of processing. Each document is converted into a feature vector, and the system learns a classification or regression function from the training data that scores each document; the search results are then ordered by score.


The pointwise approach is easy to understand: it uses traditional machine-learning methods to learn a document's relevance to a given query. CTR prediction, for example, can be learned in a pointwise fashion. Often, however, the relative order of results is what matters most, and pointwise learning models relevance globally without penalizing mistakes in the ordering.
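As a minimal sketch of the pointwise idea (toy, made-up features and labels; a simple linear model trained by gradient descent stands in for whatever regressor is actually used):

```python
# Pointwise learning-to-rank sketch: every (query, document) pair is an
# independent training example with its own relevance label; a regression
# model scores documents and results are sorted by score.

# toy feature vectors, e.g. [text-match score, historical click rate]
train_x = [[0.9, 0.8], [0.7, 0.4], [0.2, 0.1], [0.4, 0.6]]
train_y = [3.0, 2.0, 0.0, 1.0]  # graded relevance labels

# fit a linear scoring function w.x + b by plain stochastic gradient descent
w, b = [0.0, 0.0], 0.0
for _ in range(2000):
    for x, y in zip(train_x, train_y):
        pred = sum(wi * xi for wi, xi in zip(w, x)) + b
        err = pred - y
        w = [wi - 0.05 * err * xi for wi, xi in zip(w, x)]
        b -= 0.05 * err

def score(x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# rank unseen documents for a query purely by their individual scores:
# no pairwise or listwise information is ever used
docs = {"d1": [0.8, 0.7], "d2": [0.3, 0.2], "d3": [0.5, 0.5]}
ranking = sorted(docs, key=lambda d: score(docs[d]), reverse=True)
print(ranking)
```

Note that the loss treats each document independently, which is exactly the weakness described above: a scoring error near the top of the list costs no more than one near the bottom.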

PairWise

A search system receives a user query and returns a list of related documents, so the key question is the relative order among documents. The pointwise approach scores each document in isolation and never considers the ordering relationship between documents. The pairwise approach instead converts ranking into a collection of pair problems: for each pair of documents, decide which should come first.


But the pairwise approach has problems of its own:

  1. It considers the relative order of two documents, but not the position at which a document appears in the result list. Documents near the top of the results matter more, and a misordering near the top should cost far more than one near the bottom.

  2. The number of relevant documents varies widely across queries. After conversion to pairs, some queries yield hundreds of document pairs while others yield only a dozen or so, which makes evaluating (and training) the machine-learning system difficult.

Common PairWise implementations:

  1. Ranking SVM
  2. RankNet (2005)
  3. RankBoost (2003)
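The pair conversion itself can be sketched in a few lines (toy document IDs and labels; real systems would attach feature-vector differences to each pair):

```python
# Pairwise transform sketch: ranking becomes binary classification over
# document pairs -- for each pair with different labels, predict which
# document should come first.

def to_pairs(docs):
    """docs: list of (doc_id, relevance) for ONE query.
    Returns (winner, loser) pairs; equal labels give no pair."""
    pairs = []
    for i, (di, ri) in enumerate(docs):
        for dj, rj in docs[i + 1:]:
            if ri > rj:
                pairs.append((di, dj))
            elif rj > ri:
                pairs.append((dj, di))
    return pairs

query_docs = [("d1", 2), ("d2", 0), ("d3", 1)]
print(to_pairs(query_docs))  # [('d1', 'd2'), ('d1', 'd3'), ('d3', 'd2')]
```

This also makes problem 2 above concrete: a query with n judged documents can produce up to n(n-1)/2 pairs, so pair counts vary enormously between queries.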

ListWise

The pointwise approach treats each single document as a training example, and the pairwise approach treats each document pair as one; the listwise approach differs from both. It directly considers the overall order of the search results for a query and optimizes a ranking metric, for example the commonly used MAP or NDCG. Commonly used ListWise methods include:

  1. LambdaRank
  2. AdaRank
  3. SoftRank
  4. LambdaMART

Learning to Rank Evaluation Metrics

  • MAP (Mean Average Precision):
    Suppose there are two topics: topic 1 has 4 relevant pages and topic 2 has 5. For topic 1 a system retrieves all 4 relevant pages, at ranks 1, 2, 4, and 7; for topic 2 it retrieves 3 of the relevant pages, at ranks 1, 3, and 5. For topic 1 the average precision is (1/1 + 2/2 + 3/4 + 4/7) / 4 = 0.83; for topic 2 it is (1/1 + 2/3 + 3/5 + 0 + 0) / 5 = 0.45. So MAP = (0.83 + 0.45) / 2 = 0.64.
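The worked example above can be reproduced directly (a minimal sketch; missed relevant documents contribute zero precision):

```python
# MAP sketch: average precision per topic, then the mean over topics.

def average_precision(relevant_ranks, num_relevant):
    """relevant_ranks: 1-based ranks of the retrieved relevant docs.
    Relevant docs that were never retrieved contribute 0."""
    precisions = [(i + 1) / rank
                  for i, rank in enumerate(sorted(relevant_ranks))]
    return sum(precisions) / num_relevant

ap1 = average_precision([1, 2, 4, 7], 4)   # (1/1 + 2/2 + 3/4 + 4/7) / 4
ap2 = average_precision([1, 3, 5], 5)      # (1/1 + 2/3 + 3/5 + 0 + 0) / 5
mean_ap = (ap1 + ap2) / 2
print(round(ap1, 2), round(ap2, 2), round(mean_ap, 2))  # 0.83 0.45 0.64
```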

  • NDCG (Normalized Discounted Cumulative Gain):

NDCG divides relevance into r grades. With r = 5, the gain for each grade is set to 2^r - 1, i.e. 2^5 - 1, 2^4 - 1, and so on.

Suppose a query "ABC" returns the list shown below. Assuming the user's choice of a result does not depend on its position, the cumulative gain is shown in the right-hand column:



Considering that results in earlier positions are more likely to be clicked, a position discount factor log2/log(1+j) is introduced; summing the discounted gains gives the DCG value. Finally, so that different search result lists can be compared, NDCG = DCG/MaxDCG, where MaxDCG is the DCG of the best possible ordering of the current results. As shown in the figure, MaxDCG is the DCG of the ordering 1, 3, 4, 5, 2 (the gain of rank 2 is lower, so it should be placed later).



(figure: the resulting NDCG values)
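The computation described above can be sketched as follows, using the gain 2^r - 1 and the discount log2/log(1+j) from the text (the relevance grades in the example are made up):

```python
# NDCG sketch: gain 2^r - 1, position discount log(2)/log(1 + j),
# normalized by the DCG of the ideal (best possible) ordering.
import math

def dcg(relevances):
    # j is the 1-based position; position 1 gets no discount
    return sum((2 ** r - 1) * math.log(2) / math.log(1 + j)
               for j, r in enumerate(relevances, start=1))

def ndcg(relevances):
    ideal = sorted(relevances, reverse=True)   # MaxDCG ordering
    return dcg(relevances) / dcg(ideal)

print(round(ndcg([3, 1, 2]), 3))   # imperfect order scores below 1
print(ndcg([3, 2, 1]))             # ideal order scores exactly 1.0
```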

  • MRR (Mean Reciprocal Rank)
    For a query q whose first relevant document appears at rank r, the reciprocal rank is 1/r; MRR is the mean of 1/r over all queries.
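A minimal sketch (hypothetical result lists, with relevance as 0/1 flags):

```python
# MRR sketch: reciprocal rank of the FIRST relevant document,
# averaged over queries.

def reciprocal_rank(results):
    """results: list of 0/1 relevance flags in ranked order."""
    for rank, rel in enumerate(results, start=1):
        if rel:
            return 1.0 / rank
    return 0.0   # no relevant document retrieved

queries = [[0, 1, 0], [1, 0, 0], [0, 0, 0, 1]]
mrr = sum(reciprocal_rank(q) for q in queries) / len(queries)
print(mrr)  # (1/2 + 1/1 + 1/4) / 3
```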

The LambdaMART Algorithm

LambdaMART is one of the learning-to-rank algorithms; the team that won the Yahoo! Learning to Rank Challenge used this model.

The name LambdaMART splits into two parts: Lambda and MART. The model is trained with MART, i.e. GBDT; lambda is the gradient used in the MART solve, and its physical meaning is the direction in which a document awaiting ranking should move in the next iteration.

Lambda was not born in LambdaMART, however: it was first proposed in the LambdaRank model, which in turn was an improvement on the RankNet model. So understanding LambdaRank requires starting from RankNet.

Paper: From RankNet to LambdaRank to LambdaMART: An Overview

RankNet

RankNet is a pairwise model. As described above, a pairwise model turns ranking into a set of pair problems, comparing the probability that document di should be ranked before document dj, as shown in the figure below.



The final output goes through a sigmoid function, and RankNet optimizes this loss with a neural network model, hence the name RankNet.
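The figure is not reproduced here; as a sketch reconstructed from the overview paper cited above (with s_i = f(x_i) the model's score for document i), the pair probability and cross-entropy loss are:

```latex
% RankNet pair probability and loss, following the
% "From RankNet to LambdaRank to LambdaMART" overview.
P_{ij} = P(U_i \triangleright U_j) = \frac{1}{1 + e^{-\sigma (s_i - s_j)}}
\qquad
C = -\bar{P}_{ij} \log P_{ij} - (1 - \bar{P}_{ij}) \log\!\left(1 - P_{ij}\right)
```

Here P̄_ij is the ground-truth probability that U_i should rank above U_j, and σ controls the shape of the sigmoid.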


But what is the problem with this? Ranking metrics such as NDCG, MAP, and MRR are not smooth functions, so RankNet can only optimize them indirectly through a surrogate loss.

LambdaRank

As shown in the figure, blue denotes relevant documents and gray irrelevant ones. Computing the cost pairwise, RankNet's cost on the left is 13; on the right, after moving the first relevant document down three positions and the second one up five, the cost drops to 11. So although RankNet's loss function has improved, metrics such as NDCG and ERR have actually gotten worse.
RankNet optimizes in the direction of the black arrows, while what we really want is the red arrows. LambdaRank is built on exactly this observation, with lambda denoting the red arrows.

LambdaRank does not solve the ranking problem by explicitly defining a loss function and then deriving its gradient. Instead it analyzes what physical meaning the gradient ought to have for ranking and defines the gradient directly. The lambda gradient is the product of two factors: (1) the gradient of RankNet's cross-entropy loss; and (2) the change in the IR evaluation metric Z when documents Ui and Uj swap positions. Concretely:
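The defining equation did not survive in this copy; a sketch reconstructed from the same overview paper (Z is the IR metric, e.g. NDCG; I is the set of pairs with U_i more relevant than U_j):

```latex
% Lambda gradient: RankNet's pairwise gradient scaled by |Delta Z_ij|,
% the metric change obtained by swapping documents U_i and U_j;
% each document's lambda sums its pairwise contributions.
\lambda_{ij} = \frac{-\sigma}{1 + e^{\sigma (s_i - s_j)}}
               \left\lvert \Delta Z_{ij} \right\rvert
\qquad
\lambda_i = \sum_{j : (i,j) \in I} \lambda_{ij}
          - \sum_{j : (j,i) \in I} \lambda_{ij}
```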


LambdaMART

Recall that in each iteration, GBDT fits the residual between the previous round's predictions and the true targets. In LambdaMART each iteration fits not the residual but lambda, which takes the residual's place.

  • LambdaMART algorithm flow


  • GBDT algorithm flow


Comparing the LambdaMART and GBDT procedures, the overall framework is the same; LambdaMART simply replaces GBDT's residuals with lambda gradients.
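A sketch of the lambda computation inside a single boosting iteration (toy scores and labels; sign conventions vary between write-ups, and a real implementation would then fit a regression tree to these per-document values):

```python
# Lambda computation for one LambdaMART iteration. A regression tree
# would be fit to these lambdas instead of to plain GBDT residuals.
import math

def dcg_gain(rel, pos):
    # gain 2^r - 1, discount 1/log2(1 + pos), 1-based position
    return (2 ** rel - 1) / math.log2(1 + pos)

def lambdas(scores, labels, sigma=1.0):
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    pos = {doc: p + 1 for p, doc in enumerate(order)}  # current rank
    lam = [0.0] * len(scores)
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] <= labels[j]:
                continue                     # keep pairs with label_i > label_j
            # |delta DCG| if documents i and j swapped positions
            delta = abs(dcg_gain(labels[i], pos[i]) + dcg_gain(labels[j], pos[j])
                        - dcg_gain(labels[i], pos[j]) - dcg_gain(labels[j], pos[i]))
            rho = 1.0 / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
            lam[i] += sigma * rho * delta    # push the better document up
            lam[j] -= sigma * rho * delta    # push the worse document down
    return lam

# doc 0 is the most relevant but currently scored below doc 1
lam = lambdas(scores=[0.2, 1.0, 0.1], labels=[2, 0, 1])
print([round(v, 3) for v in lam])  # doc 0 gets the largest positive lambda
```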

The FTRL Algorithm (Follow The Regularized Leader-Proximal)

Paper: Ad Click Prediction: a View from the Trenches

Click-through-rate (CTR) prediction is a very important module in search, advertising, and recommendation. The LR (logistic regression) model is commonly used in CTR computation.

FTRL is an online algorithm. Compared with SGD and other common online optimization methods, it produces much better sparsity, making it well suited to high-dimensional feature spaces with many ID features.

Google's paper gives a very detailed account of the engineering implementation, and the method is already in wide use.

  • Parameter optimization


First term: keeps the parameters from drifting away from the historical parameters.
Second term: keeps w from changing too much.
Third term: the L1 regularizer, which yields sparse solutions.
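The objective these three terms describe did not survive in this copy; reconstructed as a sketch from the FTRL-Proximal paper:

```latex
% FTRL-Proximal per-round objective ("Ad Click Prediction: a View from
% the Trenches"): g_{1:t} is the sum of past gradients, and the sigma_s
% are per-round weights chosen so their running sum equals 1/eta_t.
w_{t+1} = \arg\min_{w} \Big( g_{1:t} \cdot w
  + \tfrac{1}{2} \sum_{s=1}^{t} \sigma_s \lVert w - w_s \rVert_2^2
  + \lambda_1 \lVert w \rVert_1 \Big)
```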

  • Algorithm flow:
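A minimal per-coordinate sketch of the update for logistic regression, following Algorithm 1 of the paper (toy data; the hyperparameter values are arbitrary):

```python
# Per-coordinate FTRL-Proximal update for logistic regression.
import math

class FTRLProximal:
    def __init__(self, dim, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * dim   # accumulated "shifted" gradients
        self.n = [0.0] * dim   # accumulated squared gradients

    def weight(self, i):
        if abs(self.z[i]) <= self.l1:
            return 0.0         # L1 threshold gives exact zeros (sparsity)
        sign = 1.0 if self.z[i] > 0 else -1.0
        return -(self.z[i] - sign * self.l1) / (
            (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2)

    def update(self, x, y):
        """x: sparse feature dict {index: value}, y: label in {0, 1}."""
        w = {i: self.weight(i) for i in x}
        p = 1.0 / (1.0 + math.exp(-sum(w[i] * v for i, v in x.items())))
        for i, v in x.items():
            g = (p - y) * v                  # logistic-loss gradient
            sigma = (math.sqrt(self.n[i] + g * g)
                     - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w[i]
            self.n[i] += g * g
        return p

model = FTRLProximal(dim=3)
for _ in range(100):
    model.update({0: 1.0}, 1)    # feature 0 fires on positives
    model.update({1: 1.0}, 0)    # feature 1 fires on negatives
# the never-seen feature 2 keeps an exact zero weight (sparsity)
print(model.weight(0) > 0, model.weight(1) < 0, model.weight(2))
```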

Thanks to the original author for compiling this: https://zhuanlan.zhihu.com/p/26539920

XGBoost for learning to rank: https://blog.csdn.net/seasongirl/article/details/100178083

What are the differences between GBDT and XGBoost?

https://www.zhihu.com/question/41354392?sort=created

Origin www.cnblogs.com/Allen-rg/p/11529591.html