Recommendation ranking --- Learning to Rank: from pointwise and pairwise to listwise, classical models and their advantages and disadvantages

Reprinted from: https://blog.csdn.net/lipengcn/article/details/80373744

Ranking is a fundamental problem in information retrieval and an important module behind search engines.

This article surveys how machine learning techniques (learning2rank) are applied to ranking systems, covering the pointwise, pairwise, and listwise approaches: the classical models proposed for each formulation and the shortcomings that remain.

This article draws on Tie-Yan Liu's "Learning to Rank for Information Retrieval" and Hang Li's "Learning to Rank for Information Retrieval and Natural Language Processing".

1 Overview

1.1 Ranking

Ranking models can be roughly divided into two major categories: relevance-based models and importance-based models.

  • Relevance-based models typically use query-doc features such as word co-occurrence (e.g., the Boolean model), the vector space model (VSM, e.g., TF-IDF, LSI), and probabilistic ranking methods (e.g., BM25, LMIR); a BM25 sketch follows this list.
  • Importance-based models use the importance of the doc itself, e.g., PageRank, TrustRank, and so on.
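As a concrete example of a relevance-based model, here is a minimal sketch of one common BM25 variant; the parameter defaults (k1 = 1.5, b = 0.75) and the example numbers are illustrative assumptions, not values from the original post.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, df, num_docs, k1=1.5, b=0.75):
    """BM25 contribution of one query term to one doc's score (one common variant)."""
    # Inverse document frequency, smoothed so it stays positive
    idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
    # Term-frequency saturation with document-length normalization
    tf_norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * tf_norm

# A doc's score for a query is the sum of its per-term contributions, e.g. a term
# appearing 3 times in a 120-word doc (average length 100), found in 50 of 10,000 docs:
print(bm25_term_score(tf=3, doc_len=120, avg_doc_len=100, df=50, num_docs=10_000))
```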

Here we focus on relevance-based ranking.

Relevance annotation

    The most common approach, offering a relatively good trade-off between cost and quality, is to manually annotate a relevance grade (an ordinal relevance level) for each query-doc pair.
    A second approach is to manually annotate pairwise preferences, i.e., whether one doc is more relevant to the query than another.
    The most costly approach is to manually produce a complete relevance ordering of all docs for a query (the three label formats are sketched below).
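To make the three label formats concrete, here is a minimal sketch using hypothetical query and doc identifiers; the values are illustrative only.

```python
# 1. Graded relevance: one ordinal label per (query, doc) pair,
#    e.g. 0 = irrelevant, 1 = fair, 2 = highly relevant.
graded_labels = {
    ("q1", "docA"): 2,
    ("q1", "docB"): 0,
    ("q1", "docC"): 1,
}

# 2. Pairwise preferences: for q1, docA is more relevant than docB, etc.
pairwise_prefs = [
    ("q1", "docA", "docB"),
    ("q1", "docC", "docB"),
]

# 3. Total order: the full ranking of q1's docs, most relevant first.
total_order = {"q1": ["docA", "docC", "docB"]}
```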

Evaluation metrics

Evaluation metrics measure the difference between the predicted ranking of a query's docs and the true ranking.
Most metrics are defined on each query-docs group and then averaged over all groups. Commonly used ranking metrics are described below.
   

MAP
    First, assume binary relevance judgments: the label is 1 for relevant docs and 0 for irrelevant docs. For a ranked list of docs π, the precision at position k is defined as

        P@k(π, l) = ( Σ_{t = 1..k} I{ l_{π^-1(t)} = 1 } ) / k

    where π^-1(t) is the doc ranked at position t and I{·} is the indicator function.
    Next, let m be the number of docs for the query and m_1 the number of docs whose label is 1. The average precision (AP) is then

        AP(π, l) = ( Σ_{k = 1..m} P@k(π, l) · I{ l_{π^-1(k)} = 1 } ) / m_1

    Finally, MAP is obtained by averaging AP over all queries.
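To make this concrete, here is a minimal sketch (Python with NumPy; the example labels are made up) that computes AP for each query and averages over queries to obtain MAP.

```python
import numpy as np

def average_precision(labels_in_ranked_order):
    """AP for one query: binary labels given in the order of the predicted ranking."""
    labels = np.asarray(labels_in_ranked_order, dtype=float)
    m1 = labels.sum()
    if m1 == 0:
        return 0.0
    # P@k at every position k, then keep only the positions holding relevant docs
    precision_at_k = np.cumsum(labels) / np.arange(1, len(labels) + 1)
    return float((precision_at_k * labels).sum() / m1)

def mean_average_precision(per_query_labels):
    """MAP: average AP over all queries."""
    return float(np.mean([average_precision(l) for l in per_query_labels]))

# Two queries, docs already sorted by predicted score (1 = relevant, 0 = irrelevant)
print(mean_average_precision([[1, 0, 1, 0], [0, 1, 1]]))
```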
   

NDCG
    First, Discounted Cumulative Gain (DCG) handles relevance judgments given as multiple ordered grades, and discounts each doc's contribution by its position. For a ranked list of docs π, DCG at position k is defined as

        DCG@k(π, l) = Σ_{j = 1..k} G( l_{π^-1(j)} ) · η(j)
    where G maps a doc's relevance grade to a gain, usually an exponential function such as G(x) = 2^x - 1, and η(j) is the position discount factor, usually η(j) = 1 / log2(j + 1).
    Next, DCG@k is normalized into the range 0-1. Let Z_k be the maximum possible DCG@k (the DCG of the ideal ordering); NDCG is then

        NDCG@k(π, l) = DCG@k(π, l) / Z_k
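Under the same conventions (G(x) = 2^x - 1, η(j) = 1 / log2(j + 1)), here is a minimal NDCG@k sketch in Python with NumPy; the example grades are made up.

```python
import numpy as np

def dcg_at_k(grades_in_ranked_order, k):
    """DCG@k with gain 2^grade - 1 and discount 1 / log2(position + 1)."""
    grades = np.asarray(grades_in_ranked_order[:k], dtype=float)
    gains = 2.0 ** grades - 1.0
    discounts = 1.0 / np.log2(np.arange(2, len(grades) + 2))  # positions 1..k
    return float((gains * discounts).sum())

def ndcg_at_k(grades_in_ranked_order, k):
    """NDCG@k = DCG@k / Z_k, where Z_k is the DCG@k of the ideal (descending) ordering."""
    z_k = dcg_at_k(sorted(grades_in_ranked_order, reverse=True), k)
    return dcg_at_k(grades_in_ranked_order, k) / z_k if z_k > 0 else 0.0

# Graded labels (0-2) of docs listed in the predicted ranking order
print(ndcg_at_k([2, 0, 1, 2], k=4))
```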

It can be seen that these evaluation metrics share two characteristics:

    Query-based: no matter how badly the docs of a single query are ranked, the overall evaluation is not severely affected, because every query-docs group contributes equally to the averaged metric.
    Position-based: the position information of the ranked list is used explicitly. A side effect of this property is that the metrics are discrete and non-differentiable.

On the one hand, because these metrics are discrete and non-differentiable, they cannot be optimized directly by many learning algorithms; on the other hand, as the authoritative evaluation criteria, they are what ranking models trained in all kinds of ways are ultimately judged by. Therefore, even models that construct their loss functions in novel ways draw inspiration from these metrics and conform to the two properties above. These details will become clear later on.

1.2 Learning to Rank

Learning2Rank applies ML techniques to the ranking problem in order to train a ranking model; discriminative supervised ML algorithms are usually used. The classic L2R framework is as follows:

  • x is a feature vector reflecting the relevance between a query and a corresponding doc; each of the conventional relevance ranking models mentioned earlier can serve as one feature dimension.
  • The supervised ML methods used in L2R are mainly discriminative; a minimal pointwise sketch follows this list.
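Here is a minimal pointwise sketch of this setup, assuming scikit-learn; the feature values and the choice of logistic regression are illustrative assumptions rather than the specific method of the original post.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row is a feature vector x for one (query, doc) pair; hypothetical feature
# dimensions: [BM25 score, TF-IDF cosine similarity, PageRank].
X = np.array([
    [12.3, 0.81, 0.40],
    [ 3.1, 0.22, 0.10],
    [ 8.7, 0.55, 0.90],
])
y = np.array([1, 0, 1])  # binary relevance labels (pointwise formulation)

model = LogisticRegression().fit(X, y)

# At query time, score candidate docs and sort by predicted relevance probability.
scores = model.predict_proba(X)[:, 1]
ranking = np.argsort(-scores)
print(ranking)
```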

 
