A collection of ranking metrics (Part 1)

1. MAP (Mean Average Precision):

The average precision (AP) for a single topic is the mean of the precision values measured each time a relevant document is retrieved. The mean average precision (MAP) over a set of topics is the mean of the per-topic average precision. MAP is a single-valued metric that reflects the system's performance across all relevant documents. The higher the system ranks the relevant documents, the higher the MAP is likely to be. If the system fails to return a relevant document, the precision for that document is taken to be 0.

Relevance labels are binary: 1 means relevant and 0 means irrelevant.

For example, suppose there are two topics: topic 1 has 4 relevant web pages and topic 2 has 5 relevant web pages. For topic 1, a system retrieves all 4 relevant pages, at ranks 1, 2, 4, and 7. For topic 2, it retrieves 3 relevant pages, at ranks 1, 3, and 5. For topic 1, the average precision is (1/1+2/2+3/4+4/7)/4=0.83. For topic 2, the average precision is (1/1+2/3+3/5+0+0)/5=0.45. Then MAP=(0.83+0.45)/2=0.64.
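The worked example above can be reproduced with a short Python sketch. The function names `average_precision` and `mean_average_precision` are illustrative choices, not taken from any library, and the input format (a list of 1-based ranks plus the total number of relevant pages) is an assumption made for brevity.

```python
def average_precision(relevant_ranks, num_relevant):
    """Average precision for one topic.

    relevant_ranks: 1-based ranks at which relevant pages were retrieved.
    num_relevant: total number of relevant pages for the topic; relevant
                  pages that were never retrieved contribute a precision of 0.
    """
    if num_relevant == 0:
        return 0.0
    ranks = sorted(relevant_ranks)
    # The j-th relevant page, found at rank r, has precision j / r.
    precisions = [(j + 1) / r for j, r in enumerate(ranks)]
    return sum(precisions) / num_relevant


def mean_average_precision(topics):
    """topics: list of (relevant_ranks, num_relevant) pairs, one per topic."""
    return sum(average_precision(r, n) for r, n in topics) / len(topics)


topics = [([1, 2, 4, 7], 4), ([1, 3, 5], 5)]
ap1 = average_precision(*topics[0])
ap2 = average_precision(*topics[1])
map_score = mean_average_precision(topics)
print(round(ap1, 2), round(ap2, 2), round(map_score, 2))  # 0.83 0.45 0.64
```

Dividing by `num_relevant` rather than by the number of retrieved relevant pages is what makes the two missing pages of topic 2 count as zeros.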

2. MRR (Mean Reciprocal Rank):

MRR takes the reciprocal of the rank at which the correct answer appears in the results returned by the system being evaluated as the score for that question, and then averages over all questions. It is relatively simple. For example, consider three queries:
(Figure: table of three queries and their returned results; the best match in each returned list is shown in bold.)

With the best matches at ranks 3, 2, and 1, the MRR value of this system can be calculated as: (1/3 + 1/2 + 1)/3 = 11/18 ≈ 0.61.
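As a minimal sketch, the same calculation in Python. The function name `mean_reciprocal_rank` is illustrative, and treating a query with no correct answer (`None`) as a reciprocal rank of 0 is an assumption that the text does not state.

```python
def mean_reciprocal_rank(first_correct_ranks):
    """first_correct_ranks: 1-based rank of the correct answer for each query.

    Assumption: None means the correct answer was not returned at all,
    and such a query contributes 0 to the average.
    """
    reciprocal = [0.0 if r is None else 1.0 / r for r in first_correct_ranks]
    return sum(reciprocal) / len(reciprocal)


# Ranks of the correct answers implied by the calculation (1/3 + 1/2 + 1)/3.
mrr = mean_reciprocal_rank([3, 2, 1])
print(round(mrr, 2))  # 0.61
```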

3. NDCG (Normalized Discounted Cumulative Gain):

NDCG is more complex than MAP and MRR, but it is also one of the best metrics for evaluating the quality of information retrieval. Let me first give an example of how NDCG is calculated, because in practice there are several variants of the calculation.

I will first introduce CG and DCG. On this basis, the definition of NDCG will be easier to understand.

3.1 CG and DCG

Most readers are probably more familiar with MAP, which evaluates an ordering of binary (0/1) labels. NDCG instead takes graded relevance scores into account.

When it comes to NDCG, we need to start with CG.

CG (cumulative gain) can be used to evaluate rating-based personalized recommendation systems. It can of course be applied to any ranking scenario; here we only take recommendation as an example. Suppose we recommend $k$ items. The cumulative gain $CG_k$ of this recommendation list is calculated as follows:
$$CG_k=\sum_{i=1}^k \text{rel}_i.$$
$\text{rel}_i$ represents the relevance or rating of the $i$-th item. Suppose we recommend $k$ movies; $\text{rel}_i$ can be the user's rating of the $i$-th movie.

For example, Douban recommends five movies to users.

$$M_1, M_2, M_3, M_4, M_5$$

The user's ratings for these five movies are

$$5, 3, 2, 1, 2$$

Then the CG of this recommendation list is equal to
$$CG_5=5+3+2+1+2=13.$$
CG does not consider the order of the recommendations. Adding a position-dependent discount to account for the order of the items gives DCG (discounted cumulative gain). The formula is as follows:

$$DCG_k=\sum_{i=1}^k \frac{2^{\text{rel}_i}-1}{\log_2(i+1)}.$$
For example, Douban recommends five movies to users.

$$M_1, M_2, M_3, M_4, M_5$$

The user's ratings for these five movies are

$$5, 3, 2, 1, 2$$

Then the DCG of this recommendation list is equal to
$$DCG_5=\frac{2^5-1}{\log_2 2}+\frac{2^3-1}{\log_2 3}+\frac{2^2-1}{\log_2 4}+\frac{2^1-1}{\log_2 5}+\frac{2^2-1}{\log_2 6}=31+4.4+1.5+0.4+1.2=38.5$$
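The CG and DCG calculations above can be checked with a small Python sketch. The helper names `cumulative_gain` and `dcg` are illustrative; the gain $2^{\text{rel}}-1$ and the discount $\log_2(i+1)$ follow the formula in this section.

```python
import math


def cumulative_gain(ratings, k):
    """CG_k: plain sum of the first k ratings, ignoring their order."""
    return sum(ratings[:k])


def dcg(ratings, k):
    """DCG_k with gain 2^rel - 1 and discount log2(i + 1), positions from 1."""
    return sum(
        (2 ** rel - 1) / math.log2(i + 1)
        for i, rel in enumerate(ratings[:k], start=1)
    )


ratings = [5, 3, 2, 1, 2]
reversed_ratings = list(reversed(ratings))

cg_value = cumulative_gain(ratings, 5)
dcg_value = dcg(ratings, 5)
print(cg_value)                              # 13
print(cumulative_gain(reversed_ratings, 5))  # 13, CG ignores order
print(round(dcg_value, 2))                   # 38.51
print(dcg(reversed_ratings, 5) < dcg_value)  # True, DCG rewards good items early
```

The unrounded DCG is about 38.51; the 38.5 in the text comes from rounding each term to one decimal place before adding.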

3.2 NDCG

When evaluating a strategy, DCG does not take into account how many truly useful results each recommendation list or search can contain. In other words, because different search models return more or fewer results (the size of P is different), the DCG values of two models cannot be compared directly. To avoid this, the metric is refined into NDCG (normalized DCG). As the name suggests, it normalizes the score of a strategy so that the effects of different strategies can be compared. The formula is as follows:

$$NDCG_k=\frac{DCG_k}{IDCG_k}$$
Here IDCG is the ideal DCG, that is, the DCG of a perfect ranking.

Continuing the above example, suppose there are 7 relevant movies in total

$$M_1, M_2, M_3, M_4, M_5, M_6, M_7$$
The user's ratings for these seven movies are

$$5, 3, 2, 1, 2, 4, 0$$

Sort these 7 movies by rating

$$5, 4, 3, 2, 2, 1, 0$$

The perfect DCG in this case is
$$IDCG_5=\frac{2^5-1}{\log_2 2}+\frac{2^4-1}{\log_2 3}+\frac{2^3-1}{\log_2 4}+\frac{2^2-1}{\log_2 5}+\frac{2^2-1}{\log_2 6}=31+9.5+3.5+1.3+1.2=46.5$$
so

$$NDCG_5 = \frac{DCG_5}{IDCG_5}=\frac{38.5}{46.5}\approx 0.828$$
NDCG is a number from 0 to 1. The closer it is to 1, the more accurate the recommendation is.
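A minimal Python sketch of this normalization, using the seven-movie example. The function names are illustrative, and returning 0 when the ideal DCG is 0 is an assumption that the text does not specify.

```python
import math


def dcg(ratings, k):
    """DCG_k with gain 2^rel - 1 and discount log2(i + 1)."""
    return sum(
        (2 ** rel - 1) / math.log2(i + 1)
        for i, rel in enumerate(ratings[:k], start=1)
    )


def ndcg(returned_ratings, all_ratings, k):
    """NDCG_k = DCG_k / IDCG_k, with IDCG taken over all candidate items."""
    ideal_order = sorted(all_ratings, reverse=True)
    idcg = dcg(ideal_order, k)
    # Assumption: report 0 when no candidate has any gain.
    return dcg(returned_ratings, k) / idcg if idcg > 0 else 0.0


returned = [5, 3, 2, 1, 2]           # ratings of the 5 recommended movies
all_movies = [5, 3, 2, 1, 2, 4, 0]   # ratings of all 7 relevant movies

idcg_value = dcg(sorted(all_movies, reverse=True), 5)
ndcg_value = ndcg(returned, all_movies, 5)
print(round(idcg_value, 2))  # 46.42
print(round(ndcg_value, 2))  # 0.83
```

The unrounded result is about 0.830, slightly above the hand calculation, because the text rounds each term of DCG and IDCG to one decimal place before dividing.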

3.3 Explanation of differences in NDCG calculations

We use NDCG@n to denote the NDCG value computed over the top n ranked items.

NDCG can be broken down into four parts: N (Normalization), D (Discounted), C (Cumulative), and G (Gain). Based on the descriptions of the four parts below, they combine into NDCG as follows:

$$\text{NDCG@}n(q)=\frac{1}{N}\sum_{i=1}^{n}\frac{G_i}{D_i}$$

where $q$ represents a query and $n$ is the number of top-ranked returned answers used to calculate the NDCG of this query.

  • G can be understood as the bonus points a returned answer earns for this query. The size of G has nothing to do with the answer's position; it only depends on the quality of the answer.
  • D can be understood as an appropriate deduction from the bonus points. An answer that appears earlier should earn more points, and an answer that appears later should earn fewer. Since the bonus G does not depend on the position of the answer, its size has to be adjusted by D. So D is a quantity that increases as the answer position increases.
  • C accumulates G/D over positions 1 to n to obtain the quality score of this query.
  • N normalizes the score. N can be understood as the score under ideal circumstances, that is, the highest score that can be obtained.

Differences in calculating NDCG

The NDCG formula is more complex than the MAP and MRR formulas, so there is more room for differences in how it is calculated. Everyone agrees that C is a cumulative sum, but the calculations of N, D, and G may differ.

  • The difference in G is relatively large. Some take the relevance score rel directly as the value of G, some take $2^{rel}-1$, and there are other expressions as well. What they have in common is that the relevance scores are rel = {0,1,2…}. The above example uses $2^{rel}-1$ as the value of G.
  • What the variants of D have in common is that they take a value of the form log(i). Obviously, when i=1, D=0 and cannot be used as the denominator, so two different workarounds can be found. In the first, D takes 1 when i=1 and log(i) otherwise. In the second, D=log(1+i). The above example uses the second, D=log(1+i).
  • Two calculation methods are also found for N. What they have in common is that both compute N with the same DCG formula; the difference lies in which values are fed into it.
    • The first method takes the top n of the currently returned results, reorders them optimally, and uses the DCG of that ordering as N. For example, if the relevance scores of a group for NDCG@5 are X={1,0,2,2,1}, reorder them to X={2,2,1,1,0} and use the DCG of that ordering as N. That is, every value in set X must appear in the returned answers. However, if the relevance scores of the first n returned items are all 0, N also becomes 0 and the division is undefined.
    • The second method forms a set X from the best n answers in the entire search space, arranges them from high to low, and uses the DCG of that ordering as N. The values in set X are not required to appear in the answers returned by the system. The above example uses the second method.
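The variants listed above can be put side by side in a short Python sketch. All function and parameter names here are illustrative, base-2 logarithms are assumed for both discounts, and returning `None` when the normalizer is 0 is one possible way to flag the undefined case, not something the text prescribes.

```python
import math


def gain_linear(rel):
    """G variant 1: use the relevance score directly."""
    return float(rel)


def gain_exponential(rel):
    """G variant 2: 2^rel - 1, as in the worked example."""
    return 2.0 ** rel - 1.0


def discount_clamped(i):
    """D variant 1: 1 at position 1, log2(i) afterwards."""
    return 1.0 if i == 1 else math.log2(i)


def discount_shifted(i):
    """D variant 2: log2(1 + i), as in the worked example."""
    return math.log2(i + 1)


def dcg(ratings, n, gain, discount):
    return sum(
        gain(rel) / discount(i)
        for i, rel in enumerate(ratings[:n], start=1)
    )


def ndcg(returned, n, gain, discount, pool=None):
    """pool=None: N variant 1, best ordering of the returned top n.
    pool given: N variant 2, best n ratings in the whole search space."""
    candidates = returned[:n] if pool is None else pool
    ideal = sorted(candidates, reverse=True)
    norm = dcg(ideal, n, gain, discount)
    if norm == 0:
        return None  # all-zero relevance: the ratio is undefined
    return dcg(returned, n, gain, discount) / norm


returned = [5, 3, 2, 1, 2]
pool = [5, 3, 2, 1, 2, 4, 0]

local = ndcg(returned, 5, gain_exponential, discount_shifted)
pooled = ndcg(returned, 5, gain_exponential, discount_shifted, pool=pool)
all_zero = ndcg([0, 0, 0], 3, gain_exponential, discount_shifted)
print(local > pooled)  # True: variant 1 never sees the unreturned movie rated 4
print(all_zero)        # None
```

Under the first normalization the example list scores close to 1, because the movie rated 4 that was never recommended is invisible to it; the second normalization penalizes that omission.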

3.4 Further understanding of NDCG

For a search engine, the essence is that the user submits a query and the engine returns a list of results. So how do we measure the quality of this list? What I can think of is:
  • We want the most relevant results at the top of the ranking. Most users read from top to bottom, so putting the most relevant results first minimizes the user's reading time.
  • We want the results across the entire list to be as relevant to the query as possible.
Satisfying the first condition is the priority, and the second condition is added to ensure overall result quality. Both conditions are reflected in NDCG. First, calculating NDCG requires the Gain, which defines the quality of each individual result. NDCG sums the gains of all results, which ensures that the higher the overall quality, the greater the NDCG value of the list. At the same time, the Discount gives results nearer the top a higher weight, which ensures that lists placing more relevant results first have a larger NDCG value. From these two points, using NDCG as the optimization target ensures that a search engine ranks higher-quality results higher while keeping the overall quality of the returned results good.
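Both conditions can be seen in a small numeric comparison. The sketch below is illustrative only: the helper names and the three example lists are made up for this purpose, the ratings pool is the seven-movie example from section 3.2, and the normalizer is taken over the whole pool.

```python
import math


def dcg(ratings):
    """DCG with gain 2^rel - 1 and discount log2(i + 1)."""
    return sum(
        (2 ** rel - 1) / math.log2(i + 1)
        for i, rel in enumerate(ratings, start=1)
    )


def ndcg(returned, pool):
    """Normalize by the best len(returned) ratings in the whole pool."""
    ideal = sorted(pool, reverse=True)[:len(returned)]
    return dcg(returned) / dcg(ideal)


pool = [5, 4, 3, 2, 2, 1, 0]

best_first = ndcg([5, 4, 3], pool)    # best items, best order
worst_first = ndcg([3, 4, 5], pool)   # same items, reversed order
weaker_list = ndcg([5, 2, 1], pool)   # good first result, weaker overall

print(best_first)                # 1.0
print(worst_first < best_first)  # True: order matters (condition 1)
print(weaker_list < best_first)  # True: overall quality matters (condition 2)
```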


Origin blog.csdn.net/u014665013/article/details/119856043