A Summary of Ranking and Recall Metrics for Recommendation Algorithms

Foreword

Recently I decided to summarize the evaluation metrics commonly used in recommendation. I consider these among the most basic knowledge points. The list is not complete; it only covers the common ones I think are fundamental, and I will add others as I encounter them in my work.

For example, we often hear about AUC, MAP (Mean Average Precision), HR (Hit Ratio), NDCG (Normalized Discounted Cumulative Gain), and so on.

1. ROC and AUC

I won't go over these in detail here. They are mainly used to evaluate a model's performance on binary classification problems, and I have already written a very detailed article on them. The most important one is AUC: you must know how to compute it, and you must also be able to hand-code the calculation. For details, please refer to:

A detailed explanation of ROC and AUC in recommender systems

Through that article, you should be able to pick up the evaluation metrics and loss functions commonly used in ML. The key point for classification is AUC, which is generally required in interviews. After that, let's introduce some other metrics; the ones below mainly evaluate a top-K recommendation list.
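Before moving on, since hand-rolling the AUC calculation was emphasized above, here is a minimal sketch of one standard way to do it (my own illustration, not code from the linked article), using the rank-statistic formula: sort samples by score, sum the ranks of the positive samples, and normalize by the number of positive-negative pairs. Tied scores share an averaged rank.

```python
def auc_by_rank(labels, scores):
    """Hand-rolled AUC: (sum of positive ranks - n_pos*(n_pos+1)/2) / (n_pos*n_neg).
    Equivalent to the fraction of (positive, negative) pairs ranked correctly."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # ascending by score
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1  # extend the group of tied scores
        avg_rank = (i + j + 2) / 2.0  # ranks are 1-based: average of i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

print(auc_by_rank([1, 0, 1, 0, 1], [0.9, 0.4, 0.7, 0.5, 0.3]))  # 4 of 6 pairs correct -> 0.667
```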

2. Hit Ratio (HR)

In top-K evaluation, HR is a commonly used recall-style metric, and it is also easy to understand. First look at the calculation formula:

$$\mathrm{HR@K} = \frac{\text{NumberOfHits@K}}{|GT|}$$

The denominator $|GT|$ is the total number of items in the test set across all users, and the numerator is the sum, over all users, of the number of items in each user's top-K list that belong to the test set. A simple example: three users have 10, 12, and 8 items in the test set respectively, and the model's top-10 recommendation lists contain 6, 5, and 4 of those test items. The HR here is therefore (6+5+4)/(10+12+8) = 0.5.
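To make this concrete, here is a minimal Python sketch of HR@K reproducing the three-user example above (the function name and item IDs are my own for illustration):

```python
def hit_ratio_at_k(rec_lists, test_sets, k=10):
    """HR@K: total top-K hits across users / total number of test items."""
    hits = sum(len(set(recs[:k]) & gt) for recs, gt in zip(rec_lists, test_sets))
    total = sum(len(gt) for gt in test_sets)
    return hits / total

# Three users with 10, 12, and 8 test items; their top-10 lists hit 6, 5, and 4 of them.
test_sets = [set(range(10)), set(range(100, 112)), set(range(200, 208))]
rec_lists = [
    list(range(6)) + [900, 901, 902, 903],                   # 6 hits
    list(range(100, 105)) + [910, 911, 912, 913, 914],       # 5 hits
    list(range(200, 204)) + [920, 921, 922, 923, 924, 925],  # 4 hits
]
print(hit_ratio_at_k(rec_lists, test_sets))  # (6+5+4)/(10+12+8) = 0.5
```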

3. Mean Average Precision (MAP)

I have already written about precision and recall, and MAP is summarized at the end of that article. You can go and have a look:

A detailed explanation of the object-detection metric mAP
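The linked article covers mAP in the object-detection setting; for a recommendation list with binary relevance, a minimal sketch of AP/MAP (my own illustration, under the common convention of dividing by the number of relevant items) looks like this: AP averages the precision at each position where a hit occurs, and MAP averages AP across users.

```python
def average_precision(recs, gt):
    """AP: sum of precision@i at each hit position i, divided by |relevant items|."""
    hits, precisions = 0, []
    for i, item in enumerate(recs, start=1):
        if item in gt:
            hits += 1
            precisions.append(hits / i)  # precision at this hit position
    return sum(precisions) / len(gt) if gt else 0.0

def mean_average_precision(rec_lists, test_sets):
    """MAP: average of the per-user AP values."""
    aps = [average_precision(r, g) for r, g in zip(rec_lists, test_sets)]
    return sum(aps) / len(aps)

print(average_precision(["a", "x", "b"], {"a", "b"}))  # (1/1 + 2/3) / 2 = 0.833...
```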

4. Normalized Discounted Cumulative Gain (NDCG)

The abbreviation is a bit hard to remember, and the full name, Normalized Discounted Cumulative Gain, is a mouthful. But it becomes much easier to remember once you break it down piece by piece.

1. Cumulative gain (CG)

For a user, the model predicts a relevance score for each of the user's items. Suppose the model scores p items; CG simply adds up the p predicted scores, giving the user's cumulative gain:

$$\mathrm{CG}_p = \sum_{i=1}^{p} rel_i$$

That is CG: very simple, just the sum of the relevance scores. But there is a problem: it does not consider the position of each item. For example, suppose the ideal ordering is B3, B2, B1, while my final recommendation list is B1, B3, B2. To CG, the two are exactly the same.
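CG is just a sum, so a tiny sketch suffices (the relevance scores are hypothetical):

```python
rel = [3, 2, 3, 0, 1, 2]  # hypothetical graded relevance of the top-p items
print(sum(rel))  # CG_p = 11; identical for any reordering of the same items
```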

2. Discounted Cumulative Gain (DCG)

To account for the effect of position in the results, we always want the items highly relevant to the user ranked near the top and the less relevant ones near the bottom; clearly, if weakly relevant results occupy the top positions, the user experience suffers badly. Therefore, on the basis of CG, a position-based discount factor is introduced, giving DCG (Discounted Cumulative Gain): the contribution of lower-ranked recommendation results is "discounted". The further down the list, the lower the value.

In other words, DCG weights each position: the lower the rank, the smaller the weight. The position-weighted CG is:

$$\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i+1)}$$

This is DCG, CG with position weighting. Two conclusions follow:

        The more relevant the recommended results, the larger the DCG.
        The more the highly relevant results are concentrated at the front of the list, the better the recommendation and the larger the DCG.
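Both conclusions can be checked with a minimal DCG sketch under the formula above (the toy relevance lists are my own):

```python
import math

def dcg(rel):
    """DCG_p = sum over positions i of rel_i / log2(i + 1), with i starting at 1."""
    return sum(r / math.log2(i + 1) for i, r in enumerate(rel, start=1))

print(dcg([3, 2, 1]))  # high relevance up front -> ~4.76
print(dcg([1, 3, 2]))  # same items, worse order -> ~3.89
```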

As for NDCG, its purpose is to compare recommendation lists across different users. The CG and DCG above are computed for a single user, which makes the calculation look very simple, but what if we want to compare the results of multiple users, or compute an overall average? We cannot directly average the DCG values, because DCG is computed over each user's own sorted list, and that list obviously differs from user to user, so simply summing and dividing by the number of users makes no sense. The idea of NDCG is therefore to first normalize each user's DCG and then average over all users. Simply put, each user's DCG is scaled by a normalizing factor before averaging, and this normalized DCG is the NDCG.

3. Normalized Discounted Cumulative Gain (NDCG)

DCG still has a shortcoming: it is hard to compare across different users' recommendation lists, and we cannot evaluate a recommender system using only one user's recommendation list and its results; we must evaluate over all users in the test set and their recommendation lists. The evaluation scores of different users' lists therefore need to be normalized, because the model recommends different items to each user, so we need a way to compare them on an equal footing, each relative to its own scale. This is where "normalization" comes in, making horizontal comparison possible.

Before introducing NDCG, you need to know one more concept: IDCG (Ideal DCG). For a given user, it is the DCG of the best possible recommendation list the system could return, that is, assuming the returned results are sorted by relevance with the most relevant placed first, the DCG of that ordering is the IDCG. Therefore DCG takes values in (0, IDCG], and NDCG takes values in (0, 1]. The NDCG formula:

$$\mathrm{NDCG}_p = \frac{\mathrm{DCG}_p}{\mathrm{IDCG}_p}$$

IDCG is the maximum DCG value under this ideal ordering:

$$\mathrm{IDCG}_p = \sum_{i=1}^{|REL_p|} \frac{rel_i}{\log_2(i+1)}$$

where $|REL_p|$ denotes the set of the first p results after sorting all results in descending order of relevance, i.e., the optimal ordering.
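Putting the pieces together, here is a minimal self-contained NDCG sketch under the formulas above (the toy data are my own):

```python
import math

def dcg(rel):
    """DCG_p = sum over positions i of rel_i / log2(i + 1)."""
    return sum(r / math.log2(i + 1) for i, r in enumerate(rel, start=1))

def ndcg(rel):
    """NDCG_p = DCG_p / IDCG_p; IDCG is the DCG of the descending-sorted scores."""
    ideal = dcg(sorted(rel, reverse=True))
    return dcg(rel) / ideal if ideal > 0 else 0.0

print(ndcg([3, 2, 1]))  # ideal order -> 1.0
print(ndcg([1, 3, 2]))  # same items, worse order -> ~0.82
```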

That's basically it for this post. It is certainly not complete, but I think it is enough for common use. I hope it helps everyone get an intuitive feel for the evaluation metrics commonly used in ML, DL, and recommendation. (^_-)

Reference: Rolling Xiaoqiang