A brief overview of smoothing methods in CTR estimation

Source: https://cloud.tencent.com/developer/article/1005257

In evaluating online advertising delivery, CTR (click-through rate) is one of the most widely used metrics, and CTR prediction is a popular field of data mining. The task in the SPA competition held by Tencent TSA, predicting the conversion rate of mobile APP advertisements, is closely related, so the methods predecessors have used for CTR prediction are worth studying for this competition. Smoothing the CTR is one such method: in the preliminary round the author found that, compared with no smoothing, adding smoothing improved the score by between 0.0005 and 0.002 (the exact gain depends on how the statistics are computed, on parameter settings, and so on; the author makes no strong claim beyond observing an improvement of this size). The rest of this article covers three aspects: 1. Why smoothing should be added; 2. Details of the relevant methods; 3. Extra processing that can be done across different days.

1. Why add smoothing

First, when building a CTR model we often use the historical conversion rate of an advertisement ID or of a user as a feature, and this feature often carries a large weight in the final model. However, the raw conversion rate computed by simple division tends to have high variance. For example: Ad A was shown 200 times in the past and converted 4 times, for a conversion rate of 2%. Ad B was clicked 10 times and converted 0 times, for a conversion rate of 0%. Can we conclude that A's conversion rate is truly higher than B's? With so few samples for B, the confidence in that conclusion is very low.

Moreover, new advertisements appear from time to time and must still be predicted, and such advertisements have very little history. In these cases we need to smooth the conversion rates of ads with few clicks, reducing the noise contributed by low-click data while avoiding any large impact on ads with many samples.

2. Relevant details

(1) Add-Lambda Smoothing

First, the simplest smoothing method:

    smoothed_ctr = (clicks + λ) / (impressions + λ)

which adds a constant lambda (e.g. 0.001, 1, 10) to both the numerator and the denominator, thereby avoiding the problem above, where an ad with no clicks would have its click-through rate incorrectly estimated as 0%.

However, add-lambda smoothing also has its drawbacks: even after adding lambda, an estimate computed from a small number of samples still has high variance.
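The add-lambda estimate can be sketched as a small helper (the function name is the author's illustration, not from the original post; Python is used throughout for examples):

```python
def add_lambda_ctr(clicks, impressions, lam=1.0):
    """Add-lambda smoothing: add lam to both numerator and denominator."""
    return (clicks + lam) / (impressions + lam)

# Ad A from the example above: 4 conversions out of 200 impressions.
ctr_a = add_lambda_ctr(4, 200)   # (4 + 1) / (200 + 1), about 2.5%
# Ad B: 0 conversions out of 10 impressions -- no longer estimated as 0%.
ctr_b = add_lambda_ctr(0, 10)    # (0 + 1) / (10 + 1), about 9.1%
```

Note that B's estimate, while no longer 0%, jumps to roughly 9% from a single pseudo-count, which illustrates exactly the small-sample drawback just mentioned.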

(2) Additive smoothing generalized to the case of known incidence rates

When, besides this feature, we have a more reliable conversion rate from other features that can serve as a prior, we can incorporate that information into the smoothing:

    smoothed_ctr_i = (clicks_i + λ · μ_i) / (impressions_i + λ)

where μ = (μ1, …, μd) is the vector of conversion rates on the other features. For example, in the TSA competition, because the connectionType feature has low dimensionality and each value has enough samples, we can use the conversion rate of each connectionType as our μ. The smoothed conversion rate then has lower noise, is less prone to overfitting, and better matches reality. The author learned this method from Owenzhang's solution to the Kaggle Avazu competition, which readers can study in depth. One problem remains: how large should lambda be set? The author has no good answer (readers with better ideas are welcome to discuss); this may simply be a parameter that needs tuning, which can cost a lot of time.
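A minimal sketch of this prior-based smoothing, where the per-connectionType prior rate and all numbers are illustrative assumptions rather than competition data:

```python
def prior_smoothed_ctr(clicks, impressions, mu, lam=100.0):
    """Shrink the raw rate toward the prior mu; lam acts as a pseudo-count."""
    return (clicks + lam * mu) / (impressions + lam)

# Illustrative prior: overall conversion rate of this ad's connectionType.
mu_wifi = 0.025

# An ad with little history: 0 conversions out of 10 impressions.
# The estimate is pulled toward the 2.5% prior instead of collapsing to 0%.
rate_new_ad = prior_smoothed_ctr(0, 10, mu_wifi)

# An ad with plenty of history: the lam * mu pseudo-counts are swamped
# by the real data, so the estimate stays close to the empirical 2%.
rate_old_ad = prior_smoothed_ctr(2000, 100000, mu_wifi)
```

This shrinkage behavior is the point of the method: low-sample ads borrow strength from the prior, while well-observed ads keep essentially their own empirical rate.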

3. Additional processing that can be done for different days

First, when we compute the conversion rate over the preceding days, we usually treat the days uniformly, giving each day the same weight in the statistics. In reality, yesterday's conversion rate is more credible than that of a day further in the past. We can therefore assign each day a weight, increasing the weight of recent conversion rates and decreasing the weight of older ones, making the constructed feature more reliable. This is a specific method the author saw in a paper on CTR published by Yahoo Labs, which readers can look up.
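One simple way to sketch this day-weighting idea is an exponential decay over daily counts; the decay factor gamma below is an assumption for illustration, not a value taken from the Yahoo paper:

```python
def decayed_ctr(daily_clicks, daily_impressions, gamma=0.9):
    """Time-decayed conversion rate.

    Lists are ordered oldest -> newest; the most recent day gets
    weight 1 and each older day is down-weighted by a factor of gamma.
    """
    num = 0.0
    den = 0.0
    for age, (c, imp) in enumerate(zip(reversed(daily_clicks),
                                       reversed(daily_impressions))):
        w = gamma ** age          # age 0 is the most recent day
        num += w * c
        den += w * imp
    return num / den

# Two days of history where the only conversion happened on the older day:
# the decayed rate falls below the uniform rate of 1/200 = 0.5%.
rate = decayed_ctr([1, 0], [100, 100], gamma=0.5)
```

With gamma = 0.5 the older day counts half as much, so the example yields 0.5 / 150 instead of 1 / 200, reflecting that the recent, conversion-free day is trusted more.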

(PS: the improvement the author obtained from this method was larger than the improvement from the methods above.)

Finally, since the author has only recently started working on CTR competitions, some of the opinions above may be biased. If you spot any problems, please point them out; the author hopes to make progress together with fellow competitors.

Reference Links:

1. http://cs229.stanford.edu/notes/cs229-notes2.pdf
2. https://www.cs.jhu.edu/~jason/465/PowerPoint/lect05-smoothing.ppt
3. http://www.cs.cmu.edu/~xuerui/papers/ctr.pdf
4. https://github.com/owenzhang/kaggle-avazu
