Feature Engineering for Internet Advertising Click-Through Rate Estimation

Reposted from http://blog.csdn.net/mytestmy/article/details/19088827

Disclaimer:

1) This post is compiled from material that many experts have generously shared online. See the references for the specific sources. For the specific version statements, also refer to the original documents.

2) This post is for academic exchange only, not for commercial use. For that reason the references for each part are not matched up in detail, and some parts were copied directly from other blogs. If any part inadvertently infringes on anyone's interests, please forgive me and contact me to delete or modify it until the affected party is satisfied.

3) My knowledge is limited, so mistakes in this compilation are inevitable. I hope more experienced readers will not hesitate to correct me. Thank you.

4) Reading this post requires some background in machine learning, statistical learning theory, and optimization algorithms (if you lack it, no matter: just skim it and treat it as material for bragging to classmates).

5) I have Word and PDF versions of this post and can upload them to CSDN for download if needed.

 

1. Feature Engineering for Internet Advertising

The blog post "Overview of Internet advertising click-through rate systems" discussed systems for estimating Internet ad click-through rate (CTR). There you can see that the logistic regression model is relatively simple and practical. It can be trained in a variety of ways, but the objective is the same. The trained result has a fairly large impact on effectiveness, but the training method itself is not decisive: what training produces is the weight of each feature, and subtle differences in those weights do not change the estimated CTR much.
Once the training method is fixed, what plays the decisive role in CTR estimation is the choice of features.

 

1.1 Feature Selection and Use

CTR estimation needs two kinds of data: ad data and user data. With all of this data in hand, the job is to use it to evaluate how likely (that is, with what probability) a given user is to click a given ad.
There are many user features: age, gender, region, occupation, school, mobile platform and so on. Ad features are also rich: ad size, ad text, the industry the ad belongs to, ad images. There are also feedback features, such as each ad's real-time CTR and the CTR of an ad crossed with gender. Choosing, from so many features, the ones that capture a person's interest in an ad is a major problem for data mining engineers.
Once features are selected, how they are used also matters. For example, if a person's age is used as a single feature, what does training ultimately learn? Since adding and subtracting ages is meaningless, each age can only be treated as its own feature, but is that alone enough? How to use features is a major issue for ad algorithm engineers.


1.1.1 Selecting Features

What kinds of features are suitable for CTR estimation? Many ad algorithm engineers have to think about this question.
Most discussion of machine learning is about models and algorithms; features are rarely discussed. In real applications, most of a data mining engineer's work is thinking up features and validating them.
Thinking up features is hard mental work that requires a lot of domain knowledge. More frustrating still, the industry has no systematic method for coming up with features, only methods for validating them. For the Internet advertising industry, here is a brief look at where common features come from.
Take age first. How do we know it is related to CTR? The intuitive explanation is that young people generally like sports ads, people around 30 like ads for cars, houses and the like, and people over 50 like ads for health-care products. As you can see, the grounds for choosing age as a feature are a rough division based on people of different ages liking different kinds of things, which is quite subjective.
Next, gender. The intuition is that men generally like ads for sports cars and travel, while women generally like cosmetics and clothing ads. Here too, the grounds for selecting gender as a feature are similar: the general experience that men and women like different things.
As for region, the reasoning is even more tenuous: do people in South China prefer anime and games, and people in North China prefer alcohol and cigarettes?
Among ad features, can the size of the ad image or the ad's foreground and background colors really affect whether people click? This is in fact speculation. Whether the image shows a celebrity or an animal may also be taken into account.
In short, thinking up features is mostly not a rigorous exercise: you have to imagine plausible stories, and the more you know about different walks of life, the more features you can think of. Even if a feature seems to have little to do with people, it is still worth validating. It is basically like a man who wants to come home late and needs an excuse: he has to think of one that explains things plausibly, and if there is no excuse he has to find one.
Once features have been thought up, they must be validated and confirmed.
There are many ways to validate a feature: direct observation of CTR, the chi-square test, single-feature AUC, and so on. Direct observation of CTR is very effective. For example, if live logs show that cosmetics ads have a much higher CTR among women than among men, then gender has predictive power in the cosmetics industry; and if sporting-goods ads have a higher CTR among men than among women, gender also has predictive power in the sports industry. Once this is confirmed across multiple industries, we consider gender a usable feature.
To evaluate a feature like age, observe whether one ad's CTR differs across age groups, and then whether different ads have different CTR distributions over age. If there are differences, the age feature can be used.
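The two simplest checks mentioned above, direct observation of CTR and the chi-square test, can be sketched as follows. This is only an illustration: the click and impression counts are made up, and the function names `ctr` and `chi_square_2x2` are hypothetical helpers, not part of any system described in this post.

```python
def ctr(clicks, impressions):
    """Click-through rate; 0.0 when there are no impressions."""
    return clicks / impressions if impressions else 0.0


def chi_square_2x2(clicks_a, imps_a, clicks_b, imps_b):
    """Pearson chi-square statistic for a 2x2 table (group x clicked / not clicked).

    Assumes both groups have impressions and that the table contains
    at least one click and at least one non-click.
    """
    observed = [
        [clicks_a, imps_a - clicks_a],
        [clicks_b, imps_b - clicks_b],
    ]
    total = imps_a + imps_b
    row_totals = [imps_a, imps_b]
    col_totals = [clicks_a + clicks_b, total - clicks_a - clicks_b]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


# Made-up counts for one cosmetics ad: (clicks, impressions) per gender.
female = (300, 10_000)
male = (100, 10_000)

print("female CTR:", ctr(*female))
print("male CTR:", ctr(*male))

stat = chi_square_2x2(*female, *male)
# With 1 degree of freedom, 3.841 is the 5% critical value of chi-square.
print("chi-square:", round(stat, 2), "significant:", stat > 3.841)
```

A statistic well above the critical value suggests that CTR really does depend on gender for this ad; repeating the check across several industries mirrors the validation procedure described above.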
In actual use we found that gender is quite effective, and mobile platform is also quite effective. Region and age have some effect, but less obviously than the first two; this may be related to how they are used and needs further digging.
Actual use also showed that ad feedback CTR is very effective. This feature means that the ad is currently running and has already been partly delivered; the CTR on the delivered part can be taken as the ad's baseline CTR and also as an indication of the ad's quality, and it is very effective for estimating the CTR of a given piece of traffic.


1.1.2 Processing Features

Once features are chosen, how to use them is also a problem.
Start with the requirement. What CTR estimation has to do is the job in the figure below: compute the CTR of a user/ad combination.

Suppose the features selected above are, tentatively, these three: ad feedback CTR, user age, and user gender.
First, discretization
Feedback CTR is a floating-point number, so it can be used directly as a feature; call it feature 1. Age is different, because age is not a floating-point quantity in this sense: comparing the numbers 20 and 30 for a 20-year-old and a 30-year-old is not meaningful, and neither is adding or subtracting them, yet both optimization and the actual CTR computation involve comparing the magnitudes of these numbers. Take w·x: with w fixed, whether feature x is 20 or 30 makes a big difference in w·x, and even after passing through the logistic function the resulting values still differ considerably, while the interest of a 20-year-old and a 30-year-old in the same ad often does not differ that much. The solution is to make each age its own feature. For example, if there are only 10 ages, 20 to 29, each age becomes a feature, numbered 2 to 11 (number 1 is the ad's feedback CTR). If a person is 20 years old, feature 2 has value 1 and features 3 to 11 have value 0. In this way the age category yields 10 features that are mutually exclusive; such features are called discrete features.
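A minimal sketch of the age discretization just described, using the same numbering (features 2 to 11 for ages 20 to 29, feature 1 reserved for feedback CTR). The names `AGES`, `FIRST_AGE_FEATURE` and `encode_age` are hypothetical and chosen only for this illustration.

```python
AGES = list(range(20, 30))   # the 10 ages in the example: 20..29
FIRST_AGE_FEATURE = 2        # feature 1 is reserved for the ad's feedback CTR


def encode_age(age):
    """Return {feature_number: value} for features 2..11; exactly one value is 1."""
    if age not in AGES:
        raise ValueError(f"age {age} is outside the example range 20-29")
    return {
        FIRST_AGE_FEATURE + i: (1 if a == age else 0)
        for i, a in enumerate(AGES)
    }


features = encode_age(20)
print(features)  # feature 2 is 1, features 3..11 are 0
```

Because exactly one of the ten features is 1, each age gets its own weight and the model never has to treat 30 as "one and a half times" 20.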
Second, crossing
This seems to solve the problem above, but is it enough?
For example, if a person is 20 years old, feature 2 is always 1, whether the ad is a basketball ad or a cosmetics ad. The weight learned for feature 2 then means that a 20-year-old's likelihood of clicking any ad is described by this single weight, which is unreasonable.
It would be more meaningful if, for this 20-year-old, the feature took one value when paired with a sports-related ad and another value when paired with a health-product ad. That looks reasonable. If this is not clear enough, the same reasoning applies to gender. Suppose gender is discretized as above into features 12 and 13, 12 for male and 13 for female. Then for a male/sports ad combination feature 12 is 1, and for a male/cosmetics ad combination feature 12 is also 1. This is unreasonable.
How can it be made reasonable? Stay with the gender example. Instead of taking the value 1, feature 12 takes the CTR of the ad among male users: for a male/sports ad combination, the value of feature 12 is the CTR of the sports ad among men. Feature 12 thus becomes a floating-point number whose addition and subtraction are meaningful.
This approach is called feature crossing; here gender is crossed with the ad to obtain the feature value. There are many other ways to cross. The crosses most used in industry today are ad with user (the feature numbered 1), ad with gender, ad with age, ad with mobile platform, and ad with region. Going further, advertisers (each ad delivery plan is submitted by an advertiser, and one advertiser may submit several plans) can also be crossed with the various features.
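The gender/ad cross can be sketched as below. The impression log, the ad identifiers and the helper names are all made up for illustration; treating feature 13 (female) symmetrically to feature 12 is an assumption, since the text only spells out the male case.

```python
from collections import defaultdict

# Hypothetical impression log: (ad_id, gender, clicked).
log = (
    [("sports_ad", "male", 1)] * 2 + [("sports_ad", "male", 0)] * 2
    + [("sports_ad", "female", 1)] * 1 + [("sports_ad", "female", 0)] * 3
    + [("cosmetics_ad", "male", 1)] * 1 + [("cosmetics_ad", "male", 0)] * 3
    + [("cosmetics_ad", "female", 1)] * 3 + [("cosmetics_ad", "female", 0)] * 1
)


def cross_ctr_table(records):
    """Historical CTR for every (ad, gender) pair seen in the log."""
    clicks = defaultdict(int)
    impressions = defaultdict(int)
    for ad_id, gender, clicked in records:
        impressions[(ad_id, gender)] += 1
        clicks[(ad_id, gender)] += clicked
    return {key: clicks[key] / impressions[key] for key in impressions}


def gender_features(ad_id, gender, table):
    """Features 12 (male) and 13 (female): the cross CTR instead of a constant 1."""
    value = table.get((ad_id, gender), 0.0)
    return {
        12: value if gender == "male" else 0.0,
        13: value if gender == "female" else 0.0,
    }


table = cross_ctr_table(log)
print(gender_features("sports_ad", "male", table))
print(gender_features("cosmetics_ad", "male", table))
```

With the plain one-hot encoding both calls would give feature 12 the value 1; with the cross, the same male user gets a different value for the sports ad than for the cosmetics ad.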
Third, turning continuous features into discrete features
Is using the crossed feature values enough? Not necessarily.
Take feature 1, the CTR of the ad itself. Suppose Internet ad CTR follows a long-tailed distribution, the log-normal distribution, whose probability density is shown in the figure below (note that this assumption does not represent real data; observed real data has roughly this kind of shape, and the Yahoo paper on smoothing appears to treat it as following a beta distribution).

You can see that most ads have a CTR within a certain small range; the higher the CTR, the fewer the ads and the less traffic they cover. In other words, around a CTR of 0.2%, if ad a has a CTR of 0.2% and ad b has a CTR of 0.25%, ad b is 0.05 percentage points higher, which is enough to show that ad b is much better than ad a. Around a CTR of 1.0%, if ad a has a CTR of 1.0% and ad b has 1.05%, that does not show ad b is much better than ad a, because few ads fall within this 0.05-point interval and the two ads can be regarded as basically the same. That is, different CTR intervals call for different weight coefficients, because feature 1 and the probability that the user clicks the ad are not related in a simple proportional way: it may be that the larger the value the more important the feature, or that once the value grows to a certain extent its importance declines.
For this kind of problem, Baidu scientists proposed discretizing continuous features. They argue that a continuous feature's importance differs across intervals of its value, so the feature should have different weights in different intervals. The way to achieve this is to divide the feature's range into intervals and make each interval a new feature.
One concrete implementation is equal-frequency discretization:
1) For feature 1 above, first collect the value of feature 1 for every historical impression record and sort them. Suppose there are 10,000 impression records and each has a distinct floating-point feature value. Sort all records by this value from lowest to highest; the lowest 1,000 records form one interval, the records ranked 1,001 to 2,000 form the next interval, and so on, giving 10 intervals in total.
2) Renumber the features. For the 1,000 records ranked 1 to 1,000, the original feature 1 becomes new feature 1 with value 1; for the records ranked 1,001 to 2,000, it becomes new feature 2 with value 1; and so on, giving 10 new features numbered 1 to 10. For each impression record, if it is ranked 1 to 1,000, only new feature 1 is 1 and new features 2 to 10 are 0; other records are handled likewise.
In this way the ad's own CTR occupies 10 feature numbers: it has been turned into 10 discrete features.
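The two steps above can be sketched in a few lines. This is a minimal illustration that assumes, as the text does, that all values are distinct and that there are at least as many records as bins; the function names and the synthetic values are hypothetical.

```python
import bisect


def equal_frequency_edges(values, n_bins):
    """Upper edges of the first n_bins - 1 bins, so that each bin holds
    about len(values) / n_bins records."""
    ordered = sorted(values)
    size = len(ordered) / n_bins
    return [ordered[int(size * k) - 1] for k in range(1, n_bins)]


def discretize(value, edges):
    """One-hot list with a 1 in the position of the bin that contains value."""
    bin_index = bisect.bisect_left(edges, value)
    return [1 if i == bin_index else 0 for i in range(len(edges) + 1)]


# 10,000 made-up, distinct feedback-CTR values standing in for historical records.
values = [i / 1_000_000 for i in range(1, 10_001)]
edges = equal_frequency_edges(values, n_bins=10)

print(discretize(values[0], edges))      # lowest record: first new feature is 1
print(discretize(values[1500], edges))   # rank 1,501: second new feature is 1
```

The edges are computed once from historical data; at serving time a new value only needs a binary search over the nine edges to find which of the ten new features is active.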
Equal-frequency discretization has to be done for every original feature. That is, each of the original features numbered 1 to 13 is discretized into several features; if each is split into 10, there are 130 features in the end, and the trained w is a 130-dimensional vector with one weight per feature.
Practical applications show that discrete features can approximate nonlinear relationships in the data and perform better than the original continuous features. At serving time no multiplication is needed, which also speeds up CTR computation.
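To see why no multiplication is needed: when every feature is 0 or 1, the inner product w·x is just the sum of the weights of the features that are 1. A minimal sketch, with made-up weights and a bias term that is an assumption of this example rather than something stated in the text:

```python
import math


def predict_ctr(active_features, weights, bias=0.0):
    """Logistic regression on 0/1 features: w.x is the sum of the active weights."""
    score = bias + sum(weights[i] for i in active_features)
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical weights for a few of the 130 discretized features.
weights = {3: -0.4, 27: 0.9, 128: -1.2}

# An impression whose active (value 1) features are numbers 3 and 27.
print(predict_ctr([3, 27], weights, bias=-4.0))
```

Only the handful of active feature numbers has to be stored per impression, so scoring is a few lookups, a sum, and one logistic function.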


1.1.3 Feature Filtering and Correction

As mentioned above, many features are in fact feedback features, such as ad feedback CTR and the ad/gender cross feature, which are computed from historical impression logs. But some ads have very few impressions, and even fewer among male users, so the ad/gender cross CTR computed for them is very inaccurate, and the feature needs to be corrected. For the specific correction method, see the blog post "Bayesian CTR smoothing".
After the CTR features were corrected, we saw a fairly large improvement in actual online performance.
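One common form of such a correction is to shrink the raw CTR toward a prior. The sketch below assumes a Beta(alpha, beta) prior and uses its posterior mean; this is an assumption about what the referenced smoothing post does, and the prior parameters here are made up rather than fitted from data.

```python
def smoothed_ctr(clicks, impressions, alpha, beta):
    """Posterior-mean CTR under a Beta(alpha, beta) prior."""
    return (clicks + alpha) / (impressions + alpha + beta)


# Hypothetical prior worth 2 clicks in 1,000 impressions (prior mean 0.2%).
ALPHA, BETA = 2.0, 998.0

# Few impressions: raw CTR is 10%, the smoothed value stays near the prior.
print(smoothed_ctr(1, 10, ALPHA, BETA))

# Many impressions: raw CTR is 0.5%, the smoothed value is close to it.
print(smoothed_ctr(500, 100_000, ALPHA, BETA))
```

An ad/gender pair with only a handful of impressions thus contributes a feature value close to the overall level instead of an extreme and unreliable one.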
If many features are used, for example school crossed with ad features and the like, there will be thousands of features after discretization, and this brings the various problems caused by too many features, such as over-fitting. One solution is to evaluate features offline on data, for example by how well each feature discriminates CTR. Another is to regularize the weight vector, especially with L1 regularization: in the weights obtained by L1-regularized training, features with little power to predict CTR get a weight of 0 and no longer affect the estimate. This is feature filtering. For discussion and implementation of L1, see the blog posts "From generalized linear models to logistic regression", "OWL-QN algorithm" and "Online learning algorithm FTRL".
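The way L1 regularization drives weak weights to exactly 0 can be illustrated with the soft-thresholding step used by proximal-gradient style solvers. This is only a sketch of the shrinkage idea, not the OWL-QN or FTRL implementations referenced above (OWL-QN in particular handles L1 by a different mechanism); the weights and the threshold are made up.

```python
def soft_threshold(weight, threshold):
    """Shrink weight toward 0; values within the threshold become exactly 0.0."""
    if weight > threshold:
        return weight - threshold
    if weight < -threshold:
        return weight + threshold
    return 0.0


# Hypothetical weights after a plain gradient step, keyed by feature number.
raw_weights = {1: 0.80, 2: -0.03, 3: 0.02, 12: -0.45, 13: 0.04}
L1_THRESHOLD = 0.05  # hypothetical: learning rate times L1 strength

shrunk = {k: soft_threshold(w, L1_THRESHOLD) for k, w in raw_weights.items()}
kept = sorted(k for k, w in shrunk.items() if w != 0.0)
print(kept)  # feature numbers that survive the filter
```

Features whose weights end up at 0 can be dropped from the online model entirely, which is the filtering effect described above.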

 

Acknowledgments

Thanks to the many researchers at LinkedIn and Baidu who selflessly made their material public.
Thanks to the many bloggers for the information on their blogs.

 

References

[1] H. Brendan McMahan, Gary Holt, et al. Ad Click Prediction: a View from the Trenches. (Google paper)
[2] Leo Zhang's blog: http://www.cnblogs.com/vivounicorn/archive/2012/06/25/2561071.html
[3] Deepak Agarwal, LinkedIn Corporation. Computational Advertising: The LinkedIn Way. CIKM.

Origin www.cnblogs.com/cmybky/p/11772875.html