Huawei Global Campus AI Algorithm Competition (Recommended Direction): Advertisement-information flow cross-domain CTR estimation
1. Competition tasks
Introduction to the competition
Ad recommendation mainly relies on modeling users' historical ad exposures and clicks. If only advertising-domain data is used, user behavior data is sparse and the behavior types are limited. Introducing cross-domain data from the same media makes it possible to obtain the same ad user's behavior in other domains, mine user interests more deeply, and enrich user behavior features. Introducing ad users' behavior data from other media can likewise enrich user and ad features.
This task asks contestants to improve the accuracy of advertising CTR prediction using advertising log data, basic user information, and cross-domain data. The target domain is the advertising domain, and the source domain is the information flow recommendation domain. User interests are modeled from behavior in the information flow domain, such as exposures and clicks, to help predict CTR in the advertising domain accurately.
Evaluation metric: weighted sum of GAUC and AUC
Description
This task provides 7 days of data for training and 1 day of data for testing. The data includes target domain (advertising domain) user behavior logs, basic user information, ad creative information, source domain (information flow domain) user behavior data, basic information about source domain (information flow domain) items, and so on.
Contestants are expected to use the given data to identify and generate source-domain user behavior features that reflect user interests, and to apply them to click-through rate prediction in the target (advertising) domain. The provided data has been desensitized to ensure data security.
Data description
The provided data includes user behavior data in the target domain and user behavior data in the source domain, each described below.
Target Domain User Behavior Data
Field name | Field meaning | Nullable | Field type | Example values |
---|---|---|---|---|
label | Whether to click, 0: no, 1: yes | no | int | 0,1 |
user_id | user id | no | String | 1,2… |
age | age | yes | String | 1,2,3… |
gender | gender | yes | String | 1,2… |
residence | Permanent residence: province | yes | String | 1,2… |
city | Permanent residence: city ID | yes | String | 1,2… |
city_rank | Permanent residence: city tier | yes | String | 1,2… |
Source Domain User Behavior Data
Field name | Field meaning | Nullable | Field type | Example values |
---|---|---|---|---|
u_userId | User ID | no | String | 0001 |
u_phonePrice | User mobile phone price | yes | String | 13 |
u_browserLifeCycle | browser user activity | yes | String | 10 |
u_browserMode | Browser business type | yes | String | 11 |
u_feedLifeCycle | information flow user activity | yes | String | 12 |
u_refreshTimes | Daily effective refresh times of information flow | yes | String | 16 |
u_newsCatInterests | Information flow graphic click classification preference | yes | [String,] | [1^2…] |
… | … | … | … | … |
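List-valued fields such as u_newsCatInterests need to be split before they can be used as features. The following is a minimal sketch, assuming the values are separated by `^` as the example `[1^2…]` in the table suggests; the function name `parse_multi_value` is hypothetical and not part of the competition materials.

```python
def parse_multi_value(raw, sep="^"):
    """Split a list-valued field such as "1^2^5" into ["1", "2", "5"].

    Assumption: values are joined with "^" (inferred from the table's
    example). Nullable fields may be None or empty, which yields [].
    """
    if raw is None:
        return []
    raw = raw.strip()
    if not raw:
        return []
    # Drop empty tokens produced by stray or trailing separators.
    return [token for token in raw.split(sep) if token]


if __name__ == "__main__":
    print(parse_multi_value("1^2^5"))
    print(parse_multi_value(None))
```

The resulting token lists can then be fed to whatever encoding the model uses, for example multi-hot vectors or pooled embeddings.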
Upload files
During the selection round, each team may submit at most 4 times per day (00:00:00 UTC to 23:59:59 UTC).
- Submit the [DIGIX Implementation Instruction](https://digix-algo-challenge.obs.cn-east-2.myhuaweicloud.com/2020/DIGIX Implementation Instruction - [Team].docx) and the source code at least once.
- For the first upload, three files must be uploaded together: submission.csv, DIGIX Implementation Instruction.docx, and Source Code.zip. (DIGIX Implementation Instruction.docx must not exceed 1M, the compressed Source Code.zip must not exceed 5M, and submission.csv must not exceed 100M.)
Evaluation method
Evaluation method: the estimated CTR is computed for each sample in the advertising domain, and GAUC and AUC are calculated from these estimates.
Evaluation index: This competition uses the weighted sum of GAUC and AUC as the evaluation index. The specific formula is as follows:
xAUC = α × GAUC + β × AUC
The higher the xAUC, the better the result and the higher the ranking.
Here, AUC is computed over the entire sample set, and GAUC is the weighted sum of per-group AUCs: samples are grouped by user, and each group's weight is the number of exposures in the group divided by the total number of exposures.
Preliminary round: α is 0.7, β is 0.3
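The metric can be sketched in plain Python as follows. This is an illustrative implementation, not the official scoring script: the function names are hypothetical, and it assumes that users whose samples are all clicks or all non-clicks (for whom AUC is undefined) are skipped and that the weights are normalized over the remaining users. The official evaluator may handle such users differently.

```python
from collections import defaultdict


def auc(labels, scores):
    """Rank-based AUC with average ranks for ties; None if one class is missing."""
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        return None
    order = sorted(range(n), key=lambda i: scores[i])
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the group of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            if labels[order[k]] == 1:
                rank_sum_pos += avg_rank
        i = j + 1
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def gauc(user_ids, labels, scores):
    """Exposure-weighted mean of per-user AUCs."""
    groups = defaultdict(lambda: ([], []))
    for user, label, score in zip(user_ids, labels, scores):
        groups[user][0].append(label)
        groups[user][1].append(score)
    weighted_sum, total = 0.0, 0
    for user_labels, user_scores in groups.values():
        user_auc = auc(user_labels, user_scores)
        if user_auc is None:  # assumption: skip single-class users
            continue
        weighted_sum += len(user_labels) * user_auc
        total += len(user_labels)
    return weighted_sum / total if total else None


def xauc(user_ids, labels, scores, alpha=0.7, beta=0.3):
    """xAUC = alpha * GAUC + beta * AUC (preliminary-round weights by default)."""
    return alpha * gauc(user_ids, labels, scores) + beta * auc(labels, scores)


if __name__ == "__main__":
    users = ["a", "a", "a", "b", "b", "c"]
    y = [1, 0, 0, 0, 1, 0]
    p = [0.9, 0.2, 0.4, 0.6, 0.3, 0.5]
    print(gauc(users, y, p), auc(y, p), xauc(users, y, p))
```

In the toy data, user a is ranked perfectly, user b is ranked in reverse, and user c has no clicks and is therefore skipped under the stated assumption.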
This task comes from a real industrial scenario, so no leaked future information ("time travel") may be used for the test set. For example, when constructing features for a sample at time T, only information from before time T may be used, so that the solution retains practical value. If information from after time T is used in violation of this rule, the results will be judged invalid in the review held after the preliminary round ends.
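One way to respect the rule that a sample at time T may use only information from before T is to build historical statistics in a single chronological pass. The sketch below computes a smoothed per-user historical CTR. The row layout, the `ts` timestamp field, the function name, and the smoothing constants are all assumptions made for illustration and are not taken from the competition data.

```python
from collections import defaultdict


def prior_ctr_features(rows, prior_clicks=1.0, prior_exposures=20.0):
    """Return one smoothed historical CTR per row, using only earlier events.

    rows: list of dicts with keys "user_id", "ts" and "label"
          (label is None for test rows, whose outcome is unknown).
    Rows sharing the same timestamp never see each other's labels.
    """
    order = sorted(range(len(rows)), key=lambda i: rows[i]["ts"])
    clicks = defaultdict(float)
    exposures = defaultdict(float)
    features = [None] * len(rows)
    i = 0
    while i < len(order):
        # Collect every row that shares the current timestamp.
        j = i
        while j < len(order) and rows[order[j]]["ts"] == rows[order[i]]["ts"]:
            j += 1
        batch = order[i:j]
        # Step 1: read features from state built from strictly earlier rows.
        for k in batch:
            user = rows[k]["user_id"]
            features[k] = (clicks[user] + prior_clicks) / (
                exposures[user] + prior_exposures
            )
        # Step 2: only afterwards fold this batch's labels into the state.
        for k in batch:
            if rows[k]["label"] is not None:
                user = rows[k]["user_id"]
                clicks[user] += rows[k]["label"]
                exposures[user] += 1
        i = j
    return features


if __name__ == "__main__":
    log = [
        {"user_id": "u1", "ts": 1, "label": 1},
        {"user_id": "u1", "ts": 2, "label": 0},
        {"user_id": "u1", "ts": 3, "label": None},  # test-day row
    ]
    print(prior_ctr_features(log, prior_clicks=1.0, prior_exposures=2.0))
```

Reading the feature before updating the counters is what keeps a row's own label, and the labels of later rows, out of its feature value.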
Submission method
Contestants submit their results as a submission.csv file, encoded in UTF-8 without BOM, with the columns log_id,pctr. Here, log_id is the log_id of the corresponding test sample, and pctr is the model's estimated CTR for that sample, rounded to 6 decimal places.
Refer to the following example for submitting files:
log_id,pctr
1, 0.002345
2, 0.010456
. . .
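A file in this format can be produced with the standard library alone. The following is a minimal sketch with hypothetical function names. It writes values without a space after the comma, which follows the stated `log_id,pctr` format, and it assumes `\n` line endings, which the rules do not specify.

```python
import csv
import io


def format_submission(log_ids, pctrs):
    """Return the submission content as text: a header plus one row per sample."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["log_id", "pctr"])
    for log_id, pctr in zip(log_ids, pctrs):
        # Keep exactly 6 decimal places, as the rules require.
        writer.writerow([log_id, f"{pctr:.6f}"])
    return buffer.getvalue()


def write_submission(path, log_ids, pctrs):
    """Write submission.csv as UTF-8 without BOM."""
    # encoding="utf-8" emits no BOM (unlike "utf-8-sig").
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(format_submission(log_ids, pctrs))


if __name__ == "__main__":
    print(format_submission([1, 2], [0.002345, 0.010456]), end="")
```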