Huawei Global Campus AI Algorithm Competition (Recommendation Track): Advertising / Information Flow Cross-Domain CTR Prediction

1. Competition tasks

Introduction to the competition

Ad recommendation mainly relies on modeling users' historical ad exposures and clicks. When only advertising-domain data is used, user behavior data is sparse and covers few behavior types. Bringing in cross-domain data from the same media platform provides the behavior of the same advertising users in other domains, which makes it possible to mine user interests more deeply and enrich user behavior features. Bringing in advertising users' behavior data from other media platforms can likewise enrich user and ad features.

This task asks contestants to improve the accuracy of advertising CTR prediction using advertising log data, basic user information, and cross-domain data. The target domain is the advertising domain, and the source domain is the information flow recommendation domain. User interests are modeled from behavior in the information flow domain, such as exposures and clicks on information flow items, to help predict CTR in the advertising domain accurately.

Evaluation index: weighted sum of GAUC and AUC

Description

This task provides 7 days of data for training and 1 day of data for testing. The data includes target-domain (advertising domain) user behavior logs, basic user information, and creative information, as well as source-domain (information flow domain) user behavior data and basic information about source-domain items.

Contestants are expected to identify and generate, from the given data, source-domain user behavior features that reflect user interests, and to apply them to the target domain to improve click-through rate prediction in the advertising domain. The provided data has been desensitized to ensure data security.

Data description

The provided data consists of target-domain user behavior data and source-domain user behavior data, each described below.

Target Domain User Behavior Data

| Field name | Meaning | Nullable | Field type | Example values |
| --- | --- | --- | --- | --- |
| label | Whether clicked (0: no, 1: yes) | no | int | 0,1 |
| user_id | User ID | no | String | 1,2… |
| age | Age | yes | String | 1,2,3… |
| gender | Gender | yes | String | 1,2… |
| residence | Permanent residence: province | yes | String | 1,2… |
| city | Permanent residence: city number | yes | String | 1,2… |
| city_rank | Permanent residence: city level | yes | String | 1,2… |

Source Domain User Behavior Data

| Field name | Meaning | Nullable | Field type | Example values |
| --- | --- | --- | --- | --- |
| u_userId | User ID | no | String | 0001 |
| u_phonePrice | User mobile phone price | yes | String | 13 |
| u_browserLifeCycle | Browser user activity | yes | String | 10 |
| u_browserMode | Browser business type | yes | String | 11 |
| u_feedLifeCycle | Information flow user activity | yes | String | 12 |
| u_refreshTimes | Daily effective refresh times of information flow | yes | String | 16 |
| u_newsCatInterests | Information flow graphic click classification preference | yes | [String,] | [1^2…] |
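
To make the cross-domain idea concrete, here is a minimal Python sketch that parses a multi-valued source-domain field and attaches it to target-domain samples. It rests on assumptions the brief does not confirm: that list fields such as u_newsCatInterests are caret-separated strings (inferred from the example `1^2…`), and that `user_id` in the target domain and `u_userId` in the source domain identify the same user. The function names and the output keys `src_cat_interests` and `has_source_data` are made up for illustration.

```python
from collections import defaultdict


def parse_multi_value(raw):
    """Split a caret-separated field such as "1^2^5" into a list of strings.

    Assumption: "^" is the delimiter. Null or empty values yield [].
    """
    if raw is None or raw == "":
        return []
    return [tok for tok in raw.split("^") if tok]


def build_user_interests(source_rows):
    """Collect, per source-domain user, the union of news category interests."""
    interests = defaultdict(set)
    for row in source_rows:
        cats = parse_multi_value(row.get("u_newsCatInterests"))
        interests[row["u_userId"]].update(cats)
    return interests


def attach_interests(target_rows, interests):
    """Return copies of target rows enriched with source-domain interests.

    Assumption: target user_id and source u_userId share one ID space.
    """
    enriched_rows = []
    for row in target_rows:
        cats = interests.get(row["user_id"])
        enriched = dict(row)
        enriched["src_cat_interests"] = sorted(cats) if cats else []
        enriched["has_source_data"] = cats is not None
        enriched_rows.append(enriched)
    return enriched_rows
```

This version aggregates over all source rows regardless of time, so a real pipeline would also need to restrict the source rows to those that precede each target sample.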

File upload

In the selection contest, each team may submit at most 4 times per day (00:00:00 UTC to 23:59:59 UTC).

  1. Submit the [DIGIX Implementation Instruction](<https://digix-algo-challenge.obs.cn-east-2.myhuaweicloud.com/2020/DIGIX Implementation Instruction - [Team].docx>) and the source code at least once.

  2. The first upload must include three files at once: submission.csv, DIGIX Implementation Instruction.docx, and Source Code.zip. (DIGIX Implementation Instruction.docx must not exceed 1 MB, the compressed Source Code.zip must not exceed 5 MB, and submission.csv must not exceed 100 MB.)

Evaluation method

Evaluation method: compute the estimated CTR for each sample in the advertising domain, then calculate GAUC and AUC over those estimates.

Evaluation index: This competition uses the weighted sum of GAUC and AUC as the evaluation index. The specific formula is as follows:

xAUC = α*GAUC + β*AUC

The higher the xAUC, the better the result and the higher the ranking.

Here, AUC is computed over the entire sample set, and GAUC is the weighted sum of per-group AUCs, where samples are grouped by user and each group's weight is the number of exposures in the group divided by the total number of exposures.

Preliminary round: α is 0.7, β is 0.3
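
As a rough illustration of the metric described above, the following pure-Python sketch computes AUC, GAUC, and xAUC. It assumes that a user's exposure count is simply the number of samples for that user, and that users whose samples are all clicks or all non-clicks (for whom AUC is undefined) are skipped, with weights renormalized over the remaining users. The brief does not say how the organizers treat such users, so the official scorer may differ. The function names are illustrative.

```python
from collections import defaultdict


def auc(labels, scores):
    """Rank-based AUC (Mann-Whitney U), with average ranks for tied scores.

    Returns None when only one class is present, since AUC is undefined.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return None
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def gauc(user_ids, labels, scores):
    """Exposure-weighted mean of per-user AUCs."""
    groups = defaultdict(lambda: ([], []))
    for u, y, s in zip(user_ids, labels, scores):
        groups[u][0].append(y)
        groups[u][1].append(s)
    weighted_sum, total_exposure = 0.0, 0
    for ys, ss in groups.values():
        user_auc = auc(ys, ss)
        if user_auc is None:  # assumption: skip single-class users
            continue
        weighted_sum += len(ys) * user_auc
        total_exposure += len(ys)
    return weighted_sum / total_exposure if total_exposure else None


def xauc(user_ids, labels, scores, alpha=0.7, beta=0.3):
    """xAUC = alpha * GAUC + beta * AUC (defaults: preliminary-round weights)."""
    g = gauc(user_ids, labels, scores)
    a = auc(labels, scores)
    if g is None or a is None:
        raise ValueError("AUC is undefined without both clicks and non-clicks")
    return alpha * g + beta * a
```

Because GAUC carries the larger weight in the preliminary round, ranking ads correctly within each user's own exposures matters more than ranking correctly across users.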

This task comes from a real industrial scenario, so features for the test set must not use future (time-leaked) information. For example, when constructing features, a sample at time T may only use information from before time T, which ensures the practical value of the solution. If information from after time T is used in violation of this rule, the results will be judged invalid during the review held after the preliminary round.
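
One way to respect the "only information before time T" rule is to build history features in timestamp order, as in the sketch below. The keys `ts` (a sortable timestamp) and the function name are hypothetical, since the field list shown here does not include a time field; `user_id` and `label` follow the target-domain table. Samples that share a timestamp are processed as a batch so that none of them sees another's label.

```python
from collections import defaultdict


def prior_click_features(samples):
    """For each sample, count the user's exposures and clicks strictly before it.

    samples: list of dicts with keys user_id, ts (hypothetical), and label.
    Returns a list of (prior_exposures, prior_clicks) in the input order.
    """
    order = sorted(range(len(samples)), key=lambda i: samples[i]["ts"])
    exposures = defaultdict(int)
    clicks = defaultdict(int)
    feats = [None] * len(samples)
    i = 0
    while i < len(order):
        # Find the batch of samples that share the current timestamp.
        j = i
        while j < len(order) and samples[order[j]]["ts"] == samples[order[i]]["ts"]:
            j += 1
        # Read features first, using only history from earlier timestamps.
        for k in order[i:j]:
            u = samples[k]["user_id"]
            feats[k] = (exposures[u], clicks[u])
        # Only then fold the batch into the running history.
        for k in order[i:j]:
            u = samples[k]["user_id"]
            exposures[u] += 1
            clicks[u] += samples[k]["label"]
        i = j
    return feats
```

For test-day samples, whose labels are unknown, the same idea applies with the history frozen at the end of the training period.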

Submission method

Contestants submit a submission.csv file, encoded in UTF-8 without BOM, in the format log_id,pctr. Here log_id is the log_id of the corresponding test sample, and pctr is the model's estimated CTR for that sample, kept to 6 decimal places.

Refer to the following example for submitting files:

log_id,pctr

1, 0.002345

2, 0.010456

. . .
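
A file in this format could be produced with a short Python helper like the one below. This is a sketch rather than an official tool: the function name is made up, and it assumes there is no space after the comma, following the stated `log_id,pctr` format. Python's `utf-8` codec writes no BOM (unlike `utf-8-sig`).

```python
import csv


def write_submission(path, predictions):
    """Write (log_id, pctr) pairs to a UTF-8 (no BOM) CSV with 6 decimal places."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["log_id", "pctr"])
        for log_id, pctr in predictions:
            writer.writerow([log_id, f"{pctr:.6f}"])
```

For example, `write_submission("submission.csv", [(1, 0.002345), (2, 0.010456)])` writes the header followed by one line per test sample.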

Origin blog.csdn.net/liluo_2951121599/article/details/126827892