My approach to the Tencent Lookalike problem

Problem statement: https://pan.baidu.com/s/1Re0k81XieiXFI6hkwgL8oA

Analysis:

1. There are many users and many ads. One user may interact with several ads, and one ad may be clicked by many users, which is hard to model directly. If we assume there is only one ad, each user falls into one of two cases, clicked or not clicked, and the task becomes binary classification. We splice (concatenate) the user features and the ad features into x, and use whether that user clicked that ad as y. To classify discrete data like this you can use kNN, support vector machines, decision trees/random forests, or boosting. Before applying these methods, discrete data is best given a sparse encoding. The most common choice is one-hot encoding, and I discuss its advantages and disadvantages in another blog post. For relatively large fields, such as interest1, I think word2vec (w2v) or an autoencoder could be used instead. Our data is 4 GB and becomes even larger after sparse encoding, so it can be split first. I'll share the code for the sequential split.

2. Another idea is to sparsely encode (for example, one-hot encode) the user features first and then cluster the users with k-means, which may make the problem simpler.
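The following is a minimal sketch of idea 1 on toy data. It assumes user and ad features are stored as "field -> list of values". The field names, the values and the single ad `177` are made up for illustration, and multi-valued fields such as interest1 simply hold several values. Each (user, ad) pair is spliced into one record x and one-hot encoded as the set of its active column indices. Its label y is whether the user clicked. kNN, one of the classifiers listed in idea 1, is used here only because it needs no library. The helper names (`splice`, `build_vocab`, `encode`, `knn_predict`) are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: "field -> list of values"; multi-valued fields
# such as interest1 carry several values.
USERS = {
    "u1": {"age": ["2"], "gender": ["1"], "interest1": ["93", "70"]},
    "u2": {"age": ["2"], "gender": ["1"], "interest1": ["93"]},
    "u3": {"age": ["5"], "gender": ["2"], "interest1": ["12"]},
    "u4": {"age": ["5"], "gender": ["2"], "interest1": ["12", "40"]},
}
ADS = {"177": {"advertiserId": ["79"], "productType": ["6"]}}
# (ad id, user id, clicked?) triples: the label y for each spliced x
TRAIN = [("177", "u1", 1), ("177", "u2", 1), ("177", "u3", 0), ("177", "u4", 0)]


def splice(user, ad):
    """Concatenate user and ad fields into one record, prefixing names to keep them apart."""
    rec = {f"user.{k}": v for k, v in user.items()}
    rec.update({f"ad.{k}": v for k, v in ad.items()})
    return rec


def build_vocab(records):
    """Give every (field, value) pair seen in training its own one-hot column."""
    vocab = {}
    for rec in records:
        for field, values in rec.items():
            for v in values:
                vocab.setdefault((field, v), len(vocab))
    return vocab


def encode(rec, vocab):
    """Sparse one-hot vector stored as the set of active column indices; unseen values are dropped."""
    return {vocab[(f, v)] for f, vals in rec.items() for v in vals if (f, v) in vocab}


def sq_dist(a, b):
    # Squared Euclidean distance between two 0/1 vectors given as index sets
    return len(a) + len(b) - 2 * len(a & b)


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k nearest training examples."""
    nearest = sorted(range(len(train_x)), key=lambda i: sq_dist(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


records = [splice(USERS[u], ADS[a]) for a, u, _ in TRAIN]
vocab = build_vocab(records)
train_x = [encode(r, vocab) for r in records]
train_y = [y for _, _, y in TRAIN]

new_user = {"age": ["2"], "gender": ["1"], "interest1": ["70"]}
print(knn_predict(train_x, train_y, encode(splice(new_user, ADS["177"]), vocab)))  # prints 1
```

Each vector is stored as a set of active indices, so memory grows with the number of non-zero features rather than the full one-hot width. That matters once fields like interest1 are expanded. On the real 4 GB data, a sparse matrix with a library classifier would be the practical route.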
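The sequential-split code promised in idea 1 is not part of this post. What follows is only a sketch of one way to do it, not the author's code. It assumes a line-oriented text file with one record per line. The file is streamed, and each run of `lines_per_chunk` consecutive lines goes to a new part file, so the whole file never has to sit in memory at once. The function name, the `part_0000.txt` naming scheme and the default chunk size are all assumptions.

```python
import tempfile
from itertools import islice
from pathlib import Path


def split_sequential(src, out_dir, lines_per_chunk=1_000_000):
    """Stream `src` and write each run of `lines_per_chunk` consecutive lines to its own file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with open(src, "r", encoding="utf-8") as f:
        while True:
            chunk = list(islice(f, lines_per_chunk))  # at most one chunk in memory
            if not chunk:
                break  # end of file
            part = out_dir / f"part_{len(parts):04d}.txt"
            with open(part, "w", encoding="utf-8") as out:
                out.writelines(chunk)
            parts.append(part)
    return parts


if __name__ == "__main__":
    # Tiny demo: 10 lines split into chunks of 4 gives parts of 4, 4 and 2 lines
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "sample.data"
        src.write_text("".join(f"uid {i}\n" for i in range(10)), encoding="utf-8")
        for part in split_sequential(src, Path(tmp) / "parts", lines_per_chunk=4):
            print(part.name, len(part.read_text(encoding="utf-8").splitlines()))
```

Splitting in file order keeps every record intact and the parts in their original sequence. If the file is sorted, though, the chunks will not be random samples of it.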
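The following is a minimal sketch of idea 2. It assumes the same hypothetical "field -> list of values" layout with made-up users. The users are one-hot encoded into dense 0/1 vectors and grouped with plain k-means (Lloyd's algorithm). The first k users seed the centroids only to make this toy run reproducible; random or k-means++ initialization would be more usual. The function names are illustrative.

```python
# Hypothetical toy users: "field -> list of values"
USERS = {
    "u1": {"age": ["2"], "gender": ["1"], "interest1": ["93", "70"]},
    "u2": {"age": ["2"], "gender": ["1"], "interest1": ["93"]},
    "u3": {"age": ["5"], "gender": ["2"], "interest1": ["12"]},
    "u4": {"age": ["5"], "gender": ["2"], "interest1": ["12", "40"]},
}


def one_hot_users(users):
    """Dense 0/1 vector per user, one column per (field, value) pair."""
    vocab = {}
    for feats in users.values():
        for field, values in feats.items():
            for v in values:
                vocab.setdefault((field, v), len(vocab))
    vectors = []
    for feats in users.values():
        vec = [0.0] * len(vocab)
        for field, values in feats.items():
            for v in values:
                vec[vocab[(field, v)]] = 1.0
        vectors.append(vec)
    return vectors


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Lloyd's algorithm; the first k points seed the centroids so the run is reproducible."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels


labels = kmeans(one_hot_users(USERS), k=2)
print(dict(zip(USERS, labels)))  # prints {'u1': 0, 'u2': 0, 'u3': 1, 'u4': 1}
```

This only shows the mechanics. On the real data the one-hot vectors are very wide, so a k-means implementation that works on sparse matrices would be needed.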

Data: contact me if you need it.
