LOL Boosting Detection — Record of the 2019 Tencent Game Security Technology Competition Preliminary Round

Because I wanted to intern at Tencent, I entered this year's Tencent Game Security Technology Competition; doing well in it comes with a fast-track internship channel. I chose the machine-learning track under the data-analysis direction.

The preliminary-round problem was very interesting: detecting boosted (account-leveled) players in LOL (League of Legends).

The game is a standard 5v5 MOBA: a player and nine others make up one match, five per side, and the team that destroys the enemy nexus first wins. Ranked (solo/duo) mode means a player can queue for ranked alone or as a duo, and the match results affect their own rank tier.

The organizers provided a set of MOBA accounts that played ranked games on 2019.03.07. The training set is labeled with whether each account is boosted, and 10,000 unlabeled accounts form the test set. They also provided these players' ranked match data for the preceding 10 days (2019.02.26-2019.03.07); the task is to predict, from this historical match performance, whether each test-set account showed boosting behavior on 2019.03.07.

Data Download: https://gslab.qq.com/html/competition/20190311/index.htm

I like playing LOL myself, so the problem felt especially interesting. I pulled two all-nighters to finish and ended up with 86% precision, 73% recall, and a score of 82.

The raw data comes as txt files of up to 5 million lines, the largest about 1 GB, which was far too slow to load on every run. So I read them once with pandas, converted them to numpy matrices, and saved them as .npy files; since .npy is a binary format, subsequent loads are much faster.
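A minimal sketch of that caching step (the file name and the tab-separated layout are just my assumptions here, not the competition's actual file format):

import numpy as np
import pandas as pd

# One-time conversion: parse the large txt flow file with pandas,
# then dump the values as a binary .npy file.
df = pd.read_csv("ranked_flow_20190226_20190307.txt", sep="\t", header=None)
np.save("ranked_flow.npy", df.values)

# Later runs load the binary file directly, which is much faster
# than re-parsing the text file every time.
data = np.load("ranked_flow.npy", allow_pickle=True)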

The data files are flow-style match records covering every game each player played over the 10 days. I first grouped the records by player and then computed each feature per player (a sketch of this step follows the feature list below). The features were chosen from the provided fields and documentation, combined with my own understanding of League of Legends.

 

 

I extracted 32 features (variables) in total. The feature directory:

1. Proportion of solo-queue vs. duo-queue ranked games
2. Pick rate of the most frequently used champion
3. Average game duration
4. Kills per second
5. Deaths per second
6. Assists per second
7. Average largest killing spree per game
8. Average per-game rating
9. Win rate
10. Proportion of losses with the "honorable defeat" badge (highest rating on the losing side)
11. Proportion of wins where the opposing team surrendered
12. Proportion of games lasting more than 25 minutes
13. Proportion of wins tagged as "carry" games
14. Proportion of losses that were one-sided stomps
15. Proportion of wins tagged as hard-fought games
16. Proportion of wins that were comebacks
17. Proportion of wins where the player was carried ("free win")
18. Proportion of games with the most kills
19. Proportion of games with the most gold
20. Proportion of games reaching a "Legendary" kill streak
21. Proportion of wins where the player was MVP
22. Proportion of games with the most minion kills ("CS king")
23. Proportion of games flagged as deserted (AFK)
24. Proportion of games with the highest champion damage
25. Damage dealt per second
26. Damage taken per second
27. Gold earned per second
28. Average largest multi-kill per game
29. Average maximum kills per game
30. Average CS per second. For jungle games this is computed as (jungle monsters × 1.4 + lane minions) per second; otherwise as lane minions per second. Whether a game counts as a jungle game is judged by whether the player took Smite.
31. KDA
32. Number of games played

After extracting these features, the data becomes a 170,000 × 32 matrix: one row per player (170,000 players in total) and one column per feature. Using the provided account lists, the 170,000 players are then separated into the training and test sets (the test set contains 10,000 samples).
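A rough sketch of what that per-player aggregation could look like (the column names such as player_id, duration, and kills are placeholders of mine, not the competition's actual field names, and only a few of the 32 features are shown):

import numpy as np
import pandas as pd

# One row per (player, game) record taken from the flow files.
records = pd.DataFrame(np.load("ranked_flow.npy", allow_pickle=True),
                       columns=["player_id", "duration", "kills",
                                "deaths", "assists", "win"])

def player_features(games):
    # Aggregate one player's games into a short feature vector.
    seconds = games["duration"].astype(float)
    return pd.Series({
        "avg_duration":      seconds.mean(),
        "kills_per_second":  (games["kills"].astype(float) / seconds).mean(),
        "deaths_per_second": (games["deaths"].astype(float) / seconds).mean(),
        "win_rate":          games["win"].astype(float).mean(),
        "num_games":         len(games),
    })

features = records.groupby("player_id").apply(player_features)
X = features.values   # shape: (number of players, number of features)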

I then used the SVM model provided by scikit-learn directly, splitting the labeled data 6:4 into training and validation sets. Without any preprocessing, precision was about 90%, while recall swung anywhere from 0% to 40% and was very unstable.
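Roughly like this (X_labeled and y_labeled are names I'm using here for the labeled feature matrix and its labels; they are not from my original code):

from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import precision_score, recall_score

# 6:4 split of the labeled data into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    X_labeled, y_labeled, test_size=0.4, random_state=0)

clf = SVC()                     # default RBF-kernel SVM, no preprocessing yet
clf.fit(X_train, y_train)

pred = clf.predict(X_val)
print("precision:", precision_score(y_val, pred))
print("recall:   ", recall_score(y_val, pred))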

The ranking is based on Score=\frac{4PR}{3P+R}, where P is precision and R is recall. My score was unstable and very low. Then I saw a note in the scikit-learn documentation recommending that features be standardized before training; I tried it and it worked wonders.
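For reference, the score can be computed from validation precision and recall with a small helper like this (my own convenience function, not part of the competition tooling):

def competition_score(precision, recall):
    # Score = 4PR / (3P + R)
    if 3 * precision + recall == 0:
        return 0.0
    return 4 * precision * recall / (3 * precision + recall)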

from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training features, then apply the same
# transformation to both the training and the test features.
scaler = StandardScaler()
scaler.fit(training_x)
trans_training_x = scaler.transform(training_x)
trans_test_x = scaler.transform(test_x)

The code above does the standardization. Note that it is not enough to standardize only the training set; the test set must be standardized as well (using the same scaler fitted on the training data), otherwise the results are meaningless.

After standardization, recall stabilized at around 40% with about 90% precision. The gap between precision and recall was still far too wide, so the final score remained low. I think the cause is the heavily imbalanced classes in the training data: boosted players make up only a small fraction of the samples, so the SVM tends to misclassify boosted players as normal ones, and the penalty for that kind of mistake needs to be increased.

In the end I chose the following parameters (label 1 stands for boosted players, 0 for normal players):

from sklearn.svm import SVC

# Class weights: penalize misclassifying a boosted player (label 1)
# more heavily than misclassifying a normal player (label 0).
weight_dict = dict()
weight_dict[1] = 4
weight_dict[0] = float(4) / 7

clf = SVC(class_weight=weight_dict,  # per-class multipliers on C
          C=1.0,                     # base regularization strength
          kernel='rbf',
          gamma='scale')

Actually, the class_weight parameter has a ready-made option, "balanced", in which case each class i gets a weight of n_samples / (n_classes * np.bincount(y))[i] applied to C. See the sklearn SVC documentation for details: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

In most imbalanced-data situations, class_weight='balanced' already solves the problem. But under this competition's score definition, I tried a variety of weight_dict values myself, and weights of 4 and 4/7 still gave the highest score. I ended up with 86% precision, 73% recall, and a score of 82, which got me into the final. One contestant I asked said his score was 87; I have no idea how he managed that.
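A rough sketch of how such a manual search over class weights could be run on the validation set (the candidate weights are only examples, and training_y, trans_val_x, and val_y are placeholder names for the training labels and the standardized validation features and labels, not variables from my original code):

from sklearn.svm import SVC
from sklearn.metrics import precision_score, recall_score

# Candidate class weights for boosted (1) vs normal (0) players;
# 'balanced' is included as a baseline.
candidates = ['balanced',
              {1: 2, 0: 0.5},
              {1: 4, 0: 4 / 7},
              {1: 6, 0: 0.5}]

best_weight, best_score = None, -1.0
for w in candidates:
    clf = SVC(class_weight=w, C=1.0, kernel='rbf', gamma='scale')
    clf.fit(trans_training_x, training_y)
    pred = clf.predict(trans_val_x)
    p = precision_score(val_y, pred)
    r = recall_score(val_y, pred)
    score = 4 * p * r / (3 * p + r) if 3 * p + r > 0 else 0.0
    if score > best_score:
        best_weight, best_score = w, score

print("best class_weight:", best_weight, "score:", best_score)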

I'm happy to have made the final; if nothing else it's a free trip to Shenzhen. I genuinely enjoyed this preliminary-round project. My regrets are that I never added the players' rank score as a feature, and that I wrote too many unnecessary operations in the preprocessing stage, which made the feature-extraction program run for 14 hours. On the plus side, I was able to apply domain-specific processing such as KDA and the jungle/lane CS adjustment that someone who doesn't play LOL wouldn't think of, which did give the final result a noticeable boost.

Today a Tencent HR called to say that because I had already interviewed with Tencent before, this so-called interview fast track is useless for me... it turns out the channel only grants an immediate first-round interview, and I had already taken one. ORZ. Suddenly I don't even know what I entered the competition for; it's a bit depressing.

I really never imagined that getting an internship would be this hard. When will the day come that companies reach out to me instead?
