PUBG_Mobile: Random Forest + Linear Regression

Project description:

The PlayerUnknown’s Battlegrounds (PUBG) dataset on Kaggle contains 4,446,966 rows covering 47,965 matches. Player IDs are not clearly identified, so the number of distinct participants is unknown.

Analysis and visualization plan:

[Figure: analysis and visualization plan]

Data Dictionary:

[Figure: data dictionary]

Load the data and inspect it

import pandas as pd
data = pd.read_csv(r'.\PUBG_Mobile\data\train_V2.csv')
data.describe()
data.info()

[Screenshot: summary output of data.describe() / data.info()]
There are 29 columns in total, and only one value is missing.
Next, remove rows that likely come from cheaters, as well as other outliers.

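# Remove likely cheaters. Only one row has a missing value, so drop it directly
data.dropna(inplace=True)
# Drop players with more than 20 knockdowns (DBNOs)
df1 = data[data.DBNOs<=20]
# Drop players with more than 3 road kills (kills made from a vehicle)
df2 = df1[df1.roadKills<=3]
# Drop players who knocked down enemies without moving at all
df3 = df2[~((df2.walkDistance==0)&(df2.DBNOs>0))]
# Drop players with more than 3 kills and a 100% headshot rate
data_ed = df3[~((df3.kills>3)&(df3.kills==df3.headshotKills))]
# Player IDs are not clearly identified: compare the row count with the unique Id and matchId counts
print(len(data_ed),data_ed['Id'].nunique(),data_ed.matchId.nunique())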
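
The four chained filters above can also be written as a single boolean mask, which avoids the intermediate df1 to df3 frames. A minimal sketch under that assumption: the helper name drop_suspected_cheaters and the toy rows are invented for illustration; the column names and thresholds are the ones used above.

```python
import pandas as pd

def drop_suspected_cheaters(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the same missing-value drop and four cheating/outlier filters in one pass."""
    df = df.dropna()  # drop rows with any missing value
    suspicious = (
        (df["DBNOs"] > 20)                                            # implausibly many knockdowns
        | (df["roadKills"] > 3)                                       # many kills from a vehicle
        | ((df["walkDistance"] == 0) & (df["DBNOs"] > 0))             # knockdowns without moving
        | ((df["kills"] > 3) & (df["kills"] == df["headshotKills"]))  # >3 kills, all headshots
    )
    return df[~suspicious]

# Hypothetical rows: row 0 is normal, rows 1-4 each trip one rule, row 5 has a missing target
toy = pd.DataFrame({
    "DBNOs":         [2, 25, 1, 3, 0, 1],
    "roadKills":     [0, 0, 5, 0, 0, 0],
    "walkDistance":  [1500.0, 800.0, 300.0, 0.0, 2000.0, 500.0],
    "kills":         [1, 10, 2, 1, 5, 0],
    "headshotKills": [0, 2, 0, 0, 5, 0],
    "winPlacePerc":  [0.5, 0.9, 0.2, 0.1, 1.0, float("nan")],
})
clean = drop_suspected_cheaters(toy)
print(clean.index.tolist())  # → [0]
```
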

Specific analysis, moving from distributions → rankings → winning ("chicken dinner")

1. Damage dealt by the player in a match

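import matplotlib.pyplot as plt
import seaborn as sns
fig, (ax1, ax2) = plt.subplots(1, 2)
fig.set_figwidth(15)
# distplot is deprecated in recent seaborn; histplot with kde=True is the suggested replacement
sns.histplot(data_ed['damageDealt'], kde=True, ax=ax1)
sns.boxplot(x=data_ed['damageDealt'], ax=ax2)
plt.show()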

[Figure: damageDealt histogram (left) and box plot (right)]

As the figure above shows, most players deal between 0 and 500 damage in a match.
2. Distribution of the number of enemies knocked down (DBNOs)
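
To put a number on the 0-500 reading instead of judging it from the box plot, one could compute the share of players in that range plus a few quantiles. A sketch on invented values; the helper share_in_range is hypothetical, and in the notebook you would pass data_ed['damageDealt'] instead.

```python
import pandas as pd

def share_in_range(values: pd.Series, low: float, high: float) -> float:
    """Fraction of rows whose value lies in the closed interval [low, high]."""
    return float(values.between(low, high).mean())

# Invented damage values standing in for data_ed['damageDealt']
damage = pd.Series([0.0, 75.5, 120.0, 310.2, 480.0, 650.0, 1200.0, 90.0])
print(share_in_range(damage, 0, 500))     # → 0.75
print(damage.quantile([0.25, 0.5, 0.75]))  # quartiles of the damage distribution
```
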

plt.figure(dpi=300,figsize=(24,8))
plt.hist(data_ed.DBNOs)
plt.show()

[Figure: histogram of DBNOs]

Haha, most players are very kind: they never knock down a single enemy.
3. Relationship between knockdowns (DBNOs) and match placement (winPlacePerc)
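
The same claim can be checked numerically: the fraction of players with DBNOs == 0, and the share of players at each knockdown count. The counts below are made up; with the real data you would use data_ed['DBNOs'].

```python
import pandas as pd

# Invented knockdown counts standing in for data_ed['DBNOs']
dbnos = pd.Series([0, 0, 0, 1, 0, 2, 0, 1, 0, 5])

zero_share = float((dbnos == 0).mean())                # fraction of players with no knockdowns
dist = dbnos.value_counts(normalize=True).sort_index()  # share of players per knockdown count
print(zero_share)  # → 0.6
print(dist)
```
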

# Relationship between knockdowns and placement in the match
df4 = data_ed[['DBNOs', 'winPlacePerc']]
sns.set_theme(style="darkgrid")
# relplot is figure-level and creates its own figure, so no plt.figure() call is needed
g = sns.relplot(data=df4,x="DBNOs", y="winPlacePerc",height=8,linewidth=2,aspect=1.3, kind="line")
plt.title('DBNOs / winPlacePerc', fontsize=15)
g.fig.autofmt_xdate()

[Figure: winPlacePerc by DBNOs]

4. Relationship between kills and player ranking (rankPoints)

# Univariate analysis: relationship between kills and player ranking
df4 = data_ed[['kills', 'rankPoints']]
sns.set_theme(style="darkgrid")
# relplot creates its own figure, so the separate plt.figure() call is not needed
g = sns.relplot(data=df4,x="kills", y="rankPoints",height=8,linewidth=2,aspect=1.3, kind="line")
g.fig.autofmt_xdate()

[Figure: rankPoints by kills]
rankPoints is an Elo-style rating with 1000 as the midpoint. In this plot, ratings above 1000 only appear together with kill counts above 30.
5. Win rate by match type (solo/duo/squad)

# Win rate for each match type (solo/duo/squad)
df_matchType_no1 = data_ed[data_ed.winPlacePerc==1].groupby('matchType').size().rename('wins')
df_matchType = data_ed.groupby('matchType').size().rename('total')
df_matchType_win = pd.merge(df_matchType,df_matchType_no1,left_index=True, right_index=True)
df_matchType_win['win_rate'] = df_matchType_win['wins']/df_matchType_win['total']
plt.figure(dpi=300,figsize=(24,8))
plt.bar(df_matchType_win.index,df_matchType_win['win_rate'])
plt.xticks(rotation=30)
plt.show()

[Figure: win rate by matchType]
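
The count-and-merge steps can be collapsed by averaging a boolean "won this match" flag per match type. One difference worth knowing: an inner merge drops any match type with no wins, while this version reports 0 for it. A sketch; win_rate_by_match_type and the toy rows are hypothetical.

```python
import pandas as pd

def win_rate_by_match_type(df: pd.DataFrame) -> pd.Series:
    """Share of rows with winPlacePerc == 1 in each matchType, highest first."""
    is_win = df["winPlacePerc"].eq(1)
    return is_win.groupby(df["matchType"]).mean().sort_values(ascending=False)

# Invented rows standing in for data_ed
toy = pd.DataFrame({
    "matchType":    ["solo", "solo", "solo", "solo", "squad", "squad", "duo", "duo"],
    "winPlacePerc": [1.0, 0.2, 0.5, 0.0, 1.0, 1.0, 0.3, 1.0],
})
print(win_rate_by_match_type(toy))
```
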
Judging from the results, squad mode has the highest win rate, at about 1.4%.

6. Relationship between walking distance and winning

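# Relationship between walking distance and winning: walkDistance / winPlacePerc
df_ride = data_ed[['walkDistance', 'winPlacePerc']].copy()  # copy() avoids SettingWithCopyWarning
labels=["0k-1k", "1k-2k", "2k-3k", "3k-4k","4k-5k", "5k-6k", "6k-7k", "7k-8k"]
# Explicit 1,000 m edges so each bin matches its label; distances above 8,000 m fall outside and are dropped
df_ride['walkDistance_cut'] = pd.cut(df_ride['walkDistance'], bins=list(range(0, 9000, 1000)), labels=labels, include_lowest=True)
df_ride.groupby('walkDistance_cut', observed=False).winPlacePerc.mean().plot.bar(rot=30, figsize=(24, 8))
plt.xlabel("walkDistance_cut")
plt.ylabel("winPlacePerc")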

[Figure: mean winPlacePerc by walkDistance bin]
7. Relationship between vehicle travel distance and winning
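
pd.cut(series, 8) splits the range from the minimum to the maximum into 8 equal-width bins, so labels such as "0k-1k" only describe the bins when the maximum happens to be 8,000 m. The sketch below uses invented distances, two of them far beyond 8 km, to contrast equal-width bins with explicit 1,000 m edges.

```python
import pandas as pd

# Invented walking distances in metres
walk = pd.Series([0.0, 450.0, 1200.0, 2500.0, 7999.0, 15000.0, 25000.0])

# Eight equal-width bins over [0, 25000]: each is about 3.1 km wide, not 1 km
equal_width = pd.cut(walk, 8)
print(equal_width.cat.categories)

# Explicit 1,000 m edges keep the "0k-1k" style labels truthful
edges = list(range(0, 9000, 1000))                          # 0, 1000, ..., 8000
labels = [f"{i}k-{i + 1}k" for i in range(len(edges) - 1)]  # "0k-1k" ... "7k-8k"
fixed = pd.cut(walk, bins=edges, labels=labels, include_lowest=True)
print(fixed.tolist())  # values above 8,000 m become NaN
```
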

# Relationship between vehicle travel distance and winning: rideDistance / winPlacePerc
df_ride = data_ed.loc[data_ed['rideDistance']<10000, ['rideDistance', 'winPlacePerc']].copy()
labels=[f"{i}k-{i + 1}k" for i in range(10)]  # "0k-1k" ... "9k-10k"
# Ten explicit 1,000 m bins covering the filtered range [0, 10,000)
df_ride['drive'] = pd.cut(df_ride['rideDistance'], bins=list(range(0, 11000, 1000)), labels=labels, include_lowest=True)
df_ride.groupby('drive', observed=False).winPlacePerc.mean().plot.bar(rot=30, figsize=(24, 8))
plt.xlabel("rideDistance")
plt.ylabel("winPlacePerc")

[Figure: mean winPlacePerc by rideDistance bin]
8. Relationship between boost items and winning

# Relationship between boost items and winning: boosts / winPlacePerc
df4 = data_ed[['boosts', 'winPlacePerc']]
sns.set_theme(style="darkgrid")
# relplot creates its own figure, so no plt.figure() call is needed
g = sns.relplot(data=df4,x="boosts", y="winPlacePerc",height=8,linewidth=2,aspect=1.3, kind="line")
g.fig.autofmt_xdate()

[Figure: winPlacePerc by boosts]
Multivariate correlation
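
sns.relplot(kind="line") aggregates each x value to its mean and, by default, bootstraps a confidence band around it, which can be slow on millions of rows. If only the mean curve matters, a groupby yields the same points directly; in seaborn 0.12 and later, passing errorbar=None to relplot also skips the band. The rows below are invented stand-ins for df4.

```python
import pandas as pd

# Invented rows standing in for df4 = data_ed[['boosts', 'winPlacePerc']]
df4 = pd.DataFrame({
    "boosts":       [0, 0, 0, 1, 1, 2, 2, 2],
    "winPlacePerc": [0.1, 0.3, 0.2, 0.5, 0.7, 0.8, 0.9, 1.0],
})
# Mean winPlacePerc per boost count: the points seaborn's line passes through
mean_by_boosts = df4.groupby("boosts")["winPlacePerc"].mean()
print(mean_by_boosts)
# mean_by_boosts.plot(marker="o") would draw the line without any bootstrapping
```
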

# Drop fields not used for modeling: Id, groupId, matchId, matchType
data_m = data.drop(['Id', 'groupId', 'matchId', 'matchType'],axis=1)
matrix = data_m.corr()
cmap = sns.diverging_palette(250, 15, s=70, l=75, n=40, center="light", as_cmap=True)
plt.figure(figsize=(24, 12)) 
sns.heatmap(matrix,  center=0, annot=True,fmt='.2f', square=True, cmap=cmap)

[Figure: correlation heatmap of the numeric features]
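
Reading dozens of heatmap cells by eye is error-prone. A sketch that pulls out the winPlacePerc column of the correlation matrix and sorts it by absolute strength; the helper target_correlations and the toy values are hypothetical, and in the notebook you would pass data_m.

```python
import pandas as pd

def target_correlations(df: pd.DataFrame, target: str = "winPlacePerc") -> pd.Series:
    """Pearson correlation of every column with `target`, strongest (by absolute value) first."""
    corr = df.corr()[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)

# Invented, all-numeric rows standing in for data_m
toy = pd.DataFrame({
    "walkDistance": [100.0, 800.0, 1500.0, 2600.0, 3900.0],
    "boosts":       [0, 0, 1, 3, 4],
    "killPlace":    [90, 60, 40, 15, 2],
    "winPlacePerc": [0.05, 0.3, 0.5, 0.8, 1.0],
})
print(target_correlations(toy))
```
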

Looking at the winPlacePerc row, the features most strongly correlated with placement are the player's walking distance, the number of boost items used, and the number of kills, all positively.

Split the data set
from sklearn.model_selection import train_test_split
y = data_m['winPlacePerc'].values
x = data_m.drop(columns=['winPlacePerc']).values
xtrain,xtest,ytrain,ytest = train_test_split(x,y,test_size=0.3)

Linear regression

# Linear regression
from sklearn.linear_model import LinearRegression as LR
reg = LR().fit(xtrain,ytrain)
y_hat = reg.predict(xtest)

Random forest

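# Random forest. winPlacePerc is continuous in [0, 1], so a regressor is used:
# casting it to int64 for RandomForestClassifier would collapse every placement below 1.0 to 0
from sklearn.ensemble import RandomForestRegressor
rfc = RandomForestRegressor(random_state=0)
rfc = rfc.fit(xtrain,ytrain)
rfc_y_hat = rfc.predict(xtest)
# score_r = rfc.score(xtest,ytest)  # R-squared on the test set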
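
Why not feed the target to RandomForestClassifier as integers? Casting winPlacePerc with .astype('int64') truncates toward zero, so every placement below 1.0 becomes 0 and the model would learn a binary "won / did not win" label instead of the placement. A quick check on made-up values:

```python
import numpy as np

# Made-up winPlacePerc values (a percentile in [0, 1])
y = np.array([0.0, 0.2143, 0.5, 0.9667, 1.0])
as_int = y.astype("int64")  # truncation toward zero
print(as_int)  # → [0 0 0 0 1]
```

To predict winPlacePerc itself, sklearn.ensemble.RandomForestRegressor fitted on the unchanged float target is the usual choice.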

MSE, RMSE, MAE and R-squared are common measures of a regression model's accuracy; the code below computes the first three for each model.

# Linear regression
from sklearn import metrics
MSE = metrics.mean_squared_error(ytest, y_hat)
RMSE = metrics.mean_squared_error(ytest, y_hat)**0.5
MAE = metrics.mean_absolute_error(ytest, y_hat)
MSE,RMSE,MAE

mse=0.016028860503889776, rmse=0.126605136167099378, mae=0.09272709032057316

# Random forest
MSE = metrics.mean_squared_error(ytest, rfc_y_hat)
RMSE = metrics.mean_squared_error(ytest, rfc_y_hat)**0.5
MAE = metrics.mean_absolute_error(ytest, rfc_y_hat)
MSE,RMSE,MAE

mse=0.014725708056613685, rmse=0.12134952845649498, mae=0.08928706404803585
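
The metrics named earlier follow directly from their definitions. The sketch below computes them with NumPy on a tiny made-up example, including R-squared, which the code above does not report. The helper regression_metrics is hypothetical; sklearn.metrics offers mean_squared_error, mean_absolute_error and r2_score for the same quantities. As a sanity check on the reported numbers, RMSE is the square root of MSE: sqrt(0.0160289) ≈ 0.1266 for the linear model.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE and R-squared computed from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)                      # mean squared error
    rmse = np.sqrt(mse)                          # root mean squared error
    mae = np.mean(np.abs(err))                   # mean absolute error
    ss_res = np.sum(err ** 2)                    # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                   # coefficient of determination
    return {"mse": mse, "rmse": rmse, "mae": mae, "r2": r2}

m = regression_metrics([0.0, 0.5, 1.0, 0.25], [0.1, 0.4, 0.9, 0.25])
print(m)
```
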

References

https://codeantenna.com/a/Rn2nLom4jT
https://www.jianshu.com/p/57c0f0266c10
https://www.heywhale.com/mw/project/63f19d69030c7011ddd54ab7

Original post: blog.csdn.net/weixin_43502706/article/details/130777766