Simple use of XGBoost (feature selection, importance plotting, classification, prediction)

XGBoost shows up constantly in competitions such as Kaggle. Tianqi Chen formally introduced the algorithm in the 2016 paper "XGBoost: A Scalable Tree Boosting System". Its basic idea is the same as GBDT's, but with several optimizations: a second-order expansion of the loss function makes the approximation more accurate; a regularization term curbs tree overfitting; and block-based storage allows parallel computation. XGBoost is efficient, flexible, and portable, and is widely used in data mining, recommender systems, and other fields. Below is a brief summary of commonly used code.
Assume that xgboost has been installed and that train_x, train_y and test_x, test_y are ready.
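If the splits are not ready yet, one common way to produce them is sklearn's train_test_split; a minimal sketch on synthetic data (the dataset is made up, and the variable names are just chosen to match the snippets below):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
train_x, test_x, train_y, test_y = train_test_split(X, y, test_size=0.2, random_state=42)
print(train_x.shape, test_x.shape)  # (400, 10) (100, 10)
```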

Classification

import xgboost as xgb
from xgboost import XGBClassifier, plot_importance
from matplotlib import pyplot as plt

model = XGBClassifier()
model.fit(train_x, train_y)

# feature importance
print(model.feature_importances_)

'''
plot_importance and feature_importances_ can disagree:
model.feature_importances_ ranks by gain by default, while
xgb.plot_importance defaults to weight. Align them with:
plot_importance(model, importance_type='gain')
'''

# plot feature importance
plot_importance(model)
plt.show()

# predict
y_pred = model.predict(test_x)

Regression

import xgboost as xgb
from xgboost import plot_importance

model = xgb.XGBRegressor(max_depth=6,  # tune these parameters to improve the model
			learning_rate=0.12,
			n_estimators=90,
			min_child_weight=6,
			objective="reg:gamma")  # gamma regression; targets must be strictly positive
model.fit(train_x, train_y)

Resizing the feature importance plot

import xgboost as xgb
from xgboost import plot_importance
from matplotlib import pyplot as plt

fig, ax = plt.subplots(figsize=(10, 6))  # adjust the figure size
plot_importance(model,
                height=0.6,           # adjust the bar thickness
                ax=ax,
                max_num_features=64)  # adjust how many features are shown
plt.show()

Displaying Chinese feature names

import xgboost as xgb
from xgboost import plot_importance
from matplotlib import pyplot as plt

# model = xgb.XGBRegressor()  # sklearn interface
# model.fit(xgb_trainx, xgb_trainy)

# show Chinese feature names in the plot
feature_names = list(newdf.columns[3:])  # grab all the feature columns
# native interface: the DMatrix carries the feature names
dtrain = xgb.DMatrix(xgb_trainx, label=xgb_trainy, feature_names=feature_names)
param = {}
model = xgb.train(param, dtrain)

fig, ax = plt.subplots(figsize=(10, 6))
plot_importance(model,
                height=0.6,           # adjust the bar thickness
                ax=ax,
                max_num_features=10,  # adjust how many features are shown
                importance_type='gain')
plt.show()
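One more piece usually needed for Chinese labels: matplotlib must be pointed at a CJK-capable font, otherwise the names render as empty boxes. A sketch of the usual configuration ('SimHei' is only an assumption; substitute whatever CJK font is installed locally):

```python
from matplotlib import pyplot as plt

# Point matplotlib at a CJK-capable font; 'SimHei' is a common choice on Windows
plt.rcParams["font.sans-serif"] = ["SimHei"]
plt.rcParams["axes.unicode_minus"] = False  # keep the minus sign rendering correctly
```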

Origin blog.csdn.net/qq_44391957/article/details/124419772