Ensemble learning: basic usage of xgboost in sklearn (reprint)

Reprinted from: https://blog.csdn.net/qq_30868235/article/details/80370060

1. Data Set

      The dataset is sklearn's built-in handwritten digit recognition dataset (a mini-MNIST), loaded through the datasets module. It contains 1797 samples, each with 8 * 8 = 64 features, and the labels are the ten digits 0 through 9.

### load data
from sklearn import datasets # load the datasets module
digits = datasets.load_digits() # load the digits (mnist-style) dataset
print(digits.data.shape) # print the dimensions of the input space
print(digits.target.shape) # print the dimensions of the output space


"""
(1797, 64)
(1797,)
"""

2. Data Set Partitioning

      The train_test_split function in sklearn.model_selection splits the dataset. The test_size parameter is the fraction held out as the test set, and random_state is the random seed (set so the experimental results can be reproduced).

### data split
from sklearn.model_selection import train_test_split # load the data-splitting function train_test_split
x_train, x_test, y_train, y_test = train_test_split(digits.data, # feature space
                                                    digits.target, # output space
                                                    test_size = 0.3, # test set takes 30%
                                                    random_state = 33) # fixed seed so the experiment can be reproduced
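A quick sanity check on the resulting sizes (not in the original post; the exact counts assume sklearn's convention of rounding the test set up, so 30% of 1797 samples gives 540):

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
x_train, x_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=33)

# 30% of the 1797 samples is held out for testing
print(len(x_train), len(x_test))  # 1257 540
```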


3. The Model (load - train - predict)

      The XGBClassifier.fit() function trains the model, and XGBClassifier.predict() uses the trained model to make predictions.

### model
from xgboost import XGBClassifier
model = XGBClassifier() # load the model (named model)
model.fit(x_train, y_train) # train the model on the training set
y_pred = model.predict(x_test) # predict on the test set; y_pred holds the predictions

4. Performance Evaluation

      The accuracy_score function in sklearn.metrics measures the accuracy of the model's predictions.

### performance metrics

from sklearn.metrics import accuracy_score # accuracy
accuracy = accuracy_score(y_test, y_pred)
print("accuracy: %.2f%%" % (accuracy * 100.0))

5. Feature Importance

      xgboost can analyze the importance of each feature; the plot_importance function draws the importance chart.

### feature importance
import matplotlib.pyplot as plt
from xgboost import plot_importance
fig,ax = plt.subplots(figsize=(10,15))
plot_importance(model,height=0.5,max_num_features=64,ax=ax)
plt.show()

[Image: feature importance bar chart produced by plot_importance]

6. Complete Code

### load module
from sklearn import datasets
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

### load datasets
digits = datasets.load_digits()

### data analysis
print(digits.data.shape) # dimensions of the input space
print(digits.target.shape) # dimensions of the output space

### data split
x_train,x_test,y_train,y_test = train_test_split(digits.data,
digits.target,
test_size = 0.3,
random_state = 33)

### fit model for train data
model = XGBClassifier()
model.fit(x_train,y_train)

### make prediction for test data
y_pred = model.predict(x_test)

### model evaluate
accuracy = accuracy_score(y_test,y_pred)
print("accuarcy: %.2f%%" % (accuracy*100.0))
"""
95.0%
"""


Origin: www.cnblogs.com/xitingxie/p/11323114.html