XGBoost Tutorial (with sklearn), Part 2


Reprinted from: https://blog.csdn.net/u011630575/article/details/79421053

1. Import the necessary packages

# run the sample program shipped with the xgboost installation package
from xgboost import XGBClassifier

# module for loading LibSVM-format data
from sklearn.datasets import load_svmlight_file
from sklearn.metrics import accuracy_score

from matplotlib import pyplot
2. Read the data

scikit-learn supports multiple data formats, including the LibSVM format.
XGBoost can load text data in libsvm format. A libsvm file (sparse features) looks like this:
1 101:1.2 102:0.03
0 1:2.1 10001:300 10002:400
...
Each row is one sample. The leading "1" is the sample's label; "101" and "102" are feature indices, and "1.2" and "0.03" are the corresponding feature values.
In binary classification, "1" marks a positive sample and "0" a negative one. A value in [0,1] is also supported as a label, interpreted as the probability that the sample is positive.
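To see the format in action, here is a minimal sketch (the file name tiny.libsvm is purely illustrative) that writes the two example rows above to disk and loads them back with the same loader used later in this tutorial:

# a minimal sketch: write the two example rows, then load them back
from sklearn.datasets import load_svmlight_file

with open('tiny.libsvm', 'w') as f:
    f.write('1 101:1.2 102:0.03\n')
    f.write('0 1:2.1 10001:300 10002:400\n')

X, y = load_svmlight_file('tiny.libsvm')
print(X.shape)  # (2, 10002): sparse matrix sized by the largest index (one-based heuristic)
print(y)        # [1. 0.], the leading labels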
The sample data below asks us to judge, from a number of attributes, whether a mushroom is poisonous.
UCI data description: http://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/
Each sample describes a mushroom with 22 attributes, such as shape and smell (expanded into a 126-dimensional feature vector in libsvm format), plus a label saying whether the mushroom is edible. 6513 samples are used for training and 1611 for testing.

Data download: http://download.csdn.net/download/u011630575/10266113

# read in the data; the data ships in the demo directory of the xgboost
# installation -- copy it into a data/ directory next to this code
my_workpath = './data/'
X_train, y_train = load_svmlight_file(my_workpath + 'agaricus.txt.train')
X_test, y_test = load_svmlight_file(my_workpath + 'agaricus.txt.test')

print(X_train.shape)
print(X_test.shape)
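load_svmlight_file returns the features as a SciPy sparse (CSR) matrix and the labels as a NumPy array. A quick sanity-check sketch:

print(type(X_train))   # scipy.sparse CSR matrix: 126 columns, mostly zeros
print(X_train[0])      # non-zero (column, value) pairs of the first sample
print(y_train[:5])     # the first few labels, 0.0 or 1.0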
3. Training parameters

max_depth: the maximum depth of a tree. Default 6, range [1, ∞].
eta: step-size shrinkage used in each update to prevent overfitting. After each boosting step the weights of newly added features are scaled by eta, which makes the boosting process more conservative. Default 0.3, range [0, 1].
silent: 0 prints run-time messages, 1 runs in silent mode with nothing printed. Default 0.
objective: the learning task and the corresponding learning objective. "binary:logistic" means logistic regression for binary classification, with probability output.

All other parameters keep their default values.
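For reference, the same settings can be collected in a dict and unpacked into the sklearn wrapper, which is what the commented-out XGBClassifier(**params) line in the complete code below hints at. A sketch; note that the sklearn API calls the step size learning_rate rather than eta, and that newer xgboost releases replace silent with verbosity:

from xgboost import XGBClassifier

params = {
    'max_depth': 2,
    'learning_rate': 1,             # 'eta' in the native parameter names
    'silent': True,                 # replaced by 'verbosity' in newer xgboost releases
    'objective': 'binary:logistic',
}
bst = XGBClassifier(**params)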
4. Train the model

# set the number of boosting iterations
num_round = 2


bst = XGBClassifier(max_depth=2, learning_rate=1, n_estimators=num_round,
                    silent=True, objective='binary:logistic')  # sklearn API


bst.fit(X_train, y_train)
XGBoost's prediction output is a probability. Since this mushroom task is binary classification, the output value is the probability that the sample belongs to the positive class.

We need to convert these probability values to 0 or 1.

train_preds = bst.predict(X_train)
train_predictions = [round(value) for value in train_preds]

train_accuracy = accuracy_score(y_train, train_predictions)
print("Train Accuracy: %.2f%%" % (train_accuracy * 100.0))
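If you want the probabilities themselves rather than rounded labels, the sklearn wrapper also exposes predict_proba. A minimal sketch (in current xgboost versions XGBClassifier.predict already returns 0/1 class labels, so the round() above acts as a safeguard):

train_probs = bst.predict_proba(X_train)  # column 0: P(class 0), column 1: P(class 1)
print(train_probs[:3])

# thresholding the positive-class probability at 0.5 reproduces the labels
train_labels = (train_probs[:, 1] > 0.5).astype(int)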
5. Testing

Once the model is trained, it can be used to make predictions on the test data.
Again, the predicted output is the probability that the sample belongs to the positive class, and we need to convert it to 0 or 1.

# make prediction
preds = bst.predict(X_test)
predictions = [round(value) for value in preds]

test_accuracy = accuracy_score(y_test, predictions)
print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))
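The pyplot import at the top is not used in this part of the tutorial. One natural use for it, sketched here, is xgboost's built-in plot_importance helper, which shows which of the 126 features the trees actually split on:

from xgboost import plot_importance

plot_importance(bst)   # bar chart of how often each feature is used to split
pyplot.show()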
6. The complete code

# coding: utf-8
# sample program run with the xgboost installation package
from xgboost import XGBClassifier

# module for loading LibSVM-format data
from sklearn.datasets import load_svmlight_file
from sklearn.metrics import accuracy_score

from matplotlib import pyplot

# read in the data; copy it from the demo directory of the xgboost
# installation into a data/ directory next to this code
my_workpath = './data/'
X_train, y_train = load_svmlight_file(my_workpath + 'agaricus.txt.train')
X_test, y_test = load_svmlight_file(my_workpath + 'agaricus.txt.test')

print(X_train.shape)
print(X_test.shape)

# set the number of boosting iterations
num_round = 2

# bst = XGBClassifier(**params)
# bst = XGBClassifier()
bst = XGBClassifier(max_depth=2, learning_rate=1, n_estimators=num_round,
                    silent=True, objective='binary:logistic')

bst.fit(X_train, y_train)

train_preds = bst.predict(X_train)
train_predictions = [round(value) for value in train_preds]

train_accuracy = accuracy_score(y_train, train_predictions)
print("Train Accuracy: %.2f%%" % (train_accuracy * 100.0))

# make prediction
preds = bst.predict(X_test)
predictions = [round(value) for value in preds]

test_accuracy = accuracy_score(y_test, predictions)
print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))

"stumbled on a giant cow artificial intelligence course, everyone can not help but share the tutorial is not only a zero-based, user-friendly, and very funny. ! humor, like reading a novel, like that was amazing, so we share the point where you can jump to the tutorial. ".
---------------------
Copyright: This is an original article by CSDN blogger "Hehe Ming", licensed under CC 4.0 BY-SA. For reproduction, please attach the original source link and this statement.
Original link: https://blog.csdn.net/u011630575/article/details/79421053
