The official documentation is already quite good: https://xgboost.readthedocs.io/en/latest/build.html
1. Download the source code
One command does it; the source is downloaded into the current folder, creating an xgboost directory:
git clone --recursive https://github.com/dmlc/xgboost
2. Build the shared library (.so)
cd xgboost; make -j4
3. Install the Python package
cd python-package; sudo python setup.py install
4. Example
First, let's train a model and look at the generated decision tree:
# Adapted from: https://xgboost.readthedocs.io/en/latest/get_started/index.html
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import xgboost as xgb
from sklearn.metrics import roc_auc_score

xgFolder = '/home/XXX/tools/xgboost/'

# Read in data.
dtrain = xgb.DMatrix(xgFolder + 'demo/data/agaricus.txt.train')
# The first line of the training file reads:
# 1 3:1 10:1 11:1 21:1 30:1 34:1 36:1 40:1 41:1 53:1 58:1 65:1 69:1 77:1 86:1 88:1 92:1 95:1 102:1 105:1 117:1 124:1
# i.e. the label is 1, feature 3 is 1, feature 10 is 1, ...

weights = dtrain.get_weight()  # per-sample weights (numpy.ndarray); not the input data itself -- [] if never set
labels = dtrain.get_label()    # labels (numpy.ndarray)
print(dtrain.get_base_margin())
print(weights)
print(labels[0])

dtest = xgb.DMatrix(xgFolder + 'demo/data/agaricus.txt.test')

# Specify parameters via a dict.
# Tuning: https://xgboost.readthedocs.io/en/latest/how_to/param_tuning.html
# Full parameter reference: https://xgboost.readthedocs.io/en/latest/parameter.html
booster = 'dart'
# booster = 'gbtree'
# booster = 'gblinear'
param = {'max_depth': 3, 'eta': 1, 'silent': 0,
         'objective': 'binary:logistic', 'booster': booster}
num_round = 2
bst = xgb.train(param, dtrain, num_round)

# Make predictions.
preds = bst.predict(dtest)
print('AUC: %.4f' % roc_auc_score(dtest.get_label(), preds))
print('DONE')

#######################################################
# https://xgboost.readthedocs.io/en/latest/python/python_intro.html
# Plot feature importance and the decision tree:
import matplotlib.pyplot as plt

ax = xgb.plot_importance(bst)
plt.show()  # without this, the plot only appears in debug mode

# ax = xgb.plot_tree(bst, num_trees=1)
ax = xgb.plot_tree(bst)
plt.show()

# Save the decision tree to an image.
import codecs
f = codecs.open('xgb_tree.png', mode='wb')
g = xgb.to_graphviz(bst)
f.write(g.pipe('png'))
f.close()
Output (results only):
- AUC: 1.0000
- DONE
5. Useful resources
python API: http://xgboost.readthedocs.io/en/latest/python/index.html
Parameter tuning: https://xgboost.readthedocs.io/en/latest/how_to/param_tuning.html
Detailed parameter reference: https://xgboost.readthedocs.io/en/latest/parameter.html
Introduction to boosted trees: https://xgboost.readthedocs.io/en/latest/tutorials/index.html
Awesome XGBoost: https://github.com/dmlc/xgboost/tree/master/demo
Using the C/C++ API:
http://stackoverflow.com/questions/36071672/using-xgboost-in-c
http://qsalg.com/?p=388
http://stackoverflow.duapp.com/questions/35289674/create-xgboost-dmatrix-in-c/37416279