xgboost installation and plotting notes

System: Ubuntu 16.04

The official build documentation is already quite good: https://xgboost.readthedocs.io/en/latest/build.html

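1. Download the source
A single command does it. It creates an xgboost directory under the current folder and downloads the source (including submodules) into it: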
git clone --recursive https://github.com/dmlc/xgboost


2. Build the shared library (.so)
cd xgboost; make -j4


3. Install the Python package
cd python-package; sudo python setup.py install
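
A quick way to confirm that step 3 worked is to check whether Python can locate the package. This is a minimal sketch that uses only the standard library and assumes Python 3.4 or newer; the helper name `is_installed` is hypothetical and not part of xgboost.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == '__main__':
    # 'xgboost' is the package name installed by setup.py in step 3
    print('xgboost importable:', is_installed('xgboost'))
```

If this prints False, the package was probably installed for a different Python interpreter than the one running the check.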



4. Example
First, a look at the generated decision tree:


# Adapted from: https://xgboost.readthedocs.io/en/latest/get_started/index.html

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import xgboost as xgb
from sklearn.metrics import roc_auc_score

xgFolder='/home/XXX/tools/xgboost/'

# read in data
dtrain = xgb.DMatrix(xgFolder+'demo/data/agaricus.txt.train')
# The first line of the training file is: 1 3:1 10:1 11:1 21:1 30:1 34:1 36:1 40:1 41:1 53:1 58:1 65:1 69:1 77:1 86:1 88:1 92:1 95:1 102:1 105:1 117:1 124:1
# The leading 1 is the label; feature 3 has value 1, feature 10 has value 1, and so on

weights=dtrain.get_weight()  # per-sample weights (numpy.ndarray), not the loaded feature data; empty ([]) if never set
labels=dtrain.get_label()  # labels (numpy.ndarray)
print(dtrain.get_base_margin())
print(weights)
print(labels[0])
dtest = xgb.DMatrix(xgFolder+'demo/data/agaricus.txt.test')

# specify parameters via map
# Parameter tuning: https://xgboost.readthedocs.io/en/latest/how_to/param_tuning.html
# Full parameter reference: https://xgboost.readthedocs.io/en/latest/parameter.html
booster='dart'
# booster='gbtree'
# booster='gblinear'

param = {'max_depth':3, 'eta':1, 'silent':0, 'objective':'binary:logistic','booster':booster }
num_round = 2

bst = xgb.train(param, dtrain, num_round)
# make prediction
preds = bst.predict(dtest)
print('AUC: %.4f'%  roc_auc_score(dtest.get_label(), preds))
print('DONE')

#######################################################
# https://xgboost.readthedocs.io/en/latest/python/python_intro.html
# Plot feature importance and the decision tree:
import matplotlib.pyplot as plt
ax=xgb.plot_importance(bst)
plt.show()  # without this call the figure only showed up in debug mode

# ax=xgb.plot_tree(bst, num_trees=1)
ax=xgb.plot_tree(bst)
plt.show()


# Save the decision tree to an image file
g = xgb.to_graphviz(bst)
with open('xgb_tree.png', 'wb') as f:
    f.write(g.pipe(format='png'))  # render the graph to PNG bytes


Output (results only):
  • AUC: 1.0000
  • DONE
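
The comments in the script describe the layout of each line in the agaricus data files: a label followed by sparse `index:value` pairs (the LibSVM text format). To make that concrete, here is a small sketch that parses the first training line quoted above by hand. The function name `parse_libsvm_line` is hypothetical and written for illustration only; in the script itself `xgb.DMatrix` does the actual loading.

```python
def parse_libsvm_line(line):
    """Split one LibSVM-format line into (label, {feature_index: value})."""
    tokens = line.split()
    label = float(tokens[0])          # first token is the label
    features = {}
    for token in tokens[1:]:          # remaining tokens are index:value pairs
        index, value = token.split(':')
        features[int(index)] = float(value)
    return label, features


# First line of agaricus.txt.train, as quoted in the script comments
first_line = ('1 3:1 10:1 11:1 21:1 30:1 34:1 36:1 40:1 41:1 53:1 58:1 '
              '65:1 69:1 77:1 86:1 88:1 92:1 95:1 102:1 105:1 117:1 124:1')
label, features = parse_libsvm_line(first_line)
print(label)                  # 1.0
print(sorted(features)[:3])   # [3, 10, 11]
print(len(features))          # 22
```

Features that do not appear on a line are simply absent from the dictionary, which is what keeps the format compact for sparse data.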


5. Useful resources
python API: http://xgboost.readthedocs.io/en/latest/python/index.html

Parameter tuning: https://xgboost.readthedocs.io/en/latest/how_to/param_tuning.html
Full parameter reference: https://xgboost.readthedocs.io/en/latest/parameter.html

Introduction to boosted trees: https://xgboost.readthedocs.io/en/latest/tutorials/index.html

Awesome XGBoost: https://github.com/dmlc/xgboost/tree/master/demo


Using the C/C++ API:
http://stackoverflow.com/questions/36071672/using-xgboost-in-c
http://qsalg.com/?p=388
http://stackoverflow.duapp.com/questions/35289674/create-xgboost-dmatrix-in-c/37416279

Reposted from cherishlc.iteye.com/blog/2329604