机器学习各个算法4--boosting,adaboost

最大公共子序列动态规划算法可参考http://blog.csdn.net/HerosOfEarth/article/category/6364820

基于单层决策树构建弱分类器

相关函数

def adaBoostTrainDS(dataArr,classLabels,numIt=40):   #基于单层决策树的AdaBoost训练过程
    weakClassArr = []   #弱分类器数组
    m = shape(dataArr)[0]  #指代样本个数
    D = mat(ones((m,1))/m)   #init D to all equal  权重向量的值初始化为相等。
    aggClassEst = mat(zeros((m,1)))    #记录每个数据点的类别估计累计值,初始化为0
    for i in range(numIt):   #开始迭代进程
        bestStump,error,classEst = buildStump(dataArr,classLabels,D)#build Stump
        print "D:",D.T
        alpha = float(0.5*log((1.0-error)/max(error,1e-16)))#calc alpha, throw in max(error,eps) to account for error=0 防止出现错误率为0的情况
        bestStump['alpha'] = alpha  
        weakClassArr.append(bestStump)                  #store Stump Params in Array 字典存储到若分类器列表中
        print "classEst: ",classEst.T
        expon = multiply(-1*alpha*mat(classLabels).T,classEst) #exponent for D calc, getting messy
        D = multiply(D,exp(expon))                              #Calc New D for next iteration
        D = D/D.sum()
        #calc training error of all classifiers, if this is 0 quit for loop early (use break)
        aggClassEst += alpha*classEst
        print "aggClassEst: ",aggClassEst.T
        aggErrors = multiply(sign(aggClassEst) != mat(classLabels).T,ones((m,1)))   #python sign()函数
        errorRate = aggErrors.sum()/m     #得到错误率
        print "total error: ",errorRate
        if errorRate == 0.0: break
    return weakClassArr,aggClassEst

测试函数

#AdaBoost算法的训练
classifierArray = adaboost.adaBoostTrainDS(dataMat,classLabels,numIt=40)
print classifierArray

结果

split: dim 0, thresh 0.90, thresh ineqal: lt, the weighted error is 0.400
split: dim 0, thresh 0.90, thresh ineqal: gt, the weighted error is 0.600
split: dim 0, thresh 1.00, thresh ineqal: lt, the weighted error is 0.400
split: dim 0, thresh 1.00, thresh ineqal: gt, the weighted error is 0.600
split: dim 0, thresh 1.10, thresh ineqal: lt, the weighted error is 0.400
split: dim 0, thresh 1.10, thresh ineqal: gt, the weighted error is 0.600
split: dim 0, thresh 1.20, thresh ineqal: lt, the weighted error is 0.400
split: dim 0, thresh 1.20, thresh ineqal: gt, the weighted error is 0.600
split: dim 0, thresh 1.30, thresh ineqal: lt, the weighted error is 0.200
split: dim 0, thresh 1.30, thresh ineqal: gt, the weighted error is 0.800
split: dim 0, thresh 1.40, thresh ineqal: lt, the weighted error is 0.200
split: dim 0, thresh 1.40, thresh ineqal: gt, the weighted error is 0.800
split: dim 0, thresh 1.50, thresh ineqal: lt, the weighted error is 0.200
split: dim 0, thresh 1.50, thresh ineqal: gt, the weighted error is 0.800
split: dim 0, thresh 1.60, thresh ineqal: lt, the weighted error is 0.200
split: dim 0, thresh 1.60, thresh ineqal: gt, the weighted error is 0.800
split: dim 0, thresh 1.70, thresh ineqal: lt, the weighted error is 0.200
split: dim 0, thresh 1.70, thresh ineqal: gt, the weighted error is 0.800
split: dim 0, thresh 1.80, thresh ineqal: lt, the weighted error is 0.200
split: dim 0, thresh 1.80, thresh ineqal: gt, the weighted error is 0.800
split: dim 0, thresh 1.90, thresh ineqal: lt, the weighted error is 0.200
split: dim 0, thresh 1.90, thresh ineqal: gt, the weighted error is 0.800
split: dim 0, thresh 2.00, thresh ineqal: lt, the weighted error is 0.600
split: dim 0, thresh 2.00, thresh ineqal: gt, the weighted error is 0.400
split: dim 1, thresh 0.89, thresh ineqal: lt, the weighted error is 0.400
split: dim 1, thresh 0.89, thresh ineqal: gt, the weighted error is 0.600
split: dim 1, thresh 1.00, thresh ineqal: lt, the weighted error is 0.200
split: dim 1, thresh 1.00, thresh ineqal: gt, the weighted error is 0.800
split: dim 1, thresh 1.11, thresh ineqal: lt, the weighted error is 0.400
split: dim 1, thresh 1.11, thresh ineqal: gt, the weighted error is 0.600
split: dim 1, thresh 1.22, thresh ineqal: lt, the weighted error is 0.400
split: dim 1, thresh 1.22, thresh ineqal: gt, the weighted error is 0.600
split: dim 1, thresh 1.33, thresh ineqal: lt, the weighted error is 0.400
split: dim 1, thresh 1.33, thresh ineqal: gt, the weighted error is 0.600
split: dim 1, thresh 1.44, thresh ineqal: lt, the weighted error is 0.400
split: dim 1, thresh 1.44, thresh ineqal: gt, the weighted error is 0.600
split: dim 1, thresh 1.55, thresh ineqal: lt, the weighted error is 0.400
split: dim 1, thresh 1.55, thresh ineqal: gt, the weighted error is 0.600
split: dim 1, thresh 1.66, thresh ineqal: lt, the weighted error is 0.400
split: dim 1, thresh 1.66, thresh ineqal: gt, the weighted error is 0.600
split: dim 1, thresh 1.77, thresh ineqal: lt, the weighted error is 0.400
split: dim 1, thresh 1.77, thresh ineqal: gt, the weighted error is 0.600
split: dim 1, thresh 1.88, thresh ineqal: lt, the weighted error is 0.400
split: dim 1, thresh 1.88, thresh ineqal: gt, the weighted error is 0.600
split: dim 1, thresh 1.99, thresh ineqal: lt, the weighted error is 0.400
split: dim 1, thresh 1.99, thresh ineqal: gt, the weighted error is 0.600
split: dim 1, thresh 2.10, thresh ineqal: lt, the weighted error is 0.600
split: dim 1, thresh 2.10, thresh ineqal: gt, the weighted error is 0.400
D: [[ 0.2 0.2 0.2 0.2 0.2]]
classEst: [[-1. 1. -1. -1. 1.]]
aggClassEst: [[-0.69314718 0.69314718 -0.69314718 -0.69314718 0.69314718]]
total error: 0.2
split: dim 0, thresh 0.90, thresh ineqal: lt, the weighted error is 0.250
split: dim 0, thresh 0.90, thresh ineqal: gt, the weighted error is 0.750
split: dim 0, thresh 1.00, thresh ineqal: lt, the weighted error is 0.625
split: dim 0, thresh 1.00, thresh ineqal: gt, the weighted error is 0.375
split: dim 0, thresh 1.10, thresh ineqal: lt, the weighted error is 0.625
split: dim 0, thresh 1.10, thresh ineqal: gt, the weighted error is 0.375
split: dim 0, thresh 1.20, thresh ineqal: lt, the weighted error is 0.625
split: dim 0, thresh 1.20, thresh ineqal: gt, the weighted error is 0.375
split: dim 0, thresh 1.30, thresh ineqal: lt, the weighted error is 0.500
split: dim 0, thresh 1.30, thresh ineqal: gt, the weighted error is 0.500
split: dim 0, thresh 1.40, thresh ineqal: lt, the weighted error is 0.500
split: dim 0, thresh 1.40, thresh ineqal: gt, the weighted error is 0.500
split: dim 0, thresh 1.50, thresh ineqal: lt, the weighted error is 0.500
split: dim 0, thresh 1.50, thresh ineqal: gt, the weighted error is 0.500
split: dim 0, thresh 1.60, thresh ineqal: lt, the weighted error is 0.500
split: dim 0, thresh 1.60, thresh ineqal: gt, the weighted error is 0.500
split: dim 0, thresh 1.70, thresh ineqal: lt, the weighted error is 0.500
split: dim 0, thresh 1.70, thresh ineqal: gt, the weighted error is 0.500
split: dim 0, thresh 1.80, thresh ineqal: lt, the weighted error is 0.500
split: dim 0, thresh 1.80, thresh ineqal: gt, the weighted error is 0.500
split: dim 0, thresh 1.90, thresh ineqal: lt, the weighted error is 0.500
split: dim 0, thresh 1.90, thresh ineqal: gt, the weighted error is 0.500
split: dim 0, thresh 2.00, thresh ineqal: lt, the weighted error is 0.750
split: dim 0, thresh 2.00, thresh ineqal: gt, the weighted error is 0.250
split: dim 1, thresh 0.89, thresh ineqal: lt, the weighted error is 0.250
split: dim 1, thresh 0.89, thresh ineqal: gt, the weighted error is 0.750
split: dim 1, thresh 1.00, thresh ineqal: lt, the weighted error is 0.125
split: dim 1, thresh 1.00, thresh ineqal: gt, the weighted error is 0.875
split: dim 1, thresh 1.11, thresh ineqal: lt, the weighted error is 0.250
split: dim 1, thresh 1.11, thresh ineqal: gt, the weighted error is 0.750
split: dim 1, thresh 1.22, thresh ineqal: lt, the weighted error is 0.250
split: dim 1, thresh 1.22, thresh ineqal: gt, the weighted error is 0.750
split: dim 1, thresh 1.33, thresh ineqal: lt, the weighted error is 0.250
split: dim 1, thresh 1.33, thresh ineqal: gt, the weighted error is 0.750
split: dim 1, thresh 1.44, thresh ineqal: lt, the weighted error is 0.250
split: dim 1, thresh 1.44, thresh ineqal: gt, the weighted error is 0.750
split: dim 1, thresh 1.55, thresh ineqal: lt, the weighted error is 0.250
split: dim 1, thresh 1.55, thresh ineqal: gt, the weighted error is 0.750
split: dim 1, thresh 1.66, thresh ineqal: lt, the weighted error is 0.250
split: dim 1, thresh 1.66, thresh ineqal: gt, the weighted error is 0.750
split: dim 1, thresh 1.77, thresh ineqal: lt, the weighted error is 0.250
split: dim 1, thresh 1.77, thresh ineqal: gt, the weighted error is 0.750
split: dim 1, thresh 1.88, thresh ineqal: lt, the weighted error is 0.250
split: dim 1, thresh 1.88, thresh ineqal: gt, the weighted error is 0.750
split: dim 1, thresh 1.99, thresh ineqal: lt, the weighted error is 0.250
split: dim 1, thresh 1.99, thresh ineqal: gt, the weighted error is 0.750
split: dim 1, thresh 2.10, thresh ineqal: lt, the weighted error is 0.750
split: dim 1, thresh 2.10, thresh ineqal: gt, the weighted error is 0.250
D: [[ 0.5 0.125 0.125 0.125 0.125]]
classEst: [[ 1. 1. -1. -1. -1.]]
aggClassEst: [[ 0.27980789 1.66610226 -1.66610226 -1.66610226 -0.27980789]]
total error: 0.2
split: dim 0, thresh 0.90, thresh ineqal: lt, the weighted error is 0.143
split: dim 0, thresh 0.90, thresh ineqal: gt, the weighted error is 0.857
split: dim 0, thresh 1.00, thresh ineqal: lt, the weighted error is 0.357
split: dim 0, thresh 1.00, thresh ineqal: gt, the weighted error is 0.643
split: dim 0, thresh 1.10, thresh ineqal: lt, the weighted error is 0.357
split: dim 0, thresh 1.10, thresh ineqal: gt, the weighted error is 0.643
split: dim 0, thresh 1.20, thresh ineqal: lt, the weighted error is 0.357
split: dim 0, thresh 1.20, thresh ineqal: gt, the weighted error is 0.643
split: dim 0, thresh 1.30, thresh ineqal: lt, the weighted error is 0.286
split: dim 0, thresh 1.30, thresh ineqal: gt, the weighted error is 0.714
split: dim 0, thresh 1.40, thresh ineqal: lt, the weighted error is 0.286
split: dim 0, thresh 1.40, thresh ineqal: gt, the weighted error is 0.714
split: dim 0, thresh 1.50, thresh ineqal: lt, the weighted error is 0.286
split: dim 0, thresh 1.50, thresh ineqal: gt, the weighted error is 0.714
split: dim 0, thresh 1.60, thresh ineqal: lt, the weighted error is 0.286
split: dim 0, thresh 1.60, thresh ineqal: gt, the weighted error is 0.714
split: dim 0, thresh 1.70, thresh ineqal: lt, the weighted error is 0.286
split: dim 0, thresh 1.70, thresh ineqal: gt, the weighted error is 0.714
split: dim 0, thresh 1.80, thresh ineqal: lt, the weighted error is 0.286
split: dim 0, thresh 1.80, thresh ineqal: gt, the weighted error is 0.714
split: dim 0, thresh 1.90, thresh ineqal: lt, the weighted error is 0.286
split: dim 0, thresh 1.90, thresh ineqal: gt, the weighted error is 0.714
split: dim 0, thresh 2.00, thresh ineqal: lt, the weighted error is 0.857
split: dim 0, thresh 2.00, thresh ineqal: gt, the weighted error is 0.143
split: dim 1, thresh 0.89, thresh ineqal: lt, the weighted error is 0.143
split: dim 1, thresh 0.89, thresh ineqal: gt, the weighted error is 0.857
split: dim 1, thresh 1.00, thresh ineqal: lt, the weighted error is 0.500
split: dim 1, thresh 1.00, thresh ineqal: gt, the weighted error is 0.500
split: dim 1, thresh 1.11, thresh ineqal: lt, the weighted error is 0.571
split: dim 1, thresh 1.11, thresh ineqal: gt, the weighted error is 0.429
split: dim 1, thresh 1.22, thresh ineqal: lt, the weighted error is 0.571
split: dim 1, thresh 1.22, thresh ineqal: gt, the weighted error is 0.429
split: dim 1, thresh 1.33, thresh ineqal: lt, the weighted error is 0.571
split: dim 1, thresh 1.33, thresh ineqal: gt, the weighted error is 0.429
split: dim 1, thresh 1.44, thresh ineqal: lt, the weighted error is 0.571
split: dim 1, thresh 1.44, thresh ineqal: gt, the weighted error is 0.429
split: dim 1, thresh 1.55, thresh ineqal: lt, the weighted error is 0.571
split: dim 1, thresh 1.55, thresh ineqal: gt, the weighted error is 0.429
split: dim 1, thresh 1.66, thresh ineqal: lt, the weighted error is 0.571
split: dim 1, thresh 1.66, thresh ineqal: gt, the weighted error is 0.429
split: dim 1, thresh 1.77, thresh ineqal: lt, the weighted error is 0.571
split: dim 1, thresh 1.77, thresh ineqal: gt, the weighted error is 0.429
split: dim 1, thresh 1.88, thresh ineqal: lt, the weighted error is 0.571
split: dim 1, thresh 1.88, thresh ineqal: gt, the weighted error is 0.429
split: dim 1, thresh 1.99, thresh ineqal: lt, the weighted error is 0.571
split: dim 1, thresh 1.99, thresh ineqal: gt, the weighted error is 0.429
split: dim 1, thresh 2.10, thresh ineqal: lt, the weighted error is 0.857
split: dim 1, thresh 2.10, thresh ineqal: gt, the weighted error is 0.143
D: [[ 0.28571429 0.07142857 0.07142857 0.07142857 0.5 ]]
classEst: [[ 1. 1. 1. 1. 1.]]
aggClassEst: [[ 1.17568763 2.56198199 -0.77022252 -0.77022252 0.61607184]]
total error: 0.0

([{'dim': 0, 'ineq': 'lt', 'thresh': 1.3, 'alpha': 0.6931471805599453}, {'dim': 1, 'ineq': 'lt', 'thresh': 1.0, 'alpha': 0.9729550745276565}, {'dim': 0, 'ineq': 'lt', 'thresh': 0.90000000000000002, 'alpha': 0.8958797346140273}], matrix([[ 1.17568763],
[ 2.56198199],
[-0.77022252],
[-0.77022252],
[ 0.61607184]]))

基于AdaBoost的分类

是

机器学习各个算法4--boosting,adaboost

猜你喜欢