Machine Learning (Using the AdaBoost Meta-Algorithm to Improve Classification Performance)

The idea behind a meta-algorithm is to combine the outputs of several other algorithms into a single, stronger classifier. The code below implements AdaBoost, using decision stumps as the weak learners.

from numpy import *

def loadSimpData():
    datMat = matrix([[ 1. ,  2.1],
        [ 2. ,  1.1],
        [ 1.3,  1. ],
        [ 1. ,  1. ],
        [ 2. ,  1. ]])
    classLabels = [1.0, 1.0, -1.0, -1.0, 1.0]
    return datMat,classLabels

def loadDataSet(fileName):      #general function to parse tab-delimited floats
    numFeat = len(open(fileName).readline().split('\t')) #get number of fields 
    dataMat = []; labelMat = []
    fr = open(fileName)
    for line in fr.readlines():
        lineArr =[]
        curLine = line.strip().split('\t')
        for i in range(numFeat-1):
            lineArr.append(float(curLine[i]))
        dataMat.append(lineArr)
        labelMat.append(float(curLine[-1]))
    return dataMat,labelMat

def stumpClassify(dataMatrix,dimen,threshVal,threshIneq):#just classify the data
    retArray = ones((shape(dataMatrix)[0],1))
    if threshIneq == 'lt':
        retArray[dataMatrix[:,dimen] <= threshVal] = -1.0
    else:
        retArray[dataMatrix[:,dimen] > threshVal] = -1.0
    return retArray
    

def buildStump(dataArr,classLabels,D):
    dataMatrix = mat(dataArr); labelMat = mat(classLabels).T
    m,n = shape(dataMatrix)
    numSteps = 10.0; bestStump = {}; bestClasEst = mat(zeros((m,1)))
    minError = inf #init error sum, to +infinity
    for i in range(n):#loop over all dimensions
        rangeMin = dataMatrix[:,i].min(); rangeMax = dataMatrix[:,i].max();
        stepSize = (rangeMax-rangeMin)/numSteps
        for j in range(-1,int(numSteps)+1):#loop over all range in current dimension
            for inequal in ['lt', 'gt']: #go over less than and greater than
                threshVal = (rangeMin + float(j) * stepSize)
                predictedVals = stumpClassify(dataMatrix,i,threshVal,inequal)#call stump classify with i, j, lessThan
                errArr = mat(ones((m,1)))
                errArr[predictedVals == labelMat] = 0
                weightedError = D.T*errArr  #calc total error multiplied by D
                #print "split: dim %d, thresh %.2f, thresh ineqal: %s, the weighted error is %.3f" % (i, threshVal, inequal, weightedError)
                if weightedError < minError:
                    minError = weightedError
                    bestClasEst = predictedVals.copy()
                    bestStump['dim'] = i
                    bestStump['thresh'] = threshVal
                    bestStump['ineq'] = inequal
    return bestStump,minError,bestClasEst


def adaBoostTrainDS(dataArr,classLabels,numIt=40):
    weakClassArr = []
    m = shape(dataArr)[0]
    D = mat(ones((m,1))/m)   #init D to all equal
    aggClassEst = mat(zeros((m,1)))
    for i in range(numIt):
        bestStump,error,classEst = buildStump(dataArr,classLabels,D)#build Stump
        #print "D:",D.T
        alpha = float(0.5*log((1.0-error)/max(error,1e-16)))#calc alpha, throw in max(error,eps) to account for error=0
        bestStump['alpha'] = alpha  
        weakClassArr.append(bestStump)                  #store Stump Params in Array
        #print "classEst: ",classEst.T
        expon = multiply(-1*alpha*mat(classLabels).T,classEst) #exponent for D calc, getting messy
        D = multiply(D,exp(expon))                              #Calc New D for next iteration
        D = D/D.sum()
        #calc training error of all classifiers, if this is 0 quit for loop early (use break)
        aggClassEst += alpha*classEst
        #print "aggClassEst: ",aggClassEst.T
        aggErrors = multiply(sign(aggClassEst) != mat(classLabels).T,ones((m,1)))
        errorRate = aggErrors.sum()/m
        print "total error: ",errorRate
        if errorRate == 0.0: break
    return weakClassArr,aggClassEst

def adaClassify(datToClass,classifierArr):
    dataMatrix = mat(datToClass)#do stuff similar to last aggClassEst in adaBoostTrainDS
    m = shape(dataMatrix)[0]
    aggClassEst = mat(zeros((m,1)))
    for i in range(len(classifierArr)):
        classEst = stumpClassify(dataMatrix,classifierArr[i]['dim'],\
                                 classifierArr[i]['thresh'],\
                                 classifierArr[i]['ineq'])#call stump classify
        aggClassEst += classifierArr[i]['alpha']*classEst
        print(aggClassEst)
    return sign(aggClassEst)

def plotROC(predStrengths, classLabels):
    import matplotlib.pyplot as plt
    cur = (1.0,1.0) #cursor
    ySum = 0.0 #variable to calculate AUC
    numPosClas = sum(array(classLabels)==1.0)
    yStep = 1/float(numPosClas); xStep = 1/float(len(classLabels)-numPosClas)
    sortedIndicies = predStrengths.argsort()#indices of the predictions sorted from smallest to largest
    fig = plt.figure()
    fig.clf()
    ax = plt.subplot(111)
    #loop through all the values, drawing a line segment at each point
    for index in sortedIndicies.tolist()[0]:
        if classLabels[index] == 1.0:
            delX = 0; delY = yStep;
        else:
            delX = xStep; delY = 0;
            ySum += cur[1]
        #draw line from cur to (cur[0]-delX,cur[1]-delY)
        ax.plot([cur[0],cur[0]-delX],[cur[1],cur[1]-delY], c='b')
        cur = (cur[0] - delX, cur[1] - delY)
    ax.plot([0,1],[0,1],'b--')
    plt.xlabel('False positive rate'); plt.ylabel('True positive rate')
    plt.title('ROC curve for AdaBoost horse colic detection system')
    ax.axis([0,1,0,1])
    plt.show()
    print "the Area Under the Curve is: ",ySum*xStep

 

AdaBoost is the most popular meta-algorithm and one of the most powerful tools in machine learning.

The combination can take different forms: different algorithms can be combined, the same algorithm can be combined under different settings, or different parts of the data set can be assigned to different classifiers whose results are then combined.

Advantages: low generalization error, easy to code, works with most classifiers, no parameters to tune

Disadvantages: sensitive to outliers

Applicable to numeric and nominal data

Bagging (bootstrap aggregating) is a technique in which S new data sets are built by sampling from the original data set S times. Each new data set has the same size as the original and is formed by randomly drawing samples from the original with replacement, so a given sample may appear several times in a new data set while other samples may not appear at all.

After the S data sets are built, a learning algorithm is applied to each of them, yielding S classifiers. To classify a new example, all S classifiers vote, and the class that receives the most votes becomes the final prediction.
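
As a rough illustration of this idea, here is a minimal bagging sketch. The choice of numpy plus scikit-learn's DecisionTreeClassifier as the base learner is an assumption made for this example only; it is not part of the code above.

import numpy as np
from sklearn.tree import DecisionTreeClassifier   # assumed base learner, for illustration only

def baggingFit(X, y, S=10, seed=0):
    # X, y are numpy arrays; train S classifiers, each on a bootstrap sample drawn with replacement
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(S):
        idx = rng.integers(0, n, size=n)           # n indices drawn with replacement
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def baggingPredict(models, X):
    # equal-weight majority vote over the S classifiers
    votes = np.stack([m.predict(X) for m in models])   # shape (S, number of samples)
    preds = []
    for col in votes.T:
        vals, counts = np.unique(col, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return np.array(preds)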

A more advanced bagging method is the random forest.

Boosting is a technique similar to bagging, but while bagging builds its classifiers independently, boosting builds them serially: each new classifier is trained by focusing on the data that the previously trained classifiers misclassified.

The output of boosting is a weighted sum of all the classifiers. In bagging the weights are equal, whereas in boosting they differ: each weight reflects how well the corresponding classifier performed in its round of training.

AdaBoost (adaptive boosting) is one form of boosting.

The AdaBoost algorithm can be summarized in three steps:
 (1) First, initialize the weight distribution D1 over the training data. With N training samples, each sample starts with the same weight: w1 = 1/N.
 (2) Then, train a weak classifier h_t. If a training sample is classified correctly by h_t, its weight is decreased when the next training set is constructed; if it is misclassified, its weight is increased. The sample set with the updated weights is used to train the next classifier, and the whole training process continues iteratively.
 (3) Finally, combine the weak classifiers obtained in each round into a strong classifier. Weak classifiers with a small classification error rate are given large weights, so they play a larger role in the final decision function, while weak classifiers with a large error rate are given small weights and play a smaller role.
  In other words, a weak classifier with a low error rate receives a large weight in the final classifier, and one with a high error rate receives a small weight.
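
Written out, and matching the update performed in adaBoostTrainDS above (with epsilon_t the weighted error of the t-th weak classifier, y_i the true label, and h_t(x_i) its prediction):

    alpha_t = 0.5 * ln((1 - epsilon_t) / epsilon_t)
    D_i  <-  D_i * exp(-alpha_t * y_i * h_t(x_i)),  then D is renormalized so that it sums to 1
    H(x) = sign( sum over t of alpha_t * h_t(x) )

So a correctly classified sample (y_i * h_t(x_i) = +1) has its weight multiplied by exp(-alpha_t) and shrinks, while a misclassified sample (y_i * h_t(x_i) = -1) has its weight multiplied by exp(alpha_t) and grows.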

 
