Machine Learning: ensemble methods - bagging, boosting, adaboost

Different classification algorithms have their own strengths and weaknesses, so multiple classifiers can be combined
into a group. This is called an ensemble method, or a meta-algorithm.


An ensemble can take many forms
 ○ different algorithms can be combined
 ○ the same algorithm can be combined under different settings
 ○ different portions of the data set can be assigned to different classifiers, which are then combined

bagging (Bootstrap aggregating)

  • Random sampling: from a data set of m samples, draw one sample at a time and put it back before the next draw (sampling with replacement), until m samples have been collected to form a new data set. Some samples from the original data set will appear more than once and some will not appear at all; the new data set has the same size as the original. Theoretically, when m is large enough, about 36.8% of the samples will not be drawn, since the probability of a sample never being picked is \((1-\frac{1}{m})^{m} \approx \frac{1}{e} \approx 0.368\)
  • Repeat this S times to obtain S new data sets
  • Apply the same learning algorithm to each data set to obtain S classifiers
  • To classify new data, apply all S classifiers and take the class receiving the most votes as the final result (a minimal sketch follows this list)
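A minimal sketch of this procedure, assuming NumPy arrays and classifiers with a scikit-learn style fit/predict interface; make_classifier, bagging_train and bagging_predict are names invented here for illustration, not part of the original post:

import numpy as np

def bagging_train(X, y, make_classifier, S):
    # train S classifiers, each on a bootstrap sample of the original data
    X, y = np.asarray(X), np.asarray(y)
    m = X.shape[0]
    classifiers = []
    for _ in range(S):
        idx = np.random.randint(0, m, m)   # draw m indices with replacement
        clf = make_classifier()
        clf.fit(X[idx], y[idx])            # assumes a fit/predict interface
        classifiers.append(clf)
    return classifiers

def bagging_predict(classifiers, X):
    # majority vote: for labels in {-1, 1} this is the sign of the summed votes
    votes = np.array([clf.predict(X) for clf in classifiers])
    return np.sign(votes.sum(axis=0))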


Random Forests (random forest, RF)

Similar to ordinary bagging, with small improvements on top of it
 ○ CART decision trees are used as the S weak learners
 ○ assuming each sample has many features, each CART tree is built from a random selection of k of those features (see the sketch below)
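A rough sketch of that extra randomization; cart_tree_fit is a hypothetical CART training function, not defined in this post, and in practice one would usually reach for a library implementation such as sklearn.ensemble.RandomForestClassifier:

import numpy as np

def random_forest_train(X, y, S, k, cart_tree_fit):
    # each tree is trained on a bootstrap sample and a random subset of k features
    X, y = np.asarray(X), np.asarray(y)
    m, num_features = X.shape
    trees = []
    for _ in range(S):
        idx = np.random.randint(0, m, m)                           # bootstrap sample
        feats = np.random.choice(num_features, k, replace=False)   # k random features
        tree = cart_tree_fit(X[idx][:, feats], y[idx])
        trees.append((tree, feats))                                # remember which features were used
    return trees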

boosting (boosting algorithm)

  • Similar to bagging, except that in bagging every classifier is trained independently, while in boosting each new classifier is trained according to the performance of the classifiers trained before it
  • boosting makes each new classifier focus on the data that the previous classifiers misclassified
  • The boosting result is a weighted sum of all classifiers' outputs, whereas in bagging all classifiers have equal weight (see the sketch after this list)
  • boosting formula
      
      \(\large f(x)= \sum_{i=1}^{n}(a_{i}f_{i}(x))\)
      
  • boosting is a serial process and parallelizes poorly, which is one of its disadvantages
  • There are a variety of boosting algorithms, such as adaboost, GBDT, and xgboost
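A minimal sketch of the weighted combination above, assuming each weak classifier is a callable that returns labels in {-1, 1}; the names here are illustrative only:

import numpy as np

def boosted_predict(weak_classifiers, alphas, X):
    # f(x) = sum_i alpha_i * f_i(x); taking sign(f(x)) gives the final {-1, 1} label
    agg = np.zeros(np.asarray(X).shape[0])
    for f_i, alpha_i in zip(weak_classifiers, alphas):
        agg += alpha_i * f_i(X)
    return np.sign(agg)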


adaboost (Adaptive Boosting)

adaboost can be applied to any classifier, as long as the classifier can handle weighted data or the data can be transformed so that sample weights are taken into account

It works as follows

  • Each sample is given a weight, and the weights form a vector D. At the start, every entry of D is initialized to the same value: \(\large \frac{1}{\text{number of samples}}\)
  • First a weak classifier \(\large f(x)\) is trained on the training data, and its error rate is calculated from the sample weights D

      \(\large e = \sum D_{i}\)     where \(\large i\) runs over the misclassified samples

  • Each classifier also gets a weight, calculated from that weak classifier's error rate with the formula

      \(\large \alpha = \frac{1}{2} \ln(\frac{1-e}{e})\)

  • Here \(\large e < 0.5\) is required to ensure \(\large \alpha > 0\), i.e. the classifier must not misclassify more than half of the samples; otherwise it should be discarded
  • Then another weak classifier is trained on the same data set, but first each sample's weight is adjusted: the weights of samples that were classified correctly in the previous round are reduced, and the weights of misclassified samples are increased. The new weights are computed from the classifier weight \(\small \alpha\)

      for correctly classified samples
        \(\large D_{i-new} = \frac{D_{i-old}\, e^{-\alpha}}{\sum_{j=1}^{n} D_{j-old}\, e^{-\alpha}}\)

      for misclassified samples
        \(\large D_{i-new} = \frac{D_{i-old}\, e^{\alpha}}{\sum_{j=1}^{n} D_{j-old}\, e^{\alpha}}\)

      since the classification results are labeled 1 and -1, the two cases can be written uniformly as
        \(\large D_{i-new} = \frac{D_{i-old}\, e^{-\alpha y_{i} f(x_{i})}}{\sum_{j=1}^{n} D_{j-old}\, e^{-\alpha y_{j} f(x_{j})}}\)

  • The classification results of all classifiers are accumulated, weighted by \(\small \alpha\), to produce the final result

      \(\large y = sign(\sum_{i=1}^{n} \alpha_{i} f_{i}(x))\)

  • Iterate until the overall error rate reaches zero, or until the number of weak classifiers reaches a user-specified limit (a small numeric example follows)

      \(\large \text{overall error rate} = \frac{\text{number of misclassified samples in the accumulated result}}{\text{total number of samples}}\)
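As a quick worked example (the numbers are chosen only for illustration): if a weak classifier has error rate \(e = 0.3\), its weight is \(\alpha = \frac{1}{2}\ln(\frac{0.7}{0.3}) \approx 0.42\); before the next round, each correctly classified sample's weight is multiplied by \(e^{-0.42} \approx 0.65\) and each misclassified sample's weight by \(e^{0.42} \approx 1.53\), after which all weights are renormalized so they sum to 1.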


adaboost Code

# coding=utf-8
import numpy as np


def stumpClassify(dataMatrix, dimen, threshVal, threshIneq):
    """
    Decision stump (single-level decision tree)
        It classifies based on a single feature; since there is only one split, it is literally a "stump".
        A single stump is a weak classifier; adaboost combines many stumps into a strong classifier.

    dataMatrix - data set to classify, an (n, m) matrix
    dimen -      index of the feature used for classification
    threshVal -  classification threshold
    threshIneq - operator ('lt' or 'gt') deciding whether values <= the threshold ('lt') or values > the threshold ('gt') get class -1
    """

    # initialize the result array, defaulting every sample to class 1
    retArray = np.ones((np.shape(dataMatrix)[0], 1))

    if threshIneq == 'lt':
        # set retArray[x] to -1 where dataMatrix[x, dimen] <= threshVal
        retArray[dataMatrix[:, dimen] <= threshVal] = -1.0
    else:
        retArray[dataMatrix[:, dimen] > threshVal] = -1.0

    return retArray


def buildStump(dataArr, classLabels, D):
    """
    Find the best decision stump for the given sample weights, i.e. the best classification feature, threshold and operator

    dataArr -     sample data
    classLabels - label data
    D -           sample weights
    """

    # convert to matrices and get their dimensions
    dataMatrix = np.mat(dataArr)
    labelMat = np.mat(classLabels).T
    n, m = np.shape(dataMatrix)

    # number of threshold steps
    # the range between the feature's maximum and minimum is split into 10 intervals, giving 11 candidate thresholds to choose the best one from
    numSteps = 10.0

    # stores the best stump's configuration (feature, threshold, operator)
    bestStump = {}

    # stores the best stump's classification result
    bestClasEst = np.mat(np.zeros((n, 1)))

    # stores the best stump's error rate
    minError = np.inf

    # iterate over every feature
    for i in range(m):
        # get this feature's min and max values and the step size
        rangeMin = dataMatrix[:, i].min()
        rangeMax = dataMatrix[:, i].max()
        stepSize = (rangeMax - rangeMin)/numSteps

        # iterate over all candidate thresholds
        for j in range(0, int(numSteps) + 1):

            # iterate over both operators
            for inequal in ['lt', 'gt']:
                # compute the threshold
                threshVal = (rangeMin + float(j) * stepSize)

                # classify all data with the stump (feature, threshold, operator)
                predictedVals = stumpClassify(dataMatrix, i, threshVal, inequal)

                # evaluate the result: misclassified samples get 1, correctly classified samples get 0
                errArr = np.mat(np.ones((n, 1)))
                errArr[predictedVals == labelMat] = 0

                # compute the weighted error rate; D is initialized to 1/(number of samples)
                weightedError = D.T*errArr
                if weightedError < minError:
                    # a better stump was found, save it
                    minError = weightedError
                    bestClasEst = predictedVals.copy()
                    bestStump['dim'] = i
                    bestStump['thresh'] = threshVal
                    bestStump['ineq'] = inequal

    # return the best stump's configuration (feature, threshold, operator), its error rate, and its classification result
    return bestStump, minError, bestClasEst


def adaBoostTrainDS(dataArr, classLabels, numIt = 40):
    """
    adaboost training based on decision stumps

    dataArr -     sample data
    classLabels - sample labels
    numIt -       maximum number of iterations
    """

    # list of decision stumps
    # each iteration produces one stump, until the maximum number of iterations is reached or the overall error rate is 0
    weakClassArr = []

    # number of samples
    n = np.shape(dataArr)[0]

    # initialize the sample weights D, giving each sample weight 1/(number of samples)
    D = np.mat(np.ones((n, 1))/n)

    # accumulated classification result
    aggClassEst = np.mat(np.zeros((n, 1)))

    for i in range(numIt):
        # find the best stump for the current samples and weights
        # returns the stump configuration (feature, threshold, operator), its error rate and its classification result
        bestStump, error, classEst = buildStump(dataArr, classLabels, D)

        # compute the stump weight alpha = 0.5 * ln((1-err)/err)
        # 1e-16 guards against err being 0; ln(1/1e-16) is about 36.8
        # the err > 0.5 case (which would make alpha < 0) is not handled here, since it should not normally occur
        alpha = float(0.5 * np.log((1.0 - error)/max(error, 1e-16)))

        # store the stump weight in the stump configuration
        bestStump['alpha'] = alpha

        # add the stump configuration to the stump list
        weakClassArr.append(bestStump)

        # compute the new sample weights
        # D(i_new) = (D(i_old) * exp(-alpha * yi * f(xi))) / SUM_j_1_n (D(j_old) * exp(-alpha * yj * f(xj)))
        expon = np.multiply(-1 * alpha * np.mat(classLabels).T, classEst)
        D = np.multiply(D, np.exp(expon))
        D = D/D.sum()

        # add this stump's classification result into the accumulated result, weighted by alpha
        aggClassEst += alpha*classEst

        # compute the error rate of the accumulated result; stop iterating if it is 0
        aggErrors = np.multiply(np.sign(aggClassEst) != np.mat(classLabels).T, np.ones((n, 1)))
        errorRate = aggErrors.sum()/n
        if errorRate == 0.0:
            break

    # return the stump configuration list and the accumulated classification result
    return weakClassArr, aggClassEst


def adaClassify(datToClass, classifierArr):
    """
    Classify data using the list of decision stumps

    datToClass -    data to classify
    classifierArr - list of stump configurations
    """

    dataMatrix = np.mat(datToClass)
    n = np.shape(dataMatrix)[0]
    aggClassEst = np.mat(np.zeros((n, 1)))

    # iterate over the stumps
    for i in range(len(classifierArr)):
        # classify with this stump
        classEst = stumpClassify(dataMatrix,
                                 classifierArr[i]['dim'],
                                 classifierArr[i]['thresh'],
                                 classifierArr[i]['ineq'])

        # accumulate the classification result, weighted by alpha
        aggClassEst += classifierArr[i]['alpha']*classEst

    # sign function: returns 1 for values > 0, -1 for values < 0, and 0 for 0
    return np.sign(aggClassEst)
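
A quick usage sketch of the functions above; the toy data set is invented here purely for illustration, and labels must be 1/-1 as assumed by the training code:

if __name__ == '__main__':
    # toy data set, made up for illustration only
    datArr = np.array([[1.0, 2.1],
                       [2.0, 1.1],
                       [1.3, 1.0],
                       [1.0, 1.0],
                       [2.0, 1.0]])
    labelArr = [1.0, 1.0, -1.0, -1.0, 1.0]

    # train adaboost with decision stumps
    classifierArr, aggClassEst = adaBoostTrainDS(datArr, labelArr, numIt=30)

    # classify new points with the trained ensemble
    print(adaClassify(np.array([[0.0, 0.0], [5.0, 5.0]]), classifierArr))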
    
    




Origin www.cnblogs.com/moonlight-lin/p/12384906.html