2019.6.9 SVM: the SMO algorithm

SMO: an efficient optimization algorithm
There are many implementations of SVM; the most popular is the Sequential Minimal Optimization (SMO) algorithm.
Later, a technique called a kernel (Kernel) is used to extend SVM to a wider range of data sets.
Note: SVM has an intuitive geometric meaning, but the algorithm itself is complex and involves a large amount of mathematical derivation.
Sequential Minimal Optimization (SMO)

Author: John Platt
Created: 1996
Purpose of SMO: to train SVMs
Goal of SMO: find a set of alphas and b; once the alphas are obtained, it is easy to compute the weight vector w and obtain the separating hyperplane (see the sketch after this list).
Idea of SMO: break one large optimization problem into many small optimization problems and solve them one at a time.
Principle of SMO: in every loop, pick two alphas and optimize them jointly; once a suitable pair of alphas is found, increase one and decrease the other.
"Suitable" here means the pair must satisfy certain conditions:
the two alphas must lie outside the margin boundary,
and the two alphas must not yet have been clipped to, or lie on, the boundary.
The reason two alphas are changed at the same time is the constraint \( \sum_{i=1}^{m} \alpha_i \cdot label_i = 0 \); modifying only a single alpha would most likely violate this constraint.
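As mentioned above, once the alphas and b are known, the weights follow from w = Σ(1~m) alpha[i]*label[i]*x[i] (the same formula that appears in the comments of the training code below). A minimal sketch of that computation, in the same NumPy style as the rest of the listings; the helper name calcWs matches the commented-out call inside smoSimple further down:

from numpy import mat, shape, zeros, multiply

def calcWs(alphas, dataArr, classLabels):
    """Recover the weight vector w from the alphas: w = sum_i alpha_i * label_i * x_i."""
    X = mat(dataArr)
    labelMat = mat(classLabels).transpose()
    m, n = shape(X)
    w = zeros((n, 1))
    for i in range(m):
        # only support vectors (alpha > 0) actually contribute to w
        w += multiply(alphas[i] * labelMat[i], X[i, :].T)
    return w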
The SMO pseudo-code is as follows:
Create an alpha vector and initialize it to all zeros
While the number of iterations is less than the maximum number of iterations (outer loop):
    For each data vector in the data set (inner loop):
        If the data vector can be optimized:
            Randomly select another data vector
            Optimize the two vectors together
            If neither vector can be optimized, exit the inner loop
    If no vectors were optimized, increment the iteration count and continue with the next cycle
Characteristics of SVM
Advantages: low generalization error (generalization: extending from the specific and individual to the general, i.e. how the trained model performs on new samples), low computational cost, and results that are easy to interpret.
Disadvantages: sensitive to parameter tuning and to the choice of kernel function; without modification, the original classifier only handles binary classification.
Works with: numeric and nominal values.
Text file format:

3.542485 1.977398 -1
3.018896 2.556416 -1
7.551510 -1.580030 1
2.114999 -0.004466 -1
8.127113 1.274372 1
Preparing the data

def loadDataSet(fileName):
    """
    Parse the file line by line to get the full feature matrix and the row of class labels.

    Args:
        fileName: name of the file
    Returns:
        dataMat:  feature matrix
        labelMat: class labels
    """
    dataMat = []
    labelMat = []
    fr = open(fileName)
    for line in fr.readlines():
        lineArr = line.strip().split('\t')
        dataMat.append([float(lineArr[0]), float(lineArr[1])])
        labelMat.append(float(lineArr[2]))
    return dataMat, labelMat
Analysis: none.
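A quick check of the loader (a sketch; the file name 'testSet.txt' is only an example, it should be whatever file holds rows in the format shown above):

dataArr, labelArr = loadDataSet('testSet.txt')
print(dataArr[:2])   # e.g. [[3.542485, 1.977398], [3.018896, 2.556416]]
print(labelArr[:2])  # e.g. [-1.0, -1.0]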

Training algorithm

from numpy import *  # provides mat, shape, zeros, multiply used below


def smoSimple(dataMatIn, classLabels, C, toler, maxIter):
    """smoSimple

    Args:
        dataMatIn    the feature set
        classLabels  the class labels
        C   slack constant; allows some data points to sit on the wrong side of the
            separating surface. It weights the two goals of maximizing the margin and
            of keeping the functional margin of most points at least 1.0.
            Different results can be obtained by tuning this parameter.
        toler   error tolerance (how far a point may violate the KKT conditions before it is optimized)
        maxIter maximum number of passes before exiting
    Returns:
        b       the constant term (intercept) of the model
        alphas  the Lagrange multipliers
    """
    dataMatrix = mat(dataMatIn)
    # matrix transpose; same effect as .T
    labelMat = mat(classLabels).transpose()
    m, n = shape(dataMatrix)

    # initialize b and the alphas (an alpha is somewhat like a weight value)
    b = 0
    alphas = mat(zeros((m, 1)))

    # number of passes over the data without any alpha changing
    iter = 0
    while (iter < maxIter):
        # w = calcWs(alphas, dataMatIn, classLabels)
        # print("w:", w)

        # records whether any alpha has been optimized; reset to 0 on every pass,
        # then the whole set is traversed in order
        alphaPairsChanged = 0
        for i in range(m):
            # print('alphas=', alphas)
            # print('labelMat=', labelMat)
            # print('multiply(alphas, labelMat)=', multiply(alphas, labelMat))
            # predicted class: y[i] = w^T x[i] + b, where w = Σ(1~n) a[n]*label[n]*x[n]
            fXi = float(multiply(alphas, labelMat).T*(dataMatrix*dataMatrix[i, :].T)) + b
            # compare the prediction with the true label and compute the error Ei
            Ei = fXi - float(labelMat[i])

            # constraints (the KKT conditions are a tool for optimization problems; the problem
            # here is finding the global minimum of a given function over a specified domain)
            # 0 <= alphas[i] <= C, but since 0 and C are boundary values they cannot be optimized,
            # because we need room to increase one alpha and decrease another.
            # labelMat[i]*Ei measures the error: optimization is only needed if it exceeds toler.
            # As for the sign, just think in terms of the absolute value.
            '''
            # check whether the training sample (xi, yi) satisfies the KKT conditions
            yi*f(i) >= 1 and alpha = 0 (outside the boundary)
            yi*f(i) == 1 and 0<alpha< C (on the boundary)
            yi*f(i) <= 1 and alpha = C (inside the margin boundaries)
            '''
            if ((labelMat[i]*Ei < -toler) and (alphas[i] < C)) or ((labelMat[i]*Ei > toler) and (alphas[i] > 0)):

                # if the point qualifies for optimization, randomly pick another point j != i to pair with it
                j = selectJrand(i, m)
                # prediction for j
                fXj = float(multiply(alphas, labelMat).T*(dataMatrix*dataMatrix[j, :].T)) + b
                Ej = fXj - float(labelMat[j])
                alphaIold = alphas[i].copy()
                alphaJold = alphas[j].copy()

                # L and H are used to clip alphas[j] into the range 0..C. If L == H, make no change and continue.
                # labelMat[i] != labelMat[j] means opposite sides, so subtract; otherwise same side, so add.
                if (labelMat[i] != labelMat[j]):
                    L = max(0, alphas[j] - alphas[i])
                    H = min(C, C + alphas[j] - alphas[i])
                else:
                    L = max(0, alphas[j] + alphas[i] - C)
                    H = min(C, alphas[j] + alphas[i])
                # if they are equal there is nothing to optimize
                if L == H:
                    print("L==H")
                    continue

                # eta is the optimal amount by which to change alphas[j];
                # if eta >= 0, skip the current iteration of the for loop.
                # See Li Hang, "Statistical Learning Methods", pp. 125-128, "Sequential Minimal Optimization".
                eta = 2.0 * dataMatrix[i, :]*dataMatrix[j, :].T - dataMatrix[i, :]*dataMatrix[i, :].T - dataMatrix[j, :]*dataMatrix[j, :].T
                if eta >= 0:
                    print("eta>=0")
                    continue

                # compute a new value for alphas[j]
                alphas[j] -= labelMat[j]*(Ei - Ej)/eta
                # and clip it with the helper function, using L and H
                alphas[j] = clipAlpha(alphas[j], H, L)
                # if alphas[j] has barely moved, skip the rest of this iteration
                if (abs(alphas[j] - alphaJold) < 0.00001):
                    print("j not moving enough")
                    continue
                # update alphas[i] by the same amount as alphas[j], but in the opposite direction
                alphas[i] += labelMat[j]*labelMat[i]*(alphaJold - alphas[j])
                # after optimizing alpha[i] and alpha[j], set the constant b for this pair.
                # w = Σ[1~n] ai*yi*xi  =>  b = yj - Σ[1~n] ai*yi*(xi*xj)
                # so: b1 - b = (y1-y) - Σ[1~n] yi*(a1-a)*(xi*x1)
                # why two correction terms? because exactly two variables, i and j, changed inside the sum Σ[1~n]
                b1 = b - Ei- labelMat[i]*(alphas[i]-alphaIold)*dataMatrix[i, :]*dataMatrix[i, :].T - labelMat[j]*(alphas[j]-alphaJold)*dataMatrix[i, :]*dataMatrix[j, :].T
                b2 = b - Ej- labelMat[i]*(alphas[i]-alphaIold)*dataMatrix[i, :]*dataMatrix[j, :].T - labelMat[j]*(alphas[j]-alphaJold)*dataMatrix[j, :]*dataMatrix[j, :].T
                if (0 < alphas[i]) and (C > alphas[i]):
                    b = b1
                elif (0 < alphas[j]) and (C > alphas[j]):
                    b = b2
                else:
                    b = (b1 + b2)/2.0
                alphaPairsChanged += 1
                print("iter: %d i:%d, pairs changed %d" % (iter, i, alphaPairsChanged))
        # outside the for loop, check whether any alpha pair was updated; if so, reset iter to 0 and keep going.
        # the loop only exits after maxIter consecutive passes with no change.
        if (alphaPairsChanged == 0):
            iter += 1
        else:
            iter = 0
        print("iteration number: %d" % iter)
    return b, alphas
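
smoSimple also relies on two small helpers that are not listed in this post: selectJrand (randomly pick an index j different from i) and clipAlpha (clip a value into the range [L, H]). Below is a minimal sketch consistent with how they are called above, followed by an example run; the parameter values C=0.6, toler=0.001 and maxIter=40 are only illustrative:

import random

def selectJrand(i, m):
    """Pick a random index j in [0, m) that is different from i."""
    j = i
    while j == i:
        j = int(random.uniform(0, m))
    return j

def clipAlpha(aj, H, L):
    """Clip aj so that L <= aj <= H."""
    if aj > H:
        aj = H
    if aj < L:
        aj = L
    return aj

# example run (dataArr and labelArr loaded with loadDataSet as above)
b, alphas = smoSimple(dataArr, labelArr, 0.6, 0.001, 40)
print(b)
print(alphas[alphas > 0])  # the non-zero alphas correspond to the support vectors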


Origin blog.csdn.net/weixin_43732462/article/details/91355604