This article is from the Sync Blog.
The previous article introduced the basic principles of support vector machines in machine learning, and at the end introduced a Python method for solving the extremum of a quadratic programming problem. In this article, I will use that method to solve step by step for the $\vec{\alpha}$, $\vec{w}$, and $b$ mentioned there, as a way to review and verify the knowledge points of support vector machines.
Data
Let's look at a set of test data:
```python
data = {
    '+': [
        [1, 7],
        [2, 8],
        [3, 8],
        [2, 6.5]
    ],
    '-': [
        [5, 1],
        [6, -1],
        [7, 3]
    ]
}
```
The dictionary `data` holds two classes of already-labeled samples; each sample is a two-dimensional vector that can be plotted in a Cartesian coordinate system.
According to the principle described in the previous article, we need to use these data to solve for the vector $\vec{\alpha}$ that minimizes the quadratic programming objective:
$$
F(\vec{\alpha}) = \frac{1}{2}\vec{\alpha}^{T}H\vec{\alpha} + \vec{c}\cdot\vec{\alpha} + c_0, \quad \vec{y}^{T}\vec{\alpha} = 0, \quad \vec{\alpha} \ge 0
$$
Obviously, in SVM, $c_0 = 0$.
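For context, these parameters come straight from the standard SVM dual problem: maximizing the dual objective is equivalent to minimizing its negation, which fixes both $H$ and $\vec{c}$:

$$
\max_{\vec{\alpha}} \sum_{i}\alpha_i - \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j\,\vec{x}_i\cdot\vec{x}_j
\;\Longleftrightarrow\;
\min_{\vec{\alpha}} \frac{1}{2}\vec{\alpha}^{T}H\vec{\alpha} + \vec{c}\cdot\vec{\alpha},
\quad H_{ij} = y_i y_j\,\vec{x}_i\cdot\vec{x}_j,\; c_i = -1
$$

This is why every component of $\vec{c}$ is set to $-1$ in the code below.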
Parametric Solver
First, use the input test data to prepare the variables $\vec{c}$ and $c_0$ appearing in the equation above, along with $X$ and $\vec{y}$ (from which $H$ is computed next). Refer to the following code:
```python
import numpy as np

def parseXYC(d):
    X = []
    y = []
    c = []
    for v in d['+']:
        X.append(np.array(v))
        y.append(1)
        c.append(-1)
    for v in d['-']:
        X.append(np.array(v))
        y.append(-1)
        c.append(-1)
    return X, y, c, 0

X, y, c, c0 = parseXYC(data)
```
The `parseXYC` function formats `data` into $X$, $\vec{y}$, $\vec{c}$, and $c_0$.
Then calculate the $H$ matrix. This is relatively simple; one line of code does it:
```python
H = np.array([y[i] * y[j] * np.dot(X[i], X[j])
              for i in range(len(X)) for j in range(len(X))]).reshape(len(X), len(X))
```
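As a quick sanity check, $H$ built by this comprehension is always symmetric, since the dot product is symmetric and $y_i y_j = y_j y_i$. Here is a sketch on a three-sample subset of the test data above (`X_s`, `y_s`, `H_s` are local names so they do not clobber the article's variables):

```python
import numpy as np

# Three samples taken from the test data: two '+' points and one '-' point.
X_s = [np.array([1, 7]), np.array([2, 8]), np.array([5, 1])]
y_s = [1, 1, -1]

# Same construction as in the article, on the subset.
H_s = np.array([y_s[i] * y_s[j] * np.dot(X_s[i], X_s[j])
                for i in range(len(X_s)) for j in range(len(X_s))]).reshape(len(X_s), len(X_s))

print(np.allclose(H_s, H_s.T))  # True: H is symmetric
```

For instance, `H_s[0][2]` is $1 \cdot (-1) \cdot (1 \cdot 5 + 7 \cdot 1) = -12$, and `H_s[2][0]` is the same by symmetry.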
Solve for $\vec{\alpha}$
All the data is ready; the next step is to feed it into the `optimize.minimize` function to compute the result.
Here are a few difficulties, beyond the scope of this article, that are worth mentioning briefly:

- The `SLSQP` method used by `optimize.minimize` to solve the quadratic program requires the Jacobian of both the objective function and the equality constraint function. Not knowing this left me unable to solve for the correct values during testing.
- I could not get the inequality constraint $\vec{\alpha} \ge 0$ to work when passed to `optimize.minimize` through the `constraints` parameter. I suspect I constructed the inequality specification incorrectly; I have not solved this problem yet, and I hope readers who understand it can leave a message to enlighten me. As a workaround, I describe $\vec{\alpha} \ge 0$ with the `bounds` parameter instead.
- In the solved $\vec{\alpha}$ vector, some elements that should be 0 are not exactly 0. The precision of the test results appears to be about 1e-16, so I treat any value below that precision as 0. Plotting the results confirmed that this assumption is reasonable.
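On the second point: in my understanding, SLSQP accepts a vector-valued `'ineq'` constraint whose every component must be non-negative, so `{'fun': lambda x: x}` expresses $\vec{x} \ge 0$ directly. The sketch below tries this on a small made-up QP (`H_toy`, `c_toy`, `y_toy` are hypothetical, not the article's data), with the analytic minimum at $(0.5, 0.5)$; I have not verified this against the article's full SVM setup:

```python
import numpy as np
from scipy import optimize

# Toy QP: minimize 0.5*x.T H x + c.x  subject to  y.x = 0  and  x >= 0.
H_toy = np.array([[2.0, 0.0], [0.0, 2.0]])
c_toy = np.array([-1.0, -1.0])
y_toy = np.array([1.0, -1.0])

cons = [
    # Scalar equality constraint y.x = 0 with its Jacobian.
    {'type': 'eq', 'fun': lambda x: np.dot(y_toy, x), 'jac': lambda x: y_toy},
    # Vector-valued inequality: each component of the returned vector
    # must be >= 0, i.e. x >= 0; the Jacobian is the identity matrix.
    {'type': 'ineq', 'fun': lambda x: x, 'jac': lambda x: np.eye(len(x))},
]

res = optimize.minimize(lambda x: 0.5 * x @ H_toy @ x + c_toy @ x,
                        np.zeros(2), method='SLSQP',
                        jac=lambda x: x @ H_toy + c_toy, constraints=cons)
print(res.x)  # approximately [0.5, 0.5]
```

With the constraint $x_1 = x_2 = t \ge 0$ the objective reduces to $2t^2 - 2t$, minimized at $t = 0.5$, which is what the solver should return.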
The following code implements this:
```python
from scipy import optimize

# Define the quadratic objective fun and its Jacobian jac
def fun(x, sign=1.):
    return sign * (0.5 * np.dot(x.T, np.dot(H, x)) + np.dot(c, x) + c0)

def jac(x, sign=1.):
    return sign * (np.dot(x.T, H) + c)

# Define the equality constraint feq and its Jacobian jeq
def feq(x):
    return np.dot(y, x)

def jeq(x):
    return np.array(y)

# Build the solver parameters
diff = 1e-16
bounds = [(0, None) for _ in range(len(y))]  # x >= 0
constraints = [{'type': 'eq', 'fun': feq, 'jac': jeq}]  # y . x = 0
options = {'ftol': diff, 'disp': True}
guess = np.array([0 for _ in range(len(X))])

# Compute the result
res_cons = optimize.minimize(fun, guess, method='SLSQP', jac=jac,
                             bounds=bounds, constraints=constraints,
                             options=options)
alpha = [0 if abs(x - 0) <= diff else x for x in res_cons.x]

# Print the results and check whether y . alpha is 0
print('raw alpha: ', res_cons.x)
print('fmt alpha: ', alpha)
print('check y*alpha: ', 'is 0' if abs(np.dot(y, res_cons.x) - 0) < diff else 'is not 0')
```
Solve for $\vec{w}$ and $b$
```python
# Compute w = sum(alpha_i * y_i * X_i)
w = np.sum([np.array([0, 0]) if alpha[i] == 0 else (alpha[i] * y[i] * X[i])
            for i in range(len(alpha))], axis=0)
print('w: ', w)

# Compute b: for a support vector, yi*(w . xi + b) = 1, so b = 1/yi - w . xi
B = [0 if alpha[i] == 0 else (1 / y[i] - np.dot(w, X[i])) for i in range(len(alpha))]
B = list(filter(lambda x: x != 0, B))
b = 0 if len(B) <= 0 else B[0]
print('b: ', b)
```
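Once $\vec{w}$ and $b$ are known, a new point is classified by the sign of $\vec{w}\cdot\vec{x} + b$. The sketch below shows this decision rule; `w_demo` and `b_demo` are assumed values chosen only for illustration, not the parameters solved above:

```python
import numpy as np

# Decision rule: classify x by the sign of w . x + b.
def classify(w, b, x):
    return 1 if np.dot(w, x) + b >= 0 else -1

# Assumed parameters for illustration (a line separating upper-left
# from lower-right), not the solved values from the article.
w_demo = np.array([-0.25, 0.25])
b_demo = 0.0

print(classify(w_demo, b_demo, np.array([1, 7])))  # a '+' sample -> 1
print(classify(w_demo, b_demo, np.array([5, 1])))  # a '-' sample -> -1
```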
At this point, the parameter solving process of the support vector machine is completed.
The running result is shown in the following figure:
Drawing
Finally, plot the data as an image.
```python
import matplotlib.pyplot as plt

limit = 11
plt.xlim(-2, limit)
plt.ylim(-2, limit)

# Plot the data points
[plt.scatter(X[i][0], X[i][1], s=100, color=('r' if y[i] > 0 else 'y'))
 for i in range(len(X))]

# Plot the separating hyperplane L: w . x + b = 0
plt.plot([i for i in range(limit)], [(-b - w[0] * i) / w[1] for i in range(limit)])

# Plot the upper and lower margins: w . x + b = 1 / -1
plt.plot([i for i in range(limit)], [(1 - b - w[0] * i) / w[1] for i in range(limit)])
plt.plot([i for i in range(limit)], [(-1 - b - w[0] * i) / w[1] for i in range(limit)])

plt.show()
```
The effect is shown below. The red dots are '+' samples and the yellow dots are '-' samples. The blue line in the middle is the separating line for classification, and the two boundary lines on either side each pass through the points of their own class that are closest to the separating line. These points are the support vectors, and only the $\vec{\alpha}$ components corresponding to them are non-zero.
The source code of this article