Implementing Support Vector Machines from Scratch
This article grew out of a tedious course project for a machine learning class. Since so much time went into it anyway, it seemed worth spending a little more to write up the implementation process. The mathematical form of the support vector machine is simple and intuitive, but the moment you get to a concrete implementation, problems arrive one after another. In this article, I first derive the main formulas of the SVM, then implement an SVM supporting multiple kernel functions from scratch based on the Platt-SMO algorithm, add multi-class classification using the One-Versus-One strategy, and finally run performance tests on the MNIST and CIFAR-10 datasets.
Mathematical derivation
Basic form
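Given a training set $\{(x_i, y_i)\}_{i=1}^{m}$ with labels $y_i \in \{-1, +1\}$, the soft-margin SVM seeks the hyperplane $w^\top x + b = 0$ that separates the two classes with the largest margin, tolerating violations through slack variables $\xi_i$:

$$\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C\sum_{i=1}^{m}\xi_i \qquad \text{s.t.}\ \ y_i(w^\top x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, \dots, m$$

where the penalty coefficient $C$ trades margin width against training error.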
This is a quadratic programming problem, and we can use methods such as gradient descent or coordinate descent to solve it.
Kernel functions and kernel tricks
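By introducing Lagrange multipliers $\alpha_i$ for the margin constraints and eliminating $w$, $b$, and $\xi$, we obtain the dual problem, in which the samples appear only through inner products $x_i^\top x_j$. Replacing the inner product with a kernel function $K$ is the kernel trick: it lets the SVM operate in an implicit high-dimensional feature space without ever computing it explicitly. The dual with a kernel reads

$$\max_{\alpha}\ \sum_{i=1}^{m}\alpha_i - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j K(x_i, x_j) \qquad \text{s.t.}\ \ 0 \le \alpha_i \le C,\ \ \sum_{i=1}^{m}\alpha_i y_i = 0$$

and the resulting decision function is $f(x) = \operatorname{sign}\bigl(\sum_{i=1}^{m}\alpha_i y_i K(x_i, x) + b\bigr)$. The four kernels used in this article are

$$\begin{aligned} \text{linear:}\quad & K(x, y) = x^\top y \\ \text{polynomial:}\quad & K(x, y) = (\gamma\, x^\top y + 1)^d \\ \text{Gaussian:}\quad & K(x, y) = \exp\bigl(-\gamma \lVert x - y \rVert^2\bigr) \\ \text{sigmoid:}\quad & K(x, y) = \tanh(\gamma\, x^\top y + c) \end{aligned}$$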
Platt-SMO Algorithm
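SMO optimizes the dual by coordinate ascent over pairs of multipliers: it selects a pair $(\alpha_i, \alpha_j)$, maximizes the objective over that pair analytically while all other multipliers stay fixed, and clips the result to the feasible region. Writing $E_k = f(x_k) - y_k$ for the prediction error and $\eta = K_{ii} + K_{jj} - 2K_{ij}$, the update is

$$\alpha_j \leftarrow \operatorname{clip}\Bigl(\alpha_j + \frac{y_j(E_i - E_j)}{\eta},\ L,\ H\Bigr), \qquad \alpha_i \leftarrow \alpha_i + y_i y_j\,(\alpha_j^{\text{old}} - \alpha_j^{\text{new}})$$

where $[L, H]$ is the interval permitted by the box constraint $0 \le \alpha \le C$ together with the equality constraint $\sum_i \alpha_i y_i = 0$; the bias $b$ is then recomputed from whichever of the two multipliers lies strictly inside $(0, C)$. Platt's heuristic alternates full passes over all samples with passes over only the non-bound samples, which is exactly the outer loop implemented below.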
Algorithm implementation
Kernel functions
We implement the four kernel functions above. Each kernel is encapsulated in a class: by implementing the __call__ method, its instances can be called like functions.
The linear kernel code is as follows
import numpy as np

class LinearKernel(object):
    def __init__(self):
        self.name = 'linear'

    def __call__(self, X, y):
        # Inner products between each row of X and the single sample y.
        return X @ y.T
The polynomial kernel code is as follows
class PolynomialKernel(object):
    def __init__(self, gamma=1.0, degree=3):
        self.name = 'polynomial'
        self.gamma = gamma
        self.degree = degree

    def __call__(self, X, y):
        return np.power(self.gamma * (X @ y.T) + 1, self.degree)
The Gaussian kernel code is as follows
class GaussianKernel(object):
    def __init__(self, gamma=1.0):
        self.name = 'gaussian'
        self.gamma = gamma

    def __call__(self, X, y):
        # y broadcasts across the rows of X, giving one kernel value per row.
        return np.exp(-self.gamma * np.sum(np.square(X - y), axis=1))
The Sigmoid kernel code is as follows
class SigmoidKernel(object):
    def __init__(self, gamma=1.0, bias=0.0):
        self.name = 'sigmoid'
        self.gamma = gamma
        self.bias = bias

    def __call__(self, X, y):
        return np.tanh(self.gamma * (X @ y.T) + self.bias)
In addition, we define a tool function to facilitate the creation of kernel functions
def CreateKernel(entry):
    # Build a kernel object from a configuration dict such as {'name': 'gaussian', 'gamma': 0.5}.
    if entry['name'] == 'linear':
        return LinearKernel()
    elif entry['name'] == 'polynomial':
        return PolynomialKernel(entry['gamma'], entry['degree'])
    elif entry['name'] == 'gaussian':
        return GaussianKernel(entry['gamma'])
    elif entry['name'] == 'sigmoid':
        return SigmoidKernel(entry['gamma'], entry['bias'])
    raise AttributeError('invalid kernel')
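For example, a Gaussian kernel can be created from a configuration dict and called like a function (a quick illustration, not from the original code):

kernel = CreateKernel({'name': 'gaussian', 'gamma': 0.5})
X = np.random.randn(5, 3)
print(kernel(X, X[0]))  # five kernel values, one per row of X; the first is 1.0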
Support Vector Machines
Following scikit-learn's encapsulation, we define a class that provides two methods, fit and predict; its parameters are the maximum number of iterations, the penalty coefficient, the error tolerance, and the kernel configuration. Private methods implement the selection of the second multiplier and the single-step update of a multiplier pair. For the linear kernel, we additionally provide a weight property that returns the parameters of the separating hyperplane. Apart from a few simplifications, the code follows the Platt-SMO algorithm.
class SupportVectorMachine(object):
    def __init__(self, iteration=100, penalty=1.0, epsilon=1e-6, kernel=None):
        self.iteration = iteration
        self.penalty = penalty
        self.epsilon = epsilon
        if kernel is None:
            kernel = {'name': 'linear'}
        self.kernel = CreateKernel(kernel)

    def __compute_w(self):
        # Hyperplane normal w = sum_i a_i y_i x_i (linear kernel only).
        return (self.a * self.y) @ self.X

    def __compute_e(self, i):
        # Prediction error E_i = f(x_i) - y_i.
        return (self.a * self.y) @ self.K[:, i] + self.b - self.y[i]

    def __select_j(self, i):
        # Pick a uniformly random j != i.
        j = np.random.randint(1, self.m)
        return j if j > i else j - 1

    def __step_forward(self, i):
        e_i = self.__compute_e(i)
        # Proceed only if a_i violates the KKT conditions.
        if ((self.a[i] > 0) and (e_i * self.y[i] > self.epsilon)) or \
           ((self.a[i] < self.penalty) and (e_i * self.y[i] < -self.epsilon)):
            j = self.__select_j(i)
            e_j = self.__compute_e(j)
            a_i, a_j = np.copy(self.a[i]), np.copy(self.a[j])
            # Clipping bounds keep (a_i, a_j) inside the box [0, C]^2
            # while preserving the equality constraint.
            if self.y[i] == self.y[j]:
                L = max(0, a_i + a_j - self.penalty)
                H = min(self.penalty, a_i + a_j)
            else:
                L = max(0, a_j - a_i)
                H = min(self.penalty, self.penalty + a_j - a_i)
            if L == H:
                return False
            # d = -eta = 2 K_ij - K_ii - K_jj must be negative.
            d = 2 * self.K[i, j] - self.K[i, i] - self.K[j, j]
            if d >= 0:
                return False
            self.a[j] = np.clip(a_j - self.y[j] * (e_i - e_j) / d, L, H)
            if np.abs(self.a[j] - a_j) < self.epsilon:
                return False
            # Update a_i so the equality constraint still holds.
            self.a[i] = a_i + self.y[i] * self.y[j] * (a_j - self.a[j])
            b_i = self.b - e_i - self.y[i] * self.K[i, i] * (self.a[i] - a_i) - self.y[j] * self.K[j, i] * (self.a[j] - a_j)
            b_j = self.b - e_j - self.y[i] * self.K[i, j] * (self.a[i] - a_i) - self.y[j] * self.K[j, j] * (self.a[j] - a_j)
            # Use the bias consistent with a non-bound multiplier if one exists.
            if 0 < self.a[i] < self.penalty:
                self.b = b_i
            elif 0 < self.a[j] < self.penalty:
                self.b = b_j
            else:
                self.b = (b_i + b_j) / 2
            return True
        return False

    def setup(self, X, y):
        self.X, self.y = X, y
        self.m, self.n = X.shape
        self.b = 0.0
        self.a = np.zeros(self.m)
        # Precompute the full kernel matrix.
        self.K = np.zeros((self.m, self.m))
        for i in range(self.m):
            self.K[:, i] = self.kernel(X, X[i, :])

    def fit(self, X, y):
        self.setup(X, y)
        entire = True
        for _ in range(self.iteration):
            change = 0
            if entire:
                # Full pass over all samples.
                for i in range(self.m):
                    change += self.__step_forward(i)
            else:
                # Pass over the non-bound samples (0 < a_i < C) only.
                index = np.nonzero((0 < self.a) * (self.a < self.penalty))[0]
                for i in index:
                    change += self.__step_forward(i)
            # Platt's heuristic: alternate full and non-bound passes.
            if entire:
                entire = False
            elif change == 0:
                entire = True

    def predict(self, X):
        m = X.shape[0]
        y = np.zeros(m)
        for i in range(m):
            y[i] = np.sign((self.a * self.y) @ self.kernel(self.X, X[i, :]) + self.b)
        return y

    @property
    def weight(self):
        if self.kernel.name != 'linear':
            raise AttributeError('non-linear kernel')
        return self.__compute_w(), self.b
Multi-class classification
Based on the One-Versus-One strategy, we construct $k(k-1)/2$ binary SVMs, where $k$ is the number of categories; for the 10-class datasets below, that is 45 classifiers. To train each classifier, we select the samples of its two categories as the training set and map their labels to -1 and 1. At prediction time, the outputs of all classifiers are voted to obtain the final result.
We use the same encapsulation as the binary support vector machine, providing fit and predict methods, which makes this class a general-purpose classification model.
from tqdm import tqdm

class SupportVectorClassifier(object):
    def __init__(self, iteration=100, penalty=1.0, epsilon=1e-6, kernel=None):
        self.iteration = iteration
        self.penalty = penalty
        self.epsilon = epsilon
        self.kernel = kernel
        self.classifier = []

    def __build_model(self, y):
        # One binary SVM per unordered pair of labels.
        self.label = np.unique(y)
        for i in range(len(self.label)):
            for j in range(i + 1, len(self.label)):
                model = SupportVectorMachine(self.iteration, self.penalty, self.epsilon, self.kernel)
                self.classifier.append((i, j, model))

    def fit(self, X, y):
        self.__build_model(y)
        for i, j, model in tqdm(self.classifier):
            # Train on the samples of the two categories, mapped to -1 and 1.
            index = np.where((y == self.label[i]) | (y == self.label[j]))[0]
            X_ij, y_ij = X[index], np.where(y[index] == self.label[i], -1, 1)
            model.fit(X_ij, y_ij)

    def predict(self, X):
        # Each binary classifier votes for one of its two labels.
        vote = np.zeros((X.shape[0], len(self.label)))
        for i, j, model in tqdm(self.classifier):
            y = model.predict(X)
            vote[np.where(y == -1)[0], i] += 1
            vote[np.where(y == 1)[0], j] += 1
        return self.label[np.argmax(vote, axis=1)]
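As a quick sanity check (a sketch, not from the original tests), three Gaussian blobs yield three pairwise classifiers:

X = np.concatenate([np.random.randn(50, 2) + c for c in ((-3, 0), (3, 0), (0, 3))])
y = np.repeat([0, 1, 2], 50)
model = SupportVectorClassifier(iteration=50)
model.fit(X, y)                        # trains 3 * 2 / 2 = 3 binary SVMs
print((model.predict(X) == y).mean())  # training accuracy, close to 1.0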
Performance Testing
First, we construct two clusters of normally distributed data on a two-dimensional plane to visualize the classification behavior of the support vector machine. We begin by constructing the data and training the model
X = np.concatenate((np.random.randn(500, 2) - 2, np.random.randn(500, 2) + 2))
y = np.concatenate((np.ones(500), -np.ones(500)))
C = SupportVectorMachine(iteration=100)
C.fit(X, y)
w, b = C.weight
# Points on the separating line w[0] * u + w[1] * v + b = 0.
u = np.linspace(-3, 3, 100)
v = (-b - w[0] * u) / w[1]
Then we plot the classification result from the model parameters
import matplotlib.pyplot as plt

plt.scatter(X[:500, 0], X[:500, 1], label='Positive')
plt.scatter(X[500:, 0], X[500:, 1], label='Negative')
plt.plot(u, v, label='Separation', c='g')
plt.xlabel('$x$')
plt.ylabel('$y$')
plt.title('Separation Sample')
plt.grid()
plt.legend()
plt.tight_layout()
plt.savefig('./figure/separation.png')
plt.show()
As the figure shows, our SVM separates the two clusters cleanly. Next we move to real datasets. For MNIST, we define a loader that reads the raw files and keeps 500 training and 100 test images per class
import gzip
import os

def MNIST(path, group='train'):
    if group == 'train':
        with gzip.open(os.path.join(path, 'train-images-idx3-ubyte.gz'), 'rb') as file:
            image = np.frombuffer(file.read(), np.uint8, offset=16).reshape(-1, 1, 28, 28) / 255.0
        with gzip.open(os.path.join(path, 'train-labels-idx1-ubyte.gz'), 'rb') as file:
            label = np.frombuffer(file.read(), np.uint8, offset=8)
    elif group == 'test':
        with gzip.open(os.path.join(path, 't10k-images-idx3-ubyte.gz'), 'rb') as file:
            image = np.frombuffer(file.read(), np.uint8, offset=16).reshape(-1, 1, 28, 28) / 255.0
        with gzip.open(os.path.join(path, 't10k-labels-idx1-ubyte.gz'), 'rb') as file:
            label = np.frombuffer(file.read(), np.uint8, offset=8)
    # Keep a fixed number of samples per class, then shuffle.
    remain = 500 if group == 'train' else 100
    image_list, label_list = [], []
    for value in range(10):
        index = np.where(label == value)[0][:remain]
        image_list.append(image[index])
        label_list.append(label[index])
    image, label = np.concatenate(image_list), np.concatenate(label_list)
    index = np.random.permutation(len(label))
    return image[index], label[index]
For the CIFAR10 dataset, we do the same
import pickle

def CIFAR10(path, group='train'):
    if group == 'train':
        image_list, label_list = [], []
        for i in range(1, 6):
            filename = os.path.join(path, 'data_batch_{}'.format(i))
            with open(filename, 'rb') as file:
                data = pickle.load(file, encoding='bytes')
            image_list.append(np.array(data[b'data'], dtype=np.float32).reshape(-1, 3, 32, 32) / 255.0)
            label_list.append(np.array(data[b'labels'], dtype=np.int32))
        image, label = np.concatenate(image_list), np.concatenate(label_list)
    elif group == 'test':
        filename = os.path.join(path, 'test_batch')
        with open(filename, 'rb') as file:
            data = pickle.load(file, encoding='bytes')
        image = np.array(data[b'data'], dtype=np.float32).reshape(-1, 3, 32, 32) / 255.0
        label = np.array(data[b'labels'], dtype=np.int32)
    # Keep a fixed number of samples per class, then shuffle.
    remain = 500 if group == 'train' else 100
    image_list, label_list = [], []
    for value in range(10):
        index = np.where(label == value)[0][:remain]
        image_list.append(image[index])
        label_list.append(label[index])
    image, label = np.concatenate(image_list), np.concatenate(label_list)
    index = np.random.permutation(len(label))
    return image[index], label[index]
Since CIFAR10 is much harder to classify from raw pixels, we use a classical computer-vision technique for feature extraction: the HOG descriptor, which noticeably improves the classification results. First, we convert the color images to grayscale
def RGB2Gray(image):
    # Standard luma weights for the R, G, B channels.
    image = 0.299 * image[0] + 0.587 * image[1] + 0.114 * image[2]
    return image.reshape(1, *image.shape)
Then we implement a simple HOG feature-extraction function. Block overlap is not implemented here; adding it should further improve the classification results
def HOG(image, block=4, partition=8):
    image = RGB2Gray(image).squeeze(axis=0)
    height, width = image.shape
    # Per-pixel gradient magnitude and orientation (folded into [0, 180]).
    gradient = np.zeros((2, height, width), dtype=np.float32)
    for i in range(1, height - 1):
        for j in range(1, width - 1):
            delta_x = image[i, j - 1] - image[i, j + 1]
            delta_y = image[i + 1, j] - image[i - 1, j]
            gradient[0, i, j] = np.sqrt(delta_x ** 2 + delta_y ** 2)
            gradient[1, i, j] = np.degrees(np.arctan2(delta_y, delta_x))
            if gradient[1, i, j] < 0:
                gradient[1, i, j] += 180
    # Note: since orientations lie in [0, 180], unit = 360 / partition fills
    # only the lower half of the bins; 180 / partition would use them all.
    unit = 360 / partition
    vertical, horizontal = height // block, width // block
    feature = np.zeros((vertical, horizontal, partition), dtype=np.float32)
    for i in range(vertical):
        for j in range(horizontal):
            # Accumulate a magnitude-weighted orientation histogram per cell.
            for k in range(block):
                for l in range(block):
                    rho = gradient[0, i * block + k, j * block + l]
                    theta = gradient[1, i * block + k, j * block + l]
                    index = int(theta // unit)
                    feature[i, j, index] += rho
            # L2-normalize each cell histogram.
            feature[i, j] /= np.linalg.norm(feature[i, j]) + 1e-6
    return feature.reshape(-1)
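The test scripts below also call two helpers that are not shown above, BatchHOG and ComputeAccuracy; minimal sketches consistent with how they are used would be

def BatchHOG(batch, block=4, partition=8):
    # Apply HOG to every image in the batch and stack the feature vectors.
    return np.array([HOG(image, block, partition) for image in batch])

def ComputeAccuracy(prediction, label):
    # Fraction of samples classified correctly.
    return np.mean(prediction == label)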
With these utility functions in place, we can complete the image-classification tasks neatly. For the MNIST dataset, an example of classification using the linear kernel is as follows
X_train, y_train = MNIST('./dataset/mnist_data/', group='train')
X_test, y_test = MNIST('./dataset/mnist_data/', group='test')
X_train, X_test = X_train.reshape(-1, 28*28), X_test.reshape(-1, 28*28)
model = SupportVectorClassifier(iteration=100, penalty=0.05)
model.fit(X_train, y_train)
p_train, p_test = model.predict(X_train), model.predict(X_test)
r_train, r_test = ComputeAccuracy(p_train, y_train), ComputeAccuracy(p_test, y_test)
print('Kernel: Linear, Train: {:.2%}, Test: {:.2%}'.format(r_train, r_test))
For the CIFAR10 dataset, an example of classification based on HOG features and the Gaussian kernel is as follows
X_train, y_train = CIFAR10('./dataset/cifar-10-batches-py/', group='train')
X_test, y_test = CIFAR10('./dataset/cifar-10-batches-py/', group='test')
X_train, X_test = BatchHOG(X_train, partition=16), BatchHOG(X_test, partition=16)
kernel = {'name': 'gaussian', 'gamma': 0.03}
model = SupportVectorClassifier(iteration=100, kernel=kernel)
model.fit(X_train, y_train)
p_train, p_test = model.predict(X_train), model.predict(X_test)
r_train, r_test = ComputeAccuracy(p_train, y_train), ComputeAccuracy(p_test, y_test)
print('Kernel: Gaussian, Train: {:.2%}, Test: {:.2%}'.format(r_train, r_test))
In our tests, the classification accuracy of the SVM classifier we implemented on the MNIST and CIFAR10 datasets is shown in the table below.
In addition, we examined the convergence of the model and the parameter choices for each kernel function; the relationship between model accuracy and the number of iterations is shown in the figure below.
These results reveal the influence of each parameter on model performance and can provide some guidance for parameter tuning.
Final remarks
Only about ten years separate SVM's heyday from today. In both accuracy and efficiency, SVMs have been thoroughly overtaken by the now-ubiquitous neural networks, and I too wondered what the point of implementing one from scratch is. Yet the process changed my understanding of machine learning, at least a little: a simple, elegant, exact polynomial-time algorithm may only satisfy a theorist's taste for cleanliness, while approximate optimization of complex models is what wins the future in engineering.