Implementing a Support Vector Machine in Python

This article first derives the main formulas of the SVM, then implements an SVM supporting multiple kernel functions from scratch based on the Platt-SMO algorithm, extends it to multi-class classification with the One-Versus-One strategy, and finally runs performance tests on the MNIST and CIFAR-10 datasets.

Implementing support vector machines from scratch

This article grew out of a rather tedious course project for a machine learning class. Having already sunk a lot of time into it, I figured I might as well spend a little more writing up the implementation process. The mathematical form of the support vector machine is simple and intuitive, but as soon as you get to a concrete implementation, problems come one after another. In this article I first derive the main formulas of the SVM, then implement an SVM supporting multiple kernel functions from scratch based on the Platt-SMO algorithm, extend it to multi-class classification with the One-Versus-One strategy, and finally run performance tests on the MNIST and CIFAR-10 datasets.

Mathematical derivation

Basic form

$$
\min_{\boldsymbol{w},\,b,\,\boldsymbol{\xi}}\ \frac{1}{2}\lVert\boldsymbol{w}\rVert^{2}+C\sum_{i=1}^{m}\xi_{i}
\qquad\text{s.t.}\quad y_{i}\left(\boldsymbol{w}^{\top}\boldsymbol{x}_{i}+b\right)\ge 1-\xi_{i},\quad \xi_{i}\ge 0,\quad i=1,\dots,m
$$

This is a quadratic programming problem, and we can use methods such as gradient descent or coordinate descent to solve it.
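The Platt-SMO algorithm described below does not attack this primal problem directly; it works on the Lagrangian dual, which is also a quadratic program. As a standard result, stated in the same notation as the implementation below:

$$
\max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{m}\alpha_{i}-\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_{i}\alpha_{j}y_{i}y_{j}\,\boldsymbol{x}_{i}^{\top}\boldsymbol{x}_{j}
\qquad\text{s.t.}\quad 0\le\alpha_{i}\le C,\quad \sum_{i=1}^{m}\alpha_{i}y_{i}=0
$$

Replacing the inner product $\boldsymbol{x}_{i}^{\top}\boldsymbol{x}_{j}$ with a kernel value $K(\boldsymbol{x}_{i},\boldsymbol{x}_{j})$ gives the kernelized version used in the next section.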

Kernel functions and kernel tricks
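The kernel trick replaces the inner product $\boldsymbol{x}_{i}^{\top}\boldsymbol{x}_{j}$ in the dual problem with a kernel value $K(\boldsymbol{x}_{i},\boldsymbol{x}_{j})$, which implicitly maps the samples into a higher-dimensional feature space. The four kernels implemented below, in the exact parameterization the code uses, are

$$
K_{\text{linear}}(\boldsymbol{x},\boldsymbol{y})=\boldsymbol{x}^{\top}\boldsymbol{y},\qquad
K_{\text{poly}}(\boldsymbol{x},\boldsymbol{y})=\left(\gamma\,\boldsymbol{x}^{\top}\boldsymbol{y}+1\right)^{d},\qquad
K_{\text{gauss}}(\boldsymbol{x},\boldsymbol{y})=\exp\!\left(-\gamma\lVert\boldsymbol{x}-\boldsymbol{y}\rVert^{2}\right),\qquad
K_{\text{sigmoid}}(\boldsymbol{x},\boldsymbol{y})=\tanh\!\left(\gamma\,\boldsymbol{x}^{\top}\boldsymbol{y}+c\right)
$$

where $\gamma$, $d$ and $c$ correspond to the gamma, degree and bias parameters of the kernel classes below.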

Platt-SMO Algorithm
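In each step, the Platt-SMO algorithm selects a pair of multipliers $(\alpha_i,\alpha_j)$, optimizes the dual objective over this pair analytically, and clips the result to the feasible box. In the notation used by the implementation below:

$$
d=2K_{ij}-K_{ii}-K_{jj},\qquad
\alpha_{j}^{\text{new}}=\operatorname{clip}\!\left(\alpha_{j}-\frac{y_{j}\left(E_{i}-E_{j}\right)}{d},\,L,\,H\right),\qquad
\alpha_{i}^{\text{new}}=\alpha_{i}+y_{i}y_{j}\left(\alpha_{j}-\alpha_{j}^{\text{new}}\right)
$$

Here $E_{i}=f(\boldsymbol{x}_{i})-y_{i}$ is the prediction error and $[L,H]$ is the feasible interval implied by $0\le\alpha\le C$ and $\sum_{i}\alpha_{i}y_{i}=0$; the bias $b$ is then recomputed from whichever updated multiplier lies strictly inside $(0,C)$, or from the average of the two candidates when neither does. The __step_forward method below follows this update.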

Algorithm implementation
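The code snippets in the rest of the article do not repeat their import statements. They rely on NumPy, Matplotlib, tqdm and a few standard-library modules; the setup assumed throughout (my reconstruction, not shown in the original) is roughly

import os
import gzip
import pickle

import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm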

Kernel functions

We implement the four kernel functions above. Each kernel is encapsulated in a class; by implementing the __call__ method, an instance can be called like a function. Every kernel takes the full data matrix X and a single sample y, and returns the vector of kernel values between y and each row of X.

The linear kernel code is as follows

class LinearKernel(object):
    def __init__(self):
        self.name = 'linear'
 
    def __call__(self, X, y):
        return X @ y.T

The polynomial kernel code is as follows

class PolynomialKernel(object):
    def __init__(self, gamma=1.0, degree=3):
        self.name = 'polynomial'
        self.gamma = gamma
        self.degree = degree
    
    def __call__(self, X, y):
        return np.power(self.gamma * (X @ y.T) + 1, self.degree)

The Gaussian kernel code is as follows

class GaussianKernel(object):
    def __init__(self, gamma=1.0):
        self.name = 'gaussian'
        self.gamma = gamma
    
    def __call__(self, X, y):
        return np.exp(-self.gamma * np.sum(np.square(X - y), axis=1))

The Sigmoid kernel code is as follows

class SigmoidKernel(object):
    def __init__(self, gamma=1.0, bias=0.0):
        self.name = 'sigmoid'
        self.gamma = gamma
        self.bias = bias
    
    def __call__(self, X, y):
        return np.tanh(self.gamma * (X @ y.T) + self.bias)

In addition, we define a helper function that builds a kernel object from a configuration dictionary

def CreateKernel(entry):
    if entry['name'] == 'linear':
        return LinearKernel()
    elif entry['name'] == 'polynomial':
        return PolynomialKernel(entry['gamma'], entry['degree'])
    elif entry['name'] == 'gaussian':
        return GaussianKernel(entry['gamma'])
    elif entry['name'] == 'sigmoid':
        return SigmoidKernel(entry['gamma'], entry['bias'])
    raise AttributeError('invalid kernel')
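As a quick sanity check (an illustrative example, not from the original), a kernel instance can be created from a configuration dictionary and then called like a function on a data matrix and a single sample:

X = np.random.randn(5, 3)
kernel = CreateKernel({'name': 'gaussian', 'gamma': 0.5})
print(kernel(X, X[0]))  # the 5 kernel values K(x_i, x_0)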

Support Vector Machines

Following the interface of scikit-learn, we define a class that provides fit and predict methods. Its parameters include the maximum number of iterations, the penalty coefficient, the error tolerance and the kernel type. Private methods implement the selection of the second index and the single-step update of a pair of multipliers $(\alpha_i, \alpha_j)$. For the linear kernel we also provide a weight property that returns the parameters of the separating hyperplane. Apart from some simplifications, the code follows the Platt-SMO algorithm.

class SupportVectorMachine(object):
    def __init__(self, iteration=100, penalty=1.0, epsilon=1e-6, kernel=None):
        self.iteration = iteration
        self.penalty = penalty
        self.epsilon = epsilon
        if kernel is None:
            kernel = {'name': 'linear'}
        self.kernel = CreateKernel(kernel)
    
    def __compute_w(self):
        # Weight vector of the separating hyperplane (meaningful only for the linear kernel)
        return (self.a * self.y) @ self.X

    def __compute_e(self, i):
        # Prediction error E_i = f(x_i) - y_i
        return (self.a * self.y) @ self.K[:, i] + self.b - self.y[i]

    def __select_j(self, i):
        # Choose a random second index j != i, uniformly over the remaining samples
        j = np.random.randint(1, self.m)
        return j if j > i else j - 1
    
    def __step_forward(self, i):
        # Try to optimize the pair (a_i, a_j); return True only if a_j changed enough
        e_i = self.__compute_e(i)
        # Proceed only if a_i violates the KKT conditions beyond the tolerance
        if ((self.a[i] > 0) and (e_i * self.y[i] > self.epsilon)) or ((self.a[i] < self.penalty) and (e_i * self.y[i] < -self.epsilon)):
            j = self.__select_j(i)
            e_j = self.__compute_e(j)
            a_i, a_j = np.copy(self.a[i]), np.copy(self.a[j])
            # Bounds [L, H] keep the pair feasible under 0 <= a <= C and sum(a * y) = 0
            if self.y[i] == self.y[j]:
                L = max(0, a_i + a_j - self.penalty)
                H = min(self.penalty, a_i + a_j)
            else:
                L = max(0, a_j - a_i)
                H = min(self.penalty, self.penalty + a_j - a_i)
            if L == H:
                return False
            # d = 2K_ij - K_ii - K_jj is the curvature of the dual objective along this direction
            d = 2 * self.K[i, j] - self.K[i, i] - self.K[j, j]
            if d >= 0:
                return False
            # Analytic update of a_j, clipped to the feasible box
            self.a[j] = np.clip(a_j - self.y[j] * (e_i - e_j) / d, L, H)
            if np.abs(self.a[j] - a_j) < self.epsilon:
                return False
            self.a[i] = a_i + self.y[i] * self.y[j] * (a_j - self.a[j])
            # Recompute the bias from whichever multiplier remains strictly inside (0, C)
            b_i = self.b - e_i - self.y[i] * self.K[i, i] * (self.a[i] - a_i) - self.y[j] * self.K[j, i] * (self.a[j] - a_j)
            b_j = self.b - e_j - self.y[i] * self.K[i, j] * (self.a[i] - a_i) - self.y[j] * self.K[j, j] * (self.a[j] - a_j)
            if 0 < self.a[i] < self.penalty:
                self.b = b_i
            elif 0 < self.a[j] < self.penalty:
                self.b = b_j
            else:
                self.b = (b_i + b_j) / 2
            return True
        return False
    
    def setup(self, X, y):
        self.X, self.y = X, y
        self.m, self.n = X.shape
        self.b = 0.0
        self.a = np.zeros(self.m)
        self.K = np.zeros((self.m, self.m))
        # Precompute the full kernel (Gram) matrix column by column
        for i in range(self.m):
            self.K[:, i] = self.kernel(X, X[i, :])
    
    def fit(self, X, y):
        self.setup(X, y)
        # Alternate between full passes over all samples and passes over the
        # non-bound samples (0 < a < C), following Platt's heuristic
        entire = True
        for _ in range(self.iteration):
            change = 0
            if entire:
                for i in range(self.m):
                    change += self.__step_forward(i)
            else:
                index = np.nonzero((0 < self.a) * (self.a < self.penalty))[0]
                for i in index:
                    change += self.__step_forward(i)
            if entire:
                entire = False
            elif change == 0:
                entire = True

    def predict(self, X):
        m = X.shape[0]
        y = np.zeros(m)
        for i in range(m):
            y[i] = np.sign((self.a * self.y) @ self.kernel(self.X, X[i, :]) + self.b)
        return y
    
    @property
    def weight(self):
        if self.kernel.name != 'linear':
            raise AttributeError('non-linear kernel')
        return self.__compute_w(), self.b
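As a minimal illustration (hypothetical data, not from the original), a binary SVM with a Gaussian kernel is trained on labels in {-1, +1} like this:

X = np.random.randn(200, 2)
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)  # labels must be -1 / +1
svm = SupportVectorMachine(iteration=100, penalty=1.0, kernel={'name': 'gaussian', 'gamma': 0.5})
svm.fit(X, y)
print('training accuracy:', np.mean(svm.predict(X) == y))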

Multi-class classification

Based on the One-Versus-One strategy, we construct $k(k-1)/2$ binary SVMs, where $k$ is the number of categories. To train each classifier we select only the samples of its two categories as the training set and map their labels to -1 and 1. At prediction time, the outputs of all classifiers are combined by voting, and the category with the most votes is the final result.

We use the same encapsulation as for the support vector machine, providing the fit and predict methods, which makes this class usable as a general classification model

class SupportVectorClassifier(object):
    def __init__(self, iteration=100, penalty=1.0, epsilon=1e-6, kernel=None):
        self.iteration = iteration
        self.penalty = penalty
        self.epsilon = epsilon
        self.kernel = kernel
        self.classifier = []

    def __build_model(self, y):
        self.label = np.unique(y)
        for i in range(len(self.label)):
            for j in range(i+1, len(self.label)):
                model = SupportVectorMachine(self.iteration, self.penalty, self.epsilon, self.kernel)
                self.classifier.append((i, j, model))

    def fit(self, X, y):
        self.__build_model(y)
        for i, j, model in tqdm(self.classifier):
            index = np.where((y == self.label[i]) | (y == self.label[j]))[0]
            X_ij, y_ij = X[index], np.where(y[index] == self.label[i], -1, 1)
            model.fit(X_ij, y_ij)
    
    def predict(self, X):
        vote = np.zeros((X.shape[0], len(self.label)))
        for i, j, model in tqdm(self.classifier):
            y = model.predict(X)
            vote[np.where(y == -1)[0], i] += 1
            vote[np.where(y == 1)[0], j] += 1
        return self.label[np.argmax(vote, axis=1)]

Performance Testing

First, we construct two simple clusters of normally distributed points in the plane to visualize the classification behavior of the support vector machine. We build the data and train the model

X = np.concatenate((np.random.randn(500, 2) - 2, np.random.randn(500, 2) + 2))
y = np.concatenate((np.ones(500), -np.ones(500)))
C = SupportVectorMachine(iteration=100)
C.fit(X, y)
w, b = C.weight
u = np.linspace(-3, 3, 100)
v = (-b - w[0] * u) / w[1]  # points on the separating line w[0]*x + w[1]*y + b = 0

Then plot the classification effect according to the model parameters

plt.scatter(X[:500, 0], X[:500, 1], label='Positive')
plt.scatter(X[500:, 0], X[500:, 1], label='Negative')
plt.plot(u, v, label='Separation', c='g')
plt.xlabel('$x$')
plt.ylabel('$y$')
plt.title('Separation Sample')
plt.grid()
plt.legend()
plt.tight_layout()
plt.savefig('./figure/separation.png')
plt.show()

It can be seen that the SVM we implemented separates the two clusters cleanly.

[Figure: Separation Sample — positive and negative samples with the separating line, saved to ./figure/separation.png]

Next we test on real image data. The following helper loads a subset of MNIST, keeping only the first 500 training images and 100 test images of each class

def MNIST(path, group='train'):
    if group == 'train':
        with gzip.open(os.path.join(path, 'train-images-idx3-ubyte.gz'), 'rb') as file:
            image = np.frombuffer(file.read(), np.uint8, offset=16).reshape(-1, 1, 28, 28) / 255.0
        with gzip.open(os.path.join(path, 'train-labels-idx1-ubyte.gz'), 'rb') as file:
            label = np.frombuffer(file.read(), np.uint8, offset=8)
    elif group == 'test':
        with gzip.open(os.path.join(path, 't10k-images-idx3-ubyte.gz'), 'rb') as file:
            image = np.frombuffer(file.read(), np.uint8, offset=16).reshape(-1, 1, 28, 28) / 255.0
        with gzip.open(os.path.join(path, 't10k-labels-idx1-ubyte.gz'), 'rb') as file:
            label = np.frombuffer(file.read(), np.uint8, offset=8)
    remain = 500 if group == 'train' else 100
    image_list, label_list = [], []
    for value in range(10):
        index = np.where(label == value)[0][:remain]
        image_list.append(image[index])
        label_list.append(label[index])
    image, label = np.concatenate(image_list), np.concatenate(label_list)
    index = np.random.permutation(len(label))
    return image[index], label[index]

For the CIFAR10 dataset, we do the same

def CIFAR10(path, group='train'):
    if group == 'train':
        image_list, label_list = [], []
        for i in range(1, 6):
            filename = os.path.join(path, 'data_batch_{}'.format(i))
            with open(filename, 'rb') as file:
                data = pickle.load(file, encoding='bytes')
            image_list.append(np.array(data[b'data'], dtype=np.float32).reshape(-1, 3, 32, 32) / 255.0)
            label_list.append(np.array(data[b'labels'], dtype=np.int32))
        image, label = np.concatenate(image_list), np.concatenate(label_list)
    elif group == 'test':
        filename = os.path.join(path, 'test_batch')
        with open(filename, 'rb') as file:
            data = pickle.load(file, encoding='bytes')
        image = np.array(data[b'data'], dtype=np.float32).reshape(-1, 3, 32, 32) / 255.0
        label = np.array(data[b'labels'], dtype=np.int32)
    remain = 500 if group == 'train' else 100
    image_list, label_list = [], []
    for value in range(10):
        index = np.where(label == value)[0][:remain]
        image_list.append(image[index])
        label_list.append(label[index])
    image, label = np.concatenate(image_list), np.concatenate(label_list)
    index = np.random.permutation(len(label))
    return image[index], label[index]

Because the CIFAR-10 dataset is considerably harder, we turn to classical computer-vision feature extraction and use HOG features to improve the classification accuracy. First, convert the color image to grayscale

def RGB2Gray(image):
    # Standard luma weights for converting an RGB image of shape (3, H, W) to grayscale
    image = 0.299 * image[0] + 0.587 * image[1] + 0.114 * image[2]
    return image.reshape(1, *image.shape)

Then we implement a simple HOG feature extraction function. Block overlap is not implemented here; adding it should further improve the classification accuracy

def HOG(image, block=4, partition=8):
    image = RGB2Gray(image).squeeze(axis=0)
    height, width = image.shape
    # gradient[0] holds the magnitude, gradient[1] the orientation folded into [0, 180] degrees
    gradient = np.zeros((2, height, width), dtype=np.float32)
    for i in range(1, height-1):
        for j in range(1, width-1):
            delta_x = image[i, j-1] - image[i, j+1]
            delta_y = image[i+1, j] - image[i-1, j]
            gradient[0, i, j] = np.sqrt(delta_x ** 2 + delta_y ** 2)
            gradient[1, i, j] = np.degrees(np.arctan2(delta_y, delta_x))
            if gradient[1, i, j] < 0:
                gradient[1, i, j] += 180
    unit = 360 / partition
    vertical, horizontal = height // block, width // block
    # Accumulate an orientation histogram for every non-overlapping block x block cell
    feature = np.zeros((vertical, horizontal, partition), dtype=np.float32)
    for i in range(vertical):
        for j in range(horizontal):
            for k in range(block):
                for l in range(block):
                    rho = gradient[0, i*block+k, j*block+l]
                    theta = gradient[1, i*block+k, j*block+l]
                    index = int(theta // unit)
                    feature[i, j, index] += rho
            # L2-normalize each cell histogram (the small constant avoids division by zero)
            feature[i, j] /= np.linalg.norm(feature[i, j]) + 1e-6
    return feature.reshape(-1)
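The examples that follow also call two helpers, BatchHOG and ComputeAccuracy, which are used but not shown in the original; a minimal sketch consistent with how they are invoked could be

def BatchHOG(images, block=4, partition=8):
    # Hypothetical helper: apply HOG to every image and stack the feature vectors
    return np.stack([HOG(image, block=block, partition=partition) for image in images])

def ComputeAccuracy(prediction, label):
    # Hypothetical helper: fraction of samples whose prediction matches the true label
    return np.mean(prediction == label)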

With these utility functions in place, the image classification tasks become straightforward. For the MNIST dataset, an example of classification with the linear kernel is as follows

X_train, y_train = MNIST('./dataset/mnist_data/', group='train')
X_test, y_test = MNIST('./dataset/mnist_data/', group='test')
X_train, X_test = X_train.reshape(-1, 28*28), X_test.reshape(-1, 28*28)

model = SupportVectorClassifier(iteration=100, penalty=0.05)
model.fit(X_train, y_train)
p_train, p_test = model.predict(X_train), model.predict(X_test)

r_train, r_test = ComputeAccuracy(p_train, y_train), ComputeAccuracy(p_test, y_test)
print('Kernel: Linear, Train: {:.2%}, Test: {:.2%}'.format(r_train, r_test))

For the CIFAR-10 dataset, an example of classification based on HOG features and a Gaussian kernel is as follows

X_train, y_train = CIFAR10('./dataset/cifar-10-batches-py/', group='train')
X_test, y_test = CIFAR10('./dataset/cifar-10-batches-py/', group='test')
X_train, X_test = BatchHOG(X_train, partition=16), BatchHOG(X_test, partition=16)

kernel = {'name': 'gaussian', 'gamma': 0.03}
model = SupportVectorClassifier(iteration=100, kernel=kernel)
model.fit(X_train, y_train)
p_train, p_test = model.predict(X_train), model.predict(X_test)

r_train, r_test = ComputeAccuracy(p_train, y_train), ComputeAccuracy(p_test, y_test)
print('Kernel: Gaussian, Train: {:.2%}, Test: {:.2%}'.format(r_train, r_test))

After testing, the classification accuracy of the SVM classifier we implemented on the MNIST and CIFAR10 datasets is shown in the table below

In addition, we tested the convergence of the model and the parameter selection of each kernel function. The relationship between the model accuracy and the number of iterations is shown in the figure below 

The above results reveal the influence of each parameter on the performance of the model, which can provide some guidance for parameter tuning

Closing thoughts

Only about ten years separate the heyday of the SVM from today, and in both accuracy and efficiency it is thoroughly outclassed by the neural networks that are now everywhere. I am not sure how much it means to implement an SVM from scratch anymore, but the process has, to some extent, changed my understanding of machine learning: a simple, elegant, exact polynomial-time algorithm may only satisfy the theorist's sense of tidiness, while approximate algorithms that optimize complicated models are winning the future in engineering.


Origin: blog.csdn.net/qq_29788741/article/details/131426777