SVM multi-class classification (machine learning)

 

Experimental requirements
Data description: the data4train.mat dataset is a 2 × 150 matrix, where 150 is the number of samples and each sample has two-dimensional features. The class labels are stored in truelabel.mat; the training-sample figure shows the ideal classification result.
Scheme selection:
1. Select and implement a binary classifier (e.g., the perceptron method, SVM, etc.); on top of it, design and implement a strategy that uses the binary classifier to perform three-class classification, and show the classification result.
2. Directly use an existing method that can solve the multi-class problem (e.g., multi-class SVM, BP network, etc.), and plot the classification results.

I chose the second scheme; with limited time, I could only implement it using sklearn's SVC.

Implementation ideas

One-vs-One (OvO):
In practice, an SVM is designed between every pair of classes, so k classes require k(k-1)/2 SVMs. When an unknown sample is classified, each pairwise classifier casts a vote, and the class with the most votes is the class of the unknown sample.
Advantages: when samples are added, there is no need to retrain all of the SVMs; only the classifiers associated with the added samples need to be retrained. Training any single model is fast.
Disadvantages: the number of binary classifiers to construct and test grows quadratically with k, so the total training and testing time is relatively slow.
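
A minimal sketch of One-vs-One using sklearn's OneVsOneClassifier, on hypothetical toy data (X_toy and y_toy are stand-ins, not part of the original script):

import numpy as np
from sklearn import svm
from sklearn.multiclass import OneVsOneClassifier

# Hypothetical toy data: 3 classes of 20 samples each, 2 features
rng = np.random.RandomState(0)
X_toy = np.vstack([rng.randn(20, 2) + 3 * c for c in range(3)])
y_toy = np.repeat([1, 2, 3], 20)

# k = 3 classes -> k(k-1)/2 = 3 pairwise SVMs; every pairwise
# classifier votes, and the majority class wins
ovo = OneVsOneClassifier(svm.SVC(kernel='linear')).fit(X_toy, y_toy)
print(len(ovo.estimators_))    # 3 pairwise classifiers
print(ovo.predict(X_toy[:5]))  # majority-vote predictions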

One-vs-Rest (OvR):
For each class, train a classifier that treats that class's samples as one class and all remaining samples as the other class, so k classes require k SVMs. An unknown sample is classified into the class whose classification function value is largest.
Advantages: only k classifiers need to be trained, a smaller number, so classification is relatively fast.
Disadvantages:
① Each classifier is trained on all of the samples, so when solving the quadratic programming problem, training slows down sharply as the number of training samples grows;
② at the same time, the negative class has far more samples than the positive class, so the training sets are imbalanced, and this imbalance worsens as the training data grows. The imbalance can be addressed by introducing different penalty factors, using a larger penalty factor C for the positive class, which has fewer sample points (see the sketch after this list);
③ when a new class is added, all of the models must be retrained.
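
A minimal sketch of One-vs-Rest, reusing the hypothetical X_toy and y_toy from the OvO sketch above; in sklearn, the class_weight parameter rescales the penalty C per class, which is one way to realize the different-penalty-factor idea from ②:

from sklearn import svm
from sklearn.multiclass import OneVsRestClassifier

# k = 3 classes -> 3 classifiers, each trained on ALL samples
# (one class as positive, the remaining two as negative)
ovr = OneVsRestClassifier(svm.SVC(kernel='linear')).fit(X_toy, y_toy)
print(len(ovr.estimators_))  # 3 classifiers

# An unknown sample gets the class with the largest decision value
print(ovr.decision_function(X_toy[:5]).argmax(axis=1) + 1)

# Imbalance mitigation: 'balanced' scales C inversely proportional
# to class frequency, i.e. a larger effective C for the rarer class
ovr_w = OneVsRestClassifier(svm.SVC(kernel='linear', class_weight='balanced'))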

Hierarchy tree:
First, all categories are divided into two subclasses, then each subclass is further divided into two sub-subclasses, and the cycle continues until every node contains only a single category; these nodes are the leaves of a binary tree. The original multi-class problem is thus decomposed into a series of binary classification problems, with an SVM used as the classification function between each pair of subclasses.
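
A minimal sketch of the tree idea for three classes, again on the hypothetical X_toy and y_toy (the split order, class 1 vs. {2, 3} first, is an arbitrary assumption):

from sklearn import svm

# Root node: separate class 1 from classes {2, 3}
root = svm.SVC(kernel='linear').fit(X_toy, (y_toy == 1).astype(int))

# Leaf node: separate class 2 from class 3, using only their samples
mask = y_toy != 1
leaf = svm.SVC(kernel='linear').fit(X_toy[mask], y_toy[mask])

def tree_predict(x):
    # Walk down the tree: root decides 1 vs. rest, leaf decides 2 vs. 3
    if root.predict(x.reshape(1, -1))[0] == 1:
        return 1
    return leaf.predict(x.reshape(1, -1))[0]

print(tree_predict(X_toy[0]))  # expected: 1 for a sample from class 1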

Here I chose One-vs-Rest, because there are only three classes.
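
The script below imports FileUtil, the author's own helper module, which is not shown in the post. A minimal sketch of what it presumably looks like, assuming open_matfile simply wraps scipy.io.loadmat and returns the first non-metadata array:

# FileUtil.py -- hypothetical reconstruction
from scipy.io import loadmat

class FileUtil:
    @staticmethod
    def open_matfile(filename):
        # Load a .mat file and return its first non-metadata array
        mat = loadmat(filename)
        keys = [k for k in mat if not k.startswith('__')]
        return mat[keys[0]]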

 

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2019/7/2 23:25
# @Author : 朱红喜
# @File : Multi-classify.py
# @Software: PyCharm

# Import the required libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier

from FileUtil import FileUtil

# Load the data
# 1. Model training data
X = FileUtil.open_matfile("data4train.mat").T  # data set (150 x 2 after transpose)
y = FileUtil.open_matfile("truelabel.mat")     # true labels
print(X)
print(y[0])
print(y.shape)


# 2. Model test data
X_2 = FileUtil.open_matfile("data4test.mat").T
y_2 = FileUtil.open_matfile("testtruelabel.mat")
print(X_2)
print(y_2[0])
print(y_2.shape)


# Binarize the labels
y = label_binarize(y[0], classes=[1, 2, 3])
# print(y)


# Split into training set and test set
# Number of classes
n_classes = y.shape[1]
# print(y.shape[1])


# Train the model and predict
random_state = np.random.RandomState(0)
n_samples, n_features = X.shape
# Shuffle the data and split it into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Train the model
# Learn to predict each class against the others
model = OneVsRestClassifier(svm.SVC(kernel='linear', probability=True, random_state=random_state))
clt = model.fit(X_train, y_train)


# Performance evaluation
# 1. Score on the training set
print(clt.score(X_train, y_train))

# 2. Score on the test set
print(clt.score(X_test, y_test))

# Check the prediction scores for each class
y_predict_scores = clt.decision_function(X_test)
print(y_predict_scores[:149])

# Convert the scores back to the original label format
result = np.argmax(clt.decision_function(X_test), axis=1)[:149]
# print(result)
# The teacher requires labels in the 1, 2, 3 convention
for i in range(len(result)):
    result[i] += 1

print(result)


print ( "++++++++++++++++++++++ data4train dataset ++++++++++++++++++")
result_2 = np.argmax (clt.decision_function (X-), Axis =. 1) [: 149]
# Print (result_2)
# teacher needs to be converted to 1,2,3-based standard
for I in Range (result_2 .__ len __ ()):
result_2 [I] = result_2 [I] + 1'd
Print (result_2)


print ( "++++++++++++++++++++++ data4test test set ++++++++++++++++++")
result_2 = np.argmax (clt.decision_function (X_2), Axis =. 1) [: 59]
# Print (result_2)
# teacher needs to be converted to 1,2,3-based standard
for I in Range (result_2 .__ len __ ()):
result_2 [I] = result_2 [I] + 1'd
Print (result_2)
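
matplotlib.pyplot is imported at the top of the script but never used; a minimal sketch of how the predicted labels could be plotted to show the classification result, assuming the two feature dimensions serve as the x/y axes:

# Scatter the training samples, colored by predicted class (1, 2, 3)
pred = np.argmax(clt.decision_function(X), axis=1) + 1
plt.scatter(X[:, 0], X[:, 1], c=pred, cmap='viridis')
plt.title('Predicted classes on data4train')
plt.xlabel('feature 1')
plt.ylabel('feature 2')
plt.show()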

Classification results


 


Origin www.cnblogs.com/liuys635/p/11183827.html