Installation and testing of Libsvm in MATLAB


Libsvm is a support vector machine (SVM) library written by Chih-Chung Chang and Chih-Jen Lin of National Taiwan University; it can be used for classification (and regression) problems.
Official website link: https://www.csie.ntu.edu.tw/~cjlin/libsvm/ .

SVM principle

Let the sample set be $D=\{(\mathbf{x}_1,y_1),(\mathbf{x}_2,y_2),\dots,(\mathbf{x}_m,y_m)\}$, with $y_i\in\{-1,+1\}$. We need to find a hyperplane that separates the two classes of samples (assume for now that they are completely separable), with equation:
$$\mathbf{w}^T\mathbf{x}+b=0$$
Since $\mathbf{w}$ and $b$ can be scaled by a common factor while the hyperplane stays the same, we may assume that for the sample points closest to the hyperplane (i.e., the support vectors), $y_i(\mathbf{w}^T\mathbf{x}_i+b)=1$. This assumption removes the scaling ambiguity in the optimal $(\mathbf{w},b)$ and, intuitively, splits the two classes symmetrically about the hyperplane.
The support vector machine looks for the parameters that maximize the sum of the distances from the two heterogeneous support vectors to the hyperplane, $\gamma=2/\|\mathbf{w}\|$ (see an analytic-geometry textbook for the distance calculation), so the problem reduces to:
$$\min_{\mathbf{w},b}\ \|\mathbf{w}\|^2 \qquad \text{s.t.}\quad y_i(\mathbf{w}^T\mathbf{x}_i+b)\ge 1$$

In most cases there is no guarantee that the two classes can be completely separated by a hyperplane, so a feature mapping and slack variables are introduced:
$$\min_{\mathbf{w},b,\theta_i}\ \frac{1}{2}\|\mathbf{w}\|^2+C\sum_i\theta_i \qquad \text{s.t.}\quad y_i(\mathbf{w}^T\phi(\mathbf{x}_i)+b)\ge 1-\theta_i,\ \theta_i\ge 0$$
where $\phi$ maps $\mathbf{x}$ into a high-dimensional feature space; after the mapping, samples that were linearly inseparable may become linearly separable. The inner product is called the kernel function, and the RBF kernel is a common choice: $K(\mathbf{x}_i,\mathbf{x}_j)=\phi^T(\mathbf{x}_i)\phi(\mathbf{x}_j)=e^{-\gamma\|\mathbf{x}_i-\mathbf{x}_j\|^2}$.
The slack variables $\theta_i$ allow the support vector machine to make mistakes on some samples.
(Refer to Zhou Zhihua's "Machine Learning")
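As a quick illustration of the RBF kernel above, it can be evaluated directly in MATLAB (the points and $\gamma$ below are arbitrary choices for the example):

```matlab
% RBF kernel: K(xi,xj) = exp(-gamma * ||xi - xj||^2)
rbf = @(xi, xj, gamma) exp(-gamma * sum((xi - xj).^2));

K_near = rbf([0.2 0.7], [0.3 0.5], 4);  % nearby points give K close to 1
K_far  = rbf([0.0 0.0], [1.0 1.0], 4);  % distant points give K close to 0
```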

Installation

First, check whether MinGW is installed on your computer. At the MATLAB command line, enter

mex -setup

If an error indicates that the compiler is missing, install it as prompted.
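Because `svm.cpp` is C++ while the MATLAB wrapper files are C, it can help to configure both compilers explicitly (a minimal sketch; the exact prompts depend on your MATLAB version):

```matlab
% Choose compilers for MEX builds; MinGW works for both languages
mex -setup C      % select a C compiler
mex -setup C++    % select a C++ compiler
```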
After that, download the libsvm archive from the official website. Unpacking it reveals several folders such as `matlab`, `java`, and `python`. Since we are using MATLAB, copy the `windows` folder into the `matlab` folder.
In the `matlab` folder you will find the `*.c` and `*.h` source files and a single `make.m`, which is used to build the MEX files. Running `make.m` directly gave me an error; after searching online, I changed it to:

% make.m
mex  libsvmread.c
mex  libsvmwrite.c
mex  -largeArrayDims svmtrain.c ../svm.cpp svm_model_matlab.c
mex  -largeArrayDims svmpredict.c ../svm.cpp svm_model_matlab.c

Run it to generate four MEX files (`*.mexw64` on 64-bit Windows).
After that, add the following folders to the MATLAB search path (via Set Path on the MATLAB Home tab; the paths below are from my machine):
C:\Users\Lenovo\Desktop\libsvm\libsvm-3.24\matlab
C:\Users\Lenovo\Desktop\libsvm\libsvm-3.24\matlab\windows
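Equivalently, the folders can be added from the command line; the root path below is the example location from this machine, so adjust it to wherever you unpacked libsvm:

```matlab
% Put the libsvm MATLAB interface (and its windows subfolder) on the path
libsvmRoot = 'C:\Users\Lenovo\Desktop\libsvm\libsvm-3.24';
addpath(fullfile(libsvmRoot, 'matlab'));
addpath(fullfile(libsvmRoot, 'matlab', 'windows'));
savepath;  % optional: keep the paths for future MATLAB sessions
```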

Test

Suppose each sample has two attributes with values in $[0,1]$. If the two values differ by less than 0.3, the sample is classified as "same" ($y=1$); otherwise it is classified as "different" ($y=-1$). Some mislabeled samples are added to the training set via a noise term. This sample set is linearly inseparable in two dimensions. To test libsvm's classification performance, the code is as follows:

train_data=rand(5000,2);
train_label=is_same(train_data,0.1);   % noisy labels for the training set

test_data=rand(200,2);
test_label=is_same(test_data,0);       % noise-free labels for evaluation
model=svmtrain(train_label,train_data);
pred=svmpredict(test_label,test_data,model);

% For each row of x, return 1 if the two entries differ by less than 0.3, otherwise -1
% The noise term introduces some mislabeled samples (used for the training set)
function f=is_same(x,noise)
temp=abs(x(:,1)-x(:,2))+noise*(rand(size(x,1),1)-0.5);
f=2*(temp<0.3)-1;
end

After running, the accuracy is 96.5%. (Note that `test_label` is passed to `svmpredict`, but it is only used to compute the reported accuracy; substituting any vector of the same dimensions does not change the predicted labels.)
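With no options string, `svmtrain` uses libsvm's defaults (C-SVC with the RBF kernel, C = 1, γ = 1/num_features). The kernel and its parameters can also be set explicitly; the values below are only an illustration, not tuned:

```matlab
% '-t 2' selects the RBF kernel, '-c' sets C, '-g' sets gamma
model = svmtrain(train_label, train_data, '-t 2 -c 10 -g 4');
% '-v 5' performs 5-fold cross-validation and returns an accuracy instead of a model
cv_acc = svmtrain(train_label, train_data, '-t 2 -c 10 -g 4 -v 5');
```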
The classification result is shown in the figure:
*(Figure: scatter plot of the two predicted classes, separated by the boundary lines $y = x \pm 0.3$.)*
Drawing code:

t_1=test_data((pred==1),:);
t_2=test_data((pred==-1),:);
plot(t_1(:,1),t_1(:,2),'o',t_2(:,1),t_2(:,2),'*')
legend('same','different')
hold on
fplot(@(x)(x+0.3))
fplot(@(x)(x-0.3))
axis([0,1,0,1])


Origin: blog.csdn.net/certate/article/details/108934708