MATLAB Implementation of Data Analysis Based on K-Means Clustering and Support Vector Machine Classification
This article shows how to use MATLAB for data analysis with two techniques: K-means clustering and support vector machine (SVM) classification. A real dataset serves as the running example to illustrate how each technique is applied.
- Dataset description
We will use the IRIS dataset, which is a very popular dataset containing 150 samples, each with four features: sepal length, sepal width, petal length, and petal width. The IRIS dataset has three categories: Iris Setosa, Iris Versicolour, and Iris Virginica.
First, we need to load the IRIS dataset into MATLAB. It ships with the Statistics and Machine Learning Toolbox as fisheriris, so no separate download is needed. Here is the code:
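load fisheriris.mat   % provides the variables meas and species
X = meas;             % 150-by-4 matrix of measurements (features)
Y = species;          % 150-by-1 cell array of species names (labels)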
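Before going further, it can help to confirm what was loaded. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is installed; the variable name classNames is introduced here only for illustration.

```matlab
% Load the data and inspect its shape and class labels.
load fisheriris
X = meas;
Y = species;

disp(size(X))              % rows = samples, columns = features
classNames = unique(Y);    % classNames is an illustrative name, not from the article
disp(classNames)           % distinct species labels
tabulate(Y)                % number and percentage of samples per species
```

If the output shows 150 rows, 4 columns, and three species with 50 samples each, the data matches the description above.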
- K-means clustering
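We will use K-means clustering to divide the dataset into two groups. K-means is an unsupervised learning algorithm: it does not use the species labels, and the number of clusters k must be chosen in advance. Note that the IRIS dataset has three categories, so k = 3 is the natural alternative when the goal is to recover the species. In MATLAB, we can use the kmeans function for clustering. Here is the code: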
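[idx, C] = kmeans(X, 2);        % idx: cluster index of each sample; C: cluster centroids
gscatter(X(:,1), X(:,2), idx)   % sepal length vs. sepal width, colored by cluster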
After running the above code, you will get the clustering result shown in the figure below:
As can be seen from the figure above, K-means clustering divides the IRIS dataset into two groups, which is in line with our expectations.
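Since the dataset contains three species, one may also want to see how well K-means recovers them when k = 3. The following is a minimal sketch and not part of the original analysis: the seed, the number of replicates, and the variable names idx3, C3, and tbl are assumptions made for illustration.

```matlab
% Cluster with k = 3 and compare the clusters to the known species.
load fisheriris
X = meas;

rng(1);                                       % assumed seed, only for reproducibility
[idx3, C3] = kmeans(X, 3, 'Replicates', 5);   % 5 random restarts; keeps the best solution

% Rows of tbl are clusters, columns are species; entries are sample counts.
tbl = crosstab(idx3, species);
disp(tbl)

% Petal measurements usually separate the groups more clearly than sepal ones.
gscatter(X(:,3), X(:,4), idx3)
xlabel('Petal length'); ylabel('Petal width')
```

Each row of the table shows how the samples of one cluster are spread across the species. A row concentrated in a single column means that cluster corresponds closely to one species. The cluster numbers themselves are arbitrary and can change between runs.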
- Support vector machine classification
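Next, we will use a support vector machine to classify the dataset. Older MATLAB releases provided the svmtrain and svmclassify functions for this, but they accepted only two classes and were removed in R2018a. Because the IRIS dataset has three classes, we use fitcecoc instead, which trains a multiclass model from binary SVM learners, together with predict. Here is the code:
SVMModel = fitcecoc(X, Y);
Y_SVM = predict(S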