[Getting Started] The k-means Clustering Function in Detail (Using the Iris Data Set) [MATLAB]

k-means clustering function

The complete routine first

The routine below is the official MATLAB example "Train a k-Means Clustering Algorithm".


load fisheriris
X = meas(:,3:4);

figure;
plot(X(:,1),X(:,2),'k*','MarkerSize',5);
title 'Fisher''s Iris Data';
xlabel 'Petal Lengths (cm)';
ylabel 'Petal Widths (cm)';

rng(1); % For reproducibility
[idx,C] = kmeans(X,3);

% Define a fine grid covering the range of the data
x1 = min(X(:,1)):0.01:max(X(:,1));
x2 = min(X(:,2)):0.01:max(X(:,2));
[x1G,x2G] = meshgrid(x1,x2);
XGrid = [x1G(:),x2G(:)]; % Defines a fine grid on the plot

idx2Region = kmeans(XGrid,3,'MaxIter',1,'Start',C);
    % Assigns each node in the grid to the closest centroid
    
figure;
gscatter(XGrid(:,1),XGrid(:,2),idx2Region,...
    [0,0.75,0.75;0.75,0,0.75;0.75,0.75,0],'..');
hold on;
plot(X(:,1),X(:,2),'k*','MarkerSize',5);
title 'Fisher''s Iris Data';
xlabel 'Petal Lengths (cm)';
ylabel 'Petal Widths (cm)';
legend('Region 1','Region 2','Region 3','Data','Location','SouthEast');
hold off;    

Section-by-section analysis

  • Note: text after "%" is a comment. The markdown editor does not seem to highlight MATLAB code (quietly: maybe I just haven't figured out how yet, QuQ).

PART1 - Loading the dataset

load fisheriris
X = meas(:,3:4);

Load the sample data and store columns 3 and 4 of the meas array (petal length and petal width) in the variable X.
This step just provides a sample data set, and many other data sets could be used instead. For more options, see the blog post "Some data sets for clustering and classification problems".
fisheriris is Fisher's iris data set, a classic data set for multivariate analysis named after the statistician Ronald Fisher, who used it in 1936. It contains 150 samples, 50 from each of three iris species (Setosa, Versicolour, Virginica). Each sample has four attributes: sepal length, sepal width, petal length, and petal width. These attributes are used to predict which species a flower belongs to. Loading it creates meas (a 150×4 matrix of measurements) and species (the species name of each sample).
The complete iris data set is listed at the end of the article.

PART2 - Plotting the data set

figure;
plot(X(:,1),X(:,2),'k*','MarkerSize',5);
title 'Fisher''s Iris Data';
xlabel 'Petal Lengths (cm)';
ylabel 'Petal Widths (cm)';

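figure creates a new figure window.
X(:,1) is the first column of X (petal length), and X(:,2) is the second column (petal width).
plot draws the points with the first column of X on the horizontal axis and the second column on the vertical axis. 'k*' selects black asterisk markers, and 'MarkerSize',5 sets their size. title, xlabel and ylabel are called in command syntax. The doubled quote in 'Fisher''s Iris Data' is how an apostrophe is written inside a MATLAB character vector. Running this code produces the following figure.
Figure 1: Fisher's Iris Data (petal length vs. petal width)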

PART3 - k-means clustering of the data set

rng(1); % For reproducibility
[idx,C] = kmeans(X,3);

rng controls the random number generator.
Common syntaxes are:

rng(seed)
rng(seed,generator)
s = rng

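rng(seed) sets the seed of the MATLAB® random number generator. For example, rng(1) initializes the Mersenne Twister generator (the default) with a seed of 1. The seed matters here because kmeans chooses its initial centroids at random (k-means++ by default). Without a fixed seed, the clusters and their numbering can differ from run to run. For related examples, see "rng Controlled Random Number Generator".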
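The sketch below shows what fixing the seed does, using base MATLAB only and arbitrary variable names. It draws random numbers twice after the same rng(1) call and gets identical values. It also shows s = rng saving the generator state so that rng(s) can restore it later.

```matlab
rng(1);                 % seed the generator with 1
a = rand(1, 3);         % draw three random numbers
rng(1);                 % reset to the same seed
b = rand(1, 3);         % draw again: same sequence as a
disp(isequal(a, b))     % prints 1 (true)

s = rng;                % save the current generator settings
c = rand(1, 3);
rng(s);                 % restore the saved settings
d = rand(1, 3);         % d repeats c
```

The same idea is why the routine calls rng(1) right before kmeans: the k-means++ starting centroids come from this generator.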
kmeans performs k-means clustering.
Common syntaxes are:

idx = kmeans(X,k)
idx = kmeans(X,k,Name,Value)
[idx,C] = kmeans(___)
[idx,C,sumd] = kmeans(___)
[idx,C,sumd,D] = kmeans(___)

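Here idx is an N×1 vector of cluster labels (N = 150 for the iris data), and C holds the cluster centroid coordinates, one row per cluster. Since there are k = 3 clusters and the data are two-dimensional, C is 3×2. This routine does not use the optional outputs sumd (k×1, the within-cluster sums of point-to-centroid distances) and D (N×k, the distance from every point to every centroid).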
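To make these outputs concrete, the loop below sketches the classic Lloyd iteration that k-means is built on. Each pass assigns every point to its nearest centroid, then moves each centroid to the mean of its points, and the loop repeats until nothing changes. It is an illustration under stated assumptions, not MATLAB's kmeans implementation:

- the distance is squared Euclidean;
- the starting centroids are fixed by hand;
- the data set is a tiny made-up one.

The real kmeans adds k-means++ seeding, empty-cluster handling and many options.

```matlab
% Minimal Lloyd-style k-means sketch (assumptions: squared Euclidean
% distance, hand-picked starting centroids, tiny made-up 2-D data).
X = [1.0 0.2; 1.2 0.3; 1.1 0.1;   % three small points
     4.5 1.5; 4.7 1.4; 4.4 1.3;   % three medium points
     6.0 2.1; 5.8 2.3; 6.1 2.0];  % three large points
C = X([1 4 7], :);                % hypothetical starting centroids
k = size(C, 1);
D = zeros(size(X, 1), k);         % N-by-k point-to-centroid distances
for iter = 1:100                  % iteration cap, similar in spirit to 'MaxIter'
    % Assignment step: distance from every point to every centroid
    for j = 1:k
        D(:, j) = sum((X - C(j, :)).^2, 2);
    end
    [~, idx] = min(D, [], 2);     % N-by-1 labels, like kmeans' first output
    % Update step: move each centroid to the mean of its assigned points
    Cnew = C;
    for j = 1:k
        if any(idx == j)
            Cnew(j, :) = mean(X(idx == j, :), 1);
        end
    end
    if isequal(Cnew, C)           % converged: the centroids stopped moving
        break;
    end
    C = Cnew;
end
% Per-cluster sum of point-to-centroid distances, analogous to sumd
sumd = zeros(k, 1);
for j = 1:k
    sumd(j) = sum(D(idx == j, j));
end
disp(idx'); disp(C); disp(sumd');
```

Here idx, C, sumd and D play the same roles as the kmeans outputs listed above, just computed by hand.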

PART4 - Defining the coordinate grid

% Define a fine grid covering the range of the data
x1 = min(X(:,1)):0.01:max(X(:,1));
x2 = min(X(:,2)):0.01:max(X(:,2));
[x1G,x2G] = meshgrid(x1,x2);
XGrid = [x1G(:),x2G(:)];  % Defines a fine grid on the plot

x1 and x2 cover the range of the data along each axis, from the minimum to the maximum value, with a step of 0.01.
With the Data Cursor tool you can read the lower and upper limits of the data directly from the figure. Petal length (horizontal axis) runs from 1 to 6.9 cm, and petal width (vertical axis) runs from 0.1 to 2.5 cm:

$$x \in [1, 6.9], \qquad y \in [0.1, 2.5]$$
Dividing each range by the step of 0.01 and adding 1 (both endpoints are included) gives the lengths of x1 and x2:

$$\operatorname{length}(x_1) = \frac{6.9 - 1}{0.01} + 1 = 591, \qquad \operatorname{length}(x_2) = \frac{2.5 - 0.1}{0.01} + 1 = 241$$
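meshgrid pairs every value of x1 with every value of x2. The total number of grid points, which is the number of rows of XGrid, therefore follows directly from these two lengths:

$$N_{\text{grid}} = \operatorname{length}(x_1) \times \operatorname{length}(x_2) = 591 \times 241 = 142431$$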
The Workspace panel shows the same lengths for x1 and x2 as this calculation.
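meshgrid builds a two-dimensional grid. [x1G,x2G] = meshgrid(x1,x2) returns two matrices with length(x2) rows and length(x1) columns (here 241×591), holding the horizontal and vertical coordinates of every grid node. x1G(:) and x2G(:) flatten them into column vectors, so each row of XGrid is one grid point.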
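meshgrid is easiest to see on a tiny grid. The sketch below uses made-up ranges (3 horizontal values and 2 vertical values) in place of the 591 and 241 values from the iris data. It then flattens the two matrices into a list of points the same way XGrid is built.

```matlab
% Tiny made-up ranges standing in for the 591 x-values and 241 y-values
x1 = 1:0.5:2;                    % [1 1.5 2]
x2 = 0:0.1:0.1;                  % [0 0.1]
[x1G, x2G] = meshgrid(x1, x2);   % both 2-by-3: rows follow x2, columns follow x1
% x1G = [1 1.5 2; 1 1.5 2],  x2G = [0 0 0; 0.1 0.1 0.1]
XGrid = [x1G(:), x2G(:)];        % 6-by-2, one grid point per row (column-major order)
disp(XGrid)
```

Because (:) reads matrices column by column, the vertical coordinate changes fastest down the rows of XGrid.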

PART5 - Assigning the grid points with kmeans

idx2Region = kmeans(XGrid,3,'MaxIter',1,'Start',C);
    % Assigns each node in the grid to the closest centroid

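'Start',C makes kmeans start from the centroids C found in PART3, and 'MaxIter',1 caps the algorithm at a single iteration. Because it starts from fixed centroids and stops after one iteration, kmeans simply assigns each grid point to its nearest centroid, which splits the plane into three regions. kmeans displays a warning that the algorithm did not converge. That is expected here, since only one iteration is allowed.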
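The same regions can be obtained by labeling each grid point directly with the index of its nearest centroid. The sketch below does this in base MATLAB and assumes squared Euclidean distance, which is the kmeans default. The centroid values are illustrative placeholders, not the C computed in PART3. The grid is also coarser than the routine's 0.01 step, to keep the example small.

```matlab
% Illustrative centroids (placeholders, NOT the C from PART3)
C = [1.5 0.25; 4.3 1.35; 5.6 2.05];
% A coarser grid than the routine's 0.01 step, to keep the example small
[x1G, x2G] = meshgrid(1:0.5:7, 0:0.5:2.5);
XGrid = [x1G(:), x2G(:)];
% Squared Euclidean distance from every grid point to every centroid
D = zeros(size(XGrid, 1), size(C, 1));
for j = 1:size(C, 1)
    D(:, j) = sum((XGrid - C(j, :)).^2, 2);
end
% Label each grid point with the index of its closest centroid
[~, idx2Region] = min(D, [], 2);
```

If Statistics and Machine Learning Toolbox is available, knnsearch(C,XGrid) should return the same nearest-centroid indices. It finds, for each row of XGrid, the closest row of C under Euclidean distance, without triggering the convergence warning.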

PART6 - Drawing the regions

figure;
gscatter(XGrid(:,1),XGrid(:,2),idx2Region,...
    [0,0.75,0.75;0.75,0,0.75;0.75,0.75,0],'..');
hold on;
plot(X(:,1),X(:,2),'k*','MarkerSize',5);
title 'Fisher''s Iris Data';
xlabel 'Petal Lengths (cm)';
ylabel 'Petal Widths (cm)';
legend('Region 1','Region 2','Region 3','Data','Location','SouthEast');
hold off;    

gscatter draws a scatter plot grouped by a grouping variable, so points in the same group share a color and marker.
The common syntaxes are:

gscatter(x,y,g)
gscatter(x,y,g,clr,sym,siz)
gscatter(x,y,g,clr,sym,siz,doleg)
gscatter(x,y,g,clr,sym,siz,doleg,xnam,ynam)
gscatter(ax,___)
h = gscatter(___)

Note that gscatter is not used here to draw an ordinary scatter plot. Instead, it plots the grid points so densely that each group of colored dots merges into a solid colored region. The grouping variable idx2Region decides which group, and therefore which color, each grid point gets. Because the grid step of 0.01 is small, the dots blend into a continuous patch (0m0).

The fourth argument, [0,0.75,0.75;0.75,0,0.75;0.75,0.75,0], is a 3×3 matrix of RGB triplets, one row per group:
[0,0.75,0.75] (teal) for Region 1
[0.75,0,0.75] (purple) for Region 2
[0.75,0.75,0] (olive) for Region 3

gscatter(XGrid(:,1),XGrid(:,2),idx2Region,...
    [0,0.75,0.75;0.75,0,0.75;0.75,0.75,0],'..');

The fifth argument, '..', sets the marker symbols. When there are fewer symbols than groups, gscatter cycles through them, so '..' draws every group with '.' markers and gives exactly the same image as '.'. (I suspect the hand of the person who wrote the routine slipped and typed one dot too many. Just kidding!)

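legend adds a legend to the axes. In the routine:
- 'Region 1' to 'Region 3' label the three gscatter groups.
- 'Data' labels the points drawn by plot.
- 'Location','SouthEast' places the legend in the lower-right corner inside the axes.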
Common syntaxes are:

legend
legend(label1,…,labelN)
legend(labels)
legend(subset,___)
legend(target,___)

Figure 2: the three k-means regions with the iris data points overlaid

Iris Dataset

All values are in cm. Rows 1-50 are Setosa, rows 51-100 Versicolour, and rows 101-150 Virginica.

Sepal length  Sepal width  Petal length  Petal width
5.10 3.50 1.40 0.20
4.90 3.00 1.40 0.20
4.70 3.20 1.30 0.20
4.60 3.10 1.50 0.20
5.00 3.60 1.40 0.20
5.40 3.90 1.70 0.40
4.60 3.40 1.40 0.30
5.00 3.40 1.50 0.20
4.40 2.90 1.40 0.20
4.90 3.10 1.50 0.10
5.40 3.70 1.50 0.20
4.80 3.40 1.60 0.20
4.80 3.00 1.40 0.10
4.30 3.00 1.10 0.10
5.80 4.00 1.20 0.20
5.70 4.40 1.50 0.40
5.40 3.90 1.30 0.40
5.10 3.50 1.40 0.30
5.70 3.80 1.70 0.30
5.10 3.80 1.50 0.30
5.40 3.40 1.70 0.20
5.10 3.70 1.50 0.40
4.60 3.60 1.00 0.20
5.10 3.30 1.70 0.50
4.80 3.40 1.90 0.20
5.00 3.00 1.60 0.20
5.00 3.40 1.60 0.40
5.20 3.50 1.50 0.20
5.20 3.40 1.40 0.20
4.70 3.20 1.60 0.20
4.80 3.10 1.60 0.20
5.40 3.40 1.50 0.40
5.20 4.10 1.50 0.10
5.50 4.20 1.40 0.20
4.90 3.10 1.50 0.20
5.00 3.20 1.20 0.20
5.50 3.50 1.30 0.20
4.90 3.60 1.40 0.10
4.40 3.00 1.30 0.20
5.10 3.40 1.50 0.20
5.00 3.50 1.30 0.30
4.50 2.30 1.30 0.30
4.40 3.20 1.30 0.20
5.00 3.50 1.60 0.60
5.10 3.80 1.90 0.40
4.80 3.00 1.40 0.30
5.10 3.80 1.60 0.20
4.60 3.20 1.40 0.20
5.30 3.70 1.50 0.20
5.00 3.30 1.40 0.20
7.00 3.20 4.70 1.40
6.40 3.20 4.50 1.50
6.90 3.10 4.90 1.50
5.50 2.30 4.00 1.30
6.50 2.80 4.60 1.50
5.70 2.80 4.50 1.30
6.30 3.30 4.70 1.60
4.90 2.40 3.30 1.00
6.60 2.90 4.60 1.30
5.20 2.70 3.90 1.40
5.00 2.00 3.50 1.00
5.90 3.00 4.20 1.50
6.00 2.20 4.00 1.00
6.10 2.90 4.70 1.40
5.60 2.90 3.60 1.30
6.70 3.10 4.40 1.40
5.60 3.00 4.50 1.50
5.80 2.70 4.10 1.00
6.20 2.20 4.50 1.50
5.60 2.50 3.90 1.10
5.90 3.20 4.80 1.80
6.10 2.80 4.00 1.30
6.30 2.50 4.90 1.50
6.10 2.80 4.70 1.20
6.40 2.90 4.30 1.30
6.60 3.00 4.40 1.40
6.80 2.80 4.80 1.40
6.70 3.00 5.00 1.70
6.00 2.90 4.50 1.50
5.70 2.60 3.50 1.00
5.50 2.40 3.80 1.10
5.50 2.40 3.70 1.00
5.80 2.70 3.90 1.20
6.00 2.70 5.10 1.60
5.40 3.00 4.50 1.50
6.00 3.40 4.50 1.60
6.70 3.10 4.70 1.50
6.30 2.30 4.40 1.30
5.60 3.00 4.10 1.30
5.50 2.50 4.00 1.30
5.50 2.60 4.40 1.20
6.10 3.00 4.60 1.40
5.80 2.60 4.00 1.20
5.00 2.30 3.30 1.00
5.60 2.70 4.20 1.30
5.70 3.00 4.20 1.20
5.70 2.90 4.20 1.30
6.20 2.90 4.30 1.30
5.10 2.50 3.00 1.10
5.70 2.80 4.10 1.30
6.30 3.30 6.00 2.50
5.80 2.70 5.10 1.90
7.10 3.00 5.90 2.10
6.30 2.90 5.60 1.80
6.50 3.00 5.80 2.20
7.60 3.00 6.60 2.10
4.90 2.50 4.50 1.70
7.30 2.90 6.30 1.80
6.70 2.50 5.80 1.80
7.20 3.60 6.10 2.50
6.50 3.20 5.10 2.00
6.40 2.70 5.30 1.90
6.80 3.00 5.50 2.10
5.70 2.50 5.00 2.00
5.80 2.80 5.10 2.40
6.40 3.20 5.30 2.30
6.50 3.00 5.50 1.80
7.70 3.80 6.70 2.20
7.70 2.60 6.90 2.30
6.00 2.20 5.00 1.50
6.90 3.20 5.70 2.30
5.60 2.80 4.90 2.00
7.70 2.80 6.70 2.00
6.30 2.70 4.90 1.80
6.70 3.30 5.70 2.10
7.20 3.20 6.00 1.80
6.20 2.80 4.80 1.80
6.10 3.00 4.90 1.80
6.40 2.80 5.60 2.10
7.20 3.00 5.80 1.60
7.40 2.80 6.10 1.90
7.90 3.80 6.40 2.00
6.40 2.80 5.60 2.20
6.30 2.80 5.10 1.50
6.10 2.60 5.60 1.40
7.70 3.00 6.10 2.30
6.30 3.40 5.60 2.40
6.40 3.10 5.50 1.80
6.00 3.00 4.80 1.80
6.90 3.10 5.40 2.10
6.70 3.10 5.60 2.40
6.90 3.10 5.10 2.30
5.80 2.70 5.10 1.90
6.80 3.20 5.90 2.30
6.70 3.30 5.70 2.50
6.70 3.00 5.20 2.30
6.30 2.50 5.00 1.90
6.50 3.00 5.20 2.00
6.20 3.40 5.40 2.30
5.90 3.00 5.10 1.80


Source: blog.csdn.net/weixin_45074807/article/details/123399672