Data Mining: the k-means clustering algorithm implemented in MATLAB

Clustering is a very important algorithm in data compression and data classification.
It represents a set of points by a small number of clusters, chosen so that the total distance under a certain norm is minimal.
The mean of each class can be taken as the representative of that class.
The 2-norm is used to measure the distance between two vectors.
[Image: the formula for Q, with color-coded terms]
Yellow: the classification index sets;
Blue: the center vector of each class;
Green: the quantity Q that measures the final result, to be made as small as possible.
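The image with the formula did not survive, so here is a reconstruction based on the three legend entries above and on how the code below accumulates Q0. The symbol I_p for the index set of class p is notation assumed here, not taken from the original image; a_i, m_p, k and Q follow the text.

```latex
Q = \sum_{p=1}^{k} \sum_{i \in I_p} \lVert a_i - m_p \rVert_2^2,
\qquad
m_p = \frac{1}{\lvert I_p \rvert} \sum_{i \in I_p} a_i
```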
The algorithm is as follows:
[Image: statement of the algorithm]

  1. Initialize the classification index sets, compute the initial center vector of each set, compute the value of Q in this initial state, and set the iteration count t to 1;
  2. For each sample vector a_i, find its nearest center vector; if that center vector is m_p, put the sample vector a_i into class p;
  3. Compute the new value of Q;
  4. Check whether the gap between the old and new values of Q is within the error tolerance Tol. If not, recompute the center vectors from the new classes and repeat steps 2 to 4; otherwise, the algorithm ends.
    Note: the k-means algorithm converges quickly, but the result is not guaranteed to be the global optimum.
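Steps 2 and 3 above can be sketched on a toy example. The data here (four 1-D samples and two fixed centers) are hypothetical and chosen only so the result is easy to check by hand; the variable names labels, d and dmin are also introduced just for this illustration.

```matlab
% Hypothetical data: four 1-D samples stored as columns, and two centers
A = [0 1 10 11];      % m-by-n sample matrix (m = 1, n = 4)
M = [0 10];           % m-by-k matrix of center vectors (k = 2)
n = size(A, 2);
labels = zeros(1, n); % labels(i) = index p of the center m_p nearest to a_i
Q = 0;
for i = 1:n
    d = sum((M - A(:, i)).^2, 1);   % squared 2-norm distance to each center
    [dmin, labels(i)] = min(d);     % step 2: nearest center
    Q = Q + dmin;                   % step 3: accumulate Q
end
disp(labels)
disp(Q)
```

With these values the samples 0 and 1 go to class 1, the samples 10 and 11 go to class 2, and Q = 0 + 1 + 0 + 1 = 2.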
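function cla=kmeans(A,k)
% K-means clustering algorithm.
% A is the m-by-n sample matrix; its n column vectors are split into k classes.
% cla is a 1-by-k cell array; cla{p} holds the column indices assigned to class p.
% M is the m-by-k matrix of center vectors (one center per column).
% Note: this name shadows kmeans from the Statistics and Machine Learning Toolbox.
[m,n]=size(A);
% Initialize the index sets: classes 1..k-1 hold one sample each, class k holds the rest
M=zeros(m,k);
cla=cell(1,k);
for i=1:k-1
    cla{i}=i;
    M(:,i)=A(:,i);
end
cla{k}=k:n;
M(:,k)=mean(A(:,k:n),2);
% Initial value of Q (only class k contributes; the other samples coincide with their centers)
Q0=0;
for i=k:n
    Q0=Q0+norm(A(:,i)-M(:,k),2)^2;
end
while true
    % Empty the k index sets before reassigning the samples
    for j=1:k
        cla{j}=[];
    end
    Q=0;
    for i=1:n
        % Squared distance from sample a_i to every center (implicit expansion, R2016b or later)
        [min_value,min_cla]=min(sum((M-A(:,i)).^2,1));
        % a_i belongs to class min_cla at squared distance min_value, which is added to Q
        cla{min_cla}=[cla{min_cla},i]; % record sample a_i in its class
        Q=Q+min_value;
    end
    if abs(Q-Q0)<0.01
        return;
    else
        Q0=Q;
    end
    % Compute the center vectors for the next round
    for i=1:k
        if ~isempty(cla{i}) % keep the old center if a class ends up empty
            M(:,i)=mean(A(:,cla{i}),2);
        end
    end
end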

Since the sample matrix may be very large, I did not define and call any subroutines, because I suspected that would be very slow.
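If speed is the concern, one possible alternative (not what the function above does) is to compute all sample-to-center distances at once with matrix operations instead of looping over the samples. The sketch below relies on the identity ||a - m||^2 = ||a||^2 + ||m||^2 - 2 m'a; the data and the names D, dmin and labels are hypothetical, and whether this is actually faster depends on the matrix sizes.

```matlab
% Hypothetical data: four 2-D samples (columns of A) and two centers (columns of M)
A = [0 1 10 11;
     0 0  0  0];
M = [0 10;
     0  0];
% D(p,i) = ||a_i - m_p||^2, via ||a||^2 + ||m||^2 - 2*m'*a (needs implicit expansion)
D = sum(M.^2, 1).' + sum(A.^2, 1) - 2 * (M.' * A);
[dmin, labels] = min(D, [], 1);   % nearest center for every sample at once
Q = sum(dmin);
disp(labels)
disp(Q)
```

Here D is a k-by-n matrix, so its memory use grows with both the number of classes and the number of samples; for a very large sample matrix the per-sample loop may still be the safer choice.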
The recommended test matrix is shown below:
[Image: test matrix A]
Note: each column represents a point. From the data it is clear that, if the points are divided into three classes, the best partition should be the first 3, the middle 3 and the last 4; left 3 + middle 4 + right 3 is of course also possible. In short, each class should be a contiguous group.
Here is the output of calling the function above:
[Image: output for matrix A]
The classification is correct.
Next, shuffle the order of the columns and test again:
[Image: matrix AA]
Here the matrix AA is just the matrix A with its columns reordered.
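A minimal sketch of such a column shuffle, and of how to map a class found in AA back to column indices of A. The matrix below is hypothetical (it is not the matrix from the image), and perm, idxAA and idxA are names introduced only for this illustration.

```matlab
% Hypothetical 2-by-10 sample matrix with three visible groups of points
A = [1 2 3 11 12 13 21 22 23 24;
     1 1 1  1  1  1  1  1  1  1];
perm = [4 9 1 7 2 10 5 3 8 6];   % a fixed permutation of 1..10 (randperm(10) gives a random one)
AA = A(:, perm);                 % column j of AA is column perm(j) of A
% If a clustering of AA puts columns idxAA in one class, the same points in A are:
idxAA = [3 5 8];                 % the columns of AA that hold the points 1, 2, 3
idxA = sort(perm(idxAA));
disp(idxA)
```

Mapping the indices back through perm makes it possible to compare the classes found for AA with those found for A point by point.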
Results:
[Image: output for matrix AA]
The classification result for matrix A is left 3 + middle 3 + right 4;
the classification result for matrix AA is left 3 + middle 4 + right 3. Both are contiguous groupings, so the classification is fairly successful.
The positions mentioned above (left, middle, right) refer to the set of points in the figure below:
[Image: plot of the sample points]

Origin blog.csdn.net/qq_43448491/article/details/103055462