A detailed summary of clustering methods, with code

FCM clustering

Fuzzy clustering is an important research branch in fields such as knowledge discovery and pattern recognition. As the scope of research expands, both scientific work and practical applications place higher demands on clustering results from many angles. Fuzzy C-means (FCM) is currently a popular clustering method: it measures the geometric closeness of data points in Euclidean space, assigns the data to different clusters accordingly, and then determines the distance between those clusters. The FCM algorithm laid the foundation, in theory and in application, for other fuzzy clustering methods, and it remains the most widely used. In essence, however, FCM is a local search optimization algorithm: if the initial values are chosen poorly, it converges to a local minimum, and this shortcoming limits its use.
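The two alternating update equations behind FCM (memberships from inverse distances, centers as membership-weighted means) can be sketched in a few lines. This is a minimal pure-Python illustration with fuzzifier m = 2 and synthetic data, not the article's MATLAB code:

```python
import math
import random

def fcm_step(points, centers, m=2.0):
    """One FCM iteration: update memberships, then cluster centers."""
    c, eps = len(centers), 1e-12
    # Membership of each point in each cluster (inverse-distance weighting);
    # each row sums to 1 by construction
    u = []
    for p in points:
        d = [max(math.dist(p, ctr), eps) for ctr in centers]
        u.append([1.0 / sum((d[i] / d[j]) ** (2 / (m - 1)) for j in range(c))
                  for i in range(c)])
    # Centers move to the membership^m-weighted mean of all points
    new_centers = []
    for i in range(c):
        w = [row[i] ** m for row in u]
        s = sum(w)
        new_centers.append(tuple(sum(wk * p[dim] for wk, p in zip(w, points)) / s
                                 for dim in range(len(points[0]))))
    return u, new_centers

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(50)]
centers = [(0.2, 0.2), (0.8, 0.8)]   # arbitrary initial guesses
for _ in range(20):
    u, centers = fcm_step(pts, centers)
print(centers)
```

As the text notes, the result depends on those initial centers: a bad starting guess can leave the iteration in a local minimum.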

SAGA-based clustering

We combine the simulated annealing algorithm with a genetic algorithm (SAGA) for cluster analysis. Because the two algorithms complement each other, the combination effectively overcomes the premature convergence of a traditional genetic algorithm. The genetic encoding and fitness function are designed around the specific clustering problem, making the algorithm more effective and letting it converge to the global optimum more quickly.
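The key ingredient SAGA borrows from simulated annealing is the Metropolis acceptance rule together with a geometric cooling schedule (the same t ← q·t schedule the MATLAB script later in this post uses). A minimal sketch, assuming a minimization objective:

```python
import math
import random

def sa_accept(old_cost, new_cost, t):
    """Metropolis rule for minimization: always keep an improvement;
    accept a worse solution with probability exp(-delta/t)."""
    if new_cost < old_cost:
        return True
    return random.random() < math.exp(-(new_cost - old_cost) / t)

# Geometric cooling, as in the script: t <- q * t until t falls below tend
t, q, t_end = 100.0, 0.8, 1.0
steps = 0
while t > t_end:
    t *= q
    steps += 1
print(steps)  # cooling steps needed to go from t0=100 down to 1
```

At high temperature almost any replacement is accepted (broad exploration); as t shrinks the rule degenerates into plain greedy selection, which is what lets the hybrid escape the premature convergence of a pure GA.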

Fast clustering

The basic idea is: when the sample size is large, select a batch of condensation points (or give an initial classification) and let the samples agglomerate toward the condensation points according to some rule; the condensation points are then repeatedly revised, iterating until the classification is reasonably good or the iteration stabilizes.
The number of classes k can be specified in advance or determined during clustering. A simple way to choose the initial condensation points (or the initial classification) is to pick (or split) the samples at random.
There are many dynamic clustering methods; this section discusses only one popular variant, the k-means method, an algorithm proposed and named by MacQueen (1967).
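The condensation-point procedure above is exactly k-means: pick initial points, assign each sample to its nearest condensation point, move each point to the mean of its cluster, and repeat until stable. A minimal pure-Python sketch (the random initialization is one simple choice among several):

```python
import math
import random

def kmeans(points, k, iters=50, seed=1):
    """Plain k-means: random initial condensation points, then
    alternate assign/update until the centers stop moving."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assign each sample to its nearest condensation point
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[i].append(p)
        # Move each condensation point to the mean of its cluster
        new_centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:   # iteration has stabilized
            break
        centers = new_centers
    return centers, clusters

random.seed(2)
pts = [(random.random(), random.random()) for _ in range(100)]
centers, clusters = kmeans(pts, 4)
print(len(centers), [len(cl) for cl in clusters])
```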

Hierarchical clustering

The hierarchical cluster method (also called systematic clustering) is a method of cluster analysis. It starts with each sample as its own class, first merges the closest samples (the pair with the smallest distance) into small classes, then merges those small classes according to the distance between classes, and continues until all sub-classes are aggregated into one large class.
Its steps, taking cluster analysis of n samples as an example, are as follows:
1) define a distance in the space whose dimension is the number of variables (indicators);
2) calculate the distances between the n samples;
3) treat each sample as its own class, and merge the two closest classes into a new class according to the calculated distances;
4) calculate the distance between the new class and the other classes, and again merge the two closest classes into a new class;
5) loop over the above process until the number of classes is 1;
6) draw the cluster diagram for each stage and decide the number of classes.
For variable cluster analysis, simply replace the distance with a similarity coefficient and cluster variables with larger similarity coefficients together.
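The steps above can be sketched directly. This toy version uses single linkage (nearest pair of samples) as the between-class distance, which is just one of several common choices, and stops once the desired number of classes remains rather than merging all the way to 1:

```python
import math

def hierarchical(points, k):
    """Agglomerative clustering: start with each sample as its own
    class, repeatedly merge the two nearest classes until k remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single-linkage distance between class i and class j
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)   # merge the two closest classes
    return clusters

pts = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0)]
groups = hierarchical(pts, 3)
print(sorted(len(g) for g in groups))  # → [1, 2, 2]
```

Recording the merge distance at each step gives exactly the cluster diagram (dendrogram) mentioned in step 6.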

Two-step clustering

The two-step clustering algorithm is an improved version of the BIRCH hierarchical clustering algorithm. It can cluster data sets with mixed attribute types, and it adds a mechanism to automatically determine the optimal number of clusters, which makes the method more practical. Building on the literature [1] and the "IBM SPSS Modeler 15 Algorithms Guide", this article adds my own understanding and describes the process and details of the two-step clustering algorithm more fully.
As the name suggests, the algorithm has two stages:

1) Pre-clustering stage. Using the CF-tree growth idea from the BIRCH algorithm, the data points are read one by one; as the CF tree grows, points in dense regions are pre-clustered into many small sub-clusters.

2) Clustering stage. Taking the sub-clusters produced by the pre-clustering stage as objects, the agglomerative hierarchical clustering method merges the sub-clusters one by one until the desired number of clusters is reached.
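The two stages can be sketched as follows. For illustration only, a coarse grid stands in for BIRCH's CF tree in stage 1 (an assumption; the real algorithm grows a CF tree with threshold tests), and stage 2 agglomerates the sub-cluster centroids:

```python
import math
import random
from collections import defaultdict

def two_step(points, cell=0.25, k=4):
    """Two-stage sketch: (1) pre-cluster dense regions into sub-clusters
    (a coarse grid stands in for BIRCH's CF tree here); (2) agglomeratively
    merge sub-cluster centroids until k clusters remain."""
    # Stage 1: bucket points into grid cells -> sub-cluster centroids
    cells = defaultdict(list)
    for p in points:
        cells[(int(p[0] / cell), int(p[1] / cell))].append(p)
    subs = [tuple(sum(c) / len(cl) for c in zip(*cl))
            for cl in cells.values()]
    # Stage 2: repeatedly merge the two closest groups of centroids
    groups = [[s] for s in subs]
    while len(groups) > k:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = min(math.dist(a, b)
                        for a in groups[i] for b in groups[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        groups[best[1]] += groups.pop(best[2])
    return subs, groups

random.seed(3)
pts = [(random.random(), random.random()) for _ in range(400)]
subs, groups = two_step(pts)
print(len(subs), len(groups))
```

The point of the first stage is scale: the expensive agglomerative step runs over a few sub-clusters rather than all the raw points.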

Problem

Perform a reasonable cluster analysis on 400 randomly generated points on a two-dimensional plane, and finally obtain a suitable k value and the sets of points in the different clusters. Here we use MATLAB to generate the 400 random points; the command is as follows:

X = rand(400,2);

Next, we apply the five clustering methods introduced above to cluster the points in our data set; the results are shown below, with code attached.

Results and code

(Figure: fast clustering result)

(Figure: hierarchical clustering result)

(Figure: two-step clustering result)

(Figure: SAGA-based clustering result)

(Figure: plain FCM clustering result)
% SAGA main script
% some functions come from the Sheffield GA toolbox
clear
clc
x = xlsread('X.xlsx');
m = size(x,2);
lb = min(x);
ub = max(x);
% initialization
options = [3,20,1e-6];
cn = 4;
% SA parameters
q = 0.8;            % cooling coefficient
t0 = 100;           % initial temperature
tend = 1;           % stopping temperature
% GA parameters
sizepop = 10;
maxgen = 10;
nvar = m*cn;
preci = 10;
ggap = 0.9;
pc = 0.7;
pm = 0.01;
trace = zeros(nvar+1,maxgen);
field = [rep([preci],[1,nvar]);rep([lb;ub],[1,cn]);rep([1;0;1;1],[1,nvar])];
chrom = crtbp(sizepop,nvar*preci);
v = bs2rv(chrom,field);
objv = ObjFun(x,cn,v,options);
t = t0;
while t > tend
    gen = 0;
    while gen < maxgen
        fitnv = ranking(objv);
        selch = select('sus',chrom,fitnv,ggap);
        selch = recombin('xovsp',selch,pc);
        selch = mut(selch,pm);
        v = bs2rv(selch,field);
        objvsel = ObjFun(x,cn,v,options);
        [newchrom,newobjv] = reins(chrom,selch,1,1,objv,objvsel);
        v = bs2rv(newchrom,field);
        % decide whether to replace the old individual
        for i = 1:sizepop
            if objv(i) > newobjv(i)
                objv(i) = newobjv(i);
                chrom(i,:) = newchrom(i,:);
            else
                % Metropolis rule: accept the worse solution with
                % probability exp(-(new - old)/t)
                p = rand;
                if p <= exp(-(newobjv(i) - objv(i))/t)
                    objv(i) = newobjv(i);
                    chrom(i,:) = newchrom(i,:);
                end
            end
        end
        gen = gen + 1;
        [trace(end,gen),index] = min(objv);
        trace(1:nvar,gen) = v(index,:);
        fprintf(1,' %d',gen);
    end
    t = t*q;
    fprintf(1,'\nTemperature: %1.3f\n',t);
end
[newobjv,center,u] = ObjFun(x,cn,[trace(1:nvar,end)]',options);
% inspect the clustering result
jb = newobjv;
u = u{1};
center = center{1};
figure
plot(x(:,1),x(:,2),'o')
hold on 
maxu = max(u);
index1 = find(u(1,:) == maxu);
index2 = find(u(2,:) == maxu);
index3 = find(u(3,:) == maxu);
% mark the first three classes with different colors; unmarked points form the fourth class
line(x(index1,1),x(index1,2),'linestyle','none','marker','*','color','g');
line(x(index2,1),x(index2,2),'linestyle','none','marker','*','color','r');
line(x(index3,1),x(index3,2),'linestyle','none','marker','*','color','b');
% plot the cluster centers
plot(center(:,1),center(:,2),'v')
hold off

function [unew,center,objfcn] = iteratefcm(x,u,cluster_n,b)
% one FCM iteration
mf = u.^b;
center = mf*x./((ones(size(x,2),1)*sum(mf'))');   % updated cluster centers
dist = distfcm(center,x);
objfcn = sum(sum((dist.^2).*mf));               % objective function value
% compute the new membership matrix u
temp = dist.^(-2/(b-1));
unew = temp./(ones(cluster_n,1)*sum(temp));

function u = initfcm(x,cluster_n,center,b)
% initialize the fuzzy membership matrix
dist = distfcm(center,x);
% compute the initial u
temp = dist.^(-2/(b-1));
u = temp./(ones(cluster_n,1)*sum(temp));

function [obj,center,u] = fcmfun(x,cluster_n,center,options)
% FCM main function
% x          sample data
% cluster_n  number of clusters
xn = size(x,1);
inn = size(x,2);
b = options(1);
maxiter = options(2);
minimpro = options(3);
objfcn = zeros(maxiter,1);
u = initfcm(x,cluster_n,center,b);
% main loop
for i = 1:maxiter
    [u,center,objfcn(i)] = iteratefcm(x,u,cluster_n,b);
    % check the termination condition
    if i > 1
        if abs(objfcn(i) - objfcn(i-1)) < minimpro
            break;
        end
    end
end
itern = i;
objfcn(itern+1:maxiter) = [];
obj = objfcn(end);

% clustering with fcm
clear
clc
X = xlsread('X.xlsx');
figure
plot(X(:,1),X(:,2),'o')
hold on
% parameter settings
options = [3 20 1e-6 0];
% number of clusters
cn = 4;
[center,U,obj_fcn] = fcm(X,cn,options);
jb = obj_fcn(end);
maxU = max(U);
index1 = find(U(1,:) == maxU);
index2 = find(U(2,:) == maxU);
index3 = find(U(3,:) == maxU);
% mark the first three classes; unmarked points form the fourth class
line(X(index1,1),X(index1,2),'linestyle','none','marker','*','color','g');
line(X(index2,1),X(index2,2),'linestyle','none','marker','*','color','r');
line(X(index3,1),X(index3,2),'linestyle','none','marker','*','color','b');
% plot the cluster centers
plot(center(:,1),center(:,2),'v')
hold off

Origin blog.csdn.net/wlfyok/article/details/108569815