Spectral Clustering

Basic knowledge:

Degree matrix, similarity matrix, and adjacency matrix.

For an undirected weighted graph G = <V, E>, the weight w_ij of each edge is the similarity between the two vertices it connects; these weights define the similarity matrix W. We can additionally define the degree matrix D and the adjacency matrix A, which give the Laplacian matrix L = D - A.
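As a small illustration of these definitions, the degree matrix and Laplacian can be built from a weight matrix; the 4-node graph below is a made-up example, not one from the post:

```python
import numpy as np

# A hypothetical 4-node undirected weighted graph: W[i, j] is the
# similarity (edge weight) between vertices i and j, so W is symmetric.
W = np.array([
    [0.0, 0.8, 0.1, 0.0],
    [0.8, 0.0, 0.2, 0.1],
    [0.1, 0.2, 0.0, 0.9],
    [0.0, 0.1, 0.9, 0.0],
])

# Degree matrix D: diagonal matrix whose entries are the row sums of W.
D = np.diag(W.sum(axis=1))

# Graph Laplacian L = D - W (unnormalized form).
L = D - W

print(np.diag(D))  # [0.9 1.1 1.2 1. ]
```

A useful sanity check: each row of L sums to zero, since the degree on the diagonal equals the sum of the weights subtracted off it.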

Building the adjacency matrix from a distance metric
  The adjacency matrix reflects, to some extent, the similarity between the nodes of the graph; a common convention sets its nonzero entries to 1. Spectral clustering usually computes the adjacency matrix with KNN: for each node xi, find the k points closest to it according to the similarity (or distance) matrix, forming the neighborhood Ni of xi, and then construct the adjacency matrix by one of the rules described below.


Methodology: 

1. Similarity matrix S: a distance metric on the sample points yields the similarity matrix S, from which the adjacency matrix W is obtained.

There are three ways to construct the adjacency matrix W: the ε-neighborhood method, the K-nearest-neighbor method, and the fully connected method.
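The ε-neighborhood and fully connected methods might be sketched as follows (the K-nearest-neighbor method was shown earlier). The constant edge weight eps for the ε-method and the Gaussian (RBF) kernel for the fully connected graph follow common conventions but are assumptions here:

```python
import numpy as np

def epsilon_adjacency(S, eps):
    """ε-neighborhood: connect pairs whose similarity is at least eps.

    A common convention gives every kept edge the same weight eps
    (some texts use 1 instead), discarding finer similarity structure.
    """
    W = np.where(S >= eps, eps, 0.0)
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def full_adjacency(X, sigma=1.0):
    """Fully connected graph: Gaussian kernel similarity for all pairs."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W
```

The fully connected method keeps all pairwise information but yields a dense W; the ε-method gives a sparse, uniform-weight graph that loses the magnitude of similarities.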

2. Laplacian matrix:

The Laplacian matrix is L = D - W.

 Input: sample set D = (x1, x2, ..., xn), a method for generating the similarity matrix, the reduced dimension k1, a clustering method, and the number of clusters k2.

    Output: cluster partition C = (c1, c2, ..., ck2). 

    1) Construct the similarity matrix S from the samples using the input similarity-generation method.

    2) Construct the adjacency matrix W from S, and construct the degree matrix D.

    3) Compute the Laplacian matrix L.

    4) Construct the normalized Laplacian matrix D^(-1/2) L D^(-1/2).

    5) Compute the eigenvectors f corresponding to the k1 smallest eigenvalues of D^(-1/2) L D^(-1/2).

    6) Normalize the rows of the matrix whose columns are the eigenvectors f, forming the n × k1 feature matrix F.

    7) Treat each row of F as a k1-dimensional sample (n samples in total) and cluster these rows with the input clustering method into k2 clusters.

    8) Output the cluster partition C = (c1, c2, ..., ck2).
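The eight steps above can be sketched end to end. This is a minimal illustration, not the post author's implementation: it assumes a fully connected Gaussian similarity graph for steps 1)–2) and a tiny Lloyd k-means with farthest-first initialization for step 7, so the example stays deterministic and self-contained:

```python
import numpy as np

def spectral_clustering(X, k1, k2, sigma=1.0, n_iter=50):
    """Sketch of steps 1)-8) using the normalized Laplacian."""
    n = X.shape[0]
    # 1)-2) Fully connected Gaussian similarity graph W and degrees d.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d = W.sum(axis=1)
    # 3)-4) Normalized Laplacian D^(-1/2) (D - W) D^(-1/2).
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = D_inv_sqrt @ (np.diag(d) - W) @ D_inv_sqrt
    # 5) Eigenvectors of the k1 smallest eigenvalues (eigh sorts ascending).
    _, eigvecs = np.linalg.eigh(L_sym)
    F = eigvecs[:, :k1]
    # 6) Row-normalize the n x k1 feature matrix F.
    F = F / np.linalg.norm(F, axis=1, keepdims=True)
    # 7) k-means on the rows of F: farthest-first seeding, then Lloyd steps.
    centers = [F[0]]
    for _ in range(k2 - 1):
        dist = np.min(((F[:, None] - np.array(centers)[None]) ** 2).sum(-1), axis=1)
        centers.append(F[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((F[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for c in range(k2):
            if np.any(labels == c):
                centers[c] = F[labels == c].mean(axis=0)
    # 8) Return the cluster assignment of each sample.
    return labels
```

For real use, `sklearn.cluster.SpectralClustering` implements this pipeline with more robust graph construction and sparse eigensolvers.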

The main advantages of the spectral clustering algorithm are:

    1) Spectral clustering needs only the similarity matrix between the data points, so it is effective for clustering sparse data. This is hard for traditional clustering algorithms such as K-Means.

    2) Because it uses dimensionality reduction, its complexity when clustering high-dimensional data is better than that of traditional clustering algorithms.

    The main disadvantages of the spectral clustering algorithm are:

    1) If the final cluster dimension is very high, the dimensionality reduction may not be large enough, and both the running speed and the final clustering quality of spectral clustering suffer.

    2) The clustering result depends on the similarity matrix; different similarity matrices may yield very different final clusterings.


Origin: www.cnblogs.com/dulun/p/12070171.html