Comparison of Common Unsupervised Learning Methods

Gaussian Mixture Model Clustering

Advantages

  • Soft clustering (a sample can have partial membership in multiple clusters)
  • Cluster shape flexibility

Disadvantages

  • Sensitive to initialisation values
  • May converge to a local optimum
  • Slow convergence rate

Implementation

from sklearn import datasets, mixture

# Load the iris dataset (first 10 samples, as in the original example)
X = datasets.load_iris().data[:10]

# Specify the parameters for the clustering
gmm = mixture.GaussianMixture(n_components=3)
gmm.fit(X)                    # fit the mixture model with EM
clustering = gmm.predict(X)   # hard cluster assignment for each sample
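
Because the disadvantages above include sensitivity to initialisation and the possibility of converging to a local optimum, a common mitigation is to rerun EM from several starting points and keep the best fit. A minimal sketch, assuming the same iris subset as above; the n_init and random_state values are illustrative choices, not part of the original example:

from sklearn import datasets, mixture

X = datasets.load_iris().data[:10]

# Run EM from several random initialisations and keep the best solution
# (highest lower bound) to reduce the risk of a poor local optimum.
gmm = mixture.GaussianMixture(
    n_components=3,
    n_init=5,         # number of random restarts (illustrative value)
    random_state=0,   # fixed seed so the result is reproducible
)
gmm.fit(X)

# Soft clustering: per-sample membership probability for each of the 3 components
memberships = gmm.predict_proba(X)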

Cluster analysis process

  1. Data
  2. Feature selection/extraction
  3. Clustering algorithm selection & tuning
  4. Clustering validation
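
A minimal end-to-end sketch of these four steps on the iris data; the use of StandardScaler for step 2 and of the silhouette score for step 4 are illustrative assumptions, not prescribed by the original post:

from sklearn import datasets, mixture, metrics
from sklearn.preprocessing import StandardScaler

# 1. Data
X = datasets.load_iris().data

# 2. Feature selection/extraction (here: simple standardisation of the features)
X_scaled = StandardScaler().fit_transform(X)

# 3. Clustering algorithm selection & tuning
gmm = mixture.GaussianMixture(n_components=3, random_state=0)
labels = gmm.fit_predict(X_scaled)

# 4. Clustering validation (internal index; a higher silhouette is better)
print(metrics.silhouette_score(X_scaled, labels))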

Categories of validation indices

  • External indices: scoring methods used when the data was originally labelled; they compare the clustering against those ground-truth labels.
  • Internal indices: most of the time the data used for unsupervised learning is not labelled, so these measure the fit between the data and the clustering structure using only the data itself (both kinds of index are sketched right after this list).
  • Relative indices: indicate which of two clustering structures is better in some sense; essentially, any internal index can serve as a relative index.
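
As a minimal sketch of the first two categories, the snippet below scores one Gaussian-mixture clustering of the full iris data with an external index (adjusted Rand, which needs the ground-truth labels) and with an internal index (silhouette, which uses only the data); the choice of these two particular indices is illustrative:

from sklearn import datasets, mixture, metrics

# Full iris dataset: features plus the known species labels
iris = datasets.load_iris()
X, y_true = iris.data, iris.target

# Cluster with a Gaussian mixture, as in the implementation above
labels = mixture.GaussianMixture(n_components=3, random_state=0).fit_predict(X)

# External index: compares the clustering against the ground-truth labels
print("Adjusted Rand:", metrics.adjusted_rand_score(y_true, labels))

# Internal index: uses only the data and the cluster assignments
print("Silhouette:   ", metrics.silhouette_score(X, labels))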

Most validation indices are defined by combining compactness and separability:

  • Compactness: a measure of how close the elements of a cluster are to each other.
  • Separability: a measure of how far apart, or how distinct, the clusters are from each other.
  • The general assumption: clustering methods are expected to produce clusters whose elements are as similar to each other as possible while the clusters themselves are as distinct from each other as possible (see the sketch after this list).
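
As a sketch of indices built from these two quantities, the snippet below computes two internal indices that sklearn provides: the Calinski-Harabasz score (roughly separability over compactness, higher is better) and the Davies-Bouldin score (within-cluster spread relative to between-cluster distance, lower is better). Picking these two indices is an illustrative choice, not something the original post specifies:

from sklearn import datasets, mixture, metrics

X = datasets.load_iris().data
labels = mixture.GaussianMixture(n_components=3, random_state=0).fit_predict(X)

# Calinski-Harabasz: between-cluster dispersion over within-cluster dispersion
# (separability over compactness) -- higher is better
print(metrics.calinski_harabasz_score(X, labels))

# Davies-Bouldin: for each cluster, the ratio of within-cluster spread to the
# distance to its closest cluster, averaged over clusters -- lower is better
print(metrics.davies_bouldin_score(X, labels))
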
Index                  Range     Available in sklearn
Adjusted Rand Score    [-1, 1]   Yes
Fowlkes-Mallows        [0, 1]    Yes
NMI measure            [0, 1]    Yes
Jaccard                [0, 1]    Yes
F-measure              [0, 1]    Yes
Purity                 [0, 1]    No
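
A minimal sketch of how the sklearn-available external indices from the table can be computed, assuming ground-truth labels are available (all three functions live in sklearn.metrics):

from sklearn import datasets, mixture, metrics

iris = datasets.load_iris()
X, y_true = iris.data, iris.target
labels = mixture.GaussianMixture(n_components=3, random_state=0).fit_predict(X)

# External indices from the table that sklearn exposes directly
print("Adjusted Rand Score:", metrics.adjusted_rand_score(y_true, labels))
print("Fowlkes-Mallows:    ", metrics.fowlkes_mallows_score(y_true, labels))
print("NMI measure:        ", metrics.normalized_mutual_info_score(y_true, labels))
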
Once the clustering has been validated, the cluster analysis process continues with:

  5. Results interpretation
  6. Knowledge

Reposted from blog.csdn.net/weixin_42303053/article/details/85764498