Data Mining - Clustering

In this section:

0: Common methods of data mining

1: What clustering is --- clustering is unsupervised learning

2: The difference between clustering and classification --- class labels are not predefined

3: What affects clustering results --- units of measurement, the clustering criterion, the distance measure

4: Types of cluster analysis --- by samples X or by features

5: General procedure of cluster analysis

6: A cluster analysis case

7: Principle of clustering --- samples are divided into clusters by distance and similarity

0: Common methods of data mining

First, what is clustering

  • Clustering, also called cluster analysis (and in some applications data partitioning), means dividing samples into groups so that differences between samples in the same group are as small as possible, while differences between samples in different groups are as large as possible.
  • The groups obtained by clustering are called clusters.
  • From early childhood, people learn to tell cats from dogs and animals from plants by continually refining subconscious clustering schemes.

"Things of a kind come together, and people fall into groups." Classifying things is the starting point for understanding them, and an important means by which people make sense of the world.

Cluster analysis is a form of unsupervised learning. It grew out of many research areas and is driven by many applications, such as:

  • In complex network analysis, people want to find communities whose members are closely connected to one another.
  • In image analysis, people want to divide an image into regions with similar properties.
  • In text processing, people want to find subsets of texts that share the same topic.

Under appropriate conditions, these and similar problems can all be treated as cluster analysis.
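
The definition at the start of this section (small differences within a group, large differences between groups) can be checked numerically. The following is a minimal sketch, assuming made-up 2-D samples that are already split into two groups and assuming Euclidean distance as the measure of difference; the names `mean_within` and `mean_between` are illustrative only.

```python
from itertools import combinations, product
from math import dist

# Hypothetical 2-D samples, already split into two groups (made-up data).
group_a = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
group_b = [(8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]

def mean_within(group):
    """Average distance over all pairs of samples inside one group."""
    pairs = list(combinations(group, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

def mean_between(g1, g2):
    """Average distance over all pairs made of one sample from each group."""
    pairs = list(product(g1, g2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

within = (mean_within(group_a) + mean_within(group_b)) / 2
between = mean_between(group_a, group_b)

# A good grouping has a small within-group value and a large between-group value.
print(within, between)
```

For this toy data the within-group average is far smaller than the between-group average, which is what a good clustering aims for.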

Second, the difference between clustering and classification

Unlike classification, clustering is unsupervised learning: the class labels are not defined in advance.

- The classes in clustering are not given in advance; they are formed from the data according to similarity and distance

- The number of clusters and their structure are not assumed beforehand
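
To illustrate the two points above, here is a minimal sketch of one possible distance-based grouping in which no labels and no cluster count are supplied. It assumes a simple single-linkage style rule (points within a chosen gap are merged); the function `threshold_clusters`, the parameter `max_gap`, and the data are all hypothetical and not taken from the original notes.

```python
from math import dist

def threshold_clusters(points, max_gap):
    """Merge points into the same cluster whenever they lie within max_gap
    of each other (merges chain transitively, single-linkage style)."""
    labels = list(range(len(points)))  # every point starts as its own cluster
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= max_gap and labels[i] != labels[j]:
                old, new = labels[j], labels[i]
                # Relabel the whole cluster of j so it joins the cluster of i.
                labels = [new if lab == old else lab for lab in labels]
    return labels

# Made-up, unlabeled samples.
points = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 5.1), (9.0, 0.0)]
labels = threshold_clusters(points, max_gap=1.0)
print(labels)            # [0, 0, 2, 2, 4]
print(len(set(labels)))  # 3
```

The number of clusters (three here) falls out of the data and the distance threshold rather than being fixed in advance.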

Uses of cluster analysis:

  • It can be used as a standalone data analysis tool
  • It can serve as a preprocessing step for other methods

The purpose of clustering methods is to find, in the data:

[1] latent "natural" grouping structures

[2] relationships of interest

Third, what affects clustering results

1: Units of measurement (in general we need to standardize the data)

2: The clustering criterion

3: The distance measure
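
Point 1 above can be made concrete. The following is a minimal sketch, assuming made-up samples with two features on very different scales and assuming z-score standardization (one common choice; the notes do not say which method is meant). The helper `standardize` is illustrative only.

```python
from math import dist
from statistics import mean, pstdev

# Hypothetical samples: (height in metres, income in currency units).
a, b, c, d = (1.60, 30000.0), (1.95, 30200.0), (1.61, 31000.0), (1.80, 40000.0)
samples = [a, b, c, d]

def standardize(rows):
    """Z-score each column: subtract the column mean, divide by its
    population standard deviation (assumes no column is constant)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]

za, zb, zc, zd = standardize(samples)

# Raw data: income dominates, so a looks closer to b despite the height gap.
print(dist(a, b) < dist(a, c))      # True
# Standardized data: both features count, so a is now closer to c.
print(dist(za, zc) < dist(za, zb))  # True
```

The nearest neighbour of the first sample changes once the units are removed, which is why the same data can cluster differently before and after standardization.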

 

Fourth, types of cluster analysis

Cluster analysis can group the samples X themselves, or group the n features of X

Fifth, general steps of cluster analysis

Sixth, a cluster analysis case

Seventh, the principle of clustering

Principle: distance and similarity

Distance measures:

  • Euclidean distance
  • Manhattan distance
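
The two distance measures listed above can be sketched in a few lines. This is a minimal illustration using their standard definitions; the function names and the sample points are chosen here for the example.

```python
from math import sqrt

def euclidean(x, y):
    """Straight-line distance: square root of the summed squared differences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """City-block distance: sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

p, q = (1.0, 2.0), (4.0, 6.0)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
```

For the same pair of points the Manhattan distance is never smaller than the Euclidean distance, since it follows the axes instead of the straight line.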

Similarity:

  • Binary similarity
  • Vector similarity
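
The notes do not say which specific similarity measures are meant, so the following sketch is an assumption: it uses the simple matching coefficient and the Jaccard coefficient as examples of similarity between binary (0/1) vectors, and cosine similarity as an example of similarity between real-valued vectors. The function names and data are illustrative only.

```python
from math import sqrt

def simple_matching(x, y):
    """Share of positions where two 0/1 vectors agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

def jaccard(x, y):
    """Positions where both are 1, divided by positions where at least one is 1.
    Returning 1.0 for two all-zero vectors is a convention chosen here."""
    both = sum(a == 1 and b == 1 for a, b in zip(x, y))
    either = sum(a == 1 or b == 1 for a, b in zip(x, y))
    return both / either if either else 1.0

def cosine(x, y):
    """Cosine of the angle between two vectors (assumes neither is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

u = [1, 0, 1, 1, 0]
v = [1, 1, 0, 1, 0]
print(simple_matching(u, v))            # 0.6
print(jaccard(u, v))                    # 0.5
print(cosine([1.0, 0.0], [0.0, 1.0]))   # 0.0
```

Simple matching counts shared 0s as agreement while Jaccard ignores them, so the two can differ noticeably on sparse binary data.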


Origin www.cnblogs.com/hero799/p/12080173.html