2.3.2 inter-class (between classes) Measurement distance
. A nearest neighbor method : indicate that two classes, the minimum distance as the distance between the two types.
For example:
. Method B farthest distance : represents two classes, the maximum distance as the distance between the two types.
For example:
C intermediate distance method. : The class between the first and the second p q Class class into a new class of L, k classes and another class of this new class distance is the midpoint of the original becomes the new two pq distance calculation points (sample points without regard to how many there are in each class).
center of gravity d distance method. : intermediate distance above method considering the number of samples in each class, i.e. the distance between the center of gravity of the two classes.
For example:
. E Average distance method : the average distance between any two points.
For example:
F intra-class and the sum of squares. : Suitable slug distribution.
2.3.3 Clustering criterion function :
four basic directions clustering application :
A). Reduce data:
Many times, when N is a large amount of data, make the data processing very laborious. Thus use cluster analysis groups the original data into clusters m (n << N) can be determined, each class can be treated as an entity, from this perspective, the data is compressed.
. B) Hypothesis generation:
In order to derive some hypotheses nature of the data, the data set clustering analysis. As used herein, the clustering established clustering method, and then verify these hypotheses with other data sets.
. c) Hypothesis test:
Using cluster analysis to verify the validity of the specified hypothesis.
For example: Consider the hypothesis that "big companies to invest abroad." To verify the validity of this hypothesis, the need for large companies and companies representative by size, overseas activity, ability to successfully complete the project cluster analysis.
d). packet-based prediction:
the existing data clustering analysis, characteristic patterns are formed, and expressed by clustering feature, then, for an unknown pattern. Foregoing can be used to determine what type cluster.
For example: Consider chanting and Industry virus-infected patient data set, first by cluster analysis to classify them, and then determine his suitability for the new cluster of patients, to determine their condition.
Distance criterion function :
determining a classification result of the quality criteria: Class within a small distance, the large distance between the classes.
. A) based distance criteria :
In order to achieve higher classification results: that the minimum value is taken Jw, Jw is tend min, this method is also called squared error criterion.
b) inter-class distance criterion :
C). Based on the criterion function of the distance between the distance-based categories :
target: to make the distance small class clustering result, a large distance between the classes, the classification result is a higher degree of differentiation. For this purpose can be constructed at the same time reflect the criterion function of the distance between the distance and class within the class.