Mahout: Fuzzy k-means clustering

As the name says, the fuzzy k-means clustering algorithm does a fuzzy form of k-means clustering. Instead of the exclusive clustering in k-means, fuzzy k-means tries to generate overlapping clusters from the data set. In the academic community, it’s also known as the fuzzy c-means algorithm. You can think of it as an extension of k-means.

K-means tries to find hard clusters (where each point belongs to exactly one cluster), whereas fuzzy k-means discovers soft clusters. In a soft cluster, any point can belong to more than one cluster, with a certain affinity value towards each. This affinity depends on the distance from the point to the centroid of the cluster: the closer the centroid, the higher the affinity. Like k-means, fuzzy k-means works on objects that can be represented in an n-dimensional vector space and for which a distance measure is defined.

mahout fkmeans \
  -i mahout/reuters-vectors/tfidf-vectors/ \
  -c mahout/reuters-fkmeans-centroids \
  -o mahout/reuters-fkmeans-clusters \
  -cd 1.0 -k 21 -m 2 -ow -x 10 \
  -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure

Fuzzy k-means has a parameter, m, called the fuzziness factor (the -m 2 option in the command above). Like k-means, fuzzy k-means loops over the data set, but instead of assigning each vector to its nearest centroid, it calculates the degree of association of the vector to each of the clusters.

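Suppose for a vector, V, that d1, d2, ... dk are the distances to each of the k cluster centroids. The degree of association (u1) of vector (V) to the first cluster (C1) is calculated as

$$u_1 = \frac{1}{\left(\frac{d_1}{d_1}\right)^{\frac{2}{m-1}} + \left(\frac{d_1}{d_2}\right)^{\frac{2}{m-1}} + \cdots + \left(\frac{d_1}{d_k}\right)^{\frac{2}{m-1}}}$$

The degrees of association to the other clusters are obtained the same way, by replacing d1 in the numerators with the distance to the cluster in question.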
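To make the degree of association concrete, here is a minimal Python sketch. It assumes the standard fuzzy c-means membership rule, u_i = 1 / sum over j of (d_i / d_j)^(2 / (m - 1)), and it is not Mahout's own implementation. The function name memberships, the sample distances, and the handling of a zero distance are all assumptions made for illustration.

```python
def memberships(distances, m=2.0):
    """Return the degree of association of one vector to each of k clusters.

    distances: d1..dk, the distances from the vector to each centroid.
    m: fuzziness factor, which must be greater than 1.
    """
    if m <= 1:
        raise ValueError("fuzziness factor m must be greater than 1")

    # Assumption for this sketch: a vector lying exactly on one or more
    # centroids is shared equally among those clusters and no others.
    zeros = [i for i, d in enumerate(distances) if d == 0]
    if zeros:
        share = 1.0 / len(zeros)
        return [share if d == 0 else 0.0 for d in distances]

    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1))
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
        for d_i in distances
    ]


# Hypothetical distances from one vector to three centroids.
u = memberships([1.0, 2.0, 4.0], m=2.0)
print([round(x, 4) for x in u])  # [0.7619, 0.1905, 0.0476]
```

The memberships always add up to 1, and the nearest centroid receives the largest share, which is what distinguishes a soft assignment from the single nearest-centroid choice in k-means.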



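The fuzziness factor m must be greater than 1. As m approaches 1, the clusters become nearly as hard as those of k-means; if m increases, the fuzziness of the algorithm increases, and you’ll begin to see more and more overlap. The fuzzy k-means algorithm also converges better and faster than the standard k-means algorithm.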
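The effect of m on the overlap can be checked with a small Python sketch. It again assumes the standard fuzzy c-means membership rule, u_i = 1 / sum over j of (d_i / d_j)^(2 / (m - 1)); the function name, the distances, and the three values of m are hypothetical and chosen only for illustration.

```python
def memberships(distances, m):
    """Degree of association to each cluster, for nonzero distances and m > 1."""
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
        for d_i in distances
    ]


# Hypothetical distances from one vector to three centroids.
distances = [1.0, 2.0, 4.0]

# Small m gives an almost hard assignment; larger m spreads the
# membership more evenly across the clusters.
for m in (1.1, 2.0, 5.0):
    u = memberships(distances, m)
    print(m, [round(x, 3) for x in u])
```

With m close to 1, almost all of the membership goes to the nearest centroid. As m grows, the largest membership shrinks and the others grow, which is the increasing overlap described above.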

Reposted from ylzhj02.iteye.com/blog/2078695