Unsupervised learning: using KMeans

When the data we have is unlabeled, we may still want an algorithm to discover structure in it. In that case, KMeans (k-means clustering) can be used to group the data points into K clusters.
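To make the idea concrete, here is a minimal NumPy sketch of the standard k-means procedure (Lloyd's algorithm). It is only an illustration: the function name `lloyd_kmeans`, the random choice of starting centers, and the sample points are assumptions made for this example, and it is not how scikit-learn implements KMeans internally.

```python
import numpy as np

def lloyd_kmeans(X, k, max_iter=300, seed=0):
    """Plain k-means (Lloyd's algorithm); a teaching sketch, not scikit-learn's code."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random (an assumption of this sketch)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its points
        # (a center that lost all its points stays where it is)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving, so the assignment is stable
        centers = new_centers
    return labels, centers

X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]], dtype=float)
labels, centers = lloyd_kmeans(X, k=2)
print(labels)
print(centers)
```

Depending on the starting centers, a single run like this can settle on different groupings of the same points, which is why the initialization matters.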

Documentation:

http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

 class sklearn.cluster.KMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=300, tol=0.0001, precompute_distances='auto', verbose=0, random_state=None, copy_x=True, n_jobs=1, algorithm='auto')

Of the parameters above, three deserve particular attention:

n_clusters=8: the number of clusters to form. The default is 8, but it usually needs to be set to suit the data.
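One common way to pick n_clusters is to fit the model for several values of k and compare the inertia (the sum of squared distances from each point to its assigned center), looking for the point where adding clusters stops helping much. The sketch below assumes the same six sample points used later in this post; the range of k values is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

# Sample points (assumed for illustration): two groups, at x = 1 and x = 4
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

inertias = {}
for k in range(1, 5):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the within-cluster sum of squared distances for this fit
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(f"k={k}: inertia={value:.2f}")
```

Inertia keeps shrinking as k grows, so the value itself is not a score to minimize; what matters is where the decrease levels off.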

n_init=10: the number of times the algorithm is run with different initial cluster centers. The starting positions of the centers affect the final clustering, so the algorithm is run 10 times and the run with the lowest inertia (the sum of squared distances from each point to its assigned center) is kept. If you think the data is hard to cluster, you can increase this value.
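The effect of n_init can be imitated by hand: run KMeans once per seed with a single random initialization, then keep the lowest inertia. This is a rough sketch under assumptions (the sample points, the seeds 0 to 9, and init="random" are chosen only for illustration).

```python
import numpy as np
from sklearn.cluster import KMeans

# Sample points (assumed for illustration)
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

# Ten independent runs, each with a single random initialization
single_runs = [
    KMeans(n_clusters=2, init="random", n_init=1, random_state=seed).fit(X).inertia_
    for seed in range(10)
]
best_by_hand = min(single_runs)  # the run we would keep

# With n_init=10, KMeans performs this kind of selection internally
auto = KMeans(n_clusters=2, init="random", n_init=10, random_state=0).fit(X)

print("inertia of each single run:", single_runs)
print("best single run:", best_by_hand)
print("n_init=10 result:", auto.inertia_)
```

If the single runs report different inertia values, some of them stopped at a worse local solution, which is the situation a larger n_init is meant to guard against.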

max_iter=300: the maximum number of iterations for a single run. Each iteration reassigns the points to their nearest cluster center and then moves each center to the mean of its points. The algorithm usually converges before reaching this limit, so in most cases the default does not need to be changed.
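To check whether max_iter was actually a limiting factor, you can inspect the fitted model's n_iter_ attribute, which reports how many iterations the kept run used. A small sketch, again assuming the six sample points from this post:

```python
import numpy as np
from sklearn.cluster import KMeans

# Sample points (assumed for illustration)
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])

km = KMeans(n_clusters=2, n_init=10, max_iter=300, random_state=0).fit(X)

# n_iter_ is the number of iterations the selected run needed
print("iterations used:", km.n_iter_, "of at most", km.max_iter)
```

If n_iter_ equals max_iter, the run was cut off before converging and raising the limit may be worth trying.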

 

from sklearn.cluster import KMeans
import numpy as np

# Six 2-D points forming two groups, one at x = 1 and one at x = 4
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)

kmeans.labels_  # label of each training point (which group is 0 or 1 is arbitrary)
# array([0, 0, 0, 1, 1, 1], dtype=int32)

kmeans.predict([[0, 0], [4, 4]])  # nearest cluster for each new point
# array([0, 1], dtype=int32)

kmeans.cluster_centers_  # coordinates of the cluster centers
# array([[1., 2.],
#        [4., 2.]])

 
