K-means clustering: dividing 8 points into three clusters

  • The following 8 points are given:

$X_1:(2,10),\ X_2:(2,5),\ X_3:(8,4),\ X_4:(5,8),\ X_5:(7,5),\ X_6:(6,4),\ X_7:(1,2),\ X_8:(4,9)$

The initial centers are $X_1, X_4, X_7$. Use the k-means clustering algorithm to group the points into three clusters.

Solution:

First, the distance formula.

Distance formula:
Suppose point A has coordinates $(X_1,Y_1)$ and point B has coordinates $(X_2,Y_2)$. Then
Distance $= \sqrt{(X_1-X_2)^2+(Y_1-Y_2)^2}$
Note: this is the Euclidean distance.
Reference: Mathematics in Machine Learning - Distance Definition (1): Euclidean Distance
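
As a quick check of the formula, here is a minimal Python sketch. The function name `euclidean` is an illustrative choice and not part of the original write-up; it uses only the standard library.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two 2-D points given as (x, y) tuples."""
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)

# Distance between X_1 = (2, 10) and X_2 = (2, 5)
print(euclidean((2, 10), (2, 5)))  # 5.0
```

The same function applied to $X_1$ and $X_4$ gives $\sqrt{13} \approx 3.6$, which is the value used in the first table.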

  1. First, compute the distance from each point to $X_1, X_4, X_7$, as shown below. For example, the distance from $X_1$ to $X_1$ is 0, and the distance from $X_1$ to $X_2$ is $\sqrt{(2-2)^2+(10-5)^2}=5$.

| Point | $X_1(2,10)$ | $X_4(5,8)$ | $X_7(1,2)$ |
|---|---|---|---|
| $X_1(2,10)$ | 0 | 3.6 | 8.1 |
| $X_2(2,5)$ | 5.0 | 4.2 | 3.2 |
| $X_3(8,4)$ | 8.5 | 5.0 | 7.3 |
| $X_4(5,8)$ | 3.6 | 0 | 7.2 |
| $X_5(7,5)$ | 7.1 | 3.6 | 6.7 |
| $X_6(6,4)$ | 7.2 | 4.1 | 5.4 |
| $X_7(1,2)$ | 8.1 | 7.2 | 0 |
| $X_8(4,9)$ | 2.2 | 1.4 | 7.6 |

Note: take $X_8$ as an example. It is closest to $X_4$, so the two are placed in the same cluster. Distances here are rounded to one decimal place.
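
The assignment rule (each point joins its nearest center) can be sketched with NumPy as follows. The variable names `points`, `centers`, `dists`, and `labels` are illustrative assumptions, and clusters are numbered 0, 1, 2 in the order of the initial centers $X_1, X_4, X_7$.

```python
import numpy as np

points = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
                   [7, 5], [6, 4], [1, 2], [4, 9]], dtype=float)
centers = points[[0, 3, 6]]  # initial centers X_1, X_4, X_7

# dists[i, j] = Euclidean distance from point i to center j (via broadcasting)
dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)

# For each point, the index of its nearest center
labels = dists.argmin(axis=1)

print(np.round(dists, 1))
print(labels)  # [0 2 1 1 1 1 2 1]
```

Label 1 for $X_8$ (the last entry) means it is assigned to the cluster of $X_4$, matching the note.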

Center coordinates = (mean of the x coordinates, mean of the y coordinates)

That is, $X_1$ forms a cluster by itself, and its center is its own coordinates, (2,10).

$X_4, X_3, X_5, X_6, X_8$ form a cluster, whose center is $\left(\frac{8+5+7+6+4}{5}, \frac{4+8+5+4+9}{5}\right) = (6,6)$.

Similarly, $X_7, X_2$ form a cluster, whose center is (1.5,3.5).
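
The center update can be checked with a short NumPy sketch. The `labels` array below is hard-coded from the first distance table (0 for the cluster of $X_1$, 1 for the cluster of $X_4$, 2 for the cluster of $X_7$); the names are illustrative assumptions.

```python
import numpy as np

points = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
                   [7, 5], [6, 4], [1, 2], [4, 9]], dtype=float)

# Cluster index of X_1 .. X_8 after the first pass
labels = np.array([0, 2, 1, 1, 1, 1, 2, 1])

# New center of each cluster = coordinate-wise mean of its member points
new_centers = np.array([points[labels == k].mean(axis=0) for k in range(3)])
print(new_centers)  # rows: (2, 10), (6, 6), (1.5, 3.5)
```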

  2. The second step is to compute the distance from the 8 points to the three centers.

| Point | (2,10) | (6,6) | (1.5,3.5) |
|---|---|---|---|
| $X_1$ | 0 | 5.7 | 6.5 |
| $X_2$ | 5.0 | 4.1 | 1.6 |
| $X_3$ | 8.5 | 2.8 | 6.5 |
| $X_4$ | 3.6 | 2.2 | 5.7 |
| $X_5$ | 7.1 | 1.4 | 5.7 |
| $X_6$ | 7.2 | 2.0 | 4.5 |
| $X_7$ | 8.1 | 6.4 | 1.6 |
| $X_8$ | 2.2 | 3.6 | 6.0 |

Similarly, $X_1, X_8$ are closest to (2,10) and form a cluster, whose center is (3,9.5).

$X_3, X_4, X_5, X_6$ are closest to (6,6) and form a cluster, whose center is $\left(\frac{8+5+7+6}{4}, \frac{4+8+5+4}{4}\right) = (6.5,5.25)$.

$X_7, X_2$ are closest to (1.5,3.5) and form a cluster, whose center is (1.5,3.5).

  3. The third step repeats the previous one: compute the distance from the 8 points to the three centers.

| Point | (3,9.5) | (6.5,5.25) | (1.5,3.5) |
|---|---|---|---|
| $X_1$ | 1.1 | 6.5 | 6.5 |
| $X_2$ | 4.6 | 4.5 | 1.6 |
| $X_3$ | 7.4 | 2.0 | 6.5 |
| $X_4$ | 2.5 | 3.1 | 5.7 |
| $X_5$ | 6.0 | 0.56 | 5.7 |
| $X_6$ | 6.3 | 1.3 | 4.5 |
| $X_7$ | 7.8 | 6.4 | 1.6 |
| $X_8$ | 1.1 | 4.5 | 6.0 |
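
The second and third passes can be reproduced with a small helper that performs one assignment followed by one center update. This is a sketch only: `kmeans_step` is a hypothetical name, and the code assumes no cluster ever becomes empty (true for this data).

```python
import numpy as np

points = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
                   [7, 5], [6, 4], [1, 2], [4, 9]], dtype=float)

def kmeans_step(points, centers):
    """One pass: assign each point to its nearest center, then recompute the means."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    new_centers = np.array([points[labels == k].mean(axis=0)
                            for k in range(len(centers))])
    return labels, new_centers

# Start from the centers obtained after the first pass
centers = np.array([[2, 10], [6, 6], [1.5, 3.5]])

labels, centers = kmeans_step(points, centers)
print(labels)   # [0 2 1 1 1 1 2 0]
print(centers)  # rows: (3, 9.5), (6.5, 5.25), (1.5, 3.5)

labels, centers = kmeans_step(points, centers)
print(labels)   # [0 2 1 0 1 1 2 0]
```

In the last line, $X_4$ (fourth entry) has moved to cluster 0, joining $X_1$ and $X_8$.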

$X_1, X_4, X_8$ form a cluster, whose center is $\left(\frac{2+5+4}{3}, \frac{10+8+9}{3}\right) \approx (3.67,9)$.

$X_3, X_5, X_6$ form a cluster, whose center is $\left(\frac{8+7+6}{3}, \frac{4+5+4}{3}\right) \approx (7,4.33)$.

$X_2, X_7$ form a cluster, whose center is (1.5,3.5).

  4. The fourth step repeats the previous one: compute the distance from the 8 points to the three centers.

| Point | (3.67,9) | (7,4.33) | (1.5,3.5) |
|---|---|---|---|
| $X_1$ | 1.9 | 7.6 | 6.5 |
| $X_2$ | 4.3 | 5.0 | 1.6 |
| $X_3$ | 6.6 | 1.1 | 6.5 |
| $X_4$ | 1.7 | 4.2 | 5.7 |
| $X_5$ | 5.2 | 0.67 | 5.7 |
| $X_6$ | 5.5 | 1.1 | 4.5 |
| $X_7$ | 7.5 | 6.4 | 1.6 |
| $X_8$ | 0.33 | 5.5 | 6.0 |

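$X_1, X_4, X_8$ form one cluster, $X_3, X_5, X_6$ form one cluster, and $X_2, X_7$ form one cluster.

The assignment is now the same as in the previous step, so the clustering stops. The grouping above is the final result.

Calling these numbered "steps" is not quite accurate: the whole thing is simply a process of repeatedly updating the centers.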
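
That repeated update can be written as a loop that stops once the assignment no longer changes. The following is a minimal sketch under stated assumptions, not a general-purpose implementation: the function name `kmeans` and the `max_iter` parameter are illustrative, and the code assumes no cluster ever becomes empty.

```python
import numpy as np

def kmeans(points, centers, max_iter=100):
    """Alternate assignment and center update until the labels stop changing."""
    labels = None
    for _ in range(max_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # same assignment as the previous pass: converged
        labels = new_labels
        centers = np.array([points[labels == k].mean(axis=0)
                            for k in range(len(centers))])
    return labels, centers

points = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
                   [7, 5], [6, 4], [1, 2], [4, 9]], dtype=float)

# Initial centers X_1, X_4, X_7
labels, centers = kmeans(points, points[[0, 3, 6]])
print(labels)                # [0 2 1 0 1 1 2 0]
print(np.round(centers, 2))  # rows: (3.67, 9), (7, 4.33), (1.5, 3.5)
```

The labels correspond to the clusters $\{X_1, X_4, X_8\}$, $\{X_3, X_5, X_6\}$, and $\{X_2, X_7\}$.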

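Finally, the code.
Define a function `distance` that computes the distance from every point to the coordinates $(x,y)$.

import numpy as np
import pandas as pd

# The eight sample points X_1 .. X_8
data = np.array([[2, 10], [2, 5], [8, 4], [5, 8], [7, 5], [6, 4], [1, 2], [4, 9]])

def distance(x, y):
    # Euclidean distance from every row of data to the point (x, y)
    return np.sqrt((data[:, 0] - x) ** 2 + (data[:, 1] - y) ** 2)

# Final centers: means of {X_1, X_4, X_8}, {X_3, X_5, X_6}, {X_2, X_7}
centers = [data[[0, 3, 7]].mean(axis=0), data[[2, 4, 5]].mean(axis=0), data[[1, 6]].mean(axis=0)]
# One column of distances per center, rounded to two decimals
Y = pd.DataFrame([distance(cx, cy) for cx, cy in centers]).T.round(2)
Y.columns = ["$C_1$", "$C_2$", "$C_3$"]  # columns are the three centers
Y.index = ["$X_1$", "$X_2$", "$X_3$", "$X_4$",
           "$X_5$", "$X_6$", "$X_7$", "$X_8$"]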

While writing this, I found that the article "k-means算法例题应用" works through a similar problem.

Origin blog.csdn.net/qq_54423921/article/details/131075831