Machine Learning: the k-Nearest Neighbors Algorithm

1. Importing data with Python

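# Saved as kNN.py (the run script further down does `import kNN`)
from numpy import *
def createDataSet():
    # Four 2-D training points, one per row
    group=array([[1.1,1.1],[1.0,1.0],[0,0],[0,0.1]])
    # Class label for each row of group
    labels=['A','A','B','B']
    return group,labels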

The kNN classification algorithm:

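# Saved as classify_kNN.py (the run script does `from classify_kNN import *`)
from numpy import *
import operator
def classify0(inX,dataSet,labels,k):
    dataSetSize=dataSet.shape[0]    # shape[0] is the number of rows in dataSet
    # Repeat inX once per row, then subtract to get per-coordinate differences
    diffMat=tile(inX,(dataSetSize,1))-dataSet
    sqDiffMat=diffMat**2
    sqDistances=sqDiffMat.sum(axis=1)    # sum across each row
    distances=sqDistances**0.5    # Euclidean distance to every known point
    sortedDistIndicies=distances.argsort()    # indices from nearest to farthest
    classCount={}
    # Tally the labels of the k nearest points
    for i in range(k):
        voteIlabel=labels[sortedDistIndicies[i]]
        classCount[voteIlabel]=classCount.get(voteIlabel,0)+1
    # Sort (label, count) pairs by count, largest first
    sortedClassCount=sorted(classCount.items(),key=operator.itemgetter(1),reverse=True)
    return sortedClassCount[0][0]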
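
distances is a 1-D array of length 4 for the sample data (one entry per known point); it holds the distance from the point being classified to each known point;
sortedDistIndicies holds the indices that sort distances in ascending order;
voteIlabel is a temporary variable that holds the label of the current neighbor;
classCount[voteIlabel]=classCount.get(voteIlabel,0)+1    if voteIlabel is already a key in classCount, its value is incremented by 1; otherwise classCount.get(voteIlabel,0) returns 0, so the entry is created with the value 1
sortedClassCount=sorted(classCount.items(),key=operator.itemgetter(1),reverse=True)    takes the (key, value) pairs of classCount and sorts them by their second element (the count) in descending order; the result is a list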
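
To make these variables concrete, here is a small trace of the same steps on the sample data for the query point [0,0] with k=3. It is only a sketch: it uses `import numpy as np` instead of the star import, and the short names `order` and `label` are mine, not from the code above.

```python
import operator
import numpy as np

# Sample data, same values as createDataSet()
group = np.array([[1.1, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']
inX = [0, 0]
k = 3

# Euclidean distance from inX to every row of group
diffMat = np.tile(inX, (group.shape[0], 1)) - group
distances = ((diffMat ** 2).sum(axis=1)) ** 0.5
order = distances.argsort()
print(distances.shape)    # (4,)
print(order.tolist())     # [2, 3, 1, 0]

# Count the labels of the k nearest points
classCount = {}
for i in range(k):
    label = labels[order[i]]
    classCount[label] = classCount.get(label, 0) + 1

ranked = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
print(ranked)             # [('B', 2), ('A', 1)]
```

The two nearest points are rows 2 and 3 (both 'B'), the third is row 1 ('A'), so 'B' wins the vote 2 to 1.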

Running the algorithm:

import kNN
from classify_kNN import *
g,l=kNN.createDataSet()
result=classify0([0,0],g,l,3)
print(result)

Output:

B
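
For comparison, the same idea can be written without tile() by relying on NumPy broadcasting, with collections.Counter doing the vote count. This is a sketch of an alternative, not the version from the post; the function name `classify_broadcast` is hypothetical.

```python
from collections import Counter
import numpy as np

def classify_broadcast(inX, dataSet, labels, k):
    """Hypothetical variant of classify0 (not from the original post)."""
    # Broadcasting subtracts inX from every row, so tile() is not needed
    diffs = np.asarray(dataSet) - np.asarray(inX)
    distances = np.sqrt((diffs ** 2).sum(axis=1))
    # Indices of the k nearest points
    nearest = distances.argsort()[:k]
    # Majority vote among the labels of those points
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

group = np.array([[1.1, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(classify_broadcast([0, 0], group, labels, 3))   # B
```

On the sample data it gives the same answer as classify0 for the query point [0,0].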


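items(): returns the dictionary's (key, value) pairs. In Python 3 this is a dict_items view rather than a list, and before Python 3.7 the order of the pairs was not guaranteed: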

get(): returns the value stored under the given key, or the second argument if the key does not exist:

dic={'a':1,'b':2,'c':3}
print(dic.items())
print(dic.get('c','no'))
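Output (from an older Python 3 where dict order was arbitrary; Python 3.7+ prints the pairs in insertion order):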
dict_items([('b', 2), ('c', 3), ('a', 1)])
3

shape: returns the dimensions of the array as a tuple (rows, columns for a 2-D array):

from numpy import *
c=array([[1,1],[2,3],[5,6]])
print(c)
print(c.shape)
print(c.shape[0])
print(c.shape[1])
Output:
[[1 1]
 [2 3]
 [5 6]]
(3, 2)
3
2
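
In classify0, shape[0] is what tells tile() how many copies of the query point to stack. A short sketch of that step on the sample data (the names `rows` and `tiled` are only for illustration):

```python
import numpy as np

dataSet = np.array([[1.1, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
inX = [0, 0]

rows = dataSet.shape[0]            # number of known points
tiled = np.tile(inX, (rows, 1))    # inX repeated once per row
print(rows)                        # 4
print(tiled.shape)                 # (4, 2)
print((tiled - dataSet).shape)     # (4, 2)
```

Because tiled has the same shape as dataSet, the subtraction is done element by element, one row per known point.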

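operator.itemgetter(): returns a function that fetches the item at the given index from whatever it is called on; typically used as the key for sorted():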

import operator
students=[['刚田武',20,'gangtw'],['朱二娃',25,'zhuerw'],['咪咪two',30,'miomitwo']]
print(sorted(students,key=operator.itemgetter(1),reverse=True))
Output:
[['咪咪two', 30, 'miomitwo'], ['朱二娃', 25, 'zhuerw'], ['刚田武', 20, 'gangtw']]

argsort(): returns the indices that would sort the array in ascending order
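
A quick sketch of argsort() in the same style as the examples above. The distance values here are made up for illustration.

```python
import numpy as np

# Illustrative distances (not computed from the sample data)
distances = np.array([1.5, 1.4, 0.0, 0.1])
order = distances.argsort()
print(order.tolist())              # [2, 3, 1, 0]
print(distances[order].tolist())   # [0.0, 0.1, 1.4, 1.5]
```

The array itself is left unchanged; indexing it with the result gives the values from smallest to largest.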

Reposted from www.cnblogs.com/zhhy236400/p/9826347.html