1. Importing the data with Python
    from numpy import *

    def createDataSet():
        # Four 2-D sample points and their class labels
        group = array([[1.1, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
        labels = ['A', 'A', 'B', 'B']
        return group, labels
The kNN classification algorithm:
    from numpy import *
    import operator

    def classify0(inX, dataSet, labels, k):
        dataSetSize = dataSet.shape[0]  # shape[0] is the number of rows in dataSet
        diffMat = tile(inX, (dataSetSize, 1)) - dataSet
        sqDiffMat = diffMat ** 2
        sqDistances = sqDiffMat.sum(axis=1)
        distances = sqDistances ** 0.5
        sortedDistIndicies = distances.argsort()
        classCount = {}
        for i in range(k):
            voteIlabel = labels[sortedDistIndicies[i]]
            classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
        sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
        return sortedClassCount[0][0]
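distances holds the Euclidean distance from the point being classified to every known point. With the four sample points it is a one-dimensional array of length 4 (shape (4,)), not a 1*4 matrix;
sortedDistIndicies holds the indices that sort distances from smallest to largest;
voteIlabel is a temporary variable that holds the label of the current neighbor;
classCount[voteIlabel]=classCount.get(voteIlabel,0)+1 If the key voteIlabel is already in the dictionary classCount, its value is incremented by 1. If it is not, classCount.get(voteIlabel,0) returns 0, so the count starts at 1.
sortedClassCount=sorted(classCount.items(),key=operator.itemgetter(1),reverse=True) This takes the (label, count) pairs of the dictionary classCount and sorts them by the second element (the count) in descending order. The result is a list, so sortedClassCount[0][0] is the label with the most votes.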
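To make the intermediate variables described above concrete, here is a minimal sketch that repeats the same steps outside the function for the query point [0, 0] and k=3. It uses `import numpy as np` instead of the star import, and the names `query`, `order` and `votes` are chosen for this illustration only.

```python
import numpy as np

# Same sample data as createDataSet()
group = np.array([[1.1, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']
query = np.array([0, 0])
k = 3

# Repeat the query once per known point, then subtract row by row
diff = np.tile(query, (group.shape[0], 1)) - group
distances = ((diff ** 2).sum(axis=1)) ** 0.5
order = distances.argsort()  # indices from nearest to farthest

# Count the labels of the k nearest points
votes = {}
for i in range(k):
    label = labels[order[i]]
    votes[label] = votes.get(label, 0) + 1

print(distances.shape)  # (4,)
print(order)            # [2 3 1 0]
print(votes)            # {'B': 2, 'A': 1}
```

The two nearest points are both labelled B and the third is labelled A, so B wins the vote with 2 against 1.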
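Running the algorithm (the imports assume createDataSet is saved in kNN.py and classify0 in classify_kNN.py):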
    import kNN
    from classify_kNN import *

    g, l = kNN.createDataSet()
    result = classify0([0, 0], g, l, 3)
    print(result)
Output:
B
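As a cross-check on this result, the same classification can be written more compactly. The sketch below is not the code used in this post: it assumes NumPy broadcasting in place of tile() and collections.Counter in place of the hand-written dictionary count, and the function name `classify_broadcast` is made up for this example.

```python
from collections import Counter

import numpy as np


def classify_broadcast(in_x, data_set, labels, k):
    # Broadcasting subtracts in_x from every row, so tile() is not needed.
    diff = np.asarray(data_set) - np.asarray(in_x)
    distances = np.sqrt((diff ** 2).sum(axis=1))
    # Indices of the k nearest known points
    nearest = distances.argsort()[:k]
    # Count the labels of those points and return the most frequent one
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


group = np.array([[1.1, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(classify_broadcast([0, 0], group, labels, 3))  # B
```

The sign of the difference is reversed compared with classify0, which makes no difference once the values are squared.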
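items(): returns the (key, value) pairs of a dictionary. In Python 3 this is a dict_items view rather than a list. Before Python 3.7 the order was not guaranteed (as in the output below); from 3.7 onward the pairs come back in insertion order.
get(): returns the value stored under the given key, or the second argument if the key does not exist: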
    dic = {'a': 1, 'b': 2, 'c': 3}
    print(dic.items())
    print(dic.get('c', 'no'))

Output:

    dict_items([('b', 2), ('c', 3), ('a', 1)])
    3
shape: returns the dimensions of an array as a tuple (rows, columns);
    from numpy import *

    c = array([[1, 1], [2, 3], [5, 6]])
    print(c)
    print(c.shape)
    print(c.shape[0])
    print(c.shape[1])

Output:

    [[1 1]
     [2 3]
     [5 6]]
    (3, 2)
    3
    2
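operator.itemgetter(): returns a callable that fetches the item at the given index from whatever object it is applied to. It is typically passed as the key argument of sorted():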
    import operator

    students = [['刚田武', 20, 'gangtw'], ['朱二娃', 25, 'zhuerw'], ['咪咪two', 30, 'miomitwo']]
    print(sorted(students, key=operator.itemgetter(1), reverse=True))

Output:

    [['咪咪two', 30, 'miomitwo'], ['朱二娃', 25, 'zhuerw'], ['刚田武', 20, 'gangtw']]
argsort(): returns the indices that would sort the array's values from smallest to largest
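A short sketch of argsort() in the same style as the examples above. The array values are made up for illustration, and `order` is a name chosen here.

```python
import numpy as np

distances = np.array([1.5, 1.4, 0.0, 0.1])
order = distances.argsort()

print(order)                      # [2 3 1 0]
# Indexing with the result gives the values in ascending order
print(distances[order].tolist())  # [0.0, 0.1, 1.4, 1.5]
```

The original array is left unchanged; only the index order is returned, which is why classify0 can use it to look up the matching entries in labels.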