Machine Learning: The k-Nearest Neighbors Algorithm

1. Importing data with Python

from numpy import *
def createDataSet():
    group=array([[1.1,1.1],[1.0,1.0],[0,0],[0,0.1]])
    labels=['A','A','B','B']
    return group,labels

The kNN classification algorithm:

from numpy import *
import operator
def classify0(inX,dataSet,labels,k):
    dataSetSize=dataSet.shape[0]    # shape[0] is the number of rows in dataSet
    diffMat=tile(inX,(dataSetSize,1))-dataSet
    sqDiffMat=diffMat**2
    sqDistances=sqDiffMat.sum(axis=1)
    distances=sqDistances**0.5
    sortedDistIndicies=distances.argsort()
    classCount={}
    for i in range(k):
        voteIlabel=labels[sortedDistIndicies[i]]
        classCount[voteIlabel]=classCount.get(voteIlabel,0)+1
    sortedClassCount=sorted(classCount.items(),key=operator.itemgetter(1),reverse=True)
    return sortedClassCount[0][0]

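distances is a 1-D array with 4 elements, one per known point; each element is the Euclidean distance from the point being classified to that known point;
sortedDistIndicies holds the indices that would sort distances in ascending order;
voteIlabel is a temporary variable that holds the label of the current neighbor.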
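The distance step can be traced by hand on the sample data. The following is a minimal sketch that reuses the values from createDataSet() and the query point [0,0]; it uses `import numpy as np` instead of `from numpy import *` only to make the origin of each function explicit, and `tiled` and `nearest_labels` are illustrative names that do not appear in classify0.

```python
import numpy as np

# Training points and labels, copied from createDataSet().
group = np.array([[1.1, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']
inX = [0, 0]  # the point to classify

# tile repeats inX once per training row so it can be subtracted row by row.
tiled = np.tile(inX, (group.shape[0], 1))         # shape (4, 2)
diffMat = tiled - group
distances = ((diffMat ** 2).sum(axis=1)) ** 0.5   # shape (4,)

# Indices of the training points, nearest first.
sortedDistIndicies = distances.argsort()

# Labels of the k=3 nearest neighbors (illustrative helper variable).
nearest_labels = [labels[i] for i in sortedDistIndicies[:3]]

print(distances.shape)      # (4,)
print(sortedDistIndicies)   # [2 3 1 0]
print(nearest_labels)       # ['B', 'B', 'A']
```

The two 'B' points are closest to the origin, so they occupy the first two positions of sortedDistIndicies.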

classCount[voteIlabel]=classCount.get(voteIlabel,0)+1    If the dictionary classCount already contains the key voteIlabel, its value is incremented by 1; otherwise classCount.get(voteIlabel,0) returns 0, so the count starts at 1.
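A small sketch of this counting pattern on its own, assuming the three nearest labels are already known (the list `nearest_labels` is a made-up input for illustration):

```python
# Hypothetical labels of the k=3 nearest neighbors, nearest first.
nearest_labels = ['B', 'B', 'A']

classCount = {}
for voteIlabel in nearest_labels:
    # get() returns 0 the first time a label is seen, so no KeyError is raised.
    classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1

print(classCount)  # {'B': 2, 'A': 1}
```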

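sortedClassCount=sorted(classCount.items(),key=operator.itemgetter(1),reverse=True)    classCount.items() yields (label, count) pairs; sorted() orders them by the second element of each pair (the count) in descending order and returns a list.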
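To see the sort in isolation, here is a short sketch with made-up vote counts (the dictionary contents are an assumption for illustration, not output from the classifier):

```python
import operator

classCount = {'A': 1, 'B': 2}  # hypothetical vote counts

# items() yields (label, count) pairs; itemgetter(1) selects the count as the sort key.
sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)

print(sortedClassCount)        # [('B', 2), ('A', 1)]
print(sortedClassCount[0][0])  # B
```

The first element of the first pair is the label with the most votes, which is what classify0 returns.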

Running the algorithm:

import kNN
from classify_kNN import *
g,l=kNN.createDataSet()
result=classify0([0,0],g,l,3)
print(result)

Output:
B

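items(): returns the dictionary's (key, value) pairs (a dict_items view in Python 3). Since Python 3.7 the pairs follow insertion order; in earlier versions the order is arbitrary, as in the output shown below: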

get(): returns the value stored under the given key, or the second argument if the key does not exist:

dic={'a':1,'b':2,'c':3}
print(dic.items())
print(dic.get('c','no'))
Output:
dict_items([('b', 2), ('c', 3), ('a', 1)])
3

shape: returns the dimensions of the array as a tuple (rows, columns):

from numpy import *
c=array([[1,1],[2,3],[5,6]])
print(c)
print(c.shape)
print(c.shape[0])
print(c.shape[1])
Output:
[[1 1]
 [2 3]
 [5 6]]
(3, 2)
3
2

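operator.itemgetter(): returns a callable that fetches the item at the given index from an object; it is typically passed as the key argument of sorted():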

import operator
students=[['刚田武',20,'gangtw'],['朱二娃',25,'zhuerw'],['咪咪two',30,'miomitwo']]
print(sorted(students,key=operator.itemgetter(1),reverse=True))
Output:
[['咪咪two', 30, 'miomitwo'], ['朱二娃', 25, 'zhuerw'], ['刚田武', 20, 'gangtw']]

argsort(): returns the indices that would sort the array's values in ascending order
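A brief sketch of argsort() on a made-up array (the values in `d` are arbitrary and chosen only for illustration):

```python
import numpy as np

d = np.array([1.5, 0.2, 0.9, 0.0])

# order[i] is the position in d of the i-th smallest value.
order = d.argsort()

print(order)     # [3 1 2 0]
print(d[order])  # the values of d in ascending order
```

Indexing the array with the result, as in d[order], gives the sorted values; classify0 uses the same indices to look up labels instead.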
