Machine Learning in Action notes, Chapter 2 (part 2)

Copyright notice: reposting without permission is prohibited. Thank you for your cooperation! https://blog.csdn.net/qq_37510292/article/details/84955351

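2.2 Improving matches on a dating site

Code implementation

In the dataset provided by the original author (datingTestSet.txt), the labels are strings, not ints, so the book's code fails when run: int() raises a ValueError on the label column. Two solutions are given below. First, the book's source code: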

from numpy import zeros

#Parser that converts the dating data text records into a NumPy matrix
def file2matrix(filename):
    fr = open(filename)
    arrayOlines = fr.readlines()
    #Get the number of lines in the file
    numberOfLines = len(arrayOlines)
    #Create the NumPy matrix to return
    returnMat = zeros((numberOfLines,3))
    classLabelVector = []
    index = 0
    #Parse the file data into the matrix and the label list
    for line in arrayOlines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat,classLabelVector
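To see the failure concretely, here is a minimal reproduction of what the book's last parsing step does with one line. It assumes the tab-separated layout above and the label spelling largeDoses used in the book's datingTestSet.txt; parse_label is a hypothetical helper written only for this illustration.

```python
def parse_label(line):
    """Apply the book's int() conversion to the last field of one line.

    Returns the int label, or None when the label is not an integer string.
    """
    listFromLine = line.strip().split('\t')
    try:
        return int(listFromLine[-1])
    except ValueError as err:
        # A string label such as 'largeDoses' cannot be parsed by int()
        print("ValueError:", err)
        return None


# A line in datingTestSet.txt format (assumed): three features plus a string label
parse_label("40920\t8.326976\t0.953952\tlargeDoses\n")
# A line in datingTestSet2.txt format: the label is already an int
parse_label("40920\t8.326976\t0.953952\t3\n")
```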

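Solution ①

Replace classLabelVector.append(int(listFromLine[-1])) with an explicit mapping from label strings to ints. The strings must match the file exactly; the book's datingTestSet.txt uses didntLike, smallDoses and largeDoses:

from numpy import zeros

#Parser that converts the dating data text records into a NumPy matrix
def file2matrix(filename):
    fr = open(filename)
    arrayOlines = fr.readlines()
    #Get the number of lines in the file
    numberOfLines = len(arrayOlines)
    #Create the NumPy matrix to return
    returnMat = zeros((numberOfLines,3))
    classLabelVector = []
    index = 0
    #Parse the file data into the matrix and the label list
    for line in arrayOlines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]
        #Map each string label to an int class
        if listFromLine[-1] == 'didntLike':
            classLabelVector.append(1)
        elif listFromLine[-1] == 'smallDoses':
            classLabelVector.append(2)
        elif listFromLine[-1] == 'largeDoses':
            classLabelVector.append(3)
        else:
            #Fail loudly instead of silently dropping the label
            raise ValueError('unknown label: %r' % listFromLine[-1])
        index += 1
    return returnMat,classLabelVector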
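The same idea can be written with a dictionary lookup instead of an if/elif chain. The sketch below is not the book's code: parse_dating_lines and LABEL_TO_INT are hypothetical names, the label spellings are assumed from the book's datingTestSet.txt, and the function takes an iterable of lines rather than a filename so it is easy to try without the data file.

```python
import numpy as np

# Assumed label spellings; check them against your copy of datingTestSet.txt
LABEL_TO_INT = {'didntLike': 1, 'smallDoses': 2, 'largeDoses': 3}


def parse_dating_lines(lines):
    """Parse tab-separated lines into (feature matrix, list of int labels).

    An unknown label raises KeyError instead of being silently skipped.
    """
    rows, labels = [], []
    for line in lines:
        fields = line.strip().split('\t')
        if len(fields) < 4:  # skip blank or malformed lines
            continue
        rows.append([float(x) for x in fields[0:3]])
        labels.append(LABEL_TO_INT[fields[-1]])
    return np.array(rows), labels


sample = ["40920\t8.326976\t0.953952\tlargeDoses\n",
          "14488\t7.153469\t1.673904\tsmallDoses\n"]
mat, labels = parse_dating_lines(sample)
```

With an if/elif chain and no else branch, a misspelled label leaves a row in returnMat without a matching entry in classLabelVector, so the two get out of sync; the dictionary lookup surfaces the mismatch immediately.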

Note: in Python 2 reload() is a built-in you can call directly, but in Python 3 you must first import importlib.
In IPython:

>>>import importlib
>>>importlib.reload(kNN)
>>>datingDataMat, datingLabels = kNN.file2matrix('datingTestSet.txt')

Solution ②

Keep the book's code unchanged:

from numpy import zeros

#Parser that converts the dating data text records into a NumPy matrix
def file2matrix(filename):
    fr = open(filename)
    arrayOlines = fr.readlines()
    #Get the number of lines in the file
    numberOfLines = len(arrayOlines)
    #Create the NumPy matrix to return
    returnMat = zeros((numberOfLines,3))
    classLabelVector = []
    index = 0
    #Parse the file data into the matrix and the label list
    for line in arrayOlines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index,:] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat,classLabelVector

In IPython, load datingTestSet2.txt instead, the version of the dataset whose labels are already ints:

>>>datingDataMat, datingLabels = kNN.file2matrix('datingTestSet2.txt')

Inspect datingDataMat and datingLabels:

In[1]: datingDataMat
Out[1]: 
array([[4.0920000e+04, 8.3269760e+00, 9.5395200e-01],
       [1.4488000e+04, 7.1534690e+00, 1.6739040e+00],
       [2.6052000e+04, 1.4418710e+00, 8.0512400e-01],
       ...,
       [2.6575000e+04, 1.0650102e+01, 8.6662700e-01],
       [4.8111000e+04, 9.1345280e+00, 7.2804500e-01],
       [4.3757000e+04, 7.8826010e+00, 1.3324460e+00]])
In[2]: datingLabels[0:20]
Out[2]: [3, 2, 1, 1, 1, 1, 3, 3, 1, 3, 1, 1, 2, 1, 1, 1, 1, 1, 2, 3]

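Creating a scatter plot

Import matplotlib to create the scatter plot:

import matplotlib
import matplotlib.pyplot as plt
from numpy import array

Build the plot:

fig=plt.figure()
ax=fig.add_subplot(111)
#x: column 1 (video games), y: column 2 (ice cream); marker size and color both scale with the label
ax.scatter(datingDataMat[:,1],datingDataMat[:,2],15.0*array(datingLabels),15.0*array(datingLabels))
plt.show()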

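In the resulting plot, the x-axis is the percentage of time spent playing video games and the y-axis is the liters of ice cream consumed per week.
Special note: if you change the book's classLabelVector.append(int(listFromLine[-1])) to classLabelVector.append(listFromLine[-1]), the labels stay strings and later steps break in ways that are hard to foresee (for example, 15.0*array(datingLabels) in the plotting code cannot multiply string labels). Use one of the two solutions described in this post instead.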

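Normalizing the data

The book's code here is correct; I recommend typing it out yourself.

from numpy import zeros, shape, tile

def autoNorm(dataSet):
    #Column-wise minimum and maximum of each feature
    minVals = dataSet.min(0)
    maxVals = dataSet.max(0)
    ranges = maxVals - minVals
    normDataSet = zeros(shape(dataSet))
    m = dataSet.shape[0]
    normDataSet = dataSet - tile(minVals, (m,1))
    #Element-wise division by the feature ranges
    normDataSet = normDataSet/tile(ranges, (m,1))
    return normDataSet, ranges, minVals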
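autoNorm applies min-max scaling, newValue = (oldValue - min) / (max - min), to every column. The sketch below computes the same thing with NumPy broadcasting instead of tile(); auto_norm_broadcast is a hypothetical name, and like the book's version it assumes no feature column is constant (a zero range would divide by zero).

```python
import numpy as np


def auto_norm_broadcast(dataSet):
    """Min-max scale each column of dataSet to [0, 1].

    Same result as the book's autoNorm, but relies on broadcasting:
    an (m, n) array minus an (n,) array subtracts row by row.
    """
    minVals = dataSet.min(axis=0)
    maxVals = dataSet.max(axis=0)
    ranges = maxVals - minVals
    normDataSet = (dataSet - minVals) / ranges
    return normDataSet, ranges, minVals


data = np.array([[0.0, 10.0, 1.0],
                 [5.0, 20.0, 3.0],
                 [10.0, 30.0, 2.0]])
norm, ranges, minVals = auto_norm_broadcast(data)
```

Broadcasting avoids materializing the two tiled (m, 3) helper arrays; the numerical result is the same.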

Testing the classifier as a complete program

Source code:

def datingClassTest():
    hoRatio = 0.10
    datingDataMat,datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    m = normMat.shape[0]
    numTestVecs = int(m*hoRatio)
    errorCount = 0.0
    for i in range(numTestVecs):
        classifierResult = classify0(normMat[i,:],normMat[numTestVecs:m,:], \
                                     datingLabels[numTestVecs:m],3)
        print("the classfier came back with: %d,the real answer is : %d" \
                                     % (classifierResult,datingLabels[i]))
        if (classifierResult != datingLabels[i]): errorCount += 1.0
    print("the total error rate is: %f" % (errorCount/float(numTestVecs)))
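datingClassTest (and classifyPerson below) call classify0, which is not shown in this post. For readers running this section on its own, here is a minimal k-nearest-neighbor majority-vote sketch with the same argument order (inX, dataSet, labels, k). classify0_sketch is a hypothetical stand-in, not the book's exact code: it assumes Euclidean distance, and its tie-breaking between equally common labels may differ from the book's version.

```python
from collections import Counter

import numpy as np


def classify0_sketch(inX, dataSet, labels, k):
    """Return the majority label among the k rows of dataSet closest to inX.

    inX: 1-D feature vector; dataSet: (m, n) array; labels: length-m sequence.
    """
    # Euclidean distance from inX to every row (broadcasting over rows)
    distances = np.sqrt(((dataSet - inX) ** 2).sum(axis=1))
    # Indices of the k nearest rows
    nearest = distances.argsort()[:k]
    # Majority vote among their labels
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


points = np.array([[0.0, 0.0], [0.0, 0.1], [1.0, 1.0], [1.0, 0.9]])
point_labels = [1, 1, 2, 2]
classify0_sketch(np.array([0.05, 0.0]), points, point_labels, 3)
```

datingClassTest passes normalized rows (normMat) to the classifier, which matters because the unscaled frequent-flier-miles column would otherwise dominate the distance.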

Switch to IPython and check autoNorm:

In[1]: import kNN

In[2]: datingDataMat,datingLabels = kNN.file2matrix('datingTestSet2.txt')

In[3]: normMat, ranges, minVals = kNN.autoNorm(datingDataMat)

In[4]: normMat
Out[4]: 
array([[0.44832535, 0.39805139, 0.56233353],
       [0.15873259, 0.34195467, 0.98724416],
       [0.28542943, 0.06892523, 0.47449629],
       ...,
       [0.29115949, 0.50910294, 0.51079493],
       [0.52711097, 0.43665451, 0.4290048 ],
       [0.47940793, 0.3768091 , 0.78571804]])

In[5]: ranges
Out[5]: array([9.1273000e+04, 2.0919349e+01, 1.6943610e+00])

In[6]: minVals
Out[6]: array([0.      , 0.      , 0.001156])
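As a sanity check, the first row of normMat can be reproduced by hand from the first row of datingDataMat and the ranges and minVals printed above, using the formula autoNorm implements:

```latex
\text{newValue} = \frac{\text{oldValue} - \text{minVals}}{\text{ranges}}, \qquad \text{ranges} = \text{maxVals} - \text{minVals}
\\
\frac{40920 - 0}{91273} \approx 0.44833, \qquad
\frac{8.326976 - 0}{20.919349} \approx 0.39805, \qquad
\frac{0.953952 - 0.001156}{1.694361} \approx 0.56233
```

These agree with the first row of Out[4].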

Building the complete system

Source code:

def classifyPerson():
    resultList = ['not at all','in small doses','in large doses']
    percentTats = float(input("percentage of time spent playing video games?"))
    ffMiles = float(input("frequent flier miles earned per year?"))
    iceCream = float(input("liters of ice cream consumed per year?"))
    datingDataMat,datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    #Feature order must match the columns of datingDataMat
    inArr = array([ffMiles, percentTats, iceCream])
    #Normalize the input with the training set's minVals and ranges before classifying
    classifierResult = classify0((inArr-minVals)/ranges,normMat,datingLabels,3)
    #Labels are 1-3, so subtract 1 to index resultList
    print("You will probably like this person: ",resultList[classifierResult - 1])
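The key step in classifyPerson is scaling the new input with the training set's minVals and ranges, not with statistics of the input itself, so that it lives in the same [0, 1] space as normMat. The sketch below isolates that step without input() prompts; normalize_new_sample is a hypothetical helper, and the minVals/ranges values are copied from the IPython output of the previous section.

```python
import numpy as np


def normalize_new_sample(ffMiles, percentTats, iceCream, minVals, ranges):
    """Scale one new sample with the training set's minVals and ranges.

    Feature order must match the columns of datingDataMat:
    [frequent flier miles, % time playing video games, liters of ice cream].
    """
    inArr = np.array([ffMiles, percentTats, iceCream])
    return (inArr - minVals) / ranges


# Values printed by kNN.autoNorm(datingDataMat) in the session above
minVals = np.array([0.0, 0.0, 0.001156])
ranges = np.array([9.1273e+04, 2.0919349e+01, 1.6943610e+00])
normalize_new_sample(10000, 10, 0.5, minVals, ranges)
```

Feeding the first row of datingDataMat through this helper gives back the first row of normMat, which is a quick way to confirm the feature order is right.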

Switch to IPython and run datingClassTest:

In[1]: import kNN

In[2]: kNN.datingClassTest()
Out[2]: 
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 3,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 3
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 3,the real answer is : 3
the classfier came back with: 2,the real answer is : 2
the classfier came back with: 1,the real answer is : 1
the classfier came back with: 3,the real answer is : 1
the total error rate is: 0.050000
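The final figure can be checked against the log: 100 test lines are printed (numTestVecs), and 5 of them disagree (one case of 3 vs. 2, three cases of 3 vs. 1, and one case of 2 vs. 3), so

```latex
\text{error rate} = \frac{\text{errorCount}}{\text{numTestVecs}} = \frac{5}{100} = 0.05
```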
