Notes on "Machine Learning in Action", Chapter 2 (2)
2.2 Dating-site matching
Code implementation
The dataset provided by the original author has string labels rather than ints, which causes a problem when running the code; two solutions are given below. Here is the book's original source code:
# Parse the dating data text file into a NumPy matrix (book's original code)
from numpy import *

def file2matrix(filename):
    fr = open(filename)
    arrayOlines = fr.readlines()
    # get the number of lines in the file
    numberOfLines = len(arrayOlines)
    # create the NumPy matrix to return
    returnMat = zeros((numberOfLines, 3))
    classLabelVector = []
    index = 0
    # parse each line of the file into the list and the label vector
    for line in arrayOlines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index, :] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat, classLabelVector
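For reference, the problem shows up in the final append: with the string-labelled datingTestSet.txt, int() cannot parse the label. Assuming the label column contains a string such as 'large_Doses' (the exact string depends on your data file), the failure looks like this:

>>> int('large_Doses')
ValueError: invalid literal for int() with base 10: 'large_Doses'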
Solution ①
Replace classLabelVector.append(int(listFromLine[-1])) with an explicit string-to-int mapping:
# Parse the dating data text file into a NumPy matrix (string labels mapped to ints)
def file2matrix(filename):
    fr = open(filename)
    arrayOlines = fr.readlines()
    # get the number of lines in the file
    numberOfLines = len(arrayOlines)
    # create the NumPy matrix to return
    returnMat = zeros((numberOfLines, 3))
    classLabelVector = []
    index = 0
    # parse each line of the file into the list and the label vector
    for line in arrayOlines:
        line = line.strip()
        listFromLine = line.split('\t')
        returnMat[index, :] = listFromLine[0:3]
        # these strings must match the label column of your data file exactly
        if listFromLine[-1] == 'did_not_Like':
            classLabelVector.append(1)
        elif listFromLine[-1] == 'small_Doses':
            classLabelVector.append(2)
        elif listFromLine[-1] == 'large_Doses':
            classLabelVector.append(3)
        index += 1
    return returnMat, classLabelVector
Note: in Python 2 you can call reload() directly, but in Python 3 you must first import importlib!
In ipython:
>>>import importlib
>>>importlib.reload(kNN)
>>>datingDataMat, datingLabels = kNN.file2matrix('datingTestSet.txt')
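As an aside, the if/elif chain can also be written as a dictionary lookup. The following is only a sketch: the function name file2matrix_dict is made up here, and the label strings are assumed to match the data file exactly.

from numpy import zeros

def file2matrix_dict(filename):
    # hypothetical variant of file2matrix: map label strings to ints via a dict
    labelMap = {'did_not_Like': 1, 'small_Doses': 2, 'large_Doses': 3}
    fr = open(filename)
    arrayOlines = fr.readlines()
    returnMat = zeros((len(arrayOlines), 3))
    classLabelVector = []
    for index, line in enumerate(arrayOlines):
        listFromLine = line.strip().split('\t')
        returnMat[index, :] = listFromLine[0:3]
        classLabelVector.append(labelMap[listFromLine[-1]])
    return returnMat, classLabelVector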
Solution ②
The code is the same as the book's original file2matrix shown above; only the data file changes.
In ipython, load datingTestSet2.txt, in which the labels have already been converted to ints:
>>>datingDataMat, datingLabels = kNN.file2matrix('datingTestSet2.txt')
Print datingDataMat and datingLabels:
In[1]: datingDataMat
Out[1]:
array([[4.0920000e+04, 8.3269760e+00, 9.5395200e-01],
[1.4488000e+04, 7.1534690e+00, 1.6739040e+00],
[2.6052000e+04, 1.4418710e+00, 8.0512400e-01],
...,
[2.6575000e+04, 1.0650102e+01, 8.6662700e-01],
[4.8111000e+04, 9.1345280e+00, 7.2804500e-01],
[4.3757000e+04, 7.8826010e+00, 1.3324460e+00]])
In[2]: datingLabels[0:20]
Out[2]: [3, 2, 1, 1, 1, 1, 3, 3, 1, 3, 1, 1, 2, 1, 1, 1, 1, 1, 2, 3]
Creating a scatter plot
Creating the scatter plot requires importing matplotlib (array comes from NumPy):
from numpy import array
import matplotlib
import matplotlib.pyplot as plt
Build the plot:
fig=plt.figure()
ax=fig.add_subplot(111)
ax.scatter(datingDataMat[:,1],datingDataMat[:,2],15.0*array(datingLabels),15.0*array(datingLabels))
plt.show()
The result is shown in the figure: the horizontal axis is the percentage of time spent playing video games, and the vertical axis is the liters of ice cream consumed per week.
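To make the figure easier to read, the points can be coloured by class and the axes labelled. A minimal sketch, assuming datingDataMat and datingLabels have already been loaded as above:

from numpy import array
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111)
# size and colour each point by its class label (1, 2 or 3)
ax.scatter(datingDataMat[:, 1], datingDataMat[:, 2],
           s=15.0 * array(datingLabels), c=array(datingLabels))
ax.set_xlabel('percentage of time spent playing video games')
ax.set_ylabel('liters of ice cream consumed per week')
plt.show()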
Special reminder: if you change the book's classLabelVector.append(int(listFromLine[-1])) to classLabelVector.append(listFromLine[-1]), the labels remain strings and later steps fail in unexpected ways; use one of the two solutions described in this article instead.
Normalizing the data
The book's code here is correct as-is; it is worth writing it out by hand once. Each feature is rescaled to [0, 1] with newValue = (oldValue - min) / (max - min).
def autoNorm(dataSet):
    # column-wise minima and maxima (the parameter 0 means along each column)
    minVals = dataSet.min(0)
    maxVals = dataSet.max(0)
    ranges = maxVals - minVals
    normDataSet = zeros(shape(dataSet))
    m = dataSet.shape[0]
    # subtract the column minima from every row
    normDataSet = dataSet - tile(minVals, (m, 1))
    # element-wise division by the feature ranges
    normDataSet = normDataSet / tile(ranges, (m, 1))
    return normDataSet, ranges, minVals
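As a side note, NumPy broadcasting gives the same result without tile. A minimal sketch of an equivalent version (the name autoNorm_broadcast is made up here):

def autoNorm_broadcast(dataSet):
    # same result as autoNorm above, relying on broadcasting instead of tile
    minVals = dataSet.min(0)
    maxVals = dataSet.max(0)
    ranges = maxVals - minVals
    # the length-3 row vectors are broadcast across all m rows automatically
    normDataSet = (dataSet - minVals) / ranges
    return normDataSet, ranges, minVals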
Verifying the classifier as a complete program
Source code:
def datingClassTest():
    hoRatio = 0.10          # hold out 10% of the data as the test set
    datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    m = normMat.shape[0]
    numTestVecs = int(m * hoRatio)
    errorCount = 0.0
    for i in range(numTestVecs):
        classifierResult = classify0(normMat[i, :], normMat[numTestVecs:m, :],
                                     datingLabels[numTestVecs:m], 3)
        print("the classifier came back with: %d, the real answer is: %d"
              % (classifierResult, datingLabels[i]))
        if classifierResult != datingLabels[i]:
            errorCount += 1.0
    print("the total error rate is: %f" % (errorCount / float(numTestVecs)))
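If you want to see how the choice of k affects the result, a small variant that returns the error rate instead of printing every prediction can be looped over several k values. This is only a sketch (the name datingClassTestK is made up) and assumes file2matrix, autoNorm and classify0 are defined in kNN.py as above:

def datingClassTestK(k, hoRatio=0.10):
    # hold-out test identical to datingClassTest, but with k as a parameter
    datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    m = normMat.shape[0]
    numTestVecs = int(m * hoRatio)
    errorCount = 0.0
    for i in range(numTestVecs):
        result = classify0(normMat[i, :], normMat[numTestVecs:m, :],
                           datingLabels[numTestVecs:m], k)
        if result != datingLabels[i]:
            errorCount += 1.0
    return errorCount / float(numTestVecs)

for k in (1, 3, 5, 7):
    print("k = %d, error rate = %f" % (k, datingClassTestK(k)))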
Switch to ipython:
In[1]: import kNN
In[2]: datingDataMat,datingLabels = kNN.file2matrix('datingTestSet2.txt')
In[3]: normMat, ranges, minVals = kNN.autoNorm(datingDataMat)
In[4]: normMat
Out[4]:
array([[0.44832535, 0.39805139, 0.56233353],
[0.15873259, 0.34195467, 0.98724416],
[0.28542943, 0.06892523, 0.47449629],
...,
[0.29115949, 0.50910294, 0.51079493],
[0.52711097, 0.43665451, 0.4290048 ],
[0.47940793, 0.3768091 , 0.78571804]])
In[5]: ranges
Out[5]: array([9.1273000e+04, 2.0919349e+01, 1.6943610e+00])
In[6]: minVals
Out[6]: array([0. , 0. , 0.001156])
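A quick sanity check, run in the same session (a sketch; normMat is the matrix just computed): after autoNorm every feature should lie in [0, 1].

# every column minimum should now be 0 and every column maximum 1
print(normMat.min(0))   # expected: a vector of zeros
print(normMat.max(0))   # expected: a vector of ones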
Building the complete system
Source code:
def classifyPerson():
    resultList = ['not at all', 'in small doses', 'in large doses']
    percentTats = float(input("percentage of time spent playing video games?"))
    ffMiles = float(input("frequent flier miles earned per year?"))
    iceCream = float(input("liters of ice cream consumed per year?"))
    datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    inArr = array([ffMiles, percentTats, iceCream])
    classifierResult = classify0((inArr - minVals) / ranges, normMat, datingLabels, 3)
    print("You will probably like this person: ", resultList[classifierResult - 1])
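For testing it can be handy to have a variant that takes the three features as arguments instead of reading them with input(). This is only a sketch (classifyPersonArgs is a made-up name) and assumes it sits in kNN.py next to the functions above, where from numpy import * is already in effect:

def classifyPersonArgs(ffMiles, percentTats, iceCream):
    # same logic as classifyPerson, but without interactive input()
    resultList = ['not at all', 'in small doses', 'in large doses']
    datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    inArr = array([ffMiles, percentTats, iceCream])
    classifierResult = classify0((inArr - minVals) / ranges, normMat, datingLabels, 3)
    return resultList[classifierResult - 1]

Calling, for example, kNN.classifyPersonArgs(10000, 10, 0.5) then returns one of the three strings directly.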
Switch to ipython and run the hold-out test from the previous section:
In[1]: import kNN
In[2]: kNN.datingClassTest()
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 3, the real answer is: 2
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 3, the real answer is: 1
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 3, the real answer is: 1
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 2, the real answer is: 3
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 3, the real answer is: 1
the total error rate is: 0.050000