Testing and storing a decision tree, with an example: predicting contact lens type with a decision tree

Testing the algorithm: classifying with a decision tree

def classify(inputTree, featLabels, testVec):
    # Example: classify(myTree, labels, [1, 0]) returns 'no', with
    #   labels = ['no surfacing', 'flippers']
    #   myTree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
    firstStr = list(inputTree.keys())[0]    # feature at the root, e.g. 'no surfacing'
    secondDict = inputTree[firstStr]        # its branches: {0: 'no', 1: {'flippers': {...}}}
    featIndex = featLabels.index(firstStr)  # position of that feature in testVec, e.g. 0
    classLabel = None                       # stays None if no branch matches the feature value
    for key in secondDict.keys():           # look for the branch matching the test vector
        if testVec[featIndex] == key:
            if isinstance(secondDict[key], dict):  # subtree: keep descending
                classLabel = classify(secondDict[key], featLabels, testVec)
            else:                                  # leaf: this is the final class label
                classLabel = secondDict[key]
    return classLabel
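To see the recursion at work, here is a small self-contained sketch that runs a compact copy of the classifier on the two-feature tree from the comments. The names `my_tree` and `labels` are only illustrative, and the function is restated here so the snippet runs on its own.

```python
def classify(input_tree, feat_labels, test_vec):
    """Walk the nested-dict tree until a leaf (non-dict value) is reached."""
    first_str = next(iter(input_tree))           # feature tested at this node
    second_dict = input_tree[first_str]          # branches keyed by feature value
    feat_index = feat_labels.index(first_str)    # where that feature sits in test_vec
    class_label = None                           # returned if no branch matches
    for key, subtree in second_dict.items():
        if test_vec[feat_index] == key:
            if isinstance(subtree, dict):
                class_label = classify(subtree, feat_labels, test_vec)
            else:
                class_label = subtree
    return class_label


my_tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
labels = ['no surfacing', 'flippers']

print(classify(my_tree, labels, [1, 0]))  # surfaces, no flippers
print(classify(my_tree, labels, [1, 1]))  # surfaces, has flippers
print(classify(my_tree, labels, [0, 1]))  # decided at the root, flippers never checked
```

The order of `labels` has to match the column order of the test vector, because the tree only stores feature names, not column positions.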

Using the algorithm: storing the decision tree

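pickle.dump(obj, file, protocol=None)
Note: serializes the object obj and saves it to the file object file.
The protocol parameter selects the serialization format. Protocol 0 is the original human-readable ASCII format; 1 and 2 are binary formats (1 is the old binary protocol, 2 is the newer one). Python 2 defaults to protocol 0, but Python 3 defaults to a binary protocol (4 since Python 3.8), and in Python 3 pickle always writes bytes, whatever the protocol.
file is the file-like object to save to. It must have a write() method that accepts bytes, so it can be a file opened with 'wb', an io.BytesIO object, or any other object that implements such a write() method.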
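As a quick illustration of the protocol and file arguments, the sketch below pickles a small tree into an in-memory `io.BytesIO` buffer and compares protocol 0 with protocol 2. The variable names are illustrative only.

```python
import io
import pickle

tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}

# Any object with a bytes-accepting write() works as the target; BytesIO is one.
buf = io.BytesIO()
pickle.dump(tree, buf, protocol=pickle.HIGHEST_PROTOCOL)
buf.seek(0)                      # rewind before reading back
restored = pickle.load(buf)

# Protocol 0 produces ASCII-only bytes; protocols >= 2 start with the 0x80 opcode.
text_like = pickle.dumps(tree, protocol=0)
binary = pickle.dumps(tree, protocol=2)

print(restored == tree)
print(all(b < 128 for b in text_like))
print(hex(binary[0]))
```

That leading 0x80 byte is the one a text-mode reader trips over when it tries to decode a pickle file as text.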

Saving the learned result, the decision tree, to disk saves trouble: the tree does not have to be rebuilt every time it is needed.

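# Storing the decision tree: Python's pickle module serializes the tree object to disk.
# It can then be loaded whenever needed; with a large data set this saves the time
# it takes to build the tree.
def storeTree(inputTree, filename):
    import pickle
    # Open in binary write mode. The book uses 'w', which in Python 3 fails with
    # "write() argument must be str, not bytes", so it is changed to 'wb' here.
    with open(filename, 'wb') as fw:
        pickle.dump(inputTree, fw)  # write the tree; the with block closes the file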

# Loading the stored decision tree
def grabTree(filename):
    import pickle
    # The data was written in binary mode, so read it back in binary mode with 'rb'
    with open(filename, 'rb') as fr:
        return pickle.load(fr)

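Problem
TypeError: write() argument must be str, not bytes

Fix: change fw = open(filename,'w') to fw = open(filename,'wb'). In Python 3, pickle writes bytes, and a file opened in text mode only accepts str.

Problem
UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 0

Fix: change fr = open(filename) to fr = open(filename,'rb'). Without 'rb' the file is opened in text mode and Python tries to decode the binary pickle data with the system's default encoding (gbk here).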
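Both fixes can be checked with a short round trip. This is a sketch that writes to a temporary directory; the path and variable names are illustrative, not part of trees.py.

```python
import os
import pickle
import tempfile

tree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
path = os.path.join(tempfile.mkdtemp(), 'classifierStorage.txt')

# Text mode: pickle hands bytes to a file that only accepts str.
text_mode_error = None
try:
    with open(path, 'w') as fw:
        pickle.dump(tree, fw)
except TypeError as err:
    text_mode_error = str(err)
print(text_mode_error)

# Binary mode for both writing and reading: the round trip succeeds.
with open(path, 'wb') as fw:
    pickle.dump(tree, fw)
with open(path, 'rb') as fr:
    restored = pickle.load(fr)
print(restored == tree)
```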

Result

>>> import trees
>>> from importlib import reload
>>> reload(trees)
<module 'trees' from 'E:\\Python\\trees.py'>
>>> import treePlotter
>>> myTree=treePlotter.retrieveTree(0)
>>> trees.storeTree(myTree,'classifierStorage.txt')
>>> trees.grabTree('classifierStorage.txt')
{'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}

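Example: predicting contact lens type with a decision tree
This example shows how a decision tree predicts the type of contact lenses a patient should wear.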

>>> import trees
>>> from importlib import reload
>>> reload(trees)
<module 'trees' from 'E:\\Python\\trees.py'>
>>> fr=open('lenses.txt')  # open the text data
>>> lenses=[inst.strip().split('\t') for inst in fr.readline()]  # intended to split each data line on tabs and store the rows in lenses
>>> lensesLabels=['age','prescript','astigmatic','tearRate']
>>> lensesTree = trees.createTree(lenses,lensesLabels)
>>> lensesTree
''

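This was infuriating. I read the tree-building function several times and could not find the problem. Where was my tree? Looking back, the culprit is in the session rather than in createTree: it calls fr.readline(), which returns only the first line as a string, so the comprehension iterates over single characters instead of lines. I ended up writing a new function in the file instead, one that uses fr.readlines(). The command line is too tedious anyway, since everything has to be copied and pasted each time.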
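The difference between `readline()` and `readlines()` is easy to reproduce on an in-memory sample. The two sample rows below are hypothetical stand-ins for lenses.txt, and the last line assumes that createTree falls back to a majority vote over the last column once only one column is left, as the book's implementation does.

```python
import io
from collections import Counter

# Hypothetical two-row sample in the tab-separated layout of lenses.txt.
sample = ("young\tmyope\tno\treduced\tno lenses\n"
          "pre\thyper\tyes\tnormal\tno lenses\n")

# readline() returns ONE string; iterating over it yields single characters.
fr = io.StringIO(sample)
wrong = [inst.strip().split('\t') for inst in fr.readline()]

# readlines() returns a list of lines; iterating over it yields whole rows.
fr = io.StringIO(sample)
right = [inst.strip().split('\t') for inst in fr.readlines()]

print(wrong[:6])   # one-element rows, one per character
print(right)       # two rows of five fields each

# Tabs, spaces and the newline all strip to '', which becomes the majority "label".
majority = Counter(row[-1] for row in wrong).most_common(1)[0][0]
print(repr(majority))
```

With one-column rows there is nothing left to split on, so an empty string coming back as the "tree" is consistent with the session output.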

def predictLensesType(filename):
    # Read every line of the tab-separated data file into a list of rows
    with open(filename) as fr:
        lenses = [inst.strip().split('\t') for inst in fr.readlines()]
    lensesLabels = ['age', 'prescript', 'astigmatic', 'tearRate']
    lensesTree = createTree(lenses, lensesLabels)
    return lensesTree

Calling it this way works without any trouble, so createTree itself was never the problem.

>>> import trees
>>> from importlib import reload
>>> reload(trees)
<module 'trees' from 'E:\\Python\\trees.py'>
>>> trees.predictLensesType('lenses.txt')
{'tearRate': {'normal': {'astigmatic': {'no': {'age': {'pre': 'soft', 'young': 'soft', 'presbyopic': {'prescript': {'hyper': 'soft', 'myope': 'no lenses'}}}}, 'yes': {'prescript': {'hyper': {'age': {'pre': 'no lenses', 'young': 'hard', 'presbyopic': 'no lenses'}}, 'myope': 'hard'}}}}, 'reduced': 'no lenses'}}
>>> import treePlotter
>>> treePlotter.createPlot(trees.predictLensesType('lenses.txt'))

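There is still an extra line in the plot, and I could not find a way to get rid of it.
[Figure: the lenses decision tree drawn by treePlotter.createPlot]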
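Once the tree exists, it can be used to classify a patient. The sketch below copies the tree printed above and walks it with a small iterative helper; `walk` is a hypothetical name introduced here, not a function from trees.py, and it raises `KeyError` for a feature value the tree has never seen.

```python
lenses_tree = {'tearRate': {'normal': {'astigmatic': {
    'no': {'age': {'pre': 'soft', 'young': 'soft',
                   'presbyopic': {'prescript': {'hyper': 'soft',
                                                'myope': 'no lenses'}}}},
    'yes': {'prescript': {'hyper': {'age': {'pre': 'no lenses',
                                            'young': 'hard',
                                            'presbyopic': 'no lenses'}},
                          'myope': 'hard'}}}},
    'reduced': 'no lenses'}}

# Use a fresh label list in the same column order as the data file.
lenses_labels = ['age', 'prescript', 'astigmatic', 'tearRate']


def walk(tree, feat_labels, sample):
    """Follow the branch matching each tested feature until a leaf is reached."""
    node = tree
    while isinstance(node, dict):
        feature = next(iter(node))                   # feature tested at this node
        value = sample[feat_labels.index(feature)]   # the sample's value for it
        node = node[feature][value]                  # descend into that branch
    return node


print(walk(lenses_tree, lenses_labels, ['young', 'myope', 'no', 'reduced']))
print(walk(lenses_tree, lenses_labels, ['presbyopic', 'hyper', 'no', 'normal']))
print(walk(lenses_tree, lenses_labels, ['young', 'hyper', 'yes', 'normal']))
```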
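The tree matches the data with too many branches, a problem known as overfitting. To reduce overfitting, the decision tree can be pruned by removing leaf nodes that are not needed. If a leaf node adds only a little information, it can be deleted and merged into other leaf nodes.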
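As a rough sketch of the merging idea, the function below handles only the simplest case: a split whose branches all end in the same label adds no information, so it is replaced by that label. This is a simplification chosen for illustration, not the pruning method the book goes on to describe, and `collapse_uniform` is a hypothetical name.

```python
def collapse_uniform(tree):
    """Return a copy of the tree in which any split whose branches all lead
    to the same leaf label is replaced by that label."""
    if not isinstance(tree, dict):
        return tree                                   # already a leaf
    feature = next(iter(tree))
    # Prune the children first so that collapses can propagate upwards.
    branches = {value: collapse_uniform(sub)
                for value, sub in tree[feature].items()}
    leaves = list(branches.values())
    all_leaves = all(not isinstance(leaf, dict) for leaf in leaves)
    if all_leaves and len(set(leaves)) == 1:
        return leaves[0]                              # the split told us nothing
    return {feature: branches}


redundant = {'a': {0: 'no', 1: {'b': {0: 'yes', 1: 'yes'}}}}
print(collapse_uniform(redundant))
```

A real pruning step would also merge leaves that differ but add little information, which requires a quality measure such as error on held-out data.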
