Machine Learning in Action Notes, Part 1 Classification Algorithms: Decision Trees (3)

        Constructing the classifier:

After constructing the decision tree, we can use it for actual classification. Classification requires both the decision tree and the feature-label list used to build it. The program compares the test data against the values on the decision tree and recurses until it reaches a leaf node; the test data is then assigned the class of that leaf node.

def classify(inputTree, featLabels, testVec):
    #featLabels is the list of feature names used when the tree was built
    firstStr = list(inputTree.keys())[0]
    secondDict = inputTree[firstStr]
    #Find where the splitting feature is stored in the test vector
    featIndex = featLabels.index(firstStr)
    #Returned unchanged if the test value matches no branch
    classLabel = None
    for key in secondDict.keys():
        if testVec[featIndex] == key:
            if isinstance(secondDict[key], dict):
                #Internal node: recurse into the subtree
                classLabel = classify(secondDict[key], featLabels, testVec)
            else:
                #Leaf node: its value is the class label
                classLabel = secondDict[key]
    return classLabel
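A quick standalone check of the classifier. The tree below is the small "fish" example tree built earlier in this series; `classify` is repeated here (in compact form) only so the snippet runs on its own.

```python
def classify(inputTree, featLabels, testVec):
    # Walk the tree until a leaf is reached; return its class label
    firstStr = list(inputTree.keys())[0]
    secondDict = inputTree[firstStr]
    featIndex = featLabels.index(firstStr)
    classLabel = None  # stays None if no branch matches
    for key in secondDict:
        if testVec[featIndex] == key:
            if isinstance(secondDict[key], dict):
                classLabel = classify(secondDict[key], featLabels, testVec)
            else:
                classLabel = secondDict[key]
    return classLabel

# Example tree from the earlier parts of the series
myTree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
labels = ['no surfacing', 'flippers']

print(classify(myTree, labels, [1, 0]))  # 'no'
print(classify(myTree, labels, [1, 1]))  # 'yes'
```

A test vector such as `[1, 0]` first routes down the `no surfacing` branch for value 1, then down the `flippers` branch for value 0, ending at the leaf labeled 'no'.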

    


    We have now created a decision tree classifier, but the tree must be rebuilt every time we want to classify. Next, let's learn how to store the decision tree classifier on disk.

    Using the Algorithm: Storing the Decision Tree

    Constructing a decision tree is time-consuming, so to save computing time it is best to reuse an already-constructed tree each time the classifier runs. To do this, we use the Python pickle module to serialize the object. Serialized objects can be saved on disk and read back when needed. Any object can be serialized, and dictionary objects are no exception.

import pickle

def storeTree(inputTree, filename):
    #pickle writes a binary stream, so the file must be opened in binary mode;
    #the with statement also closes the file automatically
    with open(filename, 'wb') as fw:
        pickle.dump(inputTree, fw)

def grabTree(filename):
    with open(filename, 'rb') as fr:
        return pickle.load(fr)
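A minimal round-trip sketch of the same idea: dump a tree dictionary to disk with pickle and read it back. The filename is arbitrary here; `tempfile` is used only so the sketch does not clutter the working directory.

```python
import os
import pickle
import tempfile

# Example tree from earlier in the series
myTree = {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}

path = os.path.join(tempfile.gettempdir(), 'classifierStorage.txt')
with open(path, 'wb') as fw:   # binary mode is required by pickle
    pickle.dump(myTree, fw)
with open(path, 'rb') as fr:
    restored = pickle.load(fr)

print(restored == myTree)  # True: the tree survives the disk round trip
```

Because the tree is just a nested dictionary, the restored object is equal to the original and can be passed straight to classify().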

    

The classifier is now stored on the hard drive and does not have to be relearned every time it is used.

    Example: Using a Decision Tree to Predict Contact Lens Type

(1) Collect data: the provided text file

(2) Prepare data: parse the tab-separated data rows

(3) Analyze data: quickly check the data to make sure it was parsed correctly, and use the createPlot() function to draw the final tree

(4) Train the algorithm: use the createTree() function from earlier

(5) Test the algorithm: write a test function to verify that the decision tree correctly classifies a given data instance

(6) Use the algorithm: store the tree's data structure so the tree does not need to be reconstructed the next time it is used

#createTree() is the tree-building function from the previous part of this series
with open('lenses.txt') as fr:
    #strip() removes whitespace from both ends of each line
    lenses = [inst.strip().split('\t') for inst in fr.readlines()]
print(lenses)
print('*****')
lensesLabels = ['age', 'prescript', 'astigmatic', 'tearRate']
lensesTree = createTree(lenses, lensesLabels)
print(lensesTree)


    The final decision tree diagram, drawn with createPlot(): [figure omitted]

    A doctor needs only four questions to determine which type of lenses a patient should wear. However, the tree matches the training data so closely that it has many branches; this problem is called overfitting. To reduce overfitting, we can prune the decision tree: if a leaf node adds only a little information, we delete it and merge it into a neighboring leaf. We will discuss this further in Chapter 9.
