Sklearn: using a decision tree to predict the type of contact lenses

import pandas as pd
import pydotplus
from io import StringIO  # sklearn.externals.six has been removed from recent scikit-learn versions
# LabelEncoder: converts string labels to integer codes
# OneHotEncoder: converts categorical values using the one-of-K (one-hot) scheme
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn import tree

if __name__ == '__main__':
    with open('lenses.txt', 'r') as fr:                                  # load the file
        lenses = [inst.strip().split('\t') for inst in fr.readlines()]   # split each line on tabs
    lenses_target = []                                                   # class label of each sample
    for each in lenses:
        lenses_target.append(each[-1])
    print(lenses_target)

    lensesLabels = ['age', 'prescript', 'astigmatic', 'tearRate']        # feature labels
    lenses_list = []                                                     # temporary list holding one column
    lenses_dict = {}                                                     # dictionary used to build the pandas DataFrame
    for each_label in lensesLabels:                                      # collect the values column by column
        for each in lenses:
            lenses_list.append(each[lensesLabels.index(each_label)])
        lenses_dict[each_label] = lenses_list
        lenses_list = []
    # print(lenses_dict)                                                 # print the dictionary
    lenses_pd = pd.DataFrame(lenses_dict)                                # build the pandas.DataFrame
    # print(lenses_pd)                                                   # print the pandas.DataFrame
    le = LabelEncoder()                                                  # LabelEncoder object used for encoding
    for col in lenses_pd.columns:                                        # encode each column as integers
        lenses_pd[col] = le.fit_transform(lenses_pd[col])
    # print(lenses_pd)                                                   # print the encoded data

    clf = tree.DecisionTreeClassifier(max_depth=4)                       # create the DecisionTreeClassifier
    clf = clf.fit(lenses_pd.values.tolist(), lenses_target)              # fit the decision tree to the data
    dot_data = StringIO()
    tree.export_graphviz(clf, out_file=dot_data,                         # export the tree in DOT format
                         feature_names=list(lenses_pd.keys()),
                         class_names=list(clf.classes_),
                         filled=True, rounded=True,
                         special_characters=True)
    graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
    graph.write_pdf("tree.pdf")                                          # save the drawn tree as a PDF
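
One thing to keep in mind with the encoding loop above: a single LabelEncoder is re-fitted on every column, so after the loop it only remembers the mapping of the last column. If new samples have to be encoded later for prediction, one option is to keep one encoder per column. The following is only a minimal sketch under that assumption; the four-row data set and the helper `predict_one` are made up for illustration and are not part of the original code.

```python
import pandas as pd
from sklearn import tree
from sklearn.preprocessing import LabelEncoder

# Hypothetical miniature data set (NOT the real lenses.txt).
raw = pd.DataFrame({
    "age": ["young", "young", "pre", "presbyopic"],
    "tearRate": ["reduced", "normal", "normal", "reduced"],
})
target = ["no lenses", "soft", "soft", "no lenses"]

# Keep one fitted encoder per column so the mappings are not lost.
encoders = {col: LabelEncoder().fit(raw[col]) for col in raw.columns}
encoded = pd.DataFrame({col: encoders[col].transform(raw[col]) for col in raw.columns})

clf = tree.DecisionTreeClassifier(max_depth=4, random_state=0)
clf.fit(encoded.values, target)


def predict_one(sample):
    """Encode a dict of raw string values with the stored encoders and classify it."""
    row = [[encoders[col].transform([sample[col]])[0] for col in raw.columns]]
    return clf.predict(row)[0]


print(predict_one({"age": "young", "tearRate": "normal"}))
```

In this toy data the class depends only on tearRate, so the sample above falls into the branch of the "normal" tear rate.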

1, The data set (leave a message if you need the data set)

  The data set contains 24 samples. The columns are, in order, age, prescript, astigmatic, tearRate, and class: the first column is the age, the second is the prescription (the symptom), the third is whether the patient is astigmatic, the fourth is the tear production rate, and the fifth is the final classification label. The data is shown below:
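
The column layout described here can also be read straight into a DataFrame. This is a rough sketch, assuming the file is tab-separated with no header row; the three sample rows, the `COLUMNS` list, and the `load_lenses` helper are illustrative assumptions rather than a copy of the real data file.

```python
import io
import pandas as pd

# Illustrative rows in the described layout (tab-separated, class label last).
# These values are assumptions for the demo, not a copy of lenses.txt.
SAMPLE = (
    "young\tmyope\tno\treduced\tno lenses\n"
    "young\tmyope\tno\tnormal\tsoft\n"
    "young\tmyope\tyes\tnormal\thard\n"
)

COLUMNS = ["age", "prescript", "astigmatic", "tearRate", "class"]


def load_lenses(source):
    """Read tab-separated lens records from a path or a file-like object."""
    return pd.read_csv(source, sep="\t", header=None, names=COLUMNS)


lenses_df = load_lenses(io.StringIO(SAMPLE))
features = lenses_df[COLUMNS[:-1]]  # the first four columns are the features
target = lenses_df["class"]         # the fifth column is the class label
print(lenses_df)
```

Passing a file name such as 'lenses.txt' instead of the StringIO object should work the same way, provided the file follows this layout.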

 

2, Visualizing the decision tree with Graphviz

  Graphviz is a graph-drawing tool developed by AT&T Labs Research. It makes it easy to draw structured graphs and networks, supports multiple output formats, and generates images with good quality and speed. Its input is a script written in the DOT graph description language; Graphviz parses the script, identifies the nodes, edges, and subgraphs in it, and draws them according to their attributes. Sklearn exports a decision tree in DOT format, so we can use Graphviz directly to visualize the tree.
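
To see what such a DOT script looks like, the exported text can be printed before it is handed to Graphviz. A minimal sketch, assuming a made-up two-sample training set with a single integer-coded feature (the data and the feature name used here are only for illustration):

```python
from sklearn import tree

# Tiny hypothetical training set: one integer-coded feature, two classes.
X = [[0], [1]]
y = ["no lenses", "soft"]

clf = tree.DecisionTreeClassifier(random_state=0).fit(X, y)

# With out_file=None, export_graphviz returns the DOT source as a string
# instead of writing it to a file.
dot_source = tree.export_graphviz(
    clf,
    out_file=None,
    feature_names=["tearRate"],
    class_names=list(clf.classes_),
    filled=True,
    rounded=True,
)
print(dot_source)
```

The printed text describes each tree node and each edge between nodes, which is the input Graphviz turns into a picture.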

  Before writing the code, you need to install pydotplus and Graphviz.

 

3, Installing Graphviz

  The Graphviz program itself cannot be installed with pip; it has to be installed manually. Download address: http://www.graphviz.org/Home.php

  Download the installation package and run it. After installation, you need to set the environment variable for Graphviz: add the Graphviz bin directory to your system Path variable. For example, if Graphviz is installed in the root directory of the D drive, add: D:\Graphviz\bin;
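
A quick way to check that the Path setting took effect is to ask Python where the Graphviz `dot` executable is. This is only a small sketch using the standard library; the helper name `find_graphviz_dot` is made up for this example.

```python
import shutil


def find_graphviz_dot():
    """Return the full path of the Graphviz `dot` executable, or None if it is not on PATH."""
    return shutil.which("dot")


dot_path = find_graphviz_dot()
if dot_path is None:
    print("Graphviz 'dot' was not found on PATH; check the bin directory and reopen the terminal.")
else:
    print(f"Graphviz found at: {dot_path}")
```

If the check fails right after editing the Path variable, opening a new terminal (or restarting the IDE) is usually needed before the change becomes visible.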

  Run the code. A PDF file named tree is generated in the same directory as the Python file; open it to see the visualized decision tree.

 


Origin www.cnblogs.com/fd-682012/p/11646230.html