import pandas as pd
import pydotplus
# sklearn.externals.six was removed from newer scikit-learn releases; io.StringIO works the same way here
from io import StringIO
# LabelEncoder: converts strings to incremental integer values
# OneHotEncoder: converts strings to integers using the one-of-K scheme
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn import tree

if __name__ == '__main__':
    with open('lenses.txt', 'r') as fr:  # load the file
        lenses = [inst.strip().split('\t') for inst in fr.readlines()]  # parse the file
    lenses_target = []  # class label of each sample
    for each in lenses:
        lenses_target.append(each[-1])
    print(lenses_target)

    lensesLabels = ['age', 'prescript', 'astigmatic', 'tearRate']  # feature labels
    lenses_list = []  # temporary list for one column of lenses data
    lenses_dict = {}  # dictionary of lenses data, used to build the pandas DataFrame
    for each_label in lensesLabels:  # extract each column into the dictionary
        for each in lenses:
            lenses_list.append(each[lensesLabels.index(each_label)])
        lenses_dict[each_label] = lenses_list
        lenses_list = []
    # print(lenses_dict)  # print the dictionary
    lenses_pd = pd.DataFrame(lenses_dict)  # build the pandas.DataFrame
    # print(lenses_pd)  # print the pandas.DataFrame
    le = LabelEncoder()  # create a LabelEncoder object for encoding
    for col in lenses_pd.columns:  # encode each column in turn
        lenses_pd[col] = le.fit_transform(lenses_pd[col])
    # print(lenses_pd)  # print the encoded data

    clf = tree.DecisionTreeClassifier(max_depth=4)  # create the DecisionTreeClassifier
    clf = clf.fit(lenses_pd.values.tolist(), lenses_target)  # build the decision tree from the data
    dot_data = StringIO()
    tree.export_graphviz(clf, out_file=dot_data,  # export the tree in DOT format
                         feature_names=lenses_pd.keys(),
                         class_names=clf.classes_,
                         filled=True, rounded=True,
                         special_characters=True)
    graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
    graph.write_pdf("tree.pdf")  # save the drawn tree as a PDF
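The label-encoding step in the listing above can be illustrated without scikit-learn. The sketch below is a pure-Python stand-in, not the library's implementation, and the helper name `label_encode` is hypothetical. It numbers the sorted distinct values of a column, which is the ordering LabelEncoder uses for its classes.

```python
def label_encode(values):
    """Map each distinct string to an integer by numbering the sorted distinct values."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes


# Example column of string values (illustrative only).
codes, classes = label_encode(["young", "pre", "presbyopic", "young"])
print(classes)  # ['pre', 'presbyopic', 'young']
print(codes)    # [2, 0, 1, 2]
```

The integer codes depend only on alphabetical order, so they carry no meaning beyond telling the values apart.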
1. The data set (leave a comment if you need the data set)
The data set contains 24 samples. The columns are, in order, age, prescript, astigmatic, tearRate, and class: the first column is the age, the second is the prescription (the symptom), the third is whether the patient is astigmatic, the fourth is the tear production rate, and the fifth is the final class label. The data looks as follows:
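As a rough sketch of how a file in this layout can be split into features and class labels, the snippet below parses tab-separated rows from an in-memory string. The three rows are illustrative samples in the assumed lenses.txt format, and `parse_lenses` and `COLUMNS` are hypothetical names introduced only for this example.

```python
import io

# Illustrative rows in the assumed lenses.txt layout: tab-separated, class label last.
SAMPLE = (
    "young\tmyope\tno\treduced\tno lenses\n"
    "young\tmyope\tno\tnormal\tsoft\n"
    "young\tmyope\tyes\tnormal\thard\n"
)
COLUMNS = ["age", "prescript", "astigmatic", "tearRate", "class"]


def parse_lenses(fileobj):
    """Return one dict per non-empty line, keyed by column name."""
    rows = []
    for line in fileobj:
        line = line.strip()
        if not line:
            continue
        rows.append(dict(zip(COLUMNS, line.split("\t"))))
    return rows


rows = parse_lenses(io.StringIO(SAMPLE))
features = [[r[c] for c in COLUMNS[:-1]] for r in rows]  # first four columns
targets = [r["class"] for r in rows]                      # fifth column
print(features[0])  # ['young', 'myope', 'no', 'reduced']
print(targets)      # ['no lenses', 'soft', 'hard']
```

To read the real file instead, pass an opened file object in place of the `io.StringIO` wrapper.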
2. Visualizing the decision tree with Graphviz
Graphviz is a graph-drawing tool developed by AT&T Labs Research. It makes it easy to draw structured graphs and networks, supports multiple output formats, and generates images quickly and with good quality. Its input is a script written in the DOT language: Graphviz parses the script into nodes, edges, and subgraphs and draws them according to their attributes. Because sklearn exports a decision tree in DOT format, we can use Graphviz to visualize the tree directly.
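To give a feel for what a DOT script looks like, here is a minimal sketch that writes one by hand for a toy tree. The tree itself is hypothetical (it is not the result of training on the lenses data), and `tree_to_dot` is a helper invented for this example. The DOT text that sklearn exports is more detailed but follows the same node and edge structure.

```python
def tree_to_dot(nodes, edges):
    """Render node labels and labelled edges as a DOT digraph string."""
    lines = ["digraph Tree {", "    node [shape=box];"]
    for node_id, label in nodes.items():
        lines.append(f'    {node_id} [label="{label}"];')
    for src, dst, label in edges:
        lines.append(f'    {src} -> {dst} [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical toy tree: a root test with one leaf and one further test.
nodes = {0: "tearRate", 1: "no lenses", 2: "astigmatic"}
edges = [(0, 1, "reduced"), (0, 2, "normal")]

dot_source = tree_to_dot(nodes, edges)
print(dot_source)
```

Saving this text to a file such as tree.dot and running `dot -Tpdf tree.dot -o tree.pdf` should render it, provided Graphviz is installed.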
Before writing the code, you need to install pydotplus and Graphviz.
3. Installing Graphviz
Graphviz cannot be installed with pip. It has to be installed manually. Download address: http://www.graphviz.org/Home.php
Download the installer and run it. After installation, add Graphviz to the Path system environment variable. For example, if Graphviz is installed in the root directory of the D drive, add: D:\Graphviz\bin;
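One way to confirm that the Path change took effect is to check whether the Graphviz `dot` executable can be found from Python. This is a small sketch using only the standard library, and the function name `find_graphviz_dot` is made up for this example.

```python
import shutil


def find_graphviz_dot():
    """Return the full path of the Graphviz 'dot' executable, or None if it is not on PATH."""
    return shutil.which("dot")


dot_path = find_graphviz_dot()
if dot_path is None:
    print("Graphviz 'dot' was not found on PATH; add the Graphviz bin directory.")
else:
    print("Graphviz 'dot' found at:", dot_path)
```

If the check fails right after editing the environment variable, opening a new terminal or restarting the IDE is usually needed before the updated Path is visible.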
Run the code. A PDF file named tree is generated in the same directory as the Python file. Open it to see the visualized decision tree.