The Road to Machine Learning with Python: Ensemble Classifiers (Random Forest and Gradient Boosting Decision Trees) on Titanic Survival Data

 

Using Python 3, this post exercises the scikit-learn API for random forest and gradient boosting decision tree classification, and compares their predictions against those of a single decision tree.

The code lives in my GitHub repository; you are welcome to refer to my other classifiers there: https://github.com/linyi0604/MachineLearning

 

import pandas as pd
# Note: sklearn.cross_validation was removed in scikit-learn 0.20;
# train_test_split now lives in sklearn.model_selection
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

'''
Ensemble classification: combine the prediction results of multiple classifiers.
This combination is generally done in one of two ways:
    1 Build several independent classification models and combine them by
        voting, as the random forest classifier does. A random forest builds
        many decision trees on the training data at the same time, and during
        construction each tree gives up the single optimal split rule and
        selects its splitting features at random.
        (A toy voting sketch follows this comment block.)
    2 Build several classification models in a fixed order, with dependencies
        between them, so that each newly added model must improve the combined
        performance of the models already in place. This assembles a fairly
        strong classifier out of several weaker ones, as the gradient boosting
        decision tree does: each new tree is grown to reduce the ensemble's
        remaining error in fitting the data.

Below, the predictions of a single decision tree, a random forest, and a
gradient boosting decision tree are compared.
'''
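# To make idea 1 above concrete, here is a toy majority-voting sketch
# (an illustration of my own, not part of the original walkthrough):
# three trees vote on three samples, and the majority label wins.
import numpy as np
votes = np.array([[0, 1, 1],    # predictions of tree 1 on three samples
                  [1, 1, 0],    # predictions of tree 2
                  [0, 1, 0]])   # predictions of tree 3
majority = (votes.sum(axis=0) * 2 > len(votes)).astype(int)
# majority == [0, 1, 0]: for each sample, the label most trees voted for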
'''
1 Prepare data
'''
# Read the Titanic passenger data, which was downloaded from the Internet ahead of time
titanic = pd.read_csv("./data/titanic/titanic.txt")
# Inspect the data; some values are missing
# print(titanic.head())

# Extract key features: pclass, age and sex are likely to affect survival
x = titanic[['pclass', 'age', 'sex']]
y = titanic['survived']
# View the currently selected features
# print(x.info())
'''
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1313 entries, 0 to 1312
Data columns (total 3 columns):
pclass    1313 non-null object
age        633 non-null float64
sex       1313 non-null object
dtypes: float64(1), object(2)
memory usage: 30.9+ KB
None
'''
# The age column has only 633 non-null values; fill the vacancies with the
# mean (the median is another common choice) before fitting the models
x = x.copy()  # work on a copy to avoid pandas chained-assignment warnings
x['age'].fillna(x['age'].mean(), inplace=True)
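# An equivalent alternative (a sketch of my own, assuming scikit-learn >= 0.20,
# where the sklearn.impute module exists; not executed here):
# from sklearn.impute import SimpleImputer
# x[['age']] = SimpleImputer(strategy='mean').fit_transform(x[['age']])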
'''
2 Data split
'''
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=33)
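# A quarter of the 1313 rows is held out for testing (329 samples, which
# matches the 'support' totals in the reports below); random_state=33 makes
# the split reproducible.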
# Use a feature converter for feature extraction
vec = DictVectorizer()
# String-valued (categorical) features are one-hot encoded; numeric features stay unchanged
x_train = vec.fit_transform(x_train.to_dict(orient="records"))
# print(vec.feature_names_)  # ['age', 'pclass=1st', 'pclass=2nd', 'pclass=3rd', 'sex=female', 'sex=male']
x_test = vec.transform(x_test.to_dict(orient="records"))
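# What DictVectorizer produces, on a toy record of my own (illustrative only):
# DictVectorizer(sparse=False).fit_transform([{'pclass': '1st', 'age': 29.0, 'sex': 'female'}])
# gives one 0/1 column per string value ('pclass=1st', 'sex=female') and
# passes the numeric 'age' value through unchanged.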
'''
3.1 Single decision tree: train the model and predict
'''
# Initialize the decision tree classifier
dtc = DecisionTreeClassifier()
# Train
dtc.fit(x_train, y_train)
# Predict and save the result
dtc_y_predict = dtc.predict(x_test)

'''
3.2 Random forest: train the model and predict
'''
# Initialize the random forest classifier
rfc = RandomForestClassifier()
# Train
rfc.fit(x_train, y_train)
# Predict
rfc_y_predict = rfc.predict(x_test)

'''
3.3 Gradient boosting decision tree: train the model and predict
'''
# Initialize the classifier
gbc = GradientBoostingClassifier()
# Train
gbc.fit(x_train, y_train)
# Predict
gbc_y_predict = gbc.predict(x_test)
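# Both ensembles expose tunable capacity; the settings below are illustrative
# values of my own choosing, not used in this script (it keeps the defaults):
# rfc = RandomForestClassifier(n_estimators=100)  # number of voting trees
# gbc = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
# # n_estimators: trees added in sequence; learning_rate damps each tree's contribution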

'''
4 Model evaluation
'''
print("Single decision tree accuracy:", dtc.score(x_test, y_test))
print("Other metrics:\n", classification_report(dtc_y_predict, y_test, target_names=['died', 'survived']))

print("Random forest accuracy:", rfc.score(x_test, y_test))
print("Other metrics:\n", classification_report(rfc_y_predict, y_test, target_names=['died', 'survived']))

print("Gradient boosted decision tree accuracy:", gbc.score(x_test, y_test))
print("Other metrics:\n", classification_report(gbc_y_predict, y_test, target_names=['died', 'survived']))
'''
Single decision tree accuracy: 0.7811550151975684
Other metrics:
              precision    recall  f1-score   support

        died       0.91      0.78      0.84       236
    survived       0.58      0.80      0.67        93

 avg / total       0.81      0.78      0.79       329

Random forest accuracy: 0.78419452887538
Other metrics:
              precision    recall  f1-score   support

        died       0.91      0.78      0.84       237
    survived       0.58      0.80      0.68        92

 avg / total       0.82      0.78      0.79       329

Gradient boosted decision tree accuracy: ...
Other metrics:
              precision    recall  f1-score   support

        died       0.92      0.78      0.84       239
    survived       0.58      0.82      0.68        90

 avg / total       0.83      0.79      0.80       329
'''
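The three scores above come from a single 25% hold-out split, so the exact ranking can vary between runs. As a rough sketch of a more stable comparison (my addition, assuming the x_train/y_train matrices built above and scikit-learn's model_selection module), each model can be scored with 5-fold cross-validation:

from sklearn.model_selection import cross_val_score

# Average accuracy over 5 folds of the vectorized training data;
# cross_val_score clones each estimator, so the fitted models are untouched
for name, model in [('decision tree', dtc), ('random forest', rfc),
                    ('gradient boosting', gbc)]:
    scores = cross_val_score(model, x_train, y_train, cv=5)
    print(name, scores.mean())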
