The Road to Machine Learning: A Practical XGBoost Boosted-Tree Classifier in Python

git: https://github.com/linyi0604/MachineLearning

The dataset was downloaded locally; you can get it from the git repository above.

XGBoost is a boosting classifier and belongs to the family of ensemble learning models. It combines hundreds of tree models, each with low classification accuracy on its own, and iterates continuously, generating a new tree at each iteration.
Below, the XGBoost model and another classifier (a random forest) are used to predict survival in the Titanic disaster, and their performance is compared.
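The iterative idea above (each boosting round adds one new tree) can be sketched with scikit-learn's GradientBoostingClassifier. This is only an illustration of the principle: the synthetic dataset and hyperparameter values are placeholders, not part of the original experiment.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (the post itself uses a local Titanic CSV)
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=33)

# 100 boosting rounds: each round fits one new tree to the errors
# of the ensemble built so far
gbc = GradientBoostingClassifier(n_estimators=100, random_state=42)
gbc.fit(X_train, y_train)

# staged_predict yields predictions after 1, 2, ..., 100 trees,
# showing how accuracy evolves as trees are added
for n, y_pred in enumerate(gbc.staged_predict(X_test), start=1):
    if n in (1, 10, 100):
        print(n, "trees -> accuracy:", accuracy_score(y_test, y_pred))
```

XGBoost follows the same additive scheme, with extra regularization and a more efficient implementation.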

 

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import DictVectorizer
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

'''
XGBoost is a boosting classifier and belongs to the ensemble learning family.
Hundreds of tree models with low individual classification accuracy are
combined iteratively; each iteration generates a new tree.
Here the XGBoost model is compared against another classifier on the
Titanic disaster prediction task.
'''

titanic = pd.read_csv("../data/titanic/titanic.txt")
# Extract pclass, age and sex as training features
x = titanic[["pclass", "age", "sex"]].copy()
y = titanic["survived"]
# Fill missing age values with the mean age
x["age"] = x["age"].fillna(x["age"].mean())

# Split training data and test data
x_train, x_test, y_train, y_test = train_test_split(x,
                                                    y,
                                                    test_size=0.25,
                                                    random_state=33)
# Extract dictionary features and vectorize them
vec = DictVectorizer()
x_train = vec.fit_transform(x_train.to_dict(orient="records"))
x_test = vec.transform(x_test.to_dict(orient="records"))

# Predict with a random forest using its default configuration
rfc = RandomForestClassifier()
rfc.fit(x_train, y_train)
print("Random forest prediction accuracy:", rfc.score(x_test, y_test))  # 0.7811550151975684

# Predict with the XGBoost model
xgbc = XGBClassifier()
xgbc.fit(x_train, y_train)
print("XGBoost prediction accuracy:", xgbc.score(x_test, y_test))  # 0.7872340425531915
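A single train/test split can be noisy, so a cross-validated comparison is more reliable. A minimal sketch, again on synthetic placeholder data since the Titanic CSV is local; the same `cross_val_score` call works for `XGBClassifier` if xgboost is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# 5-fold cross-validation: accuracy is averaged over five different splits
rfc = RandomForestClassifier(random_state=42)
scores = cross_val_score(rfc, X, y, cv=5)
print("Mean CV accuracy:", scores.mean())
```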

 
