Learning scikit-learn in Python

This post is mainly about learning the sklearn library; the experiments were run in Anaconda on Windows 10.

Requirements

  1. Create a classification dataset (n_samples >= 1000, n_features >= 10)
  2. Split the dataset using 10-fold cross validation
  3. Train the algorithms
    - GaussianNB
    - SVC (possible C values [1e-02, 1e-01, 1e00, 1e01, 1e02], RBF kernel)
    - RandomForestClassifier (possible n_estimators values [10, 100, 1000])
  4. Evaluate the cross-validated performance (see the sketch after this list)
    - Accuracy
    - F1-score
    - AUC ROC
  5. Write a short report summarizing the methodology and the results
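
To make requirements 2 and 4 concrete before the full script: cross_validate can run the 10-fold split and score every fold on accuracy, F1 and ROC AUC in one call, and the per-fold scores can then be averaged. This is only a minimal sketch; the StratifiedKFold splitter and random_state=0 are my own choices, not part of the assignment.

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate, StratifiedKFold
from sklearn.naive_bayes import GaussianNB

# Same dataset size as required; random_state fixed only for repeatability
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# One call returns per-fold accuracy, F1 and ROC AUC for the classifier
scores = cross_validate(GaussianNB(), X, y, cv=cv,
                        scoring=['accuracy', 'f1', 'roc_auc'])
print(scores['test_accuracy'].mean())
print(scores['test_f1'].mean())
print(scores['test_roc_auc'].mean())

The same call works unchanged for the SVC and RandomForestClassifier instances used below.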

The code is as follows:

from sklearn import datasets
from sklearn.model_selection import KFold  # sklearn.cross_validation was removed; KFold now lives in model_selection
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics

# Generate the classification dataset (binary by default)
X, y = datasets.make_classification(n_samples = 1000, n_features = 10)

# Use scikit-learn for K-fold cross-validation.
# Note: the classifiers below are trained after the loop, so they only see the
# split of the last fold; the reported metrics therefore come from that single
# train/test split, not from an average over the 10 folds (see the
# cross_validate sketch above for the averaged version).
kf = KFold(n_splits = 10, shuffle = True)
for train_index, test_index in kf.split(X):
    X_train, y_train = X[train_index], y[train_index]
    X_test, y_test = X[test_index], y[test_index]

# Gaussian naive Bayes
clf = GaussianNB()
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
acc_NB = metrics.accuracy_score(y_test, pred)
f1_NB = metrics.f1_score(y_test, pred)
# roc_auc_score is computed on hard 0/1 predictions here; passing
# predict_proba scores would give the conventional ROC AUC
auc_NB = metrics.roc_auc_score(y_test, pred)
print('NB:')
print(acc_NB)
print(f1_NB)
print(auc_NB)
print('--------------------')

# SVM with RBF kernel (one of the candidate C values)
clf = SVC(C = 1e-01, kernel = 'rbf', gamma = 0.1)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
acc_SVM = metrics.accuracy_score(y_test, pred)
f1_SVM = metrics.f1_score(y_test, pred)
auc_SVM = metrics.roc_auc_score(y_test, pred)
print('SVM:')
print(acc_SVM)
print(f1_SVM)
print(auc_SVM)
print('--------------------')

# Random forest (one of the candidate n_estimators values)
clf = RandomForestClassifier(n_estimators=10)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
acc_rf = metrics.accuracy_score(y_test, pred)
f1_rf = metrics.f1_score(y_test, pred)
auc_rf = metrics.roc_auc_score(y_test, pred)
print('random forest:')
print(acc_rf)
print(f1_rf)
print(auc_rf)
print('--------------------')
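
The script above fixes C = 0.1 for the SVC and n_estimators = 10 for the random forest, but the assignment lists several candidate values for each. One way to choose among those candidates is a grid search over the same 10-fold splits; the sketch below is only one possible setup, and the accuracy scoring metric and random_state are my own assumptions.

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# Search over the candidate C values from the assignment
svc_search = GridSearchCV(SVC(kernel='rbf'),
                          param_grid={'C': [1e-02, 1e-01, 1e00, 1e01, 1e02]},
                          scoring='accuracy', cv=cv)
svc_search.fit(X, y)
print(svc_search.best_params_, svc_search.best_score_)

# Search over the candidate n_estimators values from the assignment
rf_search = GridSearchCV(RandomForestClassifier(),
                         param_grid={'n_estimators': [10, 100, 1000]},
                         scoring='accuracy', cv=cv)
rf_search.fit(X, y)
print(rf_search.best_params_, rf_search.best_score_)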

Result screenshots (from an earlier run of the code)

[Three screenshots of the console output omitted]

Summary

Judging from several runs, the results of the three methods differ quite a bit from run to run, and it is hard to say for certain which method performs best; different data produce different predictions. Since each run only evaluates a single train/test split, some of this fluctuation is expected; averaging the metrics over all 10 folds would give a more stable comparison.

Reposted from blog.csdn.net/qq_36974075/article/details/80727355