1. Clustering models
from sklearn.cluster import KMeans
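A minimal sketch of how the clustering class might be used. Note the class is spelled `KMeans`. The six points in `X` are made-up illustration data, and the choices `n_clusters=2`, `n_init=10` and `random_state=0` are assumptions for this example only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated groups of 2-D points (made-up illustration data).
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])

# n_clusters: how many clusters to look for.
# n_init is set explicitly because its default differs between sklearn versions.
# random_state fixes the seed so repeated runs give the same result.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)      # cluster index for each sample

print(labels)                   # which group is called 0 or 1 is arbitrary
print(km.cluster_centers_)      # one centroid per cluster
```

The cluster ids carry no meaning of their own; only the grouping matters.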
2. Datasets
from sklearn.datasets import load_iris
Standard sklearn data layout:
data = [[feature1, feature2, feature3], ...]   # one row per sample, n_samples rows in total
target = [0, 2, 1, 2, 1, 2, 0, ...]            # one label per sample
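As a quick check of this layout, the iris dataset can be loaded and inspected. This is only a sketch; the variable names `iris`, `X` and `y` are chosen for the example.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target   # feature matrix and label vector

print(X.shape)                  # (150, 4): 150 samples, 4 features each
print(y.shape)                  # (150,):   one class label per sample
print(iris.feature_names)       # names of the 4 feature columns
print(iris.target_names)        # names of the 3 classes
```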
3. Feature selection
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
fs = SelectKBest(chi2,k=10)
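A small sketch of the selector applied to real data. It assumes the iris dataset and uses k=2 rather than k=10, because iris has only 4 features and k should not exceed the feature count. chi2 also requires non-negative feature values, which iris satisfies.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)     # 150 samples, 4 non-negative features

# Keep the 2 features with the highest chi-squared score against the labels.
fs = SelectKBest(chi2, k=2)
X_new = fs.fit_transform(X, y)

print(X_new.shape)                    # (150, 2)
print(fs.scores_)                     # chi-squared score of each original feature
print(fs.get_support(indices=True))   # column indices of the features that were kept
```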
4. Model evaluation
from sklearn import metrics
y_pred = [0,2,1,3]
y_true = [0,1,2,3]
metrics.accuracy_score(y_true, y_pred)
0.5
metrics.accuracy_score(y_true, y_pred, normalize=False)   # returns the number of correct predictions (2 here) instead of the fraction
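roc_auc_score: ROC stands for Receiver Operating Characteristic. The ROC curve shows how the True Positive Rate and the False Positive Rate change as the decision threshold varies.
AUC is the area under that curve; the higher the value, the better the classifier.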
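A minimal sketch of computing the ROC curve and its AUC for a binary problem. The labels and scores below are made-up illustration values; `y_score` stands for the classifier's predicted probability (or decision score) for the positive class.

```python
import numpy as np
from sklearn import metrics

y_true = np.array([0, 0, 1, 1])             # ground-truth binary labels
y_score = np.array([0.1, 0.4, 0.35, 0.8])   # predicted scores for class 1

# FPR and TPR at each threshold taken from the scores.
fpr, tpr, thresholds = metrics.roc_curve(y_true, y_score)

# Area under the ROC curve, computed directly from labels and scores.
auc = metrics.roc_auc_score(y_true, y_score)
print(auc)   # 0.75
```

Here 3 of the 4 (positive, negative) pairs are ranked correctly, which is where 0.75 comes from. Note that roc_auc_score takes scores, not hard 0/1 predictions.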