Jupyter Notebook
PCA3
From here on we just use the methods packaged in sklearn instead of writing them ourselves
from sklearn.decomposition import PCA
from sklearn import datasets
from sklearn.model_selection import train_test_split
digits = datasets.load_digits()
X = digits.data
Y = digits.target
X.shape
(1797, 64)
Y
X_train, X_test, Y_train, Y_test = train_test_split(X, Y)
First, try the KNN algorithm
from sklearn.neighbors import KNeighborsClassifier
knn_clf = KNeighborsClassifier()
%time knn_clf.fit(X_train,Y_train)
Wall time: 2.99 ms
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=None, n_neighbors=5, p=2,
weights='uniform')
%time knn_clf.score(X_test, Y_test)
Wall time: 58.8 ms
0.9755555555555555
Now reduce the dimensionality with PCA
pca = PCA(0.95)
pca.fit(X_train)
X_train_transform = pca.transform(X_train)
X_test_transform = pca.transform(X_test)
X_train_transform.shape
(1347, 28)
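For context, passing a float like `PCA(0.95)` tells sklearn to keep the smallest number of components whose cumulative explained variance ratio reaches 95% of the total. A minimal self-contained sanity check (`random_state=0` is an added assumption for reproducibility; the exact component count varies slightly with the split):

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

X, Y = datasets.load_digits(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# PCA(0.95) keeps the smallest number of components whose cumulative
# explained variance ratio first reaches 95% of the total variance.
pca = PCA(0.95)
pca.fit(X_train)

cum = np.cumsum(pca.explained_variance_ratio_)
print(pca.n_components_)   # number of components actually kept (around 28 here)
print(cum[-1])             # cumulative ratio, >= 0.95 by construction
```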
We're now down to 28 dimensions; let's test the time performance
knn_clf_2 = KNeighborsClassifier()
knn_clf_2
%time knn_clf_2.fit(X_train_transform, Y_train)
Wall time: 1.99 ms
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=None, n_neighbors=5, p=2,
weights='uniform')
%time knn_clf_2.score(X_test_transform, Y_test)
Wall time: 22.9 ms
0.9711111111111111
Without dimensionality reduction, scoring took 58.8 ms with accuracy 0.9756.
After PCA, scoring is more than twice as fast (22.9 ms), and accuracy drops by only about 0.004 (0.9756 → 0.9711). A pretty good trade, isn't it?
With much larger datasets, the performance gain would be even more pronounced.
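The whole comparison above can be condensed into one self-contained script (`random_state=42` is an added assumption for reproducibility; exact scores vary with the split, and note that PCA is fit on the training data only, then applied to both sets):

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, Y = datasets.load_digits(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=42)

# Baseline: KNN on all 64 raw pixel features
knn = KNeighborsClassifier()
knn.fit(X_train, Y_train)
score_raw = knn.score(X_test, Y_test)

# PCA down to 95% explained variance, then KNN on the reduced features
pca = PCA(0.95)
pca.fit(X_train)                    # fit on training data only
knn_pca = KNeighborsClassifier()
knn_pca.fit(pca.transform(X_train), Y_train)
score_pca = knn_pca.score(pca.transform(X_test), Y_test)

print(score_raw, score_pca, pca.n_components_)
```

Both classifiers should land in the high-90s on accuracy, with the PCA version predicting on far fewer features.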